All Posts (150)

[pandas] Introduction to Pandas
1. Importing Pandas library The Numpy library provides useful algebraic operations and has the advantage of fast execution. However, Numpy's ndarray is designed for numerical data and stores elements of a single data type. The Pandas library allows us to work with different data types. import pandas as pd 2. Reading csv format file using Pan..

[os] User Underlying Operating System
1. What is OS Module? The OS module provides a portable way of using operating system dependent functionality. 2. Importing OS module import os # check all attributes and methods print(dir(os)) 3. Useful methods of OS module # Get current working directory print(os.getcwd(), '\n') # Change working directory os.chdir('...') # List directory : List all files and directories in current working direct..

[sys] The way to import module not found
1. Using sys.path The Python sys.path is a list of directories where the Python interpreter searches for modules. When a module is imported in a project file, the interpreter searches each of the directories in the list. So, we can easily load Python files from any directory by modifying the elements of sys.path. 1.1 Check a list of sys.path import sys print(sys.path) 1.2 Append directories to sys.pat..

[OOP] Implement Logic Gates using Class Inheritance
1. What is Class Inheritance? Inheritance is the ability for one class to be related to another class. Python child classes can inherit characteristic data and behavior from a parent class. The structure above is called an inheritance hierarchy. This implies that lists inherit important characteristics from sequence, namely the ordering of the underlying data and the operations such as concatenation, r..

[numpy] Processing Datasets, Boolean, and Datatypes in Numpy
1. Datasets in Numpy 1.1 Load csv file into ndarray The numpy.genfromtxt() method loads the numeric data in a text file into an ndarray.
import numpy as np file = np.genfromtxt('file.csv', delimiter=',') first_five = file[:5,:] The result stored in the ndarray is displayed in scientific notation and may contain nan. np.nan stands for "not a number" and marks values that could not be parsed as numbers, such as character data inside a csv file. In addition, since each..

[numpy] Arithmetics with Numpy Arrays
1. Basic Arithmetics 1.1 Adding values Adding ndarrays of the same length generates a new ndarray by adding the elements at each index. With plain Python lists, the elements of each list must be extracted and added one at a time, so the numpy package is much more efficient for algebraic operations. The reason the numpy object is much faster than the Python list is that the numpy package is written in the C lan..

[numpy] Basic operations of Numpy array
1. What is Numpy? Numpy is a general-purpose array-processing package. It provides a high-performance multidimensional array object and tools for working with these arrays. Numpy is an abbreviation for Numerical Python and is a Python package used in algebraic calculations. import numpy as np 2. Array 2.1 Creating an Array object Numpy's core data structure is ndarray, which has a similar structur..

[Theorem] Differences between conn.commit and conn.autocommit
In PostgreSQL, all user activity occurs in a transaction. If autocommit mode is enabled, each SQL statement forms a single transaction on its own. In this case, the statement cannot be rolled back once it has executed. Source from : https://stackoverflow.com/questions/51880309/what-does-autocommit-mean-in-postgresql-and-psycopg2 https://stackoverflow.com/questions/57761557/is-there-any-reason-not-to-use-autocommit https://stacko..

[Theorem] Difference between VACUUM and VACUUM FULL
The vacuum of the PostgreSQL system is a routine maintenance process. The differences between VACUUM and VACUUM FULL are as follows :
VACUUM: 1. It only marks the space held by dead tuples in the table as reusable; there is no real physical deletion; 2. During the vacuum process, the data table can be accessed normally.
VACUUM FULL: 1.
Physically deletes dead tuples in the table and returns the freed space to the operating system; ..

[psycopg2] Vacuuming Postgres Database
1. Speed problem after deleting all rows in table In the case of UPDATE and DELETE, two representative destructive queries, if we load the data back into the table after executing these clauses, we can see that the contents of the data have not changed, but the load is slower than before. import psycopg2 conn = psycopg2.connect(dbname='db', user='ur', password='pw') cursor = conn.c..

[psycopg2] More efficient Index Scan
1. Bitmap Index/Heap Scan Index Scan has the advantage of being much faster than Seq Scan. In SQL, there are cases where a WHERE clause uses a compound condition rather than a single condition. In this case, if one column is indexed and the other column is not, it can be seen that the nested structure of Bitmap Index Scan and B..

[psycopg2] Efficient query with Index Scan
1. What is Index Scan? import psycopg2 import json conn = psycopg2.connect(dbname="db", user="ur", password="pw") cursor = conn.cursor() cursor.execute("EXPLAIN (ANALYZE, FORMAT json) SELECT * FROM table WHERE id = nn;") query_plan = cursor.fetchone() print(json.dumps(query_plan, indent=2)) # Result of query_plan in json [ [ { "Plan": { "Node Type": "Index Scan", "Parallel Aware": false, "Scan Di..

[psycopg2] Debugging Postgres Queries
1. Execution of queries The steps to perform an SQL query statement are as follows:
Query parsing : The query statement is divided into correct grammar units and transformed into a query tree if there are no errors.
Query rewrite : If special rules are stored in the system catalog, they are applied to the query tree.
Query planning : The planner and optimizer optimize the query tree and send it to exe..

[OOP] Creating Fraction Class using OOP
1. Purpose of creating Fraction Class Our purpose in this post is to create a Python class called "Fraction".
The operations for the Fraction class are as follows : Add, Subtract, Multiply, and Divide. This class can also show fractions using the standard "slash" form. 2. Defining Class Fraction needs two pieces of data, the numerator and the denominator. In OOP, all classes should provide c..

[psycopg2] Exploring Postgres Database Internals
1. Exploring Internal tables 1.1 Check whole table in database Inside the Postgres engine, there is a set of internal tables used to manage the overall structure of the database. These tables are located inside the information_schema or system catalog table sets. They include information on the data, the table names, and the types stored in the database. For example, cursor.description is an a..

[psycopg2] Database Management
1. Database Management 1.1 Create Database Postgres can create multiple databases inside the server. A database can be created with the CREATE DATABASE clause, and the OWNER option is used to specify who owns the database. If the OWNER option is not specified, the current user is the owner. import psycopg2 conn = psycopg2.connect(dbname="db", user="ur") conn.autocommit = True cursor = conn.curs..

[psycopg2] User Management
1. Connection to Postgres Server Unlike SQLite3, Postgres uses a client-server model, so different users access databases with different privileges. For security, access to the Postgres database server can be protected by setting a password. A superuser has full permissions on the Postgres server and can adjust the permissions of other users, similar to the sudo permissions of the file syst..

[URL] Useful URLs for Data Engineering
No. | Title | Description | URL
1 | Data Engineering Wiki | Links for data warehousing, Python learning, and data modeling | https://dataengineering.wiki/Learning+Resources
2 | Data Glossary | Explore data engineering concepts | https://glossary.airbyte.com/term/data-engineering-concepts/
3 | Python Standard modules | Introduces the Python standard library | https://docs.python.org/3/library/

[psycopg2] Loading and Extracting Data with Tables
1. Ways of Loading Data with Tables 1.1 Using mogrify method In psycopg2, external data can be passed to cursor.execute through %s placeholders. When a placeholder is used in the cursor.execute() method, the Python value is automatically converted into the corresponding Postgres type. The cursor.mogrify() method returns the query string after this conversion. The object returned by cursor.mogrify is a bytes obj..

[psycopg2] Checking encoding type of table
The encoding support in PostgreSQL allows us to store text in a variety of encodings, including single-byte and multibyte encodings. All supported encodings can be used transparently by clients and the server. We can find the encoding used for a PostgreSQL connection by inspecting the connection.encoding attribute. To change the encoding, there is a method called connection.set_client_encoding(). im..