1. Ways of Loading Data into Tables
1.1 Using mogrify method
In psycopg2, external data can be passed to cursor.execute() through %s placeholders. When a placeholder is used in cursor.execute(), the Python value is automatically adapted to the corresponding Postgres type. This adaptation is performed by the cursor.mogrify() method, and the object it returns is a bytes object.
# build the mogrified byte string with the mogrify method
mogrified_data = cur.mogrify("(%s, ..., %s)", tuples)
# decode the bytes object into a string
mogrified_string = mogrified_data.decode(conn.encoding)
Mogrified values allow us to insert multiple rows into a database table at once.
import csv
import psycopg2

conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()

with open("file.csv", mode="r") as file:
    next(file)  # skip the header row
    reader = csv.reader(file)
    rows = [row for row in reader]

mogrified_rows = [cursor.mogrify("(%s, %s, %s, %s, %s, %s, %s, %s, %s)", row) for row in rows]
decoded_rows = [row.decode(conn.encoding) for row in mogrified_rows]
insert_string = ",".join(decoded_rows)
cursor.execute("INSERT INTO table VALUES " + insert_string + ";")
conn.commit()
conn.close()
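Conceptually, the mogrify-and-join step builds one multi-row VALUES clause. The sketch below rebuilds that string in pure Python with a hypothetical helper (naive_values_clause) and deliberately naive quoting, purely to show the shape of the result — real code must rely on cursor.mogrify(), which escapes values safely.

```python
# Hypothetical sketch of the multi-row VALUES string that the
# mogrify-and-join pattern builds. Naive quoting, illustration only:
# real code must use cursor.mogrify() for safe escaping.
def naive_values_clause(rows):
    def quote(value):
        # crude stand-in for psycopg2's type adaptation:
        # quote and double-escape strings, pass numbers through
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)
    return ",".join("(" + ", ".join(quote(v) for v in row) + ")" for row in rows)

rows = [("alice", 30), ("bob", 25)]
sql = "INSERT INTO table VALUES " + naive_values_clause(rows) + ";"
print(sql)
# INSERT INTO table VALUES ('alice', 30),('bob', 25);
```

The final string is what cursor.execute() receives: a single statement carrying every row, which is why it round-trips to the server only once.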
1.2 Using copy method
Although the cursor.mogrify() approach is faster than executing one placeholder query per row, it still reads and decodes each row one by one and then joins the results into a single query string. The cursor.copy_from() method can move the entire dataset faster than the cursor.mogrify() approach when copying all rows.
import psycopg2

conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()

with open("file.csv", mode="r") as file:
    next(file)  # skip the header row
    cursor.copy_from(file, "table", sep=',')
conn.commit()
conn.close()
When the cursor.copy_from() method reads a CSV file, it treats every ',' in a row as a column separator — even one inside a quoted field — so errors can occur. cursor.copy_expert() can facilitate data loading by letting us spell out detailed COPY options.
import psycopg2

conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()

with open("file.csv", mode="r") as file:
    cursor.copy_expert("COPY table FROM STDIN WITH CSV HEADER", file)
conn.commit()
conn.close()
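The quoting problem described above can be reproduced without a database: splitting a row naively on ',' — roughly what a plain-text copy sees — breaks a quoted field apart, while a CSV-aware parser, which is what the CSV option of COPY enables, keeps it intact. A small illustration using Python's csv module:

```python
import csv
import io

line = 'The Witcher 3,"Action, RPG",2015\n'

# naive split on ',' — the quoted field is broken into two columns
naive_fields = line.strip().split(",")
print(len(naive_fields))  # 4

# CSV-aware parsing — "Action, RPG" stays a single field
csv_fields = next(csv.reader(io.StringIO(line)))
print(len(csv_fields))  # 3
```

A plain copy_from would hand the table four columns for this three-column row and fail, while copy_expert with the CSV option parses it correctly.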
COPY [table] FROM STDIN WITH CSV DELIMITER [ ] HEADER
- [table] : The name of the table we want to load data into
- FROM : Specifies that data should be read from an external source
- STDIN : Indicates that the data is streamed over the client/server connection
- WITH : Introduces the list of options, separated by spaces
- DELIMITER : The character that separates columns (',' for CSV by default)
- HEADER : Specifies that the first row is a header
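To see how these parts combine, here is a small hypothetical helper (build_copy_sql is not part of psycopg2) that assembles a COPY command from the pieces listed above:

```python
# Hypothetical helper (not part of psycopg2) assembling a COPY command
# from the syntax parts described above.
def build_copy_sql(table, direction="FROM", stream=None, delimiter=",", header=True):
    # COPY [table] FROM STDIN / TO STDOUT, followed by the WITH options
    if stream is None:
        stream = "STDIN" if direction == "FROM" else "STDOUT"
    options = ["CSV"]
    if delimiter != ",":
        options.append(f"DELIMITER '{delimiter}'")
    if header:
        options.append("HEADER")
    return f"COPY {table} {direction} {stream} WITH " + " ".join(options) + ";"

print(build_copy_sql("table"))
# COPY table FROM STDIN WITH CSV HEADER;
print(build_copy_sql("table", direction="TO", delimiter=";"))
# COPY table TO STDOUT WITH CSV DELIMITER ';' HEADER;
```

The resulting string is exactly what gets passed as the first argument of cursor.copy_expert().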
Similarly, the opposite operation can be performed.
1.3 Time Comparison of loading data
- Inserting each row with its own INSERT statement using placeholders
- Inserting all rows at once using cursor.mogrify()
- Copying the entire file into the table at once using cursor.copy_expert()
import csv
import timeit
import psycopg2

conn = psycopg2.connect("dbname=db user=ur")
cur = conn.cursor()

# Multiple single INSERT statements
def multiple_inserts():
    with open("file.csv", "r") as f:
        next(f)  # skip the header row
        reader = csv.reader(f)
        for row in reader:
            cur.execute("INSERT INTO table VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s);", row)
    conn.rollback()

# Single mogrified multi-row INSERT
def mogrified_insert():
    with open("file.csv", "r") as f:
        next(f)  # skip the header row
        reader = csv.reader(f)
        mogrified = [
            cur.mogrify("(%s, %s, %s, %s, %s, %s, %s, %s, %s)", row).decode(conn.encoding)
            for row in reader
        ]
    mogrified_values = ",".join(mogrified)
    cur.execute("INSERT INTO table VALUES " + mogrified_values + ";")
    conn.rollback()

# copy_expert method
def copy_expert():
    with open("file.csv", "r") as f:
        cur.copy_expert("COPY table FROM STDIN WITH CSV HEADER;", f)
    conn.rollback()

# time each approach once
time_multiple_inserts = timeit.timeit(multiple_inserts, number=1)
time_mogrified_insert = timeit.timeit(mogrified_insert, number=1)
time_copy_expert = timeit.timeit(copy_expert, number=1)
print(f"Consumed times using multiple insert : {time_multiple_inserts}")
print(f"Consumed times using mogrified insert : {time_mogrified_insert}")
print(f"Consumed times using copy expert : {time_copy_expert}")
"""
Consumed times using multiple insert : 3.008526957128197
Consumed times using mogrified insert : 1.0372675461694598
Consumed times using copy expert : 0.15513571677729487
"""
2. Ways of Extracting Data from Tables
cursor.copy_expert() can copy external data into the database, but it can also export tables from the database to an external file. To do this, change FROM to TO and STDIN to STDOUT in the COPY statement.
import psycopg2

conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()

with open("file.csv", "w") as file:
    cursor.copy_expert("COPY table TO STDOUT WITH CSV HEADER;", file)
conn.close()
3. Transferring Data between Tables
There are two methods of transferring data between two tables. The first is to create a temporary file that preserves the data of the existing table: export the table to an external file with cursor.copy_expert("COPY ... TO STDOUT") and then load that file into the new table with cursor.copy_expert("COPY ... FROM STDIN").
import psycopg2

conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()

# the query to create the empty copy of the table
create_string = "CREATE TABLE table_copy ( ... );"

with open("temp.csv", "w") as file:
    cursor.copy_expert("COPY table TO STDOUT WITH CSV HEADER;", file)

cursor.execute(create_string)
with open("temp.csv", "r") as file:
    cursor.copy_expert("COPY table_copy FROM STDIN WITH CSV HEADER;", file)
conn.commit()
conn.close()
However, this method wastes time and disk space once the table grows to millions of rows. The second method transfers the data entirely inside PostgreSQL: in the INSERT INTO statement, the VALUES clause is replaced with a SELECT subquery that reads directly from the source table.
import psycopg2

conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()

cursor.execute("CREATE TABLE table_copy ( ... );")
cursor.execute("""
    INSERT INTO table_copy (col1, col2, col3)
    SELECT col1, col2, col3 FROM table;
""")
conn.commit()
conn.close()