1. Check data type of table
Inadequate data type schema cause errors when we write queries. The cursor.description attribute of the pyscopg2 package outputs the data type of the database table. The cursor.description property consists of a column object consisting of name and type_code. The name has the name of the column, and the type_code has an integer value corresponding to the data type.
import psycopg2
conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()
query = "SELECT * FROM table LIMIT 0;"
cursor.execute(query)
print(cursor.description)
There is special built-in table we can use when we want to know type name corresponding to type_code. pg_catalog.pg_type table consists of a type_code(oid) and a data type name(typename). typename is the name of the data type, and oit represents an integer value of the data type.
Name | Type | Description |
oid | oid | Row identifier |
typename | name | The name of data type |
Since pg_catalog.pg_type is also table, it is possible to write a query confirming a specific code integer value.
query = "SELECT typename FROM pg_catalog.pg_type WHERE oid=nn;"
cursor.execute(query)
type_of_nn = cursor.fetchone()[0]
print(f"The typename of type code {nn} is {type_of_nn}")
2. Optimizing data type of table
If we check the data type of the column inside the pg_catalog.pg_type table and find the cause of the error, it is necessary to change it to a data type that does not cause the error. There is also a notification name of the data type inside the pg_catalog.pg_type table. The reason why it is important to change data type appropriately is that it can increase the efficiency of disk space and prevent the speed and error of queries.
2.1 Optimizing numerical values
Postgres Internal Name | Name | Storage Size | Description | Range |
int2 | smallint | 2 bytes | small-range integer | - 32768 ~ 32767 |
int4 | integer | 4 bytes | typical choice for integer | - 2147483648 ~ 2147483647 |
int8 | bigint | 8 bytes | large-range integer | - 9.223372e+18 ~ 9.223372e+18 |
numeric | decimal/numeric | variable | user-specified precision | up to 16383 digits after the decimal point |
float4 | real | 4 bytes | variable-precision | 6 decimal digits |
float8 | double precision | 8 bytes | variable-precision | 15 decimal digits |
int4 | serial | 4 bytes | automatic incrementing integer | 1 ~ 2147483647 |
int8 | bigserial | 8 bytes | large autoincrementing integer | 1 ~ 9.223372e+18 |
Trough ALTER COLUMN ~ TYPE clause, we can change one column's type to a desired data type. The full workflow of checking and changing data type is same as below :
# Check type of all columns
import psycpg2
conn = psycopg.connect("dbname=db user=ur")
cursor = conn.cursor()
query = "SELECT * FROM [table_name] LIMIT 0;"
cursor.execute(query)
print(cursor.description)
# Check name of type occurs error
query = "SELECT typename FROM pg_catalog.pg_type WHERE oid = nn;"
cursor.execute(query)
type_of_nn = cursor.fetchone()[0]
print(type_of_nn)
# Change datatype of column
query = "ALTER TABLE table ALTER COLUMN col1 TYPE int8;"
cursor.execute(query)
# Commit changes to database
conn.commit()
# Shut transaction block
conn.close()
2.2 Optimizing text values
Name | Description |
character varying(n), varchar(n) | variable-length with limit |
character(n), char(n) | fixed-length, blanked padded |
text | variable unlimited length |
For text data, any length of string may exist inside the table. This flexibility leads to waste of space needed to store data. For space efficiency, it is possible to check the unique values of each column and size the data based on the lngest string to prevent waste. In the case of string data, the caracter size of the data type may be limited by inputting (N) into parentheses.
# Check max size of unique value
import csv
with open([file_name].csv, 'r') as file:
next(file)
reader = csv.reader(file)
unique_words = set()
for row in reader:
check_col = row[n]
unique_words.add(score_phrase)
max_len = 0
for unique_word in unique_words :
if len(unique_word) >= max_len :
max_len = len(unique_word)
print(max_len)
# Change Type of text value
import psycopg2
conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()
query = "ALTER TABLE table ALTER COLUMN col1 TYPE varchar(11);"
cur.execute(query)
# Commit changes
conn.commit()
conn.close()
2.3 Optimizing using enumerated datatype
Using enumerated data type is more efficient than checking all unique values to determine the largest length of string. Enumerated type is a data type that forms a set of named values and is an identifier that serves as a constant for the language. Enumerated data are assigned 4 bytes to each index and stored in the pg_enum table. Once stored, the enumerated data allows the Postgres engine to immediately check and store values.
import psycopg2
conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()
query = """
CREATE TYPE enum_name AS ENUM(
'val1', 'val2', 'val3', ...
);
ALTER TABLE table_name
ALTER COLUMN column TYPE enum_name
USING CAST(col_name AS enum_name);
"""
cursor.excute(query)
conn.commit()
conn.close()
For a boolean type that exists as True/False, use the boolean type name. In the case of the boolean type, it is recognized as the enumerated type inside Postgres.
ALTER TABLE [table_name]
ALTER COLUMN [col_name] TYPE boolean
USING [col_name]::boolean
2.4 Optimizing datetime values
Like datetime objects inside Python, date data types exists inside SQL. The workflow code that delets the existing divided columns, creates a new data type column, and insert the values is as follows.
import datetime
import psycopg2
import csv
conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()
query1 = "ALTER TABLE table ADD COLUMN new_datetime;"
query2 = """
ALTER TABLE table
DROP COLUMN col1,
DROP COLUMN col2;
"""
cursor.execute(query1)
cursor.execute(query2)
with open('file.csv', 'r') as file:
next(file)
reader = csv.reader(file)
for row in reader:
year = int(row[1])
month = int(row[2])
day = int(row[3])
date = datetime.date(year, month, day)
row = row[0] + row[4:]
row.append(date)
query = "INSERT INTO table VALUES (%s, %s, %s, %s, %s, %s, %s);"
cursor.execute(query, row)
conn.commit()
conn.close()
'DB > PostgreSQL' 카테고리의 다른 글
[psycopg2] Checking encoding type of table (0) | 2022.09.28 |
---|---|
[psycopg2] SQL Injection and Ways to defened it (0) | 2022.09.28 |
[psycopg2] SQL Transactions (1) | 2022.09.26 |
[psycopg2] Executing Postgres query in Python script (0) | 2022.09.26 |
[psycopg2] How to use Postgres in Python (0) | 2022.09.26 |