[psycopg2] Optimizing Data Types of Tables

1. Check data type of table

Inadequate data type schema cause errors when we write queries. The cursor.description attribute of the pyscopg2 package outputs the data type of the database table. The cursor.description property consists of a column object consisting of name and type_code. The name has the name of the column, and the type_code has an integer value corresponding to the data type.

import psycopg2
conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor() 

query = "SELECT * FROM table LIMIT 0;"
cursor.execute(query) 
print(cursor.description)

There is special built-in table we can use when we want to know type name corresponding to type_code. pg_catalog.pg_type table consists of a type_code(oid) and a data type name(typename). typename is the name of the data type, and oit represents an integer value of the data type.

Name	Type	Description
oid	oid	Row identifier
typename	name	The name of data type

Since pg_catalog.pg_type is also table, it is possible to write a query confirming a specific code integer value.

query = "SELECT typename FROM pg_catalog.pg_type WHERE oid=nn;"
cursor.execute(query)
type_of_nn = cursor.fetchone()[0]
print(f"The typename of type code {nn} is {type_of_nn}")

2. Optimizing data type of table

If we check the data type of the column inside the pg_catalog.pg_type table and find the cause of the error, it is necessary to change it to a data type that does not cause the error. There is also a notification name of the data type inside the pg_catalog.pg_type table. The reason why it is important to change data type appropriately is that it can increase the efficiency of disk space and prevent the speed and error of queries.

2.1 Optimizing numerical values

Postgres Internal Name	Name	Storage Size	Description	Range
int2	smallint	2 bytes	small-range integer	- 32768 ~ 32767
int4	integer	4 bytes	typical choice for integer	- 2147483648 ~ 2147483647
int8	bigint	8 bytes	large-range integer	- 9.223372e+18 ~ 9.223372e+18
numeric	decimal/numeric	variable	user-specified precision	up to 16383 digits after the decimal point
float4	real	4 bytes	variable-precision	6 decimal digits
float8	double precision	8 bytes	variable-precision	15 decimal digits
int4	serial	4 bytes	automatic incrementing integer	1 ~ 2147483647
int8	bigserial	8 bytes	large autoincrementing integer	1 ~ 9.223372e+18

Trough ALTER COLUMN ~ TYPE clause, we can change one column's type to a desired data type. The full workflow of checking and changing data type is same as below :

# Check type of all columns
import psycpg2
conn = psycopg.connect("dbname=db user=ur")
cursor = conn.cursor()

query = "SELECT * FROM [table_name] LIMIT 0;"
cursor.execute(query)
print(cursor.description)

# Check name of type occurs error
query = "SELECT typename FROM pg_catalog.pg_type WHERE oid = nn;"
cursor.execute(query)
type_of_nn = cursor.fetchone()[0]
print(type_of_nn)

# Change datatype of column
query = "ALTER TABLE table ALTER COLUMN col1 TYPE int8;"
cursor.execute(query)

# Commit changes to database
conn.commit()

# Shut transaction block
conn.close()

2.2 Optimizing text values

Name	Description
character varying(n), varchar(n)	variable-length with limit
character(n), char(n)	fixed-length, blanked padded
text	variable unlimited length

For text data, any length of string may exist inside the table. This flexibility leads to waste of space needed to store data. For space efficiency, it is possible to check the unique values of each column and size the data based on the lngest string to prevent waste. In the case of string data, the caracter size of the data type may be limited by inputting (N) into parentheses.

# Check max size of unique value 
import csv
with open([file_name].csv, 'r') as file:
    next(file)
    reader = csv.reader(file)
    unique_words = set()
    for row in reader:
        check_col = row[n]
        unique_words.add(score_phrase)
    
max_len = 0
for unique_word in unique_words :  
    if len(unique_word) >= max_len : 
        max_len = len(unique_word)
print(max_len)

# Change Type of text value 
import psycopg2
conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()

query = "ALTER TABLE table ALTER COLUMN col1 TYPE varchar(11);"
cur.execute(query)

# Commit changes 
conn.commit()
conn.close()

2.3 Optimizing using enumerated datatype

Using enumerated data type is more efficient than checking all unique values to determine the largest length of string. Enumerated type is a data type that forms a set of named values and is an identifier that serves as a constant for the language. Enumerated data are assigned 4 bytes to each index and stored in the pg_enum table. Once stored, the enumerated data allows the Postgres engine to immediately check and store values.

import psycopg2
conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()

query = """
CREATE TYPE enum_name AS ENUM(
    'val1', 'val2', 'val3', ... 
); 

ALTER TABLE table_name 
ALTER COLUMN column TYPE enum_name
USING CAST(col_name AS enum_name);
"""
cursor.excute(query)

conn.commit()
conn.close()

For a boolean type that exists as True/False, use the boolean type name. In the case of the boolean type, it is recognized as the enumerated type inside Postgres.

ALTER TABLE [table_name]
ALTER COLUMN [col_name] TYPE boolean 
USING [col_name]::boolean

2.4 Optimizing datetime values

Like datetime objects inside Python, date data types exists inside SQL. The workflow code that delets the existing divided columns, creates a new data type column, and insert the values is as follows.

import datetime
import psycopg2
import csv

conn = psycopg2.connect("dbname=db user=ur")
cursor = conn.cursor()

query1 = "ALTER TABLE table ADD COLUMN new_datetime;" 
query2 = """
ALTER TABLE table 
DROP COLUMN col1, 
DROP COLUMN col2;
"""

cursor.execute(query1)
cursor.execute(query2) 

with open('file.csv', 'r') as file: 
    next(file) 
    reader = csv.reader(file)
    for row in reader: 
        year = int(row[1]) 
        month = int(row[2]) 
        day = int(row[3]) 
        date = datetime.date(year, month, day)
        row = row[0] + row[4:]
        row.append(date)
        query = "INSERT INTO table VALUES (%s, %s, %s, %s, %s, %s, %s);"
        cursor.execute(query, row) 
        
conn.commit() 
conn.close()

'DB > PostgreSQL' 카테고리의 다른 글

[psycopg2] Checking encoding type of table (0)	2022.09.28
[psycopg2] SQL Injection and Ways to defened it (0)	2022.09.28
[psycopg2] SQL Transactions (1)	2022.09.26
[psycopg2] Executing Postgres query in Python script (0)	2022.09.26
[psycopg2] How to use Postgres in Python (0)	2022.09.26

See the forest

[psycopg2] Optimizing Data Types of Tables

1. Check data type of table

2. Optimizing data type of table

2.1 Optimizing numerical values

2.2 Optimizing text values

2.3 Optimizing using enumerated datatype

2.4 Optimizing datetime values

'DB > PostgreSQL' 카테고리의 다른 글

티스토리툴바

[psycopg2] Optimizing Data Types of Tables

1. Check data type of table

2. Optimizing data type of table

2.1 Optimizing numerical values

2.2 Optimizing text values

2.3 Optimizing using enumerated datatype

2.4 Optimizing datetime values

'DB > PostgreSQL' 카테고리의 다른 글

'DB/PostgreSQL' Related Articles

티스토리툴바