본문 바로가기

DB/PostgreSQL

[psycopg2] Exploring Postgres Database Internals

1. Exploring Internal tables

1.1 Check whole table in database

Inside Postgres engine, there is a set of internal tables used to manage the overall structue of the database. These tables are located inside the information_schema or system catalogs table set. The table includes information on data, the name of the table, and the type stored in the database.

 

For example, cursor.description is an attribute that retrieves information from an internal table. The system schema of the pg_catalog contains all the information about the table in the database. The pg_tables table typically provides the following information.

 

Name Type Description
schemaname name The name of schema containing table
tablename name The name of table
tableowner name The name of table's owner

 

import psycopg2
conn = psycopg2.connect(dbname="hud", user="hud_admin", password="hud_pwd") 
cursor = conn.cursor() 

cursor.execute("SELECT tablename FROM pg_catalog.pg_tables ORDER BY tablename;")
table_names = cursor.fetchall() 

print(f"The number of tables in database : {len(table_names)}")

for table in table_names: 
    print(table)

 

1.2 Check table in public schema

Postgres divides the internal table into three schemas. 

 

  • pg_catalog : system catalog tables
  • information_schema : information catalog tables
  • public : default schema for user created tables

 

The table generated by the user through the CREATE TABLE command statement is automatically assigned to the puclic schema. Therefore, to find the table name according to the type inside the pg_tables, use pg_catalog.pg_tables. 

 

import psycopg2
conn = psycopg2.connect(dbname="hud", user="hud_admin", password="hud_pwd") 
cursor = conn.cursor() 

cursor.execute("""
    SELECT tablename 
      FROM pg_catalog.pg_tables 
     WHERE schemaname NOT IN ('pg_catalog', 'information_schema') 
     ORDER BY tablename; 
""")

table_names = fursor.fetchall() 
print(f"The number of table user created : {table_names}")

for table in table_names: 
    print(table)

 

1.3 Check types of column in tables

When we need to check the type of internal data for two or more tables identified by the public schema, we can use for loop to write efficient code. However, the second positional argument of the cursor.execute() method must be used because SQL injection problems occur to use string formatting.

 

However, if the table name is written as string data and entered, the Postgres engine does not recognize the table name written with double quotes. The AsIs module in psycopg2.extensions package converts data so that Postgres can recognize the input string as it is in table name.

 

# Get description of each table 
import psycopg2
conn = psycopg2.connect(dbname="hud", user="hud_user", password="hud_pwd") 
cursor = conn.cursor() 

from psycopg2.extensions import AsIs
col_descriptions = {} 

for table in table_names: 
    # We've done this in Optimizing data type in table section 
    cursor.execute("SELECT * FROM %s LIMIT 0;", (AsIs(table))
    col_description[table] = cursor.description

conn.close()

 

Since cursor.description returns the data type as type_code, it is necessary to switch to a human-readable typename. The pg_type of pg_catalog schema is a table that stores type_code:typename.

 

import psycopg2
conn = psycopg2.connect(dbname="hud", user="hud_admin", password="hud_pwd") 
cursor = conn.cursor() 

cursor.execute("SELECT oid, typename FROM pg_catalog.pg_type;") 
type_mappings = {int(oid) : typename for oid, typename in cursor.fetchall()}

 

2. Readable Descriptions

In summary, the Postgres database is divided into public, system catalog, and information schema, and each schema stores table about user-generated tables, system catalog information, and table information. The pg_table in the pg_catalog schema saves the table, pg_catalog.pg_type has information about the data type, and the cursor descriptions returns the data type of the entered table.

 

The workflow of making readable descriptions is as following :

 

  1. Extract the table name from inside the schema named 'public' and store it in table_names.
  2. Create col_descriptions that store the table name and type_code according to the table.
  3. Generates a type_mappings dictionary matched by type_code : typename.
  4. According to the table name, it has the main key of 'column' and 'number of rows', and inside the column key, it creates a list that stores the column name, type, and internal size in dictionary form.

 

import psycopg2
conn = psycopg2.connect(dbname="hud", user="hud_admin", password="hud_pwd") 
cursor = conn.cursor() 

# Find table in public schema 
cursor.execute("""
    SELECT tablename 
      FROM pg_catalog.pg_tables 
     WHERE schemaname NOT IN ('pg_catalog', 'information_schema') 
     ORDER BY tablename; 
""")
table_names = cursor.fetchall() 

# Make description dictionary store table as key 
from psycopg2.extensions import AsIs
col_description = {} 
for table in table_names: 
    cursor.execute("SELECT * FROM %s LIMIT 0;", (AsIs(table[0]),))
    col_description[table] = cursor.description
    
# Make type_code:typename mapping dictionary
cursor.execute("SELECT oid, typename FROM pg_catalog.pg_type;")
type_mappings = {int(oid) : typename for oid, typename in cursor.fetchall()}

# Make final readable description 
readable_description = {} 
for table in table_names: 
    cursor.execute("SELECT COUNT(*) FROM %s, (AsIs(table[0]),))
    readable_description[table] = {
        'columns' : [{
            'name' : col.name,
            'type' : type_mappings[col.type_code], 
            'internal_size' : col.internal_size
        } for col in col_description[table],
        'number_of_rows' : cursor.fetchone()[0]]
        
    }

 

3. Store result in JSON format

Metadata extracted from the Postgres database for the public table was stored in readable_description. A format in which data stored in a dictionary type is easily read is called a JSON format. The command to store dictionary as JSON format is the json.dumps() method, and the method to store JSON format as dictionary is the json.loads().

 

import json

json_description = json.dumps(readable_description, indent=4) 
print(json_description)