[pandas] Basic Data Exploration

1. Basic Exploratory Data Analysis

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

 

The most important object in the pandas library is the DataFrame. A DataFrame is a two-dimensional table made up of rows and columns.
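To make the rows-and-columns idea concrete, here is a minimal sketch that builds a small DataFrame from a Python dict. The column names `Rooms` and `Price` and their values are made up for illustration and are not taken from the Melbourne dataset.

```python
import pandas as pd

# Each dict key becomes a column label, and each position in the
# lists becomes a row. Values are invented illustration data.
df = pd.DataFrame(
    {
        "Rooms": [2, 3, 4],
        "Price": [850000.0, 1035000.0, 1465000.0],
    }
)

print(df)
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(list(df.index))    # row labels: a default RangeIndex 0, 1, 2
```

Because no index was passed, pandas labels the rows 0, 1, 2 automatically.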

 

import pandas as pd

# Save the file path to a variable for easier access
melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'

# Read the data and store it in a DataFrame named melbourne_data
melbourne_data = pd.read_csv(melbourne_file_path)

# Print a summary of the numeric columns in the Melbourne data
print(melbourne_data.describe())
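The file path above only exists inside the Kaggle notebook environment. As a self-contained sketch, the same read-then-describe steps can run on a small in-memory CSV, since `pd.read_csv` accepts a file-like object such as `io.StringIO`. The columns and numbers below are invented stand-ins, not rows from melb_data.csv; the text column `Suburb` is included to show that describe() summarizes only numeric columns by default.

```python
import io

import pandas as pd

# Hypothetical stand-in for melb_data.csv (invented columns and values)
# so the example runs without the Kaggle input directory.
csv_text = """Suburb,Rooms,Price,Landsize
SuburbA,2,800000,150
SuburbA,2,950000,200
SuburbB,3,1200000,300
SuburbB,3,1100000,250
SuburbC,4,1500000,400
"""

# read_csv accepts any file-like object, not only a path on disk
data = pd.read_csv(io.StringIO(csv_text))

# By default, describe() summarizes only the numeric columns,
# so the text column "Suburb" is left out of this table
summary = data.describe()
print(summary)

# include="all" also summarizes the non-numeric columns
print(data.describe(include="all"))
```

With include="all", the extra rows unique, top and freq describe the text column, and numeric columns show NaN in those rows.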

 

2. Interpreting Data Description

The describe() method returns summary statistics for the data in the DataFrame. If the DataFrame contains numerical data, the summary includes the following information for each numeric column:

 

  • count: The number of non-missing (non-NaN) values
  • mean: The average (mean) value
  • std: The standard deviation
  • min: The minimum value
  • 25%: The 25th percentile (first quartile)
  • 50%: The 50th percentile (the median)
  • 75%: The 75th percentile (third quartile)
  • max: The maximum value
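One way to check the interpretation above is to recompute each row with the matching Series method and compare. This is a sketch on a made-up Series that contains one missing value; it relies on pandas defaults, namely that missing values are skipped, std() uses the sample formula (ddof=1), and quantile() uses linear interpolation.

```python
import numpy as np
import pandas as pd

# Invented illustration data; one value is missing (NaN)
s = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan, 10.0], name="Rooms")

desc = s.describe()

# Recompute each row of describe() with the corresponding method
manual = {
    "count": s.count(),       # non-missing values only
    "mean": s.mean(),         # NaN is skipped
    "std": s.std(),           # sample standard deviation (ddof=1)
    "min": s.min(),
    "25%": s.quantile(0.25),  # linear interpolation by default
    "50%": s.quantile(0.50),  # the median
    "75%": s.quantile(0.75),
    "max": s.max(),
}

for stat, value in manual.items():
    print(f"{stat:>5}: describe={float(desc[stat]):.4f} manual={float(value):.4f}")
```

Because count ignores the NaN, it reports 5 rather than 6, and the mean, standard deviation and percentiles are computed over those 5 values only. Other percentile rows can be requested through the `percentiles` parameter, e.g. `s.describe(percentiles=[0.1, 0.9])`.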

 
