[pandas] Basic Data Exploration
See_the_forest
2022. 9. 19. 19:36
1. Basic Exploratory Data Analysis
Pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.
The most important object in the pandas library is the DataFrame: a two-dimensional table made up of rows and columns.
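A minimal sketch of what "a table of rows and columns" looks like in code. The column names and values below are hypothetical, chosen only for illustration: each dictionary key becomes a column label, and each position in the lists becomes a row.

```python
import pandas as pd

# Hypothetical data: 3 rows x 3 columns
df = pd.DataFrame(
    {
        "Rooms": [2, 3, 4],
        "Price": [1035000.0, 1465000.0, 1600000.0],
        "Suburb": ["A", "B", "C"],
    }
)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # the column labels
print(df)                # the table itself, with a default 0..n-1 row index
```

`df.shape` is `(3, 3)` here, and the row labels default to `0, 1, 2` because no index was given.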
import pandas as pd
# Save filepath to variable for easier access
melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
# Read the data and store it in a DataFrame named melbourne_data
melbourne_data = pd.read_csv(melbourne_file_path)
# Print a summary of the data in melbourne_data
# (a notebook displays the result automatically; in a script, wrap it in print())
melbourne_data.describe()
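If the Melbourne CSV file is not available, the same read-then-describe workflow can be tried on a small in-memory file. This is a sketch under the assumption that the CSV text below (hypothetical columns and values) stands in for a real file; `io.StringIO` lets `pd.read_csv` parse a string as if it were a file on disk. One `Price` value is deliberately left empty to show how missing values affect the summary.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real file such as melb_data.csv
csv_text = """Rooms,Price,Landsize
2,1035000,156
3,1465000,134
4,1600000,120
3,,245
"""

# Parse the text exactly as read_csv would parse a file path
data = pd.read_csv(io.StringIO(csv_text))

# describe() returns a new DataFrame whose rows are summary statistics
# and whose columns are the numeric columns of `data`
summary = data.describe()
print(summary)
```

Because the empty `Price` cell is read as `NaN`, the `count` row shows 4 for `Rooms` but 3 for `Price`.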
2. Interpreting Data Description
The describe() method returns a statistical summary of the data in the DataFrame. When the DataFrame has numeric columns, describe() (with its default arguments) summarizes only those columns and reports the following for each one:
- count : The number of non-missing (non-NaN) values
- mean : The average (arithmetic mean) value
- std : The (sample) standard deviation
- min : The minimum value
- 25% : The 25th percentile (first quartile)
- 50% : The 50th percentile (the median)
- 75% : The 75th percentile (third quartile)
- max : The maximum value
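To see what each row of the summary means, the sketch below recomputes every statistic with an ordinary `Series` method and compares the results. The data is a hypothetical single column with one missing value; note that `count` skips the `NaN`, `std` is the sample standard deviation (`ddof=1`), and the percentiles use linear interpolation by default.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with one missing value
s = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan], name="x")

desc = s.describe()

# Recompute each describe() row with the matching Series method
manual = pd.Series(
    {
        "count": s.count(),        # non-missing values only
        "mean": s.mean(),          # NaN is skipped
        "std": s.std(),            # sample standard deviation (ddof=1)
        "min": s.min(),
        "25%": s.quantile(0.25),   # linear interpolation by default
        "50%": s.quantile(0.50),   # the median
        "75%": s.quantile(0.75),
        "max": s.max(),
    }
)

print(desc)
print(manual)
```

Both Series agree: count 4, mean 2.5, 25% 1.75, 50% 2.5, 75% 3.25. If other cut points are needed, describe() also accepts a `percentiles` argument, for example `s.describe(percentiles=[0.1, 0.9])`.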