
Data Science

(74)
[Theorem] Neural Network 1. What is a Neural Network? When linear regression or logistic regression uses polynomial terms, the hypothesis needs a very large number of features. For example, a \(50 \times 50\) pixel image has 2500 pixels in total, so the number of features for logistic regression becomes \(n = 2500 + \alpha\), where \(\alpha\) counts the added polynomial terms and can be very large. If we have too many features, we can run into an overfitting problem and ..
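To make the feature count concrete, here is a small sketch that counts the features for a \(50 \times 50\) image once second-degree terms are added. It assumes \(\alpha\) consists of every product \(x_i x_j\) with \(i \le j\), which is one common choice and not necessarily the one used in the post; the function name is hypothetical.

```python
def quadratic_feature_count(n_pixels: int) -> int:
    """Count the original features plus all degree-2 terms x_i * x_j with i <= j.

    Assumption: alpha is the full set of second-degree terms.
    """
    pairwise_terms = n_pixels * (n_pixels + 1) // 2
    return n_pixels + pairwise_terms


n_pixels = 50 * 50
total_features = quadratic_feature_count(n_pixels)
print(n_pixels, total_features)
```

Even with only second-degree terms, the count grows roughly with the square of the number of pixels, which is why the excerpt calls the feature set very big.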
[Theorem] Regularization 1. Regularization of Logistic Regression Because we don't know which theta values contribute to overfitting, we make all theta small. $$ J(\theta )=\frac{1}{2m}\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})^2+\lambda \sum _{j=1}^n\theta _j^2 $$ \(\lambda\) is called the regularization parameter, which controls a trade-off between two different goals. The first goal is that we would lik..
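The cost above can be sketched in plain Python. This is a minimal illustration, assuming a linear hypothesis \(h_{\theta}(x) = \theta^T x\), a penalty that sums over the non-intercept parameters \(\theta_1, \dots, \theta_n\), and a hypothetical function name `regularized_cost`.

```python
def regularized_cost(theta, X, y, lam):
    """Squared-error cost over m examples plus lam * sum(theta_j^2) for j >= 1.

    X is a list of rows; each row starts with 1.0 for the intercept term.
    Assumption: theta[0] (the intercept) is left out of the penalty.
    """
    m = len(y)
    squared_error = 0.0
    for row, target in zip(X, y):
        prediction = sum(t * x for t, x in zip(theta, row))
        squared_error += (prediction - target) ** 2
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return squared_error / (2 * m) + penalty


# Illustrative data that the line y = x fits exactly.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]
theta = [0.0, 1.0]

cost_without_penalty = regularized_cost(theta, X, y, lam=0.0)
cost_with_penalty = regularized_cost(theta, X, y, lam=0.5)
print(cost_without_penalty, cost_with_penalty)
```

With a perfect fit the first term is zero, so any remaining cost comes from the penalty. That is the trade-off \(\lambda\) controls: a larger \(\lambda\) pushes the parameters toward zero even when the fit is already good.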
[Theorem] Overfitting 1. Overfitting in Linear Regression When the model has too few degrees of freedom, \(H(x)\) can only predict the output in a simple way and can't fit every case of x. This is called 'underfitting' or 'high bias'. When the degrees of freedom are appropriate (not too low and not too high), the model predicts the output well. When the degrees of freedom are too high, the model can predict the training output well, but can't generalize well to predict new data..
[Theorem] Logistic Regression 1. What is a Classification Problem? A classification problem usually has two discrete outputs, zero and one: zero is the 'negative output' and one is the 'positive output'. For example, in spam classification, zero means the mail is not spam and one means the mail is spam. $$ y \in \{0, 1\} $$ Multiclass classification has more than two discrete outputs. $$ y \in \{0, 1, 2, ...\} $$ 2. Logistic Regressi..
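The excerpt is cut off before the hypothesis is defined, so the following is only a sketch that assumes the standard sigmoid hypothesis and a 0.5 decision threshold. It shows how a real-valued score can be mapped to the two labels in \(\{0, 1\}\). The parameter values and function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(theta, x, threshold=0.5):
    """Return 1 (positive) if the estimated probability reaches the threshold, else 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= threshold else 0


theta = [-1.0, 2.0]           # illustrative parameters, not fitted to real data
spam_like = [1.0, 1.0]        # first entry is the intercept term
not_spam_like = [1.0, 0.0]

print(predict_label(theta, spam_like), predict_label(theta, not_spam_like))
```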
[Theorem] Multivariate Linear Regression 1. Multivariate Hypothesis

feet (x1)    number of rooms (x2)    Built Age (x3)    Price of House
1412         5                       30                3520
1530         3                       45                2420
642          2                       56                1238

\(x^{i}_{j}\) : value of feature j in the ith training example
\(x^i\) : the input features of the ith training example
\(m\) : the number of training examples
\(n\) : the number of features

For example, \(x_{3}^{2}\) means 45, and \(x_3\) means [30, 45, 56], a 3 dimensional vec..
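The indexing notation can be checked against the table with a short sketch. It follows the definition "value of feature j in the ith training example" and uses 1-based indices to match the notation; the variable and function names are hypothetical.

```python
# Rows are training examples; columns are feet (x1), rooms (x2), built age (x3).
features = [
    [1412, 5, 30],
    [1530, 3, 45],
    [642, 2, 56],
]
prices = [3520, 2420, 1238]

m = len(features)       # number of training examples
n = len(features[0])    # number of features


def feature_value(i: int, j: int) -> int:
    """Value of feature j in the i-th training example (both 1-based)."""
    return features[i - 1][j - 1]


second_example = features[2 - 1]                     # all features of example 2
built_age_column = [row[3 - 1] for row in features]  # feature 3 across all examples

print(m, n, feature_value(2, 3), second_example, built_age_column)
```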
[Theorem] Linear Regression 1. What is a Hypothesis function? In supervised learning, we use a regression algorithm when the problem is to predict a continuous output. In linear regression, we use known data x, y to fit a function of \((x, y)\), and with that function we can predict \(y(n)\) when we have \(x(n)\). Below is the function of \((x, y)\) when we have one variable. $$ H_{\theta}(x)=Y=\theta _0 + \theta _1 X $$ \(m\) : number of record..
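As a minimal sketch of the one-variable hypothesis \(H_{\theta}(x)=\theta_0 + \theta_1 x\), the snippet below evaluates it for a few inputs. The parameter values are made up for illustration and are not fitted to any data.

```python
def hypothesis(theta0: float, theta1: float, x: float) -> float:
    """One-variable linear hypothesis: intercept plus slope times input."""
    return theta0 + theta1 * x


theta0, theta1 = 1.0, 2.0   # illustrative values only
xs = [0.0, 1.0, 2.5]
predictions = [hypothesis(theta0, theta1, x) for x in xs]
print(predictions)
```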
[plotly] Layout components of plotly 1. Updating or Modifying Figures Made with Plotly Express If none of the built-in plotly arguments allow us to customize the figure the way we need to, we can use the update_* and add_* methods on the plotly.graph_objects.Figure object returned by the PX function to make any further modifications to the figure. 2. Use case of those methods import plotly.express as px df = px.data.tips() fig = px.histog..
[plotly] Graph Objects in Python 1. What are Figure objects? The plotly Python package exists to create, manipulate and render graphical figures represented by data structures also referred to as figures. Figures can be represented in Python either as dicts or as instances of the plotly.graph_objects.Figure class, and are serialized as text in JSON before being passed to Plotly.js. Figure({ 'data': [{'hovertemplate': 'x=%{x} y=%..
[plotly] Plotly Express in Python 1. What is Plotly Express? The plotly.express module contains functions that can create entire figures at once, and is referred to as Plotly Express or PX. Plotly Express is a built-in part of the plotly library, and is the recommended starting point for creating most common figures. Every Plotly Express function uses graph objects internally and returns a plotly.graph_objects.Figure instance. 2..
[plotly] iplot in Python 1. What is iplot? iplot() comes from Cufflinks, a wrapper that binds plotly to pandas DataFrames. It seems to be the easiest way to get interactive plots with a single line of code. 2. Differences between iplot and plot iplot produces an interactive plot. Plotly takes Python code and makes beautiful-looking JavaScript plots. The plot command uses Matplotlib, which is more old-school. It creates static chart..
[plotly] Getting Started with Plotly in Python 1. What is Plotly? The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases. plotly also enables Python users to create beautiful interactive web-based visualizations that can be displayed in Jupyter notebooks, saved to standalone HTML file..
[pandas] Useful personal functions for EDA 1. Check missing records

    import pandas as pd

    def missing(df):
        # Number of missing values per column, largest first
        missing_number = df.isnull().sum().sort_values(ascending=False)
        # Fraction of rows that are missing per column, largest first
        missing_percent = (df.isnull().sum() / df.isnull().count()).sort_values(ascending=False)
        # Combine both into one summary table
        missing_values = pd.concat([missing_number, missing_percent], axis=1, keys=['Missing_number', 'Missing_percent'])
        return missing_values

2. Grouping columns by their features

    def categorize(df) : Quan..
[pandas] Cut rows based on integer To bin rows by an integer column and convert the result into a category type, there are two methods:
pd.cut() : set the bin boundaries ourselves.
pd.qcut() : set the bin boundaries automatically, based on quantiles.
After using either method, the resulting column has the Categorical dtype.

    bins = [1, 20, 30, 50, 70, 100]
    labels = ["minor", "youth", "middle-aged", "mature", "senior"]
    titanic['age_cat'] = pd.cut(t..
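The difference between the two methods can be seen on a small made-up Series. This is a sketch rather than the post's own code: the ages and the English labels are illustrative, and the bin edges are reused from the excerpt.

```python
import pandas as pd

ages = pd.Series([5, 18, 25, 40, 60, 80])  # made-up values for illustration

# pd.cut: we choose the bin edges ourselves (right edge inclusive by default).
bins = [1, 20, 30, 50, 70, 100]
labels = ["minor", "youth", "middle-aged", "mature", "senior"]
fixed_bins = pd.cut(ages, bins=bins, labels=labels)

# pd.qcut: edges are derived from quantiles, so each bin holds about the same number of rows.
quantile_bins = pd.qcut(ages, q=3, labels=["low", "mid", "high"])

print(fixed_bins.tolist())
print(quantile_bins.tolist())
print(fixed_bins.dtype, quantile_bins.dtype)
```

pd.cut is the better fit when the boundaries carry meaning (such as age groups), while pd.qcut is useful when balanced group sizes matter more than the exact edges.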
[pandas] Set options Pandas has an options API to configure and customize global behavior related to DataFrame display, date behavior and more. The most commonly used options for DataFrames are below :

    import pandas as pd

    # Max column views
    pd.options.display.max_columns = 999

    # Suppress scientific notation
    pd.set_option('display.float_format', lambda x: '%.5f' % x)
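When the settings should apply only to one block of code rather than globally, pandas also provides `pd.option_context`. The sketch below applies the same two options temporarily and restores the previous values afterwards; the sample Series is made up for illustration.

```python
import pandas as pd

values = pd.Series([1 / 3, 2 / 3])  # made-up sample data
before = pd.get_option("display.max_columns")

# Options set here are active only inside the with-block.
with pd.option_context(
    "display.max_columns", 999,
    "display.float_format", "{:.5f}".format,
):
    inside = pd.get_option("display.max_columns")
    formatted = values.to_string()

after = pd.get_option("display.max_columns")
print(formatted)
print(before, inside, after)
```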