
Data Science (74)
[R] Non-Linear Support Vector Machine 1. What are Non-Linear SVMs? In practice, we can be faced with non-linear class boundaries. To address this non-linearity, we need to consider enlarging the feature space using functions of the predictors, such as quadratic and cubic terms. In the case of the support vector classifier, we can enlarge the feature space using quadratic, cubic, and even higher-order polynomial functions..
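A minimal sketch of this idea in R, assuming the e1071 package and simulated two-class data (both illustrative, not from the post): a radial kernel enlarges the feature space implicitly via the kernel trick rather than explicit polynomial terms.

library(e1071)

set.seed(1)
x <- matrix(rnorm(200 * 2), ncol = 2)        # two predictors
x[1:100, ] <- x[1:100, ] + 2                 # shift part of one class
x[101:150, ] <- x[101:150, ] - 2
y <- factor(c(rep(1, 150), rep(2, 50)))
dat <- data.frame(x = x, y = y)

# Radial kernel: the feature-space enlargement happens implicitly
svmfit <- svm(y ~ ., data = dat, kernel = "radial", gamma = 1, cost = 1)
plot(svmfit, dat)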
[R] Support Vector Machine 1. What is a Support Vector Machine? Find a plane that separates the classes in feature space. Soften what we mean by "separates", and enrich and enlarge the feature space so that separation is possible. Three methods for SVM Maximum Margin Classifier Support Vector Classifier(SVC) Support Vector Machine(SVM) 1.1 Hyperplane A hyperplane in p dimensions is a flat affine subspace of dimension p-1. Gener..
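A hedged sketch of the support vector classifier with e1071::svm on simulated data (both choices illustrative); the cost argument controls how much we soften the margin.

library(e1071)

set.seed(1)
x <- matrix(rnorm(20 * 2), ncol = 2)
y <- c(rep(-1, 10), rep(1, 10))
x[y == 1, ] <- x[y == 1, ] + 1               # make the classes roughly separable
dat <- data.frame(x = x, y = as.factor(y))

# Small cost -> wider margin with more violations; large cost -> narrow margin
svcfit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)
summary(svcfit)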
[R] Tree-Based Methods : Bayesian Additive Tree 1. What is Bayesian Additive Tree? Bayesian additive regression trees(BART) is another ensemble method that uses decision trees as its building blocks. The BART method combines ideas from other ensemble methods : Each tree is constructed in a random manner, as in bagging and random forests. Each tree tries to capture signal not yet accounted for by the current model, as in boosting. The BART method can be viewed as..
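A minimal BART sketch, assuming the BART package's gbart() and the Boston data from MASS (both illustrative choices, not from the post):

library(BART)
library(MASS)                                 # Boston housing data

set.seed(1)
train <- sample(nrow(Boston), nrow(Boston) / 2)
x <- Boston[, names(Boston) != "medv"]
y <- Boston$medv

# gbart() draws an ensemble of trees by MCMC; yhat.test.mean averages the draws
fit <- gbart(x[train, ], y[train], x.test = x[-train, ])
mean((fit$yhat.test.mean - y[-train])^2)      # test MSE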
[R] Tree-Based Methods : Boosting 1. What is Boosting? Boosting is a general approach that can be applied to many statistical learning methods for regression or classification. Boosting grows trees sequentially, and the approach learns slowly. Given the current model, we fit a decision tree to the residuals from the model. We then add this new decision tree into the fitted function in order to update the residuals. Each of t..
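A sketch of this sequential, slow-learning fit with the gbm package (the dataset and tuning values are illustrative, not tuned):

library(gbm)
library(MASS)

set.seed(1)
train <- sample(nrow(Boston), nrow(Boston) / 2)

# Each new tree is fit to the residuals of the current ensemble;
# shrinkage slows the learning
boost <- gbm(medv ~ ., data = Boston[train, ], distribution = "gaussian",
             n.trees = 5000, interaction.depth = 4, shrinkage = 0.01)
yhat <- predict(boost, newdata = Boston[-train, ], n.trees = 5000)
mean((yhat - Boston$medv[-train])^2)          # test MSE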
[R] Tree-Based Methods : Random Forest 1. What is Random Forest? Random forests provide an improvement over bagged trees by way of a small tweak that decorrelates the trees. This reduces the variance when we average the trees. When building these decision trees, each time a split in a tree is considered, a random selection of m predictors is chosen as split candidates from the full set of p predictors. A fresh selection of m predict..
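A sketch with the randomForest package, where mtry is the number m of predictors sampled at each split (the Boston data and mtry value are illustrative):

library(randomForest)
library(MASS)

set.seed(1)
train <- sample(nrow(Boston), nrow(Boston) / 2)

# mtry = 4 is roughly p/3 of Boston's 13 predictors, a common regression default
rf <- randomForest(medv ~ ., data = Boston, subset = train,
                   mtry = 4, importance = TRUE)
importance(rf)                                # variable-importance measures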
[R] Tree-Based Methods : Bagging 1. Ensemble methods An ensemble method is an approach that combines many simple building-block models in order to obtain a single and potentially very powerful model. These simple building-block models are sometimes known as weak learners. Methods Bagging Random Forest Boosting Bayesian additive regression trees 2. Bagging 2.1 Bootstrap method Referring to (\(X_1, ..., X_n\)) as a populatio..
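Bagging can be sketched as the special case of a random forest with m = p, so every predictor is a split candidate at every split (the Boston data is an illustrative choice):

library(randomForest)
library(MASS)

set.seed(1)
train <- sample(nrow(Boston), nrow(Boston) / 2)

# mtry = p (all 13 predictors) turns the random forest into bagging
bag <- randomForest(medv ~ ., data = Boston, subset = train,
                    mtry = ncol(Boston) - 1)
bag                                           # prints the out-of-bag MSE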
[R] Tree-Based Methods : Advantages and Disadvantages of Tree Advantages Trees are very easy to explain to people. Decision trees more closely mirror human decision-making than do regression and other classification approaches. Trees can be displayed graphically, and are easily interpreted. Trees can easily handle qualitative predictors without the need to create dummy variables. Disadvantages Trees do not have the same level of predictive accuracy as ..
[R] Tree-Based Methods : Classification Decision Tree 1. What is Classification Decision Tree? Predict a qualitative response rather than a quantitative one. Predict that each observation belongs to the most commonly occurring class. Use recursive binary splitting to grow a classification tree. Use the classification error rate (misclassification rate) as the evaluation metric. Splitting metrics The classification error rate : \(Error = 1 - \max_{k}(\hat{p}..
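A minimal classification-tree sketch with the tree package, following the usual ISLR Carseats recipe (the Sales > 8 cutoff is that lab's convention, not from the post):

library(tree)
library(ISLR)

# Binarize the response, then grow a tree by recursive binary splitting
High <- factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))
dat <- data.frame(Carseats, High)
fit <- tree(High ~ . - Sales, data = dat)
summary(fit)                                  # reports the misclassification rate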
[R] Tree-Based Methods : Regression Decision Tree 1. Regression Decision Tree 1.1 [Ex] Finding the optimal value \(\alpha\) using CV # Import libraries and dataset library(tree); library(ISLR); data(Hitters) # Training models miss
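A fuller sketch of the CV step, assuming the intent was cost-complexity pruning on Hitters (the model formula is illustrative): cv.tree() searches the pruning sequence indexed by \(\alpha\) (reported as k), and we keep the size with the lowest CV deviance.

library(tree)
library(ISLR)
data(Hitters)

Hitters <- na.omit(Hitters)                   # Salary has missing values
fit <- tree(log(Salary) ~ Years + Hits, data = Hitters)

cvfit <- cv.tree(fit)                         # K-fold CV over the pruning sequence
best <- cvfit$size[which.min(cvfit$dev)]      # size with the lowest CV deviance
pruned <- prune.tree(fit, best = best)
plot(pruned); text(pruned, pretty = 0)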
[R] Tree-Based Methods : Decision Tree 1. Tree-Based Methods Tree-based methods for regression and classification involve stratifying or segmenting the predictor space into a number of simple regions. The set of splitting rules used to segment the predictor space can be summarized in a tree. Tree-based methods are simple and useful for interpretation. Bagging and Random Forests grow multiple trees which are combined to yield a single ..
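A tiny sketch of "splitting rules summarized in a tree": printing a fitted tree object lists the rule defining each region (iris is an illustrative dataset, not from the post).

library(tree)

fit <- tree(Species ~ ., data = iris)
fit                                           # each row is a region defined by its splitting rules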
[R] Non-Linear Models : Local Regression, GAM 1. What is Local Regression? Local regression, or local polynomial regression, also known as moving regression, is a generalization of the moving average and polynomial regression. Local regression computes the fit at a target point \(x_0\) using only the nearby training observations. With a sliding weight function, we fit separate linear fits over the range of \(x\) by weighted least squa..
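A sketch of local regression with base R's loess(); span is the sliding-window fraction of nearby observations entering each weighted least-squares fit (the cars data is illustrative):

# span = 0.5 uses the nearest 50% of observations around each target point
fit <- loess(dist ~ speed, data = cars, span = 0.5)
plot(cars)
lines(cars$speed, predict(fit), col = "red")  # cars is already sorted by speed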
[R] Non-Linear Models : Splines 1. What are Splines? Regression splines are more flexible than polynomials and step functions, and are actually an extension of the two. They divide the range of \(X\) into K distinct regions. For each region, a polynomial function is fit to the data; however, the polynomials are constrained so that they join smoothly at the region boundaries, or knots. For regression splines, instead of fitting a high..
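A regression-spline sketch with splines::bs() on ISLR's Wage data; the knot locations follow the common ISLR example and are otherwise arbitrary:

library(splines)
library(ISLR)

# Cubic spline: separate cubics joined smoothly at the knots
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
agegrid <- seq(min(Wage$age), max(Wage$age))
pred <- predict(fit, newdata = list(age = agegrid))
plot(Wage$age, Wage$wage, col = "gray")
lines(agegrid, pred, lwd = 2)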
[R] Non-Linear Models : Step Functions 1. What are Step Functions? Polynomial regression extends the linear model by adding extra predictors, obtained by raising each of the original predictors to a power. Cubic regression uses the three variables \(X\), \(X^2\), \(X^3\) as predictors. This is a simple way to provide a non-linear fit to the data. Step functions cut the range of a variable into K distinct regions in order to produce a qualitat..
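A step-function sketch: cut() bins the predictor into K regions, and lm() fits a constant in each (Wage and K = 4 are illustrative choices):

library(ISLR)

# cut(age, 4) creates 4 intervals; the fit is piecewise constant
fit <- lm(wage ~ cut(age, 4), data = Wage)
coef(fit)                                     # one shift per interval above the baseline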
[R] Non-Linear Models : Polynomials 1. What are Non-Linear Models? Kinds of non-linear models : Polynomials Step functions Splines Local regression Generalized Additive Models(GAM) Population : \(y_i = \sum_{j=1}^p f_j(x_{ij}) + \epsilon_i\) Inference : \(\hat{y}_i = \sum_{j=1}^p \hat{f}_j(x_{ij})\) 2. Polynomial Regression Not really interested in the coefficients; more interested in fitted function values. \(\hat{f}(x_0) = \hat{\beta}_0 ..
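A polynomial-regression sketch matching the fitted-values-over-coefficients emphasis (Wage and degree 4 are illustrative):

library(ISLR)

fit <- lm(wage ~ poly(age, 4), data = Wage)   # orthogonal polynomial basis

# Fitted function values with pointwise standard errors
agegrid <- seq(min(Wage$age), max(Wage$age))
pred <- predict(fit, newdata = list(age = agegrid), se = TRUE)
head(cbind(fit = pred$fit, se = pred$se.fit))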
[R] Classification Problem : QDA, Naive Bayes, KNN 1. QDA(Quadratic Discriminant Analysis) QDA assumes that each class has its own covariance matrix : \(X \sim N(\mu_k, \Sigma_k)\) LDA vs QDA Probability : \(P(y_i=k|x)\) X : \(N(\mu_k, \Sigma)\) vs \(N(\mu_k, \Sigma_k)\) Parameters : \(\mu_1, ..., \mu_K\) vs \(\mu_1, ..., \mu_K, \Sigma_1, ..., \Sigma_K\) Number of parameters : \(PK + \frac{P(P+1)}{2}\) vs \(PK + K\frac{P(P+1)}{2}\) 1.1 [Ex] LDA vs QDA Classification Er..
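A hedged LDA-vs-QDA error comparison on ISLR's Smarket data, using the usual pre-2005 training split (both choices illustrative):

library(MASS)
library(ISLR)

train <- Smarket$Year < 2005

# QDA estimates a covariance matrix per class; swap qda() for lda() to compare
qfit <- qda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
pred <- predict(qfit, Smarket[!train, ])$class
mean(pred != Smarket$Direction[!train])       # test misclassification rate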
[R] Assessment of the Performance of Classifier 1. Two types of Misclassification errors We can change the two error rates by changing the threshold from 0.5 to some other value in [0, 1]. \(\hat{Pr}(Default = Yes|Balance, Student) \ge \alpha\) \(\alpha\) is a threshold. If \(\alpha\) ↑, FN increases while FP decreases. If \(\alpha\) ↓, FN decreases while FP increases. 2. [Ex] Changes in errors along with Thresholds thresholds
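A sketch of sweeping the threshold \(\alpha\) on the Default data (the glm model is a plausible reading of the truncated example, not confirmed by the post): as \(\alpha\) falls, FN falls and FP rises.

library(ISLR)

fit <- glm(default ~ balance + student, data = Default, family = binomial)
p <- predict(fit, type = "response")          # estimated Pr(Default = Yes | ...)

for (a in c(0.5, 0.3, 0.1)) {                 # decreasing thresholds
  pred <- ifelse(p >= a, "Yes", "No")
  print(table(predicted = pred, actual = Default$default))
}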
[R] Classification Problem : LDA(Linear Discriminant Analysis) 1. Discriminant Analysis Bayes Theorem : \(Pr(Y=k|X=x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^K \pi_l f_l(x)}\) The density for \(X\) in class \(k\) : \(f_k(x) = \frac{1}{\sqrt{2\pi}\sigma_k}e^{-\frac{(x-\mu_k)^2}{2\sigma_k^2}}\) The prior probability for class \(k\) : \(\pi_k = Pr(Y=k)\) If the distribution of \(X\) is approximately normal, LDA and QDA are more stable. 1.1 LDA(Linear Discriminant ..
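A minimal lda() sketch; printing the fit shows the estimated priors \(\hat{\pi}_k\) and group means \(\hat{\mu}_k\) that plug into the Bayes formula above (Smarket is an illustrative dataset):

library(MASS)
library(ISLR)

train <- Smarket$Year < 2005
lfit <- lda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
lfit                                          # prior probabilities and group means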
[R] Classification Problem : Logistic Regression 1. What is a Classification Problem? 1.1 Classification The response variable \(Y\) : Qualitative, Categorical Build a classifier \(C(X) \in \mathcal{C}\) that assigns a class label to the feature vector \(X\). Interested in estimating the probabilities : \(P(Y=1|X), P(Y=0|X)\) 1.2 [Ex] Credit Card Default Data # Import Dataset library(ISLR); data(Default); summary(Default); attach(Default) # Visualization of dataset plot(income ~ ..
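Continuing the truncated example, a hedged sketch of the logistic fit itself and the estimated probability \(P(Y=1|X)\) at a chosen point (the balance = 2000 query is illustrative):

library(ISLR)

fit <- glm(default ~ balance + student, data = Default, family = binomial)
summary(fit)$coefficients

# Estimated P(default = Yes | balance = 2000, non-student)
predict(fit, newdata = data.frame(balance = 2000, student = "No"),
        type = "response")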
[R] Elastic-Net Regression 1. Regularization Methods : Elastic-Net Lasso penalty function(\(l1\)-norm) If \(p > n\), the lasso selects at most \(n\) variables. The lasso is indifferent to highly correlated variables and tends to pick only one of them. Ridge penalty function(\(l2\)-norm) It cannot perform variable selection. It shrinks correlated features toward each other. Elastic-net regularization Combines Lasso and Ridge \(p_{\lambda..
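An elastic-net sketch with glmnet, whose alpha argument mixes the lasso (alpha = 1) and ridge (alpha = 0) penalties; the simulated data and alpha = 0.5 are illustrative:

library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] + 2 * x[, 2] + rnorm(100)

# alpha = 0.5 weights the l1 and l2 penalties equally; CV picks lambda
cvfit <- cv.glmnet(x, y, alpha = 0.5)
coef(cvfit, s = "lambda.min")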
[R] Regularization Methods 1. Formula of Regularization Methods $$ Q_{\lambda}(\beta_0, \beta) = -l(\beta_0, \beta) + p_{\lambda}(\beta)$$ 2. The negative log-likelihood function Quantitative outcome: least-squares loss Binary outcome: logistic likelihood Matched case-control outcome: conditional logistic likelihood Count outcome: Poisson likelihood Qualitative outcome: multinomial likelihood Survival outcome: Co..
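As a sketch of how the loss term \(-l(\beta_0, \beta)\) changes with the outcome type, glmnet's family argument selects the negative log-likelihood (the simulated responses are illustrative):

library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)

fit_ls <- glmnet(x, rnorm(100), family = "gaussian")              # least-squares loss
fit_logit <- glmnet(x, rbinom(100, 1, 0.5), family = "binomial")  # logistic likelihood
fit_pois <- glmnet(x, rpois(100, 2), family = "poisson")          # Poisson likelihood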