The topics covered in this repo are:
-----------------------------------------------Unsupervised Learning--------------------------------------------------
-
Non-Parametric Methods - Density estimation, outlier detection and the nearest-neighbor algorithm.
-
Bayesian Decision - Used to find association rules in Market Basket Analysis (Bread -> Butter and Beyond: data mining in supermarkets).
-
K-Means Clustering - Clustering of compact data (also includes the Elbow and Silhouette methods for finding the optimal number of clusters).
-
Spectral Clustering - Connectivity-based data clustering.
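
As a rough illustration of the Market Basket Analysis item above, the sketch below computes the usual support, confidence and lift for a Bread -> Butter rule. It assumes the standard textbook definitions; the toy `baskets` data and the helper names `support` and `rule_metrics` are hypothetical and are not taken from this repo.

```python
# Hypothetical toy baskets; not data from this repo.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    sup_a = support(antecedent, transactions)
    sup_c = support(consequent, transactions)
    if sup_a == 0 or sup_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    sup_ac = support(set(antecedent) | set(consequent), transactions)
    confidence = sup_ac / sup_a   # estimate of P(consequent | antecedent)
    lift = confidence / sup_c     # > 1 suggests positive association
    return {"support": sup_ac, "confidence": confidence, "lift": lift}


metrics = rule_metrics({"bread"}, {"butter"}, baskets)
print(metrics)
```

A lift below 1, as in this toy data, would suggest that buying bread does not make butter more likely than its baseline rate.
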
-----------------------------------------------Supervised Learning-----------------------------------------------------
-
Decision Trees - Finding the contents of each node and calculating the split entropy of each candidate split to identify the optimal split.
-
Multinomial Logistic Regression - Fitting a multinomial logistic regression model and calculating the odds ratios.
-
Identifying and Profiling the Clusters in Chicago Pothole Data - Clustered 18K observations using K-Means and determined the number of clusters using Silhouette and Elbow charts. Profiled the clusters using a classification tree with the Gini criterion.
-
Comparing a Logistic Regression model and a Classification Tree model for predicting how likely a policy-holder is to file a claim, using the RASE and AUC metrics and the ROC curve.
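
The two comparison metrics named above can be sketched in plain Python. This assumes RASE means root average squared error between the observed 0/1 outcome and the predicted event probability, and that AUC is computed as the share of correctly ranked (claim, no-claim) pairs. The outcome vector and both probability lists are made-up illustrations, not results from this repo.

```python
import math


def rase(y_true, p_pred):
    """Root average squared error between 0/1 outcomes and predicted probabilities."""
    squared_errors = [(y - p) ** 2 for y, p in zip(y_true, p_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def auc(y_true, p_pred):
    """Share of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = claim filed) and predicted claim probabilities.
y = [1, 0, 1, 0, 0, 1]
p_logistic = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8]
p_tree = [0.7, 0.3, 0.3, 0.3, 0.7, 0.7]

for name, p in [("logistic", p_logistic), ("tree", p_tree)]:
    print(name, "RASE =", round(rase(y, p), 4), "AUC =", round(auc(y, p), 4))
```

Lower RASE and higher AUC both favor a model. The pairwise AUC loop is quadratic in the number of observations, so it is only meant for small illustrations.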