Course description: This course provides an accessible overview of the field of data mining and statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged over the past twenty years in fields ranging from biology to finance to marketing to astrophysics. This course presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this course is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open-source statistical software platform.
Prerequisites: Probability and Mathematical Statistics, R programming skills
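Reference Books:
James G, Witten D, Hastie T, et al. An Introduction to Statistical Learning. New York: Springer, 2013.
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Fang Kuangnan (方匡南), Lan Wei (兰伟). 数据挖掘与机器学习 (Data Mining and Machine Learning). Higher Education Press (高等教育出版社), 2024.2.
Fang Kuangnan (方匡南). 数据科学 (Data Science). Publishing House of Electronics Industry (电子工业出版社), 2018.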
Contents:
1. Introduction ch1 Introduction1.pdf ch1 introduction2.pdf
2. Statistical Learning ch2 statistical_learning.pdf
3. Linear Regression ch3 linear regression.pdf
4. Classification ch4 linear classification.pdf
5. Resampling Methods
6. Linear Model Selection and Regularization ch6 model_selection.pdf
7. Moving Beyond Linearity ch7 nonlinearity .pdf
8. Tree-Based Methods ch8 trees.pdf
9. Support Vector Machines ch9 SVM .pdf
10. Unsupervised Learning ch10 unsupervised.pdf
11. Deep Learning Ch11_Deep_Learning.pdf
Data: data.zip
Case study:
Human resources analytics: predicting employee turnover intention 人力资源分析——员工离职意愿预测.zip
Further reading:
Multiple testing: Testing covariates in high dimension liner regression with common factors.pdf
Two-part model: identification of porportionality structure with two-part model.pdf
SCAD: scad runze li.pdf
elastic-net: zou-elastic net.pdf
lasso: Regression-Shrinkage-and-Selection-via-the-Lasso.pdf
adaptive lasso: adaptive lasso.pdf
group lasso: group lasso.pdf
structured sparse logistic regression: Structured sparse logistic regression with application to lung cancer prediction using breath volatile biomarkers.pdf
integrative sparse PCA: iSPCA.pdf
Nonparametric Beta Regression: NonparametricAdditiveBetaRegre.pdf
Structured sparse SVM: Structured sparse SVM with ordered features.pdf.zip
Functional Biclustering: Biclustering Analysis of Functionals using Penalization Fusion.pdf
Clustered Distributed learning: Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis.pdf
Quiz: send to datamining_under@163.com, with the email subject: quiz/Homework# + name + student ID