[Course] Statistical Learning (Graduate)

Statistical Machine Learning (Graduate)

Course description: This course provides an accessible overview of data mining and statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged over the past twenty years in fields ranging from biology to finance to marketing to astrophysics. It presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples illustrate the methods presented. Since the goal of this course is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods in R, an extremely popular open-source statistical software platform.


Prerequisites: Probability and Mathematical Statistics, R programming skills

Class time & place: Mon 10:10-11:50, N202

                               Wed 10:10-11:50, N202

Course Text:

1. Hastie, Tibshirani, and Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

2. James G, Witten D, Hastie T, et al. An Introduction to Statistical Learning. New York: Springer, 2013.

3. Fang Kuangnan (方匡南). Data Science (数据科学). Publishing House of Electronics Industry (电子工业出版社), June 2018.


1. Introduction

2. Statistical Learning

3. Linear Regression

4. Classification

5. Resampling Methods

6. Linear Model Selection and Regularization

7. Moving Beyond Linearity

8. Decision Trees

9. Ensemble Learning

10. Support Vector Machines

11. Unsupervised Learning

12. Neural Networks



ch1 introduction.pdf

ch2 statistical_learning.pdf

ch3 linear_regression.pdf

ch4 classification2.pdf

ch5 cv_boot2.pdf

ch6 model_selection.pdf

ch7 nonlinear 2.pdf

ch8 决策树.pdf (Decision Trees)

ch9. 集成学习.pdf (Ensemble Learning)

ch10 支持向量机.pdf (Support Vector Machines)

ch11 聚类分析.pdf (Cluster Analysis)

ch12 neural network.pdf

ch13 Convolutional Neural Network.pdf


SCAD: scad runze li.pdf

elastic-net: zou-elastic net.pdf

lasso: Regression-Shrinkage-and-Selection-via-the-Lasso.pdf

adaptive lasso: adaptive lasso.pdf

group lasso: group lasso.pdf

structured sparse logistic regression: Structured sparse logistic regression with application to lung cancer prediction using breath volatile biomarkers.pdf

Two-part model: identification of porportionality structure with two-part model.pdf

integrative sparse PCA: iSPCA.pdf








Please send electronic homework submissions to dataminingxmu@163.com, with the email subject: quiz/Homework 3 + name + student ID.

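Final exam:

  The final exam is assessed mainly through a project, which has two parts: a presentation and a final project analysis report.


  Notes: (1) Each group has 1-2 members. (2) The project topic is your own choice; it may be either a methodological innovation or an applied case study.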

1. Presentation:  


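   (2) Present using PPT or LaTeX slides.

   (3) Methodological projects should report: research motivation, literature review, methodology, simulations (theoretical proofs are a plus), an application case, and conclusions.

   (4) Applied case studies should report: research motivation (especially the significance of the study), literature review, data description, a comparison of the different methods applied, and conclusions.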


2. Project report:


      (2) Code files (so that the results can be reproduced, R is preferred; Python and other languages are also accepted).



