Data Mining (Undergraduate)

Course description: This course provides an accessible overview of data mining and statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged over the past twenty years in fields ranging from biology to finance to marketing to astrophysics. It presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this course is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open-source statistical software platform.


Prerequisites: Probability and Mathematical Statistics; R programming skills

Course Text:

James G, Witten D, Hastie T, et al. An Introduction to Statistical Learning. New York: Springer, 2013.

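Fang Kuangnan, Zhu Jianping, Jiang Yefei. R数据分析 [R Data Analysis]. Publishing House of Electronics Industry, 2015.2.

Fang Kuangnan. 数据科学 [Data Science]. Publishing House of Electronics Industry, 2018.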


1. Introduction (ch1 introduction.pdf)

2. R Programming (第2.1讲 R语言入门与基本数据分析.pdf [Lecture 2.1: Introduction to R and Basic Data Analysis]; 第2.2 讲 数据读写与编程.pdf [Lecture 2.2: Data Input/Output and Programming])

3. Linear Regression (ch3 linear_regression.pdf)

4. Classification

5. Resampling Methods

6. Linear Model Selection and Regularization

7. Moving Beyond Linearity

8. Tree-Based Methods

9. Support Vector Machines

10. Unsupervised Learning

Quizzes:

       1. Quiz 1 (quiz1.pdf, quiz1.rar)

       2. Quiz 2 (quiz2_linear regression.rar)

     Send your submission to datamining_under@163.com, with the email subject: quiz# + name + student ID
