Data Mining is a professional elective course that builds the data analysis knowledge students need in the era of big data and strengthens their professional skills. Building on advanced mathematics, probability and statistics, and linear algebra, the course focuses on collecting data from real-world scenarios and applying statistical methods and data mining techniques to data preprocessing, analysis, and knowledge discovery. Students develop the comprehensive ability to use relevant software to solve data analysis problems.
The course covers the basic theory, classical algorithms, and cutting-edge applications of data mining. Its main parts are the definition of data and the data mining process, data preprocessing, descriptive statistical analysis, data visualization, association analysis, clustering, classification, and numerical prediction. Data preprocessing covers data standardization, missing-value handling, noisy-data handling, and data reduction. Association analysis covers association rules, the Apriori algorithm, and the FP-growth algorithm. Clustering covers K-means, hierarchical clustering, DBSCAN, and related methods. Classification covers decision trees, naive Bayes, support vector machines, and related methods. Numerical prediction covers regression methods, regression trees and decision trees, K-nearest-neighbor prediction, and related methods. Through case studies and hands-on computer exercises, the course strengthens students' ability to apply data mining methods to practical problems.
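As an illustration of the kind of hands-on exercise the association analysis unit might involve, the following is a minimal sketch of Apriori-style frequent itemset search in plain Python. It is not taken from the course materials: the toy transactions, the function names (`support`, `apriori`, `confidence`), and the `min_support` threshold of 0.6 are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Level-wise search: size-k candidates come only from frequent size-(k-1) itemsets."""
    items = sorted({item for t in transactions for item in t})
    current = [
        frozenset([i]) for i in items
        if support([i], transactions) >= min_support
    ]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: union pairs of frequent itemsets that yield a size-k set.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent (Apriori property),
        # and the candidate itself must meet the support threshold.
        current = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c, transactions) >= min_support
        ]
    return frequent


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C together) / support(A)."""
    a = frozenset(antecedent)
    return support(a | frozenset(consequent), transactions) / support(a, transactions)


frequent_itemsets = apriori(transactions, min_support=0.6)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
print("confidence(diapers -> beer):", round(confidence({"diapers"}, {"beer"}, transactions), 2))
```

The sketch recomputes support by scanning all transactions for each candidate, which is fine for a toy example but is the cost that FP-growth is designed to avoid on larger data sets.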