Multivariate Statistics and Data Mining is a core course for the Management Science and Big Data Management and Application majors. It lays the foundation on which students build their data analysis knowledge in the big data era and strengthens their professional skills. The course builds on advanced mathematics, probability, and statistical theory. Its core is collecting data for practical problems in different real-world scenarios and then applying statistical methods and data mining techniques to data preprocessing, data mining, and knowledge discovery. Students learn to solve data analysis problems with relevant software, develop sensitivity to data and the ability to explore it, and strengthen their skills in observing and analyzing practical problems, analyzing data, and modeling.
The course covers the basic theory and concepts of data mining together with the main branches of multivariate statistics and their applications. Its main parts are the definition and process of data mining (DM), data preprocessing, descriptive statistical analysis, data visualization, association analysis, clustering, classification, and outlier analysis. Data preprocessing includes data standardization, missing value handling, noisy data handling, and data reduction; the data reduction part introduces principal component analysis and factor analysis. Association analysis covers association rules, the Apriori algorithm, and correlation analysis. The clustering part covers K-means and hierarchical clustering. The classification part covers algorithms and applications such as discriminant analysis, decision trees, optimization-based classification, and multiple regression. The multivariate statistics part covers the multivariate normal distribution, discriminant analysis, and analysis of variance.
Through case analysis and hands-on computer exercises, the course improves students' practical ability to apply data mining and multivariate statistical methods to real problems.
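As an illustration of the association analysis topic named in the description (association rules and the Apriori algorithm), the following is a minimal sketch in Python. It is not taken from the course materials: the function name `apriori`, the `min_support` threshold, and the toy basket data are all assumptions made up for this example. It shows the level-wise idea behind Apriori, which generates k-item candidates only from frequent (k-1)-itemsets and prunes any candidate that has an infrequent subset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Support is the fraction of transactions that contain the itemset.
    Hypothetical helper written for illustration only.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep only candidates that meet the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


# Toy market-basket data, invented for this example.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent_itemsets = apriori(baskets, min_support=0.6)
for itemset, s in sorted(
    frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
):
    print(sorted(itemset), round(s, 2))
```

With this toy data and a support threshold of 0.6, four single items and four pairs are frequent, and no three-item set reaches the threshold. Association rules such as "diapers → beer" could then be derived from the frequent itemsets by comparing their supports (for example, as a confidence ratio). That step is left out to keep the sketch short.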