Databases and Data Mining

PART A: Theory
(i) Introduction to Databases. SQL. (ii) Preparing Data. The importance of data pre-processing and cleaning. (iii) Missing data imputation. (iv) Introduction to supervised learning: decision trees, lazy learners, Bayesian classifiers, ensembles of classifiers. (v) Introduction to regression: multiple linear regression, model trees, neural networks. (vi) Dimensionality Reduction. Feature selection process. Principal Component Analysis with SVD. (vii) Unsupervised learning. Clustering: k-means algorithm, hierarchical clustering models, density-based clustering. (viii) Association rules. Sparse matrices. (ix) Introduction to Big Data. Computational Methods for Large Data (Hadoop and MapReduce).

PART B: Laboratory
(i) Introduction to the R language for Data Science. (ii) Data Frames. Selecting data from a data frame and converting it to a table. (iii) Introduction to SQL. Queries. Queries on multiple tables with JOIN. (iv) Connecting R to a database (SQLite). (v) Usage of R packages: sqldf, lattice, ggplot2, dplyr, party, C50, Rattle, mlr, randomForest, rpart, caret, factoextra, cluster, fpc, arules, arulesViz, RHadoop.

Division: Computational Mathematics and Informatics
Recommended Literature:

Program of Studies:
Postgraduate - MCDA
Semester: B
ECTS: 7.5
Hours per week (Lec/Tut/Lab): 2/0/1
Code: MCDA203
Erasmus students: No