Measures of location and variability. Visual techniques for presenting discrete and continuous data. Sampling distributions and the central limit theorem. Confidence Intervals (CI) for the parameters of one or two independent populations. Asymptotic CI for the mean, proportion (one sample) and the difference in means, proportions (two samples). Testing statistical hypotheses for parameters using CI. Special topics in CI and related tests. Basic elements in testing statistical hypotheses. Likelihood Ratio Test (LRT). Asymptotic LRT, chi-square goodness of fit test (test of independence) and Kolmogorov-Smirnov (KS) test. Tests for normality. Order statistics and CI for the median and quantiles. Sign test for the median. Methods for comparing the distributions of two samples. One-way Analysis of Variance (ANOVA) for independent and dependent samples and related tests. Basic principles of experimental design. Simple linear regression. Correlation coefficients and tests. Modelling two-dimensional variables: the bivariate normal distribution and the theory of copulas. Applications are presented using the language R.
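As an illustration of the asymptotic CI for a one-sample proportion listed above, here is a minimal Python sketch of the Wald interval. The function name and the example counts are illustrative assumptions, not course material, and the normal approximation is only reasonable for moderately large samples.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci_proportion(successes, n, level=0.95):
    """Asymptotic (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    # Standard normal quantile for a two-sided interval at the given level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 40 successes in 100 trials.
lower, upper = wald_ci_proportion(40, 100)
print(round(lower, 3), round(upper, 3))
```

A two-sided test of H0: p = p0 at the matching level can then be read off the interval by checking whether p0 falls inside it.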
PART A: Mathematical Programming
The Art of Modeling: more than just mathematics. Introduction to Linear Programming. Linear Programming Applications (case studies in marketing, finance, business and management, etc.). The Simplex Method. Sensitivity Analysis. Duality and Post-Optimal Analysis. Other Algorithms for Linear Programming. Transportation Model and Its Variants (transshipment problem, assignment problem). Network Optimization Models (the shortest-path problem, the minimum spanning tree problem, the maximum flow problem, the minimum cost flow problem, project management with PERT/CPM, the network Simplex method). Goal Programming, Data Envelopment Analysis. Integer Linear Programming (types of Integer Linear Programming Models, modeling flexibility provided by 0-1 integer variables, the Branch-and-Bound Technique, the Cutting-Plane algorithm). Deterministic Dynamic Programming (recursive nature of Dynamic Programming computations, the shortest-path problem, Knapsack model, Equipment Replacement model, Inventory models, Workforce Size model, Traveling Salesman problem). Inventory Theory (static Economic-Order-Quantity models). Decision Analysis and Games (Utility Theory, Nash equilibrium, Cooperative Games, the bargaining set and related concepts, Algorithmic and Evolutionary Game Theory).
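The recursive nature of Dynamic Programming computations can be sketched on the 0-1 Knapsack model. The Python snippet below is a minimal illustration under assumed integer weights; the function name and the data are hypothetical.

```python
def knapsack(values, weights, capacity):
    """Maximum total value of a 0-1 knapsack with integer weights."""
    # best[c] = best value achievable with capacity c using the items seen so far.
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Go through capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Hypothetical instance: three items, capacity 50.
print(knapsack([60, 100, 120], [10, 20, 30], 50))  # prints 220
```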
PART B: Numerical methods for non-linear unconstrained optimization
The problem of non-linear optimization: mathematical formulation, method categories, local and global optimum, mathematical background. No free lunch theorems for optimization. Conditions for existence of a minimum point. Iterative process, termination criteria. Line Search Methods. Step length determination strategies (exact and inexact). Inexact line search strategies: Armijo, curvature, Wolfe, Strong Wolfe and Goldstein conditions. Backtracking line search. Methods: Steepest Descent, Newton, Line search Newton, Conjugate Gradient, Quasi-Newton. Applications.
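To make the interplay of Steepest Descent, the Armijo condition and backtracking line search concrete, here is a minimal Python sketch. The parameter values (rho, c, tolerances) and the quadratic test function are illustrative assumptions, not prescribed by the course.

```python
from math import sqrt


def backtracking(f, x, g, d, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink the step until the Armijo sufficient-decrease condition holds."""
    fx = f(x)
    slope = sum(gi * di for gi, di in zip(g, d))  # directional derivative
    while alpha > 1e-16:
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx + c * alpha * slope:
            break
        alpha *= rho
    return alpha


def steepest_descent(f, grad, x0, tol=1e-8, max_iter=10_000):
    """Steepest descent; stops when the gradient norm drops below tol."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sqrt(sum(gi * gi for gi in g)) < tol:
            break
        d = [-gi for gi in g]  # steepest-descent direction
        alpha = backtracking(f, x, g, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x


# Hypothetical test problem with minimizer (1, -2).
f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 2) ** 2
grad = lambda x: [2 * (x[0] - 1), 4 * (x[1] + 2)]
print(steepest_descent(f, grad, [0.0, 0.0]))
```

Replacing the direction `d` with a Newton or Quasi-Newton direction leaves the line search unchanged, which is why the two components are usually studied separately.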
PART A: Service Engineering
The service sector is central in the life of post-industrial societies: more than 70% of the Gross National Product in most developed countries is due to this sector. Important examples are healthcare systems (hospitals), financial services (banks) and telephone and internet services. In concert with this state of affairs, there exists a growing demand for high-quality multi-disciplinary research in the field of services, as well as for a significant number of Service Engineers, namely scientifically educated specialists who are capable of designing service systems and of solving the multi-faceted problems that arise in their practice. The course will provide a framework for modeling service systems and techniques that are used to design, analyze, and operate service systems. Our teaching approach is data oriented: examples from various service sectors are presented in lectures and homework assignments, with the call center industry being the central application area. In this course, a service system is viewed as a stochastic network. Thus, the main theoretical framework is queueing theory, which primarily involves a large class of stochastic models. However, the subject matter is highly multi-disciplinary; hence alternative frameworks are useful as well, including ones from Statistics, Psychology, and Marketing.
PART B: Engineering Reliability
The mathematical theory of reliability has grown out of the continually increasing demands of technology. Reliability is the probability that a system performs its purpose adequately for the intended period of time under the operating conditions encountered. This part of the course concentrates on coherent system reliability, failure data analysis and maintenance policies. The use of probability theory for the study of the reliability and lifetime of systems will be developed, via appropriate probabilistic models and statistical methods for studying reliability data.
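For the simplest coherent systems, series and parallel structures, system reliability follows directly from component reliabilities. The Python sketch below assumes statistically independent components; the function names and the numbers are hypothetical.

```python
from math import prod


def series(reliabilities):
    """A series system works only if every component works."""
    return prod(reliabilities)


def parallel(reliabilities):
    """A parallel system fails only if every component fails."""
    return 1 - prod(1 - r for r in reliabilities)


# Hypothetical system: component A (0.9) in series with a
# redundant pair B, C (0.8 each) connected in parallel.
system = series([0.9, parallel([0.8, 0.8])])
print(round(system, 3))  # approximately 0.864
```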
Elements of the theory of computation. Computational Intelligence. Machine Learning. Neural networks, fuzzy logic and evolutionary computation. Natural computing and computational intelligence. Elements of optimization for computational intelligence. Theoretical foundations and problems. No free lunch theorem. Different aspects of optimization (combinatorial, global, local, constrained, etc.). Multi-objective optimization, problems and applications. Evolutionary computation and algorithms. Genetic algorithm. Basic principles and mechanisms (selection, crossover and mutation). Techniques of evolution. Genetic programming, grammatical evolution and evolution strategies. Different versions of genetic and evolutionary algorithms. Applications. Algorithms based on the social behavior of populations. Swarm intelligence. Particle swarm optimization. Basic approach and different versions. Issues related to initialization, convergence and exploration of the space of feasible solutions. Exploration and exploitation. Applications of particle swarm optimization. Models of computation based on paradigms such as ant colony, bee colony, memetic and differential-evolution algorithms.
Neural networks and neural computation. Biological and artificial neurons. Structure, basic operation, stimulation and activation function of the neuron. Training, learning and generalization. Methods for training neural networks. Supervised training. Unsupervised training. Reinforcement learning. Applications of neural networks in science and technology. Classification and regression problems and issues. Linear and non-linear classifiers. Neural networks as classifiers optimizing a cost function. Perceptron and multi-layer perceptron. Support vector machines. Probabilistic neural networks. Recurrent neural networks, Boltzmann machines, time delay networks, radial basis function neural networks. Unsupervised learning, vector quantization and Kohonen self-organizing maps. Deep learning networks and applications. Statistical learning theory. Neural network output interpretation. Specific issues on cellular neural networks, artificial immune systems and membrane computing.
(i) Short introduction to databases and database management systems. Relational DB and SQL. (ii) Storing methods, insertion, deletion and querying databases. Ordered files, arrays, pointers, hashing, B trees and B+ trees (review). (iii) Problems and tradeoffs in big data sets in creating, updating and searching a DB. Input/output models, memory hierarchy, the Disk Access Model (DAM). Examples. (iv) Disk sorting algorithms. Merge Sort. Analysis. Divide and conquer. The DAM, applications. Cache oblivious and non-oblivious algorithms. (v) Examples, models, analysis. Insertion/searching tradeoffs. Appropriate data structures. (vi) Application matters. Performance. (vii) Introduction to and overview of data mining. Introduction to machine learning. (viii) Programming techniques for big data. MapReduce, Hadoop. Physical organization. Some algorithms on the model. (ix) Representation, LSH for texts. Distance measures. (x) The model. Sampling in runs. Data filtering. Estimation. (xi) Link analysis.
PART A: Theory
(i) Introduction to Databases. SQL. (ii) Preparing Data. The importance of data pre-processing and cleaning. (iii) Missing data imputation. (iv) Introduction to supervised learning: decision trees, lazy learners, Bayesian classifiers, Ensembles of classifiers. (v) Introduction to regression: Multiple linear regression, Model Trees, Neural Networks. (vi) Dimensionality Reduction. Feature selection process. Principal Component Analysis with SVD. (vii) Unsupervised learning, Clustering. k-means algorithm. Hierarchical Clustering models, Density clustering. (viii) Association rules, Sparse matrices. (ix) Introduction to Big Data. Computational Methods for Large Data (Hadoop and MapReduce).
PART B: Laboratory
(i) Introduction to the R language for Data Science. (ii) Data Frames. Select data from a Data Frame and convert them to a Table. (iii) Introduction to SQL. Queries. Queries on multiple tables with JOIN. (iv) Connection with R (SQLite). (v) Usage of R packages: sqldf, lattice, ggplot2, dplyr, party, C50, Rattle, mlr, randomForest, rpart, caret, factoextra, cluster, fpc, arules, arulesViz, RHadoop.
Introduction to Bayesian Statistics. The basic concept of Bayesian Statistics and its main difference from classical Statistics. Advantages of Bayesian Statistics. The Bayes Theorem.
Prior distributions. Relative likelihood method, histogram method, fit distribution with a given functional form, conjugate prior distributions, non-informative prior distributions (vague, Jeffreys distributions), Bayes empirical analysis, hierarchical prior distributions.
Posterior distribution: Compute the posterior distribution using various prior distributions. Compute the posterior distribution for data sets widely used in the literature.
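A conjugate prior makes the posterior computation a simple parameter update. The Python sketch below shows the Beta-Binomial case as one possible illustration; the prior parameters and the data are hypothetical.

```python
def beta_binomial_posterior(a, b, successes, failures):
    """Posterior Beta parameters for a Beta(a, b) prior and binomial data."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical example: Beta(2, 2) prior, 7 successes and 3 failures observed.
a_post, b_post = beta_binomial_posterior(2, 2, 7, 3)
print(a_post, b_post)                      # prints 9 5
print(round(beta_mean(a_post, b_post), 4))  # posterior mean, about 0.6429
```

The posterior mean lies between the prior mean (0.5) and the sample proportion (0.7), which reflects how the prior acts like additional pseudo-observations.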
Bayesian Inference: Elements of Statistical Decision Theory and Bayesian Decision Theory: loss function, risk function, decision rules, Bayes risk, Bayes rule and Bayes decision. Bayes estimators (posterior mean and median), Credible sets, Hypothesis tests (Bayes Factor, Fit of prior distributions for simple hypotheses). Predictive distributions.
Simulation: Pseudo-random number generation, inverse transform method, acceptance-rejection method, Importance Sampling. Introduction to Markov Chain Theory, Introduction to Markov Chain Monte Carlo (MCMC) methods, Metropolis-Hastings algorithm, Gibbs Sampler, Hybrid Gibbs Sampler.
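The Metropolis-Hastings algorithm can be sketched in a few lines for a one-dimensional target. The Python example below uses a symmetric Gaussian random-walk proposal, one common variant, so the acceptance ratio reduces to the ratio of target densities. The step size, seed and standard normal target are illustrative assumptions.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target density."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


# Target: standard normal, known only up to a normalizing constant.
draws = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)  # should be roughly 0 and 1
```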
Survival and reliability analysis - Basic concepts. Censored and truncated data. Basic Functions: Reliability or Survival Function, Hazard Function, Mean residual lifetime, etc.
Non-parametric estimation. Kaplan-Meier, Nelson-Aalen. Log-rank test. Graphical tests.
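The Kaplan-Meier estimator multiplies, at each observed failure time, the current survival estimate by the fraction of subjects at risk that survive. A minimal Python sketch for right-censored data follows; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored data.

    times:  observed times (failure or censoring)
    events: 1 if a failure was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each failure time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        # Collect all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures > 0:
            survival *= 1 - failures / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # failures and censored subjects leave the risk set
    return curve


# Hypothetical sample: five subjects, two of them censored (event = 0).
for t, s in kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]):
    print(t, round(s, 4))
```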
Parametric models and lifetime distributions. Gamma, Weibull, Gumbel, Lognormal and others. Maximum likelihood estimation. Goodness-of-fit tests.
Regression models. Proportional hazards model, accelerated failure time model and the semi-parametric Cox model. Diagnostic methods, Cox-Snell residuals, Schoenfeld residuals.
Special issues of survival and reliability analysis. Frailty models, longitudinal data etc.
Definition of Time Series. Components of Time Series. Methods of Time Series Analysis. Forecasting. Stationarity, Autocovariance, Autocorrelation, Partial Autocorrelation. White Noise, Random Walk. Autoregressive Models AR(1), AR(2), AR(p). Moving Average Models MA(1), MA(2), MA(q). Mixed Autoregressive/Moving Average Models ARMA(p,q). ARIMA(p,d,q). SARIMA(p,d,q)x(P,D,Q). Identification of ARIMA Models. Estimation of ARIMA Models, Diagnostic Tests. Model Selection Criteria. Forecasting with AR(1), MA(1), ARMA(1,1), ARMA(p,q), ARIMA(p,d,q). Confidence Intervals for Forecasts. Measures of Forecast Evaluation.
Box-Jenkins Methodology with SPSS.
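Forecasting with AR(1) and the associated forecast interval have closed forms, which the following Python sketch illustrates. It assumes a stationary model with known parameters (|phi| < 1) and Gaussian noise; in practice the parameters are estimated, so the interval is only approximate. All names and numbers are hypothetical.

```python
from math import sqrt


def ar1_forecast(x_last, phi, mu, sigma2, h, z=1.96):
    """h-step-ahead forecast and approximate 95% interval for a stationary AR(1).

    Model: X_t - mu = phi * (X_{t-1} - mu) + e_t, with Var(e_t) = sigma2.
    """
    point = mu + phi ** h * (x_last - mu)
    # Forecast error variance: sigma2 * (1 + phi^2 + ... + phi^(2(h-1))).
    variance = sigma2 * (1 - phi ** (2 * h)) / (1 - phi ** 2)
    half_width = z * sqrt(variance)
    return point, point - half_width, point + half_width


# Hypothetical values: last observation 2.0, phi = 0.5, mean 0, unit noise variance.
print(ar1_forecast(2.0, 0.5, 0.0, 1.0, h=2))
```

As h grows, the point forecast reverts to the mean mu and the interval width approaches that of the stationary distribution.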
Multivariate data. Data Matrices and Measurement Scales. Multivariate Random Variables and Samples. The Multivariate Normal Distribution. Sampling from Multivariate Normal and Statistical Inference.
One-way MANOVA for independent and dependent samples and related tests. Generalization of linear regression and its application in the interpretation and prediction of more than one dependent variable.
Principal Components Analysis. Finding the principal components from the covariance and the correlation matrix, respectively. Sample principal components and statistical inference using large data samples.
Correspondence Analysis. Analysis of two-way contingency tables.
Discriminant Analysis and Classification. Study of group separation rules. Hierarchical and Nonhierarchical Clustering Methods.
PART A: Theory
(i) Supervised Learning: Support Vector Machines, Ensemble Methods, Hyper-parameter optimization, Handling Imbalanced Datasets. (ii) Time Series Using Regression Methods: Model trees, Neural Networks. (iii) Semi-Supervised Learning: Self-trained models, Active Learning. (iv) Text Classification, Image Classification, Sound Classification. (v) Deep Learning: Convolutional Neural Networks, Recurrent Networks. (vi) Reinforcement Learning.
PART B: Laboratory
Python for Data Science, Python libraries: scikit-learn, orange, imbalanced-learn, pandas, statsmodels, h2o, libact, nltk, scikit-image, SpeechRecognition, tensorflow, keras, keras-rl.
Interval Analysis. The interval number. The interval arithmetic. The fundamental theorem of Interval Analysis for solving problems. The interval arithmetic for problems with many variables. Convergence of interval methods. Termination criteria. Basic interval methods. Basic characteristics of interval methods for global optimization problems. Acceleration devices. Basic interval methods for finding all global solutions of an objective function.
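Interval arithmetic and the inclusion property behind the fundamental theorem can be illustrated with a small Python sketch. It is a simplification: it ignores the outward rounding that a rigorous implementation needs, and the class and example function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; floating-point rounding is ignored here."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


# Natural interval extension of f(x) = x*x - x over X = [-1, 2].
x = Interval(-1.0, 2.0)
print(x * x - x)  # Interval(lo=-4.0, hi=5.0)
```

The true range of f on [-1, 2] is [-0.25, 2], so the computed enclosure is valid but pessimistic; this overestimation is what the acceleration devices used in interval global optimization aim to reduce.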
Data Science. Simple linear regression using interval arithmetic. Non-linear and multiple regression using intervals. Auto-regressive and/or moving average models for interval arithmetic. Principal Component Analysis (PCA) and Factor Analysis (FA) using interval variables. Statistical modelling. Structural Equation Modelling.
Applications. Application to real data, e.g. satisfaction questionnaires or financial (stock-market) data. Respondent profiles. Application of recursive regression trees in order to approximate statistical models.