Data Science

Descriptive Modelling and Business Insights

Descriptive modelling aims to find statistical interrelationships and structure in data so that businesses can understand what drives outcomes of interest and which levers and optimizations are at their disposal to enhance those outcomes.

  1. Structural equation models, generalised structural equation models, hierarchical and multilevel models, random coefficient models, mediation analysis, confirmatory factor analysis, item-response theory, growth curve models, treatment effect models, survival analysis, time-to-conversion and attrition modelling
  2. Structural time series models, state-space models, vector autoregressions and structural vector autoregressions, impulse response functions, cointegrated and cotrending models, Bayesian time series models
  3. Exploratory data analysis and unsupervised learning, linear and nonlinear dimensionality reduction and clustering, principal components and factor analysis, non-negative matrix factorization, one-class classification, outlier and novelty detection
  4. Probabilistic graphical models and Bayesian networks, approximate and variational inference

Suggested readings: Koller and Friedman (2009), Kline (2011), Barber (2012), Harrell (2001)
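
As an illustration of the dimensionality-reduction techniques listed above, the following is a minimal sketch of principal component analysis for the leading component only, assuming a small in-memory dataset and using power iteration on the sample covariance matrix. The function name `first_principal_component` and the toy data are hypothetical and chosen purely for illustration; production work would normally rely on an established numerical library.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, share_of_variance) for the leading principal component.

    Illustrative sketch only: assumes at least two rows, at least one column
    with non-zero variance, and a start vector that is not orthogonal to the
    leading eigenvector of the covariance matrix.
    """
    n = len(rows)
    d = len(rows[0])

    # Centre each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and renormalise.
    v = [1.0 + i for i in range(d)]  # arbitrary asymmetric start vector (assumption)
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # start vector lies in the null space; result is not meaningful
        v = [x / norm for x in w]

    # Rayleigh quotient gives the variance along v; divide by total variance.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    total_variance = sum(cov[i][i] for i in range(d))
    share = eigenvalue / total_variance if total_variance > 0 else 0.0
    return v, share


# Toy data lying exactly on the line y = 2x (hypothetical example).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, share = first_principal_component(data)
print(direction, share)
```

Because the toy data sit on a single line, one component captures essentially all of the variance; on real business data the share explained by each component is what indicates how much structure a low-dimensional summary retains.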

Predictive Modelling and Real-Time Analytics

Predictive modelling aims to fit precise and flexible probabilistic relationships to structured and unstructured data which can be used as a basis for business decision-making.

  1. Supervised learning: Linear and nonlinear regression-based models, generalised linear (mixed) models, nonparametric and adaptive models (Restricted Boltzmann Machines, deep learning, neural networks, recursive partitioning methods, polynomial, spline and wavelet regression), semiparametric methods (generalised additive models, partially linear models, kernel regression), ensemble methods (bagging, boosting, gradient boosting, random forests, greedy and extreme learning machines). Modelling for cross-section, time series and spatial data, panel, clustered, functional and longitudinal data
  2. Semi-supervised learning: constrained and unconstrained learning from labelled and unlabelled data, experimental design and active learning, transductive learning

Suggested readings: Hastie et al. (2009), Mohri et al. (2012), Murphy (2012)
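
To make the ensemble methods mentioned above more concrete, here is a minimal sketch of gradient boosting for squared-error regression on a single feature, using one-split regression stumps as the weak learners. All names (`fit_stump`, `gradient_boost`, `predict`, `mse`), the learning rate and the toy data are assumptions made for illustration and are not taken from any particular library.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump by least squares.

    Assumes xs contains at least two distinct values.
    Returns (threshold, left_value, right_value).
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def gradient_boost(xs, ys, rounds=50, learning_rate=0.1):
    """Boost stumps on the residuals, which are the negative gradient of squared loss."""
    base = sum(ys) / len(ys)  # initial model: the mean of the targets
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left_value if x <= threshold else right_value)
        for threshold, left_value, right_value in stumps
    )


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy nonlinear relationship (hypothetical data).
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
model = gradient_boost(xs, ys)
baseline_error = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_error = mse(ys, [predict(model, x) for x in xs])
print(baseline_error, boosted_error)
```

The small learning rate is a deliberate design choice: each stump corrects only a fraction of the remaining error, which trades more boosting rounds for a smoother fit. The errors printed here are training errors, so a held-out sample would still be needed to judge predictive accuracy.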

Large-Scale Learning and Optimization

Large-scale learning and optimization formulates business optimization problems mathematically, with the aim of precisely estimating the optimal operating configuration subject to business constraints. It also scales existing descriptive and predictive modelling solutions to large-scale, high-velocity data, using algorithms tuned to store, process and optimize efficiently at that scale.

  1. Sparsity and regularization: Collaborative filtering and matrix completion algorithms, L1, L2, elastic net and SCAD penalty methods (LASSO, Elastic Net, Dantzig Selector, Grouped LASSO), shrinkage, model averaging, stacked generalization