## Multivariate Statistics

• Unit code: MATH48061
• Credit rating: 15
• Unit level: Level 4
• Teaching period(s): Semester 1
• Offered by: School of Mathematics
• Available as a free choice unit?: No

#### Requisites

Prerequisite

MATH48061 pre-requisites

Students are not permitted to take more than one of MATH38061 or MATH48061 for credit in the same undergraduate year.  Students are not permitted to take MATH48061 and MATH68061 for credit in an undergraduate programme and then a postgraduate programme.

#### Aims

To provide a modern overview of multivariate statistics, including both the underlying mathematical theory and practical considerations.

#### Overview

Almost all real data, from the physical, biological and social sciences as well as industry and healthcare, involves recording observations of multiple variables. This course concerns the analysis of such multivariate data from both a theoretical and a practical viewpoint. Some techniques generalise univariate methods, for example maximum likelihood estimation. Others are new, for example principal component analysis.

#### Learning outcomes

On successful completion of the course students will be able to:

• Work with random vectors and matrices to derive results relevant to multivariate statistical inference.
• Import multivariate data stored as plain text into statistical software, visualise the data, and apply the multivariate analysis techniques covered in the course to it.
• Use data, or summary statistics of data, to calculate sample mean vectors, variance-covariance matrices and correlation matrices, and to define transformations that simplify analysis.
• Derive the principal components of data with a given covariance structure.
• Explain the difference between supervised and unsupervised learning, and give an algorithm for classifying data into two classes in each case.
• Perform unbiased estimation, maximum likelihood estimation and hypothesis testing for multivariate data.
• Derive key properties of the multivariate normal distribution and apply these to the analysis of multivariate data.
• Use contingency tables to test hypotheses and estimate effect sizes for a variety of discrete multivariate models.

#### Assessment methods

• Other - 20%
• Written exam - 80%

#### Assessment Further Information

• Coursework, which will involve applying methods to real data.
• End of semester examination: three hours, weighting 80%

#### Syllabus

Mathematical foundations. Revision of vectors, matrices and random variables. New material on random vectors and random matrices.

Working with data. Constructing the n × p data matrix X from a data file. Sample mean vector and sample covariance and correlation matrices. Unbiased estimation of the population mean vector and variance-covariance matrix. Transformation of data, including the Mahalanobis transformation, standardisation and logarithmic transformation. Visualisation of data, including histograms, scatter plots, kernel density plots and plot matrices.
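A minimal sketch of the sample summaries listed above, assuming the n × p data matrix X is held as a plain Python list of n rows (one row per observation) rather than in the statistical software used on the course. The function name `summary_statistics` and the toy data are illustrative only. The covariance uses the unbiased divisor n - 1, and the correlation matrix rescales each covariance by the two standard deviations.

```python
import math

def summary_statistics(X):
    """Return (sample mean vector, unbiased covariance matrix, correlation matrix)
    for an n x p data matrix X given as a list of n rows of length p.
    Hypothetical helper for illustration only."""
    n = len(X)
    p = len(X[0])
    # Sample mean vector: the average of each column.
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    # Unbiased covariance: S_jk = sum_i (x_ij - xbar_j)(x_ik - xbar_k) / (n - 1).
    S = [[sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in X) / (n - 1)
          for k in range(p)] for j in range(p)]
    # Correlation: R_jk = S_jk / sqrt(S_jj * S_kk).
    R = [[S[j][k] / math.sqrt(S[j][j] * S[k][k]) for k in range(p)]
         for j in range(p)]
    return xbar, S, R

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]  # toy data: n = 3 observations, p = 2 variables
xbar, S, R = summary_statistics(X)
print(xbar)  # [2.0, 2.0]
print(S)     # [[1.0, 0.5], [0.5, 1.0]]
print(R)     # [[1.0, 0.5], [0.5, 1.0]]
```

Here both variables happen to have unit sample variance, so S and R coincide; in general they differ.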

Parametric multivariate statistics. The multivariate normal distribution, including marginal and conditional distributions. Other parametric distributions such as the multivariate log-normal, the multivariate t and Gaussian mixtures. Maximum likelihood estimation and confidence regions for multivariate statistical models. Hypothesis testing and model selection.
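For reference, the standard marginal and conditional results for a partitioned multivariate normal vector are stated below. The partition notation is a common textbook convention assumed here rather than taken from the course, and the conditional result requires Σ₂₂ to be invertible.

```latex
\mathbf{x} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}
\sim N_p\!\left( \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)
\implies
\begin{aligned}
\mathbf{x}_1 &\sim N\!\left(\boldsymbol{\mu}_1,\; \Sigma_{11}\right), \\
\mathbf{x}_1 \mid \mathbf{x}_2 &\sim N\!\left(\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\;
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).
\end{aligned}
```

Note that the conditional covariance does not depend on the observed value of x₂; only the conditional mean does.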

Dimension reduction. Detailed treatment of principal component analysis, as well as discussion of other methods.
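Principal components are the eigenvectors of the covariance matrix, ordered by eigenvalue, and each eigenvalue is the variance of the corresponding component. As one simple way to obtain the first component without a linear-algebra library, the sketch below uses power iteration; the function name, iteration count and example matrix are assumptions for illustration, and in practice a full symmetric eigendecomposition routine would normally be used instead.

```python
import math

def leading_principal_component(S, iterations=1000):
    """Approximate the largest eigenvalue of a symmetric covariance matrix S and
    a corresponding unit eigenvector (the first principal component coefficients)
    by power iteration. Hypothetical helper; assumes the largest eigenvalue is
    strictly dominant and the all-ones start vector is not orthogonal to its
    eigenvector."""
    p = len(S)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]  # w = S v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # rescale to unit length
    Sv = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(vi * svi for vi, svi in zip(v, Sv))  # Rayleigh quotient v'Sv
    return eigenvalue, v

S = [[4.0, 2.0], [2.0, 3.0]]  # example covariance matrix
lam, a = leading_principal_component(S)
share = lam / (S[0][0] + S[1][1])  # proportion of total variance explained
print(lam, a, share)  # lam is close to (7 + sqrt(17)) / 2, about 5.56
```

The first principal component of an observation x is then a'(x - x̄), with variance lam; further components could be found by deflating S, but a proper eigendecomposition is the more reliable route.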

Classification. Supervised versus unsupervised learning. Detailed treatment of discriminant analysis and k-means clustering, as well as discussion of other methods.
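k-means is an unsupervised method: it groups unlabelled observations without using any known class labels. A minimal sketch of Lloyd's algorithm, the usual iterative scheme for k-means, is given below; the function name, the caller-supplied initial centres and the toy data are all assumptions made for illustration.

```python
def k_means(points, centres, iterations=100):
    """Lloyd's algorithm (hypothetical helper): alternately assign each point to
    its nearest centre by squared Euclidean distance, then move each centre to
    the mean of the points assigned to it. Stops when assignments stop changing
    or after `iterations` passes."""
    centres = [list(c) for c in centres]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(len(centres)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(pt, centres[k])))
            for pt in points
        ]
        if new_labels == labels:  # assignments are stable, so the algorithm has converged
            break
        labels = new_labels
        for k in range(len(centres)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # leave a centre in place if its cluster is empty
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

points = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels, centres = k_means(points, centres=[[0, 0], [6, 6]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the initial centres and on the chosen k, which is why k-means is often rerun from several starting points.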

Discrete multivariate statistics. Discrete multivariate sampling distributions. Construction of contingency tables, and hypothesis testing for different independence and sampling null models. Effect sizes and confidence intervals.
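For a two-way contingency table, a common test of the independence null model compares the observed counts with the expected counts implied by the row and column totals. The sketch below computes Pearson's X² statistic and its degrees of freedom, together with the sample odds ratio as one effect size for a 2 × 2 table; the function names and counts are hypothetical, and the other sampling null models mentioned in the syllabus are not covered.

```python
def chi_square_independence(table):
    """Pearson's X^2 statistic for independence in an r x c contingency table,
    using expected counts E_ij = (row total_i * column total_j) / grand total.
    Returns (statistic, degrees of freedom). Hypothetical helper."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def odds_ratio(table):
    """Sample odds ratio (a * d) / (b * c) for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

table = [[10, 20], [30, 40]]
stat, df = chi_square_independence(table)
theta = odds_ratio(table)
print(stat, df, theta)  # stat is about 0.794, df is 1, theta is about 0.667
```

Under independence, X² is approximately chi-squared on (r - 1)(c - 1) degrees of freedom; with one degree of freedom the 5% critical value is about 3.84, so this example gives no evidence against independence.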

#### Recommended reading

• C. Chatfield and A. Collins. Introduction to Multivariate Analysis. Chapman & Hall / CRC Texts in Statistical Science. Taylor & Francis, 1981.

An introductory book slightly below the level of the course.

• A. C. Rencher. Multivariate Statistical Inference and Applications. Wiley Series in Probability and Statistics. John Wiley & Sons, New York, 1998.

The main course text.

• Y. Bishop, S. E. Fienberg and P. W. Holland. Discrete Multivariate Analysis: Theory and Practice. Massachusetts Institute of Technology Press, Cambridge, 1975.

Covers the discrete case.

• S. Rogers and M. Girolami. A First Course in Machine Learning. CRC Press, Boca Raton, Florida, 2nd edition, 2016.

Deals with aspects of machine learning relevant to this course.

#### Feedback methods

Feedback will be provided throughout the course, including:

• In tutorials you will be able to ask for and receive feedback on your work and understanding.