## Multivariate Statistics

• Unit code: MATH38061
• Credit rating: 10
• Unit level: Level 3
• Teaching period(s): Semester 1
• Offered by: School of Mathematics
• Available as a free choice unit?: N

#### Requisites

Prerequisite

Students may not take both MATH38061 and MATH48061 for credit, whether in the same or in different undergraduate years. Students may not take MATH48061 for credit in an undergraduate programme and then MATH68061 for credit in a postgraduate programme.

#### Aims

To familiarise students with the ideas and methodology of certain multivariate methods together with their application in data analysis using the R statistical computing package.

#### Overview

In practice, most data sets are multivariate: they consist of observations on several different variables for each of a number of individuals or objects. Such data sets arise in many areas of science, the social sciences and medicine, and techniques for their analysis form an important area of statistics. This course unit introduces a number of these techniques. Some are generalisations of univariate methods, while others are completely new (e.g. principal component analysis). The course focuses on continuous multivariate data.

#### Learning outcomes

On successful completion of the course students will:

• be familiar with multivariate random vectors and their probability distributions;
• have acquired skills in data classification, dimensionality reduction techniques, and inferential methods based on the multivariate Normal distribution as an underlying model;
• be aware of how the statistical package R can be used as a tool for multivariate data analysis and graphical presentation.

#### Assessment methods

• Other - 20%
• Written exam - 80%

#### Assessment Further Information

• Coursework: weighting 20%
• End of semester examination: two hours, weighting 80%

#### Syllabus

• Introductory ideas and basic concepts - random vectors and their distribution, linear transformations (including the Mahalanobis transformation), sample statistics and their properties, overall measures of dispersion in p-space, distances in p-space, simple graphical techniques.
• Cluster Analysis - aims, hierarchical algorithms, the dendrogram.
• Principal component analysis - definition and derivation of population PCs, sample PCs, practical considerations, geometrical properties, examples.
• The Multivariate Normal (MVN) distribution - definition, properties, conditional distributions, the Wishart and Hotelling T-squared distributions, sampling distributions of the sample mean vector and covariance matrix, maximum likelihood estimation of the mean vector and covariance matrix.
• Hypothesis testing and confidence intervals (one sample procedures) - the generalized likelihood ratio test, tests on the mean vector, CIs for the components of the mean vector.
• Hypothesis testing and confidence intervals (two independent sample procedures) - tests on the difference between two mean vectors, testing equality of covariance matrices, CIs for the differences in the components of the mean vectors.
• Profile Analysis.

#### Recommended reading

• Chatfield, C. and Collins, A. J., An Introduction to Multivariate Analysis, Chapman & Hall 1983.
• Krzanowski, W. J., Principles of Multivariate Analysis: A User's Perspective, Oxford University Press 1990.
• Johnson, R. A. and Wichern, D. W., Applied Multivariate Statistical Analysis, 3rd edition, Prentice Hall 1992.

#### Feedback methods

Feedback tutorials give students an opportunity to discuss their work and to receive feedback on their understanding. Coursework or in-class tests (where applicable) also provide feedback. Students can also get feedback on their understanding directly from the lecturer, for example during the lecturer's office hour.

#### Study hours

• Lectures - 22 hours
• Tutorials - 11 hours
• Independent study hours - 67 hours

#### Teaching staff

Thomas House - Unit coordinator