Longitudinal Data Analysis
|Unit level:||Level 4|
|Teaching period(s):||Semester 2|
|Offered by||School of Mathematics|
|Available as a free choice unit?:||No|
- MATH38001 - Statistical Inference (Compulsory)
- MATH38141 - Regression Analysis (Optional)
Additional Requirements
MATH48132 pre-requisites
Students should have completed the prerequisite MATH38001 and either MATH38141 or MATH48011.
Students are not permitted to take, for credit, MATH48132 in an undergraduate programme and then MATH68132 in a postgraduate programme at the University of Manchester, as the courses are identical.
To study advanced techniques in statistical science and to develop the skills needed to analyse correlated and clustered data; to explore a wide range of real-life examples, in particular from biology, medicine and the social sciences.
In longitudinal studies, repeated measurements are made on each subject over time. Responses within a subject are likely to be correlated, whereas responses from different subjects can often be assumed independent. Such data are very common in practice, for example in quality control in industry, panel data analysis in economics, growth curve analysis in biology and agriculture, and randomised controlled trials in medicine and public health. Longitudinal data therefore combine elements of multivariate and time series data. However, they differ from classical multivariate data in that the time series aspect typically imparts a much more highly structured pattern of interdependence among measurements than in standard multivariate data sets. They also differ from classical time series data in consisting of a large number of short series, one from each subject, rather than a single long series. These characteristics must be taken into account when modelling such data; otherwise, the resulting statistical inferences are likely to be severely biased.
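One concrete way in which ignoring the correlation misleads inference can be worked out exactly. Below is a minimal Python sketch (the unit itself uses R). It assumes an exchangeable (compound symmetry) model: every response has variance σ², any two responses on the same subject have correlation ρ, and different subjects are independent.

The sketch compares exact variances with those reported by an analysis that wrongly treats all responses as independent:

- For the grand mean over n subjects and m occasions, the exact variance is σ²(1 + (m - 1)ρ)/(nm). The naive standard error is therefore too small when ρ > 0.
- For a within-subject change y_i2 - y_i1, the exact variance is 2σ²(1 - ρ). The naive value 2σ² is therefore too large.

The function names are illustrative and not taken from any library.

```python
import math

def var_grand_mean(sigma2, rho, n_subjects, m_times):
    """Exact variance of the mean of all n_subjects * m_times responses under
    exchangeable within-subject correlation rho and independent subjects."""
    return sigma2 * (1 + (m_times - 1) * rho) / (n_subjects * m_times)

def var_grand_mean_naive(sigma2, n_subjects, m_times):
    """The same variance if every response were (wrongly) treated as independent."""
    return sigma2 / (n_subjects * m_times)

def var_within_change(sigma2, rho):
    """Exact variance of y_i2 - y_i1, one subject's change between two occasions."""
    return 2 * sigma2 * (1 - rho)

def var_within_change_naive(sigma2):
    """The same variance if the two responses were treated as independent."""
    return 2 * sigma2

if __name__ == "__main__":
    sigma2, rho, n, m = 1.0, 0.5, 20, 5   # illustrative values only
    print("SE of grand mean, exact:", math.sqrt(var_grand_mean(sigma2, rho, n, m)))
    print("SE of grand mean, naive:", math.sqrt(var_grand_mean_naive(sigma2, n, m)))
    print("SE of within-subject change, exact:", math.sqrt(var_within_change(sigma2, rho)))
    print("SE of within-subject change, naive:", math.sqrt(var_within_change_naive(sigma2)))
```

Take ρ = 0.5 and m = 5. The naive standard error of the grand mean is too small by a factor of √3 ≈ 1.73, so confidence intervals are too narrow. The naive standard error of a within-subject change is too large by a factor of √2, so a genuine change over time is harder to detect.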
The primary objective of longitudinal data analysis is to study how a response variable is related to explanatory variables of interest and how its expectation changes over time, while taking the within-subject correlation into account. The secondary objective is to quantify random variation from different sources and to characterise the within-subject correlation structure, which plays an important role in the analysis of longitudinal and clustered data arising in many areas.
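In the simplest case, the random variation from different sources can be quantified by hand. Below is a minimal sketch assuming a balanced random-intercept model y_ij = μ + b_i + e_ij. Here b_i ~ N(0, σ_b²) and e_ij ~ N(0, σ_e²) are independent, with n subjects and m occasions each.

The sketch computes three things:

- The classical one-way ANOVA (method-of-moments) estimators: σ̂_e² = MSW and σ̂_b² = (MSB - MSW)/m, truncated at zero.
- The implied within-subject correlation σ_b²/(σ_b² + σ_e²).
- Plug-in predictions of the random effects, σ_b²/(σ_b² + σ_e²/m) · (ȳ_i - ȳ).

This is a simpler stand-in for the ML/REML estimation covered in the unit, not the unit's own method. The helper name and toy data are hypothetical.

```python
def random_intercept_anova(data):
    """Return (sigma2_b, sigma2_e, within_subject_corr, predicted_b) for a balanced design.

    data: list of lists, one inner list of m repeated measurements per subject.
    """
    n = len(data)
    m = len(data[0]) if data else 0
    if n < 2 or m < 2 or any(len(row) != m for row in data):
        raise ValueError("need a balanced design with at least 2 subjects and 2 occasions")
    subject_means = [sum(row) / m for row in data]
    grand_mean = sum(subject_means) / n
    # Between-subject and within-subject mean squares from the one-way ANOVA table.
    ms_between = m * sum((ybar - grand_mean) ** 2 for ybar in subject_means) / (n - 1)
    ms_within = sum((y - ybar) ** 2
                    for row, ybar in zip(data, subject_means)
                    for y in row) / (n * (m - 1))
    # Method-of-moments estimates; a negative between-subject estimate is truncated to zero.
    sigma2_e = ms_within
    sigma2_b = max(0.0, (ms_between - ms_within) / m)
    total = sigma2_b + sigma2_e
    corr = sigma2_b / total if total > 0 else 0.0
    # Plug-in prediction of each b_i: the subject's deviation from the grand mean,
    # shrunk towards zero by sigma2_b / (sigma2_b + sigma2_e / m).
    shrink = sigma2_b / (sigma2_b + sigma2_e / m) if sigma2_b > 0 else 0.0
    predicted_b = [shrink * (ybar - grand_mean) for ybar in subject_means]
    return sigma2_b, sigma2_e, corr, predicted_b

if __name__ == "__main__":
    toy = [[1.0, 3.0], [4.0, 6.0], [7.0, 9.0]]  # hypothetical: 3 subjects, 2 occasions
    s2b, s2e, rho, b_hat = random_intercept_anova(toy)
    print(s2b, s2e, rho, [round(b, 3) for b in b_hat])
```

For the toy data, MSB = 18 and MSW = 2. This gives σ̂_b² = 8, σ̂_e² = 2 and a within-subject correlation of 0.8. Each subject's deviation from the grand mean is shrunk by the factor 8/9 when predicting b_i.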
On successful completion of this course unit students will be able to:
- apply advanced statistical models, including general linear models with correlated random errors, linear mixed models, generalised linear mixed models and generalised estimating equations, to analyse longitudinal and clustered data,
- distinguish the roles and functions of these models for continuous and discrete longitudinal data and clustered data,
- describe the parameter estimation theory and model selection criteria for these models,
- compare the strengths and weaknesses of marginal models and conditional models for longitudinal data and clustered data,
- formulate models for missing data under the mechanisms missing completely at random, missing at random and missing not at random, and conduct statistical analyses of incomplete data,
- implement these statistical methods in the statistical software R for practical data analysis.
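In a general linear model, "correlated random errors" are specified through a covariance matrix for each subject's vector of responses. Below is a minimal Python sketch of three standard structures: compound symmetry, AR(1) and exponential correlation. It assumes a common variance σ² at every occasion.

The exponential model is written here as σ² exp(-|t_j - t_k|/φ). This is one common parameterisation among several. With equally spaced integer times it reduces to AR(1) with ρ = exp(-1/φ). Function names and the visit schedule are illustrative.

```python
import math

def compound_symmetry(n_times, sigma2, rho):
    """Equal variance sigma2; every pair of occasions has correlation rho."""
    return [[sigma2 if j == k else sigma2 * rho for k in range(n_times)]
            for j in range(n_times)]

def ar1(n_times, sigma2, rho):
    """Equally spaced occasions; correlation decays as rho ** |j - k|."""
    return [[sigma2 * rho ** abs(j - k) for k in range(n_times)]
            for j in range(n_times)]

def exponential_correlation(times, sigma2, phi):
    """Arbitrary (possibly unequal) times; correlation exp(-|t_j - t_k| / phi), phi > 0."""
    return [[sigma2 * math.exp(-abs(tj - tk) / phi) for tk in times]
            for tj in times]

if __name__ == "__main__":
    for row in ar1(4, 1.0, 0.6):
        print([round(v, 3) for v in row])
    # Irregular visit times, e.g. weeks 0, 1, 4 and 12 (hypothetical schedule).
    for row in exponential_correlation([0, 1, 4, 12], 1.0, phi=5.0):
        print([round(v, 3) for v in row])
```

Compound symmetry implies the same correlation however far apart two occasions are. AR(1) and exponential correlation let it decay with the time lag. Only the exponential form handles irregularly spaced visits directly.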
- Other - 20%
- Written exam - 80%
Assessment Further Information
- Coursework: 20%
- End-of-semester examination: three hours, weighting 80%
- Introduction: motivating examples from medical practice, fundamental problems of longitudinal data, exploring longitudinal data
- Ordinary linear regression models for longitudinal data: linear models with independent random errors, analysis of variance (ANOVA) for longitudinal data, drawbacks and limitations of the classical models 
- General linear models for longitudinal data: general linear models with correlated random errors, covariance models (including compound symmetry, AR(1), exponential correlation and ante-dependence), maximum likelihood estimation, restricted maximum likelihood estimation
- Linear mixed models: fixed effects, random effects, random variation from different sources, model representation, variance components, maximum likelihood estimation, the EM algorithm, restricted maximum likelihood estimation, prediction of random effects, goodness of fit
- Non-normal longitudinal data models: a) population-averaged models: generalised estimating equations, working covariance specification, estimation and properties; b) subject-specific models: random effects models, exponential family of distributions, generalised linear mixed models, penalised quasi-likelihood estimation, variance component estimators, goodness of fit
- Statistical methods for missing data: a) missing data mechanisms: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR); b) simple correction methods: single imputation and last observation carried forward (LOCF), and their drawbacks and limitations; c) inference-based methods: likelihood-based methods, multiple imputation, weighted estimating equations, sensitivity analysis
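A small simulation shows why the missing-data mechanism matters. Below is a minimal Python sketch of a hypothetical two-occasion study. The baseline y1 ~ N(0, 1) is always observed, and the follow-up y2 = y1 + N(0, 1) may be missing, so the true mean of y2 is 0.

The probability that y2 is missing depends on the mechanism:

- MCAR: 0.5, regardless of the data.
- MAR: 0.8 or 0.2 according to the sign of y1.
- MNAR: 0.8 or 0.2 according to the sign of y2 itself.

The complete-case mean is roughly unbiased only under MCAR. Single regression imputation from y1 removes the bias under MAR but not under MNAR. All probabilities, sample sizes and function names are illustrative assumptions.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=1):
    """Hypothetical two-occasion study: returns lists y1, y2 and observed flags for y2.

    y1 ~ N(0, 1) is always observed; y2 = y1 + N(0, 1), so the true mean of y2 is 0.
    """
    rng = random.Random(seed)
    y1, y2, observed = [], [], []
    for _ in range(n):
        base = rng.gauss(0.0, 1.0)
        follow = base + rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            p_missing = 0.5                          # unrelated to the data
        elif mechanism == "MAR":
            p_missing = 0.8 if base > 0 else 0.2     # depends only on observed y1
        elif mechanism == "MNAR":
            p_missing = 0.8 if follow > 0 else 0.2   # depends on the unobserved y2 itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        y1.append(base)
        y2.append(follow)
        observed.append(rng.random() >= p_missing)
    return y1, y2, observed

def complete_case_mean(y2, observed):
    """Mean of y2 over subjects whose follow-up was observed."""
    return statistics.fmean(v for v, o in zip(y2, observed) if o)

def regression_imputed_mean(y1, y2, observed):
    """Single regression imputation: fit y2 = alpha + beta * y1 by least squares on
    complete cases, replace each missing y2 by its fitted value, then average."""
    xs = [x for x, o in zip(y1, observed) if o]
    ys = [v for v, o in zip(y2, observed) if o]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    beta = (sum((x - x_bar) * (v - y_bar) for x, v in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
    alpha = y_bar - beta * x_bar
    filled = [v if o else alpha + beta * x for x, v, o in zip(y1, y2, observed)]
    return statistics.fmean(filled)

if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        y1, y2, obs = simulate(mech)
        print(mech,
              "complete-case:", round(complete_case_mean(y2, obs), 3),
              "regression-imputed:", round(regression_imputed_mean(y1, y2, obs), 3))
```

Even under MAR, the regression-imputed data set treats fitted values as if they were observed. Standard errors computed from it are therefore too small. This is one of the drawbacks of single imputation that multiple imputation is designed to address.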
- Davis, C. S. (2002). Statistical Methods for the Analysis of Repeated Measurements. Springer, New York.
- Diggle, P. J., Heagerty, P., Liang, K.-Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data. 2nd edition. Oxford University Press, Oxford.
- Fitzmaurice, G. M., Laird, N. M. and Ware, J. H. (2004). Applied Longitudinal Analysis. Wiley, New York.
- Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data. 2nd edition. Wiley, New York.
Feedback tutorials will provide an opportunity for students' work to be discussed and provide feedback on their understanding. Coursework or in-class tests (where applicable) also provide an opportunity for students to receive feedback. Students can also get feedback on their understanding directly from the lecturer, for example during the lecturer's office hour.
- Lectures - 33 hours
- Tutorials - 11 hours
- Independent study hours - 106 hours