Research Interests and Methods
In longitudinal studies, repeated measurements are taken on subjects over time, at regular or irregular intervals. Responses within a subject are likely to be correlated, whereas responses from different subjects are typically assumed independent. If the within-subject correlation is not taken into account, statistical inference may become very unreliable because estimates of the parameters of interest are biased.
Modelling longitudinal data therefore involves exploring the within-subject covariance structure. Conventional modelling strategies in the literature either assume a specific "working" covariance structure, e.g., a compound symmetric or AR(1) structure, or select an "optimal" covariance structure from a class of available candidates using selection criteria such as AIC or BIC. However, the covariance matrix may be mis-specified when the true covariance structure is not included in the class of candidates, which may lead to biased estimates of the parameters of interest.
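As an illustration (not part of the original text), the two working covariance structures mentioned above can be constructed directly; the dimension and parameter values below are hypothetical:

```python
import numpy as np

def compound_symmetry(n, sigma2, rho):
    # Constant correlation rho between any two measurements on a subject.
    return sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))

def ar1(n, sigma2, rho):
    # Correlation decays as rho**|j-k| with the lag between occasions.
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Hypothetical example: 4 occasions, unit variance, correlation 0.5.
cs = compound_symmetry(4, 1.0, 0.5)
ar = ar1(4, 1.0, 0.5)
```

Mis-specification means fitting with one of these matrices when the truth is the other (or neither); the two disagree increasingly at longer lags.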
I am particularly interested in developing new methodologies, based on parametric and non-parametric techniques, to jointly model the mean and covariance structures in longitudinal studies, and in comparing them with existing methods in the literature.
Statistical diagnostics in longitudinal studies include model diagnostics, i.e., goodness-of-fit assessment, and the detection of outliers and influential observations. Once a model is fitted to a data set, we need to assess its performance; for example, we need to compare the joint modelling of mean and covariance structures with conventional methods that pre-specify a covariance structure. We focus on finding appropriate criteria to measure the goodness-of-fit of joint mean-covariance models. It is well known that a few subjects, or a few observations within a subject, may substantially alter statistical inferences in longitudinal studies. Detecting such subjects/observations, and investigating the relationship between influence at these two levels, is also among my research interests.
State-space models, or time-dependent statistical models, have been widely used, for example, in fisheries stock assessment and financial markets. An advantage of state-space modelling is that all major sources of uncertainty can be taken into account and, as a consequence, time series of abundance can be estimated. It is well known that the Kalman filter performs very well when both the state and observation equations are linear with Normally distributed errors. However, when models are markedly non-linear or non-Normal, the Kalman filter may yield severely biased parameter estimates. I am particularly interested in computationally intensive methods for state-space models, for example, applying sequential importance sampling or MCMC to state-space modelling.
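The sequential importance sampling idea can be sketched with a bootstrap particle filter applied to a nonlinear, non-Gaussian toy model; the model and all parameter values below are illustrative assumptions, not taken from any specific application in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative nonlinear state-space model (hypothetical):
#   state:       x_t = 0.5*x_{t-1} + 25*x_{t-1}/(1 + x_{t-1}**2) + v_t
#   observation: y_t = x_t**2 / 20 + w_t,   v_t, w_t ~ N(0, 1)
def simulate(T):
    x, y = np.zeros(T), np.zeros(T)
    for t in range(1, T):
        x[t] = 0.5 * x[t-1] + 25 * x[t-1] / (1 + x[t-1]**2) + rng.normal()
        y[t] = x[t]**2 / 20 + rng.normal()
    return x, y

def bootstrap_filter(y, n_particles=500):
    T = len(y)
    particles = rng.normal(0, 1, n_particles)
    est = np.zeros(T)
    for t in range(T):
        # Propagate each particle through the (nonlinear) state equation.
        particles = (0.5 * particles + 25 * particles / (1 + particles**2)
                     + rng.normal(0, 1, n_particles))
        # Weight by the observation likelihood (log scale to avoid underflow).
        logw = -0.5 * (y[t] - particles**2 / 20) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        est[t] = np.sum(w * particles)   # filtered mean E[x_t | y_1:t]
        # Multinomial resampling to combat weight degeneracy.
        particles = rng.choice(particles, size=n_particles, p=w)
    return est

x, y = simulate(50)
xhat = bootstrap_filter(y)
```

Because the observation depends on the state only through its square, the filtered mean is ambiguous in sign here; the sketch is meant to show the propagate-weight-resample cycle, not to be a production filter.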
Generalised linear mixed models (GLMMs) are an extension of generalised linear models (GLMs) in the sense that certain random effects are incorporated into the linear predictor of GLMs; they are also an extension of linear mixed models (LMMs). Incorporating random effects can account for the correlation of longitudinal/spatial data and for variation from different sources. GLMMs in general produce reliable and accurate statistical inferences. However, the random effects complicate the estimation problem considerably. To obtain the likelihood, for example, the random effects have to be integrated out of the joint likelihood of the response and the random effects. Except in a few special cases, this integral is analytically intractable. A typical example is the analysis of the salamander mating data (McCullagh and Nelder, 1989), which involves six analytically intractable 20-dimensional integrals.
One strand of my work in this area is to apply an appropriate transformation to the random effects and then use Gauss-Hermite quadrature to approximate the integrated likelihood. This method can be viewed as a higher-order Laplace approximation in which the number of quadrature nodes corresponds to the order of the Laplace approximation. For example, with a single node it reduces to penalised quasi-likelihood (PQL) estimation (Breslow and Clayton, 1993); with two nodes it is equivalent to Goldstein's PQL2 (1996). This method, however, is only suitable for low-dimensional random effects because the computational effort grows exponentially with the dimension. For higher-dimensional random effects, we have developed several procedures based on quasi-Monte Carlo approximation, importance sampling and the EM algorithm to compute parameter estimates in GLMMs.
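The quadrature idea can be sketched in the simplest setting, a random-intercept logistic GLMM, where a one-dimensional Normal random effect is integrated out of one cluster's likelihood; the function and data here are illustrative assumptions:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def cluster_likelihood(y, eta_fixed, sigma_b, n_nodes=10):
    """Likelihood contribution of one cluster in a random-intercept
    logistic GLMM, integrating out b ~ N(0, sigma_b**2) numerically.
    Change of variables b = sqrt(2)*sigma_b*x turns the Normal integral
    into the e^{-x^2}-weighted form that Gauss-Hermite quadrature uses."""
    nodes, weights = hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma_b * nodes
    like = 0.0
    for w, bj in zip(weights, b):
        p = sigmoid(eta_fixed + bj)
        like += w * np.prod(p**y * (1 - p)**(1 - y))
    return like / np.sqrt(np.pi)

# Hypothetical cluster: 3 binary responses and their fixed-effect predictors.
y = np.array([1, 0, 1])
eta = np.array([0.2, -0.1, 0.4])
L = cluster_likelihood(y, eta, sigma_b=1.0)
```

With `sigma_b = 0` the quadrature collapses to the ordinary Bernoulli likelihood, mirroring how a single node recovers a Laplace-type approximation in the method described above.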
Growth curve models (GCMs) are generalised multivariate analysis-of-variance models, commonly used in repeated-measures and longitudinal data analysis. My interests in this area include: a) exploring the relationship between the generalised least squares estimates and the maximum likelihood estimates of the regression coefficients and dispersion components; b) finding the best linear unbiased estimates of the regression coefficients; c) studying admissibility of estimators of the regression coefficients; d) conducting various hypothesis tests under a class of elliptically symmetric distributions; e) deriving posterior distributions of the regression coefficients and dispersion components under non-informative priors; f) developing likelihood-based and Bayesian diagnostics for GCMs.
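For concreteness, here is a minimal sketch of the maximum likelihood estimate of the regression coefficients in a Potthoff-Roy growth curve model Y = X B Z' + E; the dimensions, designs and simulated data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def gcm_mle(Y, X, Z):
    """MLE of B in Y = X B Z' + E with rows of E iid N(0, Sigma):
    B_hat = (X'X)^{-1} X' Y S^{-1} Z (Z' S^{-1} Z)^{-1},
    where S = Y'(I - P_X)Y is the residual sums-of-products matrix."""
    n = Y.shape[0]
    P = X @ np.linalg.solve(X.T @ X, X.T)   # projection onto columns of X
    S = Y.T @ (np.eye(n) - P) @ Y
    Si_Z = np.linalg.solve(S, Z)
    return (np.linalg.solve(X.T @ X, X.T) @ Y @ Si_Z
            @ np.linalg.inv(Z.T @ Si_Z))

# Hypothetical setup: 30 subjects, 4 time points, 2 groups, linear growth.
n, p = 30, 4
X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])  # between-subject design
Z = np.column_stack([np.ones(p), np.arange(p)])           # within-subject design
B_true = np.array([[1.0, 0.5], [0.3, 0.2]])               # intercepts and slopes
Y = X @ B_true @ Z.T + rng.normal(0, 0.1, (n, p))
B_hat = gcm_mle(Y, X, Z)
```

The inner weighting by S^{-1} is what distinguishes this estimator from the naive two-stage least squares fit, and is the point of contact with interest a) above.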
I am also interested in medical statistics, in particular methodological issues arising in randomised controlled clinical trials and epidemiology, including trial design, pilot studies, sample size calculation and data analysis.