Introduction
Most of the many recent dietary pattern analyses 1-8 have used one of two apparently contrasting statistical methods to find patterns in food consumption data. The first approach, factor analysis, finds dimensions in the diet that represent an individual's tendency to eat in certain ways. For example, reports on the Health Professionals Study 5 6 and the Nurses' Health Study 7 8 have described 'Western' and 'prudent' dimensions in the diets of their participants. A disadvantage, however, is that the dimensions must be cross-classified in some way, as an additional step, if one wishes to compute pattern prevalence or to compare the risk of disease in one group of individuals with that in another.
The second approach, (non-parametric) cluster analysis, classifies individuals into mutually exclusive groups according to how (dis)similar they are with respect to their food consumption; an example is the K-means method used by Chen et al. 9 A disadvantage of cluster analysis is that each individual is assigned to one dietary pattern with a probability of 1 and to all other dietary patterns with a probability of 0. Thus, classification uncertainty is assumed to be 0. Other disadvantages stemming from non-parametric approaches include the difficulty of taking covariates into account and the lack of a convenient way to compare the many different clustering criteria.
The primary objectives of a dietary pattern analysis are to characterise the eating habits of a population and to associate diet with disease 10 11. A finite mixture model (FMM) can be used to achieve these objectives, with additional advantages as outlined by Fahey et al. 12 Classification uncertainty is measured by the posterior probability of pattern membership given the data, which, for each individual, may take values between zero and one. It is also easy to adjust for energy and to choose among different clustering criteria. An FMM is analogous to a factor analysis with a categorical latent variable and can be used to create mutually exclusive groups. However, it can also be used to estimate dietary pattern prevalence and to describe patterns without 'hard' classification of individuals to clusters. Instead, classification is 'soft', with estimates weighted by the posterior probabilities. We adapt the general approach outlined by Fahey et al. 12 in this paper...
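The contrast between 'hard' and 'soft' classification can be illustrated with a minimal sketch. It is not the model fitted in this paper: it assumes a hypothetical two-pattern mixture of univariate normal distributions for a single food intake variable, with mixing weights, means and standard deviations taken as already known rather than estimated. All names and numbers (`weights`, `means`, `sds`, `intakes`) are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(x, weights, means, sds):
    """Posterior probability of membership in each pattern given intake x.

    Bayes' rule: P(pattern k | x) is proportional to weight_k * f_k(x).
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical mixture parameters (assumed, not estimated from real data).
weights = [0.6, 0.4]  # mixing proportions of the two patterns
means = [2.0, 5.0]    # mean intake (servings/day) in each pattern
sds = [1.0, 1.0]      # within-pattern standard deviations

# Hypothetical intakes for four individuals.
intakes = [1.5, 3.4, 3.6, 5.2]

posteriors = [posterior_probabilities(x, weights, means, sds) for x in intakes]

# 'Hard' classification: each individual goes to the most probable pattern,
# which discards the classification uncertainty.
hard_labels = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
hard_prevalence = [
    hard_labels.count(k) / len(intakes) for k in range(len(weights))
]

# 'Soft' classification: prevalence is the average posterior probability,
# so every individual contributes fractionally to every pattern.
soft_prevalence = [
    sum(p[k] for p in posteriors) / len(intakes) for k in range(len(weights))
]

for x, p, label in zip(intakes, posteriors, hard_labels):
    print(f"intake={x:.1f}  posterior={[round(v, 3) for v in p]}  hard={label}")
print("hard prevalence:", hard_prevalence)
print("soft prevalence:", [round(v, 3) for v in soft_prevalence])
```

In this sketch the individuals with intakes of 3.4 and 3.6 lie between the two pattern means, so their posterior probabilities are far from 0 and 1. Hard assignment treats them exactly like the individual at 1.5, whereas the soft prevalence estimate reflects their uncertain membership.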