Classifying genetic resources by categorical and continuous variables
Hierarchical and nonhierarchical clustering methods are used to classify genetic resources. Hierarchical methods can use all variables (categorical and continuous) to form the subpopulations (groups or clusters), whereas standard nonhierarchical methods incorporate only the continuous variables. The Location model (LM) classifies individuals into homogeneous subpopulations using both continuous and categorical variables. In practice, however, the multinomial variable of the LM, formed by combining all the categorical variables, often has empty cells in some subpopulations, so the cell means and the within-cell variances and covariances cannot be estimated. The main objectives of this study were (i) to develop the Modified Location model (MLM), which allows empty cells in some subpopulations by assuming that the means and the variance-covariance matrices depend on the subpopulation rather than on the specific cell; (ii) to show how to use the MLM in a two-stage clustering strategy in which the Ward method forms the initial groups and the MLM is then applied to those groups (Ward-MLM); and (iii) to apply the Ward-MLM to three different data sets in order to study some of its features and compare its results with those of other methods. The two-stage strategy, which finds initial groups with the Ward method and then improves their composition with the MLM, produces groups that are more compact and better separated with respect to all the variables (categorical and continuous) than classifications obtained with only categorical variables, with only continuous variables, or with the standard Location model.
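The MLM stage of the Ward-MLM strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the categorical variables have already been combined into one integer cell index (0 to M-1) per individual, and that the initial labels come from a Ward clustering (for example `scipy.cluster.hierarchy.linkage(..., method="ward")` followed by `fcluster(..., criterion="maxclust")`). The MLM stage is written here as a hard, classification-maximum-likelihood reallocation with a group-proportion term; the paper's exact estimation scheme may differ. The function name `mlm_reallocate`, its parameters, and the small ridge added to each covariance matrix are hypothetical choices for this sketch. Each group keeps its own cell probabilities but a single mean vector and covariance matrix shared by all its cells, so a cell that is empty in a group simply receives probability zero there and does not block estimation.

```python
import numpy as np


def mlm_reallocate(cont, cells, labels, n_iter=50, ridge=1e-6):
    """Hypothetical sketch of the MLM reallocation stage of Ward-MLM.

    cont   : (n, p) continuous variables
    cells  : (n,) integer cell index of the multinomial variable formed by
             combining all categorical variables (assumed coded 0..M-1)
    labels : (n,) initial group labels 0..K-1 (assumed to come from Ward)
    Returns the reallocated group labels.
    """
    cont = np.asarray(cont, dtype=float)
    cells = np.asarray(cells, dtype=int)
    labels = np.asarray(labels, dtype=int).copy()
    n, p = cont.shape
    n_groups = labels.max() + 1
    n_cells = cells.max() + 1

    for _ in range(n_iter):
        loglik = np.full((n, n_groups), -np.inf)
        for k in range(n_groups):
            members = labels == k
            nk = members.sum()
            if nk == 0:
                continue  # an emptied group keeps log-likelihood -inf
            # Cell probabilities within group k (maximum likelihood);
            # cells that are empty in this group get probability 0.
            counts = np.bincount(cells[members], minlength=n_cells)
            with np.errstate(divide="ignore"):
                log_cell = np.log(counts / nk)
            # MLM assumption: mean and covariance depend on the group only,
            # not on the cell, so empty cells do not prevent estimation.
            mu = cont[members].mean(axis=0)
            dev = cont[members] - mu
            sigma = dev.T @ dev / nk + ridge * np.eye(p)  # ridge is an assumption
            _, logdet = np.linalg.slogdet(sigma)
            d = cont - mu
            maha = np.einsum("ij,ij->i", d, np.linalg.solve(sigma, d.T).T)
            log_gauss = -0.5 * (p * np.log(2 * np.pi) + logdet + maha)
            # Group proportion + cell probability + multivariate normal density.
            loglik[:, k] = np.log(nk / n) + log_cell[cells] + log_gauss
        new_labels = loglik.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no individual changed group
        labels = new_labels
    return labels


# Toy data: two groups of 20 individuals on a fixed grid, far apart.
base = np.array([[x, y] for x in (-1.0, -0.5, 0.0, 0.5, 1.0)
                 for y in (-1.0, -0.5, 0.5, 1.0)])
cont = np.vstack([base, base + 20.0])
# Cell 2 is empty in the first group and cell 0 in the second.
cells = np.concatenate([np.tile([0, 1], 10), np.tile([1, 2], 10)])
true = np.repeat([0, 1], 20)
init = true.copy()
init[0] = 1  # pretend the Ward stage misplaced individual 0
result = mlm_reallocate(cont, cells, init)
print(result[0])  # expected 0: individual 0 moves back to the first group
```

In this toy run the misplaced individual is both far from the second group's mean and in a cell that is rare there, so the reallocation returns it to the first group, while individuals in cell 2 can never join the first group because that cell is empty in it. A soft variant would replace the `argmax` with posterior membership probabilities; the hard version is used here only to keep the sketch short.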