Publications:Exploiting statistical energy test for comparison of multiple groups in morphometric and chemometric data

From ISLAB/CAISR
Revision as of 21:39, 30 September 2016 by Slawek (Talk | contribs)

Title Exploiting statistical energy test for comparison of multiple groups in morphometric and chemometric data
Author Evaldas Vaiciukynas and Antanas Verikas and Adas Gelzinis and Marija Bacauskiene and Irina Olenina
Year 2015
PublicationType Journal Paper
Journal Chemometrics and Intelligent Laboratory Systems
HostPublication
Conference
DOI http://dx.doi.org/10.1016/j.chemolab.2015.04.018
Diva url http://hh.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:809939
Abstract A multivariate permutation-based energy test of equal distributions is considered here. The approach is attributable to the emerging field of ε-statistics and uses the natural logarithm of the Euclidean distance for the within-sample and between-sample components. The result from permutations is enhanced by a tail approximation through the generalized Pareto distribution to boost the precision of the obtained p-values. Generalization from the two-sample case to multiple samples is achieved by combining p-values through meta-analysis. Several strategies of varied statistical power are possible; the maximum of all pairwise p-values is chosen here. The proposed approach is tested on several morphometric and chemometric data sets. Each data set is additionally transformed by principal component analysis for the purpose of dimensionality reduction and visualization in 2D space. Variable selection, namely sequential search and multi-cluster feature selection, is applied to reveal in what aspects the groups differ most. Morphometric data sets used: 1) survival data of house sparrows Passer domesticus; 2) orange and blue varieties of rock crabs Leptograpsus variegatus; 3) ontogenetic stages of trilobite species Trimerocephalus lelievrei; 4) marine phytoplankton species Prorocentrum minimum. Chemometric data sets used: 1) essential oil composition of medicinal plant Hyptis suaveolens specimens; 2) chemical information of olive oil samples; 3) elemental composition of biomass ash; 4) exchangeable cations of earth metals in forest soil samples. Statistically significant differences between groups were successfully indicated, but the selection of variables had a profound effect on the result. The permutation-based energy test and its multi-sample generalization through meta-analysis proved useful as an unbalanced non-parametric MANOVA approach.
The introduced solution is simple, yet flexible and powerful, and is by no means confined to morphometrics or chemometrics alone, but has a wide range of potential applications. Copyright © 2015 Elsevier B.V.
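
The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes an Aslan-Zech style form of the energy statistic with the weight R(r) = -ln(r), a small `eps` added to distances to guard against ties, and the common (exceedances + 1) / (permutations + 1) p-value estimate. The generalized Pareto tail approximation mentioned in the abstract is omitted here. All function names and parameters are hypothetical.

```python
from itertools import combinations

import numpy as np


def log_energy_statistic(dist, idx_x, idx_y, eps=1e-12):
    """Two-sample energy statistic with logarithmic distance weighting.

    Assumed form: within-sample sums over pairs i < j, minus the
    between-sample mean, with R(r) = -ln(r). Larger values indicate
    more dissimilar samples. `eps` is an assumption to avoid log(0).
    """
    r = -np.log(dist + eps)
    n, m = len(idx_x), len(idx_y)
    within_x = np.triu(r[np.ix_(idx_x, idx_x)], k=1).sum() / n**2
    within_y = np.triu(r[np.ix_(idx_y, idx_y)], k=1).sum() / m**2
    between = r[np.ix_(idx_x, idx_y)].sum() / (n * m)
    return within_x + within_y - between


def energy_test_pvalue(x, y, n_perm=999, rng=None):
    """Permutation p-value for the hypothesis that x and y share a distribution."""
    rng = np.random.default_rng(rng)
    pooled = np.vstack([x, y])
    # Pairwise Euclidean distances, computed once and reused for every permutation.
    diff = pooled[:, None, :] - pooled[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    n, total = len(x), len(pooled)
    idx = np.arange(total)
    observed = log_energy_statistic(dist, idx[:n], idx[n:])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(total)  # random relabelling of pooled observations
        if log_energy_statistic(dist, perm[:n], perm[n:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def multi_sample_pvalue(groups, n_perm=999, seed=0):
    """Combine all pairwise p-values by taking their maximum."""
    rng = np.random.default_rng(seed)
    return max(
        energy_test_pvalue(a, b, n_perm, rng) for a, b in combinations(groups, 2)
    )


if __name__ == "__main__":
    demo_rng = np.random.default_rng(42)
    # Three synthetic 2D groups with shifted means (illustrative data only).
    demo = [demo_rng.normal(loc=c, size=(20, 2)) for c in (0.0, 1.0, 2.0)]
    print(multi_sample_pvalue(demo, n_perm=199))
```

Taking the maximum of the pairwise p-values is conservative: the combined result is small only when every pair of groups differs. Unequal group sizes are handled without change, which matches the unbalanced setting described in the abstract.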