Automated multigroup outlier identification in molecular high-throughput data using bagplots and gemplots

Overview of attention for article published in BMC Bioinformatics, May 2017
About this Attention Score

  • Above-average Attention Score compared to outputs of the same age (62nd percentile)
  • Above-average Attention Score compared to outputs of the same age and source (57th percentile)

Mentioned by

1 X user
1 Wikipedia page

Citations

32 Dimensions

Readers on

25 Mendeley
Title
Automated multigroup outlier identification in molecular high-throughput data using bagplots and gemplots
Published in
BMC Bioinformatics, May 2017
DOI 10.1186/s12859-017-1645-5
Pubmed ID
Authors

Jochen Kruppa, Klaus Jung

Abstract

Analyses of molecular high-throughput data often lack robustness, i.e., results are very sensitive to the addition or removal of a single observation. Therefore, the identification of extreme observations is an important quality-control step before further data analysis. Standard outlier detection methods for univariate data are, however, not applicable, since the considered data are high-dimensional, i.e., hundreds or thousands of features are observed in small samples. Usually, outliers in high-dimensional data are detected solely by the analyst's visual inspection of a graphical representation of the data. Typical graphical representations for high-dimensional data are hierarchical cluster trees or principal component plots. Purely visual approaches, however, depend on the individual judgement of the analyst and are hard to automate. Existing methods for automated outlier detection are dedicated only to data from a single experimental group. In this work we propose to use bagplots, the 2-dimensional extension of the boxplot, to automatically identify outliers in the subspace of the first two principal components of the data. Furthermore, we present for the first time the gemplot, the 3-dimensional extension of the boxplot and bagplot, which can be used in the subspace of the first three principal components. Bagplot and gemplot surround the regular observations with convex hulls, and observations outside these hulls are regarded as outliers. The convex hulls are determined separately for the observations of each experimental group, while the observations of all groups can be displayed in the same subspace of principal components. We demonstrate the usefulness of this approach on multiple sets of artificial data as well as one set of gene expression data from a next-generation sequencing experiment, and compare the new method to other common approaches.
Furthermore, we provide an implementation of the gemplot in the package 'gemPlot' for the R programming environment. Bagplots and gemplots in subspaces of principal components are useful for automated and objective outlier identification in high-dimensional data from molecular high-throughput experiments. A clear advantage over other methods is that multiple experimental groups can be displayed in the same figure although outlier detection is performed for each individual group.
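The abstract describes the procedure only in outline: project all samples onto the first principal components, then fit one hull-based fence per experimental group. The following is a minimal NumPy sketch of the 2-dimensional (bagplot) variant under stated assumptions. It is not the code of the 'gemPlot' package, and all function names are hypothetical. Assumptions: halfspace (Tukey) depth is approximated by scanning a fixed number of directions; the bag is simplified to the convex hull of the roughly 50% deepest points (the original bagplot interpolates between depth regions); and the fence is the bag inflated by a factor of 3 around the approximate depth median, a common bagplot default that the abstract does not state.

```python
import numpy as np


def pca_scores(X, k=2):
    """Scores of the rows of X (samples x features) on the first k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def approx_tukey_depth(P, n_dir=360):
    """Approximate halfspace (Tukey) depth of each 2-D point by scanning n_dir directions."""
    angles = np.linspace(0.0, np.pi, n_dir, endpoint=False)
    proj = P @ np.stack([np.cos(angles), np.sin(angles)])  # shape (n, n_dir)
    depth = np.full(len(P), len(P))
    for p in proj.T:
        # Count points on each closed side of the line through every point.
        below = (p[None, :] <= p[:, None]).sum(axis=1)
        above = (p[None, :] >= p[:, None]).sum(axis=1)
        depth = np.minimum(depth, np.minimum(below, above))
    return depth


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])


def inside_convex(q, hull, eps=1e-9):
    """True if q lies inside or on a counter-clockwise convex polygon."""
    if len(hull) < 3:
        return False
    for a, b in zip(hull, np.roll(hull, -1, axis=0)):
        if (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0]) < -eps:
            return False
    return True


def bagplot_outliers(P, factor=3.0):
    """Flag points outside a bagplot-style fence (bag inflated by `factor` around the centre)."""
    depth = approx_tukey_depth(P)
    center = P[depth == depth.max()].mean(axis=0)  # approximate Tukey median
    dist = np.linalg.norm(P - center, axis=1)
    order = np.lexsort((dist, -depth))  # deepest first, ties broken by closeness to centre
    bag = convex_hull(P[order[: int(np.ceil(len(P) / 2))]])
    if len(bag) < 3:  # too few (or collinear) points to form a bag
        return np.zeros(len(P), dtype=bool)
    fence = center + factor * (bag - center)
    return np.array([not inside_convex(p, fence) for p in P], dtype=bool)


def multigroup_outliers(X, groups, factor=3.0):
    """PCA on all samples jointly, then one bagplot-style fence per experimental group."""
    groups = np.asarray(groups)
    scores = pca_scores(X, k=2)  # common subspace, so all groups share one plot
    flags = np.zeros(len(X), dtype=bool)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        flags[idx] = bagplot_outliers(scores[idx], factor)
    return flags


# Hypothetical demo: two groups of 10 samples x 500 features; sample 0 is shifted.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))
X[10:, :50] += 2.0    # group B differs from group A in 50 features
X[0, 100:200] += 8.0  # artificial outlier in group A
groups = np.array(["A"] * 10 + ["B"] * 10)
print(np.flatnonzero(multigroup_outliers(X, groups)))  # sample 0 should be among these
```

The key design point mirrors the abstract: the principal component subspace is computed once from all samples, so every group can be drawn in the same figure, while each fence is fitted only to its own group's points. A 3-dimensional (gemplot-like) variant would need depth and convex hulls in three dimensions, which this sketch does not attempt.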

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 25 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Sweden 1 4%
Brazil 1 4%
Unknown 23 92%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 11 44%
Researcher 4 16%
Student > Bachelor 1 4%
Student > Doctoral Student 1 4%
Student > Master 1 4%
Other 1 4%
Unknown 6 24%

Readers by discipline Count As %
Agricultural and Biological Sciences 7 28%
Biochemistry, Genetics and Molecular Biology 5 20%
Mathematics 1 4%
Veterinary Science and Veterinary Medicine 1 4%
Business, Management and Accounting 1 4%
Other 3 12%
Unknown 7 28%
Attention Score in Context

This research output has an Altmetric Attention Score of 4. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 05 May 2017.
All research outputs: #7,280,433 of 22,968,808 outputs
Outputs from BMC Bioinformatics: #2,874 of 7,306 outputs
Outputs of similar age: #115,180 of 310,760 outputs
Outputs of similar age from BMC Bioinformatics: #46 of 113 outputs
Altmetric has tracked 22,968,808 research outputs across all sources so far. This one has received more attention than most of these and is in the 67th percentile.
So far Altmetric has tracked 7,306 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one has gotten more attention than average, scoring higher than 59% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 310,760 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 62% of its contemporaries.
We're also able to compare this research output to 113 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 57% of its contemporaries.