Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster

Overview of attention for article published in BMC Genomics, January 2016

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (87th percentile)
  • High Attention Score compared to outputs of the same age and source (90th percentile)

Mentioned by

  • 21 X users
  • 1 Facebook page

Citations

  • 143 Dimensions

Readers on

  • 477 Mendeley
  • 3 CiteULike
Title: Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster
Published in: BMC Genomics, January 2016
DOI: 10.1186/s12864-015-2353-z
Pubmed ID:
Authors: Yanzhu Lin, Kseniya Golovnina, Zhen-Xia Chen, Hang Noh Lee, Yazmin L. Serrano Negron, Hina Sultana, Brian Oliver, Susan T. Harbison

Abstract

A generally accepted approach to the analysis of RNA-Seq read count data does not yet exist. We sequenced the mRNA of 726 individuals from the Drosophila Genetic Reference Panel in order to quantify differences in gene expression among single flies. One of our experimental goals was to identify the optimal analysis approach for the detection of differential gene expression among the factors we varied in the experiment: genotype, environment, sex, and their interactions. Here we evaluate three different filtering strategies, eight normalization methods, and two statistical approaches using our data set.

We assessed differential gene expression among factors and performed a statistical power analysis using the eight biological replicates per genotype, environment, and sex in our data set. We found that the most critical considerations for the analysis of RNA-Seq read count data were the normalization method, the underlying data distribution assumption, and the number of biological replicates, an observation consistent with previous RNA-Seq and microarray analysis comparisons. Some common normalization methods, such as Total Count, Quantile, and RPKM normalization, did not align the data across samples. Furthermore, analyses using the Median, Quantile, and Trimmed Mean of M-values normalization methods were sensitive to the removal of low-expressed genes from the data set. Although it is robust in many types of analysis, the normal data distribution assumption produced results vastly different from those obtained under the negative binomial distribution. In addition, at least three biological replicates per condition were required in order to have sufficient statistical power to detect expression differences for the three-way interaction of genotype, environment, and sex. The best analysis approach to our data was to normalize the read counts using the DESeq method and apply a generalized linear model assuming a negative binomial distribution using either edgeR or DESeq software. Genes having very low read counts were removed after normalizing the data and fitting it to the negative binomial distribution.

We describe the results of this evaluation and include recommended analysis strategies for RNA-Seq read count data.
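
For readers unfamiliar with the DESeq normalization named in the abstract, the idea behind it (median-of-ratios size factors) can be sketched in a few lines. This is an illustrative sketch only, not the authors' pipeline and not the DESeq software itself: the function names `size_factors` and `normalize` and the toy count matrix are assumptions made for this example, and genes with a zero count in any sample are simply skipped when estimating the factors.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one scaling factor per sample by the median-of-ratios idea.

    `counts` is a list of genes; each gene is a list of raw read counts,
    one per sample. (Hypothetical helper written for this illustration.)
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # Skip genes with any zero count: their geometric mean is zero.
        if all(c > 0 for c in gene):
            log_geo_mean = sum(math.log(c) for c in gene) / n_samples
            for j, c in enumerate(gene):
                log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has nonzero counts in every sample")
    # The median ratio to the per-gene geometric mean is the size factor.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(gene, factors)] for gene in counts]


# Toy data (invented): sample 2 was sequenced twice as deeply as sample 1,
# and the fourth gene is additionally up-regulated in sample 2.
toy_counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [50, 400],
    [0, 3],
]
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

In the toy matrix, the estimated factor for the second sample is twice that of the first, so the three unchanged genes line up after normalization while the up-regulated gene remains different. Because the factor is a median over genes, a single strongly changed gene does not shift it.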

X Demographics

Demographic data were collected from the profiles of 21 X users who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 477 Mendeley readers of this research output.

Geographical breakdown

Country           Count   As %
United States         9     2%
Spain                 3    <1%
Switzerland           2    <1%
Sweden                2    <1%
Italy                 1    <1%
Brazil                1    <1%
United Kingdom        1    <1%
Germany               1    <1%
Mexico                1    <1%
Other                 3    <1%
Unknown             453    95%

Demographic breakdown

Readers by professional status                  Count   As %
Student > Ph. D. Student                          120    25%
Researcher                                        111    23%
Student > Master                                   67    14%
Student > Bachelor                                 35     7%
Student > Doctoral Student                         33     7%
Other                                              65    14%
Unknown                                            46    10%

Readers by discipline                           Count   As %
Agricultural and Biological Sciences              177    37%
Biochemistry, Genetics and Molecular Biology      148    31%
Computer Science                                   36     8%
Immunology and Microbiology                        14     3%
Mathematics                                        10     2%
Other                                              34     7%
Unknown                                            58    12%

Attention Score in Context

This research output has an Altmetric Attention Score of 11. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 11 February 2016.

  • All research outputs: #3,065,129 of 24,598,501 outputs
  • Outputs from BMC Genomics: #1,000 of 11,013 outputs
  • Outputs of similar age: #51,213 of 403,788 outputs
  • Outputs of similar age from BMC Genomics: #26 of 264 outputs

Altmetric has tracked 24,598,501 research outputs across all sources so far. Compared to these, this one has done well and is in the 87th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 11,013 research outputs from this source. They receive a mean Attention Score of 4.8. This one has done particularly well, scoring higher than 90% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 403,788 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 87% of its contemporaries.
We're also able to compare this research output to 264 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 90% of its contemporaries.
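
The percentiles quoted above can be reproduced from the rank and total in each comparison group. The sketch below assumes the percentile is the share of outputs ranked below this one, rounded down to a whole number. That formula is an assumption for illustration (it matches the four figures on this page, but Altmetric's exact calculation is not stated here), and the helper name `percentile_rank` is hypothetical.

```python
def percentile_rank(rank: int, total: int) -> int:
    """Whole-number percentage of outputs ranked below position `rank`.

    `rank` is 1 for the highest-scoring output. Assumed formula, not
    a documented Altmetric calculation.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (total - rank) * 100 // total


# Rank and group size for each comparison, as listed on this page.
rankings = {
    "All research outputs": (3_065_129, 24_598_501),
    "Outputs from BMC Genomics": (1_000, 11_013),
    "Outputs of similar age": (51_213, 403_788),
    "Outputs of similar age from BMC Genomics": (26, 264),
}

percentiles = {
    label: percentile_rank(rank, total)
    for label, (rank, total) in rankings.items()
}

for label, value in percentiles.items():
    print(f"{label}: higher than {value}% of outputs")
```

Under this assumption the four groups give 87, 90, 87, and 90, in the order listed.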