Title | Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms |
---|---|
Published in | BMC Bioinformatics, November 2014 |
DOI | 10.1186/s12859-014-0350-x |
Pubmed ID | |
Authors | Daniel I Speiser, M Sabrina Pankey, Alexander K Zaharoff, Barbara A Battelle, Heather D Bracken-Grissom, Jesse W Breinholt, Seth M Bybee, Thomas W Cronin, Anders Garm, Annie R Lindgren, Nipam H Patel, Megan L Porter, Meredith E Protas, Ajna S Rivera, Jeanne M Serb, Kirk S Zigler, Keith A Crandall, Todd H Oakley |
Abstract | Background: Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. Results: We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 30 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository (http://bitbucket.org/osiris_phylogenetics/pia/) and we demonstrate PIA on a publicly-accessible web server (http://galaxy-dev.cnsi.ucsb.edu/pia/). Conclusions: Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa. |
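
The placement step described in the abstract can be illustrated with a minimal sketch. It assumes a standalone RAxML install and builds the command line for the Evolutionary Placement Algorithm (`-f v`), which inserts query sequences into a fixed reference tree. The file names, the `PROTGAMMALG` model choice, the `raxmlHPC` binary name, and the helper `build_epa_command` are all hypothetical and are not taken from the PIA Galaxy implementation, which may configure RAxML differently.

```python
def build_epa_command(alignment, reference_tree, run_name,
                      model="PROTGAMMALG", raxml_binary="raxmlHPC"):
    """Return the argument list for a RAxML placement run (hypothetical helper).

    alignment      -- alignment holding BOTH the reference sequences and the
                      query sequences to be placed
    reference_tree -- pre-calculated gene tree containing only the references
    run_name       -- suffix RAxML appends to its output file names
    model          -- substitution model; PROTGAMMALG is an assumed default
    """
    return [
        raxml_binary,
        "-f", "v",             # evolutionary placement of query sequences
        "-s", alignment,       # combined reference + query alignment
        "-t", reference_tree,  # fixed reference tree
        "-m", model,
        "-n", run_name,
    ]


if __name__ == "__main__":
    # Hypothetical file names for one gene family.
    cmd = build_epa_command("opsin_with_queries.phy",
                            "opsin_reference.tre",
                            "opsin_epa")
    print(" ".join(cmd))
```

Returning the arguments as a list lets the command be passed directly to `subprocess.run` without shell quoting. Because the reference tree is reused, only the placement is computed for each new transcriptome, which is the efficiency gain the abstract describes.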
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 5 | 26% |
Germany | 3 | 16% |
United Kingdom | 1 | 5% |
Norway | 1 | 5% |
Unknown | 9 | 47% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 12 | 63% |
Members of the public | 7 | 37% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Germany | 2 | 1% |
Netherlands | 1 | <1% |
Brazil | 1 | <1% |
Unknown | 133 | 97% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 40 | 29% |
Researcher | 27 | 20% |
Student > Bachelor | 14 | 10% |
Student > Master | 14 | 10% |
Other | 7 | 5% |
Other | 17 | 12% |
Unknown | 18 | 13% |
Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 68 | 50% |
Biochemistry, Genetics and Molecular Biology | 24 | 18% |
Computer Science | 8 | 6% |
Environmental Science | 5 | 4% |
Earth and Planetary Sciences | 2 | 1% |
Other | 12 | 9% |
Unknown | 18 | 13% |