A dynamic programming approach for the alignment of signal peaks in multiple gas chromatography-mass spectrometry experiments

Overview of attention for article published in BMC Bioinformatics, October 2007

Citations

63 Dimensions

Readers on

225 Mendeley
1 Connotea

Title	A dynamic programming approach for the alignment of signal peaks in multiple gas chromatography-mass spectrometry experiments
Published in	BMC Bioinformatics, October 2007
DOI	10.1186/1471-2105-8-419
Pubmed ID	17963529
Authors	Mark D Robinson, David P De Souza, Woon Wai Keen, Eleanor C Saunders, Malcolm J McConville, Terence P Speed, Vladimir A Likić
Abstract	Gas chromatography-mass spectrometry (GC-MS) is a robust platform for the profiling of certain classes of small molecules in biological samples. When multiple samples are profiled, including replicates of the same sample and/or different sample states, one needs to account for retention time drifts between experiments. This can be achieved either by the alignment of chromatographic profiles prior to peak detection, or by matching signal peaks after they have been extracted from chromatogram data matrices. Automated retention time correction is particularly important in non-targeted profiling studies. A new approach for matching signal peaks based on dynamic programming is presented. The proposed approach relies on both peak retention times and mass spectra. The alignment of more than two peak lists involves three steps: (1) all possible pairs of peak lists are aligned, and similarity of each pair of peak lists is estimated; (2) the guide tree is built based on the similarity between the peak lists; (3) peak lists are progressively aligned starting with the two most similar peak lists, following the guide tree until all peak lists are exhausted. When two or more experiments are performed on different sample states and each consisting of multiple replicates, peak lists within each set of replicate experiments are aligned first (within-state alignment), and subsequently the resulting alignments are aligned themselves (between-state alignment). When more than two sets of replicate experiments are present, the between-state alignment also employs the guide tree. We demonstrate the usefulness of this approach on GC-MS metabolic profiling experiments acquired on wild-type and mutant Leishmania mexicana parasites. We propose a progressive method to match signal peaks across multiple GC-MS experiments based on dynamic programming. A sensitive peak similarity function is proposed to balance peak retention time and peak mass spectra similarities. This approach can produce the optimal alignment between an arbitrary number of peak lists, and models explicitly within-state and between-state peak alignment. The accuracy of the proposed method was close to the accuracy of manually-curated peak matching, which required tens of man-hours for the analyzed data sets. The proposed approach may offer significant advantages for processing of high-throughput metabolomics data, especially when large numbers of experimental replicates and multiple sample states are analyzed.
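
The pairwise step described in the abstract, aligning two peak lists by dynamic programming with a score that uses both retention time and mass spectrum, might look roughly like the sketch below. This is not the authors' implementation. The form of the similarity function (spectral cosine multiplied by a Gaussian in retention-time difference), the names `peak_similarity` and `align_pair`, and the parameters `rt_tol` and `gap` are all assumptions made for illustration. The guide tree and progressive alignment steps are not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity between two spectra given as equal-length intensity lists."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def peak_similarity(p, q, rt_tol=2.0):
    """Assumed score in [0, 1]: spectral cosine damped by retention-time distance.
    Each peak is (retention_time, spectrum); rt_tol is a hypothetical tuning value."""
    (rt_p, spec_p), (rt_q, spec_q) = p, q
    return cosine(spec_p, spec_q) * math.exp(-((rt_p - rt_q) ** 2) / (2 * rt_tol ** 2))

def align_pair(a, b, gap=0.3, rt_tol=2.0):
    """Global (Needleman-Wunsch style) alignment of two peak lists sorted by
    retention time. Returns (total_score, pairs); None in a pair marks a gap."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + peak_similarity(a[i - 1], b[j - 1], rt_tol)
            score[i][j] = max(match, score[i - 1][j] - gap, score[i][j - 1] - gap)

    # Trace back from the bottom-right corner to recover the matched peaks.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + peak_similarity(a[i - 1], b[j - 1], rt_tol)
        ):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] - gap):
            pairs.append((i - 1, None))  # peak in a has no partner in b
            i -= 1
        else:
            pairs.append((None, j - 1))  # peak in b has no partner in a
            j -= 1
    pairs.reverse()
    return score[n][m], pairs

# Toy peak lists: (retention time in seconds, three-channel spectrum).
run_a = [(10.0, [1, 0, 0]), (20.0, [0, 1, 0]), (30.0, [0, 0, 1])]
run_b = [(10.5, [1, 0, 0]), (30.4, [0, 0, 1])]
total, pairs = align_pair(run_a, run_b)
print(pairs)  # [(0, 0), (1, None), (2, 1)]
```

In this toy case the middle peak of `run_a` has no counterpart in `run_b`, so it is aligned to a gap. In a progressive scheme like the one the abstract outlines, the total score of each pairwise alignment could serve as the similarity used to build the guide tree.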

Mendeley readers

The data shown below were compiled from readership statistics for 225 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
United States	2	<1%
United Kingdom	2	<1%
Hong Kong	1	<1%
India	1	<1%
Singapore	1	<1%
Brazil	1	<1%
Korea, Republic of	1	<1%
China	1	<1%
Spain	1	<1%
Other	1	<1%
Unknown	213	95%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	55	24%
Researcher	40	18%
Student > Master	28	12%
Student > Bachelor	22	10%
Student > Doctoral Student	15	7%
Other	41	18%
Unknown	24	11%

Readers by discipline	Count	As %
Computer Science	62	28%
Engineering	42	19%
Agricultural and Biological Sciences	29	13%
Chemistry	16	7%
Mathematics	10	4%
Other	37	16%
Unknown	29	13%