Retrotransposons are mobile elements that have strongly shaped mammalian genomes. Since whole-genome sequences became available, genomic analyses have provided novel insights into retrotransposon biology. However, many retrotransposon families and their possible genomic impact have not yet been analysed.
Here, we analysed the structural features, the genomic distribution and the evolutionary history of mouse VL30 LTR-retrotransposons. In total, we identified 372 VL30 sequences, comprising 86 full-length copies, 49 truncated copies and 237 solo LTRs, with a non-random chromosomal distribution. Full-length VL30s were highly conserved elements with intact retroviral replication signals but no protein-coding capacity. Analysis of LTRs revealed a high number of common transcription factor binding sites, possibly explaining the known inducible and tissue-specific expression of individual elements. The overwhelming majority of full-length and truncated elements (82/86 and 40/49, respectively) contained one or two specific motifs required for binding of the VL30 RNA to the polypyrimidine tract-binding protein-associated splicing factor (PSF). Phylogenetic analysis revealed three VL30 groups, the oldest of which emerged ~17.5 Myr ago, while the other two were characterized mostly by new genomic integrations. Most VL30 sequences were integrated near, adjacent to or within transcription start sites, within introns or at the 3' ends of genes. In addition, a significant number of VL30s were found near Krueppel-associated box (KRAB) genes, which function as potent transcriptional repressors.
Collectively, our study provides data on VL30s regarding (a) their number and the structural features that are involved in their transcription and play a role in steroidogenesis and oncogenesis; (b) their evolutionary history and potential for retrotransposition; and (c) their unique genomic distribution and impact on gene expression.