Annotation and target analysis of human endogenous retroviruses
DOI: https://doi.org/10.71373/bggqy536
Keywords: human endogenous retroviruses, long terminal repeat, regulation mechanism
Abstract
Endogenous retroviruses (ERVs) are important regulatory elements in the human genome. They participate in the regulation of host gene expression and in disease progression through their long terminal repeats (LTRs) and coding domains (gag, pol, env). Methods: Using a hidden Markov model approach, this study integrated the LTRharvest and LTRdigest software with 55 ERV-related protein domain databases to systematically annotate ERV elements in the human genome (GRCh38.p14), and analyzed their potential targets and functions using STRING, GO, and KEGG. Results: A total of 47,666 candidate HERV sequences (11.05% of the genome) were identified, of which 605 had complete structures, concentrated mainly on chromosomes 1 and 3. Among the three protein-coding domains, env was the least frequent. Potential target genes located within 20 kb upstream and downstream of the LTRs were screened, and core targets such as the histone family genes H4C6 and H2BC12 and the immune-related genes TLR2 and CCR5 were found to be involved in disease regulation through chromatin remodeling or immune pathways. Enrichment results were significantly associated with nucleosome assembly, the innate immune response, and cancer-related pathways (e.g., herpes simplex virus infection, systemic lupus erythematosus). Conclusion: This study constructed a comprehensive annotated HERV database and revealed the potential regulatory capacity of LTRs, providing a theoretical basis for the application of HERVs in cancer, autoimmune disease, and evolutionary research, and laying a foundation for the development of targeted therapy strategies.
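
The 20 kb target-gene screen described above can be illustrated with a minimal Python sketch. It is not the study's actual pipeline: the `Interval` type, the `genes_near_ltrs` function, the half-open coordinate convention, and all coordinates and names in the toy data are assumptions made for illustration only.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int
    name: str


def genes_near_ltrs(ltrs, genes, flank=20_000):
    """Map each LTR name to the sorted names of genes that overlap
    the window spanning `flank` bp upstream and downstream of the LTR."""
    hits = {}
    for ltr in ltrs:
        # Extend the LTR by the flank on both sides, clamped at position 0.
        win_start = max(0, ltr.start - flank)
        win_end = ltr.end + flank
        hits[ltr.name] = sorted(
            g.name
            for g in genes
            # Same chromosome and a non-empty overlap of half-open intervals.
            if g.chrom == ltr.chrom and g.start < win_end and g.end > win_start
        )
    return hits


# Toy data: coordinates and names are invented, not taken from the study.
ltrs = [Interval("chr1", 100_000, 100_500, "LTR_a")]
genes = [
    Interval("chr1", 85_000, 90_000, "GENE_UP"),      # ends 10 kb upstream
    Interval("chr1", 125_000, 130_000, "GENE_FAR"),   # starts 24.5 kb downstream
    Interval("chr2", 100_000, 101_000, "GENE_OTHER_CHROM"),
]
result = genes_near_ltrs(ltrs, genes)
print(result)
```

The quadratic scan keeps the sketch short; for genome-scale annotation, a sorted or interval-tree based overlap query would be the more practical choice.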