Summary
Scoliosis is a complex spinal deformity with strong familial aggregation, yet its genetic basis remains incompletely defined. Here, we performed whole-exome sequencing (WES) in 34 scoliosis families, identifying candidate single nucleotide polymorphisms (SNPs) potentially contributing to disease susceptibility. To connect genomic variation with transcriptional regulation, we integrated these data with cis-expression quantitative trait loci (cis-eQTL) summary statistics from eQTLGen and scoliosis GWAS data from FinnGen R12 using summary-based Mendelian randomization (SMR). After harmonization and filtering (PSMR < 0.05, PHEIDI > 0.05), we identified 31 shared genes overlapping between family-based SNPs and eQTL-associated loci. Among them, LINC00299 and NPIPB11 exhibited strong colocalization signals, with LINC00299 expression inversely correlated with scoliosis risk. Functional annotation further highlighted NCAM1, AGPAT3, SMIM12, and CTSH as core pathogenic nodes linking neural development, lipid metabolism, mitochondrial energy regulation, and bone remodeling. Together, these findings reveal a multi-systemic genetic architecture of scoliosis and nominate metabolic-neuro-skeletal pathways as potential targets for precision diagnosis and therapy.
Keywords
Family-based integrative genomics,Scoliosis,Susceptibility
Introduction
Scoliosis represents a common structural deformity of the spine, affecting up to 3% of the global population, yet its etiology remains largely elusive1,2,3. Familial aggregation and twin studies strongly support a genetic component, but known risk loci explain only a fraction of disease heritability. Previous genome-wide association studies (GWAS) have identified several susceptibility regions; however, pinpointing the causal variants and their regulatory mechanisms has been hindered by the complexity of spinal development and by the polygenic nature of the disorder4.
To better define the genetic determinants underlying scoliosis, family-based sequencing approaches offer a complementary strategy that reduces population heterogeneity and enhances power to detect rare, high-impact variants. In parallel, advances in expression quantitative trait loci (eQTL) resources and summary-based Mendelian randomization (SMR) analysis enable systematic integration of transcriptional regulation with GWAS signals to infer putative causal relationships between gene expression and disease phenotypes.
In this study, we combined whole-exome sequencing (WES) from 34 scoliosis families with cis-eQTL data from eQTLGen and scoliosis GWAS data from FinnGen R125,6,7. We aimed to identify genes whose regulatory variants exert causal effects on scoliosis susceptibility, characterize their biological functions, and delineate the molecular pathways connecting neural, muscular, metabolic, and skeletal systems in spinal morphogenesis. Our integrative analysis highlights novel pathogenic genes and provides a mechanistic framework for understanding the multi-system basis of scoliosis.
Methods
WES Data from Scoliosis Families
The WES data of scoliosis families included in this study were obtained from 34 scoliosis families previously collected by our research group. The study protocol was approved by the ethics committee (approval number: 20181106003-GZ2021). After obtaining informed consent, peripheral blood samples were collected, and genomic DNA was extracted. WES was performed for sequencing, and following quality control and variant filtering, candidate SNP data were generated for subsequent intersection analysis with the SMR results.Detailed sequencing data, sampling details, and systematic sample information for the 34 families are provided in Supplementary Table 1.
cis-eQTL Data
The cis-eQTL summary statistics used in this study were obtained from the eQTLGen Consortium, which provides standardized cis-eQTL summary files suitable for SMR analysis via its official website. This dataset integrates whole-blood samples from 31,684 individuals of European ancestry across 37 independent studies, offering comprehensive and high-quality cis-eQTL data8. The data are available at:https://www.eqtlgen.org/cis-eqtls.html.
GWAS Data
The scoliosis GWAS summary statistics were obtained from the FinnGen R12 database, which integrates genomic and health data from 500,000 Finnish biobank participants. FinnGen is a large-scale genomics initiative that has analyzed over 500,000 Finnish biobank samples and linked genetic variants to health data to understand disease mechanisms and susceptibility9. Case-control summary statistics for scoliosis were retrieved using the keyword “scoliosis”. The data are available at: (1) FinnGen R12 summary statistics: https://www.finngen.fi; (2) Scoliosis data:https://storage.googleapis.com/finngen-public-data-r12/summary_stats/ release/finngen_R12_M13_SCOLIOSIS.gz.
Data Preprocessing
The raw summary statistics from the FinnGen R12 scoliosis GWAS and the eQTLGen cis-eQTL data were subjected to standardized preprocessing and format harmonization. For the GWAS data, core fields including SNP identifier, effect allele, reference allele, allele frequency, effect size, standard error, and P-value were uniformly extracted, and field naming conventions were standardized. Duplicate and blank SNP records were removed to ensure unique variant identification. Total sample size information was added, and the data were converted into the outcome data format required for SMR analysis. Subsequently, the cis-eQTL and GWAS data underwent allele orientation alignment, genomic coordinate matching, and gene symbol standardization to eliminate format discrepancies and naming inconsistencies across different data sources, ensuring complete matching of variant information between the two datasets. Finally, standardized input files directly applicable to SMR colocalization analysis were generated.
SMR Analysis
Based on the principle of Mendelian Randomization (MR), this study used SMR (Windows 1.3.1) software to perform colocalization and causal inference analysis between gene expression and scoliosis. By integrating cis-eQTL and GWAS summary data, this method overcomes the limitation of traditional GWAS that can only identify genetic associations but cannot pinpoint functional causal genes. Single nucleotide polymorphisms (SNPs) significantly associated with gene expression in the cis-eQTL data were used as instrumental variables, with gene expression level (cis-eQTL) as the exposure and scoliosis (GWAS) as the outcome. The effect estimates of the same variants from the GWAS summary data were integrated to quantitatively infer the potential causal effect of gene expression on the risk of scoliosis.
The significance threshold was set at PSMR< 0.05. Since two different variants located in linkage disequilibrium (LD) may show significant mutual associations, relying solely on the significance of SMR results cannot directly distinguish true causal associations from false positives due to LD, i.e., it is difficult to exclude pleiotropic effects10. To effectively differentiate LD effects from true pleiotropic causal effects, Zhu et al.10 proposed the HEIDI test based on instrumental variables. In this study,PHEIDI > 0.05 was used as a filtering criterion to remove false positive associations caused by LD interference. Ultimately, significantly associated genes passing both filters were obtained, along with their association information, generating the SMR.filter.csv dataset, which serves as a core candidate set for subsequent functional enrichment and mechanistic analysis.
Family-Based Gene Intersection Analysis
To integrate the results from family-based WES and public omics data, and to improve the reliability and specificity of candidate genes, an intersection analysis was performed between the two independent gene sets. The significant genes identified by SMR analysis were intersected with those carrying key SNPs from the WES analysis of scoliosis families. Venn diagrams were plotted using R software (version 4.4.2) with the ggvenn package11 to visualize the overlapping genes. Finally, 31 common genes present in both datasets were identified as core candidates for subsequent functional annotation and mechanistic exploration of scoliosis pathogenesis.
Data Visualization Analysis
All genome-wide data visualization analyses were performed using R software (version 4.4.2). The dplyr and tidyr packages12,13 were used for data manipulation, and the CMplot package14 was used to generate Manhattan plots. A Manhattan plot was drawn with chromosomes on the x-axis and -log10 P-value on the y-axis for the 31 common genes, providing a visual representation of the genome-wide distribution of scoliosis-associated SNPs across human chromosomes. The Manhattan plot clearly shows the chromosomal localization of significant association loci and preliminarily reveals the distribution pattern of scoliosis susceptibility regions, offering a visual basis for subsequent core gene screening and functional interpretation.
To further validate the colocalization association between gene expression and scoliosis risk, scatter plots of eQTL effect size versus GWAS effect size were generated for each of the 31 common genes using the magick and TeachingDemos packages15,16 in R 4.4.2. In these plots, the x-axis represents the eQTL effect size of the variant on gene expression, and the y-axis represents the GWAS effect size of the variant on scoliosis risk, with the top cis-eQTL and common cis-eQTL sites annotated for each gene. These scatter plots effectively illustrate the correlation trend between eQTL effects and GWAS effects, assist in evaluating the strength of the causal association between gene expression and disease phenotype, and provide visual support for the colocalization validation of core candidate genes.
Functional Enrichment
The Gene Ontology (GO) database provides a standardized annotation system for gene functions and serves as one of the core bioinformatics tools for interpreting gene expression data. GO enrichment analysis systematically annotates gene functions from three dimensions: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC), effectively identifying statistically enriched functional categories and biological characteristics under disease conditions17. In this study, GO enrichment analysis was performed using the clusterProfiler package18 in R software (version 4.4.2), and bubble plots were generated. A P-value < 0.05 was considered statistically significant for enrichment results19.
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a comprehensive molecular function database constructed by the Bioinformatics Center of Kyoto University, covering 17 sub-databases and integrating information on cellular processes, genetic information, human diseases, and more20,21. Enrichment analysis based on KEGG data is a common approach to identify biological pathways and functional modules involved in target genes. In this study, KEGG enrichment analysis was performed using the clusterProfiler package in R 4.4.2, and bubble plots were visualized. A P-value < 0.05 was considered statistically significant for KEGG enrichment results19.
Results
Family-based analysis of scoliosis cohorts
Based on the integrated analysis of WES data from 34 scoliosis families and SMR results, a total of 31 common genes present in both datasets were identified. These genes passed stringent filtering criteria (PSMR < 0.05 and PHEIDI > 0.05), indicating that they carry key genetic variants at the family level while also showing causal associations with scoliosis risk at the population level. The PSMR and PHEIDI values for these 31 common genes are detailed in Table 1.
Table 1. SMR and HEIDI test P-values for the 31 common genes associated with scoliosis
| Correlation with loci | Gene | PSMR | PHEIDI |
| Positive | NCAM1 | 0.000922659 | 0.07680346 |
| Positive | AGPAT3 | 0.004800337 | 0.1413867 |
| Positive | SMIM12 | 0.00501161 | 0.05906706 |
| Negative | CTSH | 0.00691361 | 0.8436791 |
| Negative/td> | LINC00299 | 0.01113481 | 0.06945593 |
| Negative | NPIPB11 | 0.01167407 | 0.2094704 |
| Positive | NAPRT | 0.01293904 | 0.2441823 |
| Negative | HSPBP1 | 0.0159643 | 0.9743946 |
| Negative | TMEM132D | 0.01659424 | 0.8338818 |
| Negative | DEAF1 | 0.01783776 | 0.5167472 |
| Positive | TSHR | 0.01815261 | 0.9751504 |
| Positive | ALOX12-AS1 | 0.0189947 | 0.4511633 |
| Negative | L3MBTL4 | 0.02060692 | 0.5297665 |
| Positive | PRCP | 0.02117731 | 0.6506786 |
| Positive | EIF3E | 0.02500594 | 0.5799573 |
| Positive | KCNN3 | 0.02503178 | 0.8401287 |
| Negative | CRYGN | 0.02639662 | 0.9189739 |
| Positive | CAMK2D | 0.02760503 | 0.2768263 |
| Negative | NMNAT3 | 0.02938423 | 0.3255898 |
| Negative | NUP214 | 0.03011871 | 0.6775703 |
| Not determined | DLG2 | 0.03200031 | 0.3766503 |
| Positive | NMT1 | 0.03203178 | 0.1570335 |
| Negative | HEG1 | 0.03310556 | 0.1746253 |
| Negative | GOLGA8O | 0.03618912 | 0.2041633 |
| Positive | SYNJ2BP | 0.03664463 | 0.420418 |
| Positive | VDR | 0.03686418 | 0.5702664 |
| Negative | GRB10 | 0.03821055 | 0.2389569 |
| Negative | ACOT7 | 0.03850855 | 0.6677867 |
| Negative | DOCK7 | 0.04185344 | 0.8608807 |
| Negative | LDHAL6A | 0.04815689 | 0.6968079 |
Colocalization validation at the LINC00299 locus
The Manhattan plot illustrates the distribution of the 31 common genes across all chromosomes (Figure 1A), revealing that multiple genomic loci are significantly associated with scoliosis susceptibility. Colocalization analysis indicated that the long non-coding RNA LINC00299 exhibits a significant cis-eQTL effect in the context of scoliosis genetic susceptibility.
The single‑gene scatter plot for LINC00299 (Figure 1B) showed that most data points fall along a negative correlation trend line, indicating that alleles increasing its expression tend to decrease the risk of scoliosis. That is, higher expression of LINC00299 is associated with lower disease susceptibility, suggesting that down-regulation of this gene may promote abnormal spinal development. The top cis-eQTL for this locus (marked by a red triangle) also perfectly aligns with the overall fitted trend. Statistical analysis showed a significant SMR causal association test (PSMR = 0.01113481, P < 0.05) and a HEIDI heterogeneity test (PHEIDI = 0.06945593, P > 0.05) indicating no horizontal pleiotropy, thus excluding false colocalization bias.
Further analysis using the regional colocalization plot (Figure 1C) revealed that within the 8.5-9.0 Mb region on chromosome 2, the significant association peak from the scoliosis GWAS and the strong cis-eQTL association peak for LINC00299 are highly overlapping. The two genetic signals are consistently distributed across the genomic interval, exhibiting clear colocalization characteristics.
eQTL-GWAS effect characteristics of core candidate genes
Through integrative colocalization analysis of scoliosis GWAS signals and cis-eQTL signals, this study identified four candidate genes with potential pathogenic relevance:NCAM1, AGPAT3, SMIM12, and CTSH.
For NCAM1(Figure 2A), AGPAT3(Figure 2B), and SMIM12(Figure 2C), the cis-eQTL effect sizes showed a stable positive correlation with the GWAS disease effect sizes, indicating that the direction of genetic variant effects on gene expression was consistent with the direction of variant effects on scoliosis risk. In contrast, CTSH(Figure 2D) exhibited a clear negative correlation between eQTL effect sizes and GWAS effect sizes. Moreover, the top cis-eQTL (marked by red triangles) for each gene fell exactly on the overall fitted trend line, suggesting that altered expression levels of these four genes provide reliable genetic evidence for an association with the risk of scoliosis.
Functional annotation of key genes associated with scoliosis
Functional enrichment analysis was conducted on the 31 consensus genes associated with scoliosis. GO analysis identified significant enrichment across three standard functional dimensions(Figure 3A-B).
In the BP category, these genes were mainly clustered in pathways related to energy coenzyme biosynthesis and neural signal transduction. The most significantly enriched terms included NAD biosynthetic process, nicotinamide nucleotide biosynthetic process, pyridine nucleotide biosynthetic process, pyridine-containing compound biosynthetic process, Rho protein signal transduction, and regulation of cold-induced thermogenesis.
For the CC category, prominent enrichment was found in synaptic complexes, neuronal structures and specialized cell membrane domains, such as basal part of cell, postsynaptic density, asymmetric synapse, juxtaparanode region of axon, postsynaptic specialization, and neuron-to-neuron synapse.
In the MF group, enriched functions were primarily involved in catalytic activity, ion channel regulation and molecular binding. Key significant terms covered exopeptidase activity, dipeptidyl-peptidase activity, bile acid binding, myristoyltransferase activity, nuclear export signal receptor activity, and calcium-activated potassium channel activity. Full detailed statistics for all GO enrichment results are listed in Table 2.
Further KEGG pathway enrichment analysis revealed that these genes were significantly enriched in nicotinate and nicotinamide metabolism, insulin secretion, glucagon signaling pathway, Hypoxia-Inducible Factor-1 (HIF-1) signaling pathway, cofactor biosynthesis, and the renin-angiotensin system (Figure 3C).
Table 2. GO enrichment analysis results of the 31 common genes associated with scoliosis
| ID | Description | Pvalue | Enrichment factor |
| BP GO:0009435 | NAD biosynthetic process | 5.7e-04 | 56.17 |
| BP GO:0019359 | nicotinamide nucleotide biosynthetic process | 6.7e-04 | 52.16 |
| BP GO:0019363 | pyridine nucleotide biosynthetic process | 6.7e-04 | 52.16 |
| BP GO:0072525 | pyridine-containing compound biosynthetic process | 8.2e-04 | 47.11 |
| BP GO:0007266 | Rho protein signal transduction | 8.8e-04 | 15.76 |
| BP GO:0120161 | regulation of cold-induced thermogenesis | 1.1e-03 | 14.51 |
| CC GO:0045178 | basal part of cell | 9.5e-04 | 9.00 |
| CC GO:0014069 | postsynaptic density | 1.4e-02 | 5.96 |
| CC GO:0032279 | asymmetric synapse | 1.5e-02 | 5.70 |
| CC GO:0044224 | juxtaparanode region of axon | 1.7e-02 | 59.40 |
| CC GO:0099572 | postsynaptic specialization | 1.8e-02 | 5.40 |
| CC GO:0098984 | neuron to neuron synapse | 2.0e-02 | 5.19 |
| MF GO:0008238 | exopeptidase activity | 1.0e-02 | 12.99 |
| MF GO:0008239 | dipeptidyl-peptidase activity | 1.5e-02 | 66.92 |
| MF GO:0032052 | bile acid binding | 1.5e-02 | 66.92 |
| MF GO:0019107 | myristoyltransferase activity | 1.6e-02 | 60.83 |
| MF GO:0005049 | nuclear export signal receptor activity | 1.8e-02 | 55.76 |
| MF GO:0015269 | calcium-activated potassium channel activity | 1.8e-02 | 55.76 |
Discussion
By integrating multi-omics data from family-based WES, cis-eQTL, and GWAS, combined with SMR-HEIDI colocalization analysis and family-based variant validation, this study systematically dissected the genetic susceptibility mechanisms underlying scoliosis. We identified 31 common genes that link family-derived rare variants with population-level expression regulation signals, constructing a convergent genetic map connecting transcriptional regulation to scoliosis susceptibility22,23,24. Among these, the non-coding RNA LINC00299 and the protein-coding genes NCAM1, AGPAT3, SMIM12, and CTSH demonstrated key functional and regulatory potential, providing new insights into the pathogenesis of scoliosis.
The strong colocalization signal observed for LINC00299 suggests that its expression level is inversely correlated with scoliosis risk, indicating an important role for this long non-coding RNA as a regulatory element in spinal morphogenesis25,26. As a key non-coding regulatory gene identified in this study, dysregulation of LINC00299 may interfere with the gene regulatory network governing normal spinal development, positioning it as a potential critical regulatory node mediating genetic susceptibility to scoliosis.
Beyond LINC00299, four protein-coding genes were also confirmed to participate in core biological processes relevant to scoliosis. NCAM1 mediates neuronal adhesion and synaptic signaling, influencing muscle tone balance and the maintenance of spinal symmetry through the regulation of neuronal development and neuromuscular junction function27,28,29. AGPAT3 is involved in phospholipid biosynthesis and fatty acid remodeling, playing an important role in maintaining chondrocyte membrane integrity and regulating normal skeletal morphogenesis30,31,32. SMIM12 is a mitochondrial membrane protein that regulates cellular energy metabolism and oxidative stress homeostasis; its expression imbalance may induce muscle metabolic disturbances and bilateral muscle tension asymmetry33. CTSH encodes a lysosomal cysteine protease that participates in extracellular matrix degradation and bone remodeling, thereby regulating vertebral bone metabolism and spinal structural stability34,35.
Together, these genes cover multiple key biological dimensions including neural regulation, lipid and energy metabolism, muscle function, and bone remodeling, collectively forming a multi-system pathogenic regulatory network spanning the metabolic-neural-skeletal axis. This provides new molecular evidence and research directions for deciphering the complex multi-dimensional molecular pathogenesis of scoliosis.Recent genetic studies have identified susceptibility genes for scoliosis through genome-wide association analysis, and bidirectional Mendelian randomization investigations have further revealed causal associations of inflammatory cytokines and gut microbiota with scoliosis, supporting that scoliosis is regulated by multiple genetic and molecular factors36,37,38.
Functional enrichment analysis revealed that GO annotations of the candidate genes were mainly enriched in core functions such as neuronal synapse structure, energy coenzyme biosynthesis, and ion channel regulation, confirming their involvement in neural development and energy metabolism regulation at the levels of cellular component, biological process, and molecular function. KEGG pathway enrichment further highlighted significant enrichment in nicotinate and nicotinamide metabolism, insulin and glucagon signaling pathways, HIF-1 signaling, and the renin-angiotensin system, suggesting extensive interplay between endocrine-metabolic regulation and skeletal growth. The key genes identified in this study may link systemic metabolic regulation with spinal skeletal development by modulating metabolic homeostasis, energy substrate synthesis, and neuromuscular signaling. These findings indicate that scoliosis is not a purely orthopedic disorder but rather a multi-organ developmental condition involving coordinated neural and metabolic regulation34,39.
Several limitations of this study should be acknowledged. First, the GWAS summary data were obtained from the FinnGen database, which predominantly includes individuals of Finnish European ancestry, and the eQTL data were also derived from whole-blood samples of European populations. Therefore, the generalizability of our conclusions to other ethnic groups requires further validation40. Second, although SMR analysis can effectively identify causal relationships between gene expression and disease, it cannot completely exclude bias due to horizontal pleiotropy, and the power of the HEIDI test depends on the number of instrumental variables and the accuracy of the linkage disequilibrium structure10. Third, the functional roles of the key genes identified in this study are primarily based on bioinformatics annotations and lack experimental validation in vitro or in vivo; their specific pathogenic mechanisms await further investigation. Finally, scoliosis exhibits high clinical heterogeneity, and the GWAS data used in this study were not stratified by different clinical subtypes (e.g., idiopathic, congenital, etc.), which may affect the fine mapping of our results.
Conclusion
This study integrated family-based multi-omics data and SMR-HEIDI colocalization analysis to identify 31 scoliosis susceptibility-related consensus genes, with LINC00299 and four protein-coding genes emerging as critical regulatory and functional hubs. Our findings reveal that scoliosis arises from a complex neural–metabolic–skeletal interactive network, rather than being a simple orthopedic disease. This family-driven integrative genomic analysis expands the understanding of scoliosis genetic mechanisms, and provides novel candidate targets and a foundational framework for future mechanistic research and translational development.
Author contributions
Xuanchen Li, Zhenrui Lin: Conceptualization, data curation, formal analysis, visualization, manuscript drafting. Xiaosheng Chen, Zhixiang Zhu, Rui Zhang: Data evaluation, Data collection, Data filtering. Weijun Wang, WenYu Zhou, Yi He, Lei Yang: Data analysis, result visualization, manuscript editing. Lihe Jiang, Guodong Huang and Qian Liang: Conceptualization, investigation, data curation, supervision, writing manuscript and review.
Funding
This work was supported by the Natural Science Foundation of Guangdong Province (2024A1515013174), Sanming Project of Medicine in Shenzhen (SZSM202211003), Shenzhen Key Medical Discipline Construction Fund (SZXK022), the National Natural Science Foundation Youth Project of China (82203807), Shenzhen-Hong Kong Jointly Funded Project, Shenzhen Science and Technology Program (SGDX20230116093645007), Shenzhen Medical Research Fund (B2303005), Development and Reform Commission of Shenzhen Municipality (S2002Q84500835), Shenzhen Basic Research Project (JCYJ20220530150810024), (JCYJ20240813141020027).
Acknowledgements
The authors would like to express sincere thanks to all participating schools and students of the study and all the rehabilitation therapists from Shenzhen Youth Spine Health Center for their help with the scoliosis screening and data collection. Thanks to/Supported by Shenzhen Portion of Shenzhen-Hong Kong Science and Technology Innovation Cooperation Zone, project No. HTHZQSWS-KCCYB-2023060.
References
Figure Legends(3)
Supplemental information(1)
Citation