As the proportion of elderly people in the population rises rapidly, investigating the factors that influence cognitive impairment is of particular importance. With recent advances in genomic resources and data, genomic association analyses, including the genome-wide association study (GWAS), epigenome-wide association study (EWAS), and transcriptome-wide association study (TWAS), are expected to pave the way to understanding the genetic and molecular mechanisms underlying cognitive aging. However, current statistical methods for genome-wide and omics association studies are dominated by analytical models that rest on multiple assumptions, such as normally distributed omics data and cognitive measurements and a linear relationship between molecular profiles and cognitive measurements. Dealing with the correlation structure in twin data imposes further assumptions. These assumptions could be responsible for the low replication of identified markers and their limited ability to describe phenotypic variation. This thesis introduces an assumption-free generalized correlation coefficient (GCC) and compares the performance of GCC-based association analysis of omics data with that of currently popular regression models, using cognitive scores from twin samples.
The first study presents a GWAS of cognitive function in related samples of Danish twins, compares the performance of GCC with that of conventional linear models, and replicates the results in Chinese twin samples. Heritability estimates for cognitive function from twin studies are generally higher than 44%, suggesting an essential role for GWAS in identifying phenotype-associated genetic variants. The results indicate that GCC can capture different patterns of genotype-phenotype association and is not limited to additive genetic effects. More genes and meaningful biological pathways were replicated by GCC than by linear models.
The second study presents an EWAS of cognitive function using DNA methylation data measured in blood samples from Danish twins, again comparing the performance of GCC and popular linear models. DNA methylation is an important epigenetic modification that is highly involved in many age-associated phenotypes. The results show that combining the CpG sites identified by GCC and by linear models revealed more important genes, pathways, and differentially methylated regions that may be implicated in cognitive performance and cognitive decline than using linear models alone. Notably, the top findings were successfully replicated in an independent Danish twin dataset.
The third and fourth studies present a TWAS focusing on the expression of long non-coding RNA (lncRNA) and messenger RNA (mRNA) associated with cognitive function in Danish twin data, applying both GCC and linear models. LncRNA is a crucial epigenetic regulator that acts as a cis or trans modulator of protein-coding gene expression. The third study further linked the top identified lncRNAs to the lncRNA-mRNA interaction network to investigate their functional implications in cognition, and the fourth study further assessed the importance of previously identified cognitive function-related transcription factors (TFs). The combination of GCC and linear models identified interesting lncRNAs, genes, and functional clusters. By mapping the top lncRNAs to the lncRNA-mRNA interaction network, we detected significantly enriched biological pathways involved in cognitive impairment and neurological disorders. Furthermore, using both GCC and linear models, we identified more differentially expressed genes and biological pathways implicated in cognitive function. In addition, significant regulons linked to cognition confirmed the TFs previously associated with cognitive function.
In conclusion, our comparison across different omics data types in twin samples shows that the assumption-free GCC could serve as a robust method alongside popular linear models, identifying important markers missed by traditional linear models. Overall, our results support the use of the generalized association method in omics studies for biomarker discovery and improved functional annotation.