diff --git a/src/content/research/ai-computational-methods.md b/src/content/research/ai-computational-methods.md
index 5059c61..00f2019 100644
--- a/src/content/research/ai-computational-methods.md
+++ b/src/content/research/ai-computational-methods.md
@@ -1,49 +1,14 @@
 ---
 title: "AI & Computational Methods"
-description: "Developing advanced AI and machine learning methods to solve complex genomics and systems biology problems."
+description: "Developing AI and computational methods for genomics and systems biology."
 cover: "../../assets/research-ai.jpg"
 order: 1
 ---
 
 # AI & Computational Methods
 
-Our lab develops cutting-edge AI and machine learning methods specifically designed to solve challenging problems in genomics and systems biology. We create innovative algorithms that can extract meaningful biological insights from complex, high-dimensional genomic data, enabling discoveries that would be impossible through traditional approaches.
+Our group develops AI and computational approaches that turn complex genomics data into testable biological insight. We focus on methods that are both high-performing and interpretable, so researchers can use model outputs to understand mechanisms rather than only generate scores. This includes machine learning for gene and variant prioritization, network-aware modeling of molecular systems, and statistically grounded frameworks for noisy, high-dimensional biomedical data.
 
-## Key Research Areas
+At the systems level, we build models for pathway activity inference, multi-omics integration, and single-cell analysis to connect signals across molecular layers. We combine representation learning, probabilistic inference, and causal reasoning to improve robustness and biological relevance across diverse cohorts and experimental settings. These methods are designed to support end-to-end discovery workflows, from data interpretation to target nomination and experimental follow-up.
 
-### Machine Learning & Deep Learning
-- **Graph Neural Networks**: Developing specialized architectures for modeling molecular interactions and biological networks
-- **Transfer Learning**: Adapting pre-trained models for biomedical applications with limited labeled data
-- **Representation Learning**: Creating informative embeddings for genes, variants, and biological entities
-- **Interpretable AI**: Building transparent models that provide biological insights alongside predictions
-
-### Statistical Methods
-- **Graphical Models**: Employing Bayesian networks and probabilistic graphical models for modeling regulatory relationships
-- **High-Dimensional Statistics**: Developing methods for analyzing genomic data with thousands of variables
-- **Causal Inference**: Inferring causal relationships from observational biological data
-- **Regularization Methods**: Implementing sparse learning techniques (Lasso, elastic net) for feature selection in genomic studies
-
-### Data Integration
-- **Multi-Omics Integration**: Combining genomics, transcriptomics, proteomics, and other data types
-- **Knowledge Graph Construction**: Building comprehensive biological knowledge bases
-- **Cross-Species Analysis**: Leveraging model organism data to understand human biology
-- **Heterogeneous Data Fusion**: Integrating structured databases with unstructured literature
-
-## Methodological Innovations
-
-### Signal Pathway Activity Prediction
-Quantitative approaches for inferring signaling pathway activity from genome-wide expression data, enabling systems-level understanding of cellular responses.
-
-### Network-Based Methods
-Leveraging biological network topology to prioritize disease genes, predict gene function, and identify therapeutic targets.
-
-### Dimensionality Reduction
-Advanced techniques for visualizing and analyzing high-dimensional single-cell data while preserving biological structure.
-
-## Open-Source Tools & Software
-
-We are committed to making our methods accessible to the broader research community through well-documented, user-friendly software packages. Our tools are designed with both computational biologists and wet-lab researchers in mind.
-
-## Collaborative Research
-
-Our computational methods are developed in close collaboration with experimental biologists and clinicians, ensuring that our approaches address real-world biological questions and can be validated experimentally.
+We prioritize open, reusable software so these methods are accessible beyond our lab. Representative tools include [AI-MARRVEL](https://github.com/LiuzLab/AI_MARRVEL), [NMRQNet](https://github.com/LiuzLab/NMRQNet), [SPA-STOCSY](https://github.com/LiuzLab/SPA-STOCSY), and [CRISPRcloud](https://crispr.nrihub.org/). Together, these resources support computational biologists, experimental teams, and translational collaborators working on real-world genomic and disease-focused questions.
diff --git a/src/content/research/translational-bridge.md b/src/content/research/translational-bridge.md
index 6c0a814..7781669 100644
--- a/src/content/research/translational-bridge.md
+++ b/src/content/research/translational-bridge.md
@@ -7,112 +7,8 @@ cover: "../../assets/research-translational.jpg"
 
 # Translational Bridge
 
-We serve as a critical bridge connecting the power of AI and machine learning with the practical needs of researchers and clinicians. This translational focus ensures that cutting-edge computational methods don't remain isolated in the AI world but become accessible, usable tools that accelerate biomedical discovery and improve patient outcomes. We actively work to close the gap between what's possible with AI and what's practical in research labs and clinical settings.
+Our translational mission is to connect AI/ML innovation with everyday research and clinical decision-making. In practice, this means designing methods that are not only technically strong, but also interpretable, deployable, and compatible with real-world constraints in genetics clinics and biomedical laboratories. We work at the interface of computational science, genomics, and patient care so that advanced models can be used responsibly by clinicians, not just evaluated in idealized settings.
 
-## Bridging the Gap
+This bridge is especially important in rare disease and neurodevelopmental research, where clinicians need timely and evidence-based guidance from complex genomic data. Through projects such as AI-MARRVEL and related diagnostic workflows, we support variant prioritization, phenotype-guided interpretation, and evidence integration across databases, model organisms, and literature. Our autism and neurological disease efforts follow the same translational logic: combine large-scale data science with biologically grounded interpretation to improve diagnosis, stratification, and treatment-oriented discovery.
 
-### Understanding Both Worlds
-- **AI/ML Expertise**: Deep understanding of machine learning methods, their capabilities and limitations
-- **Biological Domain Knowledge**: Comprehensive grasp of genomics, disease biology, and clinical needs
-- **Practical Constraints**: Awareness of real-world limitations in clinical and research settings
-
-### Making AI Accessible
-- **User-Friendly Interfaces**: Web-based tools requiring no programming expertise
-- **Interpretable Results**: Clear explanations that bridge technical and clinical language
-- **Actionable Outputs**: Results that directly inform research decisions and clinical care
-- **Training & Education**: Workshops and documentation to empower users
-
-### Two-Way Communication
-- **Clinician Feedback**: Continuous input from end-users shapes tool development
-- **Research Integration**: Embedding tools into existing laboratory workflows
-- **Iterative Improvement**: Rapid cycles of development, testing, and refinement based on real-world use
-
-## Key Research Areas
-
-### AI-Powered Rare Disease Diagnosis
-
-#### AI-MARRVEL (AIM)
-Our flagship machine learning system, published in NEJM AI, helps clinicians prioritize potentially causative genetic variants for Mendelian disorders. Diagnosing rare genetic diseases is often a multi-year "diagnostic odyssey" - AI-MARRVEL significantly accelerates this process by:
-
-- **Variant Prioritization**: Ranking genetic variants based on their likelihood of causing disease
-- **Evidence Integration**: Combining diverse data sources including population genetics, functional predictions, model organism data, and clinical phenotypes
-- **Clinical Phenotype Matching**: Leveraging HPO (Human Phenotype Ontology) terms to match patient symptoms with known disease profiles
-- **Literature Mining**: Automatically extracting relevant information from millions of scientific publications
-
-### Autism Research & Diagnosis
-
-Supported by grants from the **NIH Autism Data Science Initiative** and the **Silicon Valley Community Foundation**, our autism research program focuses on:
-
-#### Data-Driven Discovery
-- **Large-Scale Data Analysis**: Mining autism genomic databases to identify risk genes and pathways
-- **Molecular Subtyping**: Defining biologically meaningful autism subtypes based on molecular signatures
-- **Biomarker Development**: Identifying molecular markers for early diagnosis and treatment response
-
-#### Mechanistic Understanding
-- **Gene Network Analysis**: Understanding how autism risk genes interact and contribute to disease
-- **Cellular Pathways**: Identifying disrupted biological pathways that can be therapeutic targets
-- **Brain Development**: Studying how genetic variants affect neurodevelopment
-
-### Undiagnosed Disease Programs
-
-We collaborate with clinical genetics programs to solve the most challenging diagnostic cases:
-
-- **Whole Genome/Exome Analysis**: Comprehensive analysis of patient genetic data
-- **Novel Gene Discovery**: Identifying previously unknown disease genes
-- **Variant Interpretation**: Determining the pathogenicity of rare variants
-- **Family Studies**: Analyzing inheritance patterns and segregation
-
-### Neurological Disease Applications
-
-#### Neurodegenerative Diseases
-- **Cellular Heterogeneity**: Defining how different cell types are affected in neurodegeneration
-- **Disease Progression**: Modeling molecular changes over disease course
-- **Therapeutic Target Identification**: Finding druggable pathways and proteins
-
-#### Neurodevelopmental Disorders
-- **Gene Discovery**: Identifying new neurodevelopmental disease genes
-- **Genotype-Phenotype Correlations**: Understanding how specific mutations lead to clinical features
-- **Functional Validation**: Testing disease mechanisms in model systems
-
-## Clinical Impact
-
-### Diagnostic Rate Improvement
-Our tools have helped increase diagnostic rates for rare diseases, ending diagnostic odysseys for numerous families and enabling precision medicine approaches.
-
-### Clinical Decision Support
-We provide clinicians with interpretable, evidence-based recommendations that integrate seamlessly into existing clinical workflows.
-
-### Personalized Medicine
-By understanding the molecular basis of each patient's condition, we enable tailored treatment approaches and inform prognosis.
-
-## Translational Research Infrastructure
-
-### Clinical Collaborations
-- **Texas Children's Hospital**: Close collaboration with clinical genetics and neurology departments
-- **Baylor College of Medicine**: Integration with genetic testing laboratories
-- **Undiagnosed Diseases Network**: Participation in national collaborative efforts
-- **International Partnerships**: Contributing to global rare disease initiatives
-
-### Patient-Centered Approach
-Our research is guided by the needs of patients and families affected by rare genetic disorders. We actively engage with patient advocacy groups and incorporate patient perspectives into our tool development.
-
-### Regulatory & Clinical Validation
-We work toward clinical-grade tools that meet regulatory standards and undergo rigorous validation in clinical settings.
-
-## Real-World Implementation
-
-### Web-Based Platforms
-User-friendly interfaces that require no programming expertise, making advanced AI accessible to all clinicians.
-
-### Secure Data Handling
-HIPAA-compliant infrastructure ensuring patient privacy and data security.
-
-### Continuous Improvement
-Ongoing updates incorporating new scientific knowledge, clinical feedback, and improved machine learning models.
-
-## Future Directions
-
-- **Expanded Disease Coverage**: Extending our methods to broader categories of genetic disorders
-- **Therapeutic Predictions**: Using AI to suggest potential treatments based on molecular mechanisms
-- **Real-Time Clinical Integration**: Embedding our tools directly into electronic health records
-- **Multi-Modal Data**: Incorporating imaging, clinical notes, and other data types for comprehensive diagnosis
+We maintain close collaboration with clinical teams and research partners, including Texas Children's Hospital, Baylor College of Medicine, and broader undiagnosed disease networks, to ensure iterative improvement from real use. This collaboration-driven model supports secure data handling, continuous model refinement, and practical implementation pathways that move computational advances from development into measurable patient impact.
diff --git a/src/content/research/validation-data-science.md b/src/content/research/validation-data-science.md
index bcf2ed2..b4e58e2 100644
--- a/src/content/research/validation-data-science.md
+++ b/src/content/research/validation-data-science.md
@@ -1,95 +1,14 @@
 ---
 title: "Rigorous Validation & Data Science"
-description: "Ensuring AI-driven discoveries are robust, reproducible, and biologically meaningful through comprehensive validation frameworks."
+description: "Building robust, reproducible validation frameworks for autism and neurogenomics AI models."
 order: 2
 cover: "../../assets/research-validation.jpg"
 ---
 
 # Rigorous Validation & Data Science
 
-We are committed to ensuring that AI-driven findings are not just computationally impressive, but biologically robust and clinically actionable. Our validation framework combines large-scale data analysis, experimental validation, and rigorous statistical testing to ensure reproducibility and reliability. This work is supported by major grants including the **NIH Autism Data Science Initiative (ADSI)** and the **Silicon Valley Community Foundation**, reflecting the importance of rigorous, validated approaches in advancing biomedical discovery.
+Our validation program is centered on one principle: AI models should be trusted only when they are independently replicated, clinically relevant, and transparent about their limits. We use cross-cohort benchmarking, robustness testing, and reproducibility-focused evaluation to determine whether model performance holds across institutions, populations, and data modalities, rather than only in the training environment.
 
-## Validation Frameworks
+We recently secured support through the [NIH Autism Data Science Initiative (ADSI)](https://dpcpsi.nih.gov/autism-data-science-initiative) for **Validate ASD: Independent Multimodal Replication and Validation of Autism Data-Science Models**. This project applies a two-track validation strategy, combining intact model testing with code-blinded replication, and integrates structured clinical data from Texas Children's Hospital with genomic data from SPARK, metabolomics from BaBS, and environmental exposure models linked to placental neurodevelopmental biomarkers. The goal is to define where current autism models generalize, where recalibration is needed, and how fairness and reliability can be improved for real-world pediatric use.
 
-### Computational Validation
-- **Cross-Validation & Benchmarking**: Rigorous testing against gold-standard datasets
-- **Replication Studies**: Validating findings across independent cohorts
-- **Robustness Analysis**: Ensuring predictions are stable across different conditions
-- **Statistical Rigor**: Multiple testing correction and proper statistical frameworks
-
-### Experimental Validation
-- **Functional Genomics**: Testing predictions using CRISPR and other perturbation methods
-- **Model Organisms**: Validating human disease genes in Drosophila and other systems
-- **Multi-Omics Confirmation**: Verifying findings across genomic, transcriptomic, and proteomic layers
-
-### Clinical Validation
-- **Patient Cohort Studies**: Testing predictions in real patient populations
-- **Longitudinal Data**: Validating temporal predictions across disease progression
-- **Clinical Outcome Correlation**: Ensuring predictions align with clinical observations
-
-## Key Research Areas
-
-### Autism Data Science (NIH ADSI & SVCF Supported)
-
-Supported by the **NIH Autism Data Science Initiative** and the **Silicon Valley Community Foundation**, our autism research program emphasizes rigorous, data-driven approaches:
-
-- **Large-Scale Genomic Analysis**: Systematic analysis of autism genomic databases with strict quality control
-- **Reproducible Pipelines**: Standardized, well-documented analysis workflows
-- **Cross-Study Validation**: Confirming findings across multiple independent autism cohorts
-- **Biological Mechanism Validation**: Testing autism risk gene functions experimentally
-- **Phenotype-Genotype Correlation**: Rigorous statistical analysis of clinical-molecular relationships
-
-### Transcriptional Regulation
-- **Transcription Factor Networks**: Mapping regulatory circuits that control gene expression
-- **Chromatin Accessibility**: Analyzing open chromatin regions and their role in gene regulation
-- **Enhancer-Promoter Interactions**: Understanding long-range regulatory elements
-- **RNA Regulation**: Investigating post-transcriptional control mechanisms including microRNAs and RNA-binding proteins
-
-### Single-Cell Genomics
-- **Cell Type Identification**: Defining cellular heterogeneity in complex tissues
-- **Developmental Trajectories**: Reconstructing cell state transitions during development and disease
-- **Spatial Transcriptomics**: Integrating gene expression with spatial context in tissues
-- **Single-Cell Multi-Omics**: Profiling multiple molecular layers simultaneously in individual cells
-
-### Gene Function & Networks
-- **Gene Regulatory Networks**: Inferring causal relationships between genes
-- **Protein-Protein Interaction Networks**: Mapping physical and functional interactions
-- **Pathway Analysis**: Identifying perturbed biological pathways in disease
-- **Gene Prioritization**: Ranking candidate disease genes based on network properties
-
-### Multi-Omics Data Analysis
-- **Genomics**: Analyzing DNA variation, structural variants, and genome architecture
-- **Transcriptomics**: Profiling gene expression across conditions, tissues, and cell types
-- **Epigenomics**: Examining DNA methylation, histone modifications, and chromatin states
-- **Proteomics**: Integrating protein abundance and post-translational modifications
-
-## Experimental Validation
-
-### CRISPR Screening
-- **Pooled CRISPR Screens**: Systematic perturbation of genes to identify functional relationships
-- **CRISPRcloud**: Our cloud-based platform for analyzing CRISPR screening data
-- **Screen Design & Analysis**: Optimizing experimental design and computational pipelines
-
-### Model Organisms
-- **Drosophila Studies**: Leveraging the power of fly genetics for functional validation
-- **Cross-Species Conservation**: Identifying evolutionarily conserved mechanisms
-- **Phenotypic Characterization**: Detailed behavioral and molecular phenotyping
-
-## Neurological Disease Focus
-
-### Autism Spectrum Disorders
-Investigating molecular mechanisms underlying autism, supported by grants from the NIH Autism Data Science Initiative and the Silicon Valley Community Foundation.
-
-### Rare Neurological Disorders
-Studying genetic and molecular basis of rare neurodevelopmental and neurodegenerative conditions.
-
-### MeCP2 and Rett Syndrome
-Understanding the role of MeCP2 and its regulatory networks in brain development and function.
-
-## Data Resources
-
-### MARRVEL Database
-Our flagship resource integrating human genetic data with model organism information, facilitating functional annotation of the human genome and enabling researchers to leverage decades of model organism research for understanding human disease.
-
-### Transposable Element Analysis
-Tools and methods for quantifying and analyzing transposable elements from RNA-seq data, revealing their roles in gene regulation and disease.
+This work is designed to produce high-impact validation reports and community-accessible tools that raise standards for autism model evaluation and clinical deployment. Representative validation-related publications include [(Jeong and Liu, 2019)](https://linkinghub.elsevier.com/retrieve/pii/S0896-6273(19)30972-9), [(Raman et al., 2018)](https://www.nature.com/articles/s41467-018-05627-1), [(Yi and Liu, 2011)](https://www.nature.com/articles/nmeth.1830), and [(Wan et al., 2014)](https://pubmed.ncbi.nlm.nih.gov/24489963/).