Rewrite research pages in concise essay style #56
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,49 +1,14 @@ | ||
| --- | ||
| title: "AI & Computational Methods" | ||
| description: "Developing advanced AI and machine learning methods to solve complex genomics and systems biology problems." | ||
| description: "Developing AI and computational methods for genomics and systems biology." | ||
| cover: "../../assets/research-ai.jpg" | ||
| order: 1 | ||
| --- | ||
|
|
||
| # AI & Computational Methods | ||
|
|
||
| Our lab develops cutting-edge AI and machine learning methods specifically designed to solve challenging problems in genomics and systems biology. We create innovative algorithms that can extract meaningful biological insights from complex, high-dimensional genomic data, enabling discoveries that would be impossible through traditional approaches. | ||
| Our group develops AI and computational approaches that turn complex genomics data into testable biological insight. We focus on methods that are both high-performing and interpretable, so researchers can use model outputs to understand mechanisms rather than only generate scores. This includes machine learning for gene and variant prioritization, network-aware modeling of molecular systems, and statistically grounded frameworks for noisy, high-dimensional biomedical data. | ||
|
|
||
| ## Key Research Areas | ||
| At the systems level, we build models for pathway activity inference, multi-omics integration, and single-cell analysis to connect signals across molecular layers. We combine representation learning, probabilistic inference, and causal reasoning to improve robustness and biological relevance across diverse cohorts and experimental settings. These methods are designed to support end-to-end discovery workflows, from data interpretation to target nomination and experimental follow-up. | ||
|
|
||
| ### Machine Learning & Deep Learning | ||
| - **Graph Neural Networks**: Developing specialized architectures for modeling molecular interactions and biological networks | ||
| - **Transfer Learning**: Adapting pre-trained models for biomedical applications with limited labeled data | ||
| - **Representation Learning**: Creating informative embeddings for genes, variants, and biological entities | ||
| - **Interpretable AI**: Building transparent models that provide biological insights alongside predictions | ||
|
|
||
| ### Statistical Methods | ||
| - **Graphical Models**: Employing Bayesian networks and probabilistic graphical models for modeling regulatory relationships | ||
| - **High-Dimensional Statistics**: Developing methods for analyzing genomic data with thousands of variables | ||
| - **Causal Inference**: Inferring causal relationships from observational biological data | ||
| - **Regularization Methods**: Implementing sparse learning techniques (Lasso, elastic net) for feature selection in genomic studies | ||
|
|
||
| ### Data Integration | ||
| - **Multi-Omics Integration**: Combining genomics, transcriptomics, proteomics, and other data types | ||
| - **Knowledge Graph Construction**: Building comprehensive biological knowledge bases | ||
| - **Cross-Species Analysis**: Leveraging model organism data to understand human biology | ||
| - **Heterogeneous Data Fusion**: Integrating structured databases with unstructured literature | ||
|
|
||
| ## Methodological Innovations | ||
|
|
||
| ### Signal Pathway Activity Prediction | ||
| Quantitative approaches for inferring signaling pathway activity from genome-wide expression data, enabling systems-level understanding of cellular responses. | ||
|
|
||
| ### Network-Based Methods | ||
| Leveraging biological network topology to prioritize disease genes, predict gene function, and identify therapeutic targets. | ||
|
|
||
| ### Dimensionality Reduction | ||
| Advanced techniques for visualizing and analyzing high-dimensional single-cell data while preserving biological structure. | ||
|
|
||
| ## Open-Source Tools & Software | ||
|
|
||
| We are committed to making our methods accessible to the broader research community through well-documented, user-friendly software packages. Our tools are designed with both computational biologists and wet-lab researchers in mind. | ||
|
|
||
| ## Collaborative Research | ||
|
|
||
| Our computational methods are developed in close collaboration with experimental biologists and clinicians, ensuring that our approaches address real-world biological questions and can be validated experimentally. | ||
| We prioritize open, reusable software so these methods are accessible beyond our lab. Representative tools include [AI-MARRVEL](https://github.com/LiuzLab/AI_MARRVEL), [NMRQNet](https://github.com/LiuzLab/NMRQNet), [SPA-STOCSY](https://github.com/LiuzLab/SPA-STOCSY), and [CRISPRcloud](https://crispr.nrihub.org/). Together, these resources support computational biologists, experimental teams, and translational collaborators working on real-world genomic and disease-focused questions. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,95 +1,14 @@ | ||
| --- | ||
| title: "Rigorous Validation & Data Science" | ||
| description: "Ensuring AI-driven discoveries are robust, reproducible, and biologically meaningful through comprehensive validation frameworks." | ||
| description: "Building robust, reproducible validation frameworks for autism and neurogenomics AI models." | ||
| order: 2 | ||
| cover: "../../assets/research-validation.jpg" | ||
| --- | ||
|
|
||
| # Rigorous Validation & Data Science | ||
|
|
||
| We are committed to ensuring that AI-driven findings are not just computationally impressive, but biologically robust and clinically actionable. Our validation framework combines large-scale data analysis, experimental validation, and rigorous statistical testing to ensure reproducibility and reliability. This work is supported by major grants including the **NIH Autism Data Science Initiative (ADSI)** and the **Silicon Valley Community Foundation**, reflecting the importance of rigorous, validated approaches in advancing biomedical discovery. | ||
| Our validation program is centered on one principle: AI models should be trusted only when they are independently replicated, clinically relevant, and transparent about their limits. We use cross-cohort benchmarking, robustness testing, and reproducibility-focused evaluation to determine whether model performance holds across institutions, populations, and data modalities, rather than only in the training environment. | ||
|
|
||
| ## Validation Frameworks | ||
| We recently secured support through the [NIH Autism Data Science Initiative (ADSI)](https://dpcpsi.nih.gov/autism-data-science-initiative) for **Validate ASD: Independent Multimodal Replication and Validation of Autism Data-Science Models**. This project applies a two-track validation strategy, combining intact model testing with code-blinded replication, and integrates structured clinical data from Texas Children's Hospital with genomic data from SPARK, metabolomics from BaBS, and environmental exposure models linked to placental neurodevelopmental biomarkers. The goal is to define where current autism models generalize, where recalibration is needed, and how fairness and reliability can be improved for real-world pediatric use. | ||
|
|
||
| ### Computational Validation | ||
| - **Cross-Validation & Benchmarking**: Rigorous testing against gold-standard datasets | ||
| - **Replication Studies**: Validating findings across independent cohorts | ||
| - **Robustness Analysis**: Ensuring predictions are stable across different conditions | ||
| - **Statistical Rigor**: Multiple testing correction and proper statistical frameworks | ||
|
|
||
| ### Experimental Validation | ||
| - **Functional Genomics**: Testing predictions using CRISPR and other perturbation methods | ||
| - **Model Organisms**: Validating human disease genes in Drosophila and other systems | ||
| - **Multi-Omics Confirmation**: Verifying findings across genomic, transcriptomic, and proteomic layers | ||
|
|
||
| ### Clinical Validation | ||
| - **Patient Cohort Studies**: Testing predictions in real patient populations | ||
| - **Longitudinal Data**: Validating temporal predictions across disease progression | ||
| - **Clinical Outcome Correlation**: Ensuring predictions align with clinical observations | ||
|
|
||
| ## Key Research Areas | ||
|
|
||
| ### Autism Data Science (NIH ADSI & SVCF Supported) | ||
|
|
||
| Supported by the **NIH Autism Data Science Initiative** and the **Silicon Valley Community Foundation**, our autism research program emphasizes rigorous, data-driven approaches: | ||
|
|
||
| - **Large-Scale Genomic Analysis**: Systematic analysis of autism genomic databases with strict quality control | ||
| - **Reproducible Pipelines**: Standardized, well-documented analysis workflows | ||
| - **Cross-Study Validation**: Confirming findings across multiple independent autism cohorts | ||
| - **Biological Mechanism Validation**: Testing autism risk gene functions experimentally | ||
| - **Phenotype-Genotype Correlation**: Rigorous statistical analysis of clinical-molecular relationships | ||
|
|
||
| ### Transcriptional Regulation | ||
| - **Transcription Factor Networks**: Mapping regulatory circuits that control gene expression | ||
| - **Chromatin Accessibility**: Analyzing open chromatin regions and their role in gene regulation | ||
| - **Enhancer-Promoter Interactions**: Understanding long-range regulatory elements | ||
| - **RNA Regulation**: Investigating post-transcriptional control mechanisms including microRNAs and RNA-binding proteins | ||
|
|
||
| ### Single-Cell Genomics | ||
| - **Cell Type Identification**: Defining cellular heterogeneity in complex tissues | ||
| - **Developmental Trajectories**: Reconstructing cell state transitions during development and disease | ||
| - **Spatial Transcriptomics**: Integrating gene expression with spatial context in tissues | ||
| - **Single-Cell Multi-Omics**: Profiling multiple molecular layers simultaneously in individual cells | ||
|
|
||
| ### Gene Function & Networks | ||
| - **Gene Regulatory Networks**: Inferring causal relationships between genes | ||
| - **Protein-Protein Interaction Networks**: Mapping physical and functional interactions | ||
| - **Pathway Analysis**: Identifying perturbed biological pathways in disease | ||
| - **Gene Prioritization**: Ranking candidate disease genes based on network properties | ||
|
|
||
| ### Multi-Omics Data Analysis | ||
| - **Genomics**: Analyzing DNA variation, structural variants, and genome architecture | ||
| - **Transcriptomics**: Profiling gene expression across conditions, tissues, and cell types | ||
| - **Epigenomics**: Examining DNA methylation, histone modifications, and chromatin states | ||
| - **Proteomics**: Integrating protein abundance and post-translational modifications | ||
|
|
||
| ## Experimental Validation | ||
|
|
||
| ### CRISPR Screening | ||
| - **Pooled CRISPR Screens**: Systematic perturbation of genes to identify functional relationships | ||
| - **CRISPRcloud**: Our cloud-based platform for analyzing CRISPR screening data | ||
| - **Screen Design & Analysis**: Optimizing experimental design and computational pipelines | ||
|
|
||
| ### Model Organisms | ||
| - **Drosophila Studies**: Leveraging the power of fly genetics for functional validation | ||
| - **Cross-Species Conservation**: Identifying evolutionarily conserved mechanisms | ||
| - **Phenotypic Characterization**: Detailed behavioral and molecular phenotyping | ||
|
|
||
| ## Neurological Disease Focus | ||
|
|
||
| ### Autism Spectrum Disorders | ||
| Investigating molecular mechanisms underlying autism, supported by grants from the NIH Autism Data Science Initiative and the Silicon Valley Community Foundation. | ||
|
|
||
| ### Rare Neurological Disorders | ||
| Studying genetic and molecular basis of rare neurodevelopmental and neurodegenerative conditions. | ||
|
|
||
| ### MeCP2 and Rett Syndrome | ||
| Understanding the role of MeCP2 and its regulatory networks in brain development and function. | ||
|
|
||
| ## Data Resources | ||
|
|
||
| ### MARRVEL Database | ||
| Our flagship resource integrating human genetic data with model organism information, facilitating functional annotation of the human genome and enabling researchers to leverage decades of model organism research for understanding human disease. | ||
|
|
||
| ### Transposable Element Analysis | ||
| Tools and methods for quantifying and analyzing transposable elements from RNA-seq data, revealing their roles in gene regulation and disease. | ||
| This work is designed to produce high-impact validation reports and community-accessible tools that raise standards for autism model evaluation and clinical deployment. Representative validation-related publications include [(Jeong and Liu, 2019)](https://linkinghub.elsevier.com/retrieve/pii/S0896-6273(19)30972-9), [(Raman et al., 2018)](https://www.nature.com/articles/s41467-018-05627-1), [(Yi and Liu, 2011)](https://www.nature.com/articles/nmeth.1830), and [(Wan et al., 2014)](https://pubmed.ncbi.nlm.nih.gov/24489963/). |
The CRISPRcloud URL here uses `https`, but elsewhere in the repo CRISPRcloud is referenced with `http://crispr.nrihub.org` (no TLS). To avoid shipping a potentially broken link, please verify that HTTPS is supported; otherwise switch this link to `http` (or update all references consistently if HTTPS is now available).
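
One way to run that check is a small script that requests the HTTPS variant of the link and reports whether it answers. This is a minimal sketch using only the Python standard library; the helper name `https_supported` and its `opener` parameter are hypothetical and not part of this repo. It assumes the server answers `HEAD` requests, which some servers reject even when HTTPS works.

```python
import urllib.error
import urllib.request


def https_supported(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the HTTPS variant of `url` answers with a 2xx/3xx status.

    `opener` is injectable so the check can be exercised without network access.
    """
    # Force the https scheme regardless of how the link was written.
    https_url = "https://" + url.split("://", 1)[-1]
    request = urllib.request.Request(https_url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError, TLS handshake failures, and timeouts
        # are all OSError subclasses.
        return False


if __name__ == "__main__":
    link = "http://crispr.nrihub.org"
    print(link, "supports HTTPS:", https_supported(link))
```

A `False` result only means the HTTPS request failed from the machine running the script, so it is worth confirming in a browser before changing the link.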