December 10

Advancing protein sequence analysis with protein language models
11:30 AM - 1:00 PM (America/New_York)

Abstract
Protein language models (PLMs) have emerged as transformative tools for understanding and interpreting protein sequences, enabling advances in structure prediction, functional annotation, and variant effect assessment from sequence alone. Yet realizing their full potential requires both algorithmic innovation and a deeper understanding of their capabilities and limitations. In this talk, I will present several recent developments that advance PLM-based protein sequence analysis along these dimensions. First, I will introduce Bag-of-Mer (BoM) pooling, a biologically inspired strategy for aggregating amino acid embeddings that can capture both local motifs and long-range interactions, improving performance on diverse tasks such as protein activity prediction, remote homology detection, and peptide–protein interaction prediction. Next, I will describe ARIES, a highly scalable multiple sequence alignment algorithm that leverages PLM embeddings to achieve superior accuracy even in low-identity regions where traditional methods struggle. Finally, time permitting, I will discuss insights into PLM performance, including the roles of training data, sequence fit, and model architecture. Together, this work illustrates how PLMs can both power and reshape core computational biology tasks, while providing guidance for more effective and biologically grounded model development.

Speaker Bio
Mona Singh is the Wang Family Professor in Computer Science at Princeton University, where she is jointly appointed in the Computer Science department and the Lewis-Sigler Institute for Integrative Genomics. Mona obtained her AB and SM degrees at Harvard University, and her PhD at MIT, all three in Computer Science. She did postdoctoral work at the Whitehead Institute for Biomedical Research. She received the Presidential Early Career Award for Scientists and Engineers (PECASE). She is a Fellow of the International Society for Computational Biology, a Fellow of the Association for Computing Machinery, and a Fellow of the American Institute for Medical and Biological Engineering. She is currently Editor-in-Chief of the Journal of Computational Biology. She has been program committee chair for several major computational biology conferences, including ISMB (2010), WABI (2010), ACM-BCB (2012), and RECOMB (2016). She was Chair of the NIH Modeling and Analysis of Biological Systems Study Section (2012-2014) and a council member of the Computing Community Consortium (2021-2024), and is currently on the steering committee for WABI.

TBD
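The abstract does not say how Bag-of-Mer pooling is computed. To make the underlying problem concrete (turning an L x D matrix of per-residue PLM embeddings into one fixed-length vector), the Python sketch below contrasts the usual mean-pooling baseline with a hypothetical k-mer window pooling that averages embeddings within sliding windows of k residues and then summarizes those window vectors. The function names, the window size k, and the mean-plus-max summary are illustrative assumptions, not the published BoM method, and random numbers stand in for real PLM embeddings.

```python
import numpy as np

def mean_pool(emb: np.ndarray) -> np.ndarray:
    """Baseline: average per-residue embeddings (L x D) into one D-vector."""
    return emb.mean(axis=0)

def kmer_window_pool(emb: np.ndarray, k: int = 3) -> np.ndarray:
    """Hypothetical k-mer pooling (illustration only, NOT the BoM method).

    1. Average embeddings inside each sliding window of k consecutive residues,
       giving one vector per k-mer (a local-motif summary).
    2. Concatenate the mean and the element-wise max over all k-mer vectors,
       so one strong local motif is not washed out by global averaging.
    Returns a vector of length 2 * D.
    """
    L, D = emb.shape
    if not 1 <= k <= L:
        raise ValueError("k must satisfy 1 <= k <= sequence length")
    # Prefix sums let each window sum be computed as a single subtraction.
    csum = np.vstack([np.zeros((1, D)), np.cumsum(emb, axis=0)])
    kmers = (csum[k:] - csum[:-k]) / k  # shape (L - k + 1, D)
    return np.concatenate([kmers.mean(axis=0), kmers.max(axis=0)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(50, 8))  # stand-in for a 50-residue, 8-dim PLM embedding
    print(mean_pool(emb).shape)              # (8,)
    print(kmer_window_pool(emb, k=5).shape)  # (16,)
```

Mean pooling discards residue order entirely; window-based pooling keeps some local context at the cost of a choice of k, which is the kind of trade-off a pooling strategy for PLM embeddings has to make.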

December 03

Towards virtual cells: the need for actionable, robust perturbation models
11:30 AM - 1:00 PM (America/New_York)

Abstract
Computational cell biology is evolving from descriptive atlases to predictive, actionable models: from mapping what cells are to simulating what they do. In this talk, I will outline progress toward virtual cells, focusing on machine learning approaches that enable robust perturbation modeling. Among others, I will present scConcept, a framework that, unlike large-scale foundation models such as Geneformer, learns in the latent space through control-based objectives. Rather than passively embedding cellular states, scConcept explicitly models transitions between them, capturing how cells move through gene-expression space in response to context and perturbation. Building on this foundation, I will introduce CellFlow, a generative perturbation model that predicts how interventions (such as drugs, cytokines, or gene edits) reshape cellular phenotypes. By learning causal directions of change, CellFlow enables in silico experimentation and virtual screening of differentiation protocols. Together, these developments point toward virtual cells: computational counterparts capable of robustly predicting and designing biological behavior.

Speaker Bio
Prof. Fabian Theis is internationally recognized for pioneering work at the interface of artificial intelligence, machine learning, and biomedicine. As Head of the Computational Health Center at Helmholtz Munich and Chair for Mathematical Models of Biological Systems at the Technical University of Munich, he leads cutting-edge research on multimodal data integration, single-cell and spatial omics, and AI-powered modeling of cell states in health and disease. A founding force behind Helmholtz.AI and co-director of several national and European AI initiatives, Theis plays a key role in shaping the biomedical AI landscape. He is a core contributor to the Human Cell Atlas and has driven the development of widely adopted computational tools in the life sciences. His achievements have been recognized with numerous honors, including the Gottfried Wilhelm Leibniz Prize (2023), the ISCB Innovator Award (2025), and an ERC Advanced Grant (2022). In 2025, he was elected to the German National Academy of Sciences Leopoldina and appointed Chair of the Bavarian AI Council. Beyond academia, Theis actively advises biotech companies and drives translational AI research toward clinical applications and precision medicine.

TBD

November 19

AI for Genomes: Rethinking de novo Assembly

Genome Institute of Singapore / University of Zagreb
11:30 AM - 1:00 PM (America/New_York)

Abstract
Accurately resolving genomic paths in assembly graphs is a key challenge in de novo genome assembly, especially in the presence of repeats that create tangles and fragmentation. We present a geometric deep learning framework that learns directly from graph structure, bypassing conventional heuristics and exploiting problem symmetries to achieve PacBio HiFi reconstructions with state-of-the-art quality and contiguity. The same approach can be applied to other sequencing technologies. Here, we will present results for haploid and diploid genomes. Our method performs robustly on both simulated and real datasets and will be able to utilise telomere-to-telomere reference expansion. By decoupling path inference from hard-coded strategies and generalising across species and genomic architectures, this framework opens the door to reconstructing highly complex genomes, including those with high ploidy or extensive structural variation.

Speaker Bio
Mile Šikić is a group leader at the Genome Institute of Singapore and a Professor of Computer Science at the University of Zagreb, Croatia. Throughout his scientific career, he has specialized in developing algorithms and AI methods for genomics. His laboratory has created several cutting-edge tools and models, including the HERRO error correction tool, the RiNALMo large RNA language model, the Racon consensus tool, the Raven de novo assembler, and the Edlib sequence aligner. Recently, his lab has shifted its focus toward integrating AI into the de novo assembly process and developing AI models to make RNA druggable. In the first decade of his career, Dr Šikić worked on various industry projects related to computer and mobile networks. He is also an accomplished entrepreneur who has founded several ventures, including a hedge fund.

TBD

November 12

From Networks to Subtypes: Statistical Frameworks for Mechanistic Insights into Complex Disease Genetics
11:30 AM - 1:00 PM (America/New_York)

Abstract
Complex diseases often arise from diverse genetic mechanisms acting through interconnected pathways, and they frequently encompass multiple hidden subtypes that share similar diagnostic features but have distinct genetic origins. This raises two key questions: how can we move beyond single-gene associations to uncover mechanistic links to disease, and how can we identify latent, clinically meaningful subtype structure within complex disorders? Traditional analyses struggle to answer both. In this talk, I will present two approaches that bring a systems perspective to human genetics. First, I will introduce NERINE, a network-aware rare variant testing framework that integrates gene-gene interaction topology into a hierarchical model. By embedding human genetic variation within gene networks, NERINE enables competitive evaluation of biological hypotheses, achieving higher power and interpretability than traditional burden tests. Applied across biobanks to both canonical pathway databases and experimentally derived networks, NERINE reveals novel disease mechanisms in breast cancer, type II diabetes, cardiovascular disease, and Parkinson’s disease (PD). In PD, rare variant associations identified by NERINE converge with a genome-scale CRISPRi screen in iPSC-derived neuronal models of synucleinopathies, revealing a mechanistic role for PRL in the α-synuclein stress response. Next, I will present Checkers, a new method for detecting genetic subtypes of complex diseases directly from genotype data. Using eigenanalysis under a liability threshold model, I will show that when subtypes arise from distinct underlying liabilities, a mixed disease cohort exhibits predictable low-dimensional patterns of sample relatedness at causal variants that can be detected without phenotypic or omics covariates. Checkers couples a novel matrix transformation, which corrects for sample kinship and linkage disequilibrium in controls, with a statistical test on the eigenvalue distribution to estimate the number of disease subtypes. Our method effectively disentangles mixtures of binarized quantitative traits in the UK Biobank and provides a general framework for understanding disease heterogeneity. Together, these studies illustrate how network-aware and structure-aware computational frameworks can unify experimental and population-level perspectives, illuminating both the mechanistic architecture and the hidden subtypes of human disease.

Speaker Bio
Sumaiya Nazeen, PhD, is a postdoctoral research fellow jointly advised by Professors Shamil Sunyaev and Vikram Khurana at Harvard Medical School’s Department of Biomedical Informatics and Brigham and Women’s Hospital’s Department of Neurology. Her research focuses on developing computational and statistical models to interpret the genetic basis of complex human diseases, with a particular emphasis on neurodegenerative disorders. By integrating large-scale genomic and functional omics datasets, her work aims to uncover the molecular mechanisms through which genetic variants influence disease risk and heterogeneity. Her broader goal is to design statistical and machine learning frameworks that bridge human genetics and experimental biology, advancing mechanistic understanding of disease and realizing the promise of precision medicine. Dr. Nazeen is an International Fulbright Science & Technology Fellow, a Ludwig Center Fellow, and a recipient of the Sudarsky Scholar award from the Movement Disorders Division at Brigham and Women’s Hospital. She earned her Ph.D. in Computer Science from MIT in 2019 under the supervision of Prof. Bonnie Berger.

TBD
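The abstract does not specify Checkers' matrix transformation or its eigenvalue test. As a generic illustration of the underlying idea (counting low-dimensional structure in a sample-relatedness matrix from its eigenvalues), the Python sketch below uses a common random-matrix heuristic: eigenvalues that exceed the Marchenko-Pastur upper edge (1 + sqrt(n/p))^2 expected for pure noise are counted as signal. The toy data are Gaussian features with planted, hypothetical subgroup-specific mean shifts, not genotypes under a liability threshold model, and no kinship or linkage-disequilibrium correction is applied; this is not the Checkers procedure.

```python
import numpy as np

def count_spikes(X: np.ndarray) -> int:
    """Count eigenvalues of the sample-relatedness matrix K = X X^T / p that
    exceed the Marchenko-Pastur upper edge (1 + sqrt(n / p))**2.

    X: n_samples x p_features, columns standardized to mean 0 and variance 1.
    Under pure i.i.d. noise, eigenvalues of K concentrate below this edge
    (for large n and p), so eigenvalues above it are treated as structure.
    """
    n, p = X.shape
    K = X @ X.T / p
    eigvals = np.linalg.eigvalsh(K)  # K is symmetric, so eigvalsh applies
    edge = (1.0 + np.sqrt(n / p)) ** 2
    return int(np.sum(eigvals > edge))

def simulate_subgroups(n=200, p=2000, n_groups=3, shift=0.5, seed=1):
    """Toy Gaussian data (not genotypes): each sample belongs to one of
    n_groups hypothetical subgroups, each with its own random mean shift."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_groups, size=n)
    centers = rng.normal(scale=shift, size=(n_groups, p))
    X = rng.normal(size=(n, p)) + centers[labels]
    # Standardize each column to mean 0 and variance 1.
    return (X - X.mean(axis=0)) / X.std(axis=0)

if __name__ == "__main__":
    print(count_spikes(simulate_subgroups()))
```

With three planted subgroups and these settings, the count should typically be 2 rather than 3: centering the columns removes the overall mean and leaves n_groups - 1 between-group directions. Real genotype data would additionally need the kinship and linkage-disequilibrium corrections the abstract mentions before such a threshold is meaningful.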

November 05

October 29

Exploring 100 million years of mammalian evolution for the origins of exceptional traits
11:30 AM - 1:00 PM (America/New_York)

Abstract
The Zoonomia Project, one of the largest comparative genomics initiatives ever undertaken, compared 240 mammalian species spanning over 100 million years of evolutionary history. This work revealed that at least 11% of the human genome is evolutionarily constrained, and that these constrained bases are more enriched for variants explaining common disease heritability than any other functional annotation. Yet nearly half of the most highly constrained bases remain unannotated in existing datasets, underscoring how much of the genome’s regulatory landscape remains unexplored. Building on this foundation, we are integrating the “common garden” framework from classical ecology with modern genomics to assay and compare cellular responses across diverse mammals. This effort includes RNA-seq and ATAC-seq profiling across 12 species and seven experimental states varying in temperature, oxygen, and glucose levels. We can identify molecular responses shared across mammals and those unique to species with remarkable physiological adaptations, such as camels that thrive in extreme heat, seals that dive deeply without suffering oxygen damage, and bats that tolerate extreme blood sugar fluctuations. Uncovering the genomic mechanisms that enable these exceptional traits may reveal new strategies for improving human health.

Speaker Bio
Elinor Karlsson, PhD, is an associate professor in Genomics and Computational Biology at the UMass Chan Medical School, and director of Vertebrate Genomics at the Broad Institute of MIT and Harvard. Her research combines large-scale comparative genomics, new technology, and community science to investigate diseases and discover the origins of exceptional mammalian traits. Dr. Karlsson’s research includes the Zoonomia project, an international effort to compare the genomes of over 240 mammals (from the African Yellow-spotted Rock Hyrax to the Woodland Dormouse) and identify segments of DNA that are important for survival and health. Dr. Karlsson also has a special interest in pet genetics. Her international Darwin’s Ark project invites dog and cat owners to enroll their pets in an open data research project exploring the genetic basis of behavior, as well as diseases such as cancer. Elinor received her B.A. in biochemistry/cell biology and her B.F.A. (Bachelor of Fine Arts) from Rice University, and earned her Ph.D. in bioinformatics from Boston University. She was a postdoctoral fellow at Harvard University before starting her research group at UMass Chan in 2014, where she holds the Dr. Eileen L. Berman and Stanley I. Berman Foundation Chair in Biomedical Research.

TBD

October 22

Creating the next generation of genome analysis tools with deep learning
11:30 AM - 1:00 PM (America/New_York)

Abstract
Deep learning is fueling a revolution in genomics, enabling the development of a new generation of analysis tools that offer unprecedented accuracy. This talk presents a suite of deep learning models designed to address fundamental challenges in variant calling and in generating high-quality genome assemblies. We begin with DeepVariant, a convolutional neural network that redefined the standard for germline variant calling, and its extension, DeepSomatic, which adapts this technology to the critical task of identifying low-frequency somatic mutations in cancer genomes. Moving from variant analysis to genome construction, we introduce DeepPolisher. This tool leverages a Transformer-based architecture to significantly reduce errors in genome assemblies, providing a more accurate and reliable foundation for downstream research. Finally, we explore the future of variant calling by integrating these methods with emerging pangenome references. We demonstrate how a pangenome-aware approach allows for a more comprehensive survey of human genetic diversity, resolving variation in previously intractable regions of the genome. Together, these tools represent a cohesive framework for the next generation of genomic analysis, transforming our ability to accurately read and interpret the code of life.

Speaker Bio
Kishwar Shafin is a Research Scientist at Google, where he specializes in developing novel computational methods for accurate genome analysis. His research is at the intersection of genomics, bioinformatics, and deep learning, with a strong focus on building deep learning-based tools that push the boundaries of genome assembly and variant calling. He has been a key developer on several impactful projects, including DeepVariant, a highly accurate germline variant caller, and its extension DeepSomatic for detecting cancer mutations. Dr. Shafin has also played a pivotal role in developing tools for long-read sequencing technologies, such as DeepPolisher for improving the quality of genome assemblies. His work has been instrumental in the completion of the first telomere-to-telomere assembly of a human genome and in the work of the Human Pangenome Reference Consortium, contributing to a more comprehensive and equitable representation of human genetic diversity.

TBD

October 15

From Data to Knowledge: Integrating Clinical and Molecular Data for Predictive Medicine
11:30 AM - 1:00 PM (America/New_York)

Abstract
Alzheimer’s disease (AD) remains one of the most pressing medical challenges, with limited therapeutic options and heterogeneous disease trajectories complicating diagnosis and treatment. Recent advances in computational biology and artificial intelligence (AI), together with the availability of rich molecular and clinical data, offer new opportunities to address these challenges by integrating molecular, clinical, and systems-level insights. In our recent studies, we developed a cell-type-directed, network-correcting approach to identify and prioritize rational drug combinations for AD, enabling targeted modulation of disease-relevant pathways across distinct cellular contexts (Li et al., Cell 2025). Complementarily, by leveraging large-scale electronic medical records (EMRs) integrated with biological knowledge networks, we demonstrated the ability to predict disease onset and progression while uncovering mechanistic insights into AD heterogeneity (Tang et al., Nature Aging 2024). Together, these approaches illustrate the power of combining real-world clinical data, knowledge networks, and systems pharmacology to advance precision medicine for AD. This work highlights a paradigm shift toward AI-enabled, data-driven strategies that bridge molecular discovery and clinical application, ultimately informing novel therapeutic interventions and improving patient care.

Speaker Bio
Marina Sirota is currently a Professor and the Interim Director of the Bakar Computational Health Sciences Institute at UCSF. Previously, she worked as a Senior Research Scientist at Pfizer, where she focused on developing precision medicine strategies in drug discovery. She completed her PhD in Biomedical Informatics at Stanford University. Dr. Sirota’s research experience in translational bioinformatics spans nearly 20 years, during which she has co-authored over 170 scientific publications. Her research interests lie in developing computational integrative methods and applying these approaches in the context of disease diagnostics and therapeutics, with a special focus on women’s health. The Sirota laboratory is funded by NIA, NLM, NIAMS, Pfizer, March of Dimes, and the Burroughs Wellcome Fund. As a young leader in the field, she received the AMIA Young Investigator Award in 2017. She leads the March of Dimes Prematurity Research Center at UCSF and co-directs ENACT, a center to study precision medicine for endometriosis. Dr. Sirota is also the founding director of the AI4ALL program at UCSF, which aims to introduce high school girls to applications of AI and machine learning in biomedicine.

TBD

October 08

AI integrating imaging and genetics to understand human evolution, development, aging, and disease
11:30 AM - 1:00 PM (America/New_York)

Abstract
Imaging has been the primary means of diagnosing and tracking the progression of many diseases for decades, but it has largely been collected in isolation. Recently, through the advent of large-scale biobanks, this rich type of data has become linked with genetic and electronic health record data at the level of tens of thousands of individuals, providing an unprecedented ability to study the relationship between genotype and phenotype directly in humans. I will discuss our group's work leveraging >1.2M medical images (DXA, MRI, and ultrasound) from ~60,000 individuals across multiple views of the heart, brain, skeleton, liver, and pancreas to provide new insights in four domains of biological science: (a) understanding the evolution of the human skeletal form, which underlies our ability to be bipedal; (b) examining the classical developmental biology question of the genetic basis of left-right symmetry; (c) building biological aging clocks to study mechanisms of age acceleration/deceleration and to identify gene targets to combat aging; and (d) developing multimodal AI that combines imaging, genetics, and metabolomics to predict 10-year disease incidence for common complex diseases.

Speaker Bio
After initially training in Electrical Engineering, focusing on computer vision and information theory, Vagheesh completed a Master's in Biostatistics under Curtis Huttenhower and then moved to the University of Cambridge for a PhD in Genetics with Chris Tyler-Smith and Richard Durbin. He returned to Harvard as a postdoc with David Reich and Nick Patterson, and since 2020 he has been an Assistant Professor in the Departments of Integrative Biology and of Statistics and Data Science at the University of Texas at Austin.

TBD
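The abstract mentions biological aging clocks and age acceleration without defining them. A common generic recipe, sketched below in Python, is to regress chronological age on measured features, treat the prediction as "biological age", and define age acceleration as the residual of predicted age after regressing it on chronological age, so that the measure is not confounded by the prediction's regression toward the mean. The ridge penalty, the function names, and the synthetic stand-in features are assumptions for illustration; the group's imaging-based clocks are not described here.

```python
import numpy as np

def fit_clock(features: np.ndarray, age: np.ndarray, ridge: float = 1.0) -> np.ndarray:
    """Ridge regression of chronological age on per-individual features.

    Returns coefficients [intercept, w_1, ..., w_D]; the intercept is not penalized.
    """
    X = np.column_stack([np.ones(len(age)), features])
    penalty = ridge * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X.T @ X + penalty, X.T @ age)

def predict_age(features: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """Apply the fitted clock to obtain predicted ("biological") age."""
    return coef[0] + features @ coef[1:]

def age_acceleration(pred_age: np.ndarray, age: np.ndarray) -> np.ndarray:
    """Residual of predicted age after a linear fit on chronological age.

    Positive values: predicted older than peers of the same chronological age;
    negative values: predicted younger.
    """
    slope, intercept = np.polyfit(age, pred_age, 1)
    return pred_age - (slope * age + intercept)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    age = rng.uniform(40, 80, size=500)
    # Synthetic stand-in for image-derived traits: linear in age plus noise.
    features = np.outer(age, [0.5, -0.2, 1.0]) + rng.normal(size=(500, 3))
    coef = fit_clock(features, age)
    acc = age_acceleration(predict_age(features, coef), age)
    print(f"mean {acc.mean():.2e}, corr with age {np.corrcoef(acc, age)[0, 1]:.2e}")
```

By construction the residual has mean zero and is uncorrelated with chronological age. In practice a clock would be trained and applied on separate individuals (for example via cross-validation) to avoid overfitting; this sketch fits and evaluates on the same synthetic data for brevity.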

October 01

Furthering our understanding of human genetic variation: the human pangenome reference project second release
11:30 AM - 1:00 PM (America/New_York)

Abstract
Human genomics has relied on a single reference genome for the last twenty years. This reference genome is a cornerstone of much of what we do in genomics, but it cannot, by definition, represent the variation present in the human population, and as a reference it introduces a pervasive bias into genomic analyses. I will survey our recent efforts, through the Human Pangenome Reference Consortium, to build and use a reference pangenome: a collection of extremely high-quality reference genomes related to one another by a consensus genome alignment, which we intend as a replacement for the reference genome.

Speaker Bio
Dr. Benedict Paten is a professor in the Department of Biomolecular Engineering at the University of California, Santa Cruz, and associate director of the UC Santa Cruz Genomics Institute. He received his Ph.D. in computational biology from the University of Cambridge and the European Molecular Biology Laboratory. Dr. Paten’s work is broadly focused on the growing field of computational genomics. He is involved in a number of large-scale efforts and is currently a PI of the Human Cell Atlas Data Platform, the NHGRI AnVIL, HuBMAP, GENCODE, and the Human Pangenome Reference Consortium. Through these efforts, he is helping to develop methods that further our ability to assay and understand genomes.

TBD

September 24

Discovering New Biochemistry from Biological Conflicts
11:30 AM - 1:00 PM (America/New_York)

Abstract
Biological replicators are locked in deeply intertwined genetic conflicts with each other. Using comparative genomics, protein sequence and structure analysis, and evolutionary investigations, my lab has uncovered a staggering diversity of molecular armaments and of mechanisms regulating their deployment, collectively termed biological conflict systems. These include toxins used in interorganismal interactions and a host of mechanisms involved in self/nonself discrimination, especially in the context of host-selfish element conflicts. Our studies have helped identify shared syntactical features in the organizational logic of biological conflict systems. These principles can be exploited to discover new conflict systems through computational analyses. Further, we find that across the range of biological organization, from intragenomic conflicts to interorganismal conflicts, a circumscribed set of effector protein domain families is deployed, targeting genetic information flow through the Central Dogma, certain membranes, and key molecules like NAD+ and NTPs. This has led to significant advances in discovering the biochemistry of these systems and has furnished new biotechnological reagents for genome editing, sequencing, and beyond. I’ll discuss this using specific examples of toxins in interorganismal conflict and effectors in antiviral immunity.

Speaker Bio
I obtained my PhD (computational biology) in 1999 from Texas A&M University, though I did most of my dissertation research at the NIH. Resuming my research as a staff scientist at the NLM/NIH in 2000, I started my own lab there at the beginning of 2002. My research encompasses the evolutionary classification of proteins, the prediction of novel biochemical activities, and the inference of organismal biology from comparative sequence, structure, and genome analysis. My research team and I have made several discoveries: predicting previously unknown enzymatic activities and ligand interactions of numerous protein domains, identifying novel transcription factors, and clarifying the interplay between natural selection and structural/genomic constraints in shaping the diversity of protein domains. The fundamental contributions of my lab include the discovery of key proteins participating in RNA biochemistry, protein stability, DNA modification, toxin systems involved in biological conflicts, apoptosis, and novel immune mechanisms (e.g., key components of the CRISPR systems), and providing the theoretical framework for their functioning. I have developed the synthetic hypothesis on the role of biological conflicts in shaping biochemical innovation and major evolutionary transitions. Trainees (postdocs and students) from my lab have gone on to become faculty at institutions around the world or to work in industry.

TBD

September 17

Discovering Safe, Effective Drugs via Machine Learning and Simulation of 3D Structure
11:30 AM - 1:00 PM (America/New_York)

Abstract
Recent years have seen dramatic advances in both experimental determination and computational prediction of macromolecular structures. These structures hold great promise for the discovery of highly effective drugs with minimal side effects, but structure-based design of such drugs remains challenging. I will describe recent progress toward this goal, using both atomic-level molecular simulations and machine learning on three-dimensional structures.

Speaker Bio
Ron Dror is the Cheriton Family Professor of Computer Science in the Stanford Artificial Intelligence Lab and a professor, by courtesy, of Structural Biology and of Molecular and Cellular Physiology at the Stanford School of Medicine. He leads a research group that uses molecular simulation and machine learning to elucidate biomolecular structure, dynamics, and function, and to guide the development of more effective medicines. He collaborates extensively with experimentalists in both academia and industry. Before moving to Stanford, he served as second-in-command of D. E. Shaw Research, a hundred-person company, having joined as its first hire. Dr. Dror earned a PhD in Electrical Engineering and Computer Science at MIT and an MPhil in Biological Sciences as a Churchill Scholar at the University of Cambridge.

This talk is part of the MIT Bioinformatics Seminar Series.

TBD