09/07/2023 |
Sagi Snir |
University of Haifa |
Title: Assembling the Tree of Life in Light of Conflicting Signals
Abstract: The genomic era has reached the point where tasks that seemed imaginary only a decade ago are now within reach. One such task, among the most fundamental in biology, is the inference of the evolutionary history of tens of thousands of species, sometimes of very close origin. Such a history is depicted as a tree and is called a phylogeny or a phylogenetic tree. In prokaryotes, however, the validity of such a tree has been seriously questioned. This is primarily due to the phenomenon of horizontal gene transfer (HGT), the transfer of genetic material not through lineal descent. HGT links distantly related species, obfuscating the vertical signal of evolution. Quartets, trees over four leaves, are the minimal informational phylogenetic unit and stand at the heart of almost any phylogenetic task, in particular when conflicting signals arise. Moreover, and perhaps because of their rudimentary role, quartet-based phylogenetic approaches offer problems that are combinatorially and statistically clean yet highly relevant to topical questions in evolution. In a series of works we have developed a graph-theoretic approach to the quartet amalgamation task. Our approach is based on a divide-and-conquer algorithm, where the divide step solves a MaxCut problem in a special graph we construct. Besides providing a very fast and accurate heuristic for phylogenetics, the cleanness of this approach has also advanced the problem of quartet compatibility, open for more than thirty years. Additionally, I will survey HGT-geared quartet approaches that provide provable guarantees for popular approaches in genomics and metagenomics. |
Iribe Room 4105 |
10/05/2023 |
Hari Subrahmaniam Muralidharan (Pop Lab) |
UMD |
Title: (RIPS talk) The Impact of Transitive Annotation on the Training of Taxonomic Classifiers
Abstract: A common task in the analysis of microbial communities involves assigning taxonomic labels to the sequences derived from organisms found in the communities. Frequently, such labels are assigned using machine learning algorithms that are trained to recognize individual taxonomic groups based on training data sets that comprise sequences with known taxonomic labels. Ideally, the training data should rely on labels that are experimentally verified, since formal taxonomic labels require knowledge of physical and biochemical properties of organisms that cannot be directly inferred from sequence alone. However, the labels associated with sequences in biological databases are most commonly computational predictions, which themselves may rely on computationally generated data, a process commonly referred to as “transitive annotation”. In this manuscript we explore the implications of training a machine learning classifier (the Ribosomal Database Project’s Bayesian classifier in our case) on data that itself has been computationally generated. We demonstrate that even a few computationally generated training data points can significantly skew the output of the classifier to the point where entire regions of the taxonomic space can be disturbed. |
Iribe Room 4105 |
11/09/2023 |
Jason Ernst |
University of California, Los Angeles |
Title: Computational Approaches for Analyzing the Epigenome and Non-coding Genome
Abstract: Large-scale collections of epigenomic data are a powerful source of information for annotating and understanding genomes, including their vast non-coding portions. However, making full use of such data requires the development of novel computational methods. In this talk I will give an overview of some computational methods that we have developed for modeling and analyzing epigenomic data, including ChromHMM for chromatin state modeling, ChromImpute for epigenome imputation, ChromGene for gene-based epigenome annotations, and LECIF for scoring evidence of conservation at the functional genomics level. |
Iribe Room 4105 |