Computational Challenges in Precision Health

To be held on October 18, 2021 (1pm - 4pm EDT) – virtually in conjunction with MICRO 2021

Genomics is at an inflection point and is set to transform precision health over the next decade. It can help detect variants of infectious pathogens quickly, diagnose cancer early, identify genetic disorders, and assess disease risks.

Genome sequencing technology has been far outpacing Moore’s law over the last decade. While it cost nearly $3 billion to sequence the first human genome in 2001, just over the last decade, the production cost of sequencing data has plummeted from ten million dollars to one thousand dollars. In a year or so, it is expected to fall below a hundred dollars per genome. And it is not just cost. Oxford Nanopore is now selling palm-sized portable sequencers. Unfortunately, computing solutions have not kept pace with the growth in computing demand from these high-throughput, real-time, and portable sequencing technologies.

This tutorial seeks to highlight these growing computing needs in genomics and to identify opportunities for innovation in computing systems and architectures. It will also cover the landscape of commonly used bioinformatics tools and state-of-the-art architectures. The tutorial will feature keynote presentations from leading researchers who work on computational problems in precision health, as well as seminars by the tutorial organizers.

For more details on common genomics analysis pipelines and their computational bottlenecks, please check out GenomicsBench.

If you are interested in the computational challenges involved in rapid and portable testing for infectious viral diseases, please see our MICRO 2021 paper on SquiggleFilter, a hardware accelerator for portable virus detection that works directly on raw signals from Oxford Nanopore’s MinION device.

Tutorial Program (tentative)

| Time (in EDT) | Agenda | Speaker |
|---|---|---|
| 1:00 - 2:00 PM | Keynote: The assembly of human genomes | Heng Li, Harvard Medical School & Dana-Farber Cancer Institute |
| 2:00 - 2:30 PM | Reinventing Variant Calling in Genomics using Deep Learning | Ankit Sethia, NVIDIA |
| 2:30 - 2:45 PM | Break | |
| 2:45 - 3:15 PM | OpenOmics: A framework for accelerating digital biology research | Sanchit Misra, Intel Labs |
| 3:15 - 3:45 PM | Accelerators for Timely Genetic Testing | Satish Narayanasamy, U. Michigan |
| 3:45 - 4:00 PM | GenomicsBench: A Benchmark Suite for Genomics | Arun Subramaniyan, U. Michigan |

Speaker: Heng Li, Harvard Medical School and Dana-Farber Cancer Institute

Title: The assembly of human genomes

Abstract: De novo sequence assembly is the ultimate solution to the analysis of genome sequencing data in studying biological mechanisms, human evolution, and the diagnosis, prevention and treatment of diseases. In this talk, I will give an overview of current sequencing technologies, review the frontier of assembly algorithm development and discuss its future direction.

Bio: Dr. Li is an assistant professor of Biomedical Informatics at Harvard Medical School and Dana-Farber Cancer Institute. He studies advanced computational methods and mathematical models to help researchers understand biology better. He has worked on genomics, population genetics and phylogenetics. He led the design of the SAM format, created SAMtools, and developed BWA, minimap2 and other widely used computational tools.

Speaker: Ankit Sethia, NVIDIA

Title: Reinventing Variant Calling in Genomics using Deep Learning

Abstract: The promise of human genetics and precision medicine has led to an explosion in the number of DNA samples analyzed every year. It is common to find studies with cohort sizes of tens to hundreds of thousands of samples. At these scales, even minor inaccuracies can occur in large enough numbers to impact data analyses. In this talk, I will show how deep learning approaches deliver significant improvements over traditional statistical methods in variant calling, leading to high accuracy in mutation detection. However, the analysis of such a large number of samples, combined with the computational complexity of deep learning-based variant callers, creates a significant computational burden. I will also show how GPUs are used to address these computational challenges.

Bio: Dr. Ankit Sethia leads the development of NVIDIA Clara Parabricks, which accelerates critical and popular NGS genomic data analyses. He was the co-founder and CTO of Parabricks, where several popular NGS genomic analyses were accelerated by orders of magnitude. The Parabricks products were used for data analysis in several large population-scale projects, leading to the acquisition of Parabricks by NVIDIA in early 2020. He received his PhD from the Department of Computer Science and Engineering at the University of Michigan in 2015.

Speaker: Sanchit Misra, Intel Labs

Title: OpenOmics: A framework for accelerating digital biology research

Abstract: We are in the epoch of Digital Biology. Datasets and compute requirements for Digital Biology are poised to dwarf everything else on the planet, making them a leading indicator for data-centric workloads. Digital Biology is fueled by the convergence of three revolutions: 1) measurement of biological systems at high resolution, resulting in massive multi-modal, multi-scale, unstructured, distributed data; 2) novel data science (AI and data management) techniques applied to this data; and 3) widespread cloud use, enabling massive public data repositories, large collaborative projects and consortia. The enormous quantities of data being produced require that biologists have high-performance computing tools available to run data science experiments quickly. Researchers working in biology aren’t interested in HW or SW. They are interested in a framework, i.e., a tightly integrated set of tools, that they can use to get their work done. An important criterion for this framework is that it should be open source so that anyone can customize it for slight variations in use cases. We have been working towards developing this framework. In this talk, I will present the first version of OpenOmics, our open-source, high-throughput framework for accelerating digital biology research that brings together example reference applications, biological compute motifs, AI and data management to enable productive performance.

Bio: Dr. Sanchit Misra is a senior research scientist who leads the computational biology/HPC research efforts at Intel Labs. Before joining Intel Labs, he earned his PhD in high performance computational biology from Northwestern University. Dr. Misra has ~15 years of experience in genomics, machine learning, and scaling applications on large clusters/supercomputers, extracting every iota of performance from the hardware and driving hardware improvements. Over the years, he has led Intel’s collaborations on various computational biology projects with domain experts (WGS with Broad Institute, Harvard Medical School, Wellcome Sanger Institute; transcriptomics with BGI), AI experts (combining HPC and AI for medical research with MILA; Graph Neural Networks with Stanford; applying learned indexes to genomics with MIT), HPC experts (Georgia Tech, Purdue University, Virginia Tech, IIT Bombay and Indian Institute of Science) and architecture experts (University of Michigan).

Speaker: Satish Narayanasamy, University of Michigan

Title: Accelerators for Timely Genetic Testing

Abstract: Sequencing technologies have been far outpacing Moore’s law, leading to over three orders of magnitude improvement in throughput and cost in just the last decade. They are now robust enough to be useful for numerous clinical diagnoses. To realize this potential, however, we need orders of magnitude higher efficiency than general-purpose processors.

This talk covers recent advancements in accelerated computing systems made as part of Michigan’s Precision Health Initiative. These include solutions that can enable applications ranging from point-of-care molecular testing for viral strains, to intra-operative cancer diagnosis, to one-day whole genome sequencing (WGS).

Bio: Satish Narayanasamy is a professor of Computer Science and Engineering at the University of Michigan, Ann Arbor. He works at the intersection of architecture, systems, and program analysis. His current focus is on developing efficient and privacy-preserving computing solutions to enable precision health.

Speaker: Arun Subramaniyan, University of Michigan

Title: GenomicsBench: A Benchmark Suite for Genomics

Abstract: Over the last decade, advances in high-throughput sequencing and the availability of portable sequencers have enabled fast and cheap access to genetic data. For a given sample, sequencers typically output fragments of the DNA in the sample. Depending on the sequencing technology, the fragments range from 150-250 bases long at high accuracy to a few tens of thousands of bases long at much lower accuracy. Sequencing data is now being produced at a rate that far outpaces Moore’s law and poses significant computational challenges for commodity hardware. To meet this demand, software tools have been extensively redesigned, and new algorithms and custom hardware have been developed to deal with the diversity in sequencing data. However, a standard set of benchmarks that captures the diverse behaviors of these recent algorithms and can facilitate future architectural exploration is lacking. To that end, we present the GenomicsBench benchmark suite, which contains 12 computationally intensive data-parallel kernels drawn from popular bioinformatics software tools. It covers the major steps in short- and long-read genome sequence analysis pipelines, such as basecalling, sequence mapping, de novo assembly, variant calling and polishing.

Bio: Arun Subramaniyan is a Ph.D. student at the University of Michigan, advised by Prof. Reetuparna Das. His dissertation research is on developing efficient algorithms and customized computing systems for pattern matching, specifically focusing on applications in precision health. He is also interested in in-/near-memory computing architectures and hardware reliability.


Organizers

Reetuparna Das (U. Michigan)

Sanchit Misra (Intel Labs)

Satish Narayanasamy (U. Michigan)