SPLASH 2019
Sun 20 - Fri 25 October 2019 Athens, Greece
Thu 24 Oct 2019 14:00 - 14:30 at Room 1 - DSLs and Parsing Chair(s): Eric Van Wyk

The scope and scale of biological data are increasing at an exponential rate, as technologies like next-generation sequencing become radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from \$100 million to nearly \$100, a factor of roughly $10^6$, and the amount of data to be analyzed has increased proportionally. Yet, as Moore's Law continues to slow, computational biologists can no longer rely on computing hardware to compensate for the ever-increasing size of biological datasets. In a field where many researchers focus primarily on biological analysis rather than computational optimization, the unfortunate solution to this problem is often to simply buy larger and faster machines.

Here, we introduce \textbf{Seq}, the first language tailored specifically to bioinformatics, which marries the ease and productivity of Python with C-like performance. Seq starts with a subset of Python (and is in many cases a drop-in replacement), yet also incorporates novel bioinformatics- and computational genomics-oriented data types, language constructs and optimizations. Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. We evaluated Seq on several standard computational genomics tasks: reverse complementation, $k$-mer manipulation, sequence pattern matching and large genomic index queries. Compared to equivalent CPython code, Seq attains a performance improvement of up to two orders of magnitude, and a 160$\times$ improvement once domain-specific language features and optimizations are used. With parallelism, we demonstrate up to a 650$\times$ improvement. Compared to optimized C++ code, which is already difficult for most biologists to produce, Seq frequently attains up to a 2$\times$ improvement, with shorter, cleaner code. Thus, Seq opens the door to an age of democratization of highly optimized bioinformatics software.
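
To make two of the benchmark tasks named in the abstract concrete, here is a minimal sketch of reverse complementation and $k$-mer extraction in plain Python. It is only an illustration of the kind of CPython baseline code the abstract refers to, not Seq's own syntax or data types; the function names `reverse_complement`, `kmers` and `canonical_kmers` are hypothetical and chosen for this example.

```python
# Plain-Python sketch of two tasks named in the abstract.
# Assumption: sequences are uppercase DNA strings over A, C, G, T (N = unknown).
# This is NOT Seq's API; all names here are hypothetical.

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int):
    """Yield every length-k substring (k-mer) of seq, left to right."""
    if k <= 0:
        raise ValueError("k must be positive")
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def canonical_kmers(seq: str, k: int):
    """Yield, for each k-mer, the smaller of it and its reverse complement."""
    for kmer in kmers(seq, k):
        yield min(kmer, reverse_complement(kmer))
```

For example, `reverse_complement("AACG")` gives `"CGTT"`, and `list(kmers("ACGTA", 3))` gives `["ACG", "CGT", "GTA"]`. Loops like these are simple to write but run slowly in an interpreter on genome-scale inputs, which is the gap the abstract says Seq aims to close.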

Thu 24 Oct

Displayed time zone: Beirut

14:00 - 15:30
DSLs and Parsing OOPSLA at Room 1
Chair(s): Eric Van Wyk University of Minnesota, USA
14:00
30m
Talk
Seq: A High-Performance Language for Bioinformatics
OOPSLA
14:30
30m
Talk
Generating a Fluent API with Syntax Checking from an LR Grammar
OOPSLA
Tetsuro Yamazaki Graduate School of Information Science and Technology, The University of Tokyo, Tomoki Nakamaru Graduate School of Information Science and Technology, The University of Tokyo, Kazuhiro Ichikawa Graduate School of Information Science and Technology, The University of Tokyo, Shigeru Chiba Graduate School of Information Science and Technology, The University of Tokyo
15:00
30m
Talk
Derivative Grammars: A Symbolic Approach to Parsing with Derivatives
OOPSLA
Ian Henriksen The University of Texas at Austin, Gianfranco Bilardi University of Padova, Italy, Keshav Pingali The University of Texas at Austin