Corpus Distillation Techniques for Effective Fuzzing: A Comprehensive Evaluation (NJR 2019)

Sun 20 - Fri 25 October 2019 Athens, Greece

Track

NJR 2019

When

Mon 21 Oct 2019 16:30 - 17:00 at Room 2A - Session 4

Abstract

Mutation-based fuzzing typically uses an initial set of non-crashing seed inputs (a corpus) from which to generate new inputs by random mutation. A given corpus of potential seeds will often contain thousands of similar inputs. This lack of diversity can lead to wasted fuzzing effort, as the fuzzer will exhaustively explore mutation from all available seeds. To address this, industrial-strength fuzzers such as American Fuzzy Lop (AFL) come with distillation tools (e.g., afl-cmin) that automatically select seeds as the smallest subset of a given corpus that triggers the same range of instrumentation data points as the full corpus. Common practice suggests that minimizing both the number and cumulative size of the seeds may lead to more efficient fuzzing, which we explore systematically here.

We present the results of over 27 CPU-years of fuzzing with eight alternative distillation techniques to understand the impact of corpus distillation on finding bugs in real-world software. Inspired by previous work—in particular, the MINSET technique—we devise a new corpus distillation technique based on a near-optimal solution to the set cover problem. Our technique, MoonLight, delivers smaller corpora—from a factor of three up to two orders of magnitude—compared to afl-cmin, the industrial standard. Furthermore, we show that afl-cmin is comparatively limited in finding bugs.

In contrast to previous work, we conduct rigorous experimental evaluation of MoonLight, comparing it to state-of-the-art techniques (including afl-cmin and MINSET) on long fuzzing campaigns. We target a diverse set of six common open-source libraries and programs, covering seven different input file formats, and show that distillation is a necessary precursor to any fuzzing campaign when starting with a large initial corpus. Notably, we find that neither MoonLight nor MINSET finds all of the 33 bugs revealed by our extensive fuzzing campaigns. Each technique appears to have its own strengths while also producing smaller corpora than afl-cmin. We find (and report) new bugs with MoonLight that are not found by MINSET, while MINSET also finds some bugs that MoonLight is unable to discover. Afl-cmin fails to reveal many of these bugs. Of the 33 bugs revealed by our campaigns, seven new bugs have received CVEs.
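
The abstract frames corpus distillation as a set cover problem: choose a small subset of seeds that triggers the same coverage points as the full corpus. As a rough illustration only, the sketch below uses the textbook greedy approximation for (weighted) set cover. It is not the MoonLight, MINSET, or afl-cmin algorithm; the function name `greedy_distill`, the seed names, the coverage IDs, and the byte sizes are all hypothetical and chosen for this example.

```python
from typing import Dict, List, Optional, Set


def greedy_distill(
    coverage: Dict[str, Set[int]],
    weights: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Greedy set cover: pick seeds until the selection covers everything
    the full corpus covers.

    coverage maps a seed name to the set of coverage points it triggers.
    weights (optional, assumed strictly positive) maps a seed name to a
    cost such as file size; without it every seed costs 1.
    """
    # Everything the full corpus covers is what the subset must cover too.
    uncovered: Set[int] = set().union(*coverage.values())
    selected: List[str] = []
    while uncovered:
        best, best_score = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for seed in sorted(coverage):
            gain = len(coverage[seed] & uncovered)
            if gain == 0:
                continue
            cost = weights[seed] if weights else 1.0
            score = gain / cost  # new coverage points per unit cost
            if score > best_score:
                best, best_score = seed, score
        selected.append(best)
        uncovered -= coverage[best]
    return selected


# Hypothetical corpus: seed name -> coverage point IDs it triggers.
corpus = {
    "a.pdf": {1, 2, 3},
    "b.pdf": {2, 3},
    "c.pdf": {3, 4},
    "d.pdf": {4},
}
# Hypothetical seed sizes in bytes, used as costs in the weighted variant.
sizes = {"a.pdf": 300, "b.pdf": 10, "c.pdf": 200, "d.pdf": 5}

by_count = greedy_distill(corpus)        # favors fewer seeds
by_size = greedy_distill(corpus, sizes)  # favors smaller total size
```

The unweighted call tends to minimize the number of seeds, while passing sizes as weights trades seed count for smaller cumulative size, which mirrors the two minimization goals the abstract mentions. Greedy selection is only an approximation, so neither result is guaranteed to be optimal.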

Bio

I am a professor of computer science at the Australian National University, contributing also as a researcher with Data61 (formerly NICTA). I previously spent 22 years on the faculty at Purdue University. I studied computer science at the University of Adelaide, the University of Waikato, and the University of Massachusetts at Amherst, receiving BSc, MSc, and PhD degrees, respectively. My research interests lie in the area of programming language implementation, and I work on problems arising in object persistence, object databases, distribution, memory management (garbage collection), managed language runtimes, language virtual machines, optimizing compilers, and architectural support for programming languages and applications.

I am a Life Member of the Association for Computing Machinery and a Member of the IEEE. I was named a Distinguished Scientist of the ACM in 2012.

Session Program

Mon 21 Oct

16:00 - 17:30	Session 4 (NJR) at Room 2A

16:00 30m Talk	NAB: Automated Large-scale Multi-language Dynamic Program Analysis in Public Code Repositories. Andrea Rosà, University of Lugano, Switzerland
16:30 30m Talk	Corpus Distillation Techniques for Effective Fuzzing: A Comprehensive Evaluation. Tony Hosking, Australian National University / Data61
17:00 30m Talk	MadMax and Friends: Program Analysis for Smart Contracts. Neville Grech, University of Athens