SPLASH 2019 - OOPSLA Artifacts

A Case Study on Artifact Evaluation for OOPSLA 2019

Chairs’ Report

Thank you to the authors for submitting an excellent crop of artifacts to accompany their OOPSLA’19 papers, and to the 28 young researchers who served on the Artifact Evaluation Committee (AEC).

Our task was to assess the overall quality of the submitted artifacts. Reviewing could produce one of three outcomes:

Functional: The artifact was judged to reproduce the paper’s claims. Generally this meant all of the paper’s results, though special accommodations were made in some cases where there were valid reasons why some results could not be reproduced.
Reusable: A subset of Functional artifacts were further judged to be Reusable. These artifacts were especially well-packaged, well-designed, and/or well-documented, in a way that reviewers felt gave a particularly solid starting point for future researchers to build on.
Rejected: Artifacts that were not Functional were rejected. This should not be interpreted as casting doubt on the results of the accompanying paper, but merely as reflecting our inability to reproduce all of the sought-after results.

The artifact evaluation deadline this year was roughly a week after the notifications of Phase 1 outcomes for OOPSLA Research Papers. Authors were asked to submit artifacts in one of two formats:

A compressed archive, with its md5 hash provided at submission, uploaded to a hosting site that does not allow the authors to see who accesses the artifact.
A pointer to a public GitHub/Bitbucket/GitLab/etc. repository, with the hash of the relevant commit.

These options permitted a range of submission types, including source code, virtual machine images, and container images. Docker images built from a source checkout were a popular addition this year. One set of authors submitted an artifact via Docker Hub (accompanied by the sha256 hash Docker uses to identify image versions) after consultation with the chairs.

Reviewing was split into two rounds. The “Kick the Tires” round had reviewers follow the author-provided instructions to check that the artifacts could be built and run, without yet examining their results. The goal was to quickly identify any “silly” setup issues. Authors were given a 4-day window to exchange comments with reviewers, work through any issues, and even submit small corrections to their artifacts or instructions. After that, the reviewers continued with the remaining author instructions to attempt to reproduce the results, and evaluated whether all results that should be reproducible were included. During the Kick the Tires round, one set of authors who had previously contacted the AEC chairs about special hardware requirements was permitted to switch from their original submission to an author-provided Amazon EC2 instance with the specialized hardware, to which reviewers connected.

During the second phase, reviewers focused on evaluating:

Whether the submitted artifact could reproduce the relevant claims from the paper, as identified by authors.
Whether the supported claims were actually adequate to say the paper’s results were reproduced. Any omissions needed a good reason, typically technical or legal encumbrances.
For Reusability, whether the artifact formed a strong, as opposed to minimal, starting point for other groups to build upon the work. The standard applied was adjusted to the kind of artifact (e.g., the common claim that no Coq proofs are reusable was not held against artifacts consisting of Coq proofs).

Results

We received 44 submissions, of which 9 were rejected, 17 were found to be Functional, and 18 were judged both Functional and Reusable. The rejected artifacts typically suffered from packaging problems that prevented them from working (even after communication during the Kick-the-Tires round), omitted the main benchmarks from the paper without explanation, or omitted the ability to reproduce comparative results against baselines (e.g., tool X is Y% more accurate than tool Z on these benchmarks) when those baseline numbers were not simply reused from earlier papers. None of the rejections cast doubt on the results of the accompanying papers.

Availability

Authors of artifacts found Functional or Reusable were given the option to apply for an Availability badge, awarded for making their artifact publicly available in a reliable (roughly, archival) location. This year our approach was simple: we asked authors interested in the Availability badge to upload a version of their artifact to Zenodo, a service run in part by CERN for archiving scientific data sets and software. There is no cost to the authors, and every artifact is given its own unique DOI for archival purposes. Authors were instructed to upload exactly the version of the artifact that was reviewed by the AEC (plus a README and LICENSE). Like arXiv, Zenodo supports versioning: the version tied to the Availability badge is exactly the one reviewed, but authors are free to upload improved versions, and the page for the reviewed version will indicate that an update exists. Some authors intend to use this to improve packaging, directions, etc., following reviewer suggestions.

This year, out of the 35 accepted (Functional or Reusable) artifacts, 33 archived the reviewed version on Zenodo and consequently were awarded the Availability badge.

Distinguished Artifacts

Four Distinguished Artifacts were chosen by the AEC chairs, based on nominations from the AEC. Artifacts with a chair as co-author were not eligible.

The chairs reread the reviews of the nominated artifacts and, based on their contents, selected the following Distinguished Artifacts:

A Path to DOT: Formalizing Fully Path-Dependent Types
Safer Smart Contract Programming with Scilla
Leveraging Rust Types for Modular Specification and Verification
Generating a Fluent API with Syntax Checking from an LR Grammar

Distinguished Reviewers

The chairs also chose five Distinguished Reviewers, based on the high quality of their reviews and their participation in the online discussions of artifacts.

Jyothi Vedurada
Fabian Muehlboeck
Simon Fowler
Anthony Canino
Gabriel Radanne

Suggestions for Future Years

(1) A common problem was difficulty running artifacts on sufficiently powerful hardware. Reviewers used their primary machines, which had limited resources. We recommend advertising the option for authors to pay for cloud resources, though mechanisms for preserving reviewer anonymity would need to be formalized. This would address insufficient RAM, insufficient disk space, lack of GPU hardware, and other limitations encountered by this year’s reviewers.

(2) While this is a large ask for authors, anecdotally some of the “easiest” artifacts to get running were those whose authors had incorporated Docker into their development workflow: they simply submitted their source repository, and reviewers got exactly the environment the authors used for their experiments. Short of this, we recommend that authors actually follow their own instructions before submitting artifacts. Common problems such as missing files or files with the wrong permissions are discovered almost immediately by reviewers, but they use up the Kick-the-Tires round, limiting its feedback to these relatively minor issues. Several artifacts were rejected because the authors’ instructions for fully reproducing results simply did not work: e.g., a subset of experiments ran, but other experiments crashed or failed with the same errors for all reviewers.

(3) One recurring problem was author confusion about the bar for completeness set by the call for artifacts. Some authors, for example, intentionally omitted most of the benchmarks from the paper on the grounds that the experiments might take days or weeks to run, with the result that the AEC could not affirmatively reproduce any of the results. Another variant was the omission of baselines when a paper makes comparative claims against another (open-source) tool. In these cases, the version of the tool used in a paper’s evaluation was often not the exact version used in prior work, and it was often run on different benchmark sets than in previous work. In such situations, the baseline results in the paper were new measurements made for that paper, and the inability to reproduce those baselines was treated as grounds for withholding the Functional designation.

Help others to build upon your contributions!

The Artifact Evaluation process is a service provided by the community to help authors provide useful supplements to their papers so that future researchers can build on previous work. Authors of accepted OOPSLA papers are invited to submit an artifact that supports the conclusions of their paper. The AEC will read the paper and explore the artifact to give feedback about how well it supports the paper and how easy it is to use. Submission is voluntary. Papers whose artifacts pass the Artifact Evaluation process receive a seal of approval. Authors of papers with accepted artifacts are encouraged to make these materials publicly available by including them as “source materials” in the ACM Digital Library.

Artifacts

A Formalization of Java's Concurrent Access Modes. John Bender, Jens Palsberg. DOI
A Path to DOT: Formalizing Fully Path-Dependent Types. Marianna Rapoport, Ondřej Lhoták. DOI
Asphalion: Trustworthy Shielding Against Byzantine Faults. Vincent Rahli, Ivana Vukotic, Paulo Esteves-Veríssimo. DOI
Automatic and Scalable Detection of Logical Errors in Functional Programming Assignments. Dowon Song, Myungho Lee, Hakjoo Oh. DOI
Automatic Repair of Regular Expressions. Rong Pan, Qinheping Hu, Gaowei Xu, Loris D'Antoni. DOI
Casting about in the Dark. Luis Mastrangelo, Matthias Hauswirth, Nate Nystrom. DOI
Certifying Graph-Manipulating C Programs via Localizations within Data Structures. Shengyi Wang, Qinxiang Cao, Anshuman Mohan, Aquinas Hobor. DOI
Compiler Fuzzing: How Much Does It Matter? Michaël Marcozzi, Qiyi Tang, Alastair F. Donaldson, Cristian Cadar. DOI
Design, Implementation, and Application of GPU-based Java Bytecode Interpreters. Ahmet Celik, Pengyu Nie, Chris Rossbach, Milos Gligoric
Duet: An Expressive Higher-order Language and Linear Type System for Statically Enforcing Differential Privacy. Joseph P. Near, David Darais, Chike Abuah, Tim Stevens, Pranav Gaddamadugu, Lun Wang, Neel Somani, Mu Zhang, Nikhil Sharma, Alex Shan, Dawn Song. DOI
Effective Lock Handling In Stateless Model Checking. Michalis Kokologiannakis, Azalea Raad, Viktor Vafeiadis. DOI
FuzzFactory: Domain-Specific Fuzzing with Waypoints. Rohan Padhye, Caroline Lemieux, Koushik Sen, Laurent Simon, Hayawardh Vijayakumar. DOI
Generating a fluent API with syntax checking from an LR grammar. Tetsuro Yamazaki, Tomoki Nakamaru, Kazuhiro Ichikawa, Shigeru Chiba. DOI
Leveraging Rust Types for Modular Specification and Verification. Vytautas Astrauskas, Peter Müller, Federico Poli, Alexander J. Summers. DOI
Modular Verification for Almost-Sure Termination of Probabilistic Programs. Mingzhang Huang, Hongfei Fu, Krishnendu Chatterjee, Amir Kafshdar Goharshady. DOI
Modular Verification of Heap Reachability Properties in Separation Logic. Arshavir Ter-Gabrielyan, Alexander J. Summers, Peter Müller. DOI
On the Complexity of Checking Transactional Consistency. Ranadeep Biswas, Constantin Enea. DOI
On The Design, Implementation and Use of Laziness in R. Aviral Goel, Jan Vitek. DOI
On the Fly Synthesis of Edit Suggestions. Anders Miltner, Sumit Gulwani, Vu Le, Alan Leung, Arjun Radhakrishna, Gustavo Soares, Ashish Tiwari, Abhishek Udupa
Optimal Stateless Model Checking for Reads-from Equivalence under Sequential Consistency. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bengt Jonsson, Magnus Lång, Tuan Phong Ngo, Konstantinos (Kostis) Sagonas. DOI
Optimization of Swift Protocols. Raj Barik, Manu Sridharan, Murali Krishna Ramanathan, Milind Chabbi. DOI
Precision-Preserving Yet Fast Object-Sensitive Pointer Analysis with Partial Context Sensitivity. Jingbo Lu, Jingling Xue. DOI
Program Synthesis with Algebraic Library Specifications. Benjamin Mariano, Josh Reese, Siyuan Xu, ThanhVu Nguyen, Xiaokang Qiu, Jeffrey S. Foster, Armando Solar-Lezama. DOI
Qubit Allocation as a Combination of Subgraph Isomorphism and Token Swapping. Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Caroline Collange, Fernando Magno Quintão Pereira. DOI
Refinement Kinds: Type-safe Programming with Practical Type-level Computation. Luís Caires, Bernardo Toninho. DOI
Reliable and Fast DWARF-based Unwinding. Théophile Bastian, Francesco Zappa Nardelli, Stephen Kell. DOI
Ryū Revisited: Printf Floating Point Conversion. Ulf Adams. DOI
Safer Smart Contract Programming with Scilla. Ilya Sergey, Vaivaswatha Nagaraj, Jacob Johannsen, Amrit Kumar, Anton Trunov, Ken Chan. DOI
Scala Implicits are Everywhere. Filip Křikava, Jan Vitek, Heather Miller. DOI
Seq: A High-Performance Language for Bioinformatics. Ariya Shajii, Ibrahim Numanagić, Riyadh Baghdadi, Saman Amarasinghe, Bonnie Berger. DOI
Sound and Reusable Components for Abstract Interpretation. Sven Keidel, Sebastian Erdweg. DOI
Specification and Inference of Trace Refinement Relations. Timos Antonopoulos, Eric Koskinen, Ton Chanh Le. DOI
Specifying Concurrent Programs in Separation Logic: Morphisms and Simulations. Aleksandar Nanevski, Anindya Banerjee, Germán Andrés Delbianco, Ignacio Fábregas. DOI
Staged Abstract Interpreters. Guannan Wei, Yuxuan Chen, Tiark Rompf. DOI
TLA+ model checking made symbolic. Igor Konnov, Jure Kukovec, Thanh-Hai Tran. DOI

Call for Artifacts

This process was inspired by the ECOOP 2013 AEC by Jan Vitek, Erik Ernst, and Shriram Krishnamurthi.

Selection Criteria

The artifact is evaluated in relation to the expectations set by the paper. Thus, in addition to running the artifact, evaluators will read the paper and may try to tweak inputs or otherwise slightly generalize the use of the artifact in order to test the artifact’s limits.

Artifacts should be:

consistent with the paper,
as complete as possible,
well documented, and
easy to reuse, facilitating further research.

The AEC strives to place itself in the shoes of such future researchers and then to ask: how much would this artifact have helped me?

Submission Process

If your paper makes it past Round 1 of the review process, you may submit an artifact consisting of three pieces:

an overview,
a URL pointing to either a single file (recommended) or a public repository, and
a hash certifying the version of the artifact at submission time: either an md5 hash of the single file (use the md5 or md5sum command-line tool to generate it), or the full commit hash (e.g., from git reflog --no-abbrev).

The URL must be a Google Drive, Dropbox, GitHub, Bitbucket, or (public) GitLab URL.
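For single-file submissions, the md5 hash comes from the md5 or md5sum tools named above. As a hedged alternative on machines where neither tool is at hand, here is a minimal sketch using only Python's standard hashlib module. The function name and the script name in the usage comment are hypothetical. The file is read in chunks, so large VM images never need to fit in memory.

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of the file at `path`.

    The file is read in chunks (1 MiB by default) so that multi-gigabyte
    archives or VM images never need to be loaded into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel stops once read() reaches end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical usage: python md5_artifact.py 123.tgz
    for name in sys.argv[1:]:
        print(f"{md5_of_file(name)}  {name}")
```

The printed hex string should match what md5sum reports for the same file.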

Reproducibility Non-Profit

New this year: we will be collaborating with a non-profit, Accelerate Publishing, whose goal is to better support several aspects of academic publishing, in particular reproducibility via the creation of artifacts. During artifact submission, authors will be asked whether they are willing to share their artifact with the non-profit and to permit the non-profit to see discussions of their artifact during the review process. The goal of giving the non-profit access to these artifacts and discussions is to provide an overview of the sources of problems in creating reproducible artifacts, in order to focus efforts on making artifact creation easier, less time-consuming, and less error-prone. Authors do not need to share their artifact or the review discussions with Accelerate Publishing, and opting out of sharing will not influence the AEC’s decision.

Artifact Overview

Your overview should consist of two parts:

a Getting Started Guide and
Step-by-Step Instructions for how you propose to evaluate your artifact (with appropriate connections to relevant sections of your paper).

The Getting Started Guide should contain setup instructions (including, e.g., a pointer to the VM player software, its version, passwords if needed) and basic testing of your artifact that you expect a reviewer to be able to complete in 30 minutes. Reviewers will follow all the steps in the guide during an initial kick-the-tires phase. The Guide should be as simple as possible. It should stress the key elements of your artifact. Anyone who has followed the Getting Started Guide should have no technical difficulties with the rest of your artifact.

The Step-by-Step Instructions should explain how to reproduce any experiments or other activities that support your paper. Write this for readers who have a deep interest in your work and are studying it to improve it or compare against it. If your artifact runs for more than a few minutes, point this out and explain how to run it on smaller inputs. Where appropriate, include descriptions of and links to files that represent expected outputs (e.g., the log files expected to be generated by your tool on the given inputs); if there are warnings that are safe to ignore, explain which ones they are.
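To illustrate how expected-output files and documented, ignorable warnings might be used together, here is a minimal, hypothetical Python sketch that compares a log produced by a tool against an expected log after dropping blank lines and warnings the authors have declared safe. The warning pattern and the function names are assumptions for illustration only. Real artifacts have their own output formats and may need more tolerant comparisons (e.g., for timing numbers).

```python
from __future__ import annotations

import difflib
import re

# Hypothetical example of a warning the authors document as safe to ignore.
SAFE_WARNINGS = [re.compile(r"^WARNING: deprecated option")]


def relevant_lines(text: str) -> list[str]:
    """Drop blank lines and documented-safe warnings before comparing."""
    return [
        line
        for line in text.splitlines()
        if line.strip() and not any(p.search(line) for p in SAFE_WARNINGS)
    ]


def compare_logs(actual: str, expected: str) -> list[str]:
    """Return a unified diff of the filtered logs; empty means they match."""
    return list(
        difflib.unified_diff(
            relevant_lines(expected),
            relevant_lines(actual),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
```

A reviewer would pass in the contents of the generated log and of the authors' expected-output file. An empty result means the two agree apart from the ignored warnings.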

The artifact’s documentation should include the following:

A list of claims from the paper supported by the artifact, and how/why.
A list of claims from the paper not supported by the artifact, and how/why.

Examples: performance claims cannot be reproduced in a VM; the authors are not allowed to redistribute specific benchmarks; etc. Artifact reviewers can then center their reviews and evaluation around these specific claims.

Packaging the Artifact

When packaging your artifact, please keep in mind: a) how accessible you are making your artifact to other researchers, and b) the fact that the AEC members will have a limited time in which to make an assessment of each artifact.

Your artifact can contain a bootable virtual machine image with all of the necessary libraries installed. Using a virtual machine provides a way to make an easily reproducible environment — it is less susceptible to bit rot. It also helps the AEC have confidence that errors or other problems cannot cause harm to their machines. This is recommended.

Submitting source code that must be compiled is permissible. A more automated and/or portable build, such as a Dockerfile or a build tool that manages all compilation and dependencies (e.g., Maven, Gradle), improves the odds that the AEC will not get stuck trying to make different package versions work (particularly different releases of programming languages).

You should make your artifact available as a single archive file and use the naming convention <paper #>.<suffix>, where the appropriate suffix is used for the given archive format. Please use a widely available compressed archive format such as ZIP (.zip), tar and gzip (.tgz), or tar and bzip2 (.tbz2). Please use open formats for documents.
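A minimal sketch of producing a correctly named archive, assuming the whole artifact lives in a single directory and using Python's standard tarfile module to write a tar and gzip (.tgz) file. The function name, the example paper number, and the directory names are hypothetical placeholders, not part of any required tooling.

```python
import tarfile
from pathlib import Path


def package_artifact(src_dir: str, paper_number: int, out_dir: str = ".") -> Path:
    """Pack `src_dir` into `<out_dir>/<paper_number>.tgz` (tar + gzip).

    `paper_number` is a placeholder for the number your paper was assigned
    in the submission system. Keep `out_dir` outside `src_dir` so the
    archive does not end up trying to include itself.
    """
    # resolve() gives "." or relative paths a real top-level folder name
    src = Path(src_dir).resolve()
    archive = Path(out_dir) / f"{paper_number}.tgz"
    # Mode "w:gz" writes a gzip-compressed tar archive.
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname stores paths relative to the artifact's top-level folder,
        # rather than the absolute path on the author's machine.
        tar.add(str(src), arcname=src.name)
    return archive
```

For example, package_artifact("my-artifact", 123) would write 123.tgz to the current directory. That file's md5 hash is what gets reported at submission.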

Artifacts do not have to be anonymous.

COI

Conflicts of interest involving AEC members are handled by the chairs. Conflicts of interest involving one of the two AEC chairs are handled by the other AEC chair, or by the PC of the conference if both chairs are conflicted. To be validated, artifacts with a chair conflict must be unambiguously accepted, and they may not be considered for a Distinguished Artifact award.

Contact

Please contact Colin and Jan if you have any questions.

Artifact Evaluation Committee

Colin Gordon, Chair

Drexel University

United States

Jan Vitek, Chair

Northeastern University

United States

Arash Alavi

University of California, Riverside

United States

Cheng Cai

University of California, Los Angeles (UCLA)

Anthony Canino

SUNY Binghamton

United States

Guido Chari

Czech Technical University

Czechia

Erin Dahlgren

Accelerate Publishing

United States

Lukas Diekmann

King's College London

United Kingdom

Jack Feser

MIT CSAIL

Simon Fowler

The University of Edinburgh

United Kingdom

Hannah Gommerstadt

Vassar College

Sehun Jeong

Korea University, South Korea

South Korea

Sungho Lee

KAIST, South Korea

South Korea

Yue Li

Aarhus University, Denmark

China

Julian Mackay

Victoria University of Wellington

New Zealand

Fabian Muehlboeck

IST Austria

Austria

Wytse Oortwijn

University of Twente

Netherlands

Jesper Oqvist

Lund University

Saswat Padhi

University of California, Los Angeles

United States

Jihyeok Park

KAIST, South Korea

South Korea

Junqiao Qiu

University of California, Riverside

Gabriel Radanne

University of Freiburg, Germany

France

John Sarracino

University of California, San Diego

Quentin Stiévenart

Vrije Universiteit Brussel, Belgium

Belgium

Janwillem Swalens

Vrije Universiteit Brussel, Belgium

Belgium

Tian Tan

Aarhus University, Denmark

China

Qiyi Tang

Imperial College London

United Kingdom

Matías Toro

University of Chile