Machine Learning for Detecting Code Bugs

Source: Deep Learning on Medium


How Facebook’s Project Infer Identifies Bugs in Mobile Apps Before Deployment

Just a few days ago, a team of Facebook engineers received the ACM SIGPLAN Most Influential POPL Paper Award which is one of most the covered awards in the machine learning research community. The award was based on the paper “Compositional Shape Analysis by Means of Bi-abduction”, which describes some of the science behind one of my favorite machine learning applications of recent years: Project Infer. The goal of Project Infer seems extracted from an sci-fi movie: detecting bugs in mobile app code before it ships.

Bugs in mobile apps are very costly. Discovering an error after a mobile app has been distributed to thousands of mobile devices is the nightmare facing any mobile developer. Traditional mobile app testing focuses on recreating known scenarios but there are many combinations of code that can trigger error conditions that often go undetected by traditional testing methods. Project Infer scans the code of mobile apps and detects possible error conditions that are then reviewed by developers. Since its open source released three years ago, Project Infer has been by many of the world’s top companies like Amazon Web Services, Spotify or Uber.

How Project Infer Works?

Project Infer uses inference logic to reason about potential executions of a software program. Any decently large mobile app can include hundreds of millions of combinations that can trigger error conditions which makes traditional code analysis routines impossible to scale. Facebook’s Infer builds incremental knowledge about an application increasing its efficiency throughout the development lifecycle.

At a high level, the Facebook Infer workflow can be divided into two main phases: capture and analyze and its lifecycle also into two main segments: global and differential.

In the capture phase, Infer uses compilation commands to translates the files that need to be analyzed into its own internal intermediate language. The analysis phase explores each function for possible error conditions. If Infer encounters an error when analyzing a method or function, it stops there for that method or function, but will continue the analysis of other methods and functions. The following animation shows Facebook’s Infer capture and analyze phases in action.

From the execution standpoint, Facebook’s Infer can be used in two modalities: Global and Differential. The Global workflow takes place when Infer analyzes all files in a given project. For a project that compiles using Gradle, the following syntax will be sued to run Infer’s global workflow:

infer run -- gradle build

The differential workflow is used to apply Infer in incremental build systems which are common in mobile apps. In that scenario, we first need to run Infer’s capture to obtain all compilation commands and then run Infer to only analyze the code changes using the –-reactive parameter.

gradle clean
infer capture -- gradle build
edit some/File.java
# make some changes to some/File.java
infer run --reactive -- gradle build

The reports produced by Infer can be explored using the InferTraceBugs command.

infer run -- gradle build
inferTraceBugs

The Science Behind Project Infer

Facebook’s Infer is based on two novel mathematical techniques: separation logic and bi-abduction.

Separation logic is used to allow Infer to analysis to reason about small, independent parts of the application storage, rather than having to consider the entirety of the memory potentially at every step. Mathematically, separation logic reasons about mutations to computer memory. The core element of separation logic is the operator, called separating conjunction and pronounced “and separately”. For instance, the formula xyyx can be read “x points to y and separately y points to x” which is analogous to how memory pointers work. Separation logic provides the foundation for reasoning about a computer program as most inference logic about computer functions tends to work by updating *∗-conjuncts in-place, mimicking the operational in-place update of RAM.

Bi-abduction is a form of logical inference for separation logic which automates the key ideas about local reasoning. In the context of Infer, Bi-abduction can be seen as a logical inference technique that allows the framework to discover properties about the behavior of independent parts of the application code. Bi-abduction displays abduction as a kind of inverse to the frame problem: it jointly infers anti-frames (missing portions of state) and frames (portions of state not touched by an operation), and is the basis of a new inter-procedural analysis algorithm. Mathematically, a Bi-abduction problem is express using the following syntax: A∗?antiframeB∗?frame. The goal is to discover a pair of frame and anti-frame formulae that make the entailment statement valid. In the context of Facebook’s Infer, bi-abduction provides a way to infer a pre/post specs from bare code, as long as we know specs for the primitives at the base level of the code.

Facebook’s Infer has been the result of years of research in different machine learning areas. The efforts have produced some marquee research papers that provide groundbreaking contributions to inference logic and other areas of machine learning. Let’s explore some of those key papers:

· Compositional Shape Analysis by means of Bi-Abduction: The recipient of the prestigious ACM SIGPLAN Most Influential POPL Paper Award, this research paper introduces the principles of compositional shape analysis. Based on Bi-abduction, compositional shape analysis is a form of analysis in which procedure is analyzed independently of its callers. Compositional shape analysis extends traditional shape analysis to apply it to the analysis of computer programs. The idea is to be able to analyze a program by effectively analyzing its parts and inferring missing parts.

· A Local Shape Analysis Based on Separation Logic: This paper introduces separation logic as a mechanism to describe program analyses. The main concept illustrated in the paper is that we are able to analyze the data structures in a memory heap without understanding the entire heap itself but individual cells in it. For instance, we can infer that specific cells make a linked list without analyzing the entire heap.

· Smallfoot: Modular Automatic Assertion Checking with Separation Logic: This paper introduces Smallfoot, a predecessor of Facebook’s Infer that uses separation logic for inferring lightweight data structures in software programs.

· AL: A new declarative language for detecting bugs with Infer: AL provides the key extensibility model for Facebook’s Infer. Conceptually, AL allows any engineer to design new checkers without any knowledge of Infer’s inner workings. AL is a declarative language that enables the reasoning of syntax trees in a simple and interactive manner.

· Moving Fast with Software Verification: Finally, the paper that describes how Facebook uses Project Infer internally. The paper describes how Facebook’s engineers integrated Infer into their development process to power static analysis for mobile apps like Instagram, Facebook Messenger and the Facebook apps for Android and iOS.

Facebook’s Infer is one of the first and most practical applications of machine learning to the writing of software programs. While today Infer is mostly constrained to mobile apps, many of its principles apply to general purpose applications. Who knows? In the future, we might have a version of Infer for machine learning programs.