Introducing Uber’s Ludwig

Source: Deep Learning on Medium


A New Toolbox for Training Deep Learning Models Without Writing Any Code

Uber continues its spree of deep learning technology releases. Since last year, the Uber AI Labs team has open sourced several frameworks that cover many of the fundamental building blocks of deep learning solutions. The productivity of the Uber engineering team is nothing short of impressive: Pyro is a framework for probabilistic programming built on top of PyTorch, Horovod is a TensorFlow-based framework for distributed training, Manifold focuses on visual debugging and interpretability and, of course, Michelangelo is a reference architecture for large scale machine learning solutions. The latest creation of Uber AI Labs is Ludwig, a toolbox for training deep learning models without writing any code.

Training is one of the most developer-intensive aspects of deep learning applications. Typically, data scientists spend many hours experimenting with different deep learning models to find the one that performs best on a specific training dataset. This process involves more than just training: it also includes model comparison, evaluation, workload distribution and many other tasks. Given its highly technical nature, the training of deep learning models is typically restricted to data scientists and machine learning experts, and it requires a significant volume of code. While this problem applies to any machine learning solution, it is considerably worse in deep learning, where architectures typically involve many layers and levels. Simplifying the training process is the number one factor that can streamline the experimentation phase in deep learning solutions.

Enter Ludwig

Ludwig is a TensorFlow-based toolbox that allows users to train and test deep learning models without writing code. Incubated at Uber for the last two years, Ludwig has now been open sourced to incorporate contributions from the data science community. Conceptually, Ludwig was built on five fundamental principles:

  • No coding required: no coding skills are required to train a model and use it for obtaining predictions.
  • Generality: a new data type-based approach to deep learning model design that makes the tool usable across many different use cases.
  • Flexibility: experienced users have extensive control over model building and training, while newcomers will find it easy to use.
  • Extensibility: easy to add new model architecture and new feature data types.
  • Understandability: deep learning model internals are often considered black boxes, but we provide standard visualizations to understand their performance and compare their predictions.

Using Ludwig, a data scientist can train a deep learning model by simply providing a CSV file that contains the training data and a YAML file that lists the inputs and outputs of the model. Using those two files, Ludwig performs a multi-task learning routine to predict all outputs simultaneously and evaluate the results. Under the covers, Ludwig provides a series of deep learning models that are constantly evaluated and can be combined in a final architecture. The Uber engineering team explains this process with the following analogy: “if deep learning libraries provide the building blocks to make your building, Ludwig provides the buildings to make your city, and you can choose among the available buildings or add your own building to the set of available ones.”

The main innovation behind Ludwig is the idea of data-type-specific encoders and decoders: every supported data type has its own encoders and decoders. As in other deep learning architectures, encoders are responsible for mapping raw data to tensors, while decoders map tensors to outputs. The architecture of Ludwig also includes the concept of a combiner, a component that combines the tensors from all input encoders, processes them, and returns the tensors to be used by the output decoders.
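The flow from encoders through a combiner to decoders can be illustrated with a toy sketch. Everything below is a plain-Python stand-in invented for illustration: the function names, the `length` input and the arithmetic are assumptions, and none of it reflects how Ludwig actually implements these components.

```python
# Toy illustration only: plain-Python stand-ins, not Ludwig's real components.

def encode_text(text, size=4):
    """Hypothetical text encoder: count words by (length modulo size)."""
    vector = [0.0] * size
    for word in text.split():
        vector[len(word) % size] += 1.0
    return vector


def encode_number(value):
    """Hypothetical numerical encoder: wrap the value in a one-element vector."""
    return [float(value)]


def combine(encoded_inputs):
    """Hypothetical combiner: concatenate the vectors from all input encoders."""
    combined = []
    for vector in encoded_inputs:
        combined.extend(vector)
    return combined


def decode_category(combined, labels):
    """Hypothetical category decoder: map the combined vector to one label."""
    index = int(sum(combined)) % len(labels)
    return labels[index]


def predict(row, labels):
    """Run one row through each encoder, then the combiner, then the decoder."""
    encoded = [encode_text(row["text"]), encode_number(row["length"])]
    return decode_category(combine(encoded), labels)
```

The point of the sketch is the shape of the pipeline: each input has its own encoder, the combiner is the only place where inputs meet, and each output has its own decoder.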

The flexible encoder-decoder architecture of Ludwig allows even inexperienced data scientists to train sophisticated models. For instance, in a natural language processing scenario, Ludwig can use a convolutional neural network (CNN) as an encoder and a recurrent neural network (RNN) as a decoder. Those decisions are based on the characteristics of the data and require minimal input from the data scientist.

Ludwig in Action

Data scientists will use Ludwig for two main tasks: training and prediction. Suppose that we are working on a text classification scenario with the following dataset.

We can get started with Ludwig by installing it with the following commands:

pip install ludwig
python -m spacy download en

The next step is to write a model definition YAML file that specifies the input and output features of the model.

input_features:
    -
        name: text
        type: text
        encoder: parallel_cnn
        level: word

output_features:
    -
        name: class
        type: category
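For readers who prefer the programmatic route, the same definition can be written as a Python dictionary. The sketch below mirrors the YAML keys one to one; `check_definition` is a hypothetical helper added here for illustration, not a Ludwig function, and it only checks that every feature carries a `name` and a `type`.

```python
# The YAML model definition written as the equivalent Python dictionary.
model_definition = {
    "input_features": [
        {"name": "text", "type": "text", "encoder": "parallel_cnn", "level": "word"},
    ],
    "output_features": [
        {"name": "class", "type": "category"},
    ],
}


def check_definition(definition):
    """Hypothetical helper (not part of Ludwig): return a list of problems found."""
    problems = []
    for section in ("input_features", "output_features"):
        features = definition.get(section) or []
        if not features:
            problems.append(f"{section} is missing or empty")
        for position, feature in enumerate(features):
            for key in ("name", "type"):
                if key not in feature:
                    problems.append(f"{section}[{position}] has no '{key}'")
    return problems
```

A quick check like this catches a misspelled section or a forgotten `type` before a training run starts.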

With those two inputs (training data and YAML configuration), we can train a deep learning model using the following command:

ludwig experiment \
  --data_csv reuters-allcats.csv \
  --model_definition_file model_definition.yaml
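The feature names in the model definition refer to columns in the CSV file, so the file passed via `--data_csv` needs a `text` column and a `class` column. As a minimal sketch of that relationship, the snippet below checks a CSV header against a list of feature names. The sample rows are made up for illustration and are not taken from the Reuters file, and `missing_columns` is a hypothetical helper, not a Ludwig function.

```python
import csv
import io

# Made-up rows for illustration; they are not taken from reuters-allcats.csv.
SAMPLE_CSV = (
    "text,class\n"
    "Wheat exports rose sharply,grain\n"
    "Crude prices fell again,crude\n"
)


def missing_columns(csv_text, feature_names):
    """Hypothetical helper: list feature names that have no matching CSV column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row holds the column names
    return [name for name in feature_names if name not in header]
```

Calling `missing_columns(SAMPLE_CSV, ["text", "class"])` returns an empty list, which means every feature has a column to read from.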

Ludwig provides a series of visualizations that can be used during training and prediction. For instance, the learning curve visualization gives us an idea of the training and testing performance of the model.

After training we can evaluate the predictions of the model using the following command:

ludwig predict --data_csv path/to/data.csv --model_path /path/to/model

Other visualizations can be used to evaluate the performance of the model.

The complete Ludwig feature set is programmatically available via APIs. Recreating our example using Python is a matter of a few lines of code:

from ludwig import LudwigModel

# train a model
model_definition = {...}
model = LudwigModel(model_definition)
train_stats = model.train(training_dataframe)

# or load a model
model = LudwigModel.load(model_path)

# obtain predictions
predictions = model.predict(test_dataframe)

model.close()

Beyond its built-in capabilities, Ludwig provides a very extensible architecture that lets data scientists incorporate their own encoders and decoders, as well as functions to pre-process the data. For instance, creating a new Ludwig encoder is a matter of implementing the __init__ and __call__ methods, whose signatures are shown in the following code:

def __init__(
    self,
    should_embed=True,
    vocab=None,
    representation='dense',
    embedding_size=256,
    embeddings_trainable=True,
    pretrained_embeddings=None,
    embeddings_on_cpu=False,
    num_layers=1,
    state_size=256,
    cell_type='rnn',
    bidirectional=False,
    dropout=False,
    initializer=None,
    regularize=True,
    reduce_output='last',
    **kwargs
):
    ...  # body omitted

def __call__(
    self,
    input_placeholder,
    regularizer,
    dropout,
    is_training
):
    ...  # body omitted
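The two-method shape is the standard Python callable-object pattern: `__init__` stores the hyperparameters and `__call__` applies the encoder to an input. The class below is a toy sketch of that pattern only. `ToyLengthEncoder` is a hypothetical name, it does not inherit from or register with anything in Ludwig, and it works on plain lists rather than tensors.

```python
class ToyLengthEncoder:
    """Hypothetical encoder, not a Ludwig class: it only shows the two-method shape."""

    def __init__(self, state_size=4, reduce_output="last", **kwargs):
        # Hyperparameters are stored at construction time; extra kwargs are ignored.
        self.state_size = state_size
        self.reduce_output = reduce_output

    def __call__(self, tokens):
        # One fixed-size vector per token: the token length repeated state_size times.
        states = [[float(len(token))] * self.state_size for token in tokens]
        if self.reduce_output == "last":
            # Keep only the final vector, or a zero vector for empty input.
            return states[-1] if states else [0.0] * self.state_size
        return states
```

Because the object is callable, an instance can be used like a function: `ToyLengthEncoder(state_size=2)(["deep", "learning"])` applies the encoder with the hyperparameters fixed at construction.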

Ludwig is an incredibly helpful toolbox for training and experimenting with deep learning models. It allows even junior data scientists to train and test highly sophisticated deep learning models without writing any code. Ludwig’s simple training process and interactive visualizations can drastically shorten the experimentation cycle in deep learning applications, allowing experts to focus on fine-tuning the architecture of the target models instead of spending countless hours on repetitive training work.