April 8, 2022 • 10 min read

How to Build Customizable Web UI for ML with Streamlit and DVC

Written by Antoine Toubhans

There are tons of Machine Learning tools available on the market, and it is sometimes difficult for data scientists to navigate them. Undoubtedly, Streamlit (15k ⭐ on GitHub) and DVC (8.1k ⭐ on GitHub) stand out among them.

This article shows how Streamlit and DVC can help data scientists quickly develop a web UI to analyze the experiments of their ML projects.


In Machine Learning projects, you need scripts to build your dataset, train and evaluate your models. DVC (Data Version Control) is a very popular ML tool that allows the orchestration of these scripts and the tracking of the input/output data: datasets, models, and metrics.

Yet, DVC has some limitations:

  1. Metrics tracking is limited to scalar values. Even though you can track any large file, it is not easy to compare different versions of the tracked data;

  2. Files are just files: DVC provides limited tools (e.g., plots) to dynamically explore the data, e.g., to dig into large CSV files, play with the trained models by running predictions, or drill down into rich data visualizations.

Another popular ML tool is Streamlit, which “turns data scripts into shareable web apps in minutes”. In this article, I’ll show how Streamlit lets you harness the potential of the tracked data. Combined with DVC, it lets you compare different versions (meaning data tracked at different commits) in a very nice and customizable way.

If you are familiar with DVC, you may skip parts 1 and 2, which present a very simple pipeline to train a cat vs dog classifier.

Throughout this article, I provide code snippets. If you are interested in the code, you can clone the companion repository of this article here.

1. A Cat versus Dog Classifier DVC Pipeline

Let’s say I want to train a model to classify cats and dogs, simple right?

To do so, there are 4 steps:

  1. Download the cat_vs_dogs dataset from Tensorflow Datasets;

  2. Split the dataset into train/val/test subsets;

  3. Train a neural network classifier using train/val subsets;

  4. Evaluate the trained model on the test subset.

In order to orchestrate the 4 steps and track the data, I created a 4-stage DVC pipeline running Python scripts. Concretely, the DVC pipeline is a dvc.yaml file describing, for each stage, the command to run and the input and output files to be tracked.

In the end, it defines a DAG that looks like this:

A Simple ML Pipeline (arrows are reversed)

Note: arrows are reversed as they represent stage dependencies. You can generate this graph automatically using the DVC command line by running:

dvc dag --full --dot | dot -Tpng -o docs/images/dvc-pipeline.png

1.1 Download the data

First and foremost, we need to download the dataset. The stage download_dataset uses wget to download the dataset archive from Tensorflow and unzips it into the data/raw folder. Let's write the first stage in dvc.yaml:
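Here is a sketch of what this stage can look like under the stages key (the archive URL placeholder and exact paths are illustrative; see the companion repository for the real stage):

stages:
  download_dataset:
    # any shell command works here, not only python scripts
    cmd: wget <DATASET_ARCHIVE_URL> -O dataset.zip && unzip dataset.zip -d data/raw
    outs:
      - data/raw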

Adding the download_dataset stage

There are no dependencies (inputs) for this stage, and it produces (output) data in data/raw that is tracked by DVC (see the outs key in the YAML).

Note the flexibility of DVC: it lets you run any shell command, not only a Python script.

1.2 Split the dataset

Then, after you execute the first stage (run dvc repro dvc.yaml:download_dataset), you’ll see train and validation splits in the data/raw folder but no test subset:

That is perfectly normal: no test split is provided by Tensorflow for the cat vs dog dataset. To remedy that, I (arbitrarily) chose to split the validation set into val (70%) and test (30%) subsets. I do the split using pandas (see the split_dataset.py script) and add the new stage to the pipeline:
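A sketch of the new stage (script and folder paths are illustrative):

  split_dataset:
    cmd: python split_dataset.py
    deps:
      # re-runs automatically when the script or the raw data changes
      - split_dataset.py
      - data/raw
    outs:
      - data/splits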

Adding the split_dataset stage

Now, if you execute the pipeline (run dvc repro dvc.yaml:split_dataset), you see the train/val/test splits:

1.3 Train the model

Now, I’ll train a Tensorflow binary classifier model. To do so, I simply adapted the Tensorflow transfer learning tutorial to write the train.py script.

Same as before, let's add the train stage to dvc.yaml:
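A sketch, with the same conventions as before (paths are illustrative; the trained model is tracked as an output folder):

  train:
    cmd: python train.py
    deps:
      - train.py
      - data/splits
    outs:
      # Tensorflow saves the model as a folder, tracked as a whole by DVC
      - data/models/model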

Adding the train stage

Now run dvc repro dvc.yaml:train, wait a few minutes, and your model is trained!

1.4 Evaluate the model

Finally, we want to evaluate the trained model on the test subset. First, we run the trained model on the test set to obtain, for each image, the probability that it is a cat or a dog, producing a CSV file that looks like this:

First rows of the predictions.csv file

Then, we compute the accuracy of the model, i.e., the ratio of correctly classified images, and write the result to a summary JSON file:
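A minimal sketch of this computation (the CSV column names are assumptions; see evaluate.py for the real code):

import json

import pandas as pd

predictions = pd.read_csv("data/evaluation/predictions.csv")
# an image is correctly classified when the predicted label matches the true one
accuracy = float((predictions["predicted_label"] == predictions["true_label"]).mean())

with open("data/evaluation/metrics.json", "w") as f:
    json.dump({"accuracy": accuracy}, f)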

Accuracy of the trained model, metrics.json file

You can find the code in the evaluate.py script. Let's add the final stage to our pipeline:
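A sketch of the stage (paths are illustrative; note the metrics key):

  evaluate:
    cmd: python evaluate.py
    deps:
      - evaluate.py
      - data/models/model
      - data/splits
    outs:
      - data/evaluation/predictions.csv
    metrics:
      # declared as a metric rather than a plain output
      - data/evaluation/metrics.json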

Adding the evaluate stage

Note: metrics are a special kind of outs; more details in the next section.

And that is it! We now have a fully functional DVC pipeline that downloads and prepares the data, then trains and evaluates the model. You can execute the whole pipeline by running dvc repro.

If you are interested in more advanced DVC features, you may look at the full version of the DVC pipeline here, which adds parameters, metrics, plots, dvclive for training summaries... If you want to go further, please read the awesome DVC blog.

2. What Can I Do with DVC?

First, let's clone the repo and install requirements:

git clone git@github.com:sicara/dvc-streamlit-example.git
pip install -r requirements.txt

2.1 Track the Data

The core feature of DVC is to track, at any commit, the data produced when executing the DVC pipeline presented in part 1.

Let's take an example: first, have a look at the git commits:

git log --stat --color # or glg with oh-my-zsh installed

You can see that commit f242e6ebdb1ddd1fbef8d6f1ed1b7e6f1345348a modified the dvc.lock file meaning the pipeline was executed to produce that commit:

With DVC, the data produced by the pipeline at any commit can be easily retrieved. First, check out the commit:

git checkout f242e6ebdb1ddd1fbef8d6f1ed1b7e6f1345348a

Then pull the data:

dvc pull

And that's it! Now the pipeline outputs are restored in your local file system as they were when the pipeline was executed at commit f242e6e.

2.2 Track more Data

Let's say you want to retrain your model: you make some changes in the code or in the training parameters, then commit them:

# Do some modifications in the model, parameters, ...
git add YOUR_MODIFIED_FILES
git commit -m "My changes on the model, params, ..."

To execute the DVC pipeline again, you simply need to run dvc repro:

Stdout when running dvc repro

Then, commit the changes:

git add dvc.lock data
git commit -m "DVC repro"

Finally, push changes to the git and DVC remotes:

# Save changes
git push
dvc push

And that’s it: pipeline inputs and outputs are versioned by git and DVC so that they can be retrieved later on.

Note: the remote storage of the repository is Sicara's public S3 bucket (see the DVC config file). By default, you have permission to read (dvc pull) but not to write (dvc push). If you want to run experiments and save your results with dvc push, consider adding your own DVC remote.

2.3 Metrics and Plots

If you look closer at the dvc.yaml file from the GitHub repository, you’ll see metrics and plots in the evaluate stage:
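They are declared with dedicated keys instead of outs; a sketch of the relevant part of the stage:

    metrics:
      # scalar values, e.g., the model accuracy
      - data/evaluation/metrics.json
    plots:
      # data series, rendered by dvc plots
      - data/evaluation/predictions.csv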

Metrics and plots in the evaluate stage

These are special outputs that enable additional DVC features. Here is what DVC says in its documentation:

DVC has two concepts for metrics, that represent different results of machine learning training or data processing:
1. dvc metrics represent scalar numbers such as AUC, true positive rate, etc.
2. dvc plots can be used to visualize data series such as AUC curves, loss functions, confusion matrices, etc.

Let's try it: to see current metrics values, run dvc metrics show:

dvc metrics show

Additionally, dvc metrics diff lets you compare metrics values between different commits:
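For instance, with the two commits compared below:

dvc metrics diff f242e6e afe4ed1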

Compare accuracy metrics between commits f242e6e and afe4ed1

Pretty nice :) Let's now try to plot some data:
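The predefined confusion matrix template can be selected with the --template option; the x/y field names below are assumptions about the predictions.csv columns:

dvc plots show data/evaluation/predictions.csv --template confusion -x true_label -y predicted_label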

dvc plots with confusion matrix templates

It outputs a plots.html file; let's open it in the browser:

plots.html file produced when running “dvc plots show data/evaluation/predictions.csv”

Very nice! When running dvc plots show data/evaluation/predictions.csv, DVC does the following:

  1. the predictions.csv file is parsed;

  2. the predefined confusion matrix template is used to process/transform the data so as to produce the confusion matrix;

  3. VEGA renders the confusion matrix (embedded in an html page).

To go further, it is even possible to add your own templates to DVC plots (see the documentation).

3. What I Cannot (Easily) Do with DVC?

ML projects often consist of exploratory research: data scientists run experiments and retrain models. To know where they’re going, they need to track model performance, and they need to visualize the models and the data to understand what is going on.

Let's step back a little bit and look closer at DVC capabilities:

  • show scalar values (e.g., model accuracy) at one commit with dvc metrics show;
  • plot data series (e.g., the training loss function) at one commit with dvc plots show;
  • compare different commits with dvc metrics diff or dvc plots diff;
  • define more complex data visualizations by extending DVC plots with custom templates, which allow data transformations and interactive data visualization;
  • track a scalar value (e.g., the model accuracy) through the project history. For instance, dvc metrics show -A shows you metrics values for all commits in the command line.

I sum up DVC's data abilities on two axes:

  • visualization expressiveness: how easy it is to build more complex visualizations from more complex input data, e.g., tabular data, images, videos, ...
  • version aggregation abilities: the capability to collect data from different commits and aggregate it in different ways, e.g., diffing the model accuracy between two commits.

DVC abilities shown on two axes: data visualization expressiveness and version aggregation abilities

DVC Limits

Even though DVC is an amazing tool, it has some limitations:

  • data visualizations are limited: input data formats are restricted to tabular file formats (JSON, CSV, and YAML files), which excludes other data such as images or videos;
  • diffing is limited to scalar values. It is possible to show the evolution of scalar values through all commits (dvc metrics show -A), but it is restricted to the command-line interface;
  • it provides no real UI: dvc plots relies on VEGA, a declarative language for creating, saving, and sharing interactive visualizations. Yet, it does not provide a standalone UI: it needs to be rendered somewhere (e.g., embedded into an HTML page).

DVC needs help to bridge the gap between data tracking in the command-line interface and a real UI that allows interactively showing any comparison between any kind of data.

4. Streamlit Bridges the Gap

Streamlit is an open-source Python library, very popular among data scientists, that lets you build interactive UIs for manipulating data very quickly:

Streamlit turns data scripts into shareable web apps in minutes. All in Python. All for free. No front‑end experience required.
- quote from streamlit.io webpage

If I had to place it on my two-axis graph above, it would be at the top (no data version comparison) right (very expressive data visualization):

Streamlit enhances DVC capabilities

So, regarding DVC limitations I described in the previous section, Streamlit appears to be a good candidate to bridge the gap:

  • it provides an interactive web UI;
  • it can represent almost any kind of data, not only scalars or data series;
  • together with the git and DVC Python APIs, it also allows comparing any versions of any kind of data in a very flexible way.

In the following, I’ll go into detail on the third point by describing several concrete use cases from the Cat vs Dog classifier example.

4.1 Build a Commit Selector with Git Python API

The first thing to do is to be able to select the commit you want to see the data from. First, we retrieve the list of commits using the git python API:
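A minimal sketch with GitPython, assuming the app runs from the root of the repository:

from git import Repo

repo = Repo(".")
# keep only commits that touched dvc.lock, i.e., actual pipeline executions
commits = list(repo.iter_commits(paths=["dvc.lock"]))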

Note the paths argument: I am interested in commits that correspond to a DVC pipeline execution so I only need to keep commits that modified the dvc.lock file.

Now, I simply use a Streamlit selectbox to let the user choose the commit:
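For instance:

import streamlit as st

selected_commit = st.selectbox(
    "Commit",
    commits,
    # show a short hash and the first line of the commit message
    format_func=lambda commit: f"{commit.hexsha[:7]} - {commit.message.splitlines()[0]}",
)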

To start the streamlit app, simply run:

streamlit run {PATH_TO_YOUR_SCRIPT}.py

Go to your browser and you should see:

st.selectbox() for commits that modified dvc.lock files

4.2 Explore the Performance of Any Model on the Test Set

Now that the user can select a commit, I’ll show how to load any data file tracked by DVC and show it in the Streamlit app.

If you remember the Cat vs Dog classifier pipeline I introduced in part 1, the evaluation stage outputs a data/evaluation/predictions.csv file containing model predictions on the test set.

The DVC Python API is simple yet very powerful: it provides a dvc.api.open() function that behaves like the built-in Python open() function, but for files tracked by DVC:
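A sketch of the function (the file path is the pipeline output from part 1):

import dvc.api
import pandas as pd

def load_predictions(rev):
    # open the predictions file as it was tracked by DVC at revision `rev`
    with dvc.api.open("data/evaluation/predictions.csv", rev=rev) as f:
        return pd.read_csv(f)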

Provided a commit hash, the load_predictions() function reads the corresponding predictions CSV file with pandas. Then, we use the commit selector to load the selected predictions and show them with Streamlit.
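For example:

st.dataframe(load_predictions(selected_commit.hexsha))

In a real app, consider wrapping load_predictions() with st.cache to avoid re-reading the file at every interaction.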

And that’s it: when you select a commit, you’ll see the dataframe change dynamically:

Predictions from the model of the selected commit.

4.3 Compare Predictions of Two Different Models

Let’s say we want to see where two different versions of the trained model disagree on the test set. First, we put two git commit selectors:
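Each widget needs its own key when it appears twice on the page; a sketch:

left_commit = st.selectbox("Commit A", commits, key="commit_a")
right_commit = st.selectbox("Commit B", commits, key="commit_b")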

Then, read both prediction files with the load_predictions() function, merge them with pandas, and select test set images where the two models disagree:
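A sketch, assuming predictions.csv has a path column identifying the image and a predicted_label column (the actual column names may differ in the repo):

left = load_predictions(left_commit.hexsha)
right = load_predictions(right_commit.hexsha)

# align the two prediction sets on the image path, then keep rows where labels differ
merged = left.merge(right, on="path", suffixes=("_a", "_b"))
disagreements = merged[merged["predicted_label_a"] != merged["predicted_label_b"]]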

Finally, show the final dataframe and the corresponding images with st.image():
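For instance:

st.dataframe(disagreements)
for image_path in disagreements["path"]:
    # st.image() accepts a local file path
    st.image(image_path)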

And here we go: with a few lines of Python, we built a simple web page for comparing two models:

Comparing models afe4ed and db4501. Note that the first image the models disagree on contains both a cat and a dog :)

4.4 Build an Experiments Tracking UI à la MLFlow

Many ML frameworks offer a UI that displays the list of experiments (i.e., training runs) with model parameters, training statistics, and model performance. For instance, MLFlow:

MLFlow UI for Tracking Experiments

With DVC and Streamlit, it is quite easy to build the same. First, let’s collect the list of commits that modified the dvc.lock file:
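Exactly as in section 4.1:

# reusing the repo object from section 4.1
commits = list(repo.iter_commits(paths=["dvc.lock"]))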

Then, let’s collect model parameters that are written in the dvc.lock file. The dvc.lock file is tracked with git, so we need a utility function to read it from any commit:

This is a trick: I recover the file with the git Python API by computing the diff between the current revision rev and the first commit (FIRST_COMMIT).
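The original snippet is in the companion repository; a simpler sketch that achieves the same result (reading dvc.lock at a given revision directly from the commit tree, without the diff) could be:

import yaml

def read_dvc_lock(rev):
    # read the dvc.lock blob exactly as it was at revision `rev`
    blob = repo.commit(rev).tree / "dvc.lock"
    return yaml.safe_load(blob.data_stream.read())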

Then, I can collect model parameters from dvc.lock files:
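dvc.lock records the parameter values each stage was run with; a sketch (the stage name and params file are assumptions):

def get_params(rev):
    lock = read_dvc_lock(rev)
    # e.g., {"learning_rate": 0.001, "epochs": 10, ...}
    return lock["stages"]["train"].get("params", {}).get("params.yaml", {})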

Now, I’ll collect model performance from the metrics.json file:
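Since metrics.json is tracked by DVC, dvc.api.read() can fetch its contents at any revision (the file path is the one assumed earlier):

import json

def get_metrics(rev):
    return json.loads(dvc.api.read("data/evaluation/metrics.json", rev=rev))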

Finally, let’s assemble the collected information into a single dataframe and show it in the Streamlit app:
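A sketch:

experiments = pd.DataFrame(
    [
        # one row per pipeline execution: commit hash, parameters, and metrics
        {"commit": commit.hexsha[:7], **get_params(commit.hexsha), **get_metrics(commit.hexsha)}
        for commit in commits
    ]
)
st.dataframe(experiments)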

Of course, it is not as pretty as the MLFlow experiment tracking interface, but it does the job and it is very flexible: I can easily choose what to show in the Streamlit app, even after the experiments were run, as long as the data was tracked by git or DVC.

4.5 Run Inferences with Models from Different Commits

Now, I would like to interact directly with the trained models: an interface where I can upload any image and run the model of my choice on it. A Streamlit page that looks like this:

Select the model using the top left selector, upload an image and run the model!

I can reuse the commit selector as before, but loading the model is a bit more technically challenging: a model is not a single file, it is a folder:

A model saved with Tensorflow is not a single file

This is a problem because the dvc.api.open() function only works for single files, whereas Tensorflow's tf.keras.models.load_model() requires a folder as input.

We need something more. We need the dvc get CLI command:

Provides an easy way to download files or directories tracked in any DVC repository (e.g. datasets, intermediate results, ML models)

Unfortunately, there is no Python API for this command. No worries: DVC is written in Python, so I can use the internal DVC API to retrieve the model folder. A bit of caution here: the internal DVC API is subject to change in future versions.

To load the model, I built a function load_model(rev) that downloads the desired model to a model cache directory, and then loads it with Tensorflow:
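A sketch of the function, using the internal Repo.get() that powers the dvc get command (internal, hence subject to change; the model path is an assumption):

from pathlib import Path

import tensorflow as tf
from dvc.repo import Repo as DvcRepo

MODEL_CACHE = Path(".model_cache")

def load_model(rev):
    model_dir = MODEL_CACHE / rev
    if not model_dir.exists():
        # same engine as `dvc get . data/models/model`, pinned at revision `rev`
        DvcRepo.get(".", "data/models/model", out=str(model_dir), rev=rev)
    return tf.keras.models.load_model(model_dir)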

Voilà: selected models are downloaded to the .model_cache directory:

The local model cache, proxy for the “dvc get” command

Now, I have all I need to build the Streamlit page:
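A sketch of the page (the 160×160 input size and the probability threshold follow the transfer learning tutorial and are assumptions here):

import numpy as np
from PIL import Image

model = load_model(selected_commit.hexsha)

uploaded_file = st.file_uploader("Upload an image")
if uploaded_file is not None:
    image = Image.open(uploaded_file).convert("RGB").resize((160, 160))
    st.image(image)
    # batch of one image, as expected by model.predict()
    proba = float(model.predict(np.expand_dims(np.array(image), axis=0))[0])
    st.write("Dog" if proba > 0.5 else "Cat")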

Conclusion: DVC+Streamlit = ❤️

I hope I convinced you that Streamlit lets you build custom web UIs very quickly on top of DVC. At Sicara, I use it in my computer vision projects; it is very convenient for sharing results with the team and our clients. If you enjoyed the article, please leave me a comment, star the repo, or contact us!

If you want more inspiration on Streamlit, look at their blog and gallery.

Last-minute note: DVC released DVC Studio a few days ago, a web UI for tracking experiments. I have not tested it yet and I don’t know if the UI is as flexible as a Streamlit dashboard, but it looks awesome and I’m looking forward to trying it.

This article was written by Antoine Toubhans.