We all know that training deep learning models takes time. A good live visualization is useful for many reasons: for example, we can detect early that something is wrong with a run and save both our own time and compute time. Here I’ll describe how I ended up making a dashboard for watching reinforcement learning agents’ progress (demo) and how I keep track of experiment logs in a simple way. The method doesn’t depend on any framework, though I assume you use Python. On the bright side, we won’t touch any JS at all :)

But there is already TensorBoard

In the past few years many great deep learning frameworks have been released, but the task of visualization and logging is usually left to the user. One of the exceptions is TensorBoard, and it is a nice tool, especially for inspecting computation graphs, but even if we use TensorFlow there are several issues that bothered me.

  • There is no good way to see all the parameters corresponding to a specific run.
  • There are many interesting plot types beyond scalar and histogram summaries.
  • Different projects can benefit from different visualization layouts; what if we want to see heatmaps from 3 runs next to each other?

Let’s turn to visdom to build a better dashboard.

visdom

Visdom is a live visualization tool powered by Plotly, where we can easily move around, resize, or close panes we don’t need. Thanks to Plotly, each pane can be an interactive graph; for example, it is possible to zoom in or change the viewing angle of our points. visdom supports many views, and it’s easy to switch between them (top left corner).

How to integrate visdom with our framework

Unlike TensorBoard, visdom doesn’t yet come with a live file log, so calling visdom directly from your evaluation function during training would be a bad idea: when the visdom server shuts down, you lose all your evaluation data. We need some intermediate live storage where our logs will be safe. I decided to use SQLite because of its serverless, simple design. Now, when we run our experiment, we initialize the database file corresponding to that run, say conv1-run.sqlite3, and each time we have something to log, we serialize the data into a byte string and save it as a BLOB in the database. If the row ID is incremental, we can read records back in the same order they were written.
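To make this concrete, here is a minimal sketch of what such a logger might look like. This is an assumption on my part, not the project’s actual code (which lives in utils/dblogging.py and utils/logdb.py); the class name and table layout are made up:

import pickle
import sqlite3
import time


class DBLogger:
    """Hypothetical logger appending pickled records to a SQLite file."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        # An autoincrementing id preserves write order for the reader.
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS logs ('
            'id INTEGER PRIMARY KEY AUTOINCREMENT, '
            'evtname TEXT, data BLOB, timestamp REAL)')
        self.conn.commit()

    def log(self, data):
        # Serialize the whole dict into a byte string, store it as a BLOB.
        blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
        self.conn.execute(
            'INSERT INTO logs (evtname, data, timestamp) VALUES (?, ?, ?)',
            (data['evtname'], sqlite3.Binary(blob), time.time()))
        self.conn.commit()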

We can then use a separate script, let’s call it dashboard.py, to connect to these databases, wait for updates, and whenever there is something new, deserialize it and immediately call visdom to update the plots (a sketch of such a reader follows the list below). Other useful things about this approach are:

  • Our project doesn’t depend on a visualization framework.
  • Instead of running on the same server, visdom can be run on a local machine which has access to the db files.
  • We don’t have to browse through folders anymore; we can log everything (model checkpoints, run parameters, images) to a single SQLite file which we can move around easily.
  • SQLite is easy to install on most systems.
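Here is the promised reader sketch, again assuming the table layout from the logger above. I write it as an iterator whose next() either returns the oldest unseen record or raises StopIteration when nothing new has arrived yet, which matches how the dashboard loop further below consumes it:

import pickle
import sqlite3


class LogReader:
    """Hypothetical iterator over one run's SQLite log file."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.last_id = 0  # id of the last record we have already seen

    def __iter__(self):
        return self

    def __next__(self):
        row = self.conn.execute(
            'SELECT id, evtname, data, timestamp FROM logs '
            'WHERE id > ? ORDER BY id LIMIT 1', (self.last_id,)).fetchone()
        if row is None:
            raise StopIteration  # no new records yet, caller retries later
        idd, evtname, blob, timestamp = row
        self.last_id = idd
        return idd, evtname, pickle.loads(blob), timestamp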

Serializing the run logs

Well, there are countless ways this is done in industry, but for our purposes it is nice to keep things simple. Let’s give our logs an event name; we can imagine them to be strings like QuickEval (for a quick evaluation test) or SlowEval (for a slow evaluation which might generate images or videos). When we want to log the data, we simply construct a Python dict with our log keys and values in addition to the event name. For example:

data = {'evtname': 'QuickEval', 'std': std, 'result': np.random.rand(2, 3)}
dblogger.log(data)

dblogger will internally use pickle to serialize the data to a byte string (and can further compress it), so the main requirement is that the values you pass are recognized by pickle, which is true at least for the standard Python object types and NumPy arrays. It will then write (eventname, objectstr) pairs to the database. An advantage of using a standard dict instead of creating our own class is that the reader does not depend on any schema changes and will always be able to deserialize the object. It’s still better to validate the dict before logging, for example with voluptuous, where the schema for our dict above looks like this:

Schema({
    'evtname': 'QuickEval',  # we force it to be the correct name
    'std': float,
    'result': np.ndarray,
    }, required=True)         # all fields are required
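Validation is then a single call: the schema object is callable and raises Invalid with a readable message if a field is missing or has the wrong type. A small usage sketch (dblogger is the logger from before):

import numpy as np
from voluptuous import Schema, Invalid

schema = Schema({
    'evtname': 'QuickEval',
    'std': float,
    'result': np.ndarray,
    }, required=True)

data = {'evtname': 'QuickEval', 'std': 0.25, 'result': np.random.rand(2, 3)}
try:
    schema(data)        # returns the data unchanged when it is valid
    dblogger.log(data)
except Invalid as err:
    print('refusing to log a malformed record:', err)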

Dashboard

Ok, suppose at this point we have several experiments running on a cluster, writing logs to live files. Let’s see how to handle the reading part. Instead of running the dashboard script on a server, it can also be run locally; this way we can quickly change the visualization script if we need to and run it again. As we already said, dashboard.py needs a way to connect to the database files. On Linux, one way to go is simply to mount the dblogs directory on the server to a local folder (for example with sshfs). Then we pass the names of the experiments to the script. For example, this is how I map visdom windows to my experiments: I use the main window to compare data from different runs, and for each experiment (thus each SQLite file) I have a separate window where I visualize the data related specifically to that run.

[screenshot: a separate visdom window per experiment, plus the main comparison window]

As you can see, the file names might not be very descriptive, so when we open a window we can have one pane there listing all of our experiment parameters, and even a link to the GitHub commit that generated the result.
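Since a pane can display arbitrary HTML through vis.text(), such a summary pane is easy to build. A sketch, where the parameter values and the commit URL are placeholders of my own:

import visdom

vis = visdom.Visdom(env='conv1-run')  # one env per experiment

params = {'lr': 1e-4, 'gamma': 0.99, 'seed': 42}  # placeholder parameters
commit = 'https://github.com/scientist1642/bombora/commit/abc123'  # placeholder

html = '<h4>conv1-run</h4><ul>'
html += ''.join('<li>{} = {}</li>'.format(k, v) for k, v in params.items())
html += '</ul><a href="{}">commit</a>'.format(commit)
vis.text(html)  # appears as its own pane in the env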
Here, for example, video files are logged gradually:

[screenshot: a pane with gradually logged video files]

Below is an example of how to handle the logs; win is a visdom env object, and you can use it to make visdom API calls in update_window().

import time

while True:
  time.sleep(0.5)  # poll for new records twice per second
  for db, win, panes in self.tabs:
    try:
      # db iterates over one run's log file (see the reader sketch above)
      idd, evtname, data, timestamp = next(db)
      update_window(data, win, panes)
    except StopIteration:
      pass  # no new records for this run yet
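What update_window() actually does is project-specific. As a minimal sketch, assuming a QuickEval record like the one we logged earlier and a panes dict that maps pane names to visdom window ids, it could look like this:

import numpy as np


def update_window(data, win, panes):
    """Hypothetical handler routing one deserialized record to its pane."""
    if data['evtname'] == 'QuickEval':
        y = np.array([data['result'].mean()])
        x = np.array([data.get('step', 0)])  # assumes a step counter is logged
        if panes.get('quick') is None:
            # First record: create the line plot pane and remember its id.
            panes['quick'] = win.line(Y=y, X=x)
        else:
            # Later records: append to the existing pane.
            win.line(Y=y, X=x, win=panes['quick'], update='append')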

You can see the project here: https://github.com/scientist1642/bombora. The related files are utils/dblogging.py, utils/logdb.py, and dashboard.py.

Some notes

  • In the demo video I replay the recorded log faster; the agent needs several hours to play well.
  • While using visdom you are not restricted to Plotly: you can use vis.text() to put any HTML you generate on a pane (see Matplotlib animation).
  • I think reading from a mounted folder can potentially give you an SQLite error (but it has never happened to me yet).
  • I wish there were a way to define a pane’s width and height in visdom (probably easy to fix).
  • It’s probably better not to call logging from separate threads; SQLite has some built-in concurrency support, but I’m not sure how good it is.