We all know that training deep learning models takes time, and it's useful to have a nice visualization of a run for many reasons. For example, we can detect early that something is wrong with the run and save both our own time and compute time. Here I'll describe how I ended up building a dashboard for watching reinforcement learning agents' progress (demo) and how I keep track of experiment logs in a simple way. The method doesn't depend on any framework, though I assume you use Python. On the bright side, we won't touch any JS at all :)
But there is already Tensorboard
In the past few years many great deep learning frameworks have been released, but the task of visualization and logging is usually left to users.
One of the exceptions is Tensorboard, and it is a nice tool,
especially for inspecting computation graphs, but even if we use Tensorflow,
there are several issues that bothered me.
- There is no good way to see all parameters corresponding to the specific run.
- There are many interesting plot types beyond scalar and histogram summaries.
- Different projects can benefit from different visualization layouts: what if we want to see heatmaps from 3 runs next to each other?
Let's look at visdom to build a better dashboard.
Visdom is a live visualization tool powered by Plotly, where we can easily move around, resize, or close panes we don't need. Thanks to Plotly, each pane can be an interactive graph; for example, it is possible to zoom in or change the perspective of our points. visdom supports many views, and it's easy to switch between them (top left corner).
How to integrate visdom with our framework
Unlike Tensorboard, visdom doesn't yet come with a live file log, so directly calling visdom
from your evaluation function during training would be a bad idea: when the visdom server
shuts down, you lose all your evaluation data. We need some intermediate live
storage where our logs will be safe. I decided to use SQLite because of its
serverless and simple design.
Now, when we run our experiment, we initialize the database file corresponding to that
run - say
conv1-run.sqlite3 - and then each time we have something to log, we just
serialize the data (turn it into a byte string) and save it as a BLOB in our
database. If the ID is incremental, we can read the records back in the same
order they were written.
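A minimal sketch of that write path, assuming a simple `logs(id, event, data)` table (the table layout and helper names are illustrative, not taken from the project):

```python
import pickle
import sqlite3

def open_run_db(path):
    """Create (or open) the log database for a single run."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "event TEXT, data BLOB)"
    )
    conn.commit()
    return conn

def log(conn, event, data):
    """Pickle `data` and append it as a BLOB; `id` keeps the write order."""
    blob = pickle.dumps(data)
    conn.execute("INSERT INTO logs (event, data) VALUES (?, ?)",
                 (event, sqlite3.Binary(blob)))
    conn.commit()
```

During training we would then call something like `log(conn, 'QuickEval', {...})` each time an evaluation finishes.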
We can then use a separate script, let's call it
dashboard.py, to connect to these databases,
wait for updates, and if there is something new, deserialize it and immediately call visdom to update the plots.
Other useful things about this approach are:
- Our project doesn’t depend on a visualization framework.
- Instead of running on the same server, visdom can be run on a local machine which has access to the db files.
- We don't have to browse through folders anymore; we can log everything - model checkpoints, run parameters, images - to a single sqlite file which we can move around easily.
- SQLite is easy to install on most systems.
Serializing the run logs
Well, there are countless ways used in industry, but for our purpose it would be nice
to keep things simple. Let's give our logs an event name -
we can imagine them to be strings like
QuickEval (for a quick evaluation test) or one for
slow evaluation (which might generate images or videos). When we want to log the
data, we will simply construct a Python dict with our log keys and
values, in addition to an event name. For example:
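Something like the following, where the keys and values are made up for illustration and `log()` stands for whatever helper writes to the run database:

```python
import pickle

# A quick-evaluation entry: a plain dict, logged under the 'QuickEval' event.
entry = {
    'step': 2000,         # training step at evaluation time
    'mean_reward': 13.7,  # result of the quick evaluation
    'episodes': 10,
}
# log('QuickEval', entry)  # hypothetical call into our logger

# Internally the logger just pickles the dict to a byte string,
# and the reader gets the identical dict back:
blob = pickle.dumps(entry)
assert pickle.loads(blob) == entry
```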
The logger will internally use
pickle to serialize the data to a byte string (or further compress
it), so the main requirement is that the values you pass are recognized by
pickle, which is true for at least the standard Python object types and numpy
arrays. The logger will then write
(eventname, objectstr) pairs to the database.
An advantage of using a standard dict instead of creating our own class
is that the reader won't depend on any schema changes and will always be able to
deserialize the object. It's still better to validate a dict before logging, for
example with voluptuous,
where the schema for our dict above would look like this.
Ok, suppose at this point we have several experiments running on a cluster, writing logs
to live files. Let's see how to handle the reading part. Instead of running
the dashboard script on a server, it can also be run locally. This way we can
quickly change the visualization script if we need to and run it again.
As we already said,
dashboard.py will need a way to connect to the database files.
On Linux, one way to go is simply to mount your
dblogs directory on the server to a local
folder. Then we pass the names of the experiments to the script. For example,
this is how I map visdom windows to my experiments: I use the main window to
compare data from different runs, and for each experiment (thus each sqlite file) I have
a separate window, where I visualize the data related specifically to that run.
As you can see, the names of the files might not be very descriptive, so when we open a
window we can have one pane there listing all of our experiment parameters,
as well as a link to the commit on GitHub which generated the result.
Here for example video files are logged gradually.
Below is an example of how to handle the logs; win is a visdom env object, and you can use it to make visdom API calls in update_window().
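A sketch of that reader, assuming logs live in a `logs(id, event, data)` table (illustrative, not the project's actual layout); `update_window()` is where the actual visdom calls would go:

```python
import pickle
import sqlite3
import time

def poll_new_logs(db_path, last_id):
    """Return (new_last_id, entries) for all rows written after last_id."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, event, data FROM logs WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    conn.close()
    entries = [(event, pickle.loads(data)) for _, event, data in rows]
    if rows:
        last_id = rows[-1][0]
    return last_id, entries

def watch(db_path, update_window, interval=2.0):
    """Tail the run database forever, forwarding each new entry."""
    last_id = 0
    while True:
        last_id, entries = poll_new_logs(db_path, last_id)
        for event, data in entries:
            # update_window() would make the visdom calls here,
            # e.g. plot a line for 'QuickEval' or show a video for
            # the slow-evaluation event, using the win object.
            update_window(event, data)
        time.sleep(interval)
```

Because the reader only ever asks for rows with a larger id, it naturally picks up where it left off and replays entries in write order.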
You can see the project here - https://github.com/scientist1642/bombora -
and the related files in the repo. A few remaining notes:
- In the demo video I replay the recorded log faster; the agent needs several hours to play well.
- While using visdom you are not restricted to Plotly; you can use vis.text() to put any HTML you generate on a pane - see Matplotlib animation.
- I think reading from a mounted folder can potentially give you an SQLite error (but it has never happened to me yet).
- I wish there were a way to define a pane's width and height in visdom (probably easy to fix).
- It's probably better not to call logging from separate threads; SQLite has some built-in concurrency, but I'm not sure how good it is.