Source: Deep Learning on Medium
As a budding researcher slowly finding his place at the intersection of deep learning and human language, frustration quickly became my close friend. Anyone trying to build new skills meets him one way or another. Usually, he taps your shoulder and points at a roadblock in your way. Sometimes, he tells you that you missed a shot that you should’ve taken.
In my case, frustration told me I was wasting my time.
I still remember that one night, three days away from defending my thesis. It was 9:30 pm. I was racing to make the most out of my time with my lab machine, because the University closed at 10:00 pm and I would be escorted out if I stayed. The prototype models for the defense were nowhere near finished. To make matters worse, I was sick. I was skipping my classes trying to finish the deliverables without beating myself up.
The tensor dimensions were out of place. The data was unclean. None of the neural networks were improving. Some of the earlier jobs did work, but even after training them for two nights in a row, they failed to learn what I wanted. I fired up another Jupyter Notebook (the tool of choice for most of us in this field) and tried to overhaul some code, hoping to figure out where things went wrong.
By 9:50 pm, I realized I kept writing the same code over and over again to accommodate small changes. On top of that, I had to test multiple variants of my networks at once, so I found myself adjusting every single detail and rewriting boilerplate on top of boilerplate.
By the time the security guards were asking people to leave the labs on the fourth floor, frustration was tapping on my shoulder.
“Stop wasting your time.”
I knew I needed my own framework.
Racing For Time
There are four problems here:
- Data preprocessing takes up too much time. Since I run multiple experiments in parallel, I have to tweak the preprocessing a little for each one. This eats up time that I could use to build models. I need a way to automatically preprocess data with minimal instructions on my end.
- I use a technique called Few-Shot Learning where you train a model using very little data (like how we humans would). Many Few-Shot Learning techniques exist, but they are all trained differently. Just like all the computer scientists before me, I’m lazy. I want a uniform way to train my models.
- I’m broke. Everyone working in Machine Learning knows that your most important work tool is your GPU, which accelerates your matrix calculations. Little money means less GPU memory to fit models and data in. To work around this, I’d need to use FP16 (half-precision floating point, where each value is stored in 16 bits instead of the usual 32). This roughly halves the memory each tensor takes up, which is almost like doubling your GPU memory. But doing it properly is very involved.
- Jupyter Notebooks don’t cut it anymore. To run experiments, we usually spin up a new notebook, write all the boilerplate, do the involved coding, and train the models. If I’m training benchmarks on multiple models, I can’t keep using Jupyter Notebooks. I also can’t keep writing boilerplate. I need a way to write training scripts fast so I can run them on a terminal.
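To make the FP16 point above a bit more concrete, here is a minimal sketch using only Python’s standard library. The embedding matrix size is a made-up example, and the helper names (`tensor_bytes`, `to_fp16`) are mine, not part of any framework. It shows the memory saving, plus a hint of why half precision is "very involved": small values can silently round to zero.

```python
import struct

# struct format codes: "f" is IEEE 754 single precision (FP32),
# "e" is IEEE 754 half precision (FP16).
BYTES_FP32 = struct.calcsize("f")
BYTES_FP16 = struct.calcsize("e")


def tensor_bytes(num_values, bytes_per_value):
    """Raw storage needed for a tensor with `num_values` elements."""
    return num_values * bytes_per_value


def to_fp16(value):
    """Round-trip a Python float through FP16 to expose the precision loss."""
    return struct.unpack("e", struct.pack("e", value))[0]


# Hypothetical embedding matrix: 50,000 words x 300 dimensions.
num_values = 50_000 * 300
fp32_mb = tensor_bytes(num_values, BYTES_FP32) / 1e6
fp16_mb = tensor_bytes(num_values, BYTES_FP16) / 1e6
print(f"FP32: {fp32_mb:.0f} MB, FP16: {fp16_mb:.0f} MB")

# The catch: FP16 keeps fewer significant digits and covers a smaller range.
print(to_fp16(0.1))   # close to 0.1, but not exactly the same value
print(to_fp16(1e-8))  # too small for FP16, so it rounds to zero
```

That last line is the reason half-precision training usually needs extra care, such as scaling the loss so that tiny gradients do not vanish.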
The solution, I thought, was to write my own framework.
The framework should be flexible enough to accommodate changes in models and experiments, and quick enough to prototype with so I can reduce the concept-to-testing delay. Most importantly, it should allow me to write short code: short enough to fit in a small Python script that I can run on a GPU server. With the right setup, I should be able to train multiple variants of my models and preprocessing settings in a short span of time. All I should need to do is build my network architecture and say what I want my data to look like.
I retired to bed for the night, woke up early the next day, and wiped my workspace clean. Two days left before the defense. I spent the first day writing wrappers and building a lite-framework of sorts that suited my needs. While it took a lot of hair-pulling every now and then, it did do the trick. By the following day, I was running experiments seamlessly.
I defended my thesis and came away with a super shiny pass-with-minor-revisions from my panel.
The mini-framework I wrote turned out to be a precursor (a blueprint of sorts) for a much bigger undertaking. I realized that much of our lab’s research could be made quicker by something like this.
Towards an Automated Neural Networks Framework
Three things, I told myself:
- A preprocessing pipeline that smartly knows how to process your data for you.
- An automated training engine that takes your model and data and efficiently trains it on a GPU, maximizing half-precision compute as much as possible.
- Utility functions that make heavy lifting of data quick work.
If you take these three things, package them in a framework, and allow people to write experiments in a Python script at most 100 lines long, then you shorten the concept-to-experiment delay.
I ran the idea by my mentor (who wrote his own mini framework that suited his needs, which in turn inspired the original mini-framework I wrote). I also ran it by our former lab head, who is currently on his PhD study leave. After some idea-sharing and expectation setting, I knew I needed to make a proof of concept.
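To give a feel for the first of those three pieces, here is a rough sketch of what a tiny text preprocessing pipeline could look like in plain Python. Everything in it is hypothetical and for illustration only: the `TextPipeline` class, its `fit`/`transform` methods, and the whitespace tokenizer are my assumptions here, not the actual API of the framework.

```python
from collections import Counter


class TextPipeline:
    """Hypothetical pipeline: learns a vocabulary, then encodes and pads text."""

    PAD, UNK = "<pad>", "<unk>"

    def __init__(self, max_len=8, min_freq=1):
        self.max_len = max_len
        self.min_freq = min_freq
        # Index 0 is reserved for padding, index 1 for unknown tokens.
        self.vocab = {self.PAD: 0, self.UNK: 1}

    def tokenize(self, text):
        # Simplest possible tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def fit(self, texts):
        """Build the vocabulary from training texts, most frequent tokens first."""
        counts = Counter(tok for text in texts for tok in self.tokenize(text))
        for token, freq in counts.most_common():
            if freq >= self.min_freq and token not in self.vocab:
                self.vocab[token] = len(self.vocab)
        return self

    def transform(self, texts):
        """Turn each text into a fixed-length list of token ids."""
        unk, pad = self.vocab[self.UNK], self.vocab[self.PAD]
        rows = []
        for text in texts:
            ids = [self.vocab.get(tok, unk) for tok in self.tokenize(text)]
            ids = ids[: self.max_len]                 # truncate long inputs
            ids += [pad] * (self.max_len - len(ids))  # pad short inputs
            rows.append(ids)
        return rows


texts = ["the model trains", "the model fails again"]
pipeline = TextPipeline(max_len=5).fit(texts)
batch = pipeline.transform(texts + ["an unseen sentence"])
print(batch)
```

The point of the design is that the experiment script only states what the data should look like (here, `max_len=5`), and the pipeline takes care of the vocabulary, unknown words, and padding.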
I decided to call that proof of concept “The Lightpost Project.”
For anyone who would want to check out the alpha version of the framework, I’ll (shamelessly) plug the GitHub repo here:
Automated Neural Networks Training Framework. Developed and Managed by the DLSU Machine Learning Group. …github.com
Lightpost is a framework built on top of the (already amazing) PyTorch framework. It builds on the three points that I outlined above and bundles them into a neat package that works on top of the very flexible system that PyTorch gives you.
The overall design goal is to minimize the delay from the drawing board to the experiments. Yes, Keras might seem like the immediate answer to such a need, but unlike Keras (which, by the way, undoubtedly delivers the best UX for general purpose Deep Learning), Lightpost provides data pipelines that automatically process your data for you. It also sits on top of PyTorch and is 100% extensible using PyTorch code.
At the end of the day, I’d like to be able to write my models in PyTorch, leverage Lightpost to feed in my data seamlessly, load my pretrained word embeddings easily, then feed it to a training engine that automatically maximizes my GPU for me in the background. For logging, it also has Tensorboard support (especially useful for t-SNE visualizations of word embeddings), among other training stat reports.
Back To The Drawing Board
Lightpost is far from finished. Some of the things that are still unimplemented:
- CUDA Support. The framework doesn’t have GPU support yet, as I’m still in the process of hammering out some dents and finalizing some design choices.
- Mixed Precision Training. Unlike the original mini-framework, Lightpost doesn’t have FP16 (and CUDA) baked in just yet. NVIDIA released a new library called Apex, which supposedly handles FP16 more cleanly than implementing it manually. I’ll see what best fits and roll it out in the next release.
- Computer Vision support. I’m mainly a Natural Language Processing (NLP) researcher, and my experience with Computer Vision (CV) is limited. Hence, no CV support for now. But it will come.
- Categorical data preprocessing. There is still no data pipeline that automatically processes categorical data (sorry, Titanic Dataset!)
- One-shot/Few-shot support. This is the original use case I made my framework for. It isn’t baked in yet because I made breaking design choices when I decided to generalize to a larger use case than just one-shot/few-shot learning. The code actually already exists, just not in the GitHub repo. I’m still ironing out some creases before I release it in alpha.
If you decide to test drive the alpha version of the framework, then you have my thanks. Beware of bugs in the code, as it’s currently in 0.0.3a and it’s expected to break if you use it for things it wasn’t designed for. But hey, it does give you a lot of utility out of the box already.
Please consider filing a bug report if you see one. If you liked the project, please consider starring the repo ;) And on the rare occasion that you’d like to contribute, please drop by the Issues Tracker on GitHub and I’d love to hear you out!
Looking back, it’s interesting how a very frustrating night of running experiments led to the initiative to build a neural networks framework. From writing boilerplate, to making a sad mishmash mini-framework for my thesis, to building a prototype of a fully-fledged framework that we can use for research someday, it was a great learning experience seeing my design choices change and my priorities grow.
I envision Lightpost as some sort of in-house tool that we can use in the University’s labs someday. I’m used to seeing people get frustrated when their experiments don’t turn out the way they’d like. In the worst case, some people start from scratch amidst their jungle of mangled code. Lightpost could hopefully help these people write their experiments and reduce the delay in turning concepts into actual results.
Perhaps someday, when the framework becomes robust enough, we can get closer to this dream. I can already foresee how much of a headache it will be to get the framework to that point. It will be gruelling to juggle development and grad studies, not to mention the frustrating hours of debugging.
But hey, if I had one takeaway from the entire ordeal, it’s probably that if things start to get frustrating, it means you’re going down the right path.
See you in the version 0.1 release!