Original article was published on Deep Learning on Medium
Data Pipelines in Fastai (v1)
One of the most useful features that fastai brings to the table is its data block API. I have been part of many machine learning projects, and one common theme across them has been the large amount of time I spend creating pipelines that work consistently across training and validation sets.
I must admit that the first time I looked through the fastai data block API for handling data, I was a little confused. It took me some time to grasp the framework, going through the docs, forums and blogs. Once I was past the hump, I could see why it's built the way it is, and the value it brings: ease of use and the flexibility to customize. While many people find it intuitive to read and follow code, it has always been easier for me to visualize things in flow or block diagrams. Diagrams help me first look at things from a bird's-eye view, understand the overall objective, and then dig deeper in specific directions. I looked for blogs, articles and docs with a few diagrams to get started quickly, but I could not find any. In this article, I hope to illustrate the API framework with some quick flow diagrams that paint the larger picture and help folks like me navigate this large framework faster.
Note: While it is not necessary to understand Object Oriented Programming (OOP) in Python in depth, a basic understanding of OOP concepts in Python will go a long way, especially when you need to dig through the PyTorch and fastai codebases. There are plenty of YouTube videos that take you through the concepts. I found the video playlist by Corey Schafer quite helpful in brushing up on my OOP concepts: Link
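As a quick refresher, here is a minimal sketch of the two OOP ideas that come up most often when reading this kind of code: inheritance and method chaining. The class and method names below (ItemList, TextList, filter_by, lowercase, split_by_index) are made up for illustration and are not fastai's actual API; they only mimic the general style of chaining calls on list-like objects.

```python
# Toy illustration only: these classes are hypothetical and are NOT fastai's API.

class ItemList:
    """Holds a list of items and supports chained (fluent) method calls."""

    def __init__(self, items):
        self.items = list(items)

    def filter_by(self, predicate):
        # Return a new object of the same class so calls can be chained.
        return self.__class__([item for item in self.items if predicate(item)])

    def split_by_index(self, index):
        # Return (train, valid) as two objects of the same class.
        return self.__class__(self.items[:index]), self.__class__(self.items[index:])

    def __len__(self):
        return len(self.items)

    def __repr__(self):
        return f"{self.__class__.__name__}({self.items})"


class TextList(ItemList):
    """Subclass that inherits everything above and adds one text-specific step."""

    def lowercase(self):
        return self.__class__([item.lower() for item in self.items])


data = TextList(["Cat", "DOG", "", "Bird"])

# Each call returns a new TextList, so the steps read top to bottom like a pipeline.
train, valid = data.filter_by(bool).lowercase().split_by_index(2)

print(train)  # TextList(['cat', 'dog'])
print(valid)  # TextList(['bird'])
```

Because every method returns `self.__class__(...)`, the subclass type is preserved through the chain, which is why `lowercase()` is still available after calling the inherited `filter_by()`.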
Most ML pipelines consist of the following steps: