
5 modern tools to know if you were the “Rip Van Winkle” of Data Science who slept for the last 2 years and just woke up

5 data science tools that rose to popularity in recent times.

Sleeping sloth from Pixabay

“Rip Van Winkle” is a short story by the American author Washington Irving, first published in 1819. It follows a Dutch-American villager named Rip Van Winkle who falls asleep in the Catskill Mountains and wakes up 20 years later, having missed the American Revolution.

Similarly, if you were a data scientist who slept (got sidetracked) for the last 1–2 years and just woke up, this post will serve as a briefing on the data science revolution that happened during this time.

Disclaimer: this list is not comprehensive, and I might have missed a few tools that were equally revolutionary. Pardon my recall (no ML pun intended!).

Let’s get started!

PyCaret

Image from the PyCaret website

Quoting from their website, PyCaret is an open-source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within seconds in your choice of notebook environment.

My point of view —

Traditional ML pipeline (using Scikit-learn etc):

Prepare features -> apply missing value imputation (categorical and numerical) -> apply feature scaling (min-max, etc.) -> feature selection -> run several models (Random Forest, Naive Bayes, XGBoost, etc.) -> tune hyperparameters -> visualize results (Seaborn, etc.) -> deploy your model

PyCaret ML pipeline:

Prepare features-> Pycaret -> Boom 💥

Just get your data into a pandas DataFrame, write 3 lines of code, and get the output of several classifiers with the top scores highlighted:

from pycaret.classification import *
# one-time setup: pass your DataFrame and the name of the target column
class_setup = setup(data = data, target = 'cls var', session_id=123)
# train and cross-validate every available classifier, ranked by score
compare_models()
Image from the PyCaret demo

You can plot model evaluation metrics, explain models with SHAP, deploy, etc., all with 1–2 lines of code.

With PyCaret 2.0 you can even log models and experiments using MLflow by Databricks.
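For instance, a minimal sketch of those one-liners (the 'rf' model choice is just an example, and if I remember correctly MLflow logging is switched on by passing log_experiment=True to setup()):

# train a single model after setup(), e.g. a random forest
rf = create_model('rf')
# plot evaluation metrics such as the ROC curve
plot_model(rf, plot = 'auc')
# SHAP-based explainability for the trained model
interpret_model(rf)
# fit on the full dataset and save the whole pipeline to disk
final_rf = finalize_model(rf)
save_model(final_rf, 'rf_pipeline')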

Streamlit

Image from Streamlit.io

Problem: data scientists often lack the frontend skills to create a demo for their applications, so great projects slip through the cracks and don’t get the exposure they deserve.

Solution: enter Streamlit, a minimal framework for creating powerful frontends in pure Python. No HTML, CSS, or JavaScript experience needed.

GIF of a Streamlit frontend from the GitHub GAN demo
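To give a flavour, here is a tiny sketch of a Streamlit app (the “prediction” is a placeholder for your own model code):

import streamlit as st

st.title('My model demo')
# widgets are one-liners and return plain Python values
threshold = st.slider('Decision threshold', 0.0, 1.0, 0.5)
text = st.text_input('Enter some input for the model')
if st.button('Predict'):
    # call your own model here; this is just a placeholder
    st.write(f'Pretend prediction for "{text}" at threshold {threshold}')

Save it as app.py, run streamlit run app.py, and the app opens in your browser.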

FastAPI

Image from FastAPI Docs

Quoting from their website, FastAPI is a modern, fast (high-performance), web framework for building APIs.

If you wanted to expose your amazing ML model as a REST API, the de facto standard was to use Flask.

Now enters FastAPI, which does the job that Flask does with significant improvements (see the sketch after this list):

  • FastAPI runs on Uvicorn, a production-ready ASGI server, whereas Flask’s built-in server is meant for development only.
  • FastAPI, as the name suggests, is much faster than Flask and is one of the fastest Python frameworks available.
  • It automatically generates documentation using Swagger UI.
  • Easy to switch to if you are already a Flask user.
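As a flavour of the API, here is a minimal sketch of serving a model with FastAPI (fake_predict is a stand-in for your own inference code):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    text: str

def fake_predict(text: str) -> str:
    # placeholder for real model inference
    return 'positive' if 'good' in text else 'negative'

@app.post('/predict')
def predict(item: Item):
    # the request body is validated automatically against the Item schema
    return {'label': fake_predict(item.text)}

Run it with uvicorn main:app --reload and the interactive Swagger docs show up at /docs.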

TLDR: If you fell asleep during the Flask era, wake up and learn FastAPI 🙂

Cortex

Image from cortex.dev

My point of view —

Traditional Deployment pipeline (on AWS as an example):

Model inference code -> create a Flask API -> add Gunicorn to be production-ready -> build a Docker container -> deploy the container on an orchestration service (EKS, ECS, Fargate, etc.) -> set up logging, autoscaling, and load balancing -> add infrastructure for rolling updates (without taking down the current API)

Cortex ML deployment pipeline:

Model inference code -> Cortex-> Boom 💥

Cortex automates this whole traditional deployment process. If you just provide your AWS credentials, you can use Cortex’s command-line interface to deploy an API to production starting from nothing more than your model inference code.
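As a rough sketch (the interface details below are from memory, so treat the exact names as assumptions), the “model inference code” is just a small predictor class; a cortex.yaml points at it and cortex deploy handles the containers, scaling, and load balancing:

# predictor.py: sketch of a Cortex Python predictor (interface names assumed)
import pickle

class PythonPredictor:
    def __init__(self, config):
        # runs once at startup: load the model artifact
        # (config would typically carry something like a model path or S3 key)
        with open(config["model_path"], "rb") as f:
            self.model = pickle.load(f)

    def predict(self, payload):
        # runs per request: payload is the parsed JSON body
        return {"prediction": self.model.predict([payload["features"]]).tolist()}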

Cortex is open-source, easy-to-use, built for scale, and also supports AWS spot instances to save cost.

MadewithML

MadewithML Topics from their webpage

If you ever had a problem with keeping up with the latest data science developments and projects, worry no more.

MadewithML makes it super easy to find all the great data science content from around the web in one single place. Every day you can check their trending section to see what the top projects in data science are. Projects are community curated and rated, so the best ones reach the top!

There are also collections for each topic, so if you want to see all the best projects on any topic, say “Question-Answering”, you can find them curated there.