How to Build a Web App for your Data Science Project

Original article was published on Artificial Intelligence on Medium

Okay, time for the actual guide. The plan is to work backwards from what it’s going to look like at the end to how to get started.

The Finished Application

At the end, here’s how your code is going to relate to your user, who’s going to think that your website is “Wow so cool.”

You’re going to pay someone (it’s cheap, like $5/month) to run a computer for you. As far as you know, it’s a real computer. But actually it’s a virtual computer, which is a program running on a real computer with a Pinocchio complex (it thinks it’s a real computer too). The beauty is that it doesn’t matter; it’s pretty good at pretending.

Unlike your computer, that computer is going to have a real Internet connection where people can find it. And it will be on 24/7 and never leave the house and get out of wifi range.

And that computer is going to run Docker, and Docker is going to run your code. You could also not use Docker and just run directly on that computer, but it’s a pain.

Client-Server Architecture aka the Internet

Okay, what is your code going to do? The only thing that happens on the Internet is that computers send messages and receive messages. Like using the United States Postal Service (USPS), but it’s not bankrupt and it moves at the speed of light plus traffic (there’s always traffic). On the Internet, these messages are called “packets.”

Packets roughly come in different types, called protocols, just like USPS things (first-class, spam, certified mail, parcels, media-mail etc.). For now we’ll just worry about regular old HTTP, the one you know from the beginning of URLs like http://philosophers.football.

When someone visits your website, they send an HTTP GET request, shown as #1 in the diagram below. In short, they send a mail-piece saying “please send me your webpage.” What comes back is a bunch of stuff to show (HyperText Markup Language, or HTML), instructions for how to show it (Cascading Style Sheets, or CSS), and most importantly, a bunch of code to run (JavaScript). Browsers pretty much only run JavaScript, so you’ll have to write some JavaScript to make things work.

Your website picks up the “radio” and says “roger, 200, here’s the website.” The number (200) means “OK.” You probably have met good old 404 all sorts of times (the 400s and 500s mean: things are not okay please leave me alone). Code 418 “I’m a teapot” means that the server won’t brew coffee because it is, permanently, a teapot. (Seriously. If I had made that up I’d be doing standup right now).
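To see the request-and-status-code exchange in action, here is a minimal sketch that uses only Python’s standard library. It is not how the finished site works (Django and nginx do this job there), and the names DemoHandler, fetch_status, and run_demo are made up for illustration. It starts a throwaway server on your own machine, asks it for one page that exists and one that doesn’t, and reports the two status codes.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Answers GET requests: 200 for the home page, 404 for anything else."""

    def do_GET(self):
        if self.path == "/":
            body = b"<html><body><h1>Hello</h1></body></html>"
            self.send_response(200)  # 200 means "OK"
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)  # 404 means "Not Found"

    def log_message(self, format, *args):
        pass  # keep the demo quiet instead of logging every request


def fetch_status(port, path):
    """Send one GET request to the local server and return its status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def run_demo():
    """Start a throwaway server, request two pages, return both status codes."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_status(port, "/"), fetch_status(port, "/nope")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())  # (200, 404)
```

A browser does the same thing as fetch_status, except that it also renders the HTML that comes back.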

Notice that box standing between your website and that laptop on the left? That’s the actual webserver. In industrial-grade applications it does all sorts of fancy things. In our case, it just takes the messages in and decides where to send them. Like the mail room in an office building. I used nginx but Django (see below) is perfectly capable of doing it too. I just didn’t want to put all my frontend code inside of my backend code. It seemed awkward.

Okay, so now you have code running on somebody else’s laptop. You’re in! Don’t try to do anything bad to their computer, it’s a ridiculously over-prosecuted federal crime. (Also the browser has pretty good security; it won’t let you do too much bad stuff). What can your JavaScript code do? It can render animations on that laptop, it can decide to do whatever it wants when the user clicks somewhere on the page, and it can talk to your server all by itself.

Django

Great, so now you know how your code is going to run. What is that code exactly? The first part is the server.

Django is a web framework in Python. The Django tutorial is pretty good so I won’t go into too much depth. What does it do?

  1. It handles setting up your database (CREATE TABLE) as well as writing to and querying it. The abstractions it builds are pretty good so you don’t need to muck around writing SQL queries directly (though you can if you want to, for more complicated queries).
  2. It handles “server side routing.” This means, for example, that if someone navigates to www.yoursite.com/page1, they get whatever you decided page 1 should be. If instead they go to www.yoursite.com/page2, they get something else.
  3. It lets you run Python code to decide what the content is going to be on the page. For example, if someone accesses a page, you can query their username in your database and also display information about their account (like their birthday, if they told you previously).
  4. It handles all the HTTP shenanigans like reading the incoming message and sending the appropriate type of response.
  5. It also does a bunch of other stuff, like security, authentication, etc.
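To make the first three items less abstract, here is a rough sketch of the same ideas in plain Python, with no Django involved. Everything in it (the users table, the user “ada”, the ROUTES dictionary, the function names) is invented for illustration. Django gives you much nicer versions of each piece, so treat this as a picture of what the framework is doing for you rather than code you would ship.

```python
import sqlite3


def make_database():
    """Create an in-memory database with one user.

    With Django you would describe the table as a Python class and let the
    framework generate SQL like this for you.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT PRIMARY KEY, birthday TEXT)")
    db.execute("INSERT INTO users VALUES (?, ?)", ("ada", "1815-12-10"))
    return db


def page1(db):
    """A page with fixed content."""
    return "<h1>Page 1</h1>"


def page2(db):
    """A page whose content is decided by Python code and a database query."""
    row = db.execute(
        "SELECT birthday FROM users WHERE username = ?", ("ada",)
    ).fetchone()
    return f"<h1>Page 2</h1><p>ada's birthday: {row[0]}</p>"


# Server-side routing: each URL path maps to the function that builds its page.
ROUTES = {"/page1": page1, "/page2": page2}


def handle_request(db, path):
    """Return (status_code, html) for a requested path."""
    view = ROUTES.get(path)
    if view is None:
        return 404, "<h1>Not found</h1>"
    return 200, view(db)


if __name__ == "__main__":
    database = make_database()
    for url in ("/page1", "/page2", "/page3"):
        print(url, handle_request(database, url))
```

In a real Django project the routing table lives in a urls.py file and the page functions are called views, but the shape is the same: a path comes in, a function is chosen, and that function returns the response.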

In addition to plain Django, you might want to set up an API that returns data instead of a web page. You can do this with djangorestframework (the tutorial is also quite good). This provides two key extensions:

  1. Serializers. These convert data from the data structure you might have in your database into a data structure (usually JavaScript Object Notation, or JSON) that you can send to the browser and the client application.
  2. API Views. A simple decorator to handle different types of HTTP requests such as GET (“send me some data”), POST (“here’s a new record to create in the database”), or PUT (“here’s a whole record to replace one that’s already in the database”).
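Here is a small sketch of both ideas in plain Python rather than djangorestframework itself. The Player record, the in-memory store dictionary, and the function names are all hypothetical. The method handling follows the usual HTTP convention: GET reads a record, POST creates a new one, and PUT replaces an existing one.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Player:
    """Stands in for a database record."""
    id: int
    name: str
    score: int


def serialize(player):
    """Serializer: turn a record into a JSON string the browser can read."""
    return json.dumps(asdict(player))


def deserialize(text, player_id):
    """The reverse direction: turn incoming JSON into a record."""
    data = json.loads(text)
    return Player(id=player_id, name=data["name"], score=data["score"])


def players_api(method, store, player_id=None, body=None):
    """API view: choose the behaviour from the HTTP method.

    `store` is a dict of id -> Player that stands in for the database.
    Returns a (status_code, response_body) pair.
    """
    if method == "GET":  # send back an existing record
        if player_id not in store:
            return 404, ""
        return 200, serialize(store[player_id])
    if method == "POST":  # create a new record with the next free id
        new_id = max(store, default=0) + 1
        store[new_id] = deserialize(body, new_id)
        return 201, serialize(store[new_id])
    if method == "PUT":  # replace an existing record wholesale
        if player_id not in store:
            return 404, ""
        store[player_id] = deserialize(body, player_id)
        return 200, serialize(store[player_id])
    return 405, ""  # any other method is not allowed


if __name__ == "__main__":
    players = {}
    print(players_api("POST", players, body='{"name": "ada", "score": 3}'))
    print(players_api("GET", players, player_id=1))
```

The real library adds validation, permissions, and a lot of convenience on top, but the core job is the same: translate between records and JSON, and branch on the request method.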

Finally, you may want the server to be able to speak up without being asked first. HTTP is designed so that every time the client sends a message, it gets one response back, ideally almost immediately. And if the client doesn’t send a message, it can’t receive one. This is a problem if you need to wait, say because it takes a little while for your model to process the incoming data. WebSockets are a solution to this, and are supported by Django Channels (also with a good tutorial). With WebSockets, the server can send a message anytime it wants, and possibly more than one.
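The difference is easier to see in code. The sketch below is only an analogy: it uses a plain TCP connection from Python’s asyncio module, not the actual WebSocket protocol and not Django Channels, and the message texts and function names are made up. The point is the pattern: the client sends a single message, the connection stays open, and the server pushes several updates back whenever it is ready.

```python
import asyncio


async def handle_client(reader, writer):
    """Server side: read one request, then push several progress messages."""
    request = (await reader.readline()).decode().strip()
    for step in ("received " + request, "model running", "done"):
        writer.write((step + "\n").encode())
        await writer.drain()
        await asyncio.sleep(0.01)  # stands in for slow model work
    writer.close()
    await writer.wait_closed()


async def run_demo():
    """Start the server, send one message, and collect every reply."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]  # port 0 above = any free port

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"score this\n")
    await writer.drain()

    replies = []
    while True:
        line = await reader.readline()
        if not line:  # empty bytes means the server closed the connection
            break
        replies.append(line.decode().strip())

    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return replies


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

One message goes in and three come out, which is exactly what a plain HTTP request and response can’t do.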

React

React, a Facebook product, is another “web framework,” this time for writing the code that runs on the client side. Again, it has a good tutorial so I won’t go into too much detail.

The basic structure is that everything you see is a “component.” Components can either be stateless or not. If stateless, they act like a function: they take in arguments and produce an output, namely some HTML that will be displayed by the client’s browser. They can also be stateful. Stateful components can remember things. So if, for example, you are building a board game, the state can be the position of the board and whose turn it is.
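React components are written in JavaScript, but the stateless versus stateful idea doesn’t depend on the language. Here is a rough Python imitation of the board game example. The names square and Board are made up, and real React components look different, so treat it as a picture of the idea only.

```python
def square(value):
    """Stateless component: same argument in, same HTML out."""
    return f"<button>{value}</button>"


class Board:
    """Stateful component: remembers the board position and whose turn it is."""

    def __init__(self):
        self.squares = [" "] * 9
        self.turn = "X"

    def click(self, index):
        """Update the state when the user clicks an empty square."""
        if self.squares[index] == " ":
            self.squares[index] = self.turn
            self.turn = "O" if self.turn == "X" else "X"

    def render(self):
        """Build the HTML from the current state, reusing the stateless component."""
        cells = "".join(square(value) for value in self.squares)
        return f"<div>{cells}</div><p>Next: {self.turn}</p>"


if __name__ == "__main__":
    board = Board()
    board.click(4)  # X takes the middle square
    print(board.render())
```

What React adds on top is the bookkeeping: whenever the state changes, it re-renders the component and updates only the parts of the page that actually changed.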

Another thing to keep in mind is “client side routing” available with react-router. What this does is make it so that you only have to send your user one page. Then when they go to www.yoursite.com/page1, your React app looks at the URL and decides to render page 1. Similarly for www.yoursite.com/page2, they get page 2 rendered. But they don’t have to talk to the server every time: they just get the one page that knows to look at the URL and decide what to display.

Docker

Docker is the technology I found the most annoying to wrap my head around. I think this is because it insists on using its own vocabulary and also defines itself by not being a virtual machine even though it solves exactly the same problem.

The point of Docker is that, with the technology, you can define with code an environment that your code will run in. Sort of like a virtual environment on steroids. These instructions, in plain English, might say something like:

  1. Create a new “virtual” computer
  2. Install Python 3.6
  3. Copy in my Django code from my GitHub repository
  4. Run the command to start the Django server

These instructions go in a Dockerfile. The Dockerfile is built into a Docker image. Finally, you can create a container based on this image. And it is the container that runs your code.
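As a rough illustration, the four plain-English steps above might translate into a Dockerfile like the one below. This is a sketch under assumptions, not the file from my project: it assumes the repository has already been cloned into the folder you build from, that it has a requirements.txt and a manage.py at the top level, and that the server listens on port 8000.

```dockerfile
# Steps 1 and 2: start from a ready-made image that already has Python 3.6
FROM python:3.6-slim

# Work inside /app in the container (the folder name is an arbitrary choice)
WORKDIR /app

# Install the Python packages the project needs (assumes a requirements.txt)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Step 3: copy the Django code from the build folder into the image
COPY . .

# Step 4: start the Django server (assumes manage.py is at the top level)
EXPOSE 8000
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
```

Note that runserver is Django’s development server. It is fine for trying things out, but the Django documentation recommends a production-grade server for a real deployment.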

The crucial point of Docker is that this code will run the same on every computer. And it’s very easy to get your hosting provider (like DigitalOcean) to give you a server with Docker already installed. Then all you have to do is start up a Docker container based on the image you want.

Also, you may want more than one image. For example, one image can run the database, one image can run your backend server, and one image can run your frontend server.

Nginx

I ended up using Nginx as the main webserver. So when you visit the site, Nginx first looks at what URL you are trying to visit and then either routes the request to Django or else serves up the single set of React files that handle deciding what to render with client side routing.
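A configuration for that split might look roughly like the sketch below. The specifics are assumptions for illustration rather than my actual setup: it assumes every Django URL starts with /api/, that the Django container is reachable under the hostname backend on port 8000, and that the built React files have been copied to /usr/share/nginx/html.

```nginx
server {
    listen 80;

    # Anything under /api/ gets passed along to the Django server
    location /api/ {
        proxy_pass http://backend:8000;
        proxy_set_header Host $host;
    }

    # Everything else gets the React files. If the requested file doesn't
    # exist, fall back to index.html so client side routing can take over.
    location / {
        root /usr/share/nginx/html;
        try_files $uri /index.html;
    }
}
```

The try_files line is what makes client side routing work: a request for /page1 has no matching file on disk, so Nginx sends index.html and lets the React app read the URL.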

You can also accomplish this with Django, and that would probably be easier.

Conclusion

If you have a project in mind and make it through the Django and React tutorials, you should come across the other things you need to do and be able to handle them with aplomb. Hopefully, this roadmap gives you a good idea of where to get started!