Deploy API of Airbnb Listing Price Prediction Model with Gradient by Paperspace

Source: Deep Learning on Medium



Introduction

In my previous efforts I built a machine learning model that predicts Airbnb listing prices, so now I can actually serve it as an API. To do so, I needed to build an environment that hosts both the model and the API call interface. Companies often use AWS or GCP for this, but this time I tried Gradient by Paperspace, a relatively new service. It takes over tasks such as provisioning servers and networks, and simplifies machine learning pipeline steps such as jobs and deployment.

Projects

This time, I created three projects: DataWrangling, LearningModel, and DeployAPI. The code is available on GitHub.

Data wrangling

I mainly reused the wrangling I had done earlier in a Jupyter notebook. Pipeline management is handled with Luigi so that the listing data and the calendar data can be processed in parallel. In addition, Dask is used for the data frames to speed things up.
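The parallel idea can be sketched with only the standard library (the actual project wraps these steps in Luigi tasks and uses Dask data frames; the function names and cleaning rules below are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-source cleaning steps; the real project defines these as Luigi tasks.
def wrangle_listings(rows):
    # e.g. drop rows without a price and strip the currency symbol
    return [dict(r, price=float(r["price"].lstrip("$"))) for r in rows if r.get("price")]

def wrangle_calendar(rows):
    # e.g. keep only dates marked as available
    return [r for r in rows if r.get("available") == "t"]

def run_pipeline(listings, calendar):
    # Run both wrangling steps in parallel, mirroring the Luigi DAG,
    # then return the two cleaned datasets for merging.
    with ThreadPoolExecutor() as pool:
        f_listings = pool.submit(wrangle_listings, listings)
        f_calendar = pool.submit(wrangle_calendar, calendar)
        return f_listings.result(), f_calendar.result()

if __name__ == "__main__":
    listings = [{"id": 1, "price": "$12000"}, {"id": 2, "price": None}]
    calendar = [{"listing_id": 1, "available": "t"}, {"listing_id": 1, "available": "f"}]
    print(run_pipeline(listings, calendar))
```

Luigi adds dependency tracking and resumability on top of this; the sketch only shows the parallelism.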

All source and result data was kept in Gradient's storage (I did not use the training data on GitHub this time), because storage can be referenced from other jobs such as model training. To use it, you simply specify a path under /storage as the save destination. However, I did not know of a way to upload training data in advance, so I started a notebook each time just for that. Moreover, because the free virtual machines live in a different storage region, I had to choose a paid machine, which cost money, so I would like to know if there is a better way.

It can be run from the CLI with the following command.

gradient experiments run singlenode \
--name airbnb_data \
--projectId <ProjectID in gradient> \
--experimentEnv "{\"API_KEY\":\"<Google Maps Places API key>\",\"RADIUS\":300}" \
--container tensorflow/tensorflow:latest-gpu-py3 \
--machineType P4000 \
--command "pip install luigi jpholiday requests 'dask[complete]' && python data/wrangling.py -o /storage/airbnb/dataset/marged_data.pkl" \
--modelType Tensorflow \
--workspaceUrl https://github.com/furuta/springboard_capstone_gradient

The machine type is P4000, a relatively cheap GPU.

Learning model

The model is the five-layer deep learning model that produced the best results in the Jupyter notebook from my previous efforts. The number of units in the input layer is set to the number of data columns; because the neighborhood data is fetched from the Google Places API, the number of input columns grows or shrinks with the number of neighborhood types. Since inference must use the same inputs as training, the column list is saved separately. The number of epochs and the batch size can be changed via environment variables.
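The saved-column-list idea can be sketched as follows: persist the training columns, then reorder any later feature set to match them. The function names and file name here are hypothetical, not the project's actual ones:

```python
import json

def save_columns(columns, path):
    # Persist the exact column order the model was trained with.
    with open(path, "w") as f:
        json.dump(columns, f)

def align_features(row, path):
    """Reorder a feature dict to the saved training columns.

    Missing neighborhood columns are filled with 0; columns the model
    never saw are dropped, so the input width always matches training.
    """
    with open(path) as f:
        columns = json.load(f)
    return [row.get(c, 0) for c in columns]

if __name__ == "__main__":
    save_columns(["accommodates", "nbhd_cafe", "nbhd_station"], "columns.json")
    # A request may lack some neighborhood columns and carry extra ones.
    print(align_features({"accommodates": 2, "nbhd_station": 5, "extra": 9}, "columns.json"))
    # -> [2, 0, 5]
```

This is why the column list must be stored next to the model: the Places API response decides the columns at wrangling time, and the model's input layer is fixed once trained.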

Finally, R2 was 0.955, which was a relatively good result.

It can be run from the CLI with the following command.

gradient experiments run singlenode \
--name airbnb_model \
--projectId <ProjectID in gradient> \
--experimentEnv "{\"EPOCHS\":100,\"BATCH_SIZE\":500}" \
--container tensorflow/tensorflow:latest-gpu-py3 \
--machineType P4000 \
--command "pip install sklearn && pip install 'dask[complete]' && python train/train_model.py -i /storage/airbnb/dataset/marged_data.pkl --modelPath /storage/airbnb/model --version 1" \
--modelType Tensorflow \
--modelPath "/storage/airbnb/model" \
--workspaceUrl https://github.com/furuta/springboard_capstone_gradient

Deploy API

The main file is infer.py, and the application is built with Flask. It was my first time using it, but being a simple framework, it was easy to build with. The parameters received in a request are transformed into the form the model expects: first I create the necessary rows for the requested date range, then I add the remaining columns, which are common to all rows. Neighborhood information is fetched from the Google Places API, and results for latitude/longitude pairs that have already been looked up are saved to a file so the lookup is not repeated.

The inputs of the learning model and of the API are summarized as columns in a separate file, columns.py. Data items not used by the learning model are ignored. Categorical variables such as property_type are input as numbers, which makes their relationship to the actual category values hard to understand, so the definitions are included in the validation error text. The definitions are also written in the GitHub README, but it may be worth preparing something easier to follow, such as Swagger.

  • Validation by marshmallow

Input validation was split into a separate file, schema.py, using Marshmallow. Note that the required arguments differ slightly between Marshmallow versions. The date range is limited to one year (365 days), and the latitude and longitude are limited to the Tokyo area.
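The rules above can be illustrated with a plain-Python validator (the project itself uses Marshmallow in schema.py; the Tokyo bounding-box values here are rough assumptions for illustration, not the project's actual limits):

```python
from datetime import date

# Assumed rough bounds for the Tokyo area (illustrative only).
TOKYO_LAT = (35.5, 35.9)
TOKYO_LNG = (139.3, 140.0)

def validate(start: date, end: date, lat: float, lng: float):
    """Return a list of validation error messages; empty means valid."""
    errors = []
    if not (0 <= (end - start).days <= 365):
        errors.append("date range must be within 365 days")
    if not (TOKYO_LAT[0] <= lat <= TOKYO_LAT[1]):
        errors.append("latitude must be in the Tokyo area")
    if not (TOKYO_LNG[0] <= lng <= TOKYO_LNG[1]):
        errors.append("longitude must be in the Tokyo area")
    return errors

if __name__ == "__main__":
    print(validate(date(2020, 1, 1), date(2020, 1, 7), 35.68, 139.76))  # -> []
```

Marshmallow expresses the same constraints declaratively with `fields` and `validate` arguments, and raises a ValidationError that can be turned into the API's error response.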

It can be run from the CLI with the following command.

gradient jobs create \
--name "deploy api" \
--projectId <ProjectID in gradient> \
--jobEnv "{\"API_KEY\":\"<Google Maps Places API key>\",\"RADIUS\":300}" \
--container tensorflow/tensorflow:latest-py3 \
--machineType C3 \
--ports 8080:8080 \
--command "pip install flask jpholiday 'dask[complete]' requests marshmallow && python deploy/infer.py -m /storage/airbnb/model/1 -d /storage/airbnb/dataset" \
--workspaceUrl https://github.com/furuta/springboard_capstone_gradient

The server is the cheapest CPU machine, C3. The API returned responses in an instant, with no problems for personal verification.

The response is in JSON format, like the following. The unit of price is yen.
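The original post showed the actual response here; as a stand-in, a hypothetical payload with the same general shape (field names and values are assumptions, not real output) might look like:

```python
import json

# Hypothetical response shape: predicted price in yen keyed by requested date.
# These keys and values are illustrative assumptions, not actual model output.
response = {
    "2020-01-01": 12000.0,
    "2020-01-02": 11500.0,
}
print(json.dumps(response, indent=2))
```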

Conclusion

It took quite a while to understand Gradient. The documentation is well made, but I had a hard time grasping how to use it, for example the difference between an Experiment and a Job. I also tried CI linked with GitHub; it was not needed this time since the deployment was a one-off, but it did not work even though I set up the linkage, so using it will take a little more time. On the other hand, once everything was set up, it was very good to be able to retrain and redeploy many times with a single CLI command while changing data and parameters. This kind of DevOps mechanism should be very important, because in actual operation such repetitive work is necessary. Whether or not I use Gradient, when I build an operating environment I will aim for one that is just as simple.

The responses returned from the API were fast, and I am very satisfied with the results. Beyond facility specifications, prices tended to increase on holidays and at the end of the year. If you really want to use it, please let me know and I will start the server.