Original article can be found here (source): Artificial Intelligence on Medium
Using Dymaxion Labs API to detect pools from satellite images
At Dymaxion Labs, we are developing a geospatial analytics platform (DYMAX). Our goal is to make machine learning modeling scalable, with multispectral satellite imagery as the main input.
To achieve this we leverage the computing power of Google Cloud Platform. In particular, we use AutoML Vision, AutoML Tables, and Cloud ML.
This technical article showcases the power of our platform and how it can be applied to real-world problems like swimming pool detection over large areas. It walks through how to use Dymaxion Labs’ API to create an entire machine learning workflow programmatically.
It’s a good idea to begin by describing our methodology. For data science projects like this one, it is very important how you define the problem (where are the swimming pools in our area of interest?) and what datasets or ground-truth information you can gather from the real world.
Once the problem is defined and the relevant data gathered, we need to annotate the examples from which the model will extract patterns and learn. In textbooks, this step is called training dataset creation.
Once we have annotated examples, we need to calibrate the model so that it generalizes the relevant patterns without overfitting the training dataset. Our platform sets a minimum number of annotations to unlock the training process; in general, more annotations lead to better results.
Note that in this example we will talk about swimming pool detection, but you can detect different kinds of objects like cattle, roads, and solar farms. In a few weeks, we will release the segmentation models section to delineate more complex objects.
Once the model has been trained, you can use it to predict new swimming pools in different areas. If you prefer to experience these steps in a more intuitive and direct way instead of the API path, we suggest you take our platform’s test drive (https://app.dymaxionlabs.com/testdrive).
The lines below are about how to implement this workflow using our API and Python.
First, let’s create a directory for this project. It is good practice to create a virtual environment for our project. Run the following on a terminal:
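For example, a minimal setup could look like this (the directory name `pools-cd` is just a suggestion, matching the environment name used below):

```shell
# Create a directory for the project and enter it
mkdir pools-cd
cd pools-cd
```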
If you have Conda installed you can create the environment like this:
conda create -n pools-cd python=3
conda activate pools-cd
Now, we’ll install the Dymaxion Labs Python package, which allows you to easily interact with the API, and other dependencies for reading rasters and vector files and visualizing the input data and results:
pip install dymaxionlabs jupyter geopandas rasterio matplotlib descartes
Let’s start a Jupyter notebook. From the project’s directory run:
jupyter notebook
On the empty notebook, import the following modules:
import dymaxionlabs as dl
from dymaxionlabs.models import Model
import rasterio
import geopandas as gp
import matplotlib.pyplot as plt
Create an API key
Now go back to your Jupyter notebook. We’ll need to set the API key like this:
import os
os.environ["DYM_API_KEY"] = "insert-api-key"
Define the problem
Now we are ready to start creating our custom object detection model!
First, we need to understand what kind of problem we are trying to solve. We want to detect and localize pools from a satellite image of a residential area. This is an object detection problem. In this case, there is only one type of object we are interested in, so the model will have only one label.
Create a model
We now know that the type of model is object detection and it will have one label (let’s call it “pool”). To create a model use the Model.create method, like this:
pools_detector = Model.create(name="Pools detector")
Upload an image for training
Now download this image and save it on your project’s directory.
pools_file = dl.upload_file("./pools.tif")
When the process finishes, you will be able to upload annotations or annotate this image directly.
We first need to give some examples of pools in the image so that the model can learn how to recognize them. This process is usually called annotation. For this step, you can either go to Dymaxion Labs platform and use the Annotation module to manually annotate the images, or download this file and upload the annotations from Python. This is useful in cases where you already have annotations prepared on GIS software and you want to use them.
You will need to annotate at least 50 examples of each label, in this case, 50 pools. The file we provide has 60 annotations.
Now that we have our model and we have annotated our image, we will proceed to task a training job:
job = pools_detector.train()
Because a training job may take several hours, the Model.train() method is asynchronous and returns a Job object, which tells you at any time whether the training has finished. For example, using job.is_running() you can wait until the job has finished and the results can be downloaded:
import time

while job.is_running():
    print("Waiting for results...")
    time.sleep(60)
Now that we have our trained model, we are going to use it to detect pools in the whole image. We use Model.predict_files() and pass our pools_file object, which references our previously uploaded file:
job = pools_detector.predict_files([pools_file])
Once the prediction job is finished, we can download the results, like this:
It will automatically create a directory inside the project with the following files:
The GeoJSON file is a vector file format that contains polygons representing the bounding boxes of each detected object. The CSV file contains all the objects detected, with the (x, y) coordinates and (width, height) pairs.
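To make the relationship between the two output formats concrete, here is a small sketch, using only the standard library, of how the (x, y, width, height) values in the CSV relate to a GeoJSON-style bounding-box polygon. The feature and its coordinates below are invented for illustration:

```python
import json

# A GeoJSON-like feature with a rectangular polygon, as a detection result
# might contain. The coordinates are made up for this example.
feature = json.loads("""
{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[10.0, 20.0], [14.0, 20.0], [14.0, 23.0],
                     [10.0, 23.0], [10.0, 20.0]]]
  }
}
""")

def bbox(feature):
    """Return the (x, y, width, height) bounding box of a polygon feature."""
    ring = feature["geometry"]["coordinates"][0]  # exterior ring
    xs = [point[0] for point in ring]
    ys = [point[1] for point in ring]
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x, max(ys) - y

print(bbox(feature))  # (10.0, 20.0, 4.0, 3.0)
```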
Exploration and visualization
Let’s check out how our model did. You can open the GeoJSON file as any other JSON file, by reading and parsing it with the json module:
import json

with open('./results/pools.geojson') as json_file:
    data = json.load(json_file)

pools_count = len(data['features'])
# => 678
In our case, our model detected 678 pools, way more than the 60 annotations we had. You may have different results depending on whether you manually annotated the image, and how the model converged when training.
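If you want to compare detections against your annotations quantitatively, a standard measure in object detection is intersection-over-union (IoU) between bounding boxes. A minimal pure-Python sketch, with boxes given as (x, y, width, height) and sample values invented:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis (zero when the boxes are disjoint)
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# Identical boxes give 1.0, disjoint boxes give 0.0,
# and partial overlaps fall in between.
print(iou((0, 0, 2, 2), (1, 1, 2, 2)))
```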
Let’s plot both the satellite image and the GeoJSON vector file, this time using rasterio and geopandas. We are going to plot the annotations in blue and the prediction results in red.
from descartes import PolygonPatch
from rasterio.plot import show

results = gp.read_file("./results/pools.geojson")
annotations = gp.read_file("./annotations.geojson")

with rasterio.open("./pools.tif") as src:
    fig, ax = plt.subplots(1, figsize=(12, 12))

    # Plot annotations in blue boxes
    for _, row in annotations.iterrows():
        ax.add_patch(PolygonPatch(row["geometry"], fc=None, ec="blue", zorder=2))

    # Plot results in red boxes
    for _, row in results.iterrows():
        ax.add_patch(PolygonPatch(row["geometry"], fc=None, ec="red", zorder=1))

    # Plot raster image and show plot
    show(src.read(), transform=src.transform, ax=ax)
As you can see, following a well-defined methodology step by step, together with a compact API at a powerful level of abstraction, brings reliable results to the table. For more information about our DYMAX platform and the availability of the API, as well as how to apply DYMAX to other use cases, please contact us: firstname.lastname@example.org.