Deep learning and Soil Science — Part 2

Digital Soil Mapping using contextual spatial information

This is the second article in a series that I am devoting to the use of Deep Learning in Soil Science. This is an ongoing series and so far it also includes:

In the first part of this series, I gave some context about how soil scientists collect information, which usually involves some field work and laboratory analysis. This is an expensive and time-consuming process, which is one of the reasons why we try to build models to predict soil properties.

In this article I focus on spatial models to generate maps. First, I give some context about the traditional and “machine learning” ways of producing soil maps. Then I dive into why contextual information is important and how we can use convolutional neural networks (CNNs) to leverage this information.


History of soil theory

In the late 1800s, the geologist and geographer V. V. Dokuchaev, in charge of mapping the soils of the Russian Empire, developed the idea that soil formation (and hence variation) depends on multiple factors, including parent material, climate, topography, vegetation, and time. This general concept is the basis of modern pedology (the study of soils).

Traditional soil mapping

Humans have been mapping soils for a long time: to decide what to grow and where, for taxation purposes, and so on. To understand soil, and nature in general, observation is a key component, and that is exactly what is needed to map soils. After digging a pit, we describe the soil profile and its strata to try to understand its history. But that is just one part of the story. Since the pit is immersed in the landscape, the soil scientist observes the surroundings before drawing any conclusion (or map).

After describing and observing, it is time to draw a map! You position all the points (pits) on a blank sheet of paper and draw polygons around them based on: 1) their (dis)similarity and 2) the information about the forming factors. After a while (probably many years in this example), you obtain something like this:

Humus content in Russian soils. Dokuchaev (1883).

Digital Soil Mapping

Since Dokuchaev’s work, things have changed. Not much in the theory, but in how we observe nature and how we process the data. Now, technology is present in most of the process, from a GPS to get the exact location of the pit, to satellite imagery that describes the soil forming factors.

In traditional soil mapping, the conclusions derived from observing the interactions between the forming factors are drawn in the head of the soil scientist. In Digital Soil Mapping (DSM), the whole process is now aided by modelling and machine learning. A soil scientist can learn that the soils in a valley are different from the soils on a hill-slope. And surely we can train a model to do the same, right?

DSM is a dynamic sub-field of soil science, so it is hard to summarise all that the community is doing. We use models ranging from linear regressions to random forests, and we have many sources of predictors (forming factors), including satellite imagery and derived products. For more information about DSM, I refer you to McBratney et al. (2003).

Convolutional neural networks and DSM

As I mentioned in the previous section, the theoretical background of DSM is based on the relationship between a soil attribute and soil forming factors. In practice, a single soil observation is usually described as a point p with coordinates (x,y) and the corresponding soil forming factors are represented by a vector of pixel values of multiple covariate rasters (a1,a2,…,an) at the same location, where n is the total number of covariate rasters.
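As a minimal sketch of this point representation, here is how sampling the covariate vector at a single pixel might look. The rasters, sizes and indices below are all made up for illustration; a real workflow would read GeoTIFFs (e.g. with rasterio) and convert the (x,y) coordinates of point p into row/column indices first.

```python
import numpy as np

# Hypothetical covariate rasters (e.g. elevation, slope, rainfall),
# each stored as a 2D array of pixel values. Here n = 3.
rng = np.random.default_rng(0)
rasters = [rng.random((100, 100)) for _ in range(3)]

def covariates_at(rasters, row, col):
    """Return the covariate vector (a1, ..., an) at a single pixel."""
    return np.array([r[row, col] for r in rasters])

v = covariates_at(rasters, row=42, col=17)
print(v.shape)  # (3,): one value per covariate raster
```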

This point representation is definitely useful, but it is equivalent to a soil scientist just looking at the soil profile without considering the surrounding landscape. To complete the picture, we can expose the model to the spatial context of each observation… the equivalent of stepping out of the soil pit and looking around.

With the help of CNNs, we can expand the classic DSM approach by including information about the vicinity around (x,y) and fully leverage the spatial context of a soil observation. We can replace the covariate vector with a 3D array of shape (w,h,n), where w and h are the width and height in pixels of a window centred at point p.
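Building that 3D array amounts to cutting the same window out of every covariate raster and stacking the results. A toy sketch (window and raster sizes are hypothetical, and padding for points near the raster border is omitted):

```python
import numpy as np

def vicinity(rasters, row, col, w=9, h=9):
    """Stack an (h, w) window centred on pixel (row, col) of each
    covariate raster into a single (h, w, n) array."""
    hh, hw = h // 2, w // 2
    windows = [r[row - hh:row + hh + 1, col - hw:col + hw + 1]
               for r in rasters]
    return np.stack(windows, axis=-1)

rng = np.random.default_rng(0)
rasters = [rng.random((100, 100)) for _ in range(5)]  # n = 5 covariates
patch = vicinity(rasters, row=50, col=50)
print(patch.shape)  # (9, 9, 5)
```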

Representation of the vicinity around a soil observation `p`, for `n` number of covariate rasters. `w` and `h` are the width and height in pixels, respectively. Each raster `A` is a proxy for a forming factor.

Because I’m a fan of multi-task learning, we used a CNN with the 3D array as input and generated predictions for a soil property (soil organic carbon, SOC) at 5 depth ranges, based on a digital elevation model, slope, topographic wetness index, long-term mean annual temperature and total annual rainfall. The network looks something like this:

Multi-task network architecture

The head of the network (“Shared layers”) extracts a general representation of the data, which is then directed to 5 different branches, one for each target depth. The branches should be able to learn signals that are specific to each depth.
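To make the shared-trunk-plus-branches idea concrete, here is a toy numpy sketch of a forward pass. This is not the actual network: dense layers stand in for the convolutional layers, and all layer sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """A fully connected layer with ReLU activation."""
    return np.maximum(x @ w + b, 0.0)

# Made-up sizes: a flattened (9, 9, 5) vicinity -> 405 inputs, a shared
# representation of 32 units, and one small branch per depth range.
n_in, n_shared, n_branch, n_depths = 9 * 9 * 5, 32, 16, 5

shared = (rng.normal(scale=0.05, size=(n_in, n_shared)), np.zeros(n_shared))
branches = [
    {"hidden": (rng.normal(scale=0.05, size=(n_shared, n_branch)),
                np.zeros(n_branch)),
     "out": (rng.normal(scale=0.05, size=(n_branch, 1)), np.zeros(1))}
    for _ in range(n_depths)
]

def predict(x):
    """Shared layers extract one representation; each branch then
    predicts the soil property for its own depth range."""
    h = dense(x, *shared)
    preds = [dense(h, *br["hidden"]) @ br["out"][0] + br["out"][1]
             for br in branches]
    return np.concatenate(preds, axis=1)

y = predict(rng.random((4, n_in)))
print(y.shape)  # (4, 5): 4 samples x 5 depth ranges
```

The design point is simply that all branches backpropagate through the same shared weights, so the representation is forced to be useful for every depth at once.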


Data augmentation

Data augmentation is a common pre-treatment in machine learning. When we work with maps, we usually have a top view of the point of interest, so the simplest way to augment the data is by rotation. Here we rotated the images by 90, 180 and 270°. There are two benefits of doing this:

  • The most obvious advantage is that we effectively quadrupled the number of observations.
  • The second advantage is that we help the model to make more robust generalisations by inducing rotation invariance.

Effect of using data augmentation as a pre-treatment.
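The rotation augmentation described above can be sketched with `np.rot90` applied to a batch of vicinity patches; the batch size, window size and number of covariates below are hypothetical.

```python
import numpy as np

def augment_rotations(patches, targets):
    """Quadruple the dataset by adding the 90, 180 and 270 degree
    rotations of each (h, w, n) patch; targets are simply repeated."""
    rotated = [np.rot90(patches, k=k, axes=(1, 2)) for k in range(4)]
    return np.concatenate(rotated), np.concatenate([targets] * 4)

rng = np.random.default_rng(0)
X = rng.random((10, 9, 9, 5))   # 10 patches, 9x9 window, 5 covariates
y = rng.random((10, 5))         # SOC at 5 depth ranges
X_aug, y_aug = augment_rotations(X, y)
print(X_aug.shape, y_aug.shape)  # (40, 9, 9, 5) (40, 5)
```

Note that rotating in the plane of axes 1 and 2 leaves the covariate axis untouched, so each rotated patch keeps its original target values.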

As expected, data augmentation was effective at reducing model error and variability (Fig. 4). We observed a decrease in the mean error of 10.56, 10.56, 11.25, 14.51, and 24.77% for the 0–5, 5–15, 15–30, 30–60 and 60–100 cm depth ranges, respectively.

Vicinity size

Effect of vicinity size on prediction error, by depth range. Ref_1x1 corresponds to a fully connected neural network without any surrounding pixels. Ref_Cubist corresponds to the Cubist models used in a previous study (Padarian et al., 2017).

The size of the neighbourhood window (vicinity) had a significant effect on the prediction error (Fig. 5). Sizes above 9 pixels showed an increase in the error. In this example, for country-scale mapping of SOC at a 100 m grid size, information from a 150 to 450 m radius is useful. This range is similar to the spatial correlation range reported for croplands in a review by Paterson et al. (2018), where, based on 41 variograms, the authors estimated an average spatial correlation range of around 400 m. Since we used a relatively coarse pixel resolution (100 m), it is hard to tell what the minimum amount of context needed to improve SOC predictions is. We believe that using higher resolutions (< 10 m) could produce more insights into this matter.

Comparing the CNN results with a more traditional approach, the CNN significantly decreased the error by 23.0, 23.8, 26.9, 35.8, and 39.8% for the 0–5, 5–15, 15–30, 30–60 and 60–100 cm depth ranges, respectively.

Prediction of multiple soil layers

In DSM, there are two main approaches to deal with the vertical variation of a soil property. You can make predictions layer by layer (depth is implicit), or you can include depth in your model (depth is explicit). Both approaches show a decrease in the variance explained by the model as the prediction depth increases. This is expected, since the information used as covariates usually represents surface conditions.

In this study we can see again the synergistic effect of using a multi-task CNN (I talked about that in the first post). As shown in the figure below, in this case the variance explained by the model actually increased with depth. Absolute values of R² shouldn’t be compared between models and datasets, but it is perfectly possible to compare the trends.

Percentage change in model R² as a function of depth.

I think this is the main reason I like multi-task CNNs. But be aware that it doesn’t always work… In a future post I will show you some examples.

What about the maps?

If you have reached this point, you definitely deserve a map! That is what the whole introduction was about, after all, right? This study was carried out using soil information from Chile. In the figure below you can see an example of the predictions in a small test area (I will share a link to the full map eventually).

Detailed view of (left panel) the map generated by a Cubist model (Padarian et al., 2017) and (right panel) the map generated by the multi-task CNN.

Visually, the maps generated with the CNN showed some differences compared with the traditional model (Cubist). The map generated with the Cubist model shows more details related to the topography, but also presents some artefacts due to the sharp limits generated by the tree rules, and possibly some artefacts from the covariate rasters. On the other hand, the map generated with the CNN shows a smoothing effect, an expected consequence of using neighbouring pixels.

It’s hard to evaluate a map visually because we inevitably judge it based on aesthetics. How smooth or sharp is reality? That is a permanent discussion in my field. The traditional soil polygon is not the best way of describing a soil, since soil is a continuum, but it is not hard to find cases where we see sharp changes between two soils, hence a very smooth raster might be wrong… Most likely the solution is somewhere in between.

Final words

Digital soil mapping is a very interesting and dynamic discipline, and it is nice to see that methods like convolutional neural networks are applicable here. Intuitively, a method that is capable of exploiting contextual spatial information fits perfectly within the theoretical framework of soil science.

Also, we saw again the synergistic effect of the multi-task network by regularising the predictions in depth. If the model already made an effort to predict the top layer, it should definitely use that to guide the prediction of the deeper layers! That is exactly what a soil scientist does when describing a profile!

I’ve been promising a post about transfer learning, and hopefully that will be the next post. I have a half-done draft, but my PhD is keeping me busy… so be patient please!


More details about this work can be found in the corresponding paper.

Padarian, J., Minasny, B., and McBratney, A. B., 2018. Using deep learning for Digital Soil Mapping, SOIL Discuss. Under review.

Note 04/09/2018: The paper has not been accepted yet and it is under public review and discussion until 15/10/2018. You are welcome to participate in this process.


Dokuchaev, V. V., 1883. Schematic Map of Humus Content in the Upper Horizon of Soils of the Chernozemic Zone: Supplement to the Book “Russian Chernozem”.

McBratney, A.B., Santos, M.M. and Minasny, B., 2003. On digital soil mapping. Geoderma, 117(1–2), pp.3–52.

Padarian, J., Minasny, B. and McBratney, A.B., 2017. Chile and the Chilean soil grid: a contribution to GlobalSoilMap. Geoderma Regional, 9, pp.17–28.

Paterson, S., McBratney, A.B., Minasny, B. and Pringle, M.J., 2018. Variograms of soil properties for agricultural and environmental applications. In: Pedometrics, pp.623–667. Springer.

Source: Deep Learning on Medium