AI for Good — Unit8 for WWF pt. 2

  • by Maurice Rupp
  • 15 December 2021 · 11 minutes

A report on predicting ground and soil characteristics using machine learning on satellite data.


At Unit8, we are committed to dedicating a portion of our working time to pro bono projects. We believe that by applying advanced technology to difficult problems, we can help socially responsible organisations maximise their impact. Over the past two years, we have collaborated with the World Wildlife Fund (WWF) and their local partner Fundación Amigos de la Naturaleza (FAN; Spanish for “Friends of Nature”) to help improve the prediction of wildfire spread in South America, specifically in Bolivia.

In this post, we will not address the task of predicting fire behaviour itself, but a subproblem that we consider an important step towards predicting fire spread. Our plan is to first determine so-called fuel models and later use these predictions as input features for fire spread modelling. Essentially, fuel models categorise geographical areas into distinct fire environments and depend on factors like the type of vegetation, the amount of burnable/dead material (fuel) and the moisture content. From these categories, the potential fire behaviour can be derived and expressed as the expected rate of spread and flame length: areas with more dead material and dry wood burn more rapidly and propagate fire faster. Estimates of the expected rate of spread and flame length help humans assess the spread as well as the likelihood of fires. Fuel models can therefore be understood as a proxy for such estimates and serve, alongside other, less predictable factors such as heat or wind, as a key element for modelling wildfire spread.

The expected rate of spread and flame length for different grass fuel models plotted by midflame wind speed. Image taken from the initial publication.

Aside from using these predictions as additional inputs for subsequent tasks, FAN and the WWF both showed interest in having a functioning fuel model prediction tool, as fuel models can indicate where the risk of active fires lies and how high it is. Unfortunately, no curated fuel model categories exist for South America, and acquiring data in the field for a large area is costly and inefficient. Therefore, a tool that determines fuel models from previously gathered satellite imagery and other measurements could be highly beneficial.

The further structure of this post is as follows:

  • First, we will dive further into the theoretical basis of fuel models and the state of the art in predicting them.
  • Second, we will present our approach to the problem, explain how we transfer the knowledge gained in one part of the world to another, and show quantitative as well as qualitative results.
  • Third, we will conclude the post with a summary of the acquired insights, their limitations and the further direction of the project.

1. Theoretical Basis

Fuel Models

Researchers first developed 13 fuel models and later refined these to 40 fuel models. For the sake of simplicity, we will focus only on the 13 initial fuel models in this blog post, although we successfully experimented with the 40 fuel models as well.

As an example, the fuel model “short grass” (FM1) is described as aiding rapid fire spread and mostly containing surface fires only.

An example image of fuel model 1 “short grass”. Picture taken from the initial publication.

An overview of all fuel models and their characteristics can be found .

State-of-the-Art in Fuel Model Classification

For the 13 fuel models, ground truth data only exists for the USA and Europe. The areas were first classified using a rule-based approach derived from different measures of soil and satellite data, and afterwards validated and refined by fire and fuel specialists.

Unfortunately, only limited research has been conducted on predicting fuel models. Existing approaches either operate on very specific geolocations (examples  & ) or rely on data explicitly collected for their use case (examples  & ), which is infeasible for bigger areas. Most approaches apply either “classical” machine learning such as random forests, SVMs and linear regression, or simple rule-based methods to determine the fuel models. Usually, a combination of multispectral imagery and LIDAR data is used as input. None of the approaches we found seemed extendable to a national or even global scale, since the highly precise LIDAR datasets needed for the best performance have been gathered only for small areas.

2. Fuel Model Classification — Our Approach

Since the ground truth data for the USA offers a finer spatial resolution and a broader variety of vegetation than the European data, we decided to train our models on data from the USA and then transfer them to Bolivia in a second step. We are aware that transferring geographical models to another continent with different biomes, seasons and dynamics entails certain risks, which we will address later in this post.


For the data sources, we focused on various global multispectral imagery datasets provided by the Google Earth Engine (GEE). GEE is an immense data catalogue of satellite imagery and geospatial climate datasets with a processing engine to extract, aggregate, transform and analyse data. Since datasets vary in their spatial resolution, we chose to focus on two streams: one consisting of datasets with a resolution of 30m or finer and the other consisting of datasets with a resolution of 1000m or finer. Generally, there are drastically fewer datasets that operate at a high resolution, which explains the difference in the number of datasets used. In both streams, temporal features (e.g. soil moisture or water content, which are recorded weekly, daily or hourly) were aggregated to a single value per point by taking the median over a year, and all points were sampled randomly across the USA.

Each sample in the final dataset is a point in the USA and consists of the values gathered from the GEE datasets and a label indicating the fuel model for this specific geolocation. Besides the 13 fuel models, these labels also include four non-burnable categories like Urban or Open Water which, due to their nature, don’t have any fuel/fire potential.
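
The temporal aggregation described above can be illustrated with a minimal sketch. It is not the pipeline we ran on GEE; the function name, the point IDs and the readings are hypothetical, and it only shows how repeated measurements collapse into one median value per point and feature.

```python
from collections import defaultdict
from statistics import median


def aggregate_yearly_median(observations):
    """Collapse repeated readings into one median per (point, feature).

    `observations` is an iterable of (point_id, feature, value) tuples
    covering one year; missing readings are passed as None.
    """
    grouped = defaultdict(list)
    for point_id, feature, value in observations:
        if value is not None:  # skip missing readings
            grouped[(point_id, feature)].append(value)

    samples = defaultdict(dict)
    for (point_id, feature), values in grouped.items():
        samples[point_id][feature] = median(values)
    return dict(samples)


# Hypothetical soil-moisture readings for two sampled points
observations = [
    ("p1", "soil_moisture", 0.20),
    ("p1", "soil_moisture", 0.30),
    ("p1", "soil_moisture", 0.90),  # a single extreme reading
    ("p2", "soil_moisture", 0.10),
    ("p2", "soil_moisture", None),
    ("p2", "soil_moisture", 0.14),
]
yearly = aggregate_yearly_median(observations)
```

The median is less sensitive to single extreme readings than the mean, which is one reason to prefer it for noisy satellite time series.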

30m Scale Dataset

For the higher resolution dataset, we gathered roughly 1 million samples from two GEE datasets:

An image of the satellite dataset Sentinel-2 of a small village in Bolivia. © Google Earth

1000m Scale Dataset

For the lower resolution dataset we gathered roughly 4 million samples from six GEE datasets:

More information on the specific datasets can be found in the descriptions linked from each dataset's name.

Data Pre-Processing

As general pre-processing we applied the following steps:

  • Removal of highly correlated features (e.g. the soil moisture measured at 40cm and 100cm below the ground).
  • Stratification of the dataset to have the same number of samples for most labels. Since a few labels are highly underrepresented, we further applied sample weighting during classification to avoid bias.
  • Feature normalisation (removing the mean and scaling to unit variance for each feature).
  • Splitting the data into stratified training, validation and test sets (75%/19%/6%), where the training and validation sets are used for hyperparameter tuning and the test set is used once after optimisation.
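
The steps above could look roughly like the following sketch on synthetic data. The correlation threshold of 0.95, the helper `drop_correlated` and the choice to fit the scaler on the training split only are assumptions made for illustration, not a description of our exact code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def drop_correlated(X, threshold=0.95):
    """Return the column indices to keep, dropping the later column
    of every pair whose absolute correlation reaches the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


rng = np.random.default_rng(0)
n = 2000
base = rng.normal(size=(n, 3))
# Column 3 is a near-copy of column 0 (think soil moisture at 40cm vs 100cm)
X = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=n)])
y = rng.integers(0, 4, size=n)  # four placeholder class labels

keep = drop_correlated(X)
X = X[:, keep]

# 75% training, then the remaining 25% becomes 19% validation and 6% test
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.24, stratify=y_rest, random_state=0
)

# Zero mean and unit variance, with statistics taken from the training split
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = (scaler.transform(a) for a in (X_train, X_val, X_test))
```

Fitting the scaler on the training split only keeps validation and test statistics out of the model inputs; for a tree-based model such as a random forest the scaling itself has little effect, but it matters for the linear and neural models compared against it.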


We examined a variety of machine learning classification models such as decision trees, logistic regression and random forests, as well as a few neural networks such as . Across all scales and tasks, random forests outperformed the other approaches. A grid search over the most common hyperparameters yielded the best validation accuracy without any capacity constraints apart from the number of trees, which varied from 15 to 100 depending on the experiment. We used Gini impurity as the splitting criterion and the square root of the number of features as the number of features to consider when looking for the best split. Further, we applied bootstrapping and balanced class weighting to deal with the remaining class imbalances. This largely matches the default hyperparameters of the RandomForestClassifier in the scikit-learn package we used. Generally, adding more data to the dataset improved performance significantly.
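
As a rough illustration of this configuration, the sketch below fits scikit-learn's `RandomForestClassifier` on a synthetic, imbalanced dataset and picks the number of trees by validation accuracy. The data, the three candidate tree counts and the split ratio are placeholders, not our actual experiment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the real feature table
X, y = make_classification(
    n_samples=3000,
    n_features=12,
    n_informative=8,
    n_classes=4,
    weights=[0.55, 0.25, 0.15, 0.05],
    random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

best_n, best_acc = None, -1.0
for n_trees in (15, 50, 100):  # candidate values inside the range from the text
    model = RandomForestClassifier(
        n_estimators=n_trees,
        criterion="gini",         # Gini impurity as splitting criterion
        max_features="sqrt",      # sqrt(number of features) per split
        max_depth=None,           # no depth limit on the trees
        bootstrap=True,
        class_weight="balanced",  # counteracts remaining class imbalance
        random_state=0,
    )
    model.fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_n, best_acc = n_trees, acc
```

Only `n_estimators` is searched here; a fuller grid search would also vary depth and leaf-size limits, which in our experiments did not improve on unconstrained trees.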


Performance on USA

Overall, our model performed fairly well considering the amount of noise (clouds, seasonal shifts, measurement aggregation/accuracy) in the training data as well as the fact that the ground truth data includes some uncertainty/subjectivity and limited granularity due to expert evaluation.

The test set metrics for the best-performing model.

As shown in the confusion matrix below, most misclassified samples are assigned to a fuel model with a higher fire/fuel potential, which could lead to an overestimation of fire spread later on.
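
One way to quantify such a tendency is to count the off-diagonal entries of the confusion matrix by direction. The sketch below assumes that the classes can be ranked by fire potential; both this ranking and the numbers are hypothetical and not taken from our results.

```python
def over_under_estimation(confusion, potential_rank):
    """Count misclassifications towards higher vs. lower fire potential.

    confusion[i][j] is the number of samples of true class i predicted
    as class j; potential_rank[i] is an ordinal fire-potential rank for
    class i (higher means more severe).
    """
    over = under = 0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if potential_rank[j] > potential_rank[i]:
                over += count
            elif potential_rank[j] < potential_rank[i]:
                under += count
    return over, under


# Hypothetical 3-class confusion matrix, classes ordered by fire potential
confusion = [
    [50, 8, 2],
    [3, 40, 7],
    [1, 2, 45],
]
over, under = over_under_estimation(confusion, potential_rank=[0, 1, 2])
# over == 17, under == 6
```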

Confusion matrix

Transferring the Model to Bolivia

To check whether transferring the model from the USA to Bolivia is possible at all, we compared the features between the two areas. The criteria for transferability are whether the features follow similar distributions and whether they cover the same range of values. We want to avoid features for which Bolivia contains values that are not present in the USA. Since the 30m scale data yielded better results for all metrics, we chose to focus on the features of this dataset for the transfer. We randomly sampled 300,000 points and their feature values within Bolivia to compare them to the values within the USA.
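
The range criterion can be expressed as a simple check, sketched below in plain Python. The feature names and values are hypothetical, and the sketch covers only the value range, not the similarity of the distributions.

```python
def out_of_range_share(source_values, target_values):
    """Share of target values outside the [min, max] range of the source."""
    lo, hi = min(source_values), max(source_values)
    outside = sum(1 for v in target_values if v < lo or v > hi)
    return outside / len(target_values)


# Hypothetical feature samples for the training and the target region
usa = {
    "elevation": [10, 450, 1200, 2900, 4300],
    "red_band": [0.05, 0.30, 0.60, 0.90],
}
bolivia = {
    "elevation": [150, 2600, 3900, 5200],
    "red_band": [0.20, 0.40, 0.70, 0.80],
}

shares = {name: out_of_range_share(usa[name], bolivia[name]) for name in usa}
# Features with any unseen values are candidates for exclusion
flagged = [name for name, share in shares.items() if share > 0]
```

With these toy values only `elevation` is flagged, because one of the four target points lies above anything seen in the source region.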

Surprisingly, the distributions and value ranges of many features are fairly similar, although Bolivia usually has a narrower range of values than the USA. This could be because the USA contains a broader variety of biomes and vegetation types across the country.

Distributions in USA and Bolivia

However, two features contain values in Bolivia that are not present in the USA. Unsurprisingly, Bolivia contains some points with higher elevation values than the USA. This can be explained by the fact that Bolivia contains a mountain range with several tall mountains, which covers a larger share of the country than mountain ranges do in the USA. Further, the highest point within the borders of Bolivia is roughly 350m higher than the corresponding point in the USA (Nevado Sajama at 6,542m versus Denali at 6,190m). The other feature, WVP, stands for Water Vapor Pressure. This value depends on temperature and altitude: higher temperatures lead to higher WVP values and higher altitudes to lower ones. Considering the findings from the elevation comparison (more areas with high altitude within Bolivia) and the fact that Bolivia generally has a more “extreme” climate (varying temperatures, humidity levels etc.), the differences in distribution appear reasonable.

Distributions in USA and Bolivia

Therefore, in the next iteration, we decided to exclude the features WVP and elevation from training, both to get an impression of how important these features are for classification and to be able to apply the model to Bolivian data. Since no ground truth data exists for Bolivia, we manually labelled 150 points that were sampled randomly within the borders of Bolivia. It has to be noted that we are by no means experts in this field and classified the points to the best of our knowledge using imagery from Google Earth and, where available, photos of nearby locations. To compensate for our lack of knowledge, we bundled the 13 fuel model classes into broader categories, each containing the most similar fuel models in terms of fire potential and vegetation. This grouping follows a similar separation as in the original publication:

  • Grass: FM1 (short grass), FM2 (grass and understory), FM3 (tall grass)
  • Chaparral: FM4 (chaparral)
  • Shrub fields: FM5 (brush), FM6 (dormant brush), FM7 (southern rough)
  • Timber litter: FM8 (closed timber litter), FM9 (hardwood litter), FM10 (litter & understory), FM11/12/13 (logging slash, not present in the training and test data)
  • Non-burnable: NB1 (urban), NB2 (snow), NB3 (agricultural), NB8 (open water), NB9 (bare ground)
The test set metrics for the 30m scale model for the USA and Bolivia.
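
The grouping listed above amounts to a lookup table, and evaluating on the broader categories means comparing the mapped labels instead of the raw ones. In the following sketch the mapping mirrors the list, while the function name and the example labels are made up for illustration.

```python
SUPERCATEGORY = {
    "FM1": "grass", "FM2": "grass", "FM3": "grass",
    "FM4": "chaparral",
    "FM5": "shrub fields", "FM6": "shrub fields", "FM7": "shrub fields",
    "FM8": "timber litter", "FM9": "timber litter", "FM10": "timber litter",
    "FM11": "timber litter", "FM12": "timber litter", "FM13": "timber litter",
    "NB1": "non-burnable", "NB2": "non-burnable", "NB3": "non-burnable",
    "NB8": "non-burnable", "NB9": "non-burnable",
}


def grouped_accuracy(true_labels, predicted_labels):
    """Accuracy after mapping both label lists to their supercategory."""
    hits = sum(
        SUPERCATEGORY[t] == SUPERCATEGORY[p]
        for t, p in zip(true_labels, predicted_labels)
    )
    return hits / len(true_labels)


# Made-up labels: three of the four predictions land in the right group
true_labels = ["FM1", "FM5", "NB3", "FM9"]
predicted_labels = ["FM3", "FM6", "FM1", "FM8"]
accuracy = grouped_accuracy(true_labels, predicted_labels)
```

A prediction of FM3 for a true FM1 counts as correct here because both belong to the grass group, which is what makes the coarser evaluation more forgiving of uncertain manual labels.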

As expected, performance in the USA decreases when the two features are removed. Nevertheless, applying the model to Bolivian data appears better justified this way. However, the performance on the data points from Bolivia is significantly worse than on those from the USA. A big portion of this drop can be attributed to our inability to validate the ground truth categories. We found a lot of ambiguity in distinguishing even the broader categories due to missing photos of certain areas and insufficiently resolved satellite images.

The confusion matrix of the model applied to Bolivian data and grouped into the four supercategories.

As displayed in the figure above, the model most often confuses non-burnable areas with grass. One explanation could be that, without a photo of the exact location, it is extremely hard to determine whether the spot should be classified as grass, agriculture (non-burnable) or barren (non-burnable), and many mountainous regions contain rocky as well as grass areas side-by-side. An example of such a location can be found in the figure below. Therefore, the logical next step would be to reconsider our strategy of labelling Bolivian data.

A photograph of a typical scenario in which satellite imagery does not reveal whether a location’s surface consists of grass or barren ground. Top: © Augusto Moreno Prado, bottom: © Google Earth

3. Summary & Further Direction

In this post, we described the need for an accurate fuel model prediction tool and proposed a solution based on global, publicly available datasets from Google Earth Engine and a random forest algorithm. However, the transfer to Bolivia was not straightforward. One could argue that too little information was available while labelling the Bolivian points to allow a proper evaluation of the model’s performance. To cope with this limitation, we will have to collaborate more closely with domain experts or even determine ground truth values for a set of locations with their help. Despite our attempts to address the challenges of transferring a model from one geolocation to another, the model will have to be examined more closely before it is applied in the wild.

To further improve overall model performance, we could gather more related and more accurate datasets about vegetation and other environmental parameters, or apply computer vision models such as a  or a  to the satellite data. We could also improve the American training data by discarding outlying points while gathering data, so that the features match the distribution of Bolivia more closely.