Utilising Strava Metro and other open datasets to estimate visitor counts to natural spaces across England

As the organisation responsible for improving and protecting the environment in the UK, the Department for Environment, Food & Rural Affairs (Defra), along with the Data Science Campus at the Office for National Statistics (ONS) and Natural England, has developed a novel solution for estimating the number of visitors to natural spaces across England.

Defra’s Environmental Improvement Plan 2023 sets out key targets and commitments on access and engagement with nature, including a commitment that everyone should live within 15 minutes’ walk of a green or blue space. But in order to track progress towards these goals, secure needed funding, and even provide for routine maintenance, we needed to develop a sustainable way to understand the current usage levels and patterns.

To move forward, we agreed on three criteria to guide the project:

  1. Acquisition and use of the data needed to be inexpensive.

  2. The datasets needed to have sufficient spatial and temporal coverage.

  3. The data, and the method of working with it, needed to be widely applicable to areas throughout the country.

To estimate the visitors to these natural spaces, we developed a model that combines the aggregated, de-identified data from Strava Metro with carefully selected open or free-to-access spatial datasets such as automated people counters and indicators of local environmental and social conditions. While these results are experimental, the initial outcomes are promising, showing that a reasonable performance can be achieved.

Model Development Process

Our modelling approach involves estimating the Monthly Average People Count (MAPC) to best replicate the visitor numbers from the automated people counters. The dataset used to train our model was created by extracting features from all the datasets shown in Figure 1 below, including Strava Metro, land type, demographics, and weather, in a five-kilometre area surrounding the location of each automated people counter. Because Strava Metro data is generated only by app users, we recalibrated this data by integrating Strava Metro with the above-mentioned datasets. More details of the methodology can be found here and here.
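
As an illustration of the target variable, the sketch below derives a Monthly Average People Count from daily counter readings. It assumes that MAPC is the mean of the daily counts recorded within each calendar month; the project's technical paper is the authority on the exact definition. The function name, the input layout and the readings themselves are hypothetical and are not taken from the project's code or data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_average_people_count(daily_counts):
    """Average daily people counts per calendar month.

    daily_counts: iterable of (date, count) pairs for a single counter.
    Returns a dict keyed by (year, month).
    Assumption: MAPC is the mean of the daily counts in each month.
    """
    by_month = defaultdict(list)
    for day, count in daily_counts:
        by_month[(day.year, day.month)].append(count)
    return {month: mean(counts) for month, counts in sorted(by_month.items())}


# Hypothetical readings from one people counter (illustrative values only).
readings = [
    (date(2023, 5, 1), 120),
    (date(2023, 5, 2), 80),
    (date(2023, 6, 1), 200),
    (date(2023, 6, 2), 220),
    (date(2023, 6, 3), 180),
]

mapc = monthly_average_people_count(readings)
for (year, month), value in mapc.items():
    print(f"{year}-{month:02d}: {value}")
```

In practice the same monthly grouping would be applied to each counter site before the result is joined to the features extracted around that site.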

Fig. 1: Dynamic and static variables combined to develop a spatio-temporal model to estimate Monthly Average People Count.

To evaluate and calibrate our model, we compared its estimates with the visitor numbers recorded by the automated people counters. Once the model has been calibrated against the places where people counters are installed, it can also estimate visitor numbers in areas without people counters, making it useful across the country.
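
A comparison of this kind can be sketched with two common error measures. The choice of mean absolute error and R-squared is our assumption for illustration only; this article does not state which metrics the project used, and the function names and numbers below are hypothetical.

```python
from statistics import mean


def mean_absolute_error(observed, estimated):
    """Average absolute gap between counter readings and model estimates."""
    return mean(abs(o - e) for o, e in zip(observed, estimated))


def r_squared(observed, estimated):
    """Share of the variance in the counter readings explained by the estimates."""
    observed_mean = mean(observed)
    ss_residual = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_total = sum((o - observed_mean) ** 2 for o in observed)
    return 1 - ss_residual / ss_total


# Hypothetical MAPC values at four counter sites (illustrative values only).
observed = [100, 200, 300, 400]   # from automated people counters
estimated = [110, 190, 310, 390]  # from the model

mae = mean_absolute_error(observed, estimated)
r2 = r_squared(observed, estimated)
print(f"MAE: {mae}, R-squared: {r2:.3f}")
```

Scores like these can only be computed at sites that have a counter, which is why the coverage of counter data matters so much for judging the model.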

Fig. 2: Comparison of the regional mean Monthly Average People Count estimated by the models with the automated people counter data. The performance is better in regions with a higher number of training sites.

A more in-depth look at the process of developing this model is available in our technical paper.

Summary and Next Steps

While this project has successfully developed a proof-of-concept machine learning model that can estimate Monthly Average People Count for a diverse range of natural sites across England, model performance is strongly influenced by the availability of ground truth people counter data. In the next phase, we plan to address this limitation by working with a range of partners that maintain and can share people counter data, using these additional sources to improve the model estimations and the recalibration of Strava Metro data.

The outputs will then be used to develop a new experimental statistic on visits to natural places. They can aid decision making and offer a low-cost alternative to existing methods such as automated people counters. This approach has a wide range of potential uses, for example measuring the use of new trails like the England Coast Path, or estimating visits to National Parks, accessible greenspace or nature reserves. It could also offer a model for measuring general activity levels in a range of other policy settings such as health, active travel, recreation, and tourism.

If you have any enquiries about this project, please contact datacampus@ons.gov.uk or Tim Ashelford at Tim.Ashelford@defra.gov.uk.

GitHub repository