Crowdsourced data is increasingly being used in transportation planning and operations. Take, for example, the Strava Metro dataset. This mobility data platform is powered by Strava’s more than 90 million users worldwide, who track activities such as bike rides, runs, walks, and hikes and upload them to Strava. These trips are aggregated and de-identified to provide active transportation insights, and the data is available free of charge to urban planners making decisions about bicycle and pedestrian infrastructure.
The Strava Metro platform, called Metroview, provides bicycle and pedestrian trip counts for each road segment, aggregated hourly, daily, monthly, or yearly. These counts are mapped onto OpenStreetMap (OSM), which serves as the representation of the transportation network.
How to overcome the sampling bias in crowdsourced data like Strava?
To ensure that Strava Metro data gives a representative picture of a city's total cyclist population, a common approach is to build a statistical model that combines Strava bicycling counts with official cycling counts and with network, population, and land use characteristics to produce corrected estimates of bicycling volumes. This approach has been shown to address the representativeness concerns that come with crowdsourced data (see Jestico et al., 2016; Roy et al., 2019; Nelson et al., 2021).
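As a rough illustration of the correction idea, the sketch below fits a straight line between Strava counts and official counter readings at the same sites, then uses it to scale a Strava count up to an estimate of all bicycle trips. It is a deliberate simplification: the numbers are made up, the function names are hypothetical, and the published approaches add network, population, and land use covariates rather than relying on a single predictor.

```python
# Hypothetical paired observations at counter sites (made-up numbers):
# Strava trips per day on a segment vs. official counter trips per day.
strava_counts = [12, 30, 45, 60, 85]
official_counts = [150, 340, 520, 650, 930]


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(strava_counts, official_counts)


def corrected_estimate(strava_count):
    """Estimate total bicycle trips from a Strava count (never negative)."""
    return max(0.0, slope * strava_count + intercept)


print(round(corrected_estimate(50)))
```

The slope can be read as the approximate number of total riders represented by each Strava rider at these sites, which is exactly the quantity that varies with location and context and motivates the richer models cited above.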
Working with the state government
We worked with Transport for NSW on a proof-of-concept research study to develop a machine learning model that estimates bicycling volumes, the number of bicycling trips, and total bicycling kilometers travelled across the Sydney metropolitan area over a period of two to three years (2019–2021).
Transport for NSW leads the development of safe, integrated, and efficient transport systems for the people of New South Wales, Australia. It is responsible for strategy, planning, policy, regulation, funding allocation, and other non-service delivery functions for all modes of transport in NSW, including road, rail, ferry, light rail, point to point, regional air, cycling, and walking.
What data goes into the model?
We integrated multiple data sources, including Strava Metro link-level bicycling counts across Sydney; official bicycle counts from more than 120 locations; population and land use data from the Census; topographical data to estimate road slope; air quality data to capture the effect of poor air quality during the bushfires; and temperature and rain data, among others.
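One way to picture this integration step is as a join on road segment and date. The sketch below is only an illustration under assumed inputs: every field name (`segment_id`, `strava_trips`, `slope_pct`, `pm25`, and so on) and every value is hypothetical, not the schema used in the study.

```python
# Hypothetical records; all field names and values are illustrative.
strava_rows = [
    {"segment_id": "s1", "date": "2020-03-01", "strava_trips": 40},
    {"segment_id": "s2", "date": "2020-03-01", "strava_trips": 15},
]
counter_rows = [
    {"segment_id": "s1", "date": "2020-03-01", "official_trips": 520},
]
segment_attrs = {
    "s1": {"slope_pct": 1.5, "population_density": 8200},
    "s2": {"slope_pct": 4.0, "population_density": 3100},
}
daily_conditions = {
    "2020-03-01": {"max_temp_c": 26.0, "rain_mm": 0.0, "pm25": 12.0},
}


def build_table(strava_rows, counter_rows, segment_attrs, daily_conditions):
    """Join all sources on (segment_id, date).

    official_trips is None for segments without a counter on that date.
    """
    official = {
        (r["segment_id"], r["date"]): r["official_trips"] for r in counter_rows
    }
    table = []
    for row in strava_rows:
        record = dict(row)
        record.update(segment_attrs.get(row["segment_id"], {}))
        record.update(daily_conditions.get(row["date"], {}))
        record["official_trips"] = official.get((row["segment_id"], row["date"]))
        table.append(record)
    return table


table = build_table(strava_rows, counter_rows, segment_attrs, daily_conditions)

# Only rows with an official count can be used to train and test a model;
# the remaining rows are where the trained model would later be applied.
training_rows = [r for r in table if r["official_trips"] is not None]
print(len(table), len(training_rows))
```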
Training a decision tree learning model
Using the collated data, we trained and tested a tree regression model with strong predictive power. Decision tree learning is a popular machine learning approach because it is both simple and effective. The trained and tested model was then applied across the network over a period of more than two years to study the impacts of COVID-19 travel restrictions and the 2019–20 Australian bushfires on cycling travel patterns across Sydney.
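To make the idea of tree regression concrete, here is a minimal sketch of a regression tree written from scratch: at each node it picks the feature and threshold that most reduce the squared error of the target, and each leaf predicts the mean of its training targets. This is a generic textbook version, not the model or library used in the study; the features (Strava trips, road slope, rainfall), the data values, and the function names are all assumptions for illustration.

```python
from statistics import fmean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Return the (feature, threshold) that lowers total SSE most, else None."""
    best, best_score = None, sse(y)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring distinct values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def fit_tree(X, y, max_depth=3, min_samples=2):
    """Grow a tree: internal nodes are dicts, leaves are mean target values."""
    can_split = max_depth > 0 and len(y) >= min_samples
    split = best_split(X, y) if can_split else None
    if split is None:
        return fmean(y)
    j, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([r for r, _ in left], [v for _, v in left],
                         max_depth - 1, min_samples),
        "right": fit_tree([r for r, _ in right], [v for _, v in right],
                          max_depth - 1, min_samples),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean value."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Made-up training data: [strava_trips, slope_pct, rain_mm] -> official count.
X = [[10, 1.0, 0.0], [12, 1.2, 5.0], [40, 0.5, 0.0],
     [45, 0.8, 2.0], [80, 0.3, 0.0], [85, 0.4, 1.0]]
y = [120, 100, 480, 450, 900, 940]

tree = fit_tree(X, y, max_depth=2)
print(predict(tree, [42, 0.6, 0.0]))
```

In practice a tested library implementation, with a held-out test set and tuned depth, would be the natural choice; the hand-rolled version is shown only to expose the splitting logic.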
Some insights from the model outcomes
The time-series plot below shows the impact of seasons, low air quality due to bushfires, and the first and second COVID-19 lockdowns in Sydney, as well as the overall increase in the number of bicycling trips since 2019. The average number of bicycling trips on a weekday in Sydney increased by 22% in 2020 compared to 2019. The average number of bicycling trips on a weekend day increased by 23% over the same period. Bicycling ridership remained at these higher levels in 2021 despite a major and lengthy COVID-19 lockdown.
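Year-over-year figures of this kind can be derived from daily trip estimates by averaging weekdays and weekend days separately for each year and comparing the averages. The sketch below shows the arithmetic on a handful of made-up daily totals; the values and function names are hypothetical and are not the study's results.

```python
from datetime import date
from statistics import fmean


def average_trips(daily, year, weekend):
    """Mean daily trips in `year` over weekend days (Sat/Sun) or weekdays."""
    values = [
        trips for day, trips in daily.items()
        if day.year == year and (day.weekday() >= 5) == weekend
    ]
    return fmean(values)


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Made-up daily trip estimates for illustration only.
daily = {
    date(2019, 3, 4): 1000,  # Monday
    date(2019, 3, 5): 1100,  # Tuesday
    date(2019, 3, 9): 800,   # Saturday
    date(2020, 3, 2): 1300,  # Monday
    date(2020, 3, 3): 1220,  # Tuesday
    date(2020, 3, 7): 1000,  # Saturday
}

weekday_change = percent_change(
    average_trips(daily, 2019, weekend=False),
    average_trips(daily, 2020, weekend=False),
)
weekend_change = percent_change(
    average_trips(daily, 2019, weekend=True),
    average_trips(daily, 2020, weekend=True),
)
print(f"weekday: {weekday_change:+.0f}%, weekend: {weekend_change:+.0f}%")
```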
Strava Metro data is proving to be a promising and powerful data source in providing insights into bicycling ridership and travel patterns over space and time when proper methods are used to address its sampling bias.
Acknowledgment
The work presented here is sponsored by Transport for New South Wales (Sandeep Mathur, Director, Active Transport Portfolio, Data & Analytics) and is carried out by a team of researchers at the University of New South Wales (Meead Saberi, Chris Pettit, Tanapon Lilasathapornkit, and Parisa Zaré) in collaboration with footpath.ai, Ben Beck from Monash University, and Trisalyn Nelson from the University of California, Santa Barbara.