Geospatial Feature Discovery Supercharges Models

Designing the ML model is often considered the most interesting and intellectually challenging aspect of the supervised ML process, yet when it comes to features it’s tempting to stick to the same simple types to save time and avoid making mistakes.

Advanced geospatial feature discovery: where to start?

Geospatial feature discovery in particular can be a complex process, and it’s tempting to stick to the same simple feature types to save time and avoid making mistakes.

Even though advanced geospatial features benefit a broad range of supervised learning use cases, it’s easy to see why time-pressured data science teams may spend less time on them than they should. But the variety, depth and complexity of geospatial features affect the quality of your model’s output so significantly that advanced geospatial feature discovery is well worth the effort.

So, in this article, we illustrate how geospatial feature discovery can become interesting really quickly and outline a few ideas to get you started. 

Straightforward geospatial features

What do we mean when we refer to a geospatial feature? It’s a broad term that covers any feature based on the location of entities in your training dataset. For example, if your training data rows represent different shop locations, you can create a simple geospatial feature by using census data to find the average population age in the neighbourhood of each shop. Another example is to look at nearby points of interest (POI), e.g. whether a specific location is within 5 miles of an airport.
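As a minimal sketch of the airport example, the snippet below computes great-circle distance with the haversine formula and turns it into a binary feature. The function names, the shop and airport coordinates, and the use of a mean Earth radius are all illustrative assumptions rather than part of any particular library.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def near_any(location, pois, radius_miles=5.0):
    """True if at least one POI lies within radius_miles of location."""
    lat, lon = location
    return any(
        haversine_miles(lat, lon, p_lat, p_lon) <= radius_miles
        for p_lat, p_lon in pois
    )


# Hypothetical data: approximate coordinates, for illustration only.
airports = [(51.4700, -0.4543)]       # roughly London Heathrow
shops = {
    "shop_a": (51.4816, -0.4440),     # about a mile from the airport
    "shop_b": (51.5074, -0.1278),     # central London, well over 5 miles away
}

# One boolean feature per training row (here, per shop).
near_airport = {name: near_any(loc, airports) for name, loc in shops.items()}
```

The same pattern extends to counts ("how many POIs within the radius") or to the distance to the nearest POI as a continuous feature.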

An easy way to build richness into your geospatial features is to augment your data, either by thinking creatively about which datasets you use or by changing the way you use them.

Consider reviewing the data sources within your organisation to see whether you’re missing something of value. Data is often siloed within different departments, so take the time to ask around; you never know what data treasure you may uncover. You can also think about including open source datasets, for example OpenStreetMap for POI or government open data portals such as data.gov in the USA. There is also a wide range of commercial organisations offering a rich variety of datasets; examples include Demyst and WorldData.AI.

Adding more data means your geospatial features remain simple, but you’re nonetheless adding richness by working with a wider range of sources. However, it’s also worth thinking creatively about how you can expand the technical complexity of your geospatial features.

Examples of more complex geospatial features

There are many ways in which you can build on feature complexity. An easy starting point is to include the attributes of POIs in your model rather than just their location. For example, rather than simply counting the number of competitor shops nearby, consider breaking this count down by opening hours or by whether or not the competitor has a cafe. Here are a few further examples of advanced geospatial features:
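The competitor breakdown described above can be sketched as follows. This assumes POI coordinates have already been projected to metres, and the `has_cafe` attribute, the 1 km radius and the sample records are hypothetical.

```python
import math
from collections import Counter

# Hypothetical competitor POIs: projected coordinates in metres (x, y)
# plus an illustrative "has_cafe" attribute.
competitors = [
    {"x": 100.0, "y": 200.0, "has_cafe": True},
    {"x": 400.0, "y": -300.0, "has_cafe": False},
    {"x": 5000.0, "y": 5000.0, "has_cafe": True},   # too far away to count
    {"x": -250.0, "y": 50.0, "has_cafe": False},
]


def competitor_features(shop_xy, pois, radius_m=1000.0):
    """Count nearby competitors overall and split by the has_cafe attribute."""
    nearby = [p for p in pois if math.dist(shop_xy, (p["x"], p["y"])) <= radius_m]
    by_cafe = Counter(p["has_cafe"] for p in nearby)
    return {
        "n_competitors": len(nearby),        # the simple, location-only feature
        "n_with_cafe": by_cafe[True],        # attribute-aware breakdown
        "n_without_cafe": by_cafe[False],
    }


features = competitor_features((0.0, 0.0), competitors)
```

Keeping the plain count alongside the breakdown lets the model decide whether the attribute adds signal beyond proximity alone.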

  • Go beyond 2D – for example, if you are trying to predict basement flood risk, consider the building’s height above sea level or even the gradient of nearby streets.

  • Work with area shapefiles as well as POI datasets - for example, if you are trying to predict house prices, link each house to all nearby census districts and aggregate their socio-demographic and economic characteristics. This is particularly useful where a location is close to the border of two areas.

  • Think about co-location - for example, is an event more likely to occur if both A and B are located nearby?

  • Test location novelty to help you determine if an event is more likely to occur when an entity is outside of its usual location, e.g. whether drivers with more consistent routes are more or less likely to have accidents.

  • Use “multi-hop” to consider scenarios where a single training dataset entity has multiple geospatial locations associated with it, e.g. analysing the performance of an oil pipeline by first identifying all locations along the pipeline where oil is ingested and then aggregating measurements such as porosity and permeability surrounding each location.
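To make one of the ideas in the list above concrete, here is a rough sketch of the location-novelty idea. It assumes each entity has a history of positions in projected metres. The two measures used (distance to the nearest previously seen position, and mean spread around the centroid) are just one possible way to quantify novelty and consistency, and all names and data are hypothetical.

```python
import math
from statistics import mean


def novelty_score(history, current):
    """Distance from the current position to the nearest previously seen one."""
    return min(math.dist(current, past) for past in history)


def route_consistency(history):
    """Mean distance of past positions from their centroid (lower = more consistent)."""
    cx = mean(x for x, _ in history)
    cy = mean(y for _, y in history)
    return mean(math.dist((cx, cy), p) for p in history)


# Hypothetical history of positions for one driver, in projected metres.
history = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]

usual_trip = novelty_score(history, (103.0, 104.0))       # close to a known position
unusual_trip = novelty_score(history, (3100.0, 4100.0))   # far from anything seen before
spread = route_consistency(history)
```

Either score can be fed to the model directly or thresholded into an "outside usual location" flag.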

These are just a few examples of what we mean by advanced geospatial features. We’ll now examine two geospatial ideas in more detail.

Deep dive into advanced geospatial feature discovery

The first idea is to use shapefiles in conjunction with POI data. For example, instead of simply considering distance as the crow flies, try using isochrones. An isochrone outlines the area that can be reached within a given travel time, taking into account road layout and type as well as physical barriers such as rivers or fences.

For example, you can use isochrones to measure how many competitor shops are within ten minutes’ walk of a store or how many offices are within a thirty-minute commute of a house. There is a range of open source and commercial tools available to help you create walking, cycling, driving and commuting isochrones. These can be output as shapefiles, and packages like GeoPandas in Python can be used to overlay them in a variety of ways with POI datasets.
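Counting POIs inside an isochrone is typically handled by a spatial join in a package such as GeoPandas. To show the underlying idea without extra dependencies, the sketch below uses a plain ray-casting point-in-polygon test instead. The L-shaped "isochrone" and the competitor coordinates are made up for illustration; a real isochrone would come from a routing tool.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside polygon (list of (x, y) vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_within_isochrone(isochrone, pois):
    """Number of POIs that fall inside the isochrone polygon."""
    return sum(point_in_polygon(p, isochrone) for p in pois)


# Hypothetical ten-minute walking isochrone around a store at (0.5, 0.5).
# It is L-shaped, as if a barrier blocked access to the upper-right area.
isochrone = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]

competitors = [
    (3.0, 0.5),   # reachable along the bottom arm
    (0.5, 3.0),   # reachable along the left arm
    (3.0, 3.0),   # nearby on the map, but outside the isochrone
    (5.0, 0.5),   # beyond the isochrone
]

n_reachable = count_within_isochrone(isochrone, competitors)
```

Note that the competitor at (3.0, 3.0) would be counted by a simple radius check but is excluded here, which is exactly the distinction between crow-flies distance and travel time.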

Another way to enrich geospatial features is to add a temporal component. A simple example could be restricting your features to nearby house sale prices within the last two years when predicting house prices in a specific location.
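A minimal sketch of the house price example, assuming projected coordinates in metres and a hypothetical list of past sales. The radius, the 730-day window and the field names are illustrative choices.

```python
import math
from datetime import date, timedelta
from statistics import mean

# Hypothetical past sales: projected coordinates in metres, sale date, price.
sales = [
    {"xy": (50.0, 0.0), "sold": date(2023, 6, 1), "price": 310_000},
    {"xy": (0.0, 120.0), "sold": date(2022, 9, 15), "price": 290_000},
    {"xy": (30.0, 40.0), "sold": date(2019, 3, 2), "price": 200_000},     # too old
    {"xy": (9000.0, 0.0), "sold": date(2023, 11, 20), "price": 650_000},  # too far
]


def recent_nearby_mean_price(xy, as_of, sales, radius_m=500.0, window_days=730):
    """Mean price of sales within radius_m made in the window_days before as_of."""
    cutoff = as_of - timedelta(days=window_days)
    prices = [
        s["price"]
        for s in sales
        # Only sales strictly before as_of are used, so the feature never
        # sees information from after the prediction date.
        if cutoff <= s["sold"] < as_of and math.dist(xy, s["xy"]) <= radius_m
    ]
    return mean(prices) if prices else None


feature = recent_nearby_mean_price((0.0, 0.0), date(2024, 1, 1), sales)
```

Returning `None` when no sales qualify leaves the choice of imputation to the modelling step.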

You can further build on temporal data by identifying nearby time series datasets and performing aggregations on them. For example, if you are predicting failure in a water pipe network you can identify all nearby pipe sections and calculate the average increase in their number of repairs over the past year.
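The water pipe example might look something like the sketch below. It assumes yearly repair counts per pipe section and section midpoints in projected metres. Excluding the target section itself from the aggregation is an assumption made here, and all identifiers and numbers are hypothetical.

```python
import math
from statistics import mean

# Hypothetical pipe sections: midpoint in projected metres and yearly repair counts.
sections = {
    "p1": {"xy": (0.0, 0.0), "repairs": {2022: 1, 2023: 3}},
    "p2": {"xy": (80.0, 60.0), "repairs": {2022: 2, 2023: 5}},
    "p3": {"xy": (-150.0, 0.0), "repairs": {2022: 4, 2023: 4}},
    "p4": {"xy": (4000.0, 4000.0), "repairs": {2022: 0, 2023: 9}},  # not nearby
}


def mean_repair_increase(target_id, sections, year, radius_m=200.0):
    """Average year-on-year change in repairs across neighbouring sections."""
    origin = sections[target_id]["xy"]
    deltas = [
        s["repairs"].get(year, 0) - s["repairs"].get(year - 1, 0)
        for sid, s in sections.items()
        # Skip the target section itself and anything outside the radius.
        if sid != target_id and math.dist(origin, s["xy"]) <= radius_m
    ]
    return mean(deltas) if deltas else 0.0


feature = mean_repair_increase("p1", sections, year=2023)
```

Other aggregations over the same neighbourhood, such as the maximum increase or the share of sections with any repair, follow the same shape.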

You can also add traffic data at different points in time to enrich your geospatial features. For example, how are churn rates at different gyms affected by the level of car or foot traffic at lunchtime or in the evening rush hour? Depending on your geography of interest, there is a range of commercially available traffic datasets, e.g. Carto and SafeGraph in the USA.

Advanced features lead to advanced models

We’ve provided a few examples of advanced geospatial features and, as you can see, the level of complexity quickly builds. Is it worth investing the time to investigate these complex features? If you care about uncovering interesting insights and building a great model, then the answer is a resounding yes. For any use case where the target varies with location, a solid foundation of geospatial features is critical to success.

Automated feature discovery tools such as SparkBeyond can help you explore these and many more features from complex data faster, more thoroughly, and more systematically. But whether you are using a tool or coding them up yourself, we hope you have been convinced that building advanced geospatial features both provides you with a powerful tool and allows you to have fun exercising your creative side.

By Joanna Mclenaghan, Director of Data Science at SparkBeyond.

SparkBeyond Discovery is a data science platform for supervised machine learning that helps data professionals save time, deepen their understanding of the problem space and improve model performance by automating feature discovery in complex data. 
