In the previous posts, we looked at cloud-native relational databases and NoSQL databases. Next up, data warehouses.

Data warehouses (Online Analytical Processing (OLAP) databases) are interactive data analysis tools for large datasets. They enable you to store large volumes of data cheaply and provide an interface where you can run fast complex queries across the data. They excel at analyzing data and are typically poor at writing data with uploads often done in bulk. 

 

Spatial Support

With an increasing accumulation of spatial data across all business systems, geospatial analysis has become a key requirement for any modern analytics platform. Until recently, none of the cloud data warehouses supported geometry. That has now changed with Google Big Query, Snowflake, and AWS Redshift adding support for spatial data. This means you can start thinking about how you can leverage location to help improve your decision making!

 

Database Geospatial Support FME Support
AWS Athena

Documentation

Type of Support: Native

Spatial Types: Geometry 

Supported Types: Points, Lines, Polygons, Multipoint, Multilines, Multipolygons and Geometry Collections 

Spatial Functions: Over 30 functions

Run queries on S3 datasets. Results are also stored on S3
AWS Redshift

Documentation

Type of Support: Native

Spatial Types: Geometry and 2D support only

Supported Types: Points, Lines, Polygons, Multilines and Multipolygons

Spatial Functions: Over 40 functions

Read, write and query
Synapse Analytics Type of Support: No support for spatial types
Google Big Query

Documentation

Type of Support: Native

Spatial Types: Geography

Supported Types: Points, Lines, Polygons.  

Spatial Functions: Within, Distance, Contains, Touches, Covers

Read, write and query
Snowflake Type of support: Native

Spatial types: Geography

Supported types: Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection, Feature, FeatureCollection 

Spatial Functions: Over 30 functions

Read, write and query

As you can see, almost all listed data warehouses support spatial data types natively which is a huge deal. There are, however, limits with each database only supporting either geography or geometry types. Since the data you are pulling into the data warehouse is likely from many different systems, you will need to clean and reproject the data before loading. FME can help with this.

 

Use Cases

The main use-case for data warehousing is to unify disparate data sources into a single location so complex queries can be run to ask questions of the data. Even large complex queries run against huge datasets will run in seconds—and relatively cheaply—this means you can afford trial and error to get the right query. Since the interface is in SQL, it is easy for analysts and GIS people who are familiar with SQL to construct queries. Compare this to something like MapReduce which is much more complicated to use and queries can take days.

Here are a few common scenarios:

 

Conclusion

You can use FME to help with all of the use-cases defined above. FME enables you to extract and load data from data warehouses using the spatial SQL functions. On loading, you can pull disparate data sources together so you can run custom analytics across all of your data silos.

If you want to have a play around with a Data Warehouse but don’t have a huge dataset, BigQuery has a public dataset (with a spatial component) which contains all of the trips taxis have taken in NYC since 2009.

About Data

Stewart Harper

Stewart is the Technical Director of Cloud Applications and Infrastructure at Safe. When he isn’t building location-based tools for the web, he’s probably skiing or mountain biking.

Related Posts