Analysis Part - Testing libraries and databases like GeoPandas, OSMnx, Rasterio and DuckDB (+ QuackOSM)
The first part of this repository contains a Jupyter Notebook for each library, database, or tool tested. The primary goal is to understand how each one works and where its limits lie under heavy benchmarking tests.
- List of Videos: https://www.youtube.com/watch?v=slqZVgB8tIg&list=PLLxyyob7YmEEbXc1R6Tc5YvVIAYPuvoMY
- Beginner's Guide: https://youtu.be/t7lliJXFt8w?si=cgZfXHD51c-dLSgV
- via Bash: pip install geopandas
- via Python: import geopandas as gpd
- via Bash: pip install osmnx
- via Python: import osmnx as ox
- Beginner's Guide: https://youtu.be/LVt8CezezZQ?si=QmbTTG2S9PZNttDv
- GeoTIFF + Rasterio Tutorial: https://youtu.be/ieyODuIjXp4?si=7In_IOQWZodHGlmI
- via Bash: pip install rasterio
- via Python: import rasterio
- via Bash: pip install duckdb
- via Python: import duckdb
- Livestream by the developer (Kamil Raczycki): https://www.youtube.com/live/r6cWiSULgYs?si=8zYZciaU-MOGZpeY&t=1928
- via Bash: pip install quackosm
- via Python: import quackosm as qosm
- https://wiki.openstreetmap.org/wiki/Using_OpenStreetMap
- https://wiki.openstreetmap.org/wiki/Relation
- https://wiki.openstreetmap.org/wiki/Map_features
- https://download.geofabrik.de/
- https://github.com/openstreetmap
- https://overpass-turbo.eu/
- https://towardsdatascience.com/how-to-read-osm-data-with-duckdb-ffeb15197390/
The second part of this repository contains the code and results of a comparative performance analysis of Geospatial technologies.
The goal is to quantitatively measure and qualitatively assess the trade-offs between modern, file-based systems and traditional, server-based databases for common Geospatial tasks.
The analysis is structured around the TDL (Technologies, Data, Libraries) framework, which clearly distinguishes the core components of each benchmark.
The benchmark evaluates 3 primary technology stacks:
- DuckDB (+ Spatial Extension): A modern, in-process analytical SQL database known for its high speed and efficient handling of columnar data formats like GeoParquet.
- PostgreSQL (+ PostGIS Extension): The industry-standard, open-source object-relational database server, renowned for its robustness and comprehensive set of spatial features.
- GeoPandas (Pure Python Stack): The leading library for Geospatial analysis in Python, representing a fully in-memory, file-based approach built on pandas, shapely, and pyproj.
A series of 5 realistic Use Cases were designed to test the technologies on a range of tasks with varying complexity and data scales (using data for Pinerolo, Milan and Rome).
- Use Cases 1 & 2 - Ingestion & Filtering: Measures the efficiency of reading raw data (from .pbf, .shp, .tif) and extracting specific subsets.
- Use Case 3 - Single Table Analysis: Benchmarks performance on single-dataset geometric calculations and aggregations (e.g. calculating areas, buffers).
- Use Case 4 - Complex Spatial Joins: Stress-tests the systems with computationally intensive multi-dataset joins (e.g. Point-In-Polygon, Proximity analysis).
- Use Case 5 - Vector-Raster Analysis: Evaluates performance on zonal statistics, a task combining vector and raster data.
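To make the workloads in Use Cases 3 to 5 concrete, here is a minimal pure-Python sketch of the geometric primitives they rest on: polygon area, a point-in-polygon test, a naive spatial join, and a zonal mean. It is not the benchmark code from this repository. All function names are hypothetical, coordinates are assumed to be planar, and the raster is assumed to be a list of rows in which cell (row r, column c) covers the unit square starting at (c, r). DuckDB, PostGIS, and GeoPandas rely on optimized and spatially indexed implementations rather than loops like these.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]  # ring of vertices, first vertex not repeated


def polygon_area(ring: Polygon) -> float:
    """Planar area via the shoelace formula (a Use Case 3 style calculation)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(pt: Point, ring: Polygon) -> bool:
    """Ray casting: toggle on each edge crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def naive_spatial_join(points: Sequence[Point],
                       zones: Dict[str, Polygon]) -> Dict[str, int]:
    """Count points per zone with an O(points * zones) loop (Use Case 4 style)."""
    counts = {name: 0 for name in zones}
    for pt in points:
        for name, ring in zones.items():
            if point_in_polygon(pt, ring):
                counts[name] += 1
    return counts


def zonal_mean(grid: Sequence[Sequence[float]], ring: Polygon) -> float:
    """Mean of the cells whose centers fall inside ring (Use Case 5 style)."""
    values = [
        value
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if point_in_polygon((c + 0.5, r + 0.5), ring)
    ]
    return sum(values) / len(values) if values else float("nan")
```

The nested loop in `naive_spatial_join` is the part that grows quickly with data size, which is why spatial joins serve as the stress test among the Use Cases.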
- /scripts contains all the Python scripts used to run the benchmarks in a reproducible manner.
- /results contains the raw quantitative outputs of the benchmarks in a central benchmark_results.csv file.
- The GitHub Wiki Page contains the final, formatted tables with the summarized results and qualitative observations for each Use Case.
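As an illustration of how a timed run could be appended to a central CSV file, here is a small sketch that uses only the standard library. It is an assumption about the general approach, not the code in /scripts: the function names and the column names (use_case, technology, runs, mean_s, min_s) are hypothetical and may not match the real benchmark_results.csv.

```python
import csv
import statistics
import time
from pathlib import Path
from typing import Callable, Dict, List


def time_operation(func: Callable[[], object], runs: int = 5) -> List[float]:
    """Run func `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def append_result(csv_path: Path, row: Dict[str, object]) -> None:
    """Append one row, writing the header only when the file is new."""
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def benchmark(csv_path: Path, use_case: str, technology: str,
              func: Callable[[], object], runs: int = 5) -> Dict[str, object]:
    """Time func and record a summary row (hypothetical column names)."""
    durations = time_operation(func, runs)
    row = {
        "use_case": use_case,
        "technology": technology,
        "runs": runs,
        "mean_s": statistics.mean(durations),
        "min_s": min(durations),
    }
    append_result(csv_path, row)
    return row
```

A call such as `benchmark(Path("results/benchmark_results.csv"), "uc3_area", "geopandas", my_task)` would add one line per technology and Use Case, where `my_task` stands for any zero-argument callable wrapping the operation under test. Repeating each run and keeping both the mean and the minimum helps separate steady-state cost from one-off effects such as cold caches.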
Created by dimuzzo