dimuzzo/testing-project

Testing Project by Alessandro


📊 Analysis Part - Testing libraries and databases such as GeoPandas, OSMnx, Rasterio and DuckDB (+ QuackOSM)

The first part of this repository contains the Jupyter Notebooks for each library, database or tool tested. The primary goal is to understand how each of them works and where its limits lie under heavy benchmarking.

🗺 1. GeoPandas - Analysis and manipulation of vector geographic data

GitHub Repo

Official Site

Tutorial on YouTube

How to Install

  • via Bash:

      pip install geopandas
    
  • via Python:

      import geopandas as gpd
    

🛣 2. OSMnx - Download and view data from OpenStreetMap

GitHub Repo

Official Site

Tutorial on YouTube

How to Install

  • via Bash:

      pip install osmnx
    
  • via Python:

      import osmnx as ox
    

🛰 3. Rasterio - Working with raster data and satellite imagery

GitHub Repo

Official Site

Tutorial on YouTube

How to Install

  • via Bash:

      pip install rasterio
    
  • via Python:

      import rasterio
    

πŸ₯ 4. DuckDB with spatial extension - Embedded SQL database with GIS support

GitHub Repo

Official Site

Tutorial on YouTube

How to Install

  • via Bash:

      pip install duckdb
    
  • via Python:

      import duckdb
    

🦆 5. QuackOSM - An open-source tool for reading OpenStreetMap PBF files using DuckDB

GitHub Repo

Official Site

Tutorial on YouTube

How to Install

  • via Bash:

      pip install quackosm
    
  • via Python:

      import quackosm as qosm
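
After running the install commands above, it can help to confirm that all five packages are importable before opening the notebooks. The following is a minimal sketch using only the standard library; the helper name `check_installed` is hypothetical and is not part of this repository.

```python
import importlib.util

# Import names of the five packages listed in the sections above.
PACKAGES = ["geopandas", "osmnx", "rasterio", "duckdb", "quackosm"]


def check_installed(names):
    """Return {package name: True if it can be found for import, else False}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_installed(PACKAGES).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Note that this only checks that each package can be located; it does not import it or verify its version.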
    

🔗 Additional links

CORINE Land Cover (Copernicus):

Human Settlement Layer (Copernicus):

OpenStreetMap Wiki, Data (via OSMnx or GeoFabrik), Community and Tools:

Some Videos to Check:


🖥️ Comparison Part - Comparing the performance of 3 different Geospatial technologies

The second part of this repository contains the code and the results of a comparative performance analysis of Geospatial technologies.

The goal is to quantitatively measure and qualitatively assess the trade-offs between modern, file-based systems and traditional, server-based databases for common Geospatial tasks.

The analysis is structured around the TDL (Technologies, Data, Libraries) framework, which clearly distinguishes the core components of each benchmark.

📌 Technologies Under Comparison

The benchmark evaluates 3 primary technology stacks:

  1. DuckDB (+ Spatial Extension): A modern, in-process analytical SQL database known for its high speed and efficient handling of columnar data formats like GeoParquet.
  2. PostgreSQL (+ PostGIS Extension): The industry-standard, open-source object-relational database server, renowned for its robustness and comprehensive set of spatial features.
  3. GeoPandas (Pure Python Stack): The leading library for Geospatial analysis in Python, representing a fully in-memory, file-based approach built on pandas, shapely, and pyproj.

📋 Benchmark Use Cases

Five realistic Use Cases were designed to test the technologies on tasks of varying complexity and data scale (using data for Pinerolo, Milan and Rome).

  • Use Case 1 & 2 - Ingestion & Filtering: Measures the efficiency of reading raw data (from .pbf, .shp, .tif) and extracting specific subsets.
  • Use Case 3 - Single Table Analysis: Benchmarks performance on single-dataset geometric calculations and aggregations (e.g. calculating areas and buffers).
  • Use Case 4 - Complex Spatial Joins: Stress-tests the systems with computationally intensive multi-dataset joins (e.g. point-in-polygon and proximity analysis).
  • Use Case 5 - Vector-Raster Analysis: Evaluates performance on zonal statistics, a task combining vector and raster data.
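
To make the point-in-polygon join of Use Case 4 concrete, here is a dependency-free sketch of what such a join computes: for every polygon, the number of points that fall inside it. It uses a plain ray-casting test and a naive nested loop, which is an illustrative stand-in and not how the benchmarks in this repository are implemented; DuckDB, PostGIS and GeoPandas use their own optimized geometry engines. The function names and the toy coordinates are assumptions made for the example.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside polygon.

    polygon is a list of (x, y) vertices; the ring is closed implicitly.
    Points lying exactly on an edge are not handled specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_polygon(points, polygons):
    """Naive O(points x polygons) join; returns {polygon name: point count}."""
    return {
        name: sum(point_in_polygon(p, ring) for p in points)
        for name, ring in polygons.items()
    }


# Toy data: two square zones and four points (hypothetical coordinates).
zones = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(5, 0), (9, 0), (9, 4), (5, 4)],
}
pts = [(1, 1), (2, 3), (6, 2), (10, 10)]
counts = count_points_per_polygon(pts, zones)
print(counts)  # {'A': 2, 'B': 1}
```

The nested loop compares every point with every polygon, which is why this kind of join is a useful stress test as datasets grow from Pinerolo to Rome.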

πŸ—ƒοΈ Project Structure & Results

  • /scripts contains all the Python scripts used to run the benchmarks in a reproducible manner.
  • /results contains the raw quantitative outputs of the benchmarks in a central benchmark_results.csv file.
  • The GitHub Wiki Page contains the final, formatted tables with the summarized results and qualitative observations for each Use Case.
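
As an illustration of how a script could time a task and append the measurements to a central CSV file, here is a minimal sketch using only the standard library. The column names (chosen to mirror the TDL framework), the helper names `time_runs` and `append_results`, and the number of runs are all assumptions; the actual scripts in /scripts and the real layout of benchmark_results.csv may differ.

```python
import csv
import time
from pathlib import Path

# Hypothetical column layout; the real benchmark_results.csv may differ.
FIELDS = ["use_case", "technology", "data", "library", "run", "seconds"]


def time_runs(func, runs=3):
    """Call func() `runs` times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def append_results(csv_path, use_case, technology, data, library, durations):
    """Append one row per run, writing the header if the file is new or empty."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for run, seconds in enumerate(durations, start=1):
            writer.writerow({
                "use_case": use_case,
                "technology": technology,
                "data": data,
                "library": library,
                "run": run,
                "seconds": f"{seconds:.6f}",
            })
```

A script would wrap the operation under test in a zero-argument callable, pass it to `time_runs`, and hand the returned durations to `append_results` together with the labels for that benchmark. Recording every run rather than only an average keeps the raw output available for later summarizing.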

Created by dimuzzo
