Scalable Geospatial Data Platform on AWS with STAC and eoAPI
9 March 2026
by Suraj Barman
Context & History
The agriculture sector increasingly relies on digital tools to turn raw satellite and drone imagery into actionable field insights. BASF Digital Farming's xarvio FIELD MANAGER platform processes millions of images daily, turning them into STAC items that power decision‑making for growers worldwide. Early attempts used custom pipelines that struggled with scaling and metadata consistency, prompting a shift toward open standards like the SpatioTemporal Asset Catalog (STAC) and the eoAPI ecosystem.
Implementation & Best Practices
Before diving into individual components, outline the deployment workflow:
1) Define data models using STAC collections and items.
2) Store raw assets in Amazon S3 as Cloud Optimized GeoTIFFs or FlatGeobuf files.
3) Index metadata with pgSTAC on Amazon RDS.
4) Serve raster tiles via TiTiler and vector tiles via TiPG.
5) Orchestrate services on Amazon EKS with KEDA for auto‑scaling.
6) Expose APIs through Amazon API Gateway.
This roadmap ensures each layer is loosely coupled, testable, and can be expanded without disrupting existing services.
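To make the first workflow step concrete, here is a minimal sketch of a STAC Item built as a plain Python dict. The field layout follows the STAC Item specification; the item ID, collection name, bounding box, bucket path, and collection link are hypothetical values chosen for illustration and are not taken from the platform described here.

```python
from datetime import datetime, timezone


def make_stac_item(item_id, collection_id, collection_href, bbox, acquired, cog_href):
    """Build a minimal STAC Item (a GeoJSON Feature) as a plain dict."""
    west, south, east, north = bbox
    # Closed polygon ring covering the bounding box (first point == last point).
    footprint = {
        "type": "Polygon",
        "coordinates": [[
            [west, south], [east, south], [east, north],
            [west, north], [west, south],
        ]],
    }
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "collection": collection_id,
        "geometry": footprint,
        "bbox": [west, south, east, north],
        "properties": {
            # STAC expects an RFC 3339 timestamp in UTC.
            "datetime": acquired.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        # An item that sets "collection" also needs a link back to that collection.
        "links": [{"rel": "collection", "href": collection_href, "type": "application/json"}],
        "assets": {
            "visual": {
                "href": cog_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


# Hypothetical example values.
item = make_stac_item(
    "field-123-20260301",
    "drone-imagery",
    "https://stac.example.com/collections/drone-imagery",
    (7.0, 51.0, 7.1, 51.1),
    datetime(2026, 3, 1, 10, 30, tzinfo=timezone.utc),
    "s3://example-bucket/cogs/field-123-20260301.tif",
)
```

Keeping the item a plain dict means it can be serialized with json.dumps and handed to whichever loader populates the catalog.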
Architecture Overview
The solution consists of four logical layers (core services, storage, database, and ingestion), each running as containerized workloads inside Amazon EKS. An architecture diagram (not shown) visualizes data flow from ingestion queues to storage buckets, through the STAC API, and finally to client applications via the API Gateway.
Core Services
Key services include the pgSTAC API, TiTiler for raster rendering, and TiPG for on‑the‑fly vector tiling. These components are packaged as Docker images and deployed with Helm charts, allowing versioned rollouts and easy rollback. Running core services on a shared EKS cluster simplifies networking and security management.
Storage Layer
All raw and processed assets reside in Amazon S3, using bucket policies that enforce encryption at rest. Raster data is stored as Cloud Optimized GeoTIFFs, enabling TiTiler to read only required byte ranges. Vector data uses FlatGeobuf for fast streaming access. S3 lifecycle rules automatically transition older assets to Glacier, reducing long‑term storage costs.
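As an illustration of the lifecycle rules mentioned above, the following sketch builds a lifecycle configuration in the dictionary shape that the S3 API expects. The prefix and the 180-day threshold are assumptions made for the example; the article does not state the actual retention periods.

```python
import json


def glacier_lifecycle_rule(prefix, days):
    """Return a lifecycle configuration that moves objects under `prefix`
    to the GLACIER storage class once they are `days` old."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/').replace('/', '-')}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Hypothetical prefix and age threshold.
config = glacier_lifecycle_rule("raw/", 180)
print(json.dumps(config, indent=2))
```

With boto3, a dictionary like this can be passed as the LifecycleConfiguration argument of the S3 client's put_bucket_lifecycle_configuration call, alongside the Bucket name.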
Database Layer
Metadata lives in a PostgreSQL instance on Amazon RDS, enhanced with the pgSTAC extension. An RDS Proxy sits in front to pool connections and protect the database during traffic spikes. Indexes cover spatial, temporal, and custom properties, allowing sub‑second queries across billions of items. For deeper details on building a STAC‑based platform on AWS, see the guide on building a scalable STAC‑based geospatial data platform on AWS.
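The spatial, temporal, and custom-property indexes described above are exercised by STAC API item searches. A minimal sketch of such a request body follows, assuming the API enables the STAC filter extension with CQL2 JSON and that items carry the eo:cloud_cover property; the collection name and coordinates are hypothetical.

```python
from datetime import datetime, timezone


def build_search_body(collections, bbox, start, end, max_cloud_cover=None, limit=100):
    """Build the JSON body for a POST /search request against a STAC API.

    `start` and `end` must be timezone-aware datetimes.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    body = {
        "collections": list(collections),
        "bbox": list(bbox),                    # spatial filter
        "datetime": f"{start_utc}/{end_utc}",  # temporal filter (closed interval)
        "limit": limit,
    }
    if max_cloud_cover is not None:
        # Custom-property filter expressed in CQL2 JSON (assumed to be enabled).
        body["filter-lang"] = "cql2-json"
        body["filter"] = {
            "op": "<=",
            "args": [{"property": "eo:cloud_cover"}, max_cloud_cover],
        }
    return body


# Hypothetical query: low-cloud scenes over a small area in early March 2026.
body = build_search_body(
    ["drone-imagery"],
    (7.0, 51.0, 7.1, 51.1),
    datetime(2026, 3, 1, tzinfo=timezone.utc),
    datetime(2026, 3, 9, tzinfo=timezone.utc),
    max_cloud_cover=20,
)
```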
Ingestion Pipeline
Ingestion is decoupled from serving via an Amazon SQS queue. Workers poll the queue, download source imagery, apply quality checks (cloud detection, anomaly filtering), generate COGs, and push metadata to pgSTAC. The pipeline is written in Python and runs as a Kubernetes Job. Processing is idempotent, so a message that SQS delivers more than once can be handled again without creating duplicate items.
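One common way to get idempotent processing is to derive the item ID deterministically from the source and to upsert rather than insert. The sketch below illustrates that idea only: the message fields (source_uri, cloud_fraction), the 0.4 threshold, and the in-memory dict standing in for pgSTAC are all hypothetical, and the real pipeline may achieve idempotency differently.

```python
import hashlib
import json


def item_id_for(source_uri):
    """Derive a stable item ID from the source URI so retries map to the same record."""
    return hashlib.sha256(source_uri.encode("utf-8")).hexdigest()[:16]


def passes_quality_checks(message, max_cloud_fraction=0.4):
    """Stand-in for cloud detection: reject scenes that are too cloudy."""
    return message.get("cloud_fraction", 0.0) <= max_cloud_fraction


def process_message(body, catalog):
    """Handle one queue message; returns "indexed" or "rejected".

    `catalog` is a dict standing in for the metadata store.
    """
    message = json.loads(body)
    if not passes_quality_checks(message):
        return "rejected"
    item_id = item_id_for(message["source_uri"])
    # Upsert keyed by the deterministic ID: a redelivered message
    # overwrites the same entry instead of creating a duplicate.
    catalog[item_id] = {
        "id": item_id,
        "source_uri": message["source_uri"],
        "cog_key": f"cogs/{item_id}.tif",
    }
    return "indexed"


catalog = {}
msg = json.dumps({"source_uri": "s3://example-bucket/raw/scene-001.tif", "cloud_fraction": 0.1})
first = process_message(msg, catalog)
second = process_message(msg, catalog)  # simulated redelivery
```

After both calls the catalog still holds a single entry, which is the property that makes at-least-once delivery safe.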
Visualization Services
TiTiler provides WMTS and XYZ endpoints for dynamic raster mosaics (e.g., NDVI, false‑color composites). TiPG serves Mapbox Vector Tiles directly from PostGIS, enabling fast field‑boundary overlays in web maps. Clients access these services through a unified API Gateway endpoint that adds rate‑limiting and JWT authentication.
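XYZ endpoints address tiles by zoom level and column/row in the Web Mercator grid. The sketch below shows the standard conversion from longitude and latitude to tile indices and assembles a request URL. The base URL and the url query parameter are assumptions for illustration; the exact route depends on how the tile service is mounted behind the gateway.

```python
import math
from urllib.parse import urlencode


def lonlat_to_tile(lon, lat, zoom):
    """Convert WGS84 longitude/latitude to XYZ tile indices (Web Mercator)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the grid edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(base_url, cog_href, lon, lat, zoom):
    """Build a tile request URL for the tile containing (lon, lat)."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return f"{base_url}/{zoom}/{x}/{y}?{urlencode({'url': cog_href})}"


# Hypothetical endpoint and asset location.
url = tile_url("https://tiles.example.com/cog/tiles/WebMercatorQuad", "s3://b/a.tif", 0.0, 0.0, 1)
```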
Autoscaling with KEDA
Kubernetes Event‑Driven Autoscaling (KEDA) monitors SQS queue depth and API request latency, scaling pods up or down in seconds. This approach matches compute resources to real‑time demand, keeping costs low while preserving responsiveness during peak harvest periods.
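The queue-depth rule behind this kind of scaling can be sketched in a few lines: divide the backlog by a per-pod target and clamp the result to the allowed range. This is a simplified model, not KEDA's or the Horizontal Pod Autoscaler's actual implementation, which also applies tolerances and stabilization windows. The target of 50 messages per pod and the replica bounds are hypothetical.

```python
import math


def desired_replicas(queue_depth, target_per_replica, min_replicas, max_replicas):
    """Simplified queue-based scaling rule: one pod per `target_per_replica`
    queued messages, clamped to [min_replicas, max_replicas]."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical settings: 50 messages per pod, between 1 and 20 pods.
for depth in (0, 250, 251, 5000):
    print(depth, desired_replicas(depth, 50, 1, 20))
```

The upper bound matters as much as the target: it caps cost during bursts and keeps a flood of workers from overwhelming the database.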
Security and Access Control
IAM roles restrict each service to the minimum required S3 buckets and RDS resources. API Gateway integrates with Amazon Cognito for user authentication, and fine‑grained authorizers enforce role‑based access to sensitive datasets. Regular security audits and automated compliance checks are part of the CI/CD pipeline.
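A least-privilege role for a read-only service such as a tile server could carry a policy along the following lines. This is a sketch under assumed names: the bucket and prefix are hypothetical, and the real roles may grant different actions.

```python
import json


def read_only_prefix_policy(bucket, prefix):
    """Build an IAM policy document allowing read access to one S3 prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket, restricted to the prefix.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object reads are granted only for keys under the prefix.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix.
policy = read_only_prefix_policy("example-imagery-bucket", "cogs/")
print(json.dumps(policy, indent=2))
```

No write or delete actions appear in the document, so a compromised tile server could read imagery under that prefix but could not alter or remove it.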
Lessons Learned
Adopt open standards early to avoid vendor lock‑in.
Separate ingestion from serving to improve reliability.
Use serverless components (SQS, Lambda) for lightweight tasks.
Continuously monitor cost metrics; S3 lifecycle rules and KEDA scaling deliver significant savings.
For broader best‑practice guidance on AWS architecture, refer to the AWS Well‑Architected Machine Learning Lens guide.