The POP API is a federated data management platform that provides a unified REST API for managing datasets, streaming data, and integrating multiple data services. It serves as a central point of access for data discovery, registration, and management across distributed CKAN instances, Kafka streams, and cloud storage.
- Federated Data Discovery: Search and access datasets across multiple CKAN instances (local, global, pre-production)
- Multi-format Data Sources: Support for URLs, S3 buckets, Kafka streams, and various file formats (CSV, JSON, NetCDF, TXT)
- Real-time Streaming: Kafka integration for live data streams and event processing
- Centralized Authentication: Keycloak integration for secure, role-based access control
- Service Registry: Register and discover microservices and APIs
- System Monitoring: Built-in metrics collection and health monitoring
- JupyterLab Integration: Direct access to data analysis environments
- RESTful API: Comprehensive OpenAPI/Swagger documentation
Get the POP API running in under 5 minutes. You need:
- Docker and Docker Compose
- Git
git clone https://github.com/sci-ndp/pop.git
cd pop
cp example.env .env
# Edit .env with your configuration (see Configuration section below)
docker-compose up -d
- API Documentation: http://localhost:8001/docs
- Dashboard: http://localhost:8001/
- Health Check: http://localhost:8001/status/
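After `docker-compose up -d` the containers may need a moment before the API answers. The following is a minimal sketch of a readiness check that polls the health check URL listed above. It assumes the endpoint returns HTTP 200 once the service is ready, which the source does not state; `fetch_status` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    url: str = "http://localhost:8001/status/",
    attempts: int = 30,
    delay: float = 2.0,
    fetch: Callable[[str], int] = fetch_status,
) -> bool:
    """Poll `url` until it answers with HTTP 200 or the attempts run out.

    Assumption: a 200 response means the API is ready.
    """
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next try
    return False
```

Calling `wait_until_healthy()` with the defaults polls for roughly one minute. The `fetch` parameter exists so the polling logic can be exercised without a running server.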
- Clone and configure:
git clone https://github.com/sci-ndp/pop.git
cd pop
cp example.env .env
- Edit configuration (see Configuration section)
- Start services:
docker-compose up -d
- Prerequisites:
# Python 3.9+
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Configure environment:
cp example.env .env
# Edit .env file
- Run the application:
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
# CKAN Configuration
CKAN_LOCAL_ENABLED=false # Enable local CKAN instance
CKAN_URL=http://localhost:5000 # Your local CKAN URL
CKAN_GLOBAL_URL=https://global-ckan.example.com # Global CKAN URL
CKAN_API_KEY=your-api-key # CKAN API key
# Authentication
KEYCLOAK_URL=http://localhost:8080
REALM_NAME=your-realm
CLIENT_ID=your-client
CLIENT_SECRET=your-secret
# API Settings
SWAGGER_TITLE=POP API
SWAGGER_DESCRIPTION=Point of Presence Data Management API
ORGANIZATION=Your Organization Name
# Kafka Streaming
KAFKA_CONNECTION=true
KAFKA_HOST=localhost
KAFKA_PORT=9092
# JupyterLab Integration
USE_JUPYTERLAB=true
JUPYTER_URL=https://jupyter.example.com
# DXSpaces Integration
USE_DXSPACES=true
DXSPACES_URL=https://dxspaces.example.com
# Pre-production CKAN
PRE_CKAN_ENABLED=true
PRE_CKAN_URL=https://pre-ckan.example.com
PRE_CKAN_API_KEY=pre-ckan-api-key
For a complete list of all environment variables, see the example.env file.
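Because several integrations are switched on by a flag and then need further variables, a quick pre-flight check of `.env` can catch omissions before starting the containers. The sketch below is not part of the POP API. The `REQUIRED_WHEN_ENABLED` mapping is an assumption inferred from how the variables are grouped above, and `parse_env` handles only simple `KEY=VALUE` lines.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and full-line comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "false # Enable local CKAN".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


# Assumed rule set (not taken from the POP source): variables that each
# feature flag needs when it is set to "true".
REQUIRED_WHEN_ENABLED = {
    "CKAN_LOCAL_ENABLED": ["CKAN_URL", "CKAN_API_KEY"],
    "KAFKA_CONNECTION": ["KAFKA_HOST", "KAFKA_PORT"],
    "USE_JUPYTERLAB": ["JUPYTER_URL"],
    "USE_DXSPACES": ["DXSPACES_URL"],
    "PRE_CKAN_ENABLED": ["PRE_CKAN_URL", "PRE_CKAN_API_KEY"],
}


def missing_settings(env: Dict[str, str]) -> List[str]:
    """List variables that an enabled feature needs but that are empty or absent."""
    missing: List[str] = []
    for flag, needed in REQUIRED_WHEN_ENABLED.items():
        if env.get(flag, "").lower() == "true":
            missing.extend(name for name in needed if not env.get(name))
    return missing
```

A typical use would be `missing_settings(parse_env(open(".env").read()))`, which returns an empty list when every enabled feature has its settings.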
# Register a CSV dataset
curl -X POST "http://localhost:8001/url" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"resource_name": "weather_data",
"resource_title": "Weather Station Data",
"owner_org": "research_org",
"resource_url": "https://example.com/weather.csv",
"file_type": "CSV",
"notes": "Daily weather measurements"
}'
# Search by organization
curl "http://localhost:8001/search?owner_org=research_org"
# Search with multiple terms
curl -X POST "http://localhost:8001/search" \
-H "Content-Type: application/json" \
-d '{
"search_term": "weather,temperature",
"resource_format": "csv"
}'
curl -X POST "http://localhost:8001/kafka" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"dataset_name": "sensor_stream",
"dataset_title": "IoT Sensor Stream",
"owner_org": "iot_team",
"kafka_topic": "sensors",
"kafka_host": "localhost",
"kafka_port": "9092"
}'
- Style: Black formatter, Flake8 linter
- Documentation: NumPy-style docstrings
- Testing: pytest with coverage
- Type Hints: Required for all functions
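As a sketch of how these conventions look together, the helper below builds the same request as the CSV registration curl example, written with type hints and a NumPy-style docstring. The function name `build_url_registration` is hypothetical and not part of the POP codebase. The `/url` path and the field names are copied from the curl example; treating `notes` as optional is an assumption.

```python
import json
import urllib.request
from typing import Optional


def build_url_registration(
    base_url: str,
    token: str,
    resource_name: str,
    resource_title: str,
    owner_org: str,
    resource_url: str,
    file_type: str = "CSV",
    notes: Optional[str] = None,
) -> urllib.request.Request:
    """Build the POST request that registers a URL resource.

    Parameters
    ----------
    base_url : str
        Root of the API, e.g. ``http://localhost:8001``.
    token : str
        Bearer token sent in the ``Authorization`` header.
    resource_name, resource_title, owner_org, resource_url : str
        Values for the payload fields of the same names.
    file_type : str, optional
        File format label, by default ``"CSV"``.
    notes : str, optional
        Free-text description; left out of the payload when ``None``.

    Returns
    -------
    urllib.request.Request
        Prepared request; pass it to ``urllib.request.urlopen`` to send it.
    """
    payload = {
        "resource_name": resource_name,
        "resource_title": resource_title,
        "owner_org": owner_org,
        "resource_url": resource_url,
        "file_type": file_type,
    }
    if notes is not None:
        payload["notes"] = notes
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/url",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Returning a prepared `Request` instead of sending it keeps the function free of network side effects, which makes it straightforward to cover with pytest.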
Complete step-by-step tutorial: docs/general_dataset_api_tutorial.ipynb
Learn how to:
- Authenticate and create organizations
- Create and manage datasets with metadata
- Handle resources, tags, and custom fields
- Update datasets and handle errors
# Run all tests
pytest
# Run with coverage
pytest --cov=api
# Run specific test file
pytest tests/test_routes.py
# Run in Docker container
docker exec -it pop-api pytest
API not starting
# Check logs
docker logs pop-api
# Verify environment variables
docker exec -it pop-api env | grep CKAN
CKAN connection issues
- Verify CKAN_URL is accessible
- Check API key permissions
- Ensure firewall allows connections
Keycloak authentication failing
- Verify realm and client configuration
- Check client secret
- Confirm user exists in Keycloak
Kafka streams not working
- Verify Kafka broker is running
- Check topic exists
- Confirm network connectivity
- Enable connection pooling for high traffic
- Configure appropriate worker processes
- Use Redis for session storage in production
- Set up database connection limits
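Several of the troubleshooting items (CKAN URL accessible, Kafka broker running, network connectivity) come down to whether a TCP port can be reached. The sketch below shows one way to check that from Python. The helper names are hypothetical, and the example ports in the usage note are the defaults from the configuration section, which may differ in your deployment.

```python
import socket
from typing import Callable, Dict, Tuple

Endpoint = Tuple[str, int]


def is_reachable(
    host: str,
    port: int,
    timeout: float = 3.0,
    connect: Callable[..., socket.socket] = socket.create_connection,
) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with connect((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_services(endpoints: Dict[str, Endpoint], **kwargs) -> Dict[str, bool]:
    """Map each service name to whether its host:port accepts connections."""
    return {
        name: is_reachable(host, port, **kwargs)
        for name, (host, port) in endpoints.items()
    }
```

For example, `check_services({"ckan": ("localhost", 5000), "keycloak": ("localhost", 8080), "kafka": ("localhost", 9092)})` reports which of the three ports answer. An open port only shows that something is listening; it does not confirm valid credentials or that a Kafka topic exists.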
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature
- Make your changes
- Add tests for new functionality
- Run the test suite:
pytest
- Commit your changes:
git commit -m 'Add amazing feature'
- Push to the branch:
git push origin feature/amazing-feature
- Open a Pull Request
The POP API automatically collects and logs system metrics every 10 minutes:
{
"public_ip": "XXX.XXX.XXX.XXX",
"cpu": "15%",
"memory": "65%",
"disk": "45%",
"version": "0.6.0",
"organization": "Your Organization",
"services": {
"local_ckan": {"url": "http://localhost:5000"},
"kafka": {"host": "localhost", "port": 9092}
}
}
This project is licensed under the MIT License - see the LICENSE file for details.
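The system metrics payload shown earlier reports CPU, memory, and disk usage as percentage strings, so a consumer has to convert them before comparing against limits. The following sketch illustrates that conversion. The `LIMITS` values and both function names are hypothetical, and the sample reuses the figures from the example payload.

```python
import json
from typing import Dict, List


def percent_to_float(value: str) -> float:
    """Convert a string such as "15%" to the float 15.0."""
    return float(value.strip().rstrip("%"))


def over_threshold(metrics: Dict[str, object], limits: Dict[str, float]) -> List[str]:
    """Return the names of resource readings that exceed their limit."""
    return [
        name
        for name, limit in limits.items()
        if name in metrics and percent_to_float(str(metrics[name])) > limit
    ]


# Hypothetical alert limits, in percent.
LIMITS = {"cpu": 80.0, "memory": 60.0, "disk": 90.0}

sample = json.loads('{"cpu": "15%", "memory": "65%", "disk": "45%", "version": "0.6.0"}')
print(over_threshold(sample, LIMITS))  # prints ['memory']
```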