Possible Mentors: Federico Capoano, cbeaujoin-stellar
- Fully working Elasticsearch backend implementing the complete `DatabaseClient` interface, with all existing tests passing plus new tests for charts added after the original ES PR (signal strength, iperf3, etc.)
- Fully working InfluxDB 2.0 backend implementing the same interface, with Flux query equivalents for all 19 chart queries defined in `queries.py`, plus `default_chart_query` and `device_data_query`
- Dedicated GitHub Actions CI build for each new backend, running the full test suite on every pull request
- Users can switch between InfluxDB 1.8, InfluxDB 2.0, and Elasticsearch via the `TIMESERIES_DATABASE` setting, with installation support in both ansible-openwisp2 and docker-openwisp
- Elasticsearch optimizations from #168 applied (disable `_source`, single-request index creation, `indices.refresh()` after writes, tighter time-range filters; see the sketch after this list)
- Documentation covering setup, configuration, and usage for both new backends
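For concreteness, here is a rough sketch of what the #168-style optimizations look like with the `elasticsearch` Python client (7.x-style calls); the index name, mapping, and sample document are illustrative assumptions, not the PR's actual code:

```python
# Illustrative sketch only: index name, mapping and the sample document
# are assumptions, not the actual backend code from PR #168.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Single-request index creation with _source disabled:
# settings and mappings go in one call instead of several.
es.indices.create(
    index="device-metrics",
    body={
        "settings": {"number_of_shards": 1},
        "mappings": {
            "_source": {"enabled": False},  # chart queries only aggregate, never re-read raw docs
            "properties": {
                "time": {"type": "date"},
                "value": {"type": "float"},
            },
        },
    },
    ignore=400,  # tolerate "index already exists"
)

# After writes, refresh explicitly so freshly written points are queryable
# immediately (useful in tests) instead of after the refresh interval.
es.index(index="device-metrics", body={"time": "2024-01-01T00:00:00Z", "value": 42.0})
es.indices.refresh(index="device-metrics")
```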
After reading through the OpenWISP Monitoring codebase (and contributing to it), I noticed that all database interaction is already isolated behind a single DatabaseClient class in openwisp_monitoring/db/backends/influxdb/client.py. The rest of the monitoring logic (Django models, checks, device status parsing, chart rendering) never touches InfluxDB directly. It calls methods on timeseries_db, which is a DatabaseClient instance loaded at startup via import_module() based on the TIMESERIES_DATABASE["BACKEND"] setting.
This means adding a new backend is a matter of implementing the same DatabaseClient interface in a new module under openwisp_monitoring/db/backends/. The monitoring logic stays untouched.
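As an illustration of what that switch looks like from an operator's point of view, the backend is already selected by a dotted module path in the settings; a sketch follows (the InfluxDB 2.x keys in the commented block are my assumption, not a settled API):

```python
# settings.py sketch: BACKEND is the dotted path of the module whose
# DatabaseClient is instantiated as timeseries_db at startup.
TIMESERIES_DATABASE = {
    "BACKEND": "openwisp_monitoring.db.backends.influxdb",  # current InfluxDB 1.8 backend
    "USER": "openwisp",
    "PASSWORD": "openwisp",
    "NAME": "openwisp2",
    "HOST": "localhost",
    "PORT": "8086",
}

# Selecting a new backend would only change this dict, e.g. for InfluxDB 2.x
# (option names below are assumptions for illustration):
# TIMESERIES_DATABASE = {
#     "BACKEND": "openwisp_monitoring.db.backends.influxdb2",
#     "TOKEN": "<api-token>",
#     "ORG": "openwisp",
#     "BUCKET": "openwisp2",
#     "HOST": "localhost",
#     "PORT": "8086",
# }
```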
To technically validate this approach, I built HostWatch, a system metrics collector that implements the same abstraction pattern: all database interaction sits behind a common interface, and the collector and dashboard are completely unaware of which backend is active. All core data methods have been implemented across three backends (InfluxDB 1.8, InfluxDB 2.x, and Elasticsearch), demonstrating that the same interface can be fulfilled by each using different underlying APIs.
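The pattern is a plain strategy-style abstraction; a minimal sketch of the idea (method names are illustrative, not the exact OpenWISP or HostWatch signatures):

```python
from abc import ABC, abstractmethod


class BaseTimeseriesClient(ABC):
    """Common interface: callers never know which database sits behind it."""

    @abstractmethod
    def write(self, name, values, tags=None, timestamp=None):
        """Persist one point for metric ``name``."""

    @abstractmethod
    def read(self, key, fields, tags=None, since=None, limit=None):
        """Return recent points for a metric as a list of dicts."""

    @abstractmethod
    def query(self, query):
        """Run a backend-native query and normalize the result."""


class InfluxDB2Client(BaseTimeseriesClient):
    ...  # backed by the influxdb-client library and Flux


class ElasticsearchClient(BaseTimeseriesClient):
    ...  # backed by the elasticsearch library and Query DSL
```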
The purpose of building HostWatch was to work through every method that the DatabaseClient interface requires, hit the real differences between the three databases (query languages, write formats, retention mechanisms, result normalization), and document how each one maps to the OpenWISP implementation. The chart query methods (get_query(), get_list_query(), _group_by(), _fields(), _get_top_fields()) could not be replicated in HostWatch since they are tightly coupled to OpenWISP's Django chart models, so instead I analyzed how each one would translate to Flux and Elasticsearch Query DSL and documented that in the mapping doc.
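As an example of the kind of translation documented there, a "mean per 10 minutes over the last day" query, which the current backend expresses in InfluxQL, has a direct Flux counterpart (both strings below are simplified illustrations, not the literal queries from queries.py):

```python
# Simplified illustration of one InfluxQL -> Flux translation; the real
# chart queries are parameterized per metric, device and time range.
influxql = """
SELECT MEAN(value) FROM ping
WHERE time >= now() - 1d AND content_type = 'config.device'
GROUP BY time(10m)
"""

flux = """
from(bucket: "openwisp2")
  |> range(start: -1d)
  |> filter(fn: (r) => r._measurement == "ping")
  |> filter(fn: (r) => r.content_type == "config.device")
  |> filter(fn: (r) => r._field == "value")
  |> aggregateWindow(every: 10m, fn: mean, createEmpty: false)
"""
```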
The HostWatch repo has three documentation files that cover the full technical detail:
- README.md: project setup, architecture, and how backend switching works
- backend-implementation-guide.md: method-by-method breakdown of how each backend implements the interface, output format differences, and the problems that came up while unifying them
- mapping-to-openwisp-monitoring.md: how HostWatch maps to OpenWISP Monitoring, chart system method translations, my analysis of the existing Elasticsearch PR #164 and the optimizations from #168, deployment tooling for ansible-openwisp2 and docker-openwisp, CI build approach, and a `query()` return type inconsistency I found in the current codebase
- Languages: Python, Flux query language, Elasticsearch Query DSL
- Frameworks: Django, Django REST Framework
- Database clients: `influxdb-client` (InfluxDB 2.x), `elasticsearch` (Elasticsearch)
- Testing: Django test framework, pytest
- CI/CD: GitHub Actions
- Deployment: Ansible (ansible-openwisp2), Docker Compose (docker-openwisp)
I have read through the OpenWISP Monitoring source code during my contributions and am familiar with the NetJSON format used for device status data, including how metrics like interface traffic and signal strength are extracted from NetJSON device status and stored as time series data.
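For context, an abridged and purely illustrative device status payload follows; the traffic and signal strength chart metrics come from the statistics and wireless sections of each interface:

```python
# Abridged, illustrative NetJSON DeviceMonitoring payload; real payloads
# carry more sections (general, resources, neighbors, ...).
device_status = {
    "type": "DeviceMonitoring",
    "interfaces": [
        {
            "name": "wlan0",
            "statistics": {"rx_bytes": 12345678, "tx_bytes": 2345678},
            "wireless": {"signal": -58, "noise": -92},
        }
    ],
}

# Traffic and signal strength time series are extracted per interface:
for iface in device_status["interfaces"]:
    stats = iface.get("statistics", {})
    wireless = iface.get("wireless", {})
    print(iface["name"], stats.get("rx_bytes"), wireless.get("signal"))
```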
| Week | Focus | Deliverable |
|---|---|---|
| Pre-GSoC | Early start, environment setup | Dev environment ready, all three database services running locally, existing test suite passing |
| Week 1-2 | InfluxDB 2.x core backend | Implement all core data methods using influxdb-client library |
| Week 3-5 | InfluxDB 2.x Flux query migration | Translate all 19 chart queries from InfluxQL to Flux, implement chart query methods for Flux |
| Week 6 | InfluxDB 2.x testing and CI | All existing InfluxDB 1.8 tests replicated and passing on the 2.x backend, dedicated CI build configured |
| Week 7-8 | Elasticsearch core backend | Implement all core data methods using elasticsearch library, apply optimizations from #168 |
| Week 9-10 | Elasticsearch query layer and testing | Implement chart query methods using date_histogram aggregations (see the sketch after this timeline), all tests passing, dedicated CI build configured |
| Week 11 | Deployment tooling | Update ansible-openwisp2 roles/variables and docker-openwisp Compose files to support backend selection |
| Week 12 | Documentation and buffer | Usage docs for both backends, coverage check, mentor review feedback addressed |
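The date_histogram work referenced in weeks 9-10 amounts to bucketing points by time on the Elasticsearch side; a rough sketch of the aggregation body for a "mean per 10 minutes over the last day" chart (index and field names are illustrative):

```python
# Illustrative Elasticsearch aggregation; index and field names are
# assumptions, not the final backend's schema.
es_query = {
    "size": 0,  # only aggregation buckets are needed, not raw hits
    "query": {"range": {"time": {"gte": "now-1d"}}},
    "aggs": {
        "per_interval": {
            "date_histogram": {"field": "time", "fixed_interval": "10m"},
            "aggs": {"value": {"avg": {"field": "value"}}},
        }
    },
}
# result = es.search(index="device-metrics", body=es_query)
```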
I can commit 20-30 hours per week to the project. I am currently doing a remote internship, which eliminates commute time and gives me a flexible daily schedule to accommodate consistent contributions throughout the program. I have no exams or travel planned during the GSoC period.
If time permits after the core deliverables are complete, I would like to explore adding TimescaleDB as a fourth backend. TimescaleDB is a time-series extension for PostgreSQL, and OpenWISP already uses PostgreSQL as its primary Django database. This means operators would not need to install or maintain a separate database service for time-series data at all. Metrics could live in the same PostgreSQL instance that already stores device configurations, users, and organizations. The DatabaseClient interface built during GSoC would make this addition straightforward since all the abstraction work would already be in place.
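If that exploration happens, the storage side is plain SQL on the PostgreSQL connection OpenWISP already has; a minimal sketch of the idea (table layout, connection string, and the use of psycopg2 are assumptions for illustration only):

```python
import psycopg2

# Illustrative only: table layout and DSN are assumptions.
conn = psycopg2.connect("dbname=openwisp2 user=openwisp")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS device_metric (
            time      TIMESTAMPTZ NOT NULL,
            metric    TEXT NOT NULL,
            object_id UUID,
            value     DOUBLE PRECISION
        );
        """
    )
    # Convert to a TimescaleDB hypertable so rows are partitioned by time.
    cur.execute(
        "SELECT create_hypertable('device_metric', 'time', if_not_exists => TRUE);"
    )
```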