Reports manager for constructing batch reports.
Implements both report submission/status tracking web service and backend report processor.
0.2.2- minor dependency updates0.2.1- major dependency updates to java 21, tomcat 10, jersey 30.1.3- prior production deployed version
Submit a request (defined by query parameters) via POST, returns location request status information:
POST /sr-manager/report-request?...
Get status information in JSON for the identified request:
GET /sr-manager/report-request/{id}
Retrieve a prepared report:
GET /sr-manager/report/{id}.{csv|xlsx}
Get the most recent month for which data is available as a plain string (e.g. "2015-09"):
GET /sr-manager/latest-month-available
Clear the transient cache:
POST /sr-manager/system/clear-cache
Clear the transient and persistent caches:
POST /sr-manager/system/clear-cache-all
Prune old completed records from the queue manager:
POST /sr-manager/system/clear-old-records
Suspend report processing (after current report finishes):
POST /sr-manager/system/suspend
Resume report processing:
POST /sr-manager/system/resume
Return the current status of report processing (Running, Suspending, or Suspended):
GET /sr-manager/system/status
Prometheus metrics scrape endpoint:
GET /sr-manager/system/metrics
| Parameter | Values | Default |
|---|---|---|
areaType |
AreaType (see below) | |
area |
Name of area (use EW for country) | |
aggregate |
AreaType or none |
none |
age |
new old any |
any |
period |
2015 2015-Q1 2015-03 | |
report |
avgPrice banded |
avgPrice |
sticky |
true false |
true |
test |
true false |
false |
Where AreaType is one of:
countryregioncountydistrictpcAreapcDistrictpcSector
The test flag bypasses all live processing and just uploads a fixed pair of pre-canned reports after a 10s pause. All the other parameter are ignored in this case so long as they are legal.
Status is reported as a JSON object will the following fields:
| Field | Value | Notes |
|---|---|---|
key |
id string | unique identifier for report |
status |
Pending InProgress Completed Failed Unknown |
|
positionInQueue |
number | Valid number if Pending, 0 if InProgress, absent otherwise |
eta |
time left in ms | Only available of Pending/InProgress |
started |
date-time string | Time an InProgress request started processing |
The configuration of the request processing (queue manager and location, store manager and location, sparql endpoint) is all specified in an appbase configuration file in /etc/standard-reports/app.conf. The desired configuration file should be mounted into this location in the container.
The default configuration is suitable for testing only and uses an in-memory queue, file store manager (in /tmp) and assumes a sparql endpoint of http://localhost:3030/landregistry_to/query
In production configure to use AWS Dynamo DB tables for queue management:
queue = com.epimorphics.armlib.impl.DynQueueManager
queue.checkInterval = 1500
queue.tablePrefix = hmlr-XX-
Tables {prefix}-Queue and {prefix}-Completed will be created and used for the queue and to the record of completed jobs. These will be created as provisioned table with a default provisioning of 5(read)/5(write) for the Queue table, and 3(read)/1(write) for the Completed table.
Note: Recent growth in the number of standard reports processed has meant that the provisioning of the {prefix}-Completed table is not enough for production and clearing that table can take 10 minutes or more. Recommend changing the provisioning to on-demand.
Mapping between regions and top level "counties" (which may actually be higher level groupings like GREATER LONDON and GREATER MANCHESTER) are implemented in query fragments in src/main/resources/WEB-INF/queryTemplates. If new top level authorities are added then these will need to be updated. Add the value to both the region_XXX.sq values list and the regionAggregate.sq switch statement.
A future improvement would be to put the region mapping into the data itself and rewrite the queries to use that.
The docker image includes scripts to manage standard reports processing. These can be invoked by an exec into the image.
| Script | Action | Notes |
|---|---|---|
suspend |
Halt report processing after the current report has completed | Allows up to 15 min for the suspend. If report is stuck and can't be suspended in that time the script will abort with exit code 1. |
resume |
Resume report processing after a suspension | |
clearcache |
Clear the cache of generated reports (on S3) |
To run the container locally with the default configuration (sparql endpoint of http://localhost:3030/landregistry_to/query):
AWS_PROFILE=lr docker run -p 8080:8080 018852084843.dkr.ecr.eu-west-1.amazonaws.com/epimorphics/standard-reports-manager/dev:${VERSION}
To use the public LR sparql endpoint (which is subject to a 90s/120s timeout) then use the configuration file in dev/app.conf e.g.:
AWS_PROFILE=lr docker run -v $(pwd)/dev/app.conf:/etc/standard-reports/app.conf -p 8080:8080 018852084843.dkr.ecr.eu-west-1.amazonaws.com/epimorphics/standard-reports-manager/dev:${VERSION}
Example test cases:
curl -i -X POST "http://localhost:8080/sr-manager/report-request?areaType=county&area=HAMPSHIRE&aggregate=district&period=2015-Q3&report=avgPrice"
curl -i -X POST "http://localhost:8080/sr-manager/report-request?areaType=county&area=DEVON&aggregate=district&period=2015-06&age=new&report=avgPrice"
curl -i http://localhost:8080/sr-manager/latest-month-available
# Large result test case
curl -i -X POST "http://localhost:8080/sr-manager/report-request?area=EW&period=2015&areaType=country&report=banded&age=any&aggregate=pcSector"
curl -i -X POST "http://localhost:8080/sr-manager/report-request?areaType=district&area=KENSINGTON+AND+CHELSEA&aggregate=district&period=2015&age=any&report=banded"
curl -i -X POST http://localhost:8080/sr-manager/system/suspend
curl -i -X POST http://localhost:8080/sr-manager/system/resume
curl -i http://localhost:8080/sr-manager/system/status
Test sequence: curl -i -X POST "http://localhost:8080/sr-manager/report-request?areaType=county&area=HAMPSHIRE&aggregate=district&period=2015-Q3&report=avgPrice" curl -i -X POST "http://localhost:8080/sr-manager/report-request?areaType=county&area=DEVON&aggregate=district&period=2015-06&age=new&report=avgPrice" curl -i -X POST http://localhost:8080/sr-manager/system/suspend curl -i http://localhost:8080/sr-manager/system/status
curl -i -X POST http://localhost:8080/sr-manager/system/resume
Extra band check case:
curl -i -X POST "http://localhost:8080/sr-manager/report-request?areaType=district&area=KENSINGTON+AND+CHELSEA&aggregate=district&period=2015-Q4&age=new&report=banded"
Should have:
1001k - 1250k 1 1251k - 1500k 1 1501k - 1750k 2 1751k - 2000k 7
Rerunning this in 2025 (presumably after data backfilled) those values are:
1001k - 1250k 1 1251k - 1500k 2 1501k - 1750k 3 1751k - 2000k 11