ProductNormaliser.AdminApi is the operational, intelligence, and dashboard management API for the platform. It exposes queue state, discovery progress, crawl logs, conflicts, canonical product detail, change history, quality analytics, category metadata, and managed crawl-source administration over the shared MongoDB database.
This is an internal-facing service designed to help operators, analysts, and developers inspect what the system is doing and why. Its responsibilities are to:
- expose operational monitoring endpoints
- expose discovery-aware job and source progress views
- expose product and change-history views
- expose quality and source-intelligence analytics
- expose category catalog and schema metadata for dashboard discovery
- expose managed crawl-source administration for the web UI
- expose source-candidate discovery and guarded onboarding settings for the web UI
- translate persisted domain records into API DTOs suitable for inspection or dashboarding
At startup the API registers:
- ASP.NET Core controllers
- OpenAPI support for development
- MongoDB-backed stores and infrastructure services
- `IAdminQueryService` for operational read models
- `IDataIntelligenceService` for quality and historical intelligence read models
- `ICategoryManagementService` for category catalog and combined detail lookups
- `ISourceManagementService` for source registration, validation, and policy updates
- `ISourceCandidateDiscoveryService` for explainable source-candidate evaluation
- `ISourceOperationalInsightsProvider` for source readiness, health, and recent-activity summaries
The API composition also registers an optional local page-classification service used by source probing. It is treated as a supporting signal behind the existing heuristic pipeline, not as a separate decision system.
The API composition now supports full source discovery profiles end to end, including category entry pages, sitemap hints, allow or deny rules, URL patterns, max depth, and per-run budgets.
`GET /api/stats`
Returns a high-level operational summary.
The stats payload now includes both catalogue counts and an operational snapshot covering:
- active crawl jobs
- crawl queue depth, discovery queue depth, retry backlog, and failed queue items
- throughput and failures over the trailing 24 hours
- per-source operational health metrics
- per-category crawl pressure metrics
- discovered URL counts and confirmed product target counts in the aggregate and per-category views
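As a sketch of how a dashboard or script might fetch this summary, the snippet below builds (but does not send) an authenticated request. It assumes the local base address suggested by the repo's HTTP scratch file and the default `X-Management-Api-Key` header; both may differ in your environment.

```python
import json
import urllib.request

# Assumed local development base address and default management key header name.
BASE_URL = "http://localhost:5209"
API_KEY_HEADER = "X-Management-Api-Key"

def build_stats_request(api_key: str) -> urllib.request.Request:
    """Build a GET request for the high-level operational summary."""
    request = urllib.request.Request(f"{BASE_URL}/api/stats")
    request.add_header(API_KEY_HEADER, api_key)
    return request

request = build_stats_request("replace-with-your-operator-key")
# With the API running locally, sending the request yields the stats JSON:
#   with urllib.request.urlopen(request) as response:
#       stats = json.load(response)
```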
- `GET /api/queue`
- `GET /api/queue/priorities`
The priorities endpoint is especially useful because it surfaces the reasoning behind ordering, including source quality, change frequency, volatility, missing attributes, freshness, and the next scheduled attempt.
- `GET /api/crawl/logs`
- `GET /api/crawl/logs/{id}`
Use these endpoints to inspect crawl behavior and individual processing outcomes.
`GET /api/conflicts`
Returns merge conflicts where evidence is competing or ambiguous.
- `GET /api/products`
- `GET /api/products/{id}`
- `GET /api/products/{id}/history`
These endpoints let you inspect filtered product lists, individual canonical records, and the time series of meaningful change events.
- `GET /api/crawljobs`
- `POST /api/crawljobs`
- `GET /api/crawljobs/{jobId}`
These endpoints support the operator crawl console and quick-launch flow.
The crawl-job payloads now carry discovery-aware progress such as discovered URL counts, confirmed product target counts, discovery queue depth, product queue depth, rejected pages, robots-blocked pages, and per-category or per-source coverage.
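As an illustration of the discovery-aware progress described above, a crawl-job response might look roughly like the following. This is a sketch only: the field names here are assumptions chosen to match the listed metrics, not the actual contract, which is best inspected through the generated OpenAPI document.

```json
{
  "jobId": "job-123",
  "status": "Running",
  "discoveredUrlCount": 1240,
  "confirmedProductTargetCount": 310,
  "discoveryQueueDepth": 85,
  "productQueueDepth": 42,
  "rejectedPageCount": 19,
  "robotsBlockedPageCount": 6,
  "coverage": {
    "byCategory": { "tv": 120 },
    "bySource": { "example-retailer": 95 }
  }
}
```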
- `GET /api/categories`
- `GET /api/categories/families`
- `GET /api/categories/enabled`
- `GET /api/categories/{categoryKey}`
- `GET /api/categories/{categoryKey}/schema`
- `GET /api/categories/{categoryKey}/detail`
These endpoints exist so the dashboard can discover supported electrical-goods categories, group them by family, and render one-call detail pages that combine metadata and schema.
- `GET /api/sources`
- `GET /api/sources/automation-settings`
- `GET /api/sources/{sourceId}`
- `POST /api/sources`
- `PUT /api/sources/{sourceId}`
- `POST /api/sources/{sourceId}/enable`
- `POST /api/sources/{sourceId}/disable`
- `PUT /api/sources/{sourceId}/categories`
- `PUT /api/sources/{sourceId}/throttling`
These endpoints manage the dedicated crawl-source registry used by the web UI. They include OpenAPI response annotations and concrete example payloads in the generated document so dashboard and client developers can inspect the expected shapes directly.
The source payloads now include readiness, health, recent activity, discovery queue depth, confirmed product counts, automation policy, and the full source discovery profile so the operator UI can surface crawl posture without separately stitching together telemetry.
`POST /api/sources` and `PUT /api/sources/{sourceId}` now support source discovery profile data directly. If a profile is omitted during registration, the application layer applies conservative startup defaults so a newly added source can participate in the boot-and-populate flow immediately.
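A registration body with an explicit discovery profile might look roughly like this. The property names below are illustrative assumptions mapped to the profile elements listed earlier (category entry pages, sitemap hints, allow/deny rules, URL patterns, max depth, per-run budgets); consult the generated OpenAPI examples for the authoritative shapes.

```json
{
  "name": "Example Retailer",
  "baseUrl": "https://www.example-retailer.test",
  "discoveryProfile": {
    "categoryEntryPages": ["https://www.example-retailer.test/tv"],
    "sitemapHints": ["https://www.example-retailer.test/sitemap.xml"],
    "allowRules": ["/tv/"],
    "denyRules": ["/blog/"],
    "urlPatterns": ["/p/{slug}"],
    "maxDepth": 3,
    "perRunBudget": 500
  }
}
```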
`POST /api/sources/candidates/discover`
This endpoint returns candidate sources with market and locale evidence, probe signals, recommendation reasons, and the output of the optional classification layer where available. The API also exposes conservative automation settings so the web UI can keep onboarding controls visible and operator-led.
- `GET /api/quality/coverage/detailed`
- `GET /api/quality/unmapped`
- `GET /api/quality/sources`
- `GET /api/quality/merge-insights`
- `GET /api/quality/source-history`
- `GET /api/quality/attribute-stability`
- `GET /api/quality/source-disagreements`
Most of the quality endpoints accept a `categoryKey` query parameter and default to the `tv` category.
The source-history and source-disagreements endpoints also support optional source filtering.
The two main read-model services are:
- `IAdminQueryService`: crawl logs, queue state, product detail, conflict lists, stats, and product history
- `IDataIntelligenceService`: coverage, unmapped attributes, source quality, merge insights, source history, attribute stability, and disagreement analytics
These services isolate controller logic from the shape of the underlying persisted model.
The API now sits on top of a stronger observability model for crawl operations:
- crawl job lifecycle events are logged as structured entries in the application layer
- worker and queue services emit `ProductNormaliser.Operations` metrics and traces
- `IAdminQueryService.GetStatsAsync` aggregates persisted queue state, jobs, crawl logs, crawl sources, and source-quality snapshots into one dashboard-friendly operational summary
This means the API can answer both business-health and runtime-health questions without asking the web layer to stitch several endpoints together.
Verified in tests:
- operational summary aggregation from persisted state
- contract parity for the extended stats payload
Observed operationally:
- the actual metric stream from the `Meter`
- the trace stream from the `ActivitySource`
- live log collection and search in your hosting environment
Configuration is read from `appsettings.json`, `appsettings.Development.json`, and the environment.
The Admin API key is configured under `ManagementApiSecurity`.
- `ManagementApiSecurity:ApiKeyHeaderName` controls which header is inspected. The default is `X-Management-Api-Key`.
- `ManagementApiSecurity:ApiKeys` contains the configured management keys. Set the `Secret` for the operator key here or override it with environment-specific configuration before exposing the API beyond your machine.
- `ManagementApiSecurity:AllowDevelopmentLoopbackBypass` stays `false` by default. Only set it to `true` for explicit local loopback development when you intentionally want to skip the header on localhost requests.
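A configuration fragment for this section might look roughly like the following. The exact shape of the `ApiKeys` entries (array vs. object, the `Name` property) is an assumption for illustration; match it against the checked-in `appsettings.json`.

```json
{
  "ManagementApiSecurity": {
    "ApiKeyHeaderName": "X-Management-Api-Key",
    "ApiKeys": [
      { "Name": "operator", "Secret": "replace-with-a-strong-secret" }
    ],
    "AllowDevelopmentLoopbackBypass": false
  }
}
```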
Because the API reads from MongoDB through shared infrastructure registration, it also needs the same Mongo settings used by the worker when running outside the existing local defaults.
If the optional classification layer is enabled, the API host also reads `Llm` settings such as enablement, timeout, confidence threshold, evaluation mode, and local model path. These settings are deliberately optional; the API continues to function with heuristics only when the classification layer is disabled or unavailable.
From the repository root:
```
dotnet run --project ProductNormaliser.AdminApi
```

Send the configured operator key in the `X-Management-Api-Key` header. The checked-in bootstrap values live in `appsettings.json` and `appsettings.Development.json`; replace them in your own environment before using the API anywhere other than local development.
OpenAPI is enabled in development.
The included HTTP scratch file suggests a local development base address of `http://localhost:5209`.
Typical operator and analyst tasks include:

- inspect which pages are queued and when they will be retried
- review why a source is trusted less than before
- identify attributes that are still frequently unmapped
- understand which sources repeatedly disagree with consensus
- view the change history of a canonical product
- build dashboards over coverage, freshness, and merge confidence
Current limitations:

- queue write flows are still not exposed as a public ingestion API
- the API now requires a configured management API key for non-bypassed access, but it is still best treated as an internal operational admin surface, not a public internet API
Build with:

```
dotnet build ProductNormaliser.AdminApi/ProductNormaliser.AdminApi.csproj
```

Commercial product-intelligence systems are often difficult to interrogate when data quality is questioned. This project is the inspection layer that makes ProductNormaliser operationally useful: it exposes the queue, the evidence trail, and the longitudinal quality signals needed to trust the system.