ProductNormaliser.Domain/README.md
Lines changed: 3 additions & 1 deletion (+3 -1)
@@ -71,7 +71,7 @@ The goal is not just to pick a winner, but to pick a winner with an audit trail.
 
 ## Category schema
 
-The first fully modelled category is `tv` in the TV schema provider. Additional category providers for `monitor`, `laptop`, and `refrigerator` now exist so the rest of the platform can score completeness, route normalisation, and expose dashboard metadata without assuming TV-only semantics.
+The first fully modelled category is `tv` in the TV schema provider. Additional category providers now exist for `monitor`, `laptop`, `smartphone`, `tablet`, `headphones`, `speakers`, and `refrigerator` so the rest of the platform can score completeness, route normalisation, and expose dashboard metadata without assuming TV-only semantics.
 
 The TV schema still defines the richest required and optional canonical attributes such as:
 
@@ -91,6 +91,8 @@ Adding a new category typically means:
 3. deciding identity heuristics and attribute reliability rules for that category
 4. registering metadata so the admin API and web dashboard can discover the category safely
 
+At the current maturity line, `tv`, `monitor`, `laptop`, and `smartphone` are the supported categories with the strongest schema, normalisation, and identity coverage. `tablet`, `headphones`, and `speakers` are enabled experimental categories with broader canonical field sets and category-specific normalisers, but they are still being hardened before promotion.
+
 ## How other projects use Domain
 
 - Infrastructure implements many of the Domain interfaces and persists Domain models.
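
Reviewer note: the supported/experimental split added in this file could be modelled roughly as below. This is a hypothetical Python sketch for illustration only, not code from the repository. The names `Maturity`, `CategoryInfo`, `CATEGORIES`, and `keys_with` are assumptions; only the category keys and their tiers come from the README text. `refrigerator` is left out because the README says a provider exists but does not state its maturity tier.

```python
from dataclasses import dataclass
from enum import Enum


class Maturity(Enum):
    """Hypothetical maturity tiers mirroring the README wording."""
    SUPPORTED = "supported"
    EXPERIMENTAL = "experimental"


@dataclass(frozen=True)
class CategoryInfo:
    """Hypothetical registry entry: category key, tier, and enabled flag."""
    key: str
    maturity: Maturity
    enabled: bool = True


# Category keys and tiers as stated in the README paragraph added by this diff.
CATEGORIES = [
    CategoryInfo("tv", Maturity.SUPPORTED),
    CategoryInfo("monitor", Maturity.SUPPORTED),
    CategoryInfo("laptop", Maturity.SUPPORTED),
    CategoryInfo("smartphone", Maturity.SUPPORTED),
    CategoryInfo("tablet", Maturity.EXPERIMENTAL),
    CategoryInfo("headphones", Maturity.EXPERIMENTAL),
    CategoryInfo("speakers", Maturity.EXPERIMENTAL),
]


def keys_with(maturity):
    """Return the keys of enabled categories at the given maturity tier."""
    return [c.key for c in CATEGORIES if c.enabled and c.maturity is maturity]
```

A dashboard selector built on something like this could list every enabled category and attach a badge from `maturity`, which matches how the Web README describes experimental categories staying visible.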
ProductNormaliser.Web/README.md
Lines changed: 2 additions & 1 deletion (+2 -1)
@@ -9,7 +9,8 @@ The web host currently delivers:
 - an operator landing page that keeps the active category context visible
 - an operator landing page operational health panel for queue depth, retry backlog, recent failures, at-risk sources, and category pressure
 - an operator landing page boot-and-populate panel showing boot-ready sources, categories in context, estimated discovery seeds, and recent confirmed-product throughput
-- category selection for the rollout set: TVs, Monitors, and Laptops
+- category selection for the current supported set: TVs, Monitors, Laptops, and Smartphones
+- category selectors and explorer filters that continue to expose enabled experimental categories such as Tablets, Headphones, and Speakers with their maturity badges intact
 - seeded crawl launch and crawl-job monitoring with discovery and product progress shown together
 - canonical product exploration with quality-aware filters and paging
 - product detail pages with source comparison, evidence, conflicts, and history
README.md
Lines changed: 4 additions & 4 deletions (+4 -4)
@@ -2,7 +2,7 @@
 
 ProductNormaliser is an open product-intelligence engine for turning messy retail and manufacturer page data into clean, canonical, comparable product records. It crawls source pages, extracts structured product evidence, normalises attributes into a category schema, resolves identity across sources, merges competing claims into a canonical product, and keeps learning over time from quality history, disagreement patterns, and page volatility.
 
-Milestone 1 is centered on an end-to-end operator workflow for three rollout categories: `tv`, `monitor`, and `laptop`. The platform still keeps category and normalisation extension points broad enough for future electrical-goods expansion, but the completed milestone scope is the crawl, management, product, and quality experience for those three categories.
+Milestone 1 is centered on an end-to-end operator workflow for four supported categories: `tv`, `monitor`, `laptop`, and `smartphone`. The platform still keeps category and normalisation extension points broad enough for broader electrical-goods expansion, while `tablet`, `headphones`, and `speakers` remain enabled experimental categories that share the same workflow surface.
 
 ## What problem this solves
 
@@ -76,7 +76,7 @@ The solution now contains ten projects:
 
 ## Architecture at a glance
 
-1. Operators register or enable managed crawl sources and assign categories such as `tv`, `monitor`, and `laptop`.
+1. Operators register or enable managed crawl sources and assign categories such as `tv`, `monitor`, `laptop`, and `smartphone`.
 2. Each source carries a discovery profile with category entry pages, sitemap hints, allow or deny path rules, URL patterns, depth limits, and per-run budgets.
 3. A category crawl job now seeds deterministic discovery from eligible managed sources instead of relying only on pre-known targets.
 4. The discovery worker fetches sitemaps and listing pages while respecting robots rules, source throttling, depth limits, and URL budgets.
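
Reviewer note: steps 2 and 4 in the hunk above describe a per-source discovery profile with allow or deny path rules, depth limits, and per-run URL budgets. The sketch below shows one way such a filter might behave. It is a hypothetical Python illustration, not the project's implementation: `DiscoveryProfile`, its field names, `select_urls`, and the prefix-matching rule are all assumptions, and robots rules, sitemap hints, and throttling are deliberately not modelled.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class DiscoveryProfile:
    """Hypothetical per-source discovery limits (field names are assumptions)."""
    allow_prefixes: tuple = ()   # empty means "allow any path"
    deny_prefixes: tuple = ()    # deny rules win over allow rules
    max_depth: int = 3           # maximum link depth from an entry page
    url_budget: int = 100        # maximum URLs accepted per run


def select_urls(profile, candidates):
    """Filter (url, depth) pairs by deny/allow rules, depth limit, and budget."""
    accepted = []
    for url, depth in candidates:
        if len(accepted) >= profile.url_budget:
            break  # per-run budget exhausted
        if depth > profile.max_depth:
            continue  # too deep for this source
        path = urlparse(url).path
        if any(path.startswith(p) for p in profile.deny_prefixes):
            continue  # explicitly denied path
        if profile.allow_prefixes and not any(
            path.startswith(p) for p in profile.allow_prefixes
        ):
            continue  # allow list present but path not on it
        accepted.append(url)
    return accepted
```

Processing candidates in their given order and stopping at the budget keeps the result deterministic for a fixed input, which is in the spirit of the "deterministic discovery" wording in step 3.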
@@ -93,8 +93,8 @@ The solution now contains ten projects:
 The solution currently includes:
 
 - category metadata and schema discovery for electrical-goods families
-- category registry support for the Milestone 1 rollout set: TVs, Monitors, and Laptops
-- schema-driven attribute normalisation with category-specific providers for TVs, Monitors, and Laptops
+- category registry support for the current supported set: TVs, Monitors, Laptops, and Smartphones
+- schema-driven attribute normalisation with category-specific providers for TVs, Monitors, Laptops, Smartphones, and enabled experimental next-wave categories
 - alias handling and measurement parsing
 - structured data extraction from HTML and JSON-LD
 - MongoDB persistence for source and canonical records