During data aggregation, a lot of ambiguous and sometimes conflicting information is received from the different external sources. To select the "best" piece of information in case of merge conflicts (i.e. different data values from different sources), we usually define quality levels (e.g. "source X provides higher quality/accuracy than source Y"). This is currently implemented separately for each piece of data, e.g. geographical summit coordinates, sectors, or grades.
Please unify all of these implementations by:
- Implementing a generic "ranked value" which combines a certain value with a priority
- Using this new way in all related places
- Creating factories that make it easy to create entity objects with a given priority (i.e. each filter has its own factory)
- Documenting the approach and the priority definitions together with the scraper architecture
Also note that in some cases we really have a prioritized list of values, while in others data of certain sources must never be written into the route database at all (but must still be available during scraper execution).
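A minimal sketch of what such a generic ranked value could look like, including a priority level for data that must stay available during scraping but must never be persisted. All names here (`RankedValue`, `SourcePriority`, the concrete priority levels) are illustrative, not taken from the codebase:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Generic, TypeVar

T = TypeVar("T")

class SourcePriority(IntEnum):
    # Hypothetical priority levels; a higher value wins merge conflicts.
    NEVER_PERSIST = 0  # usable during scraper execution, never written to the route database
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass(frozen=True)
class RankedValue(Generic[T]):
    value: T
    priority: SourcePriority

    def merge(self, other: "RankedValue[T]") -> "RankedValue[T]":
        # Keep the value from the higher-priority source; on a tie, keep self.
        return self if self.priority >= other.priority else other

    @property
    def persistable(self) -> bool:
        # Gate for the database writer: NEVER_PERSIST values are filtered out.
        return self.priority != SourcePriority.NEVER_PERSIST
```

With this, every merge conflict (coordinates, sectors, grades, ...) reduces to the same `merge` call, and the "never write to the database" rule becomes a property check instead of special-cased code per data type.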
The main goals of this refactoring are:
- Make the code simpler by reusing the same mechanism in all similar situations
- Simplify unit tests by using the entity factories
- Simplify future extensions with similar behaviour
- Clearly document which data is used when and why
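The entity factories could be sketched as follows: each filter gets a factory bound to its own priority, so neither scraper code nor unit tests need to repeat the ranking boilerplate. Again, all names (`Ranked`, `Summit`, `make_summit_factory`) are hypothetical placeholders:

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Ranked:
    # Simplified stand-in for a generic ranked value: a value plus a priority.
    value: object
    priority: int

@dataclass
class Summit:
    name: str
    coordinates: Ranked  # ranked, so merge conflicts resolve uniformly

def make_summit_factory(priority: int) -> Callable[[str, float, float], Summit]:
    # One factory per filter/source, bound to that source's priority.
    def create(name: str, lat: float, lon: float) -> Summit:
        return Summit(name=name, coordinates=Ranked((lat, lon), priority))
    return create

# Usage in a test: build an entity without spelling out the ranking details.
create_high_prio_summit = make_summit_factory(priority=3)
summit = create_high_prio_summit("Matterhorn", 45.976, 7.658)
```

Because a test only picks a factory, changing a source's priority later touches one place instead of every test fixture.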