Imagine that we would like to deliver a new feature for enhancing our sellers' journey when onboarding new products on our Marketplace.
For that, we would like to provide a service that automatically detects the brand of added products from the related unstructured data it has e.g. title
For legal reasons, we are providing products titles extracted from publicly available amazon reviews dataset (https://nijianmo.github.io/amazon/index.html).
Tasks:
- Transform the provided dataset into the needed format for training a Named Entity recognition model
- Train a model of your choice for tagging provided products titles
- Provide the evaluation result of the trained model, and explain why the selected evaluation metric was used in this case
- Implement a simple Rest API with an endpoint getting as a parameter a product title and returning the tagged text
- Implement a unit test to make sure that the data transformation process implemented in task 1 is working as expected
- Provide a Dockerized version of the API
Optional:
- Implement an integration test for the implemented API route
Training Spacy small model for 200 iterations resulted in accuracy of 0.75