API to type words ("Paris" -> "city")
Copyright zefrenchwan, 2024 MIT license
- Webapp to get and put data you ask for. Not just a cache, there is type inheritance and type inference
- Put data in bootstrap folder, change my config if necessary
- Create an env file and put values for `DBUSER`, `DBPASS` and `DBNAME`
- Launch `deploy.sh`
- Use it, for instance: `curl http://localhost:8000/check/value/paris/as/city/`
- Data recognition for some specific values. You may want precise, hard coded results sometimes because your model is not always sufficient
- Looking for typed values, because you usually know the type of the data you ask for.
NLP models (spaCy for instance) tend to learn from a corpus and then recognize what looks like a named entity. That is usually good enough. But when a company claims to be an expert in a given field, failing to find or to type common knowledge is a real issue. For instance, you may want to recognize "Michel Barnier" as a French prime minister when your job is social network analysis applied to politics. To do so, a model is not sufficient. You need a tool that contains "hard coded information":
- named entities on a specific field
- specific values (first names usually used)
Assume you ask for a word, "Paris" for instance.
It matches the first name of a famous American person, a city in France, music albums, etc.
Why would you load all those entries when you know that you are asking for a city?
It sounds like a better option to look for /check/value/Paris/as/CITY/ than /check/value/Paris/ and then filter on the client side.
Still, we may do better.
Paris is a capital city, then a city, then a location, then a physical entity.
When asking for locations, you do not want to request location, and city, and capital, and so on.
You want to receive cities when asked for locations because cities are a sort of location.
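The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual implementation (which relies on the database): the `PARENT_LINKS` and `VALUES` tables and both function names are hypothetical, made up for this example.

```python
from collections import defaultdict

# Hypothetical child -> parent links, mirroring the Paris example above.
PARENT_LINKS = [
    ("capital", "city"),
    ("city", "location"),
    ("location", "physical entity"),
]

# Hypothetical typed values: value -> set of tags it was stored under.
VALUES = {
    "paris": {"capital", "first name", "album"},
    "lyon": {"city"},
}


def descendants(tag, links):
    """Return the tag itself plus every tag that inherits from it."""
    children = defaultdict(set)
    for child, parent in links:
        children[parent].add(child)
    found, stack = {tag}, [tag]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def check_value_as(value, tag):
    """Return the tags of `value` that are `tag` or one of its subclasses."""
    accepted = descendants(tag, PARENT_LINKS)
    return VALUES.get(value.lower(), set()) & accepted
```

Asking for "Paris" as a `location` keeps only the `capital` entry and drops the first name and the album, without the client having to list every subclass of `location`.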
- Create a `.env` file at the same level as the Dockerfile
- Set in there `DBUSER` and `DBPASS` for auth, `DBNAME` for database name
- Run script `./deploy.sh`
- `/check/value/{value}/` will return information for that value, no matter the type
- `/check/value/{value}/as/{tag}/` will return information for that value, filtered as instances of `tag` or its subclasses
- `/add/value/{value}/as/{tag}/` will add `value` as `tag`. It changes values, not the inheritance tree. Tag may be stored if not already here
- `/link/child/{child}/to/parent/{parent}/` will add the link child -> parent, may add child or parent if not already inserted
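As a rough client-side sketch, the routes listed above can be built with small Python helpers. The helper names are hypothetical, `BASE_URL` is taken from the curl example, and percent-encoding the path segments is an assumption about what the service accepts. The helpers only build URLs: the HTTP method each route expects is not stated here, so check the code before sending requests.

```python
from urllib.parse import quote

# Assumption: same host and port as in the curl example.
BASE_URL = "http://localhost:8000"


def _segment(text):
    """Percent-encode one path segment (spaces, slashes, accents)."""
    return quote(text, safe="")


def check_url(value, tag=None):
    """URL to look up a value, optionally filtered by a tag."""
    path = f"/check/value/{_segment(value)}/"
    if tag is not None:
        path += f"as/{_segment(tag)}/"
    return BASE_URL + path


def add_url(value, tag):
    """URL to store a value under a tag."""
    return f"{BASE_URL}/add/value/{_segment(value)}/as/{_segment(tag)}/"


def link_url(child, parent):
    """URL to add a child -> parent link in the inheritance tree."""
    return f"{BASE_URL}/link/child/{_segment(child)}/to/parent/{_segment(parent)}/"
```

For instance, `check_url("paris", "city")` gives back the URL used in the curl example.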
Sure!
Simple, there is a bootstrap folder that contains csv files and json files.
Put your data in it:
- in `json` files, follow the example structure to add your own data, note that `content` is an object
- in `csv` files, put your own inheritance tree. Structure is parent then child, left blanks mean 'same as the upper line'
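To make the "left blanks" rule concrete, here is a minimal Python sketch of how such a csv file could be read. It is not the project's loader: the `read_links` function and the sample rows are hypothetical, and it assumes comma-separated files with two columns (parent, child), no header, and blanks only in the parent column. Look at the files in the bootstrap folder for the real layout.

```python
import csv
import io


def read_links(csv_text):
    """Return (parent, child) pairs, filling blank parents from the line above."""
    links = []
    last_parent = None
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # empty or malformed line
        parent, child = row[0].strip(), row[1].strip()
        if not parent:
            parent = last_parent  # blank means "same as the upper line"
        if not parent or not child:
            continue  # nothing usable on this line
        last_parent = parent
        links.append((parent, child))
    return links


# Hypothetical sample: the second line reuses "location" as its parent.
SAMPLE = "location,city\n,country\ncity,capital\n"
```

With this sample, `read_links(SAMPLE)` yields three links: location -> city, location -> country, city -> capital.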
Nope, because rebooting means losing all pg data. It is useful for me to test my code, not useful at all for you. So, that would be the first change you want to make. Then, a security audit: so far, there is no auth at all. You may want to secure this data.
Probably not. You want to change the docker file to keep the stored data. My project is a POC, yours may not be.
No, I did not, calm down. I said that models are good: they adapt to new information, whereas a hard-coded reference data tool will not. Still, humans share a common basic knowledge about some named entities. This project may be combined with a model, to detect specific values. It will not replace a model, unless you work on a given specific corpus.
- First, because I like Python, and you may want to copy my design for your super fast project. This is why I used MIT license.
- You may want to deploy more instances, or cache data. There are solutions for this issue.
- I work on a problem, and I try code-based solutions. Consider this code as a POC, not final, production-ready code
This is a good question. To me, coding in SQL with low-level access is really efficient and not that difficult. So, you will find stored procedures and all a database may offer. No ORM, then.
So far, the one I used was slower than my beloved postgresql.
Might be an idea, but you would need to store the tree as is. Interesting, maybe in a future project!