index: "sap_index"
ground_truth_predictions_path: "/data/pubmed_mesh_prediction_output.npy"
ground_truth_id_name_pairs_path: "/data/pubmed_mesh_name_ids.csv"
ground_truth_data_id_type_pairs_path: "/data/pubmed_mesh_name_types.csv"
Renaming these ground truth configuration variables as shown below would probably improve consistency:
ground_truth_data_predictions_path: "/data/pubmed_mesh_prediction_output.npy"
ground_truth_data_name_id_pairs_path: "/data/pubmed_mesh_name_ids.csv"
ground_truth_data_id_type_pairs_path: "/data/pubmed_mesh_id_types.csv"
username: "elastic"
password: ""
index: "sap_index"
"""
These comments don't seem relevant here; could they be removed?
self.all_reps_ids = all_reps_name_ids['ID']
self.elastic_client = SAPElastic(
    **elastic_search_config
)
I wonder whether we need a configuration variable that indicates whether Elasticsearch is used for prediction. We could then keep both the Elasticsearch client initialization shown here and the old non-Elasticsearch configuration, i.e., loading the ground truth data directly into memory. That way the same code base can support both setups: Babel SapBERT using Elasticsearch, and PubMed SapBERT using in-memory nearest-neighbor search.
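A minimal sketch of what such a switch could look like, assuming a hypothetical boolean key `use_elastic_search` and a hypothetical `elastic_search` sub-section in the config. `InMemoryBackend` and `ElasticBackend` are illustrative names, not classes from this PR; `ElasticBackend` only stands in for a wrapper such as `SAPElastic`. The in-memory path does an exhaustive cosine nearest-neighbor search over embeddings held in memory.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryBackend:
    """Hypothetical backend: exhaustive nearest-neighbor search over
    ground-truth embeddings loaded directly into memory."""

    def __init__(self, reps, ids):
        self.reps = reps  # list of embedding vectors
        self.ids = ids    # parallel list of concept IDs

    def search(self, query_vec, count=10):
        # Score every stored embedding, then keep the `count` best IDs.
        scored = [(cosine(query_vec, rep), cid) for rep, cid in zip(self.reps, self.ids)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [cid for _, cid in scored[:count]]


class ElasticBackend:
    """Hypothetical stand-in for an Elasticsearch wrapper such as SAPElastic."""

    def __init__(self, **elastic_search_config):
        self.config = elastic_search_config

    def search(self, query_vec, count=10):
        raise NotImplementedError("forward the query to the Elasticsearch index here")


def build_backend(config, reps=None, ids=None):
    """Choose the search backend from the hypothetical `use_elastic_search` flag."""
    if config.get("use_elastic_search", False):
        return ElasticBackend(**config["elastic_search"])
    return InMemoryBackend(reps, ids)
```

With the flag off, the same calling code falls back to the in-memory search, so the PubMed and Babel setups would differ only in configuration.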
)
def __call__(self, query_text, count):
async def __call__(self, query_text, count=10, similarity="cosine", bl_type=""):
Should we use knn as the default similarity method, since knn search is faster than cosine scoring?
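For context, in Elasticsearch 8.x the speed difference usually comes from approximate kNN search over an HNSW-indexed `dense_vector` field versus a `script_score` query that computes `cosineSimilarity` for every matching document. The sketch below builds both request bodies and defaults to `"knn"`. It assumes that `SAPElastic`'s `algorithm` argument maps onto these two query styles; that mapping, the `embedding` vector field, the `type` field used for `bl_type` filtering, and the `num_candidates` value are all hypothetical.

```python
def knn_query(query_vector, count=10, num_candidates=100, bl_type=""):
    """Approximate kNN request body (Elasticsearch 8.x top-level `knn` option)."""
    knn = {
        "field": "embedding",  # hypothetical dense_vector field name
        "query_vector": query_vector,
        "k": count,
        # num_candidates must be at least k
        "num_candidates": max(num_candidates, count),
    }
    if bl_type:
        knn["filter"] = {"term": {"type": bl_type}}  # hypothetical type field
    return {"knn": knn, "size": count}


def cosine_query(query_vector, count=10, bl_type=""):
    """Exact cosine scoring via script_score: every matching document is scored."""
    base = {"term": {"type": bl_type}} if bl_type else {"match_all": {}}
    return {
        "size": count,
        "query": {
            "script_score": {
                "query": base,
                "script": {
                    # +1.0 keeps scores non-negative, which Elasticsearch requires
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


def build_query(query_vector, count=10, similarity="knn", bl_type=""):
    """Dispatch on the similarity method, defaulting to approximate kNN."""
    if similarity == "knn":
        return knn_query(query_vector, count, bl_type=bl_type)
    if similarity == "cosine":
        return cosine_query(query_vector, count, bl_type=bl_type)
    raise ValueError(f"unknown similarity: {similarity}")
```

The trade-off is that approximate kNN may occasionally miss a true nearest neighbor, while `script_score` is exact but scales linearly with the number of documents scored.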
    bl_type=bl_type,
    algorithm=similarity
)
Nice simplification using Elasticsearch for search. As I mentioned in my previous comment, though, it would probably be beneficial to have a configuration variable indicating whether to use the Elasticsearch backend, so we can keep both the Elasticsearch and in-memory search setups to support different use cases.