Skip to content

Elastic stuff#21

Open
YaphetKG wants to merge 62 commits intomainfrom
elastic-stuff
Open

Elastic stuff#21
YaphetKG wants to merge 62 commits intomainfrom
elastic-stuff

Conversation

@YaphetKG
Copy link
Copy Markdown
Contributor

@YaphetKG YaphetKG commented Mar 6, 2023

No description provided.

@YaphetKG YaphetKG requested review from gaurav and hyi March 6, 2023 16:27
Copy link
Copy Markdown
Contributor

@hyi hyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YaphetKG Nice work! Have left some comments for your consideration.

Comment thread config.yaml
index: "sap_index"
ground_truth_predictions_path: "/data/pubmed_mesh_prediction_output.npy"
ground_truth_id_name_pairs_path: "/data/pubmed_mesh_name_ids.csv"
ground_truth_data_id_type_pairs_path: "/data/pubmed_mesh_name_types.csv" No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming these ground truth configuration variables as shown below will probably have better consistency:

ground_truth_data_predictions_path: "/data/pubmed_mesh_prediction_output.npy"
ground_truth_data_name_id_pairs_path: "/data/pubmed_mesh_name_ids.csv"
ground_truth_data_id_type_pairs_path: "/data/pubmed_mesh_id_types.csv"

Comment thread requirements.txt
Comment thread src/ModelSingleton.py Outdated
username: "elastic"
password: ""
index: "sap_index"
"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments here don't sound relevant and probably can be removed?

Comment thread src/ModelSingleton.py Outdated
self.all_reps_ids = all_reps_name_ids['ID']
self.elastic_client = SAPElastic(
**elastic_search_config
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether we need to have a configuration variable to indicate whether it uses elastic search for prediction or not. Then we can keep both elastic search configuration initialization as shown here as well as the old non-elastic search configurations, i.e., loading ground truth data directly into memory. This way the same code base can support both setups, i.e., babel sapbert using elastic search and pubmed sapbert using in-memory nearest neighbor search.

Comment thread src/ModelSingleton.py
)

def __call__(self, query_text, count):
async def __call__(self, query_text, count=10, similarity="cosine", bl_type=""):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use knn as the default similarity measure method since knn is faster than cosine?

Comment thread src/ModelSingleton.py
bl_type=bl_type,
algorithm=similarity
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplified code to use elastic search for searching, but as I mentioned in my previous comment, it is probably beneficial to have a configuration variable to indicate whether to use elastic search backend or not so we can keep both elastic search and in-memory search setup for supporting different use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants