Elastic stuff by YaphetKG · Pull Request #21 · RENCI-NER/nemo-serve

YaphetKG · 2023-03-06T16:27:06Z

No description provided.

hyi

@YaphetKG Nice work! Have left some comments for your consideration.

hyi · 2023-03-06T17:17:42Z

+    index: "sap_index"
+  ground_truth_predictions_path: "/data/pubmed_mesh_prediction_output.npy"
+  ground_truth_id_name_pairs_path: "/data/pubmed_mesh_name_ids.csv"
+  ground_truth_data_id_type_pairs_path: "/data/pubmed_mesh_name_types.csv"


Renaming these ground truth configuration variables as shown below would make them more consistent:

  ground_truth_data_predictions_path: "/data/pubmed_mesh_prediction_output.npy"
  ground_truth_data_name_id_pairs_path: "/data/pubmed_mesh_name_ids.csv"
  ground_truth_data_id_type_pairs_path: "/data/pubmed_mesh_id_types.csv"
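If the rename were adopted, existing config files that still use the old key names could be accepted during a transition. The following is only a rough sketch under that assumption: `RENAMED_KEYS` and `normalize_ground_truth_keys` are hypothetical names that do not appear in the PR, and the config is assumed to be already loaded into a plain dict.

```python
# Hypothetical mapping from the key names in the diff to the suggested names.
# The third key (ground_truth_data_id_type_pairs_path) already matches.
RENAMED_KEYS = {
    "ground_truth_predictions_path": "ground_truth_data_predictions_path",
    "ground_truth_id_name_pairs_path": "ground_truth_data_name_id_pairs_path",
}


def normalize_ground_truth_keys(config):
    """Return a copy of config with old key names replaced by the new ones."""
    normalized = {}
    for key, value in config.items():
        # Keys not listed in RENAMED_KEYS are passed through unchanged.
        normalized[RENAMED_KEYS.get(key, key)] = value
    return normalized
```

Values are left untouched, so any change to the file paths themselves would still have to be made in the config.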

hyi · 2023-03-06T17:21:45Z

+    username: "elastic"
+    password: ""
+    index: "sap_index"
+    """


The comments here don't seem relevant. Can they be removed?

hyi · 2023-03-06T17:25:38Z

-        self.all_reps_ids = all_reps_name_ids['ID']
+        self.elastic_client = SAPElastic(
+            **elastic_search_config
+        )


I am wondering whether we need a configuration variable that indicates whether Elasticsearch is used for prediction. We could then keep both the Elasticsearch initialization shown here and the old non-Elasticsearch setup, i.e., loading ground truth data directly into memory. That way the same code base can support both setups: Babel SapBERT using Elasticsearch and PubMed SapBERT using in-memory nearest neighbor search.
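A minimal sketch of what such a switch might look like, assuming a boolean config key named `use_elastic_search` (hypothetical, not in the PR). `ElasticSearchBackend` is only a placeholder standing in for the PR's `SAPElastic` client, whose real interface is not shown here, and `InMemorySearch` is a simplified brute-force stand-in for the old in-memory path.

```python
import asyncio
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class InMemorySearch:
    """Brute-force nearest neighbor search over vectors held in memory."""

    def __init__(self, vectors, ids):
        self.vectors = vectors
        self.ids = ids

    async def search(self, query_vector, count=10):
        scored = [
            (id_, cosine_similarity(query_vector, vec))
            for id_, vec in zip(self.ids, self.vectors)
        ]
        # Highest similarity first, truncated to the requested count.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:count]


class ElasticSearchBackend:
    """Placeholder for the Elasticsearch-backed client (assumed interface)."""

    def __init__(self, **elastic_search_config):
        self.config = elastic_search_config

    async def search(self, query_vector, count=10):
        raise NotImplementedError("a real backend would query Elasticsearch here")


def build_search_backend(config, vectors=None, ids=None):
    """Pick a backend based on the hypothetical use_elastic_search flag."""
    if config.get("use_elastic_search", False):
        return ElasticSearchBackend(**config.get("elastic_search", {}))
    return InMemorySearch(vectors, ids)
```

Because both backends expose the same async `search` method, the calling code would not need to know which one the config selected.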

hyi · 2023-03-06T17:26:18Z

+        )

-    def __call__(self, query_text, count):
+    async def __call__(self, query_text, count=10, similarity="cosine", bl_type=""):


Should we use knn as the default similarity method, since knn is faster than cosine?
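For illustration, changing the default would only touch the signature, and callers could still request cosine explicitly. This sketch assumes `"knn"` and `"cosine"` are the accepted values of `similarity`, which the diff does not confirm. `Predictor` and `SUPPORTED_SIMILARITIES` are hypothetical names, and the body only echoes the resolved parameters instead of running a search.

```python
import asyncio

# Assumed option names; the PR only shows "cosine" as the current default.
SUPPORTED_SIMILARITIES = ("knn", "cosine")


class Predictor:
    """Stand-in for the model wrapper whose __call__ appears in the diff."""

    async def __call__(self, query_text, count=10, similarity="knn", bl_type=""):
        if similarity not in SUPPORTED_SIMILARITIES:
            raise ValueError(f"unsupported similarity: {similarity!r}")
        # A real implementation would embed query_text and run the search;
        # here we only return the parameters that would be passed along.
        return {
            "query_text": query_text,
            "count": count,
            "algorithm": similarity,
            "bl_type": bl_type,
        }
```

Validating the value up front keeps a typo in the request from silently falling through to the search backend.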

hyi · 2023-03-06T17:29:12Z

+                bl_type=bl_type,
+                algorithm=similarity
+            )
+


Nice simplification of the search code using Elasticsearch. As I mentioned in my previous comment, though, it would probably be beneficial to have a configuration variable that indicates whether to use the Elasticsearch backend, so we can keep both the Elasticsearch and in-memory search setups to support different use cases.

YaphetKG added 27 commits February 13, 2023 20:00

adding cli tooling for indexing data to elastic

c973d06

Merge branch 'main' into elastic-stuff

a74310a

simplifying args and maintaining config in one spot

7164d58

build from a branch

b74f8d3

build from a branch

f31dcae

fix list parsing when more than one ids are in csv

ceb2fc4

fix list parsing when more than one ids are in csv

f804866

format docs for bulk load

24790ee

elastic for model

16b03a8

elastic for model

f8d0a78

elastic for model

6c116b9

elastic for model

f0b8850

elastic for model

5afa6d9

elastic for model

28492a3

adding timeouts and retry

de64f73

adding timeouts and retry

f14d7f2

increase chunk size, bump refresh interval

50c39a1

increase chunk size, bump refresh interval

a4b4720

async elastic search options

25ae515

make create index async

41eaca6

log counts

d282b04

remove cruft

a6de96b

add args

8fae1be

add args

8a24cd3

revert enums to accept json

d8c8edd

everything async

07bd2fc

use client .search

ca760f0

YaphetKG requested review from gaurav and hyi March 6, 2023 16:27

hyi reviewed Mar 6, 2023

YaphetKG and others added 30 commits May 11, 2023 15:55

default normalize to false

14e2101

generalize doc generation

948152c

correct key

d2b5ae1

correct key

463f45d

adding redis search to singleton

f7a4fb6

more generic config

0427ed6

adding bl type filter

1a4746d

trim biolink from categories Field

d19dfd0

add biolink: prefix to results

987b6ce

remove args.storage in favor of config

a9e39a7

format tags search

84a4a98

rename variable

36659f4

fix passing bl_type

49055dd

Merge branch 'main' into elastic-stuff

4b904f5

adding sapbert qdrant

bd3cf6e

bump nemo version

c46716f

add qdrant client for req

bac8579

bump nemo version

a80bb97

pushing some more edits

d24673e

detect gpu

718ad0e

gpu usage, reformat result

ceb235c

gpu usage, reformat result

7dbc1f4

correct indexing files

ab34c03

rename config file, add arg for cli

6a2ebcf

convert token classification to async call

e55b6e7

fix import for no gpu

829545f

fix import for no gpu

574ca59

docker

94c3813

remove dup req

58c5350

remove cruft comment

9eaf305

YaphetKG wants to merge 62 commits into main from elastic-stuff

3 participants
