Hey,
I noticed that in kbguided_pretrain, the path to the tfidf_vectorizer file is left empty when generating raw_ptdata, and the file itself doesn't seem to be included in the GitHub repository. Could you explain what this file is and how it was created?
In `kbguided_pretrain/datagen/generate_raw_ptdata.py`:
```python
tfidf_vectorizer = ''
vectorizer = joblib.load(tfidf_vectorizer)

def generate_pair(y, mentions, select_scheme):
    if select_scheme == 'random':
        return random.choice(mentions)
    elif select_scheme == 'sample':
        similarity_estimate = cal_similarity_tfidf(mentions, y, vectorizer)
        print(similarity_estimate.shape) ##
        return np.random.choice(mentions, 1, p = similarity_estimate/np.sum(similarity_estimate))[0]
    elif select_scheme == 'most_sim':
        similarity_estimate = cal_similarity_tfidf(mentions, y, vectorizer)
        return mentions[similarity_estimate.argmax()]
    elif select_scheme == 'least_sim':
        similarity_estimate = cal_similarity_tfidf(mentions, y, vectorizer)
        return mentions[similarity_estimate.argmin()]
    else:
        print('Wrong mention selection scheme input!!!')
```
The same path is also missing in `data_utils/ncbi/prepare_dataset.py`.
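My guess is that the file is a scikit-learn `TfidfVectorizer` fitted on some corpus and saved with `joblib.dump`, but I'm not sure. Below is a minimal sketch of what I assumed, so I can at least run the script. Everything here is hypothetical: the toy corpus, the character n-gram settings, the file name `tfidf_vectorizer.joblib`, and my reimplementation of `cal_similarity_tfidf` as cosine similarity between each mention and `y`. None of it comes from the repository.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: the real one is presumably built from KB names/mentions
corpus = [
    "type 2 diabetes mellitus",
    "diabetes",
    "breast cancer",
    "carcinoma of the breast",
    "hypertension",
]

# Assumption: the missing file is a fitted TfidfVectorizer serialized with joblib
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
vectorizer.fit(corpus)
path = os.path.join(tempfile.gettempdir(), "tfidf_vectorizer.joblib")  # hypothetical file name
joblib.dump(vectorizer, path)


def cal_similarity_tfidf(mentions, y, vectorizer):
    """Hypothetical stand-in: cosine similarity of each mention's TF-IDF vector to y's."""
    mention_vecs = vectorizer.transform(mentions)
    y_vec = vectorizer.transform([y])
    return cosine_similarity(mention_vecs, y_vec).ravel()  # shape: (len(mentions),)


loaded = joblib.load(path)
mentions = ["diabetes", "breast cancer", "hypertension"]
sims = cal_similarity_tfidf(mentions, "type 2 diabetes mellitus", loaded)

# 'most_sim' scheme: pick the mention with the highest similarity
print(mentions[int(sims.argmax())])

# 'sample' scheme: similarities normalized into a probability distribution
# (this would divide by zero if every similarity were 0)
p = sims / np.sum(sims)
print(np.random.choice(mentions, 1, p=p)[0])
```

If the actual vectorizer was fitted on a specific corpus (e.g. all entity names in the KB) or with different settings, it would be great to know which, since that changes the similarities used by the `sample`, `most_sim` and `least_sim` schemes.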
Looking forward to your reply,
Best,