Replace deprecated 'punkt' with 'punkt_tab' #83
Open
emielsteerneman wants to merge 1 commit into pinecone-io:main
Author
The NLTK maintainers suggest upgrading NLTK to at least version 3.9.1.

If someone doesn't upgrade their NLTK version, Pinecone will break for them.
Author
Any update on this? A library like Pinecone should avoid introducing security risks, especially given the issue mentioned here: nltk/nltk#3266 (comment). It would be best to migrate fully to punkt_tab and enforce a minimum NLTK version to prevent both breakage and vulnerabilities.
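A minimal sketch of what "migrate fully to punkt_tab and enforce a minimum NLTK version" could look like, assuming the 3.9.1 floor mentioned above is used as a hard requirement. The helper names (`parse_version`, `meets_minimum`, `check_nltk_version`, `ensure_punkt_tab`) are hypothetical and not taken from Pinecone's code; `nltk.data.find` and `nltk.download` are existing NLTK functions.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: 3.9.1 is treated as the minimum supported NLTK release.
MIN_NLTK = (3, 9, 1)


def parse_version(text: str) -> tuple[int, ...]:
    """Return the leading numeric release, e.g. '3.8.2rc1' -> (3, 8, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: tuple[int, ...] = MIN_NLTK) -> bool:
    """True if `installed` is at least `minimum`; missing parts count as 0."""
    current = parse_version(installed)
    width = max(len(current), len(minimum))
    current += (0,) * (width - len(current))
    floor = minimum + (0,) * (width - len(minimum))
    return current >= floor


def check_nltk_version() -> None:
    """Fail early with a clear message instead of deep inside the tokenizer."""
    try:
        installed = version("nltk")
    except PackageNotFoundError as exc:
        raise RuntimeError("nltk is not installed") from exc
    if not meets_minimum(installed):
        floor = ".".join(map(str, MIN_NLTK))
        raise RuntimeError(f"nltk>={floor} is required for punkt_tab, found {installed}")


def ensure_punkt_tab() -> None:
    """Download the punkt_tab resource only if it is not already available."""
    import nltk  # imported lazily so the version check can run first

    try:
        nltk.data.find("tokenizers/punkt_tab")
    except LookupError:
        nltk.download("punkt_tab")
```

Calling `check_nltk_version()` and then `ensure_punkt_tab()` once at import time would turn an obscure `LookupError` at tokenization time into an explicit version error.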
RubenCata approved these changes Nov 14, 2024

Just ran into this error in production. Is there a reason why the fix is not in yet? Are there any workarounds while this is delayed?
Problem
The NLTK `punkt` resource has been deprecated, resulting in an error when calling a `BM25Tokenizer`.
Solution
Replace `punkt` with the new `punkt_tab`.
Type of Change
This might be a breaking change. NLTK 3.8.1 and lower use `punkt`, whereas NLTK 3.8.2 and above use `punkt_tab`. The `pyproject.toml` file specifies `nltk = "^3.6.5"`, which allows any 3.x release from 3.6.5 onward, so NLTK 3.8.2 or newer will be installed if possible, and the current `punkt`-based code then breaks. The NLTK maintainers should not have introduced a breaking change in a patch release, but alas. Another fix would be to pin the NLTK version.
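The version split and the caret range described above can be sketched as two small pure functions. This assumes Poetry's caret semantics for a constraint with a non-zero major version (`^3.6.5` means `>=3.6.5,<4.0.0`) and takes the 3.8.1/3.8.2 boundary from this description; `release`, `punkt_resource_for`, and `caret_allows` are hypothetical names, not Pinecone or Poetry APIs.

```python
from __future__ import annotations

import re


def release(text: str) -> tuple[int, ...]:
    """Leading 'major.minor.patch' of a version string, padded with zeros."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    parts = [int(p) for p in match.group().split(".")][:3]
    return tuple(parts + [0] * (3 - len(parts)))


def punkt_resource_for(nltk_version: str) -> str:
    """NLTK <= 3.8.1 loads 'punkt'; NLTK >= 3.8.2 loads 'punkt_tab'."""
    return "punkt_tab" if release(nltk_version) >= (3, 8, 2) else "punkt"


def caret_allows(constraint: str, candidate: str) -> bool:
    """Caret check (^X.Y.Z -> >=X.Y.Z,<X+1.0.0) for a non-zero major version only."""
    lower = release(constraint.lstrip("^"))
    if lower[0] == 0:
        raise NotImplementedError("0.x caret ranges have narrower upper bounds")
    upper = (lower[0] + 1, 0, 0)
    return lower <= release(candidate) < upper


for v in ["3.8.1", "3.8.2", "3.9.1"]:
    print(v, caret_allows("^3.6.5", v), punkt_resource_for(v))
# 3.8.1 True punkt
# 3.8.2 True punkt_tab
# 3.9.1 True punkt_tab
```

Every release from 3.6.5 up to, but not including, 4.0.0 satisfies `^3.6.5`, so a fresh install resolves to a release that expects `punkt_tab`; pinning NLTK to 3.8.1 or lower would keep `punkt` working instead.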
Test Plan
I tried it locally and it fixed my issue.