Add lemmatization option for normalizing loaded documents#189

Open
yetra wants to merge 1 commit into boudinfl:master from yetra:load-document-normalization

@yetra yetra commented Mar 30, 2022

According to #75, there used to be a lemmatization option for the load_document() method's normalization parameter.

This no longer seems to be the case: either stemming is applied or word surface forms are used as stems, even though lemmas are already extracted during text loading.

I'm (re)adding the lemmatization option as it would be very useful to have for e.g. TF-IDF.
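
To illustrate the three behaviours discussed above, here is a minimal, self-contained sketch. It does not use the library's actual code: the `Token` class, the `normalize` function, the `"lemmatization"` option string, the lowercasing, and the `toy_stem` stand-in stemmer are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Token:
    # Hypothetical container; the project's real sentence objects may differ.
    word: str   # surface form as it appears in the text
    lemma: str  # lemma extracted while the text is loaded


def toy_stem(word: str) -> str:
    # Stand-in for a real stemmer: strips a few common English suffixes only.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def normalize(
    tokens: List[Token],
    normalization: Optional[str] = "stemming",
    stemmer: Callable[[str], str] = toy_stem,
) -> List[str]:
    """Return one normalized form per token (hypothetical helper)."""
    if normalization == "stemming":
        # Stem the lowercased surface form.
        return [stemmer(t.word.lower()) for t in tokens]
    if normalization == "lemmatization":
        # Reuse the lemmas that were already extracted at load time.
        return [t.lemma.lower() for t in tokens]
    # Any other value: fall back to lowercased surface forms.
    return [t.word.lower() for t in tokens]


tokens = [Token("Mice", "mouse"), Token("were", "be"), Token("running", "run")]
print(normalize(tokens, "stemming"))
print(normalize(tokens, "lemmatization"))
print(normalize(tokens, None))
```

In this sketch, lemmatization maps "Mice" and "running" to "mouse" and "run", while the toy stemmer leaves "mice" unchanged. That difference is why lemmas can give cleaner term counts for something like TF-IDF.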
