Convenience functions for simple workflows #288

@jamesalster

Having started to use this package coming from quanteda in R, I feel some convenience features could greatly streamline common workflows:

  • Top features in a corpus, a subset of a corpus, or a single document
  • Filtering DTMs by, e.g., the proportion of documents that contain a word, or by a regex pattern
  • Clearer documentation around DocumentTermMatrix vs dtm() functions, e.g. on the ordering of terms
  • Coding a document-term matrix with, e.g., a sentiment dictionary (cf. the quanteda.dictionaries package in R)
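
As a rough illustration of the behaviour these items describe, here is a minimal sketch in plain Python using only the standard library. It is not this package's API: the toy corpus, the list-of-`Counter` representation of a DTM, and the function names `top_features`, `filter_by_doc_proportion`, `filter_by_pattern`, and `apply_dictionary` are all hypothetical, chosen only to show the intended semantics (loosely analogous to quanteda's `topfeatures()`, `dfm_trim()`, `dfm_select()`, and `dfm_lookup()`).

```python
import re
from collections import Counter

# Toy corpus (made up for illustration): each document is a list of tokens.
docs = [
    ["good", "movie", "good", "plot"],
    ["bad", "movie", "bad", "acting"],
    ["good", "acting", "great", "plot"],
]

# Hypothetical DTM representation: one Counter of term counts per document.
dtm = [Counter(tokens) for tokens in docs]


def top_features(dtm, n=3):
    """Return the n most frequent terms across the given documents."""
    total = Counter()
    for row in dtm:
        total.update(row)
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def filter_by_doc_proportion(dtm, min_prop=0.0, max_prop=1.0):
    """Keep terms whose share of documents containing them is in range."""
    doc_freq = Counter()
    for row in dtm:
        doc_freq.update(row.keys())  # count each term once per document
    n_docs = len(dtm)
    keep = {t for t, df in doc_freq.items()
            if min_prop <= df / n_docs <= max_prop}
    return [Counter({t: c for t, c in row.items() if t in keep})
            for row in dtm]


def filter_by_pattern(dtm, pattern):
    """Keep only terms that match the regular expression."""
    rx = re.compile(pattern)
    return [Counter({t: c for t, c in row.items() if rx.search(t)})
            for row in dtm]


def apply_dictionary(dtm, dictionary):
    """Collapse term counts into one count per dictionary category."""
    return [{cat: sum(row[t] for t in terms)
             for cat, terms in dictionary.items()}
            for row in dtm]


top_all = top_features(dtm, n=3)        # whole corpus
top_first = top_features(dtm[:1], n=1)  # a single document
trimmed = filter_by_doc_proportion(dtm, min_prop=0.5)
g_terms = filter_by_pattern(dtm, r"^g")
coded = apply_dictionary(dtm, {"positive": ["good", "great"],
                               "negative": ["bad"]})
```

Passing a slice such as `dtm[:1]` covers the subset and single-document cases with the same function, which is one way the first item could be served without separate entry points.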

I'd be happy to contribute PRs for these, if any/all are desired functionality. That said, I'm quite new to the ecosystem so if I'm missing something do let me know!
