gerryhocks/kuromoji-multiplex

kuromoji-multiplex

A simple way to tokenize Japanese text using multiple Kuromoji dictionaries.

Usage examples

Tokenize using a specific dictionary

Tokenizer tokenizer = new Tokenizer("unidic");
List<Token> tokens = tokenizer.tokenize("お寿司がたべたい");

Tokenize the same string with all discovered dictionaries

for (String dictionaryName : Dictionary.getDictionaryNames()) {
    Tokenizer tokenizer = new Tokenizer(dictionaryName);
    for (Token token : tokenizer.tokenize("お寿司がたべたい")) {
        System.out.println(dictionaryName + "\t" + token.getSurface());
        for (String name : token.getFeatureNames()) {
            System.out.println("\t" + name + ": " + token.getFeature(name));
        }
    }
}
