Add Components to ClearNLP

  • Create a package for the component.

Setup

To add a component to ClearNLP, first clone the entire clearnlp repository to your local machine.

cd ~/TARGET_DIRECTORY_PATH/
git clone https://github.com/clir/clearnlp.git

There are five parts of the component that you have to set up and initialize to integrate the component with ClearNLP:

1. COLLECT: Collect lexicons before training
2. TRAIN: Train the component model
3. BOOTSTRAP: Decode the training data to pick up error cases for model improvement
4. EVALUATE: Test the component model on the development data
5. DECODE: Decode new data with the trained model

These are declared as enums for components ( ... .component.utils.CFlag)
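
The five flags can be pictured as a plain Java enum that a component switches on. The sketch below is a stand-in for illustration only: `ComponentFlag`, `FlagDemo`, and `describe` are hypothetical names, not the actual `CFlag` class, whose declaration is not reproduced here.

```java
/** Hypothetical stand-in for the five component flags; NOT the real CFlag class. */
enum ComponentFlag {
    COLLECT, TRAIN, BOOTSTRAP, EVALUATE, DECODE
}

class FlagDemo {
    /** Returns a short description of what a component does under each flag. */
    static String describe(ComponentFlag flag) {
        switch (flag) {
            case COLLECT:   return "collect lexicons before training";
            case TRAIN:     return "train the component model";
            case BOOTSTRAP: return "decode training data to find error cases";
            case EVALUATE:  return "test the model on development data";
            case DECODE:    return "decode new data";
            default:        throw new IllegalArgumentException("Unknown flag: " + flag);
        }
    }

    public static void main(String[] args) {
        // Walk through the flags in declaration order.
        for (ComponentFlag flag : ComponentFlag.values()) {
            System.out.println(flag + ": " + describe(flag));
        }
    }
}
```

A component typically behaves differently under each flag (for example, it only consults gold labels while training), so a single switch on the flag keeps that branching in one place.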

Tutorial

  1. Create a package named after your component under ... .clearnlp.component.mode

  2. Create a component state Java file (e.g. SeqState extends AbstractLRState/AbstractState)

    1. Override all abstract methods according to the functionality of your component
    2. You can pass a dictionary to the constructor if needed

    ***clearOracle: Clears predefined information so that it does not confuse feature extraction

    • A state file has an OracleType (the ground-truth input) and a LabelType (the way labels are produced)
    • Extend AbstractLRState if you are doing sequence classification from left to right (e.g. POS tagging or named entity recognition); otherwise extend AbstractState
  3. Create a component evaluation Java file and finish implementing all abstract methods (e.g. SeqEval extends AbstractEval)

    1. ***countCorrect: Counts the number of correct predictions compared to the ground truth
    2. ***getScore[]: Returns a score array if the evaluation contains multiple scores
    3. ***getScore: Returns the score that you are trying to optimize
    4. ***clear: Clears all scores
  4. Create a component train configuration Java file and finish implementing all abstract methods (e.g. SeqTrainConfiguration extends AbstractTrainConfiguration)

    1. Under init(), fill in the parameters from the XML file that you wish to initialize
    2. Update clearnlp.nlp.NLPMode with the new component so that it can be specified in the train configuration
  5. Create a component feature extractor Java file and finish implementing all abstract methods (e.g. SeqFeatureExtractor extends CommonFeatureExtractor)

  6. Create the component's abstract and default classifier Java files and finish implementing all abstract methods (e.g. AbstractSeqClassifier & DefaultSeqClassifier)

  7. Create a component trainer Java file and finish implementing all abstract methods (e.g. SeqTrainer extends AbstractNLPTrainer)

    1. Initialize constructors for each part of the component
  8. Modify ... .clearnlp.nlp.NLPUtils

    1. Update getTrainer() with the new component mode/type

    2. Create a getter for your component's classifier model (e.g. AbstractSequenceClassifier())

    Source code: clearnlp.nlp.NLPUtils

  9. Modify ... clearnlp.bin.NLPDecode

    1. Update every relevant switch statement by adding case NLPMode.YOUR_COMPONENT:

    Source code: clearnlp.bin.NLPDecode

  10. Test out and run the newly added component

    java  -Xmx8g -XX:+UseConcMarkSweepGC edu.emory.clir.clearnlp.bin.NLPTrain -c CONFIG -f FEATURE -mode MODE -t TRAINDATA -d DEVDATA
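
To make step 3 more concrete, here is a minimal, self-contained sketch of an accuracy evaluator that offers the four methods named there. It is an illustration under assumptions, not ClearNLP code: `SimpleSeqEval` does not extend the real AbstractEval, the method signatures are guesses, and `getScores` stands in for the array-returning getScore[].

```java
/** Hypothetical accuracy evaluator; mirrors the method names in step 3 only. */
class SimpleSeqEval {
    private int correct; // predictions that matched the ground truth so far
    private int total;   // all predictions counted so far

    /** Counts the predictions that match the gold labels at the same position. */
    void countCorrect(String[] gold, String[] predicted) {
        if (gold.length != predicted.length) {
            throw new IllegalArgumentException("gold and predicted differ in length");
        }
        for (int i = 0; i < gold.length; i++) {
            if (gold[i].equals(predicted[i])) correct++;
        }
        total += gold.length;
    }

    /** The score to optimize: accuracy in percent (0 if nothing was counted). */
    double getScore() {
        return total == 0 ? 0d : 100d * correct / total;
    }

    /** All scores as an array; this sketch tracks accuracy only. */
    double[] getScores() {
        return new double[] { getScore() };
    }

    /** Resets all counts, e.g. before evaluating the next iteration. */
    void clear() {
        correct = 0;
        total = 0;
    }
}
```

Calling `countCorrect` once per sentence and `getScore` once per pass keeps the evaluator independent of how the sentences are read; `clear` then resets it between passes.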
    

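The left-to-right state from step 2 can be sketched in the same spirit. The class below only illustrates the idea of keeping the oracle (gold labels) apart from the working labels and clearing the latter before feature extraction. `SimpleLRState` and all of its method names other than clearOracle are hypothetical, and the class does not extend the real AbstractLRState.

```java
import java.util.Arrays;

/** Hypothetical left-to-right state; illustrates the idea only. */
class SimpleLRState {
    private final String[] tokens;
    private final String[] oracle; // gold labels, kept aside for training and evaluation
    private final String[] labels; // working labels, visible to feature extraction
    private int index;             // position of the token currently being labeled

    SimpleLRState(String[] tokens, String[] goldLabels) {
        this.tokens = tokens.clone();
        this.oracle = goldLabels.clone();
        this.labels = goldLabels.clone();
        clearOracle();
    }

    /** Wipes the gold labels from the working copy so features cannot see them. */
    final void clearOracle() {
        Arrays.fill(labels, null);
    }

    /** True once every token has received a label. */
    boolean isTerminate() {
        return index >= tokens.length;
    }

    /** The token currently being labeled. */
    String currentToken() {
        return tokens[index];
    }

    /** The gold label of the current token. */
    String goldLabel() {
        return oracle[index];
    }

    /** Assigns a label to the current token and moves one step to the right. */
    void next(String label) {
        labels[index++] = label;
    }

    /** A copy of the labels assigned so far (null where none is assigned yet). */
    String[] getLabels() {
        return labels.clone();
    }
}
```

A training loop under these assumptions would call `goldLabel()` to obtain the target for the current token, while a decoding loop would pass the classifier's prediction to `next` instead.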
Note

This tutorial shows how to add components to ClearNLP by creating a sequence classifier component. Because a sequence classifier is so similar to the POS classifier, the tutorial above simply replicates clearnlp.component.mode.pos and rewrites the existing component into a sequence classifier.

If you want to add a component that is structurally different from every existing component in ClearNLP, you will have to create most of the abstract classes yourself (e.g. AbstractEval, AbstractState, etc.). However, you still have to make sure to implement the five parts of a ClearNLP component mentioned above.