Solve nested entities problems by using SpanCategorizer #88

Open
hungvo304ml wants to merge 2 commits into Georgetown-IR-Lab:master from hungvo304ml:Nested_NER_SC
@hungvo304ml

This PR uses doc.spans["sc"] (the span group used by SpanCategorizer) to handle overlapping tokens in nested NER with spaCy. doc.ents does not allow two entities to share a token, so nested entities raise an error there. Storing the candidates in doc.spans["sc"] instead lets every possible entity be kept without errors.
After storing all candidate spans, we filter out the overlapping ones before assigning the result to doc.ents. The filtering is done with spacy.util.filter_spans: when spans overlap, the (first) longest span is preferred over shorter ones.
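
The two steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the sentence, the token offsets, and the ORG/GPE labels are made up for the example, and a blank English pipeline stands in for the project's real model.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

# Blank pipeline: tokenizer only, no trained components (assumption for the demo).
nlp = spacy.blank("en")
doc = nlp("The University of Washington campus is in Seattle")

# Hypothetical nested candidates: "Washington" sits inside the longer ORG span.
candidates = [
    Span(doc, 1, 4, label="ORG"),  # University of Washington
    Span(doc, 3, 4, label="GPE"),  # Washington (nested)
    Span(doc, 7, 8, label="GPE"),  # Seattle
]

# Step 1: a span group accepts overlapping spans, so all candidates are kept.
doc.spans["sc"] = candidates

# For contrast, doc.ents rejects spans that share a token.
try:
    doc.ents = candidates
    ents_rejected_overlap = False
except ValueError:
    ents_rejected_overlap = True

# Step 2: drop overlaps (longest span wins), then the result fits in doc.ents.
doc.ents = filter_spans(doc.spans["sc"])

flat = [(ent.text, ent.label_) for ent in doc.ents]
print(flat)
```

After this runs, doc.spans["sc"] still holds all three candidates, while doc.ents holds only the non-overlapping subset, so downstream code that needs the nested "Washington" span can still read it from the span group.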
