Update United to be VERB, as per discussion at https://github.com/Uni…#596

Open
AngledLuffa wants to merge 1 commit into dev from united

Conversation

@AngledLuffa
Contributor

Update United to be VERB, as per discussion at #480

@AngledLuffa
Contributor Author

@nschneid and/or @amir-zeldes

@amir-zeldes
Contributor

Yes for me, I think this is in line with tagging non-nominal components of names based on their underlying morphological categories. For me "united" is a verbal passive participle, transparently derived from a verbal lemma (even if used adjectivally in context, i.e. deprel amod). I think maybe @nschneid wanted to think about this some more?

@nschneid
Contributor

nschneid commented Jun 3, 2025

I think somebody should do a full review of the passive participle/adjective distinction in EWT to make it consistent.

@amir-zeldes
Contributor

I can extract a list of forms whose tags don't match between GUM and EWT pretty easily, but a manual review of all cases seems unfeasible, no? Another approach would be to use a model trained on either corpus to predict on the other and then do some adjudication, but this also feels like it would be a very big manual effort. If we limit it to the top K cases it might be possible, but if by full review you mean all cases I really don't think we can do it.
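A rough sketch of what such an extraction could look like, assuming both treebanks are available as CoNLL-U text. The function names `upos_counts` and `mismatched_forms`, the restriction to ADJ/VERB, and the majority-tag comparison are illustrative assumptions, not an existing script from either project:

```python
from collections import Counter, defaultdict


def upos_counts(conllu_text):
    """Map each lowercased word form to a Counter of its UPOS tags."""
    counts = defaultdict(Counter)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        # Skip malformed lines, multiword token ranges (1-2), empty nodes (1.1)
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        counts[cols[1].lower()][cols[3]] += 1
    return counts


def mismatched_forms(a, b, tags=("ADJ", "VERB"), top_k=20):
    """Forms whose most frequent tag differs between two corpora.

    Only disagreements where both majority tags are in `tags` are kept.
    Results are sorted by combined frequency, most frequent first.
    """
    rows = []
    for form in a.keys() & b.keys():
        tag_a = a[form].most_common(1)[0][0]
        tag_b = b[form].most_common(1)[0][0]
        if tag_a != tag_b and {tag_a, tag_b} <= set(tags):
            freq = sum(a[form].values()) + sum(b[form].values())
            rows.append((form, tag_a, tag_b, freq))
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows[:top_k]
```

Comparing only majority tags hides forms that are split within a single corpus, so this would give a starting list for the top K cases rather than a full picture.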

@nschneid
Contributor

nschneid commented Jun 3, 2025

By full review I didn't necessarily mean annotating every case but looking broadly at the range of lexical items. It seems like a good project for a student RA.

@AngledLuffa
Contributor Author

Looking at the first few, it's already getting kind of grim

For example, there's

respected - currently ADJ in EWT. VERB in GUM. sure, you can respect someone
involved - you can also involve someone. involved_ADJ in Baath activities in EWT vs a more egregious example involved_VERB efforts by ... in GUM

Even the IP and ICDC have abandoned the neighbourhood, and those are trained and armed ...

trained and armed - again, you can train or arm people
both ADJ in EWT
trained is VERB in GUM: trained to walk in a straight line
armed though is ADJ: armed_ADJ conflict. i suppose that isn't a person. there's also armed camp though

scared: both treat it as ADJ, although you can definitely scare people
I was a little scared_ADJ in GUM
and before anyone says transitive vs intransitive, it is also possible to say I do not scare easily

500 Iraqis are reported_VERB wounded_ADJ in EWT vs thousands of wounded_VERB soldiers in GUM

so I suppose that we'd want to really come up with some overall guidelines / tests for VERB vs ADJ. certainly that would be necessary before throwing models or undergrad backhoes at the problem. in the meantime, it's a little difficult that United is treated differently, seeing as how common it is in both treebanks

random aside, Google's explanation of "undergrad backhoe": Backhoe operator positions typically do not require a bachelor's degree. You can become a backhoe operator with a high school diploma or equivalent

It's true. I got a (small) construction vehicle stuck in mud before I had a high school diploma, and someone with a high school diploma used a bigger vehicle to pull it out.

@amir-zeldes
Contributor

> a more egregious example involved_VERB efforts by

This is a finite verb (VBD), so that example is not the same construction

> trained to walk in a straight line ... armed_ADJ conflict

These both look right to me. If you are trained to walk in a straight line, then you've been trained by someone, so it's a passive participle. And armed conflict is not a conflict that has been armed by someone (the combatants were probably armed by someone, which I would also consider to be passive, but the conflict is just armed). In other words, it fails the paraphrase test: an armed conflict is not a conflict that has been armed by someone.

> I was a little scared_ADJ

This is the only one I'm hesitant about - of course, we can be scared by someone, but it's pretty odd to paraphrase that way. ENCOW puts the proportion of "scared by" to "scared of" at just under 10%. If 90% of cases express the source of the fear (if it's even expressed) without using a by-phrase, that makes me think that to a large degree this is already a lexicalized adjective. But if I had a sentence with "scared by", I would probably annotate it as passive all the same, so that would create a discrepancy.
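For anyone who wants to run the same kind of check on other participles, here is a minimal sketch of a by-phrase ratio over raw text. It is plain surface string matching with a hypothetical helper `by_ratio`, and it computes the share of "by" among all "by" and "of" hits. It is not how the ENCOW figure above was obtained:

```python
import re


def by_ratio(text, participle):
    """Share of '<participle> by' among '<participle> by' and '<participle> of'.

    Returns None when neither pattern occurs in the text.
    """
    pattern = re.compile(
        rf"\b{re.escape(participle)}\s+(by|of)\b", re.IGNORECASE
    )
    hits = [m.group(1).lower() for m in pattern.finditer(text)]
    if not hits:
        return None
    return hits.count("by") / len(hits)
```

A low ratio would only be a hint toward lexicalization, since a surface match cannot tell an agentive by-phrase from other uses of "by".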

> 500 Iraqis are reported_VERB wounded_ADJ

I would prefer to analyze cases like these as passives (somebody wounded them - it's morphologically and semantically transparent, easy to add a by-phrase, etc.)

3 participants