Description
The GeoNames data set contains entries like these:
Row 2 contains "西博寮海峽", whose `NAME_LINK` points to Row 4, "West Lamma Channel". Although both are names of the same geographic location, neither is a transliteration of the other. The actual transliteration of Row 2 is Row 1, "Sai Puk Liu Hoi Hap".
There are two problems here:
- Row 1 should have a `NAME_LINK` pointing to Row 2 (i.e. its `NAME_LINK` should be `-1950489`, because that is Row 2's `UID` and `NAME_LINK` is supposed to be bidirectional). Its `TRANSL_CD` should also be set to the Cantonese transliteration system, because Row 1 was generated by transliterating Row 2.
- Row 3 was also generated by transliterating Row 2, so its `TRANSL_CD` should be set to the Mandarin transliteration system. However, it is unclear what Row 3's `NAME_LINK` should be, because `NAME_LINK` appears to support only a pairing of two entries, not a one-to-many relationship.
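To make the conflict concrete, here is a minimal Python sketch of a `NAME_LINK` consistency check. It assumes each row has been reduced to a mapping from its UID to the UID its `NAME_LINK` points to (or `None`). The function name `check_name_links` and the placeholder keys `row1` to `row4` are hypothetical stand-ins for real UIDs, and the example assumes Row 4 already links back to Row 2. The check reports links that are not reciprocated, and targets that more than one row links to. The second case is exactly the one-to-many situation that a single `NAME_LINK` field cannot express bidirectionally.

```python
from collections import defaultdict


def check_name_links(rows):
    """Check NAME_LINK reciprocity (hypothetical helper, not part of any GeoNames tooling).

    rows: dict mapping UID -> UID that its NAME_LINK points to (None if unlinked).

    Returns (non_reciprocal, over_linked):
      non_reciprocal: sorted list of (uid, target) where target does not link back to uid
      over_linked: dict target -> sorted UIDs linking to it, for targets referenced
                   by more than one row (a one-to-many case NAME_LINK cannot express)
    """
    non_reciprocal = []
    incoming = defaultdict(list)
    for uid, target in rows.items():
        if target is None:
            continue
        incoming[target].append(uid)
        # A bidirectional link requires the target to point straight back.
        if rows.get(target) != uid:
            non_reciprocal.append((uid, target))
    over_linked = {t: sorted(srcs) for t, srcs in incoming.items() if len(srcs) > 1}
    return sorted(non_reciprocal), over_linked


# Proposed state: Rows 1 and 3 link to Row 2, while Row 2 <-> Row 4 stays as is.
rows = {"row1": "row2", "row2": "row4", "row3": "row2", "row4": "row2"}
print(check_name_links(rows))
# ([('row1', 'row2'), ('row3', 'row2')], {'row2': ['row1', 'row3', 'row4']})
```

Row 2 ends up over-linked because it would have to point back to Rows 1, 3 and 4 at the same time, which a single-valued field cannot do.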
The goal of this task is to detect that Row 3 comes from Row 2, identify the transliteration system that was used, and pair the two rows in the output we produce.
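One possible approach, sketched below in Python under heavy assumptions, is to romanize each non-Latin name character by character under every candidate system. A Latin-script name of the same feature is then paired with the source when its normalized spelling matches one of those romanizations. The syllable tables are hypothetical and cover only the five characters of the example. The Cantonese values are read off Row 1's spelling, and the Mandarin values are toneless Hanyu Pinyin. The system labels `CANTONESE`/`MANDARIN` are placeholders, not real `TRANSL_CD` codes. Because the original table is not reproduced here, Row 3's name "Xibo Liao Haixia" is also hypothetical. The result maps one source UID to a list of derived rows, which is a representation that handles the one-to-many case.

```python
import unicodedata

# Hypothetical per-character syllable tables, for illustration only.
# CANTONESE is taken from Row 1's spelling; MANDARIN is toneless Hanyu Pinyin.
SYLLABLES = {
    "CANTONESE": {"西": "sai", "博": "puk", "寮": "liu", "海": "hoi", "峽": "hap"},
    "MANDARIN": {"西": "xi", "博": "bo", "寮": "liao", "海": "hai", "峽": "xia"},
}


def normalize(name):
    """Lowercase, strip diacritics, and drop spaces, hyphens and apostrophes."""
    decomposed = unicodedata.normalize("NFKD", name)
    letters = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in letters.lower() if c.isalnum())


def romanize(name, system):
    """Romanize character by character; None if any character is not in the table."""
    table = SYLLABLES[system]
    parts = [table.get(ch) for ch in name]
    return None if None in parts else "".join(parts)


def pair_transliterations(rows):
    """rows: list of (uid, name) belonging to one feature.

    Returns dict: source uid -> list of (derived uid, system).
    """
    pairs = {}
    for src_uid, src_name in rows:
        for system in SYLLABLES:
            expected = romanize(src_name, system)
            if expected is None:
                continue  # source name is not (fully) covered by this system's table
            for uid, name in rows:
                if uid != src_uid and normalize(name) == expected:
                    pairs.setdefault(src_uid, []).append((uid, system))
    return pairs


rows = [
    ("row1", "Sai Puk Liu Hoi Hap"),
    ("row2", "西博寮海峽"),
    ("row3", "Xibo Liao Haixia"),  # hypothetical spelling for Row 3
    ("row4", "West Lamma Channel"),
]
print(pair_transliterations(rows))
# {'row2': [('row1', 'CANTONESE'), ('row3', 'MANDARIN')]}
```

Exact matching after stripping spaces keeps the sketch simple. However, real data would need complete romanization tables and some tolerance for variant spellings, because syllables concatenated without separators can occasionally be segmented ambiguously.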
