Detecting transliteration systems used in GeoNames data set

The GeoNames data set contains entries like these:

Row 2 contains "西博寮海峽", and its `NAME_LINK` points to Row 4, "West Lamma Channel". Both are names of the same geographic location, but neither is a transliteration of the other. The actual transliteration of Row 2 is Row 1, "Sai Puk Liu Hoi Hap".

<img width="657" alt="Screen Shot 2020-01-12 at 3 14 53 PM" src="https://user-images.githubusercontent.com/11865/72215508-4c2d9b80-354e-11ea-8f06-99afbe726590.png">

There are two problems here:

1. Row 1 should have its `NAME_LINK` pointing to Row 2, i.e. set to `-1950489`, which is Row 2's `UID`, since `NAME_LINK` is supposed to be bidirectional. Its `TRANSL_CD` should be set to the code for the Cantonese transliteration system, because Row 1 is generated by transliterating Row 2.

2. Row 3 is also generated by transliterating Row 2, so its `TRANSL_CD` should be set to the code for the Mandarin transliteration system. However, it is unclear what its `NAME_LINK` should be set to, because `NAME_LINK` seems to support only pairing two entries, not a one-to-many relationship.

The goal of this task is to detect that Row 3 comes from Row 2, identify the transliteration system used, and pair the two rows in the output we produce.
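
One possible way to approach the detection is sketched below. It is only an illustration under several assumptions: the tiny per-character tables stand in for full Cantonese and Mandarin romanization dictionaries, the dictionary keys `uid` and `name` are hypothetical field names, the UIDs other than `-1950489` are placeholders, and the Row 3 string is a guess at a Pinyin rendering rather than the value in the screenshot. The output is a list of (native UID, romanized UID, system) triples, so one native-script row can pair with several romanized rows without relying on `NAME_LINK`.

```python
import re

# Toy per-character tables (assumption: a real run needs complete dictionaries).
ROMANIZATION_TABLES = {
    "cantonese": {"西": "sai", "博": "puk", "寮": "liu", "海": "hoi", "峽": "hap"},
    "mandarin_pinyin": {"西": "xi", "博": "bo", "寮": "liao", "海": "hai", "峽": "xia"},
}


def normalize(name):
    """Lowercase and keep only ASCII letters, so spacing and casing do not matter."""
    return re.sub(r"[^a-z]", "", name.lower())


def romanize(native, table):
    """Concatenate per-character romanizations; None if a character is missing."""
    syllables = []
    for ch in native:
        if ch not in table:
            return None
        syllables.append(table[ch])
    return "".join(syllables)


def detect_system(native, romanized):
    """Return the name of the table that reproduces `romanized`, or None."""
    target = normalize(romanized)
    for system, table in ROMANIZATION_TABLES.items():
        if romanize(native, table) == target:
            return system
    return None


def pair_rows(rows):
    """Pair each native-script row with every Latin row that transliterates it."""
    natives = [r for r in rows if not r["name"].isascii()]
    latins = [r for r in rows if r["name"].isascii()]
    pairs = []
    for native in natives:
        for latin in latins:
            system = detect_system(native["name"], latin["name"])
            if system is not None:
                pairs.append((native["uid"], latin["uid"], system))
    return pairs


rows = [
    {"uid": 1, "name": "Sai Puk Liu Hoi Hap"},   # placeholder UID
    {"uid": -1950489, "name": "西博寮海峽"},      # UID taken from the issue text
    {"uid": 3, "name": "Xiboliao Haixia"},       # hypothetical Row 3 value and UID
    {"uid": 4, "name": "West Lamma Channel"},    # placeholder UID
]
pairs = pair_rows(rows)
```

With these inputs, the English name "West Lamma Channel" matches neither table and is left unpaired, while both romanized rows are linked back to Row 2. Exact string matching is a deliberately simple choice here; real data would likely need tolerance for tone marks, hyphenation, and variant spellings.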

