Detecting transliteration systems used in GeoNames data set #3

@ronaldtse

The GeoNames data set contains entries like these:

Row 2 contains "西博寮海峽" and has a NAME_LINK entry pointing to Row 4, "West Lamma Channel". Both are names of the same geographic location, but neither is a transliteration of the other. The actual transliteration of Row 2 is Row 1, "Sai Puk Liu Hoi Hap".

[Screenshot of the GeoNames rows discussed above: Screen Shot 2020-01-12 at 3 14 53 PM]

There are two problems here:

  1. Row 1 should have its NAME_LINK pointing to Row 2 (that is, its NAME_LINK should be -1950489, because that is Row 2's UID and NAME_LINK is supposed to be bidirectional). Its TRANSL_CD should be set to the code for the Cantonese transliteration system, because Row 1 is generated by transliterating Row 2.

  2. Row 3 is also generated by transliterating Row 2, so its TRANSL_CD should be set to the code for the Mandarin transliteration system. However, it is unclear what its NAME_LINK should be set to, because NAME_LINK seems to support only pairing two entries, not a one-to-many relationship.

The goal of this task is to detect that Row 3 is derived from Row 2, identify the transliteration system used, and pair the two rows in the output we produce.
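One possible way to approach the detection is sketched below. It is only an illustration under several assumptions: the per-character tables are tiny hand-written stand-ins for real Cantonese and Mandarin romanization tables, the Cantonese syllables are simply copied from Row 1, the Row 3 value `"Xi Bo Liao Hai Xia"` is a hypothetical placeholder (the real value is only visible in the screenshot), and the names `TABLES`, `detect_system`, and `pair_rows` are made up for this example. The output is a list of pairs, so one native-script row can be linked to several romanized rows, which sidesteps the one-to-one limit of NAME_LINK.

```python
from typing import List, Optional, Tuple

# Toy per-character romanization tables (assumption: a real implementation
# would load complete tables for each transliteration system).
TABLES = {
    "cantonese": {"西": "sai", "博": "puk", "寮": "liu", "海": "hoi", "峽": "hap"},
    "mandarin": {"西": "xi", "博": "bo", "寮": "liao", "海": "hai", "峽": "xia"},
}


def normalize(name: str) -> str:
    """Lowercase and keep letters only, so spacing and case do not matter."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def romanize(native: str, table: dict) -> Optional[str]:
    """Romanize character by character; None if a character is not in the table."""
    try:
        return "".join(table[ch] for ch in native)
    except KeyError:
        return None


def detect_system(native: str, romanized: str) -> Optional[str]:
    """Return the system whose romanization of `native` matches `romanized`."""
    target = normalize(romanized)
    for system, table in TABLES.items():
        if romanize(native, table) == target:
            return system
    return None


def pair_rows(rows: List[Tuple[int, str]]) -> List[Tuple[int, int, str]]:
    """Return (romanized_row, native_row, system) for every detected pair."""
    pairs = []
    for native_id, native_name in rows:
        if native_name.isascii():
            continue  # only non-ASCII names are treated as native script
        for other_id, other_name in rows:
            if other_id == native_id:
                continue
            system = detect_system(native_name, other_name)
            if system is not None:
                pairs.append((other_id, native_id, system))
    return pairs


# Row numbers follow the issue text; the Row 3 value is hypothetical.
ROWS = [
    (1, "Sai Puk Liu Hoi Hap"),
    (2, "西博寮海峽"),
    (3, "Xi Bo Liao Hai Xia"),
    (4, "West Lamma Channel"),
]

print(pair_rows(ROWS))
```

Exact matching after normalization is the simplest choice here. Real data would probably need fuzzy matching to tolerate diacritics, tone marks, and variant spellings, and Row 4 is deliberately left unpaired because it is a different name for the same place, not a transliteration.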
