
[Metadata] Documenting current metadata in th.csv #1723

@bact


This issue documents how the "notes" field in th.csv, the test list for Thailand, is used to encode metadata about the URL itself or about the test status of the URL.

  • Most of the notes are not well formatted, though they are not entirely "free text" either.
  • The one item that is coded consistently is the language code, which can be parsed easily.

Available metadata:

  • “[<lang_code>]” -- Appended to the end of the notes. Natural language of the page at the URL. <lang_code> is an ISO 639-1 language code (2 characters).

  • “Regional site” -- Appended to the end of the notes. Indicates that the website at the URL is a “regional site”, that is, the same site is intended to serve more than one country. Useful when reporting on the characteristics of the website.

  • “blocked in <date>” or “blocked on <date>” -- Date the URL was issued a block order by a court, or is known to have been blocked according to media or other sources. Currently some dates are in ISO format and some are not. Ideally, all of them should be ISO 8601.

  • “blocked in <date>, see:” -- Date the URL was issued a block order or is known to have been blocked, followed by a reference.

  • “last updated on” -- Date on which a human annotator could verify (from the web content) that the page was most recently updated.
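
Since the language code and “Regional site” tags are described as appended to the end of the notes, they can be peeled off from the right. The following is only a minimal sketch: the function name `parse_notes` and the returned dictionary keys are hypothetical, and it assumes that “Regional site”, when present, comes after the language codes (as in the one example that carries both).

```python
import re

# Matches one trailing "[xx]" tag (two lowercase letters, ISO 639-1 style).
TRAILING_LANG_RE = re.compile(r"\[([a-z]{2})\]$")
REGIONAL_TAG = "Regional site"


def parse_notes(notes):
    """Split a notes string into description, language codes, and regional flag.

    Hypothetical helper: assumes the tags sit at the very end of the notes,
    with "Regional site" (if any) after the language codes.
    """
    text = notes.strip()

    # Strip the optional "Regional site" marker first.
    regional = text.endswith(REGIONAL_TAG)
    if regional:
        text = text[: -len(REGIONAL_TAG)].rstrip()

    # Peel "[xx]" tags off the end one at a time, keeping their original order.
    languages = []
    while True:
        match = TRAILING_LANG_RE.search(text)
        if match is None:
            break
        languages.insert(0, match.group(1))
        text = text[: match.start()].rstrip()

    return {"description": text, "languages": languages, "regional": regional}
```

For instance, `parse_notes("Issues in Deep South of Thailand [th] [en] [ms]")` gives the languages `["th", "en", "ms"]` and leaves the rest as the description. Notes without any tag simply come back with an empty language list.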

Examples from the actual th.csv:

  • Thai politics review journal [en]

  • Asian politics review and analysis [en]

  • Thai Lawyers for Human Rights (old website) [th] [en]

  • Issues in Deep South of Thailand [th] [en] [ms]

  • Telecom Asia, also cover ICT news in Asia. Announced closure on 2019-05-31. No longer updated. [en]

  • Anti-censorship group. (As of June 2020, the blog was last updated on 13 April 2019)

  • Asian porn. Found blocked in 2014, see: https://citizenlab.ca/2014/07/information-controls-thailand-2014-coup/ [en]

  • Human Rights Watch - Thailand page. Was blocked on Nov 2014, after the coup. https://www.blognone.com/node/63330 [en]

  • Think tank on civil society. Based in Singapore. [en] Regional site

  • Midnight University, was blocked in 2006. Found anomaly on OONI Explorer (most recent on 2020-06-09, as of 2020-06-11). [th]
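
The “blocked in/on” dates in these examples come in mixed formats (“2014”, “Nov 2014”, “2006”). As a rough illustration of how they could be pulled out and moved toward ISO 8601, here is a sketch; `blocked_date` and `to_iso` are hypothetical names, the pattern only covers the formats seen in the examples above, and anything it does not recognise is returned unconverted as `None` rather than guessed.

```python
import re

# English month abbreviations mapped to month numbers (avoids locale-dependent parsing).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Captures the text after "blocked in"/"blocked on" up to the next comma or period.
BLOCKED_RE = re.compile(r"blocked (?:in|on) ([^,.]+)", re.IGNORECASE)


def to_iso(raw):
    """Convert a raw date string to ISO 8601 (possibly reduced precision), or None."""
    raw = raw.strip()
    # Already ISO-like: YYYY, YYYY-MM, or YYYY-MM-DD.
    if re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", raw):
        return raw
    # "Nov 2014" or "November 2014" style.
    match = re.fullmatch(r"([A-Za-z]+) (\d{4})", raw)
    if match:
        month = MONTHS.get(match.group(1)[:3].lower())
        if month is not None:
            return f"{match.group(2)}-{month:02d}"
    return None


def blocked_date(notes):
    """Return (raw_date, iso_date) for the first "blocked in/on" phrase, else (None, None)."""
    match = BLOCKED_RE.search(notes)
    if match is None:
        return None, None
    raw = match.group(1).strip()
    return raw, to_iso(raw)
```

Keeping the raw string next to the converted value lets a human annotator review the cases where the conversion returns `None`.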
