Skip to content

Latest commit

 

History

History
39 lines (31 loc) · 1.12 KB

File metadata and controls

39 lines (31 loc) · 1.12 KB

Use of data elements

The data entities within the config file sections allows Auto-CORPus to parse details out of the sections, such as the section header, table title or footer. Some data entity elements are required to allow Auto-CORPus to parse source HTML files correctly.

Sections which do not make use of the data entity:

  • Title
  • Keywords
  • paragraphs
  • abbreviations-table

All other sections make use of the data entity. The following sections and corresponding data entity elements are required by Auto-CORPus:

  • Sections
    • headers
  • Sub-sections
    • headers
  • Tables
    • caption
    • table-content
    • title
    • footer
    • table-row
    • header-row
    • header-element
  • Figures
    • caption

In addition, the References section does not require any entries within the data entity to function correctly, but does allow the use of:

  • title
  • journal
  • volume

Some HTML source files include markup within the reference text to identify the above information. The inclusion of these data entries in the config will enable Auto-CORPus to identify and process this information and include it in the output file.