lexicon.json: the schema lexicon
Each sub-corpus is stored in a self-contained directory, e.g. mozart_sonatas.
Within each of these directories (called <dir>/ below) the same substructure is used:
- <dir>/mscore: contains MuseScore source files
- <dir>/musicxml: contains prepared MusicXML files, generated from the sources and augmented with note IDs
- <dir>/notelist/: contains the representation of each piece as a JSON note list
- <dir>/annotations: contains schema annotations
  - <dir>/annotations/<schema>: contains the annotations for a specific schema template, e.g. <dir>/schemata/fonte.2 for the fonte.2 schema template
- <dir>/groups/:
  - <dir>/groups/<schema>: contains the suggestions for a schema in JSON
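The layout above lends itself to a quick sanity check before running any tools. The following is a minimal sketch, not part of the repository: the function name `missing_subdirs` and the constant `EXPECTED_SUBDIRS` are hypothetical, and the list of subdirectories is simply copied from the description above.

```python
from pathlib import Path

# Subdirectories named in the layout description (assumed to be complete).
EXPECTED_SUBDIRS = ("mscore", "musicxml", "notelist", "annotations", "groups")


def missing_subdirs(corpus_dir):
    """Return the expected subdirectories that are absent from corpus_dir.

    Hypothetical helper: it only checks that the directories exist,
    not that their contents are valid.
    """
    root = Path(corpus_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

For example, `missing_subdirs("data/mozart_sonatas")` would return an empty list for a sub-corpus that has all five subdirectories.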
Contains tools for data preparation, post-processing, etc. (see below).
Contains the documentation for annotators, developers, and curators.
Contains the annotation manual, including LaTeX sources.
For every corpus, follow these steps:
- Make sure that the encoding is correct and unambiguous. In particular, take care of repetitions.
- Convert the MuseScore sources (data/<dir>/mscore) to MusicXML (data/<dir>/musicxml).
- Add note IDs to the MusicXML files in data/<dir>/musicxml.
- Generate note lists (tbd.)
- Precompute suggestions (tbd.)
Steps 2 and 3 can be performed by running mscx_to_xml.sh from the tools/ directory.
For this to work, you first need to set up a virtual environment in the tools/ directory
and install some Python dependencies.
This is done automatically by the setup.sh script.
You only need to run this script once.
$ cd tools
$ ./setup.sh # only do this the first time
$ ./mscx_to_xml ../data/<dir>/mscore ../data/<dir>/musicxml
This is currently very slow because it compiles every file every time. In the future, this functionality will be moved to a makefile.
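Until a makefile exists, a make-style timestamp check is one way to avoid reconverting files that have not changed. The sketch below is only an illustration of that idea, not the project's implementation: the function name `stale_sources` is hypothetical, and the `.mscx` and `.xml` extensions are assumptions inferred from the script name.

```python
from pathlib import Path


def stale_sources(src_dir, dst_dir, src_ext=".mscx", dst_ext=".xml"):
    """Return source files whose converted output is missing or older.

    Hypothetical helper mimicking make's rule: a target is rebuilt when
    it does not exist or when its source has a newer modification time.
    The file extensions are assumptions, not confirmed by the tools.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    stale = []
    for src in sorted(src_dir.glob("*" + src_ext)):
        dst = dst_dir / (src.stem + dst_ext)
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            stale.append(src)
    return stale
```

Only the files returned by `stale_sources("../data/<dir>/mscore", "../data/<dir>/musicxml")` would then need to be passed to the conversion step.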
Steps 4 and 5 still need to be documented.