This repository contains code and configuration for the Gene Ontology's
Blazegraph SPARQL endpoint service (rdf.geneontology.org). It packages
Blazegraph into a Docker container behind an Apache reverse proxy and
deploys to AWS EC2 instances using Terraform and Ansible.
Canonical deployment and operations documentation lives at: devops-documentation / README.graphstore.md
General devops environment setup (credentials, AWS, SSH keys) is at: devops-documentation / README.setup.md
See the Makefile for details. You can build with:
make all
make load-blazegraph
By default the load-blazegraph target starts Blazegraph with 32 GB of memory. For a local build, you can override this by setting an environment variable before running:
BGMEM=8G make load-blazegraph
Replace 8G with however much memory you want to allocate.
To start Blazegraph, run (optionally with the BGMEM variable):
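The override can be pictured with the minimal sketch below. It assumes the memory setting simply falls back to 32G when BGMEM is unset; the helper name `resolve_bgmem` and the accepted size format are hypothetical, and the Makefile remains the authority on the actual behavior.

```python
import os
import re

DEFAULT_BGMEM = "32G"  # default described in this README; see the Makefile for the real value


def resolve_bgmem(env=None):
    """Return BGMEM if it is set and non-empty, otherwise the default."""
    env = os.environ if env is None else env
    value = env.get("BGMEM") or DEFAULT_BGMEM
    # Assumption: sizes are JVM-style, e.g. 512M or 8G.
    if not re.fullmatch(r"\d+[KMGkmg]", value):
        raise ValueError(f"unexpected BGMEM value: {value!r}")
    return value


print(resolve_bgmem({"BGMEM": "8G"}))  # explicit override
print(resolve_bgmem({}))               # falls back to the default
```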
make bg-start
For now you must build the database yourself, but in future we will host
the RDF and the blazegraph.jnl file, and provide a query endpoint.
This prototype uses Blazegraph. We are also investigating RDFox and Neo4j; for the latter we will use the SciGraph RDF to Neo mappings.
The procedure places all triples to be loaded into the rdf/ directory:
- ontology: go-lego.owl (imports other ontologies)
- GAFs translated to LEGO using OWLTools/Minerva
- Native LEGO models
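After this, various transformations take place (TODO):
- sparql/delete-NamedIndividual-ul.rq - removes NamedIndividual declarations, which clog querying
- sparql/insert-oban-mf.rq - adds a derived simple representation (molecular function)
- TODO - bp (biological process), cc (cellular component)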
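A transformation of this kind could be applied by POSTing the update text to the store's SPARQL endpoint. The sketch below only builds the request and does not send it. The endpoint URL is an assumption (Blazegraph's usual local default), and `build_update_request` is a hypothetical helper, not part of this repository.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: Blazegraph's usual local default; adjust for your deployment.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"


def build_update_request(update_text, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL 1.1 Protocol update request."""
    body = urllib.parse.urlencode({"update": update_text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Illustrative update only; the real statements live in the sparql/ directory.
example_update = (
    "INSERT DATA { <http://example.org/s> <http://example.org/p> <http://example.org/o> }"
)
req = build_update_request(example_update)
print(req.get_method(), req.full_url)
```

To run an update against a live store, pass the request to `urllib.request.urlopen`; to apply one of the files above, read its text first and pass that as `update_text`.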
Here we describe the modeling used and how to query the database. See also the sparql directory.
The contents of the store can be broken down into:
- the ontology (both GO and other ontologies)
- functional annotations: descriptions of gene products using GO
- other support information, e.g. orthology/trees
The store superimposes two different ways of modeling functional annotations: a simple model that allows for basic gene associations, and a richer, more expressive lego model. For more on lego, see Noctua.
We use the standard OWL to RDF mapping.
Note that this mapping makes existential restrictions complex to query. We may consider superimposing simple instance-level relationships over this.
We use the OBAN association model: simple triples with a reification-like pattern.
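To illustrate why existential restrictions are awkward to query: under the standard OWL to RDF mapping, an axiom such as `C SubClassOf (p some D)` becomes a blank restriction node with `owl:onProperty` and `owl:someValuesFrom` triples. The sketch below builds a query over that pattern; the function name and the example IRIs are chosen for illustration only.

```python
PREFIXES = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
"""


def some_values_from_query(class_iri, property_iri):
    """Return a SPARQL query listing every ?filler such that
    class_iri SubClassOf (property_iri some ?filler) is asserted."""
    return PREFIXES + f"""\
SELECT ?filler WHERE {{
  <{class_iri}> rdfs:subClassOf ?restriction .
  ?restriction a owl:Restriction ;
               owl:onProperty <{property_iri}> ;
               owl:someValuesFrom ?filler .
}}
"""


# An arbitrary GO class IRI and the part_of relation (BFO:0000050).
query = some_values_from_query(
    "http://purl.obolibrary.org/obo/GO_0006915",
    "http://purl.obolibrary.org/obo/BFO_0000050",
)
print(query)
```

The query only finds directly asserted restrictions; it does not follow the subclass hierarchy or apply any reasoning.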
TODO - document
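Until this is documented, the reification-like shape can be pictured roughly as follows: one association node that points at a subject, a predicate, and an object. This is a sketch under assumptions. The OBAN namespace and term names used here should be checked against the OBAN ontology and against sparql/insert-oban-mf.rq, and every IRI in the example call is a placeholder.

```python
# Assumed OBAN namespace and term names; verify against the OBAN ontology.
OBAN = "http://purl.org/oban/"


def oban_association(assoc_iri, subject_iri, predicate_iri, object_iri):
    """Return Turtle for one association node linking subject, predicate and object."""
    return (
        f"<{assoc_iri}> a <{OBAN}association> ;\n"
        f"    <{OBAN}association_has_subject> <{subject_iri}> ;\n"
        f"    <{OBAN}association_has_predicate> <{predicate_iri}> ;\n"
        f"    <{OBAN}association_has_object> <{object_iri}> .\n"
    )


# All IRIs below are placeholders chosen for illustration.
turtle = oban_association(
    "http://example.org/assoc/1",
    "http://example.org/gene/G1",
    "http://example.org/rel/R1",
    "http://purl.obolibrary.org/obo/GO_0003674",
)
print(turtle)
```

Because the association is itself a node, evidence and provenance can be attached to it without touching the subject or object.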
The core unit is an annoton. It describes how a specific molecular entity (e.g. a gene product or protein complex) enables a function, where that function occurs, and which process it is part of:
    ?functionInstance a ?functionClass ;
        occurs_in: ?locationInstance ;
        part_of: ?processInstance ;
        enabled_by: ?molecularInstance .
    ?locationInstance a ?locationClass .
    ?processInstance a ?processClass .
    ?molecularInstance a ?molecularClass .
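The pattern above can be turned into a runnable query along these lines. This is a sketch, not the project's canonical query: it assumes the relations are stored under their usual OBO IRIs (RO:0002333 for enabled_by, BFO:0000066 for occurs_in, BFO:0000050 for part_of), and it makes the location and process parts optional as a design choice so that partial annotons still match.

```python
OBO = "http://purl.obolibrary.org/obo/"
ENABLED_BY = OBO + "RO_0002333"  # enabled_by
OCCURS_IN = OBO + "BFO_0000066"  # occurs_in
PART_OF = OBO + "BFO_0000050"    # part_of


def annoton_query(limit=10):
    """Return a SPARQL query for annoton-shaped data (hypothetical helper)."""
    return f"""\
SELECT ?functionClass ?molecularClass ?locationClass ?processClass WHERE {{
  ?functionInstance a ?functionClass ;
                    <{ENABLED_BY}> ?molecularInstance .
  ?molecularInstance a ?molecularClass .
  OPTIONAL {{
    ?functionInstance <{OCCURS_IN}> ?locationInstance .
    ?locationInstance a ?locationClass .
  }}
  OPTIONAL {{
    ?functionInstance <{PART_OF}> ?processInstance .
    ?processInstance a ?processClass .
  }}
}}
LIMIT {int(limit)}
"""


print(annoton_query(5))
```

Drop the `OPTIONAL` wrappers to require a location and a process for every result.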
TODO: causal relations
The evidence model is the same regardless of whether simple or lego annotations are used. For now, see:
https://github.com/geneontology/minerva/blob/master/specs/owl-model.md
- users
- GO REFs
- ...
We can do a variety of transformations in SPARUL (SPARQL Update).
TODO: document reasoning strategy
- TODO: sparql-checks, SHACL, taxon constraints, ...
TODO: We can eventually move the GO golr export to this framework (it currently requires in-memory loading). One possibility is to load the RDF into SciGraph and use the golr exporter there. Alternatively, we can explore using SPARQL.