CIM Semantic data
The Common Information Model (CIM) provides the schema backbone of the project. Across WP1 the team:
- curates the CIM/CGMES/NC ontology stack;
- authors a small set of project-specific extensions;
- reshapes the schema to fit within an LLM prompt window;
- loads and validates the Norwegian grid data.
The work is organised into the sub-tasks below.
Goal: assemble a single, coherent schema layer that every downstream component — the chatbot, the Visual Graph configurations and the diagram tools — can rely on.
- Source distribution. Base CIM/CGMES/NC schemas are loaded into GraphDB from the Inst4CIM-KG `rdfs-improved` distribution, which republishes the IEC-sourced profile RDFS files in a more complete OWL form:
  - inverse relations declared;
  - inter-profile redundancies removed;
  - consistent Turtle formatting applied.
- Profiles retained. Only the profiles exercised by the Norwegian grid data are kept:
  - core CGMES profiles: Equipment, Topology, Steady-State Hypothesis, State Variables, Diagram Layout, Geographical Location;
  - Network Codes extensions.
- External alignments. The stack is aligned with standard external vocabularies so CIM, measurements, geography and dataset metadata share a single graph:
  - QUDT for units of measure and physical quantity kinds;
  - GeoSPARQL for geometry;
  - `dcat` for dataset metadata.
This curated stack serves as the schema backbone for the remainder of WP1 and for every tool in Talk2PowerSystem.
The curated stack is too large to hand to an LLM as-is:
- CIM/CGMES + Network Codes ≈ 900 classes and ≈ 5,500 properties;
- the same term is often re-declared across up to twenty profiles;
- the vast majority of those terms never appear in Statnett's instance data.
Carrying the right triples is only half the problem; the physical layout of the Turtle file matters just as much, because the LLM consumes it as a single long string. Ontologies emitted by most RDF stores:
- split each term's description across many distant blocks;
- bury axioms inside blank-node-bound OWL restrictions;
- omit any stable ordering.
WP1 addresses both problems in a two-stage pipeline — subset, then pretty-print — that turns the full schema stack into a compact, deterministic artefact.
A SPARQL CONSTRUCT query ontology-query.rq:
- discovers — with inference enabled, so superclasses and super-properties are picked up — the exact set of classes, object/datatype properties, and enumeration members instantiated in the target dataset;
- describes those terms without inference so that asserted axioms are preserved verbatim;
- drops UML and administrative noise (`cims:stereotype`, `cims:belongsToCategory`, XML mapping metadata, QUDT cross-references) relevant to schema engineers but distracting to a query-generating LLM;
- removes unused bookkeeping predicates;
- truncates long `rdfs:comment` values at a word boundary using a regex written to handle SPARQL's treatment of `.` across line breaks.
Result: a compact subset of roughly 285 classes and 445 properties (245 object + 200 datatype) that retains everything the chatbot needs and discards everything it does not.
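The word-boundary truncation is easier to see outside SPARQL. The following is a minimal Python sketch, not the regex used in the query: the helper name `truncate_comment`, the 200-character default and the `...` suffix are assumptions made for illustration. `re.DOTALL` stands in for the SPARQL `s` flag, which lets `.` match line breaks as well.

```python
import re


def truncate_comment(text: str, limit: int = 200, suffix: str = "...") -> str:
    """Cut `text` at the last whitespace at or before `limit` characters.

    Hypothetical helper: name, default limit and suffix are illustrative only.
    """
    if len(text) <= limit:
        return text
    # DOTALL makes "." span newlines, so multi-line comments are handled too.
    # The lookahead requires the cut to fall immediately before whitespace.
    match = re.match(rf"(.{{0,{limit}}})(?=\s)", text, flags=re.DOTALL)
    # Fall back to a hard cut when the first `limit` characters hold no whitespace.
    head = match.group(1) if match else text[:limit]
    return head.rstrip() + suffix
```

For example, `truncate_comment("alpha beta gamma", 10)` keeps the two whole words that fit and appends the suffix.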
A survey of Java Turtle serializers (Inst4CIM-KG#turtle-serialization) leads the project to adopt:
- Andreas Textor's `turtle-formatter` library;
- the `owl-cli` tool.
Together they pretty-print the subsetted ontology with:
- a fixed prefix table;
- a deterministic subject order: ontology header → classes → object properties → datatype properties → enumerations;
- all statements about a given subject grouped into a single block;
- OWL restrictions expanded inline rather than hidden behind blank nodes.
The output (cim-subset-pretty.ttl) is both human-readable and diff-friendly, so schema changes can be reviewed in pull requests like any other artefact.
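The grouping and ordering rules can be sketched independently of the Java tooling. The Python below is an illustration under assumptions, not how `turtle-formatter` or `owl-cli` work internally: triples are plain string tuples, `order_subjects` is a hypothetical name, and any subject without one of the listed OWL types (enumeration members here) is sorted last.

```python
from collections import defaultdict

# Subject kinds in the order described above; anything else sorts last.
KIND_ORDER = ("owl:Ontology", "owl:Class", "owl:ObjectProperty", "owl:DatatypeProperty")


def order_subjects(triples):
    """Group (s, p, o) triples per subject and return blocks in a stable order."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))

    def sort_key(subject):
        ranks = [
            KIND_ORDER.index(o)
            for p, o in by_subject[subject]
            if p == "rdf:type" and o in KIND_ORDER
        ]
        # Rank by kind first, then alphabetically, so output never depends on input order.
        return (min(ranks, default=len(KIND_ORDER)), subject)

    return [(s, sorted(by_subject[s])) for s in sorted(by_subject, key=sort_key)]
```

Because both the subject order and the per-subject statement order are derived from sorted keys, two runs over the same triples produce the same blocks, which is what keeps diffs small.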
Size comparison on a representative CIM/CGMES 16 subset:
| Form | Size | Reduction |
|---|---|---|
| Raw Turtle ontology | ~1.56 MB | — |
| SOML rendering | 260 KB | ≈ 6× smaller |
| Subsetted + pretty-printed (LLM context) | 37 KB | ≈ 42× smaller |
The final form fits comfortably into the model's prompt window while still covering every term referenced in the Norwegian grid data. Full details are published in the blog post Ontology Simplification for LLM.
A small, purpose-built ontology — cimr.ttl, namespace https://cim.ucaiug.io/rules# — materializes "shortcut" predicates so the LLM does not have to generate a five- or six-hop SPARQL path every time a user asks about containment or connectivity.
Common parent class. To make shortcuts type-correct across the full CIM hierarchy:
- `cimr:EquipmentOrContainer` is declared as the common parent of `cim:Equipment` and `cim:ConnectivityNodeContainer` under `cim:PowerSystemResource`;
- it is used as `rdfs:domain` and `rdfs:range` of every containment and connectivity shortcut, so a single predicate spans Substation, VoltageLevel, Bay and Equipment uniformly.
Containment shortcuts. Flattened by cimr:hasPart / cimr:isPart:
- declared via `rdfs:subPropertyOf` as a disjunction of the three explicit CIM containment relations:
  - `cim:EquipmentContainer.Equipments`;
  - `cim:Substation.VoltageLevels`;
  - `cim:VoltageLevel.Bays`;
- their inverses (`cim:Equipment.EquipmentContainer`, `cim:VoltageLevel.Substation`, `cim:Bay.VoltageLevel`) bundle symmetrically into `cimr:isPart`;
- transitive closures `cimr:hasPartTransitive` / `cimr:isPartTransitive` use the custom `psys:transitiveOver` construct rather than `owl:TransitiveProperty`, so closure is taken specifically over the bundled `hasPart` without dragging in unrelated transitive relations.
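What the containment shortcuts materialize can be sketched in a few lines of Python. This is an illustration of the semantics only, assuming triples held as string tuples; the function names are hypothetical and the real closure is computed by GraphDB rules, not by this code.

```python
# The three explicit CIM containment relations bundled into hasPart.
HAS_PART_SOURCES = (
    "cim:EquipmentContainer.Equipments",
    "cim:Substation.VoltageLevels",
    "cim:VoltageLevel.Bays",
)


def bundle_has_part(triples):
    """Collect (whole, part) pairs from any of the three source predicates."""
    return {(s, o) for s, p, o in triples if p in HAS_PART_SOURCES}


def transitive_closure(pairs):
    """Extend the closure one hasPart step at a time until nothing new appears.

    This mirrors a "transitive over hasPart" reading: closure(x, y) and
    hasPart(y, z) give closure(x, z).
    """
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in pairs if b == c} - closure
        if not new:
            return closure
        closure |= new


def inverse(pairs):
    """isPart and isPartTransitive are the inverses of the pairs above."""
    return {(o, s) for s, o in pairs}
```

With a Substation, VoltageLevel, Bay and breaker chained by the three source predicates, the closure links the substation directly to the breaker, which is the hop an LLM would otherwise have to spell out as a property path.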
Terminal/equipment navigation. Generalized by:
- `cimr:Terminal.Equipment` and its inverse `cimr:Equipment.Terminals`;
- bundles `cim:Terminal.ConductingEquipment` and `cim:Terminal.AuxiliaryEquipment` under a single predicate, so "which equipment terminates here?" no longer has to enumerate the two terminal cases.
Electrical connectivity. Expressed by the symmetric cimr:connectedTo:
- five-node, length-4 property chain: Equipment → Terminal → ConnectivityNode → Terminal → Equipment;
- premises: `cimr:Equipment.Terminals` / `cim:Terminal.ConnectivityNode` / `cim:ConnectivityNode.Terminals` / `cimr:Terminal.Equipment`;
- encoded by reifying the chain as a `psys:PropChain4` instance with explicit `psys:premise1..4` and `psys:conclusion` slots, because GraphDB's rule language matches these fixed-arity patterns much more efficiently than `rdf:List`-based `owl:propertyChainAxiom` (#218).
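The four-premise chain amounts to a relational join over four sets of pairs. A minimal Python sketch of that join follows, assuming each relation is a set of `(subject, object)` tuples; `compose` and `connected_to` are hypothetical names and this is not the GraphDB rule itself.

```python
def compose(*relations):
    """Join binary relations left to right: (a, b) then (b, c) gives (a, c)."""
    result = relations[0]
    for rel in relations[1:]:
        result = {(a, d) for a, b in result for c, d in rel if b == c}
    return result


def connected_to(equipment_terminals, terminal_node):
    """Equipment -> Terminal -> ConnectivityNode -> Terminal -> Equipment."""
    node_terminals = {(n, t) for t, n in terminal_node}         # inverse of premise 2
    terminal_equipment = {(t, e) for e, t in equipment_terminals}  # inverse of premise 1
    # Note: the plain join also yields (x, x) pairs; whether those are kept
    # or filtered is a design choice this sketch does not settle.
    return compose(equipment_terminals, terminal_node, node_terminals, terminal_equipment)
```

Because the chain reads the same forwards and backwards, every derived pair also appears in the opposite direction, which matches the symmetric declaration of `cimr:connectedTo`.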
Higher-level connectivity. cimr:connectedThroughPart:
- a symmetric `psys:PropChain3` whose premises are `hasPartTransitive / connectedTo / isPartTransitive`;
- lets a user ask "is this substation connected to that substation?" without ever mentioning terminals or connectivity nodes at the SPARQL level.
Datatype extensions. Two small additions complete the file:
- `cimr:mridSignificantPart` (functional): isolates the non-varying portion of CIM UUIDs (whose leading component drifts across exports and confuses full-text and autocomplete indexers);
- `cimr:Measurement.isInCognite`: flags `cim:Analog` measurements that have a matching Cognite time-series; carries a `cims:pragmatics` annotation instructing the LLM to reuse the measurement's mRID as the Cognite `external_id` when federating out to the time-series API.
All cimr: shortcuts are driven by a single hand-written GraphDB PIE ruleset (cim.pie) implementing only the minimal rule fragment required:
- subclass: `cax_sco`, `scm_sco`;
- subproperty: `prp_spo1`, `scm_spo`;
- inverse: `prp_inv1`, `prp_inv2`;
- symmetric: `prp_symp`;
- `psys:transitiveOver`;
- the two fixed-arity `psys:PropChain3` / `psys:PropChain4` rules.
rdfs:domain / rdfs:range propagation is deliberately omitted because its covariant behaviour on CIM hierarchies produces large amounts of counter-intuitive and query-useless inference (#93, #270).
Inference footprint on the Nordic44 sample:
| Ruleset | Inferred / Explicit | Expansion |
|---|---|---|
| Basic CIM rules only | — | 1.28× |
| Full `cimr` ruleset | ≈ 60.4 k / ≈ 63.5 k (≈ 124 k total) | 1.95× |
| Stock OWL2-RL | — | 2.63× |
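The expansion column is read here as total statements divided by explicit statements. That reading is an assumption, but it reproduces the 1.95× figure from the rounded counts in the table, as this short Python check shows (`expansion_factor` is a hypothetical helper name).

```python
def expansion_factor(explicit: int, inferred: int) -> float:
    """Total statements (explicit + inferred) relative to explicit ones."""
    return (explicit + inferred) / explicit


# Rounded counts for the full cimr ruleset on Nordic44, taken from the table.
nordic44 = expansion_factor(explicit=63_500, inferred=60_400)
print(f"{nordic44:.2f}x")
```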
The result is a pared-down footprint that still delivers every shortcut the chatbot and visual graph tools depend on. Design rationale, worked SPARQL examples and the link back to earlier work on CIDOC CRM "Fundamental Relations" are documented in the wiki page Inference and the blog post Using Semantic Reasoning to Help LLM with SPARQL Generation in Electrical CIM.
Out of the box, the CIM Diagram Layout profile models only the abstract geometry of a diagram (cim:Diagram with its DiagramObject / DiagramObjectPoint substructure) and says nothing about the renderable artefacts (an SVG file on disk, a clickable HTML app, a Visual Graph saved configuration) that a chatbot is expected to display.
The project publishes a small, purpose-built ontology to close that gap: cim-diagrams.ttl (namespace https://cim.ucaiug.io/diagrams#, prefix cimd:).
- `cimd:Diagram` is declared as a `cim:IdentifiedObject` subclass.
- Carries a `cims:pragmatics` hint instructing the LLM: "`cimd:Diagram` instances can be displayed by the diagram tool; `cim:Diagram` instances cannot be displayed".
- Each instance is typed by the `cimd:DiagramKind` enumeration:
  - `cimd:DiagramKind.PowSyBl-SingleLineDiagram`: single substation;
  - `cimd:DiagramKind.PowSyBl-SingleLineDiagram-Multi`: 2-D matrix of connected substations;
  - `cimd:DiagramKind.PowSyBl-NetworkAreaDiagram`: voltage-level network views (whole grid, `SubGeographicalRegion`, `LoadArea`, or `SubLoadArea`);
  - `cimd:DiagramKind.GraphDB-VizGraph`: saved Visual Graph explorations.
- Every enum member carries a Boolean `cimd:DiagramKind.isClickable` that the chatbot UI consults to decide whether to enable the SVG → GraphDB hyperlink behaviour described in Electrical Diagrams.
Each cimd:Diagram instance carries:
- `cimd:Diagram.link`: portable relative path (so a repository can move between Statnett RNDP and Graphwise CIM environments without rewriting URLs);
- `dct:format`: MIME type (`image/svg+xml` for PowSyBl, `text/html` for Visual Graph);
- `cimd:Diagram.PowerSystemResource`: n..n back-link to the depicted `cim:PowerSystemResource` subjects, with inverse `cimd:PowerSystemResource.Diagrams` so "what diagrams exist for this substation?" reduces to a single-triple pattern;
- `cimd:Diagram.mRIDs` (PowSyBl only): opaque string pre-formatted exactly as the Python PowSyBl API expects (single mRID, 2-D `[["…","…"]]` matrix, or flat `("…","…"…)` list), so the generator script consumes the TSV verbatim.
`cimd:DiagramConfiguration` models a GraphDB Visual Graph saved configuration that, given a focus node supplied as a `uri=<URI>` query-string parameter on top of `cimd:DiagramConfiguration.link`, renders a tailored subgraph around that node.
- `cimd:DiagramConfiguration.appliesTo`: class of resources legal as focus node (`cim:Substation`, `cim:VoltageLevel`, `cim:PowerSystemResource`, `owl:Class`, …);
- `cimd:DiagramConfiguration.displaysInstances`: Boolean flag separating configurations that work on instance data from those restricted to class/schema views;
- both annotated with `cims:pragmatics` strings instructing the LLM:
  - "only allow resources of the types indicated here as a focus node";
  - "if the value is false do not use the configuration to display instance data, only classes".
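How a client might combine the configuration link, the `appliesTo` check and the `uri=` parameter can be sketched with the standard library. This is a hedged illustration: `focus_link`, its arguments and the example link value are hypothetical, and the real chatbot tool may build its URLs differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def focus_link(config_link, focus_uri, focus_types, applies_to):
    """Append uri=<focus_uri> to a saved-configuration link.

    Raises ValueError when none of the focus node's types is allowed by the
    configuration's appliesTo classes.
    """
    if not set(focus_types) & set(applies_to):
        raise ValueError(f"{focus_uri} is not a legal focus node for this configuration")
    parts = urlsplit(config_link)
    # Keep any existing query parameters and add the focus node last.
    query = parse_qsl(parts.query, keep_blank_values=True) + [("uri", focus_uri)]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example with a made-up relative link and focus node.
link = focus_link(
    "graphs-visualizations?config=abc123",
    "urn:uuid:1234",
    focus_types={"cim:Substation"},
    applies_to={"cim:Substation", "cim:VoltageLevel"},
)
```

Percent-encoding the focus URI keeps characters such as `:` and `#` from being misread as part of the link itself.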
The actual catalogue of cimd:Diagram / cimd:DiagramConfiguration instances is produced out-of-band and loaded alongside the grid data.
For PowSyBl — six CONSTRUCT queries traverse the grid via cimr:connectedThroughPart and the containment shortcuts:
- `PowSyBl-SLD-substation.rq`;
- `PowSyBl-SLD-2substations.rq` (uses `geo:asWKT` to order each substation pair west-to-east);
- `PowSyBl-NAD-all.rq`;
- `PowSyBl-NAD-SubGeographicalRegion.rq`;
- `PowSyBl-NAD-LoadArea.rq`;
- `PowSyBl-NAD-SubLoadArea.rq`.
These emit one cimd:Diagram per diagrammable scope and concatenate into diagrams.trig. A Python driver script then:
- loads Nordic44 and Telemark-120 into PowSyBl;
- iterates over the TSV of `(kind, mRIDs, link)` triples;
- writes each SVG to disk;
- runs `add_iri.py` to post-process every `<svg>` element, using companion JSON metadata and the `_45_` / `_95_` PowSyBl ID-encoding rules, to attach a `urn:uuid:` `iri` attribute, so the chatbot UI can turn any click into a GraphDB resource navigation (issue #366).
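The ID-decoding step can be illustrated as follows. This is a sketch under stated assumptions, not the contents of `add_iri.py`: it assumes `_45_` and `_95_` are decimal character codes (45 is `-`, 95 is `_`), that a leading underscore on the decoded ID is not part of the UUID, and the function names are hypothetical.

```python
import re


def decode_svg_id(svg_id: str) -> str:
    """Turn `_NN_` escapes back into the character with decimal code NN."""
    return re.sub(r"_(\d{2,3})_", lambda m: chr(int(m.group(1))), svg_id)


def to_iri(svg_id: str) -> str:
    """Build a urn:uuid: IRI from an escaped SVG element ID.

    Assumption: the decoded ID may carry a leading "_" that is dropped.
    """
    return "urn:uuid:" + decode_svg_id(svg_id).lstrip("_")
```

An escaped ID such as `_95_f1769d10_45_9aeb` decodes to `_f1769d10-9aeb`, and `to_iri` then yields `urn:uuid:f1769d10-9aeb` (a shortened, made-up identifier used only for this example).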
For Visual Graph — an analogous extraction reads the GraphDB Workbench saved/config REST endpoints and materializes diagrams.trig / diagramConfigs.trig (task #293).
Because every diagram and every configuration lands in the KG as proper RDF, the Display Graphics tool described in WP2 can discover them by purely semantic means:
- filter by `cimd:DiagramKind`;
- dereference by `cimd:Diagram.PowerSystemResource`;
- check `cimd:DiagramConfiguration.appliesTo` before binding a focus node.
No file paths or configuration IDs are hard-coded, and the same approach scales to any future diagram technology as soon as a new cimd:DiagramKind enum member is added.
CIM, QUDT and GeoSPARQL together cover most of what the Norwegian grid data references, but a handful of concrete terms are missing on one side or the other, and several terms that do exist are not aligned across the two vocabularies. WP1 closes those gaps in four small OWL files, all loaded alongside the base schema stack through load/ontologies.txt.
Wherever a CIM term has a clean semantic counterpart in QUDT, the two sides are bridged by a single skos:exactMatch triple, so a downstream SPARQL consumer can traverse the mapping in either direction without any hard-coded alignment table.
cim-units.ttl introduces CIM-side terms required for Norwegian and cross-border market prices:
- `cim:RealEnergyPrice` quantity kind, matched to `quantitykind:CostPerEnergy`;
- two `cim:UnitSymbol` members used by the `Kraftpriser_*` price streams:
  - `cim:UnitSymbol.EURperMWh`;
  - `cim:UnitSymbol.NOKperMWh`;
- a `cim:CurrencyExchangeRate` enumeration (matched to the new `quantitykind:ExchangeRate`) with three members, each `skos:exactMatch`-linked to the corresponding QUDT fractional currency unit so the enumeration is simultaneously a legal CIM `uml:enumeration` value and a machine-resolvable QUDT unit:
  - `cim:CurrencyExchangeRate.DKKperEUR`;
  - `cim:CurrencyExchangeRate.NOKperEUR`;
  - `cim:CurrencyExchangeRate.SEKperEUR`.
QUDT ships a rich unit system but lacks fractional currency-per-energy units and a dimensionless ExchangeRate quantity kind; the project authors them in cim-qudt.ttl.
Energy-price units — two new qudt:Unit individuals:
- `unit:CCY_EUR-PER-MegaW-HR` (symbol "€/(MW·h)");
- `unit:CCY_NOK-PER-MegaW-HR` (symbol "NOK/(MW·h)").
Each declared with:
- correct `qudt:hasDimensionVector`;
- `qudt:conversionMultiplier` into SI (2.777…×10⁻¹⁰);
- three-way `qudt:hasFactorUnit` decomposition into `unit:CCY_EUR` / `unit:CCY_NOK`, `unit:MegaW` and `unit:HR`;
- `qudt:hasQuantityKind` link to `quantitykind:CostPerEnergy`.
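The conversion multiplier follows from the energy part of the unit alone, since the currency factor carries no SI dimension: one megawatt-hour is 10⁶ W × 3600 s = 3.6×10⁹ J, so a price per MW·h becomes a price per joule when multiplied by 1 / 3.6×10⁹. A short Python check of that arithmetic (the constant and function names are illustrative):

```python
WATTS_PER_MEGAWATT = 1e6
SECONDS_PER_HOUR = 3600
JOULES_PER_MWH = WATTS_PER_MEGAWATT * SECONDS_PER_HOUR  # 3.6e9 J

# Multiplier from "currency per MW·h" to "currency per joule".
CONVERSION_MULTIPLIER = 1 / JOULES_PER_MWH


def price_per_joule(price_per_mwh: float) -> float:
    """Convert a price quoted per MW·h into a price per joule."""
    return price_per_mwh * CONVERSION_MULTIPLIER


print(f"{CONVERSION_MULTIPLIER:.4e}")
```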
Currency-per-currency units — three factor-unit fractions linked to the new quantitykind:ExchangeRate:
- `unit:CCY_DKK-PER-CCY_EUR`;
- `unit:CCY_NOK-PER-CCY_EUR`;
- `unit:CCY_SEK-PER-CCY_EUR`.
The new quantitykind:ExchangeRate:
- declared with the dimensionless dimension vector `qkdv:A0E0L0I0M0H0T0D1` (currency being dimensionless in the QUDT model);
- carries a `qudt:informativeReference` to the Wikipedia article on exchange rates.
Upstream proposals. All these additions are proposed upstream:
- in `qudt-public-repo` via issue #1283 "add quantitykind:ExchangeRate";
- companion CIM-side gap tracked by Inst4CIM-KG issue #168 "how to represent fractional units (Price and CurrencyExchange)".
The post-load 08-add-qudt-terms.ru step (see Instance-Data Loading) then federates out to the public QUDT endpoint and pulls only those QUDT terms actually referenced by the data into a dedicated https://cim.ucaiug.io/qudt graph, so locally-authored new terms and upstream-sourced existing terms sit side by side.
cim-external.ttl re-declares external-vocabulary terms that neither CIM nor the upstream ontologies publish in a directly-usable form:
- `dct:requires` is promoted to a proper `owl:ObjectProperty` so that CGMES `Model.DependentOn`-style profile-dependency links are correctly typed;
- four GeoSPARQL core terms (`geo:Feature`, `geo:Geometry`, `geo:asWKT`, `geo:hasGeometry`) are declared with labels and domain/range so the `04-add-WKT.ru` post-load step has a concrete schema to emit against;
- `qudt:hasUnit` is declared as an `owl:ObjectProperty` (QUDT itself leaves the full axiomatization off the public endpoint);
- `qudt:plainTextDescription rdfs:subPropertyOf dct:description` lets any tool consuming `dct:description` automatically see QUDT textual descriptions as well.
Statnett's operational CIM17 exports reference vendor- and TSO-specific extensions not part of any published IEC profile:
- `alstom:` → `http://www.alstom.com/grid/CIM-schema-cim15-extension#` (Alstom EMS);
- `entsoe_sch:` → `http://entsoe.eu/CIM/SchemaExtension/3/1#` (ENTSO-E schema);
- `form:` → `https://form.statnett.no/voc/form-ksd-extensions#` (Form KSD);
- `nek:` → `http://NEK.no/NK57/CIM/CIM100-Extension/1/0#` (NEK national);
- `statnett:` → `http://www.statnett.no/CIM-schema-cim15-extension#` (Statnett local).
Without action these would leave orphan predicates in the loaded KG. WP1 consolidates the subset referenced by Nordic44 and Telemark-120 into statnett-cim17-extra.ttl — "Statnett Supplementary Vocabulary", dct:modified 2025-11-18 — which re-declares every extension property as a proper owl:DatatypeProperty / owl:ObjectProperty with rdfs:label, rdfs:comment, rdfs:domain, rdfs:range, UML uml:hasStereotype and preserved EA uml:id. Examples:
- Alstom DC-tie-corridor ramp controls: `alstom:DCTieCorridor.manualRegXRampDC`, `rampDCLimitOp`, `rampDCLimitRef`;
- current-relay and protective-action back-links: `alstom:CurrentRelay.CurrentRelayAction`, `alstom:Measurement.ProtectiveAction*`;
- ENTSO-E `entsoe_sch:OperationalLimitType.limitType` pointer into `entsoe_sch:LimitTypeKind`.
Because the file is loaded by the standard pipeline through load/ontologies.txt, every term present in the instance data is defined somewhere in the schema graph and the SPARQL generator neither hallucinates URIs nor falls back to a blind predicate lookup.
CIM's official rdfs:label and rdfs:comment strings are written for human schema engineers and routinely omit exactly the information an LLM needs in order to generate correct SPARQL:
- whether a class is AC- or DC-only;
- which string literals are legal values of a type enumeration;
- how to reuse an mRID when federating out to a non-RDF system;
- what the sign convention of a Boolean flag actually means on a given kind of equipment.
Asking the LLM to recover these conventions from the stock CIM documentation proves unreliable; adding them to the system prompt does not scale past a handful of terms and quickly exhausts the prompt window.
Solution: cims:pragmatics. A dedicated annotation property declared in the simplified ontology as an owl:DatatypeProperty:
- definition: "Specific SPARQL guidance and important practical information for an LLM. Applies to Classes and Properties.";
- attached directly to the term it governs.
Because the annotation is a plain triple inside cim-subset-pretty.ttl, it travels with the schema into whatever context window or retrieval step the chatbot uses, so every hint is delivered to the model precisely when — and only when — the term it refers to is in scope.
`cim:Line` is defined in CIM generically as "Contains equipment beyond a substation belonging to a power transmission line", wording that a naive LLM would happily apply to DC interconnectors as well as AC lines, producing silently wrong answers for "all lines" in a mixed AC/DC grid. The ontology therefore carries one extra triple:

```turtle
cim:Line a owl:Class ;
  rdfs:label "Line" ;
  rdfs:comment "Contains equipment beyond a substation belonging to a power transmission line."@en ;
  cims:pragmatics "This means only AC Line (not DC)" ;
  rdfs:subClassOf cim:EquipmentContainer .
```

With this single hint, the model consistently disambiguates `cim:Line` from `cim:DCLine` in generated SPARQL without any additional prompt engineering.
- `cim:ConductingEquipment` restricts the scope to AC.
- `cim:Measurement.measurementType` (typed as `xsd:string` in CIM) enumerates the legal values present in the Statnett data: "Possible values: CurrencyExchange-Actual, Price-Actual, ThreePhaseActivePower, ThreePhaseActivePower-Flow-Estimated", effectively turning a free-text column into a closed vocabulary.
- `cim:Analog.positiveFlowIn` encodes the domain-specific sign convention for bidding-zone borders: "true" means the flow is from BiddingZoneOne to BiddingZoneTwo, "false" means the opposite.
- `cimr:Measurement.isInCognite` instructs the agent to "Use the mRID of this measurement to query a timeseries in the Cognite API by external_id", bridging the semantic and time-series halves of the architecture (§T1.3).
- `cim:Diagram` / `cimd:Diagram` / `cimd:DiagramConfiguration` tell the UI layer which classes are renderable at all and which focus-node types a given Visual Graph saved configuration accepts.
In each case a single annotation, authored once by a domain expert and stored next to the term it qualifies, removes an entire class of failure mode from the LLM's generated SPARQL — without bloating the system prompt or requiring any changes to the chatbot's code path.
Datasets loaded:
- Nordic44 — transmission level (44-bus reference model split across Enterprise, Grid and Network Code sub-profiles);
- Telemark-120 — distribution level (down to 120 V, split across MV1 and LV1 sub-profiles);
- the PowSyBl and Visual Graph diagram catalogues.
Pipeline driver: load/load.py, driven by three plain-text manifests:
- `data/repo-config.ttl`: repository configuration (ruleset, indexing, plugin settings);
- `load/ontologies.txt`: every CIM/CGMES/NC profile plus `cimr:`, `cimd:`, unit/currency/vendor, QUDT, Cognite-metadata and CIM4Enterprise schemas;
- `load/instances.txt`: TriG files making up the two grid datasets.
(1) Repository provisioning. load.py (re-)creates the target repository from repo-config.ttl via the GraphDB REST API:
- optionally deletes an existing one under `--force`;
- subsequent loads start from a known-clean state with `empty` base ruleset, context indexing on, and FTS staged for later activation.
(2) Ontology loading. Each URL in ontologies.txt:
- is downloaded;
- gzip-compressed on the fly;
- posted into a single ontology named graph `https://cim.ucaiug.io/ns#graph`, so the full schema stack lives in one place and can be swapped atomically.
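The compress-before-posting step needs nothing beyond the standard library. The sketch below covers only the compression; `gzip_payload` is a hypothetical helper, and the download and the HTTP request to GraphDB are deliberately left out because their exact form in `load.py` is not shown here.

```python
import gzip


def gzip_payload(turtle_text: str) -> bytes:
    """Compress a downloaded ontology in memory before it is uploaded."""
    return gzip.compress(turtle_text.encode("utf-8"))


# Repetitive Turtle compresses well, which keeps the upload small.
sample = "cim:Line a owl:Class .\n" * 1000
payload = gzip_payload(sample)
print(len(sample.encode("utf-8")), "->", len(payload), "bytes")
```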
(3) Instance-data loading. Each URL in instances.txt:
- loaded as gzipped TriG;
- populates the named graph declared inside the file itself, giving the KG a clean per-profile and per-dataset graph layout that downstream tools can query or isolate.
(4) Post-load SPARQL pipeline. load.py executes every .ru file in data/queries/ in filename order, each performing one narrowly-scoped transformation that cannot be expressed statically in the schema:
- `01-add-inference.ru`: registers the custom `cim.pie` ruleset and triggers a full reinference pass so `cimr:` shortcuts, inverse relations and `PropChain3` / `PropChain4` chains are materialized.
- `02-add-mridSignificantPart.ru`: extracts the stable leading segment of every `cim:IdentifiedObject` URN so full-text and autocomplete indexes can match mRIDs across CGMES export variants.
- `03-add-missing-mrid.ru`: backfills `cim:IdentifiedObject.mRID` on resources exported without one.
- `04-add-WKT.ru`: builds GeoSPARQL `geo:wktLiteral` geometries from the `cim:PositionPoint` chain, grouping points by `cim:Location` and `sequenceNumber`, emitting `POINT` or `LINESTRING` as appropriate, and promoting the associated `cim:PowerSystemResource` to a `geo:Feature`.
- `05-delete-redundant-geo.ru`: removes all `cim:PositionPoint` individuals and any leftover `geo:asGML` / `geo:asGeoJSON` encodings, leaving the harmonized `geo:wktLiteral` as the sole geometry representation.
- `07-enable-geosparql.ru`: activates the GraphDB GeoSPARQL plugin so those literals are indexed for spatial joins and bounding-box queries.
- `08-add-qudt-terms.ru`: federates out to the public QUDT SPARQL endpoint and pulls only those units, quantity kinds and related terms referenced by the loaded data into a dedicated `https://cim.ucaiug.io/qudt` graph, avoiding the cost of mirroring the full QUDT vocabulary.
- `09-clean-up-crap.ru`: strips exporter noise (`dcat:Resource`, `dct:Resource`, CGMES `Model` metadata); removes a spurious `dct:description rdfs:domain dcat:Dataset` axiom that would otherwise mis-type all of QUDT; drops `rdfs:label` / `skos:altLabel` / `dct:description` values in languages other than English and Norwegian; removes two redundant `rdfs:subClassOf cim:PowerSystemResource` assertions that the `cimr:EquipmentOrContainer` hierarchy already covers.
- `10-add-autocomplete-labels.ru`: reconfigures the autocomplete index to match on `cim:IdentifiedObject.name`, `cim:IdentifiedObject.aliasName` and `cim:CoordinateSystem.crsUrn` rather than the default `rdfs:label`, then enables the index.
- `11-compute-rank.ru`: computes the RDF Rank used downstream by the Visual Graph tool and the chatbot to prefer salient resources when disambiguating names.
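Running the steps "in filename order" relies on the zero-padded numeric prefixes, which make plain lexical sorting agree with the intended sequence. A minimal Python sketch of that loop follows; it is not the code of `load.py`, and the `execute` callable is a hypothetical stand-in for whatever sends a SPARQL update to the repository.

```python
from pathlib import Path
from typing import Callable, List


def ordered_updates(query_dir: str) -> List[Path]:
    """Return the .ru files of a directory sorted by file name."""
    return sorted(Path(query_dir).glob("*.ru"), key=lambda p: p.name)


def run_pipeline(query_dir: str, execute: Callable[[str], None]) -> None:
    """Feed each update, in order, to the supplied `execute` callable."""
    for path in ordered_updates(query_dir):
        execute(path.read_text(encoding="utf-8"))
```

Passing the executor in as a parameter keeps the ordering logic testable without a running GraphDB instance.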
The resulting pipeline is:
- idempotent under `--force`;
- runs against both Statnett RNDP and Graphwise CIM environments without code changes (a parallel entry point `load/rndp-load.py` adapts the same flow for the RNDP bastion);
- produces on either side a repository with:
  - the same named-graph layout;
  - the same inferred closure (≈ 60 k inferred for ≈ 63.5 k explicit triples on Nordic44, a 1.95× expansion);
  - the same GeoSPARQL indexes;
  - the same autocomplete configuration.
To extend the reach and utility of the Knowledge Graph, WP1 integrates data sources that do not exist in semantic form.
The most challenging technical aspect of WP1 is the integration of time-series readings (sensor data, production/consumption fluctuations) with the Knowledge Graph.
Why not store time-series as RDF? Duplicating millions of timestamped values into RDF would be:
- prohibitively expensive to store, index and reason over;
- a forcing function on the KG to track every new measurement ingested by the SCADA layer.
Architecture: clean separation of concerns.
- Knowledge Graph keeps structural and descriptive metadata, i.e. what a measurement is:
  - which quantity it measures;
  - which `cim:Terminal` and `cim:Equipment` it is attached to;
  - which substation and voltage level contain it;
  - its `cim:Measurement.measurementType`, unit and `cim:Analog.positiveFlowIn` flow-direction semantics.
- Cognite Data Fusion (CDF), Statnett's operational time-series platform, keeps the high-volume numerical readings.
Stitched together by a deterministic identifier contract:
- the `cim:IdentifiedObject.mRID` of each `cim:Analog` measurement is reused verbatim as the Cognite `external_id`;
- the `cimr:Measurement.isInCognite` Boolean flag (materialized at load time, carrying the `cims:pragmatics` LLM hint introduced in §T1.2) tells the agent exactly which measurements have a corresponding time-series on the Cognite side, so the model never issues a datapoint fetch guaranteed to miss.
Two-step federated workflow at query time — rather than a single SPARQL statement:
- SPARQL against GraphDB translates a natural-language question (e.g. "what is the active-power flow through that 420 kV transformer at Sylling over the last 24 hours?") into the concrete set of mRIDs, units, measurement types and flow-direction flags that answer the structural half of the question.
- Cognite tools (described in WP2) take those mRIDs and call the Cognite API directly:
  - Retrieve time series: maps each mRID to its Cognite `external_id` and resolves the series' metadata;
  - Retrieve datapoints: fetches raw or aggregated datapoints (min, max, average, step-interpolation) over a user-specified timestamp range;
  - user-scoped access enforced end-to-end via OAuth2 On-Behalf-Of (OBO), so the data operation runs under the authenticated user's own CDF permissions rather than a shared service principal.
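The hand-off between the two steps can be sketched as below. This is a simplified illustration, not the project's tool code: SPARQL results are assumed to arrive as a list of dicts with `mrid` and `isInCognite` keys, and `fetch_datapoints` is a hypothetical callable standing in for the Cognite datapoint tool.

```python
from typing import Callable, Dict, List


def select_external_ids(rows: List[dict]) -> List[str]:
    """Step 1 output -> step 2 input: keep only measurements flagged in Cognite.

    The mRID is reused verbatim as the external_id, so no mapping table is needed.
    """
    return [row["mrid"] for row in rows if row.get("isInCognite") is True]


def fetch_series(
    rows: List[dict],
    fetch_datapoints: Callable[[str, str, str], list],
    start: str,
    end: str,
) -> Dict[str, list]:
    """Call the (hypothetical) datapoint fetcher once per eligible measurement."""
    return {
        external_id: fetch_datapoints(external_id, start, end)
        for external_id in select_external_ids(rows)
    }
```

Filtering on the flag before calling out is what prevents requests for measurements that have no time series on the Cognite side.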
Benefits of this division of labour:
- KG remains a lightweight semantic index: small enough to run inference over, serialize into an LLM prompt and visualize in the graph tools;
- Cognite does what it is designed for: serving large, appropriately-aggregated numerical slices at interactive latency;
- enables genuinely hybrid questions in a single coherent answer, combining:
  - topology: "all measurements on equipment connected through this substation via `cimr:connectedThroughPart`";
  - ontology-driven filtering: "only active-power `Analog`s with `positiveFlowIn = true` and units in MW";
  - time-series aggregation: "averaged over the last week, hourly buckets";
- without ever materializing time-series values as RDF triples.
Beyond Statnett's own CIM assets, WP1 also experiments with enriching the KG at query time with public geospatial data, rather than by bulk ingestion.
The enabler is twofold:
- Statnett side. The `04-add-WKT.ru` step of the load pipeline:
  - normalizes every `cim:Location` chain into a GeoSPARQL `geo:wktLiteral`;
  - promotes the associated `cim:PowerSystemResource` to a `geo:Feature`;
  - so grid objects become directly comparable with any other GeoSPARQL dataset.
- OpenStreetMap side. The planet is published as RDF/GeoSPARQL by the Freiburg `osm2rdf` project and served on the public QLever endpoint `https://qlever.dev/api/osm-planet`:
  - administrative boundaries, rivers, roads and other real-world features are already addressable as triples with WKT geometries.
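The geometry normalization on the Statnett side can be pictured with a small Python sketch. It is an illustration of the grouping logic only, not the SPARQL in `04-add-WKT.ru`: position points are assumed to be `(location, sequence_number, x, y)` tuples with x as longitude and y as latitude, and `to_wkt` is a hypothetical name.

```python
from collections import defaultdict


def to_wkt(points):
    """Build one WKT string per location from its ordered position points."""
    by_location = defaultdict(list)
    for location, sequence_number, x, y in points:
        by_location[location].append((sequence_number, x, y))

    wkt = {}
    for location, pts in by_location.items():
        # Sort by sequence number so line vertices come out in drawing order.
        coords = [f"{x} {y}" for _, x, y in sorted(pts)]
        if len(coords) == 1:
            wkt[location] = f"POINT({coords[0]})"
        else:
            wkt[location] = f"LINESTRING({', '.join(coords)})"
    return wkt
```

A location with a single point (typically a substation) becomes a `POINT`, while a location with several points (typically a line) becomes a `LINESTRING`.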
Mechanism. A single SPARQL SERVICE clause is sufficient to:
- fetch (e.g.) the polygon of a Norwegian county or municipality from OSM;
- intersect it, via `geof:sfIntersects` or the indexed `geo:sfIntersects` magic predicate, with the local CIM geometries in GraphDB;
- without duplicating the (very large) OSM corpus into Statnett's repository and without any bespoke ETL or schema-alignment step.
Questions unlocked that the raw CIM cim:PositionPoint data cannot answer on its own:
- "which substations and AC line segments sit inside the county of Viken?";
- "which power lines pass within a given distance of a lake or a protected area?";
- "attach every geolocated grid asset to its administrative territorial hierarchy".
This is done on demand, so any update to OSM is immediately reflected in the answers. The approach, the WKT conversion query, the YasGUI-based map visualizations and worked federated queries against OSM are documented in the project blog post Mapping Electrical Resources with GeoSPARQL.
During Work Package 1, effort is split across two separate infrastructure environments where the Talk2PowerSystem project is developed and deployed:
- Statnett's side — the Research and Development Platform (RNDP) https://rndp.statnett.no/, where Graphwise assists with the infrastructure and deployment by implementing Helm charts (IaC) for the different services and promoting Helm deployments via Jupyter Hub as a bastion.
- Graphwise's own infrastructure — the CIM environment https://cim.ontotext.com/chat/, where Graphwise designs and implements the full infrastructure and deployment process.