CIM Semantic data

CIM Semantic Data Management (T1.2)

The Common Information Model (CIM) provides the schema backbone of the project. Across WP1 the team:

curates the CIM/CGMES/NC ontology stack;
authors a small set of project-specific extensions;
reshapes the schema to fit within an LLM prompt window;
loads and validates the Norwegian grid data.

The work is organised into the sub-tasks below.

Ontology Stack Curation

Goal: assemble a single, coherent schema layer that every downstream component — the chatbot, the Visual Graph configurations and the diagram tools — can rely on.

Source distribution. Base CIM/CGMES/NC schemas are loaded into GraphDB from the Inst4CIM-KG rdfs-improved distribution, which republishes the IEC-sourced profile RDFS files in a more complete OWL form:
- inverse relations declared;
- inter-profile redundancies removed;
- consistent Turtle formatting applied.
Profiles retained. Only the profiles exercised by the Norwegian grid data are kept:
- core CGMES profiles: Equipment, Topology, Steady-State Hypothesis, State Variables, Diagram Layout, Geographical Location;
- Network Codes extensions.
External alignments. The stack is aligned with standard external vocabularies so CIM, measurements, geography and dataset metadata share a single graph:
- QUDT for units of measure and physical quantity kinds;
- GeoSPARQL for geometry;
- dcat for dataset metadata.

This curated stack serves as the schema backbone for the remainder of WP1 and for every tool in Talk2PowerSystem.

Subsetting and LLM-Friendly Serialization

The curated stack is too large to hand to an LLM as-is:

CIM/CGMES + Network Codes ≈ 900 classes and ≈ 5,500 properties;
the same term is often re-declared across up to twenty profiles;
the vast majority of those terms never appear in Statnett's instance data.

Carrying the right triples is only half the problem; the physical layout of the Turtle file matters just as much, because the LLM consumes it as a single long string. Ontologies emitted by most RDF stores:

split each term's description across many distant blocks;
bury axioms inside blank-node-bound OWL restrictions;
omit any stable ordering.

WP1 addresses both problems in a two-stage pipeline — subset, then pretty-print — that turns the full schema stack into a compact, deterministic artefact.

Subsetting

A SPARQL CONSTRUCT query ontology-query.rq:

discovers — with inference enabled, so superclasses and super-properties are picked up — the exact set of classes, object/datatype properties, and enumeration members instantiated in the target dataset;
describes those terms without inference so that asserted axioms are preserved verbatim;
drops UML and administrative noise (cims:stereotype, cims:belongsToCategory, XML mapping metadata, QUDT cross-references) relevant to schema engineers but distracting to a query-generating LLM;
removes unused bookkeeping predicates;
truncates long rdfs:comment values at a word boundary using a regex written to handle SPARQL's treatment of . across line breaks.

Result: a compact subset of roughly 285 classes and 445 properties (245 object + 200 datatype) that retains everything the chatbot needs and discards everything it does not.
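The word-boundary truncation can be illustrated outside SPARQL. The following is a minimal Python sketch, not the project's query: the function name `truncate_comment` and the 200-character default limit are assumptions chosen for illustration. The `(?s)` flag plays the role of the dot-across-line-breaks handling mentioned above, so multi-line comments are cut correctly.

```python
import re


def truncate_comment(text: str, limit: int = 200) -> str:
    """Cut `text` to at most `limit` characters, ending on a word boundary."""
    if len(text) <= limit:
        return text
    # (?s) lets "." also match newlines; the lookahead requires that the
    # kept prefix is immediately followed by whitespace (a word boundary).
    match = re.match(rf"(?s)^(.{{1,{limit}}})(?=\s)", text)
    # Fall back to a hard cut when the prefix contains no whitespace at all.
    head = match.group(1) if match else text[:limit]
    return head.rstrip() + "..."
```

For example, `truncate_comment("aaa bbb ccc ddd", 9)` keeps `aaa bbb` and drops the partial word that would otherwise follow.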

LLM-friendly serialization

A survey of Java Turtle serializers (Inst4CIM-KG#turtle-serialization) leads the project to adopt:

Andreas Textor's turtle-formatter library;
the owl-cli tool.

Together they pretty-print the subsetted ontology with:

a fixed prefix table;
a deterministic subject order: ontology header → classes → object properties → datatype properties → enumerations;
all statements about a given subject grouped into a single block;
OWL restrictions expanded inline rather than hidden behind blank nodes.

The output (cim-subset-pretty.ttl) is both human-readable and diff-friendly, so schema changes can be reviewed in pull requests like any other artefact.
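The deterministic subject order can be pictured as a two-level sort key. This is a hypothetical Python sketch rather than the turtle-formatter or owl-cli implementation; the kind labels and the secondary sort by IRI are assumptions made for illustration.

```python
# Category order taken from the list above; the string labels are illustrative.
KIND_ORDER = [
    "ontology",
    "class",
    "object_property",
    "datatype_property",
    "enumeration",
]


def sort_subjects(subjects):
    """Sort (iri, kind) pairs by category first, then alphabetically by IRI."""
    return sorted(subjects, key=lambda s: (KIND_ORDER.index(s[1]), s[0]))


subjects = [
    ("cim:Terminal.ConnectivityNode", "object_property"),
    ("cim:Substation", "class"),
    ("cim:IdentifiedObject.mRID", "datatype_property"),
    ("cim:Line", "class"),
]
ordered = sort_subjects(subjects)
```

Because the key depends only on the term itself, two runs over the same subset produce the same file layout, which is what keeps diffs small.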

Combined effect

Size comparison on a representative CIM/CGMES 16 subset:

Form	Size	Reduction
Raw Turtle ontology	~1.56 MB	—
SOML rendering	260 KB	≈ 6× smaller
Subsetted + pretty-printed (LLM context)	37 KB	≈ 42× smaller

The final form fits comfortably into the model's prompt window while still covering every term referenced in the Norwegian grid data. Full details are published in the blog post Ontology Simplification for LLM.

`cimr:` — CIM Inferred Extension Ontology

A small, purpose-built ontology — cimr.ttl, namespace https://cim.ucaiug.io/rules# — materializes "shortcut" predicates so the LLM does not have to generate a five- or six-hop SPARQL path every time a user asks about containment or connectivity.

Common parent class. To make shortcuts type-correct across the full CIM hierarchy:

cimr:EquipmentOrContainer is declared as the common parent of cim:Equipment and cim:ConnectivityNodeContainer under cim:PowerSystemResource;
it is used as rdfs:domain and rdfs:range of every containment and connectivity shortcut, so a single predicate spans Substation, VoltageLevel, Bay and Equipment uniformly.

Containment shortcuts. Flattened by cimr:hasPart / cimr:isPart:

declared via rdfs:subPropertyOf as a disjunction of the three explicit CIM containment relations:
- cim:EquipmentContainer.Equipments;
- cim:Substation.VoltageLevels;
- cim:VoltageLevel.Bays;
their inverses (cim:Equipment.EquipmentContainer, cim:VoltageLevel.Substation, cim:Bay.VoltageLevel) bundle symmetrically into cimr:isPart;
transitive closures cimr:hasPartTransitive / cimr:isPartTransitive use the custom psys:transitiveOver construct rather than owl:TransitiveProperty, so closure is taken specifically over the bundled hasPart without dragging in unrelated transitive relations.
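The effect of the containment bundle and its closure can be sketched on a toy graph. This Python fragment only illustrates the logic, not the GraphDB ruleset: the resource names are invented, and seeding the closure with the direct hasPart pairs is an assumption about how the two predicates relate.

```python
# The three explicit CIM containment relations bundled under cimr:hasPart.
CONTAINMENT = {
    "cim:EquipmentContainer.Equipments",
    "cim:Substation.VoltageLevels",
    "cim:VoltageLevel.Bays",
}

# Toy explicit triples (parent, predicate, child); identifiers are made up.
explicit = {
    ("sub1", "cim:Substation.VoltageLevels", "vl1"),
    ("vl1", "cim:VoltageLevel.Bays", "bay1"),
    ("bay1", "cim:EquipmentContainer.Equipments", "breaker1"),
}


def has_part(triples):
    """Sub-property bundling: any containment triple yields a hasPart pair."""
    return {(s, o) for s, p, o in triples if p in CONTAINMENT}


def transitive_over(base):
    """Close T under: x T y and y base z implies x T z, starting from T = base."""
    closure = set(base)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in base:
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


has_part_transitive = transitive_over(has_part(explicit))
```

On this toy data the closure adds the three indirect pairs, including substation to breaker, to the three direct ones.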

Terminal/equipment navigation. Generalized by:

cimr:Terminal.Equipment and its inverse cimr:Equipment.Terminals;
cimr:Terminal.Equipment bundles cim:Terminal.ConductingEquipment and cim:Terminal.AuxiliaryEquipment under a single predicate, so "which equipment terminates here?" no longer has to enumerate the two terminal cases.

Electrical connectivity. Expressed by the symmetric cimr:connectedTo:

five-node, length-4 property chain: Equipment → Terminal → ConnectivityNode → Terminal → Equipment;
premises: cimr:Equipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cimr:Terminal.Equipment;
encoded by reifying the chain as a psys:PropChain4 instance with explicit psys:premise1..4 and psys:conclusion slots, because GraphDB's rule language matches these fixed-arity patterns much more efficiently than rdf:List-based owl:propertyChainAxiom (#218).
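The length-4 chain amounts to composing four binary relations. The Python sketch below illustrates that composition on invented data; it says nothing about how the PIE rule is written, and the `compose` helper is a hypothetical name.

```python
def compose(*relations):
    """Chain binary relations (sets of pairs) from left to right."""
    result = relations[0]
    for rel in relations[1:]:
        result = {(a, c) for a, b in result for b2, c in rel if b == b2}
    return result


# Toy data: two pieces of equipment whose terminals share one connectivity node.
equipment_terminals = {("line1", "t1"), ("breaker1", "t2")}
terminal_node = {("t1", "cn1"), ("t2", "cn1")}
# The remaining two premises are the inverses of the first two.
node_terminals = {(n, t) for t, n in terminal_node}
terminal_equipment = {(t, e) for e, t in equipment_terminals}

connected_to = compose(
    equipment_terminals, terminal_node, node_terminals, terminal_equipment
)
```

The result is symmetric by construction, since the second half of the chain mirrors the first. A naive composition also relates each piece of equipment to itself; whether the project filters those pairs is not stated here.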

Higher-level connectivity. cimr:connectedThroughPart:

a symmetric psys:PropChain3 whose premises are hasPartTransitive / connectedTo / isPartTransitive;
lets a user ask "is this substation connected to that substation?" without ever mentioning terminals or connectivity nodes at the SPARQL level.

Datatype extensions. Two small additions complete the file:

cimr:mridSignificantPart (functional) — isolates the non-varying portion of CIM UUIDs (whose leading component drifts across exports and confuses full-text and autocomplete indexers);
cimr:Measurement.isInCognite — flags cim:Analog measurements that have a matching Cognite time-series; carries a cims:pragmatics annotation instructing the LLM to reuse the measurement's mRID as the Cognite external_id when federating out to the time-series API.

`cim.pie` — Custom Inference Ruleset

All cimr: shortcuts are driven by a single hand-written GraphDB PIE ruleset (cim.pie) implementing only the minimal rule fragment required:

subclass: cax_sco, scm_sco;
subproperty: prp_spo1, scm_spo;
inverse: prp_inv1, prp_inv2;
symmetric: prp_symp;
psys:transitiveOver;
the two fixed-arity psys:PropChain3 / psys:PropChain4 rules.

rdfs:domain / rdfs:range propagation is deliberately omitted because its covariant behaviour on CIM hierarchies produces large amounts of counter-intuitive and query-useless inference (#93, #270).

Inference footprint on the Nordic44 sample:

Ruleset	Inferred / Explicit	Expansion
Basic CIM rules only	—	1.28×
Full `cimr` ruleset	≈ 60.4 k / ≈ 63.5 k (≈ 124 k total)	1.95×
Stock OWL2-RL	—	2.63×

The result is a pared-down footprint that still delivers every shortcut the chatbot and Visual Graph tools depend on. Design rationale, worked SPARQL examples and the link back to earlier work on CIDOC CRM "Fundamental Relations" are documented in the wiki page Inference and the blog post Using Semantic Reasoning to Help LLM with SPARQL Generation in Electrical CIM.
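The expansion column can be recomputed from the triple counts. The short check below assumes that expansion means (explicit + inferred) / explicit, which is consistent with the figures reported for the full ruleset.

```python
explicit_k = 63.5   # explicit triples, in thousands (from the table)
inferred_k = 60.4   # inferred triples, in thousands (from the table)

total_k = explicit_k + inferred_k
expansion = total_k / explicit_k

print(f"{total_k:.1f} k total, {expansion:.2f}x expansion")
# prints: 123.9 k total, 1.95x expansion
```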

`cimd:` — CIM Diagrams Ontology

Out of the box, the CIM Diagram Layout profile models only the abstract geometry of a diagram (cim:Diagram with its DiagramObject / DiagramObjectPoint substructure) and says nothing about the renderable artefacts (an SVG file on disk, a clickable HTML app, a Visual Graph saved configuration) that a chatbot is expected to display.

The project publishes a small, purpose-built ontology to close that gap: cim-diagrams.ttl (namespace https://cim.ucaiug.io/diagrams#, prefix cimd:).

`cimd:Diagram` — renderable artefacts as first-class resources

Declared as a cim:IdentifiedObject subclass.
Carries a cims:pragmatics hint instructing the LLM: "cimd:Diagram instances can be displayed by the diagram tool; cim:Diagram instances cannot be displayed".
Each instance is typed by the cimd:DiagramKind enumeration:
- cimd:DiagramKind.PowSyBl-SingleLineDiagram — single substation;
- cimd:DiagramKind.PowSyBl-SingleLineDiagram-Multi — 2-D matrix of connected substations;
- cimd:DiagramKind.PowSyBl-NetworkAreaDiagram — voltage-level network views (whole grid, SubGeographicalRegion, LoadArea, or SubLoadArea);
- cimd:DiagramKind.GraphDB-VizGraph — saved Visual Graph explorations.
Every enum member carries a Boolean cimd:DiagramKind.isClickable that the chatbot UI consults to decide whether to enable the SVG → GraphDB hyperlink behaviour described in Electrical Diagrams.

Each cimd:Diagram instance carries:

cimd:Diagram.link — portable relative path (so a repository can move between Statnett RNDP and Graphwise CIM environments without rewriting URLs);
dct:format — MIME type (image/svg+xml for PowSyBl, text/html for Visual Graph);
cimd:Diagram.PowerSystemResource — n..n back-link to the depicted cim:PowerSystemResource subjects, with inverse cimd:PowerSystemResource.Diagrams so "what diagrams exist for this substation?" reduces to a single-triple pattern;
cimd:Diagram.mRIDs (PowSyBl only) — opaque string pre-formatted exactly as the Python PowSyBl API expects (single mRID, 2-D [["…","…"]] matrix, or flat ("…","…"…) list), so the generator script consumes the TSV verbatim.

`cimd:DiagramConfiguration` — parametric Visual Graph views

Models a GraphDB Visual Graph saved configuration that, given a focus node supplied as uri=<URI> query-string parameter on top of cimd:DiagramConfiguration.link, renders a tailored subgraph around that node.

cimd:DiagramConfiguration.appliesTo — class of resources legal as focus node (cim:Substation, cim:VoltageLevel, cim:PowerSystemResource, owl:Class, …);
cimd:DiagramConfiguration.displaysInstances — Boolean flag separating configurations that work on instance data from those restricted to class/schema views;
both annotated with cims:pragmatics strings instructing the LLM:
- "only allow resources of the types indicated here as a focus node";
- "if the value is false do not use the configuration to display instance data, only classes".
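Building the final URL for a configuration is a matter of appending the focus node as a `uri` query parameter. A minimal Python sketch follows; the example link value is invented, and only the parameter name `uri` comes from the description above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def focus_url(link: str, focus_iri: str) -> str:
    """Append uri=<focus_iri> to a configuration link, keeping existing parameters."""
    parts = urlsplit(link)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("uri", focus_iri))
    # urlencode percent-escapes the IRI so it survives as a single parameter value.
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical relative link, as it might be stored in cimd:DiagramConfiguration.link
example = focus_url("graphs-visualizations?config=abc", "urn:uuid:1234")
```

A caller would be expected to check cimd:DiagramConfiguration.appliesTo before calling such a helper, since the URL itself carries no type information.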

Catalogue generation

The actual catalogue of cimd:Diagram / cimd:DiagramConfiguration instances is produced out-of-band and loaded alongside the grid data.

For PowSyBl — six CONSTRUCT queries traverse the grid via cimr:connectedThroughPart and the containment shortcuts:

PowSyBl-SLD-substation.rq;
PowSyBl-SLD-2substations.rq (uses geo:asWKT to order each substation pair west-to-east);
PowSyBl-NAD-all.rq;
PowSyBl-NAD-SubGeographicalRegion.rq;
PowSyBl-NAD-LoadArea.rq;
PowSyBl-NAD-SubLoadArea.rq.

These emit one cimd:Diagram per diagrammable scope, and their results are concatenated into diagrams.trig. A Python driver script then:

loads Nordic44 and Telemark-120 into PowSyBl;
iterates over the TSV of (kind, mRIDs, link) triples;
writes each SVG to disk;
runs add_iri.py to post-process every <svg> element — using companion JSON metadata and the _45_/_95_ PowSyBl ID-encoding rules — to attach a urn:uuid: iri attribute, so the chatbot UI can turn any click into a GraphDB resource navigation (issue #366).
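The two encoding tokens correspond to ASCII codes: 45 is the hyphen and 95 is the underscore. The sketch below shows how an encoded identifier could be turned back into a `urn:uuid:` IRI under that reading. It is not the project's add_iri.py; the function name and the sample identifier are hypothetical, and any prefix or suffix handling the real script performs is left out.

```python
import re

# Mapping for the two tokens named in the text (ASCII 45 and 95).
TOKENS = {"45": "-", "95": "_"}


def decode_powsybl_id(encoded: str) -> str:
    """Replace _45_ and _95_ tokens by the characters they stand for, in one pass."""
    return re.sub(r"_(45|95)_", lambda m: TOKENS[m.group(1)], encoded)


def to_urn(encoded: str) -> str:
    """Build the urn:uuid: IRI that a click handler could navigate to."""
    return "urn:uuid:" + decode_powsybl_id(encoded)


iri = to_urn("f1769670_45_9aeb_45_11e5_45_91da_45_b8763fd99c5f")
```

Doing the substitution in a single regex pass avoids decoding a character twice when the two tokens appear next to each other.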

For Visual Graph — an analogous extraction reads the GraphDB Workbench saved/config REST endpoints and materializes diagrams.trig / diagramConfigs.trig (task #293).

Because every diagram and every configuration lands in the KG as proper RDF, the Display Graphics tool described in WP2 can discover them by purely semantic means:

filter by cimd:DiagramKind;
dereference by cimd:Diagram.PowerSystemResource;
check cimd:DiagramConfiguration.appliesTo before binding a focus node.

No file paths or configuration IDs are hard-coded, and the same approach scales to any future diagram technology as soon as a new cimd:DiagramKind enum member is added.

Unit, Currency and Vendor Extensions

CIM, QUDT and GeoSPARQL together cover most of what the Norwegian grid data references, but a handful of concrete terms are missing on one side or the other, and several terms that do exist are not aligned across the vocabularies. WP1 closes those gaps in four small OWL files (cim-units.ttl, cim-qudt.ttl, cim-external.ttl and statnett-cim17-extra.ttl), all loaded alongside the base schema stack through load/ontologies.txt.

Wherever a CIM term has a clean semantic counterpart in QUDT, the two sides are bridged by a single skos:exactMatch triple, so a downstream SPARQL consumer can traverse the mapping in either direction without any hard-coded alignment table.

CIM units, currencies and energy-price terms

cim-units.ttl introduces CIM-side terms required for Norwegian and cross-border market prices:

cim:RealEnergyPrice quantity kind — matched to quantitykind:CostPerEnergy;
two cim:UnitSymbol members used by the Kraftpriser_* price streams:
- cim:UnitSymbol.EURperMWh;
- cim:UnitSymbol.NOKperMWh;
a cim:CurrencyExchangeRate enumeration (matched to the new quantitykind:ExchangeRate) with three members, each skos:exactMatch-linked to the corresponding QUDT fractional currency unit so the enumeration is simultaneously a legal CIM uml:enumeration value and a machine-resolvable QUDT unit:
- cim:CurrencyExchangeRate.DKKperEUR;
- cim:CurrencyExchangeRate.NOKperEUR;
- cim:CurrencyExchangeRate.SEKperEUR.

New QUDT terms

QUDT ships a rich unit system but lacks fractional currency-per-energy units and a dimensionless ExchangeRate quantity kind; the project authors them in cim-qudt.ttl.

Energy-price units — two new qudt:Unit individuals:

unit:CCY_EUR-PER-MegaW-HR (symbol "€/(MW·h)");
unit:CCY_NOK-PER-MegaW-HR (symbol "NOK/(MW·h)").

Each declared with:

correct qudt:hasDimensionVector;
qudt:conversionMultiplier into SI (2.777…×10⁻¹⁰);
three-way qudt:hasFactorUnit decomposition into unit:CCY_EUR / unit:CCY_NOK, unit:MegaW and unit:HR;
qudt:hasQuantityKind link to quantitykind:CostPerEnergy.
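The conversion multiplier follows from the energy part of the unit alone, assuming the currency factor contributes a multiplier of 1 (currency is treated as dimensionless, as noted for the exchange-rate quantity kind). A quick check in Python:

```python
# 1 MW = 1e6 W and 1 h = 3600 s, so 1 MW*h = 3.6e9 J.
JOULES_PER_MEGAWATT_HOUR = 1e6 * 3600

# A price "per MW*h" converts to "per joule" by dividing by that factor.
conversion_multiplier = 1 / JOULES_PER_MEGAWATT_HOUR

print(f"{conversion_multiplier:.4e}")
# prints: 2.7778e-10
```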

Currency-per-currency units — three factor-unit fractions linked to the new quantitykind:ExchangeRate:

unit:CCY_DKK-PER-CCY_EUR;
unit:CCY_NOK-PER-CCY_EUR;
unit:CCY_SEK-PER-CCY_EUR.

The new quantitykind:ExchangeRate:

declared with the dimensionless dimension vector qkdv:A0E0L0I0M0H0T0D1 (currency being dimensionless in the QUDT model);
carries a qudt:informativeReference to the Wikipedia article on exchange rates.

Upstream proposals. All these additions are proposed upstream:

in qudt-public-repo via issue #1283 "add quantitykind:ExchangeRate";
companion CIM-side gap tracked by Inst4CIM-KG issue #168 "how to represent fractional units (Price and CurrencyExchange)".

The post-load 08-add-qudt-terms.ru step (see Instance-Data Loading) then federates out to the public QUDT endpoint and pulls only those QUDT terms actually referenced by the data into a dedicated https://cim.ucaiug.io/qudt graph, so locally-authored new terms and upstream-sourced existing terms sit side by side.

Alignment fills in external vocabularies

cim-external.ttl re-declares external-vocabulary terms that neither CIM nor the upstream ontologies publish in a directly-usable form:

dct:requires is promoted to a proper owl:ObjectProperty so that CGMES Model.DependentOn-style profile-dependency links are correctly typed;
four GeoSPARQL core terms — geo:Feature, geo:Geometry, geo:asWKT, geo:hasGeometry — are declared with labels and domain/range so the 04-add-WKT.ru post-load step has a concrete schema to emit against;
qudt:hasUnit is declared as an owl:ObjectProperty (QUDT itself leaves the full axiomatization off the public endpoint);
qudt:plainTextDescription rdfs:subPropertyOf dct:description lets any tool consuming dct:description automatically see QUDT textual descriptions as well.

Statnett supplementary vocabulary (CIM17 vendor / TSO extensions)

Statnett's operational CIM17 exports reference vendor- and TSO-specific extensions not part of any published IEC profile:

alstom: → http://www.alstom.com/grid/CIM-schema-cim15-extension# (Alstom EMS);
entsoe_sch: → http://entsoe.eu/CIM/SchemaExtension/3/1# (ENTSO-E schema);
form: → https://form.statnett.no/voc/form-ksd-extensions# (Form KSD);
nek: → http://NEK.no/NK57/CIM/CIM100-Extension/1/0# (NEK national);
statnett: → http://www.statnett.no/CIM-schema-cim15-extension# (Statnett local).

Without action these would leave orphan predicates in the loaded KG. WP1 consolidates the subset referenced by Nordic44 and Telemark-120 into statnett-cim17-extra.ttl — "Statnett Supplementary Vocabulary", dct:modified 2025-11-18 — which re-declares every extension property as a proper owl:DatatypeProperty / owl:ObjectProperty with rdfs:label, rdfs:comment, rdfs:domain, rdfs:range, UML uml:hasStereotype and preserved EA uml:id. Examples:

Alstom DC-tie-corridor ramp controls: alstom:DCTieCorridor.manualRegXRampDC, rampDCLimitOp, rampDCLimitRef;
current-relay and protective-action back-links: alstom:CurrentRelay.CurrentRelayAction, alstom:Measurement.ProtectiveAction*;
ENTSO-E entsoe_sch:OperationalLimitType.limitType pointer into entsoe_sch:LimitTypeKind.

Because the file is loaded by the standard pipeline through load/ontologies.txt, every term present in the instance data is defined somewhere in the schema graph and the SPARQL generator neither hallucinates URIs nor falls back to a blind predicate lookup.

LLM Guidance via `cims:pragmatics`

CIM's official rdfs:label and rdfs:comment strings are written for human schema engineers and routinely omit exactly the information an LLM needs in order to generate correct SPARQL:

whether a class is AC- or DC-only;
which string literals are legal values of a type enumeration;
how to reuse an mRID when federating out to a non-RDF system;
what the sign convention of a Boolean flag actually means on a given kind of equipment.

Asking the LLM to recover these conventions from the stock CIM documentation proves unreliable; adding them to the system prompt does not scale past a handful of terms and quickly exhausts the prompt window.

Solution: cims:pragmatics. A dedicated annotation property declared in the simplified ontology as an owl:DatatypeProperty:

definition: "Specific SPARQL guidance and important practical information for an LLM. Applies to Classes and Properties.";
attached directly to the term it governs.

Because the annotation is a plain triple inside cim-subset-pretty.ttl, it travels with the schema into whatever context window or retrieval step the chatbot uses, so every hint is delivered to the model precisely when — and only when — the term it refers to is in scope.

Representative example: `cim:Line`

Defined in CIM generically as "Contains equipment beyond a substation belonging to a power transmission line" — wording a naive LLM would happily apply to DC interconnectors as well as AC lines, producing silently-wrong answers for "all lines" in a mixed AC/DC grid. The ontology therefore carries one extra triple:

cim:Line a owl:Class ;
  rdfs:label "Line" ;
  rdfs:comment "Contains equipment beyond a substation belonging to a power transmission line."@en ;
  cims:pragmatics "This means only AC Line (not DC)" ;
  rdfs:subClassOf cim:EquipmentContainer .

With this single hint, the model consistently disambiguates cim:Line from cim:DCLine in generated SPARQL without any additional prompt engineering.

Other kinds of guidance carried by the same mechanism

cim:ConductingEquipment — restricts the scope to AC.
cim:Measurement.measurementType (typed as xsd:string in CIM) — enumerates the legal values present in the Statnett data: "Possible values: CurrencyExchange-Actual, Price-Actual, ThreePhaseActivePower, ThreePhaseActivePower-Flow-Estimated", effectively turning a free-text column into a closed vocabulary.
cim:Analog.positiveFlowIn — encodes the domain-specific sign convention for bidding-zone borders: "true" means the flow is from BiddingZoneOne to BiddingZoneTwo, "false" means the opposite.
cimr:Measurement.isInCognite — instructs the agent to "Use the mRID of this measurement to query a timeseries in the Cognite API by external_id", bridging the semantic and time-series halves of the architecture (§T1.3).
cim:Diagram / cimd:Diagram / cimd:DiagramConfiguration — tells the UI layer which classes are renderable at all and which focus-node types a given Visual Graph saved configuration accepts.

In each case a single annotation, authored once by a domain expert and stored next to the term it qualifies, removes an entire class of failure mode from the LLM's generated SPARQL — without bloating the system prompt or requiring any changes to the chatbot's code path.

Instance-Data Loading

Datasets loaded:

Nordic44 — transmission level (44-bus reference model split across Enterprise, Grid and Network Code sub-profiles);
Telemark-120 — distribution level (down to 120 V, split across MV1 and LV1 sub-profiles);
the PowSyBl and Visual Graph diagram catalogues.

Pipeline driver: load/load.py, driven by three plain-text manifests:

data/repo-config.ttl — repository configuration (ruleset, indexing, plugin settings);
load/ontologies.txt — every CIM/CGMES/NC profile plus cimr:, cimd:, unit/currency/vendor, QUDT, Cognite-metadata and CIM4Enterprise schemas;
load/instances.txt — TriG files making up the two grid datasets.

Pipeline stages

(1) Repository provisioning. load.py (re-)creates the target repository from repo-config.ttl via the GraphDB REST API:

optionally deletes an existing one under --force;
subsequent loads start from a known-clean state with empty base ruleset, context indexing on, and FTS staged for later activation.

(2) Ontology loading. Each URL in ontologies.txt:

is downloaded;
gzip-compressed on the fly;
posted into a single ontology named graph https://cim.ucaiug.io/ns#graph — so the full schema stack lives in one place and can be swapped atomically.
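The compress-and-post step might look roughly like the sketch below. It is not taken from load.py: the helper name is hypothetical, the endpoint path follows the RDF4J graph-store convention that GraphDB exposes, and the use of a `Content-Encoding: gzip` header is an assumption. The function only builds the request, so nothing is sent.

```python
import gzip
import urllib.parse
import urllib.request

ONTOLOGY_GRAPH = "https://cim.ucaiug.io/ns#graph"


def build_upload_request(base_url: str, repo: str, turtle_bytes: bytes):
    """Prepare a POST that adds gzipped Turtle to the single ontology graph."""
    body = gzip.compress(turtle_bytes)
    query = urllib.parse.urlencode({"graph": ONTOLOGY_GRAPH})
    url = f"{base_url}/repositories/{repo}/rdf-graphs/service?{query}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "text/turtle", "Content-Encoding": "gzip"},
    )


# Hypothetical server address and repository name.
req = build_upload_request(
    "http://localhost:7200", "cim", b"@prefix ex: <http://example.org/> .\n"
)
```

Posting every ontology into the same named graph is what allows the whole schema stack to be cleared and reloaded in one operation.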

(3) Instance-data loading. Each URL in instances.txt:

loaded as gzipped TriG;
populates the named graph declared inside the file itself, giving the KG a clean per-profile and per-dataset graph layout that downstream tools can query or isolate.

(4) Post-load SPARQL pipeline. load.py executes every .ru file in data/queries/ in filename order, each performing one narrowly-scoped transformation that cannot be expressed statically in the schema:

01-add-inference.ru — registers the custom cim.pie ruleset and triggers a full reinference pass so cimr: shortcuts, inverse relations and PropChain3/PropChain4 chains are materialized.
02-add-mridSignificantPart.ru — extracts the stable leading segment of every cim:IdentifiedObject URN so full-text and autocomplete indexes can match mRIDs across CGMES export variants.
03-add-missing-mrid.ru — backfills cim:IdentifiedObject.mRID on resources exported without one.
04-add-WKT.ru — builds GeoSPARQL geo:wktLiteral geometries from the cim:PositionPoint chain, grouping points by cim:Location and sequenceNumber, emitting POINT or LINESTRING as appropriate, and promoting the associated cim:PowerSystemResource to a geo:Feature.
05-delete-redundant-geo.ru — removes all cim:PositionPoint individuals and any leftover geo:asGML/geo:asGeoJSON encodings, leaving the harmonized geo:wktLiteral as the sole geometry representation.
07-enable-geosparql.ru — activates the GraphDB GeoSPARQL plugin so those literals are indexed for spatial joins and bounding-box queries.
08-add-qudt-terms.ru — federates out to the public QUDT SPARQL endpoint and pulls only those units, quantity kinds and related terms referenced by the loaded data into a dedicated https://cim.ucaiug.io/qudt graph, avoiding the cost of mirroring the full QUDT vocabulary.
09-clean-up-crap.ru — strips exporter noise (dcat:Resource, dct:Resource, CGMES Model metadata); removes a spurious dct:description rdfs:domain dcat:Dataset axiom that would otherwise mis-type all of QUDT; drops rdfs:label/skos:altLabel/dct:description values in languages other than English and Norwegian; removes two redundant rdfs:subClassOf cim:PowerSystemResource assertions that the cimr:EquipmentOrContainer hierarchy already covers.
10-add-autocomplete-labels.ru — reconfigures the autocomplete index to match on cim:IdentifiedObject.name, cim:IdentifiedObject.aliasName and cim:CoordinateSystem.crsUrn rather than the default rdfs:label, then enables the index.
11-compute-rank.ru — computes the RDF Rank used downstream by the Visual Graph tool and the chatbot to prefer salient resources when disambiguating names.
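Running the updates "in filename order" relies on the zero-padded numeric prefixes, which make lexicographic order match the intended sequence. A minimal sketch of that selection step follows, with a hypothetical helper name; how load.py actually submits each update is not shown.

```python
from pathlib import Path


def ordered_updates(query_dir):
    """Return the .ru files of a directory sorted by file name."""
    return sorted(Path(query_dir).glob("*.ru"), key=lambda p: p.name)
```

With two-digit prefixes, `02-...` sorts before `10-...`; a file named `2-...` would not, which is why the padding matters.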

Properties of the resulting pipeline

idempotent under --force;
runs against both Statnett RNDP and Graphwise CIM environments without code changes (a parallel entry point load/rndp-load.py adapts the same flow for the RNDP bastion);
produces on either side a repository with:
- the same named-graph layout;
- the same inferred closure (≈ 60 k inferred for ≈ 63.5 k explicit triples on Nordic44, a 1.95× expansion);
- the same GeoSPARQL indexes;
- the same autocomplete configuration.

Integration of Non-Semantic and External Data (T1.3)

To extend the reach and utility of the Knowledge Graph, WP1 integrates data sources that do not exist in semantic form.

Time-Series Integration

The most challenging technical aspect of WP1 is the integration of time-series readings (sensor data, production/consumption fluctuations) with the Knowledge Graph.

Why not store time-series as RDF? Duplicating millions of timestamped values into RDF would be:

prohibitively expensive to store, index and reason over;
a standing obligation for the KG to track every new measurement ingested by the SCADA layer.

Architecture: clean separation of concerns.

Knowledge Graph keeps structural and descriptive metadata — what a measurement is:
- which quantity it measures;
- which cim:Terminal and cim:Equipment it is attached to;
- which substation and voltage level contain it;
- its cim:Measurement.measurementType, unit and cim:Analog.positiveFlowIn flow-direction semantics.
Cognite Data Fusion (CDF) — Statnett's operational time-series platform — keeps the high-volume numerical readings.

Stitched together by a deterministic identifier contract:

the cim:IdentifiedObject.mRID of each cim:Analog measurement is reused verbatim as the Cognite external_id;
the cimr:Measurement.isInCognite Boolean flag (materialized at load time, carrying the cims:pragmatics LLM hint introduced in §T1.2) tells the agent exactly which measurements have a corresponding time-series on the Cognite side, so the model never issues a datapoint fetch guaranteed to miss.
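The identifier contract can be sketched as a small filter over the rows returned by the structural SPARQL step. The row layout (keys `mrid` and `isInCognite`) and the function name are assumptions for illustration; only the rule that the mRID doubles as the Cognite external_id comes from the text.

```python
def external_ids_for(rows):
    """Keep measurements flagged as present in Cognite; their mRID is the external_id."""
    return [row["mrid"] for row in rows if row.get("isInCognite") is True]


# Hypothetical result rows from the GraphDB step.
rows = [
    {"mrid": "aaa-111", "isInCognite": True},
    {"mrid": "bbb-222", "isInCognite": False},
    {"mrid": "ccc-333"},  # flag absent: treated as not available
]
ids = external_ids_for(rows)
```

Filtering on the flag before calling the time-series API is what prevents lookups for measurements that have no series on the Cognite side.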

Two-step federated workflow at query time — rather than a single SPARQL statement:

SPARQL against GraphDB — translate a natural-language question (e.g. "what is the active-power flow through that 420 kV transformer at Sylling over the last 24 hours?") into the concrete set of mRIDs, units, measurement types and flow-direction flags that answer the structural half of the question.
Cognite tools (described in WP2) — hand those mRIDs to dedicated tools and call the Cognite API directly:
- Retrieve time series — maps each mRID to its Cognite external_id and resolves the series' metadata;
- Retrieve datapoints — fetches raw or aggregated datapoints (min, max, average, step-interpolation) over a user-specified timestamp range;
- user-scoped access enforced end-to-end via OAuth2 On-Behalf-Of (OBO), so the data operation runs under the authenticated user's own CDF permissions rather than a shared service principal.

Benefits of this division of labour:

KG remains a lightweight semantic index — small enough to inference over, serialize into an LLM prompt and visualize in the graph tools;
Cognite does what it is designed for — serving large, appropriately-aggregated numerical slices at interactive latency;
enables genuinely hybrid questions in a single coherent answer, combining:
- topology — "all measurements on equipment connected through this substation via cimr:connectedThroughPart";
- ontology-driven filtering — "only active-power Analogs with positiveFlowIn = true and units in MW";
- time-series aggregation — "averaged over the last week, hourly buckets";
without ever materializing time-series values as RDF triples.

Open Geospatial Data via SPARQL Federation (OpenStreetMap)

Beyond Statnett's own CIM assets, WP1 also experiments with enriching the KG at query time with public geospatial data, rather than by bulk ingestion.

The enabler is twofold:

Statnett side — the 04-add-WKT.ru step of the load pipeline:
- normalizes every cim:Location chain into a GeoSPARQL geo:wktLiteral;
- promotes the associated cim:PowerSystemResource to a geo:Feature;
- so grid objects become directly comparable with any other GeoSPARQL dataset.
OpenStreetMap side — the planet is published as RDF/GeoSPARQL by the Freiburg osm2rdf project and served on the public QLever endpoint https://qlever.dev/api/osm-planet:
- administrative boundaries, rivers, roads and other real-world features are already addressable as triples with WKT geometries.
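The WKT normalization on the Statnett side can be pictured as follows. This Python sketch mirrors the described behaviour (order points by sequenceNumber, emit POINT or LINESTRING) but is not the 04-add-WKT.ru query; the tuple layout and the assumption that x is longitude and y is latitude are illustrative.

```python
def location_to_wkt(points):
    """Build a WKT string from (sequence_number, x, y) tuples of one cim:Location."""
    if not points:
        raise ValueError("a location needs at least one position point")
    # Order the points by sequence number so line geometry follows the route.
    ordered = sorted(points, key=lambda p: p[0])
    coords = [f"{x} {y}" for _, x, y in ordered]
    if len(coords) == 1:
        return f"POINT ({coords[0]})"
    return "LINESTRING (" + ", ".join(coords) + ")"


substation_wkt = location_to_wkt([(1, 10.5, 59.9)])
line_wkt = location_to_wkt([(2, 11.0, 60.0), (1, 10.0, 59.0)])
```

Once every located resource carries one such literal, it can be compared directly against the WKT geometries served for OpenStreetMap features.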

Mechanism. A single SPARQL SERVICE clause is sufficient to:

fetch (e.g.) the polygon of a Norwegian county or municipality from OSM;
intersect it — via geof:sfIntersects or the indexed geo:sfIntersects magic predicate — with the local CIM geometries in GraphDB;
without duplicating the (very large) OSM corpus into Statnett's repository and without any bespoke ETL or schema-alignment step.

Questions unlocked that the raw CIM cim:PositionPoint data cannot answer on its own:

"which substations and AC line segments sit inside the county of Viken?";
"which power lines pass within a given distance of a lake or a protected area?";
"attach every geolocated grid asset to its administrative territorial hierarchy".

Done on demand, so any update to OSM is immediately reflected in the answers. The approach, the WKT conversion query, the YasGUI-based map visualizations and worked federated queries against OSM are documented in the project blog post Mapping Electrical Resources with GeoSPARQL.

During Work Package 1, effort is split across two separate infrastructure environments where the Talk2PowerSystem project is developed and deployed:

Statnett's side — the Research and Development Platform (RNDP) https://rndp.statnett.no/, where Graphwise assists with the infrastructure and deployment by implementing Helm charts (IaC) for the different services and promoting Helm deployments via Jupyter Hub as a bastion.
Graphwise's own infrastructure — the CIM environment https://cim.ontotext.com/chat/, where Graphwise designs and implements the full infrastructure and deployment process.
