Time Series

Nikola Tulechki edited this page Apr 29, 2026 · 7 revisions

Time-Series Integration

This page describes how Talk2PowerSystem stitches Statnett's time-series readings (day-ahead energy prices and inter-zonal active-power flows) into the Knowledge Graph (KG), and how the LLM agent uses the resulting setup to answer time-series questions in natural language.

Why not put time-series in RDF?

Duplicating millions of timestamped readings into RDF would be prohibitively expensive to store, index and reason over, and would tie the KG to the cadence of the SCADA / market-data ingestion layer. Instead, Talk2PowerSystem cleanly separates the two stores:

  • The Knowledge Graph holds the structural and descriptive metadata about each measurement — what it is, which equipment / bidding zone / border it is attached to, its measurement type, its unit and its flow-direction semantics.
  • Cognite Data Fusion (CDF) — Statnett's operational time-series platform — holds the high-volume numerical readings.

The two stores are stitched together by a deterministic identifier contract: the cim:IdentifiedObject.mRID of each cim:Analog measurement is reused verbatim as the Cognite external_id (see §Mapping mRID ↔ external_id below for the practical wrinkle). The cimr:Measurement.isInCognite Boolean flag (materialized at load time and carrying a cims:pragmatics hint for the LLM) tells the agent exactly which measurements have a corresponding time-series on the Cognite side, so the model never issues a datapoint fetch that is guaranteed to miss.

Time-series in Cognite

The final list of CDF time-series exposed to the agent is captured in cognite-time-series.ttl: 26 cim:Analog measurements in two families.

| Family | Count | cim:Measurement.measurementType | Unit | Bearing resource | Description |
|---|---|---|---|---|---|
| Power flow | 12 | ThreePhaseActivePower-Flow-Estimated | unit:MegaW (cim:UnitSymbol.W × cim:UnitMultiplier.M) | nc:BiddingZoneBorder | Aggregated active-power flow per corridor / line in Norway over 10 seconds (CDF naming Elspot ZONE1-ZONE2 MW_agg_value); 7 borders are modelled with cim:Analog.positiveFlowIn = true, 5 with false |
| Energy price | 14 | Price-Actual | unit:CCY_EUR-PER-MegaW-HR (cim:UnitSymbol.EURperMWh) | nc:BiddingZone | Day-ahead (Elspot) energy prices for the 14 Nordic / European bidding zones (NO1–NO5, SE1–SE4, DK1, DK2, FI, EE, DE), all in EUR/MWh |

The mapping is built from the subset of CDF records that carry an RNDP_mrid metadata field (i.e. for which Statnett added the bare mRID as a second external_id). Two CDF record classes are deliberately not exposed to the agent:

  • the per-zone NOK price feeds (Kraftpriser_*_NOK_13) — deemed obsolete by Statnett, the EUR feeds are authoritative;
  • the 3 currency exchange rates (Valutakurs_EUR_NOK / EUR_SEK / EUR_DKK) — modelled in QUDT (see §Additions to QUDT and the quantitykind:ExchangeRate work below) but not currently linked to a cim:PowerSystemResource.

Power Flow

12 time-series of inter-zonal active-power flow, fed from the 10-second aggregated observations_grid_flow_agg_cor_frq_10s_avro_v3 Kafka topic. Example: power flow on the NO3 ↔ NO4 border.

{
    "external_id": "9bb00faf-0f2f-831a-e040-1e828c94e833_agg_value",
    "name": "Elspot NO3-NO4 MW_agg_value",
    "is_string": false,
    "metadata": {
        "source": "eTerra",
        "topic": "observations_grid_flow_agg_cor_frq_10s_avro_v3",
        "measurement_type": "ThreePhaseActivePower",
        "timeseries_type": "agg_value",
        "mrid": "9bb00faf-0f2f-831a-e040-1e828c94e833",
        "unit": "MW",
        "RNDP_mrid": "9bb00faf-0f2f-831a-e040-1e828c94e833"
    },
    "unit": "MW",
    "asset_id": 7963992857657991,
    "description": "Aggregated flow per corridor/line in Norway over 10 seconds",
    "data_set_id": 718055453943647,
    "id": 544839464297326
}

Price

14 time-series of day-ahead (Elspot) energy prices, one per bidding zone, all in EUR/MWh. Example: energy price for NO1.

{
    "external_id": "Kraftpriser_NO1_13",
    "name": "Kraftpriser_NO1",
    "is_string": false,
    "metadata": {
        "PRIS_TYPE": "PRI7004",
        "mrid": "Kraftpriser_NO1",
        "measurementType": "Price",
        "PRIS_OMRAADE_TYPE": "1.0",
        "PRIS_OMRAADE": "NO1",
        "timeseriesType": "value",
        "table": "MMS.RAPPORT_KRAFTPRIS_ELSPOT_NO",
        "PRIS_VALUTA": "EUR",
        "RNDP_mrid": "7d90224c-ab77-4786-8e08-7f9b275c46da"
    },
    "unit": "EUR per MWh",
    "asset_id": 5759894790406853,
    "description": "Kraftpriser for prisområde NO1",
    "data_set_id": 7519740957219860,
    "id": 1306479619538873
}

The corresponding NOK price feeds (Kraftpriser_NO1_NOK_13 and friends) and the three EUR currency exchange rates (Valutakurs_EUR_NOK / EUR_SEK / EUR_DKK) exist in CDF but do not carry an RNDP_mrid metadata field, so they are not surfaced in the KG and the agent does not see them. The QUDT modelling work for NOK/MWh and the exchange-rate quantity kind was nevertheless completed in anticipation of those feeds being re-enabled — see §Additions to QUDT.

Time-series in RDF

The Cognite metadata above is materialized as cim:Analog nodes in the KG by the cognite-metadata loading step (loaded into the cim:Measurement.isInCognite.graph named graph). Three properties carry the federation contract:

  • The mapping between a KG node and its CDF time-series is done via shared mRIDs (cim:IdentifiedObject.mRID on the KG side, mRID / RNDP_mrid in Cognite metadata).
  • Each cim:Analog is connected to its bearing resource (a nc:BiddingZone for prices, a nc:BiddingZoneBorder for flows) via cim:Measurement.PowerSystemResource / cim:PowerSystemResource.Measurements.
  • The Boolean cimr:Measurement.isInCognite flag tells the agent that a CDF time-series exists for this measurement.

Example — the day-ahead energy price for NO1 (one of the 14 Price-Actual records in cognite-time-series.ttl):

<urn:uuid:7d90224c-ab77-4786-8e08-7f9b275c46da> a cim:IdentifiedObject, cim:Measurement, cim:Analog ;
  cim:IdentifiedObject.description "Kraftpriser for prisområde NO1" ;
  cim:IdentifiedObject.mRID "7d90224c-ab77-4786-8e08-7f9b275c46da" ;
  cim:IdentifiedObject.name "Kraftpriser_NO1" ;
  qudt:hasUnit unit:CCY_EUR-PER-MegaW-HR ;
  cim:Measurement.PowerSystemResource <urn:uuid:83aa03e5-5fd0-431c-b8dd-acc08c21ed6a> ;
  cim:Measurement.unitSymbol cim:UnitSymbol.EURperMWh ;
  cim:Measurement.measurementType "Price-Actual" ;
  cim:Analog.positiveFlowIn true ;
  cimr:Measurement.isInCognite true .

<urn:uuid:83aa03e5-5fd0-431c-b8dd-acc08c21ed6a> a cim:IdentifiedObject, cim:PowerSystemResource, nc:BiddingZone ;
  cim:IdentifiedObject.name "NO1" ;
  cim:PowerSystemResource.Measurements <urn:uuid:7d90224c-ab77-4786-8e08-7f9b275c46da> .

For power-flow measurements, the connecting resource is a nc:BiddingZoneBorder and cim:Analog.positiveFlowIn carries the direction semantics: true means the flow is from the first to the second zone in the border name, false is the opposite direction. In the final dataset, 7 of the 12 borders are modelled as positiveFlowIn = true and 5 as false.
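The direction rule can be sketched in a few lines of Python. This is only an illustration, not code from the project: the function name flow_from_to is hypothetical, the border name is assumed to have the form "ZONE1-ZONE2" (as in the CDF name Elspot NO3-NO4 MW_agg_value), and a negative reading is assumed to mean flow against the positive direction.

```python
def flow_from_to(border_name: str, positive_flow_in: bool,
                 reading_mw: float, src: str, dst: str) -> float:
    """Return the flow in the src -> dst direction, in MW (hypothetical helper)."""
    # Assumption: the border name is "<first zone>-<second zone>".
    first, second = border_name.split("-")
    if {src, dst} != {first, second}:
        raise ValueError(f"{src}->{dst} is not a direction of border {border_name}")

    # positiveFlowIn = true: a positive reading flows first -> second.
    # positiveFlowIn = false: a positive reading flows second -> first.
    positive_direction = (first, second) if positive_flow_in else (second, first)

    # Flip the sign when the caller asks for the opposite direction.
    return reading_mw if (src, dst) == positive_direction else -reading_mw


# NO1-NO3 is modelled with positiveFlowIn = false, so a positive reading
# is interpreted here as flow from NO3 to NO1.
print(flow_from_to("NO1-NO3", False, 100.0, "NO3", "NO1"))
print(flow_from_to("NO1-NO3", False, 100.0, "NO1", "NO3"))
```

Keeping the sign handling in one place like this means the same reading can answer both "flow from NO1 to NO3" and "flow from NO3 to NO1" without a second datapoint fetch.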

The cimr:Measurement.isInCognite flag is declared in cimr.ttl and surfaces a cims:pragmatics hint that is injected into the LLM system prompt:

cimr:Measurement.isInCognite a owl:DatatypeProperty ;
   rdfs:label "Measurement is in Cognite" ;
   rdfs:comment "Flag indicating if a cim:Analog measurement represents a Cognite timeseries" ;
   rdfs:domain cim:Analog ;
   rdfs:range xsd:boolean ;
   cims:pragmatics """Use the mRID of this measurement to query a timeseries in the Cognite API by "external_id" """ ;
   rdfs:isDefinedBy cimr: .

Companion pragmatics for cim:Measurement.measurementType and cim:Analog.positiveFlowIn further constrain the agent's SPARQL generation. The pragmatics enumerate four possible values for measurementType (CurrencyExchange-Actual, Price-Actual, ThreePhaseActivePower, ThreePhaseActivePower-Flow-Estimated); only the two listed in the table above (Price-Actual and ThreePhaseActivePower-Flow-Estimated) appear in the current dataset.

List of all time-series in the KG

PREFIX cimr: <https://cim.ucaiug.io/rules#>
PREFIX cim: <https://cim.ucaiug.io/ns#>
SELECT * WHERE {
    ?x a cim:Analog ;
       cimr:Measurement.isInCognite true ;
       cim:Measurement.measurementType ?measurementType ;
       cim:IdentifiedObject.name ?name ;
       cim:IdentifiedObject.description ?desc .
}

Run on cim.ontotext.com

Mapping mRID ↔ external_id

The architectural contract is "the mRID is the Cognite external_id". In practice the existing CDF time-series were originally created with non-mRID external_ids (e.g. Kraftpriser_NO3_NOK_13, 9bb00faf-…_estimated_value). To honour the contract without breaking historical references, Statnett added the bare mRID to each time-series as a second external_id (CDF supports multiple external IDs per time-series), surfaced in the time-series metadata as RNDP_mrid. A dump of the mapping is committed to the repo as cognite-time-series-mrid-mapping.csv:

external_id,mrid
Kraftpriser_NO3_NOK_13,3d972481-ddff-4e7a-bec7-d581a6e4b36a
9bb00faf-0f2f-831a-e040-1e828c94e833_estimated_value,9bb00faf-0f2f-831a-e040-1e828c94e833
Valutakurs_EUR_SEK,b6e55917-e9cf-4559-ad8f-548efea41588

The RNDP_mrid metadata field is what the Retrieve Time Series tool (retrieve_time_series) actually filters on; see §Cognite query tools.
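As a rough illustration of how such a dump can be used, the sketch below loads the three rows shown above into a dictionary keyed by mRID. It uses only the Python standard library; the function name load_mapping is hypothetical, and the real file would be read from disk rather than from an inline string.

```python
import csv
import io

# The three sample rows quoted above; the real CSV has more rows.
MAPPING_CSV = """\
external_id,mrid
Kraftpriser_NO3_NOK_13,3d972481-ddff-4e7a-bec7-d581a6e4b36a
9bb00faf-0f2f-831a-e040-1e828c94e833_estimated_value,9bb00faf-0f2f-831a-e040-1e828c94e833
Valutakurs_EUR_SEK,b6e55917-e9cf-4559-ad8f-548efea41588
"""


def load_mapping(text: str) -> dict:
    """Build an mRID -> original external_id lookup (hypothetical helper)."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(text)):
        mrid = row["mrid"]
        # Assumption: each mRID identifies exactly one time-series.
        if mrid in mapping:
            raise ValueError(f"duplicate mRID in mapping: {mrid}")
        mapping[mrid] = row["external_id"]
    return mapping


mapping = load_mapping(MAPPING_CSV)
print(mapping["b6e55917-e9cf-4559-ad8f-548efea41588"])
```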

Additions to QUDT

To represent monetary units cleanly, several fractional units were contributed to the QUDT ontology, e.g. "Euro per Megawatt Hour":

unit:CCY_EUR-PER-MegaW-HR a qudt:Unit ;
    qudt:applicableSystem sou:SI ;
    qudt:conversionMultiplier 0.000000000277777777777777777777777777778 ;
    qudt:hasDimensionVector qkdv:A0E0L-2I0M-1H0T2D0 ;
    qudt:hasFactorUnit [ qudt:exponent -1 ; qudt:hasUnit unit:HR ] ;
    qudt:hasFactorUnit [ qudt:exponent -1 ; qudt:hasUnit unit:MegaW ] ;
    qudt:hasFactorUnit [ qudt:exponent  1 ; qudt:hasUnit unit:CCY_EUR ] ;
    qudt:hasQuantityKind quantitykind:CostPerEnergy ;
    qudt:plainTextDescription "Unit for measuring the cost of electricity." ;
    qudt:symbol "€/(MW·h)" ;
    rdfs:label "Euro per Megawatt Hour"@en .

and "Danish Krone per Euro":

unit:CCY_DKK-PER-CCY_EUR a qudt:Unit ;
    qudt:hasQuantityKind quantitykind:ExchangeRate ;
    qudt:hasFactorUnit [ qudt:hasUnit unit:CCY_DKK ; qudt:exponent  1 ] ;
    qudt:hasFactorUnit [ qudt:hasUnit unit:CCY_EUR ; qudt:exponent -1 ] ;
    rdfs:label "Danish Krone per Euro" ;
    qudt:symbol "DKK/EUR" ;
    qudt:plainTextDescription "Exchange rate of Danish Krone to Euros" .

A new quantity kind, quantitykind:ExchangeRate, was added to host the currency-pair fractions (NOK/EUR, SEK/EUR, DKK/EUR):

quantitykind:ExchangeRate a qudt:QuantityKind ;
    qudt:applicableUnit unit:CCY_NOK-PER-CCY_EUR ;
    qudt:applicableUnit unit:CCY_SEK-PER-CCY_EUR ;
    qudt:applicableUnit unit:CCY_DKK-PER-CCY_EUR ;
    qudt:hasDimensionVector qkdv:A0E0L0I0M0H0T0D1 ;
    qudt:informativeReference "https://en.wikipedia.org/wiki/Exchange_rate"^^xsd:anyURI ;
    qudt:plainTextDescription "Rate at which one currency will be exchanged for another currency" ;
    rdfs:label "Exchange rate"@en .
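The qudt:conversionMultiplier on unit:CCY_EUR-PER-MegaW-HR above can be checked with a little arithmetic: it is the factor that turns a price per megawatt hour into a price per joule, the SI unit of energy. The sketch below reproduces that number; the constant and function names are illustrative only.

```python
from fractions import Fraction

# 1 MW = 1,000,000 W and 1 h = 3,600 s, so 1 MWh = 3.6e9 J.
JOULES_PER_MEGAWATT_HOUR = 1_000_000 * 3_600

# EUR/MWh -> EUR/J: divide by the number of joules in one MWh.
CONVERSION_MULTIPLIER = Fraction(1, JOULES_PER_MEGAWATT_HOUR)


def eur_per_mwh_to_eur_per_joule(price_eur_per_mwh: float) -> float:
    """Convert a price in EUR/MWh to EUR/J (illustrative helper)."""
    return price_eur_per_mwh * float(CONVERSION_MULTIPLIER)


# Prints roughly 2.7778e-10, matching the multiplier in the Turtle snippet.
print(float(CONVERSION_MULTIPLIER))
```

The same reasoning explains the dimension vector: energy has dimensions L2 M1 T-2, so a cost per energy carries the inverse exponents L-2 M-1 T2.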

Two-step federated query workflow

At query time, the agent answers a time-series question with a two-step federated workflow rather than a single SPARQL statement:

  1. SPARQL against GraphDB — translate the natural-language question into the concrete set of mRIDs, units, measurement types and flow-direction flags that answer the structural half of the question. Typical filters used by the agent:

    ?m a cim:Analog ;
       cimr:Measurement.isInCognite true ;
       cim:Measurement.measurementType "Price-Actual" ;
       cim:Measurement.PowerSystemResource ?zone ;
       cim:IdentifiedObject.mRID ?mrid ;
       qudt:hasUnit unit:CCY_EUR-PER-MegaW-HR .
  2. Cognite tools — hand the resulting mRIDs to the Cognite tools, which call the CDF API directly to fetch the actual datapoints, with user-scoped access enforced end-to-end via the OAuth2 On-Behalf-Of (OBO) flow.

This division of labour keeps the KG a lightweight semantic index, small enough to run inference over and to serialize into an LLM prompt, while CDF does what it is designed for: serving large, appropriately-aggregated numerical slices at interactive latency.
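The hand-off between the two steps can be sketched as follows. This is a minimal offline illustration, not the agent's actual code: build_mrid_query and mrids_from_results are hypothetical names, the sample response is hand-written in the standard SPARQL 1.1 JSON results format using the NO1 mRID from this page, and no HTTP call to GraphDB or CDF is made.

```python
PREFIXES = """\
PREFIX cimr: <https://cim.ucaiug.io/rules#>
PREFIX cim: <https://cim.ucaiug.io/ns#>"""

# The four measurementType values enumerated in the pragmatics.
KNOWN_MEASUREMENT_TYPES = {
    "CurrencyExchange-Actual",
    "Price-Actual",
    "ThreePhaseActivePower",
    "ThreePhaseActivePower-Flow-Estimated",
}


def build_mrid_query(measurement_type: str) -> str:
    """Step 1: SPARQL that resolves a measurement type to mRIDs."""
    # Only interpolate values from the known set into the query text.
    if measurement_type not in KNOWN_MEASUREMENT_TYPES:
        raise ValueError(f"unknown measurementType: {measurement_type!r}")
    return f"""{PREFIXES}
SELECT ?mrid ?resource WHERE {{
    ?m a cim:Analog ;
       cimr:Measurement.isInCognite true ;
       cim:Measurement.measurementType "{measurement_type}" ;
       cim:Measurement.PowerSystemResource ?resource ;
       cim:IdentifiedObject.mRID ?mrid .
}}"""


def mrids_from_results(results: dict) -> list:
    """Extract ?mrid bindings from a SPARQL JSON results document."""
    return [b["mrid"]["value"] for b in results["results"]["bindings"]]


# Hand-written sample response (assumption: one binding, for NO1).
sample_results = {
    "head": {"vars": ["mrid", "resource"]},
    "results": {"bindings": [{
        "mrid": {"type": "literal",
                 "value": "7d90224c-ab77-4786-8e08-7f9b275c46da"},
        "resource": {"type": "uri",
                     "value": "urn:uuid:83aa03e5-5fd0-431c-b8dd-acc08c21ed6a"},
    }]},
}

query = build_mrid_query("Price-Actual")
mrids = mrids_from_results(sample_results)
# Step 2 would pass `mrids` on to the Cognite tools described below.
print(mrids)
```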

Cognite query tools

Two tools, defined in Talk2PowerSystem_LLM/src/talk2powersystemllm/tools/cognite/, are wired into the agent whenever a cognite: block is present in the agent config (see agent.py and config/dev+cognite.yaml). Both wrap the official Cognite Python SDK via a small CogniteSession helper that handles authentication and proactive token refresh.

retrieve_time_series

Source — bridges the KG and CDF by mapping mRIDs to the Cognite external_ids required to fetch datapoints.

  • Filter: restricts results to time-series whose metadata has the RNDP_mrid field, i.e. exactly the 39 time-series listed in cognite-time-series-mrid-mapping.csv.
  • Arguments:
    • rndp_mrid — a single mRID or a list of mRIDs to look up;
    • limit — defaults to 25; -1 removes the limit.
  • Returns: the matching TimeSeries objects (including external_id, name, unit, metadata, …), which the agent then feeds into retrieve_data_points.
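The filter and limit semantics can be illustrated with an in-memory stand-in. The sketch below does not call the Cognite SDK and is not the tool's implementation; filter_by_rndp_mrid is a hypothetical name, the first two records are trimmed copies of the JSON examples on this page, and the third is an illustrative record without an RNDP_mrid.

```python
SAMPLE_TIME_SERIES = [
    {"external_id": "Kraftpriser_NO1_13",
     "metadata": {"RNDP_mrid": "7d90224c-ab77-4786-8e08-7f9b275c46da"}},
    {"external_id": "9bb00faf-0f2f-831a-e040-1e828c94e833_agg_value",
     "metadata": {"RNDP_mrid": "9bb00faf-0f2f-831a-e040-1e828c94e833"}},
    # Illustrative record with no RNDP_mrid: it must never be returned.
    {"external_id": "Kraftpriser_NO1_NOK_13", "metadata": {}},
]


def filter_by_rndp_mrid(time_series, rndp_mrid, limit=25):
    """Return records whose metadata RNDP_mrid is among the requested mRIDs."""
    # Accept a single mRID or a list of mRIDs, like the tool's argument.
    wanted = {rndp_mrid} if isinstance(rndp_mrid, str) else set(rndp_mrid)
    matches = [
        ts for ts in time_series
        if ts.get("metadata", {}).get("RNDP_mrid") in wanted
    ]
    # limit = -1 removes the cap; otherwise keep at most `limit` matches.
    return matches if limit == -1 else matches[:limit]


hits = filter_by_rndp_mrid(SAMPLE_TIME_SERIES,
                           "7d90224c-ab77-4786-8e08-7f9b275c46da")
print([ts["external_id"] for ts in hits])
```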

retrieve_data_points

Source — fetches raw or aggregated datapoints for one or more time-series identified by their external_id.

  • Arguments:
    • external_id — single string or list;
    • start, end — ISO-8601 timestamps in UTC (e.g. 2025-06-04T14:30:30Z) or relative shorthand (3w-ago, 3h-ahead, now);
    • aggregates — one of the CDF aggregates (average, min, max, count, sum, stepInterpolation, …);
    • granularity — bucket size, e.g. 30s, 5m, 1h, 1d, 1w, 1mo;
    • limit — only used for raw (non-aggregated) queries.

The tool's argument schema includes detailed examples that the LLM uses verbatim when binding user-supplied dates and aggregation requests.
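To make the start / end formats concrete, the sketch below resolves the three kinds of value mentioned above into a timezone-aware datetime. In the real tool this parsing is presumably delegated to the Cognite SDK; resolve_time is a hypothetical helper, and the set of unit letters (s, m, h, d, w) is an assumption based on the examples given.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit letters for the relative shorthand.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours",
          "d": "days", "w": "weeks"}
_RELATIVE = re.compile(r"^(\d+)([smhdw])-(ago|ahead)$")


def resolve_time(value: str, now: datetime) -> datetime:
    """Resolve 'now', '<n><unit>-ago|ahead' or an ISO-8601 UTC timestamp."""
    if value == "now":
        return now
    match = _RELATIVE.match(value)
    if match:
        amount, unit, direction = match.groups()
        delta = timedelta(**{_UNITS[unit]: int(amount)})
        return now - delta if direction == "ago" else now + delta
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


now = datetime(2025, 6, 4, 14, 30, 30, tzinfo=timezone.utc)
print(resolve_time("3w-ago", now).isoformat())
print(resolve_time("2025-06-04T14:30:30Z", now).isoformat())
```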

Authentication and session management

CogniteSession (source) supports four mutually exclusive authentication modes — exactly one must be configured under tools.cognite.* in the agent YAML config:

| Mode | Config key | When to use |
|---|---|---|
| Interactive (browser pop-up) | interactive_client_id + tenant_id | Local dev / Jupyter |
| Service principal (client credentials) | client_id + client_secret + tenant_id | Service deployments (e.g. cim.ontotext.com) |
| Token from file | token_file_path | RNDP / shared bastion deployments; the JWT is auto-refreshed when it has < 60 s left to expiry |
| On-Behalf-Of (OBO) | obo_client_secret | Production chatbot; at request time the user's incoming token is exchanged for a CDF token, so every CDF call runs under the user's own permissions |

In OBO mode the tools are not instantiated at startup; instead Talk2PowerSystemAgentFactory.get_agent(cognite_obo_token=...) builds a fresh CogniteSession per request from the user's OBO token (see agent.py:411). In all other modes the tools are bound once at agent construction.
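The "exactly one mode" rule and the 60-second refresh margin can be sketched as below. This is an illustration under assumptions, not the CogniteSession implementation: the config is modelled as a flat dict using the key names from the table, and detect_auth_mode and needs_refresh are hypothetical helpers.

```python
# Required config keys per mode, taken from the table above.
AUTH_MODES = {
    "interactive": {"interactive_client_id", "tenant_id"},
    "client_credentials": {"client_id", "client_secret", "tenant_id"},
    "token_file": {"token_file_path"},
    "obo": {"obo_client_secret"},
}


def detect_auth_mode(cfg: dict) -> str:
    """Return the single fully configured mode, or raise ValueError."""
    present = {key for key, value in cfg.items() if value}
    # A mode counts as configured when all of its keys are present.
    matches = [mode for mode, required in AUTH_MODES.items()
               if required <= present]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one authentication mode, found {matches}")
    return matches[0]


def needs_refresh(expires_at: float, now: float,
                  margin_s: float = 60.0) -> bool:
    """True when the token has less than `margin_s` seconds left."""
    return expires_at - now < margin_s


print(detect_auth_mode({"token_file_path": "/path/to/token"}))
print(needs_refresh(expires_at=1_000.0, now=950.0))
```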

Competency questions

Example time-series questions the agent can answer end-to-end (the first three are wired into the chatbot UI starter-question panel — see questions.yaml):

  • "Energy prices in EUR for NO1; for the second half of 2024, monthly average and standard deviation"
  • "Power flow from NO1 to NO3; for 2025, weekly average, min, max"
  • "Hi, I would like to get the maximum power price for each day in March 2026, for the price area NO1 and NO2. Listed as a table"
  • "Power flow from NO3 to NO1; for 2025, weekly average, min, max" — exercises the cim:Analog.positiveFlowIn = false direction (NO1-NO3 is one of the 5 borders modelled with false).
  • "Energy prices for NO1; for the second half of 2024, monthly average and standard deviation" — currency unspecified; the agent should default to EUR, since the NOK feeds are obsolete and not loaded into the KG.
  • "Energy prices in NOK for NO1; for the second half of 2024, monthly average and standard deviation" — out-of-coverage probe: the agent should report that no NOK price feed is available, not silently fall back to EUR or hallucinate.
  • "What is the EIC code of zone NO1; and its energy price for last year (monthly average)" — combines a pure-KG lookup (EIC code) with a CDF aggregation in a single answer.
  • "Give me the percentage of time for the past month where the voltage readings on the line ARENDAL–KRISTIANSAND are above its operational limit" — aspirational: combines KG (operational limits, line topology) with CDF (datapoints); exercises the path but is currently out of coverage as the dataset only contains inter-zonal flows and zone prices, not per-line voltage telemetry.

See also issue #117 for the full evaluation set.

