Talk2PowerSystem Chatbot

General Overview

The Talk2PowerSystem chatbot is implemented using the LangChain ReAct (Reasoning + Acting) agent framework in conjunction with the ttyg-langgraph library. LangChain is a widely-adopted open-source Python platform designed for the rapid development of Large Language Model (LLM) applications. A core advantage of this implementation is its LLM-agnostic design, which facilitates seamless transitions between diverse commercial services and local open-source models, as well as various execution APIs (e.g., OpenAI Completions API vs. OpenAI Responses API).

The agent's system prompt defines its role and a reasoning flow, including general guidance on how to use the available tools. The simplified ontology schema, produced by the simplification procedure, is read from a Turtle-formatted file and injected into the agent's system instructions. This grounds the agent's generative process in the specific classes and properties present in the target knowledge graph.

The chatbot uses the LangGraph Redis Checkpoint to manage conversation-based memory. It does not support conversation history.

To navigate and resolve complex power system queries, the agent utilizes a specialized suite of tools:

SPARQL Query Tool: Executes agent-generated SPARQL queries against the knowledge graph.
Autocomplete Search Tool: Uses the GraphDB Autocomplete index to search by name and class for Internationalized Resource Identifiers (IRIs) of named entities mentioned in the users' questions.
N-Shot Tool: Implements a dynamic In-Context Learning (ICL) mechanism. Upon receiving a user question, the tool retrieves semantically similar questions, indexed in a vector store, and their corresponding SPARQL queries, and provides them as examples to the LLM agent.
Display Graphics Tool: Generates dynamic links to pre-generated PowSyBl SVG diagrams or GraphDB Visual Graphs, enabling interactive visual feedback in the chatbot UI.
Cognite Query Tools:
- Retrieve Time Series Tool: Retrieves one or more time series from Cognite and facilitates the mapping of the Master Resource Identifiers (mRIDs) for the analog measurements (cim:Analog) from the knowledge graph to the external_id, which is required by Cognite in order to retrieve datapoints.
- Retrieve Datapoints Tool: Retrieves datapoints from Cognite for one or more time series, supporting temporal filtering and statistical aggregation.
Now Tool: Provides the agent with the current timestamp to anchor and resolve time-relative user queries.

History vs memory

History keeps all messages between the user and AI intact. History is what the user sees in the UI. It represents what was actually said. Memory keeps some information, which is presented to the LLM to make it behave as if it "remembers" the conversation. Memory is quite different from history. Depending on the memory algorithm used, it can modify history in various ways: evict some messages, summarize multiple messages, summarize separate messages, remove unimportant details from messages, inject extra information (e.g., for RAG) or instructions (e.g., for structured outputs) into messages, and so on. Memory can also be shared between users.
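The distinction can be illustrated with a minimal sketch. The `Conversation` class and its sliding-window eviction are hypothetical and chosen only for illustration; they are not the memory algorithm used by the chatbot.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Conversation:
    """Hypothetical container: full history plus a windowed memory view."""

    max_memory_messages: int = 4
    history: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        # History is append-only: it records what was actually said.
        self.history.append((role, text))

    def memory(self) -> List[Tuple[str, str]]:
        # Memory is derived from history; here it simply evicts older messages.
        return self.history[-self.max_memory_messages:]


conversation = Conversation(max_memory_messages=2)
for i in range(3):
    conversation.add("user", f"question {i}")
    conversation.add("ai", f"answer {i}")

print(len(conversation.history))  # all six messages are kept
print(conversation.memory())      # only the last exchange is shown to the LLM
```

Other memory strategies mentioned above (summarization, injection of extra information) would replace the body of `memory()` while leaving `history` untouched.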

Tools

SPARQL Query Tool

The SPARQL Query Tool orchestrates the execution of agent-generated queries against the knowledge graph.

To mitigate common LLM failures, such as namespace hallucination or syntax errors, the tool implements a multi-stage pre-execution validation and sanitization pipeline. Before transmitting a query to the GraphDB server, the following operations are performed:

Syntax and Operation Validation: The query is parsed to verify compliance with SPARQL syntax and to enforce security constraints, restricting execution to read-only operations (SELECT, ASK, CONSTRUCT, or DESCRIBE).
Namespace Reconciliation: The tool dynamically retrieves the authoritative namespaces for the repository via the GraphDB /repositories/{repositoryID}/namespaces endpoint.
- If the generated query contains prefix definitions that conflict with the repository's known namespaces, they are automatically reconciled to the correct IRI.
- If the query utilizes prefixes that are not explicitly defined but exist in the repository's namespace registry, the tool automatically injects the requisite PREFIX headers.
IRI Existence Verification: To ensure the query doesn't contain non-existing IRIs, the tool verifies that all included IRIs are present in the repository using the high-performance http://www.ontotext.com/owlim/entity#id predicate. A whitelist of exceptions is maintained for standard system and plugin namespaces, including XML Schema (http://www.w3.org/2001/XMLSchema#), Graphwise plugins (http://www.ontotext.com/owlim/RDFRank#, http://www.ontotext.com/plugins/autocomplete#, http://www.ontotext.com/connectors/, http://www.ontotext.com/fts, http://www.ontotext.com/describe/outgoing), and standard RDF frameworks (http://www.openrdf.org/schema/sesame#, http://spinrdf.org/spif#). For IRIs starting with these namespaces, this verification is omitted.
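The steps above can be approximated with a minimal sketch. It assumes regex-based handling instead of a real SPARQL parser, and the function names (`is_read_only`, `reconcile_prefixes`, `needs_existence_check`) are hypothetical; the actual tool is linked below and may work differently.

```python
import re

READ_ONLY_FORMS = {"SELECT", "ASK", "CONSTRUCT", "DESCRIBE"}
# Subset of the whitelisted namespaces listed above.
WHITELIST = (
    "http://www.w3.org/2001/XMLSchema#",
    "http://www.ontotext.com/plugins/autocomplete#",
)

PREFIX_DECL = re.compile(r"PREFIX\s+([A-Za-z][\w-]*):\s*<([^>]*)>", re.IGNORECASE)
PREFIXED_NAME = re.compile(r"\b([A-Za-z][\w-]*):(?=[A-Za-z_])")
FULL_IRI = re.compile(r"<([^<>\s]+)>")
STRING_LITERAL = re.compile(r'"[^"\n]*"')


def is_read_only(query: str) -> bool:
    # Naive check: the first keyword after the PREFIX declarations decides.
    body = PREFIX_DECL.sub(" ", query)
    first = re.search(r"[A-Za-z]+", body)
    return first is not None and first.group(0).upper() in READ_ONLY_FORMS


def reconcile_prefixes(query: str, namespaces: dict) -> str:
    declared = set()

    def fix_declaration(match):
        prefix = match.group(1)
        declared.add(prefix)
        # Prefer the repository's IRI when the prefix is known there.
        iri = namespaces.get(prefix, match.group(2))
        return f"PREFIX {prefix}: <{iri}>"

    query = PREFIX_DECL.sub(fix_declaration, query)
    # Ignore full IRIs and string literals when looking for prefixed names.
    stripped = STRING_LITERAL.sub(" ", FULL_IRI.sub(" ", query))
    used = {m.group(1) for m in PREFIXED_NAME.finditer(stripped)}
    missing = sorted(p for p in used - declared if p in namespaces)
    headers = "".join(f"PREFIX {p}: <{namespaces[p]}>\n" for p in missing)
    return headers + query


def needs_existence_check(iri: str) -> bool:
    # IRIs in whitelisted namespaces skip the existence verification.
    return not iri.startswith(WHITELIST)


namespaces = {
    "cim": "https://cim.ucaiug.io/ns#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
raw = "PREFIX cim: <http://wrong.example/cim#>\nSELECT ?s { ?s rdf:type cim:Substation }"
fixed = reconcile_prefixes(raw, namespaces)
print(fixed)
```

In this sketch the `namespaces` dictionary stands in for the response of the repository namespaces endpoint; a production implementation would rely on a proper SPARQL parser rather than regular expressions.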

Source code: https://github.com/Ontotext-AD/ttyg-langgraph/blob/main/ttyg/tools/graphdb_tools/sparql_query_tool.py

Autocomplete Search Tool

The Autocomplete Search Tool enables the agent to resolve mentions of named entities to their corresponding IRIs. This component facilitates entity linking by interfacing with the GraphDB Autocomplete index, allowing the agent to perform targeted searches based on entity labels and class types.

The index is constructed using a specific subset of CIM properties that hold naming and identifier metadata:

cim:IdentifiedObject.name
cim:IdentifiedObject.aliasName
cim:CoordinateSystem.crsUrn

Within the CIM ontology, approximately 61% of instances inherit from cim:IdentifiedObject, making this index highly effective for primary entity discovery. However, certain resources lack explicit naming properties, including enumerations such as cim:Currency and "value objects" like cim:DiagramObjectPoint, cim:SvStatus, and cim:SvPowerFlow. To resolve enumerations, the agent relies on the ontology schema definition provided during initialization, which contains a complete export of enumeration values. For value objects, explicit reference in user queries is not anticipated, and they are typically accessed via their relationships to named entities.
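As an illustration, a search by name and class could be expressed as a SPARQL query built like this. The sketch assumes the `query` predicate in the autocomplete plugin namespace and a plain `rdf:type` filter for the class; the helper name `build_autocomplete_query` and its parameters are hypothetical, and the query generated by the actual tool may differ.

```python
from typing import Optional

AUTOCOMPLETE_NS = "http://www.ontotext.com/plugins/autocomplete#"


def build_autocomplete_query(term: str, class_iri: Optional[str] = None, limit: int = 10) -> str:
    # Escape backslashes and double quotes so the term is a valid string literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    # Optional restriction to instances of a given class (assumed pattern).
    class_filter = f"    ?iri a <{class_iri}> .\n" if class_iri else ""
    return (
        f"PREFIX auto: <{AUTOCOMPLETE_NS}>\n"
        "SELECT ?iri WHERE {\n"
        f'    ?iri auto:query "{escaped}" .\n'
        f"{class_filter}"
        f"}} LIMIT {limit}"
    )


query = build_autocomplete_query("HALDEN", "https://cim.ucaiug.io/ns#Substation", limit=5)
print(query)
```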

Source code: https://github.com/Ontotext-AD/ttyg-langgraph/blob/main/ttyg/tools/graphdb_tools/autocomplete_search_tool.py

N-Shot Tool

The N-Shot Tool provides the LLM agent with dynamically retrieved SPARQL query examples to improve generation accuracy. Rather than including a static set of examples in the system prompt, which consumes valuable context window space and may include irrelevant patterns, this tool fetches only the most semantically similar question-query pairs from the Q&A dataset at runtime. This modular approach ensures the agent receives relevant examples without exceeding token limits. The tool's architecture is designed for extensibility; while currently focused on SPARQL, the same retrieval mechanism can be applied to other domain-specific queries, such as Cognite queries.

To increase the retrieval accuracy and the generalization of the examples, the tool utilizes a parametrization strategy. Natural language questions are stripped of specific entities and literal values, and are replaced by placeholders. For instance, the question:

Which linear shunt compensators in HALDEN have a susceptance per section value of -0.0055238095?

is transformed into the parametrized form:

Which linear shunt compensators in <<<0, cim:Substation>>> have a susceptance per section value of <<<float>>>?

At runtime, the agent is instructed to parametrize the user's input before calling the tool. This ensures that the vector search is based on the logical structure of the query rather than specific instances. Furthermore, the agent is prompted to include class hierarchy information in the search, acknowledging that SPARQL patterns for a class C are inherently applicable to its subclasses.
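The transformation can be sketched as follows. In the chatbot this step is performed by the LLM agent itself; the `parametrize` function, its `entities` argument, and the reading of the number in the placeholder as an entity index are assumptions made for illustration.

```python
import re

# Matches decimal literals such as -0.0055238095 (a decimal point is required).
FLOAT = re.compile(r"-?\d+\.\d+")


def parametrize(question: str, entities: dict) -> str:
    """Replace float literals and known entity mentions with placeholders.

    `entities` maps a mention in the question to its class, in order of appearance.
    """
    # Replace literals first so digits inside entity placeholders are not touched.
    result = FLOAT.sub("<<<float>>>", question)
    for index, (mention, cim_class) in enumerate(entities.items()):
        result = result.replace(mention, f"<<<{index}, {cim_class}>>>")
    return result


question = (
    "Which linear shunt compensators in HALDEN have a susceptance "
    "per section value of -0.0055238095?"
)
print(parametrize(question, {"HALDEN": "cim:Substation"}))
```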

The Q&A dataset is partitioned into train, dev, and test splits. The paraphrases from train and dev are serialized as RDF (using this code) and loaded into GraphDB. These paraphrases are indexed into a Weaviate vector store using the ChatGPT Retrieval GraphDB Connector (with this connector configuration).

While the original ChatGPT Retrieval Plugin has been deprecated, the system utilizes a custom Graphwise implementation that supports open-source LLM embedding models. The tool's retrieval logic supports configurable parameters, including a limit (defaulting to 5) and a similarity score threshold to ensure only high-confidence examples are provided to the LLM agent.
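The effect of the limit and the score threshold can be sketched as a post-filter over scored candidates. The `Example` type, the `select_examples` function, and the score values are hypothetical; in the real system the filtering may be applied by the vector store query itself.

```python
from typing import List, NamedTuple


class Example(NamedTuple):
    question: str
    sparql: str
    score: float  # similarity to the user's parametrized question


def select_examples(candidates: List[Example], limit: int = 5, score_threshold: float = 0.0) -> List[Example]:
    # Keep only candidates at or above the threshold, best first, capped at `limit`.
    kept = [c for c in candidates if c.score >= score_threshold]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:limit]


candidates = [
    Example("question A", "SELECT ...", 0.91),
    Example("question B", "SELECT ...", 0.42),
    Example("question C", "SELECT ...", 0.87),
]
selected = select_examples(candidates, limit=5, score_threshold=0.8)
print([example.question for example in selected])
```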

The source code of the tool is here; an example configuration is given here.

Display Graphics Tool

The chatbot UI can display the static pre-generated PowSyBl SVG electrical diagrams and embedded GraphDB Visual Graphs.

The tool returns content and artifact. The content describes the diagram in text: the diagram name and, optionally, the description, the diagram kind, and the node IRI (in the case of saved GraphDB configurations). The artifact describes the diagram to be rendered, including the link to it. If the diagram format is not one of the known ones (image/svg+xml for PowSyBl and text/html for Visual Graph), the content is "Found a diagram with unknown format {format}. Can't render it!" and the artifact is None. If the diagram doesn't exist, the content is "No diagram found" and the artifact is None.

The tool has the following arguments:

diagram_iri - for PowSyBl SVG electrical diagrams or Saved GraphDB Visual Graphs
diagram_configuration_iri and node_iri - for advanced GraphDB Visual Graph configurations

It uses the following SPARQL template to fetch the data from the knowledge graph:

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX cimd: <https://cim.ucaiug.io/diagrams#>
PREFIX cim: <https://cim.ucaiug.io/ns#>
SELECT ?link ?name ?format ?description ?kind {
    <{iri}> cimd:Diagram.link|cimd:DiagramConfiguration.link ?link;
        cim:IdentifiedObject.name ?name;
        dct:format ?format.
    OPTIONAL {
        <{iri}> cimd:Diagram.kind / rdfs:label ?kind
    }
    OPTIONAL {
        <{iri}> cim:IdentifiedObject.description ?description
    }
}
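How one result row of this query might be turned into content and artifact can be sketched as follows. The `build_response` function, the layout of the content text, and the shape of the artifact dictionary are assumptions; only the two fallback messages and the known formats come from the description above.

```python
from typing import Optional

KNOWN_FORMATS = ("image/svg+xml", "text/html")


def build_response(row: Optional[dict]) -> tuple:
    """Map one result row (or None when nothing matched) to (content, artifact)."""
    if row is None:
        return "No diagram found", None
    fmt = row["format"]
    if fmt not in KNOWN_FORMATS:
        return f"Found a diagram with unknown format {fmt}. Can't render it!", None
    # Name is mandatory in the query; description and kind are OPTIONAL.
    lines = [f"Name: {row['name']}"]
    for key in ("description", "kind"):
        if row.get(key):
            lines.append(f"{key.capitalize()}: {row[key]}")
    artifact = {"link": row["link"], "format": fmt}  # hypothetical artifact shape
    return "\n".join(lines), artifact


content, artifact = build_response(
    {"link": "/diagrams/example.svg", "name": "Example", "format": "image/svg+xml"}
)
print(content)
print(artifact)
```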

Source code: https://github.com/statnett/Talk2PowerSystem_LLM/blob/main/src/talk2powersystemllm/tools/graphics_tool.py

Cognite Query Tools

Retrieve Time Series Tool

The Retrieve Time Series Tool utilizes the Cognite Python SDK to bridge the gap between the knowledge graph and the telemetry platform. Its primary purpose is to facilitate the mapping, performed by the LLM agent, of the mRIDs associated with analog measurements (cim:Analog) to their corresponding Cognite external_id. This mapping is essential, as the external_id is the required identifier for fetching specific datapoints from the Cognite platform.

The tool retrieves only time series whose metadata contains the "RNDP_mrid" field. It also accepts an mrid argument (a single identifier or a list of mRIDs) that filters the time series by the value of that field. The number of results is controlled by the limit parameter, which defaults to 25; a value of -1 removes the limit.
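The filtering and mapping logic can be sketched without the Cognite SDK by using plain dictionaries as stand-ins for time series objects. The function name `map_mrids_to_external_ids` and the dictionary layout are assumptions; the actual tool queries Cognite through the SDK.

```python
from typing import Iterable, Optional, Union


def map_mrids_to_external_ids(
    time_series: Iterable[dict],
    mrid: Optional[Union[str, list]] = None,
    limit: int = 25,
) -> dict:
    # Normalize the filter: None means "no filter", a string means one mRID.
    wanted = None if mrid is None else ({mrid} if isinstance(mrid, str) else set(mrid))
    mapping = {}
    for ts in time_series:
        value = ts.get("metadata", {}).get("RNDP_mrid")
        # Skip time series without the metadata field or outside the filter.
        if value is None or (wanted is not None and value not in wanted):
            continue
        mapping[value] = ts["external_id"]
        # A limit of -1 means "no limit".
        if limit != -1 and len(mapping) >= limit:
            break
    return mapping


series = [
    {"external_id": "ts-1", "metadata": {"RNDP_mrid": "mrid-a"}},
    {"external_id": "ts-2", "metadata": {}},
    {"external_id": "ts-3", "metadata": {"RNDP_mrid": "mrid-b"}},
]
print(map_mrids_to_external_ids(series, mrid="mrid-b"))
```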

Source code: https://github.com/statnett/Talk2PowerSystem_LLM/blob/main/src/talk2powersystemllm/tools/cognite/retrieve_time_series.py

Retrieve Datapoints Tool

The Retrieve Datapoints Tool interfaces with the Cognite Python SDK to extract datapoints from one or more time series based on their external_id. The tool supports granular data retrieval through several key arguments:

start and end: Used to filter the datapoints based on specific timestamps.
aggregates and granularity: Allow for the retrieval of statistical aggregates (e.g., average or sum).
limit: Constrains the number of returned results, applicable only to non-aggregate queries.
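How these arguments interact can be sketched as a small request builder. The function `build_datapoints_request` is hypothetical, and the rule that aggregates and granularity must be supplied together is an assumption of this sketch; the resulting dictionary only mirrors the argument names listed above.

```python
from typing import Optional, Sequence


def build_datapoints_request(
    external_ids: Sequence[str],
    start: Optional[str] = None,
    end: Optional[str] = None,
    aggregates: Optional[Sequence[str]] = None,
    granularity: Optional[str] = None,
    limit: Optional[int] = None,
) -> dict:
    # Assumption: an aggregate query needs both aggregates and granularity.
    if (aggregates is None) != (granularity is None):
        raise ValueError("aggregates and granularity must be given together")
    request = {"external_id": list(external_ids)}
    if start is not None:
        request["start"] = start
    if end is not None:
        request["end"] = end
    if aggregates is not None:
        request["aggregates"] = list(aggregates)
        request["granularity"] = granularity
    elif limit is not None:
        # The limit is applied only to non-aggregate queries.
        request["limit"] = limit
    return request


request = build_datapoints_request(
    ["ts-1"], start="2025-01-01T00:00:00+00:00", aggregates=["average"], granularity="1h", limit=10
)
print(request)
```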

Source code: https://github.com/statnett/Talk2PowerSystem_LLM/blob/main/src/talk2powersystemllm/tools/cognite/retrieve_data_points.py

Now Tool

The Now Tool provides the agent with the current timestamp in ISO 8601 format to facilitate temporal grounding for user queries. This is essential for resolving questions that involve relative time references (e.g., "last hour" or "yesterday") based on the moment the query is submitted. Depending on the execution environment, the tool sources the timestamp either from the chatbot UI headers in REST mode or from the local server environment when operating in Jupyter mode.
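The two timestamp sources can be sketched as a single function with an optional argument. The name `now` and the `client_timestamp` parameter (standing in for a value taken from a UI request header) are hypothetical; how the real tool reads the header is not shown here.

```python
from datetime import datetime, timezone
from typing import Optional


def now(client_timestamp: Optional[str] = None) -> str:
    """Return the current time as an ISO 8601 string."""
    if client_timestamp:
        # REST-like mode: trust the client's timestamp, but validate and normalize it.
        return datetime.fromisoformat(client_timestamp).isoformat()
    # Jupyter-like mode: fall back to the server's local time, with UTC offset.
    return datetime.now(timezone.utc).astimezone().isoformat()


print(now("2025-01-15T10:30:00+01:00"))
print(now())
```

Returning a timezone-aware value keeps relative expressions such as "yesterday" unambiguous when the server and the user are in different time zones.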

Source code: https://github.com/statnett/Talk2PowerSystem_LLM/blob/main/src/talk2powersystemllm/tools/now_tool.py
