xpath

Our XPath implementation in TypeScript.

Current Status

  • XPath 1.0: ✅ Fully implemented and tested
  • XPath 2.0/3.0/3.1: ✅ Fully implemented, including maps, arrays, and JSON support (99.9% test pass rate)

Features

Core Capabilities

  • Pure TypeScript: Written in strictly typed TypeScript for robustness and ease of use.
  • Supported Versions: Full support for XPath 1.0, 2.0, 3.0, and 3.1.
  • Extensible: Custom function support and XSLT Extensions API.
  • Isomorphic: Runs in Node.js and modern browsers.

XPath 3.1 & JSON Support

  • Maps & Arrays: Native support for XDM Maps and Arrays, fully interoperable with JSON.
  • JSON Functions: parse-json (with liberal mode support), json-to-xml, xml-to-json.
  • Lookup Operator: Drill down into data structures using the ? operator (e.g., $data?users?1?name).
  • Constructors: Create maps and arrays using map { ... } and array { ... } (or square brackets []).
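
To make the lookup operator concrete, here is a minimal sketch of what a chain such as $data?users?1?name means when the data is plain JSON. The helper names (lookupStep, lookup) and the Json type are hypothetical and are not part of this library; the sketch only models the semantics, in particular that XPath array positions are 1-based.

```typescript
// JSON-like value, standing in for XDM maps and arrays (assumption for this sketch).
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical helper that mimics a single `?` lookup step.
// Returns undefined where XPath would yield an empty sequence or raise an error.
function lookupStep(value: Json | undefined, key: string | number): Json | undefined {
    if (Array.isArray(value)) {
        // XPath arrays are 1-based, JavaScript arrays are 0-based.
        return typeof key === 'number' ? value[key - 1] : undefined;
    }
    if (value !== null && typeof value === 'object') {
        return value[String(key)];
    }
    return undefined;
}

// Applies a chain of lookup steps, like $data?users?1?name.
function lookup(value: Json, ...keys: Array<string | number>): Json | undefined {
    return keys.reduce<Json | undefined>(lookupStep, value);
}

const sampleData: Json = { users: [{ name: 'Ana' }, { name: 'Bruno' }] };
const firstUserName = lookup(sampleData, 'users', 1, 'name'); // 'Ana'
```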

Advanced Expressions

  • Arrow Operator (=>): Chain function calls for cleaner code (e.g., $str => upper-case() => normalize-space()).
  • String Templates: Interpolated strings using backticks (`Hello {$name}`).
  • Inline Functions: Define anonymous functions/lambdas (e.g., function($x) { $x * 2 }).
  • Range Operator: Generate sequences easily with 1 to 10.
  • Control Flow: Support for if/then/else, for, some/every quantifiers, and try/catch.
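
As a rough illustration of the arrow operator, the sketch below shows that $str => upper-case() => normalize-space() is the same as the nested call normalize-space(upper-case($str)). The functions upperCase, normalizeSpace, and arrow are hypothetical TypeScript stand-ins written for this example, and normalizeSpace only approximates the XPath definition of whitespace.

```typescript
// Hypothetical stand-ins for fn:upper-case and fn:normalize-space.
const upperCase = (s: string): string => s.toUpperCase();
const normalizeSpace = (s: string): string => s.trim().replace(/\s+/g, ' ');

// Minimal model of `=>`: pass the left-hand value as the first argument.
function arrow<T, R>(value: T, fn: (first: T) => R): R {
    return fn(value);
}

const input = '  hello   world ';

// $str => upper-case() => normalize-space()
const chained = arrow(arrow(input, upperCase), normalizeSpace);

// normalize-space(upper-case($str))
const nested = normalizeSpace(upperCase(input));

// Both evaluate to 'HELLO WORLD'.
```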

Schema & Types

  • Schema-Awareness: Full support for validating nodes against XML Schemas and handling typed values.
  • Type Checking: Advanced type system handling instance of, castable as, and treat as.
  • Union Types: Support for complex type definitions and checking.

Performance & Streaming

  • Streaming Evaluation: Processes large documents with a low memory footprint using XSLT 3.0 streaming analysis (posture and sweep).
  • Profiler: Built-in expression profiler to analyze execution time and memory usage.
  • Optimizer: Static analysis tools to suggest query optimizations.

Enhanced Function Library

  • Regular Expressions: Full XPath regex support including flags (i, m, s, x) and analyze-string.
  • Date/Time: Comprehensive duration, date, and time manipulation.
  • Environment: Access system environment variables via fn:environment-variable.
  • Node Functions: Advanced node operations like generate-id, path, innermost, and outermost.
  • Higher-Order Functions: Functional programming with map, filter, fold-left, fold-right, and sort.
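
The difference between fold-left and fold-right is easy to miss, so here is a small sketch of their semantics in TypeScript. foldLeft and foldRight are hypothetical helpers written for this example; they model the XPath functions and are not this library's implementation.

```typescript
// fold-left(seq, zero, f): f(f(f(zero, a), b), c)
function foldLeft<T, A>(seq: readonly T[], zero: A, f: (acc: A, item: T) => A): A {
    return seq.reduce<A>((acc, item) => f(acc, item), zero);
}

// fold-right(seq, zero, f): f(a, f(b, f(c, zero)))
function foldRight<T, A>(seq: readonly T[], zero: A, f: (item: T, acc: A) => A): A {
    return seq.reduceRight<A>((acc, item) => f(item, acc), zero);
}

// Subtraction is not associative, so the two folds give different answers.
const leftResult = foldLeft([1, 2, 3], 0, (acc, x) => acc - x);   // ((0 - 1) - 2) - 3 = -6
const rightResult = foldRight([1, 2, 3], 0, (x, acc) => x - acc); // 1 - (2 - (3 - 0)) = 2
```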

Documentation

Comprehensive documentation is available:

Building Documentation

# Generate TypeDoc documentation
yarn docs

# Watch for changes and regenerate
yarn docs:watch

Documentation is automatically built and published to GitHub Pages on every push to the main branch.

Motivation

We maintain another open source package called xslt-processor. The XPath component that project originally used became impossible to maintain for a variety of reasons. Since version 4, xslt-processor has used this project as a submodule.

This repository is intended to solve a particular problem in our packages, but it can be used by any other NPM package.

Quick Start

import { XPath10Parser, XPathLexer, createContext } from '@designliquido/xpath';

// Create parser and lexer for XPath 1.0
const parser = new XPath10Parser();
const lexer = new XPathLexer('1.0');

// Parse an XPath expression
const tokens = lexer.scan('//book[price > 30]/title');
const expression = parser.parse(tokens);

// Evaluate against your DOM
const context = createContext(documentNode);
const result = expression.evaluate(context);

Choosing XPath Version

This library supports multiple XPath versions. Choose the appropriate parser and lexer configuration based on your needs.

XPath 1.0 (Default)

For XPath 1.0 expressions (XSLT 1.0 compatibility):

import { XPath10Parser, XPathLexer, createContext } from '@designliquido/xpath';

// Explicit version
const lexer = new XPathLexer('1.0');
const parser = new XPath10Parser();

// Or rely on the defaults (both default to 1.0)
const defaultLexer = new XPathLexer();
const defaultParser = new XPath10Parser();

const tokens = lexer.scan('//book[@price > 30]');
const ast = parser.parse(tokens);

XPath 2.0

For XPath 2.0 expressions with conditionals, for expressions, and quantified expressions:

import { XPath20Parser, XPathLexer, createContext } from '@designliquido/xpath';

// IMPORTANT: Use matching versions for lexer and parser
const lexer = new XPathLexer('2.0'); // Recognizes 'if', 'then', 'else', 'for', etc.
const parser = new XPath20Parser();

// if-then-else expressions
const tokens1 = lexer.scan("if ($price > 100) then 'expensive' else 'affordable'");
const ast1 = parser.parse(tokens1);

// for expressions
const tokens2 = lexer.scan('for $x in (1, 2, 3) return $x * 2');
const ast2 = parser.parse(tokens2);

// quantified expressions
const tokens3 = lexer.scan('some $x in //item satisfies $x/@stock > 0');
const ast3 = parser.parse(tokens3);

Using the Factory Function

For automatic parser selection based on version:

import { createXPathParser, XPathLexer } from '@designliquido/xpath';

// Create parser for specific version
const parser10 = createXPathParser('1.0');
const parser20 = createXPathParser('2.0');

// With options
const parser = createXPathParser('1.0', {
    enableNamespaceAxis: true,
});
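
Since the lexer and parser versions need to agree, one option is to derive both from a single version value. The sketch below is an assumption-laden illustration: createToolchain, LexerLike, and ParserLike are hypothetical names, and the factories are injected so the example does not depend on the library's actual types.

```typescript
type SketchVersion = '1.0' | '2.0';

// Structural stand-ins; in real code these would be the library's lexer and parser.
interface LexerLike { version: SketchVersion; }
interface ParserLike { version: SketchVersion; }

interface Toolchain { lexer: LexerLike; parser: ParserLike; }

// Hypothetical helper: build both halves from one version so they cannot drift apart.
function createToolchain(
    version: SketchVersion,
    makeLexer: (v: SketchVersion) => LexerLike,
    makeParser: (v: SketchVersion) => ParserLike,
): Toolchain {
    return { lexer: makeLexer(version), parser: makeParser(version) };
}

// Stub factories keep the sketch self-contained.
const toolchain = createToolchain(
    '2.0',
    (v) => ({ version: v }),
    (v) => ({ version: v }),
);
```

In real code the two factories would presumably be (v) => new XPathLexer(v) and (v) => createXPathParser(v), matching the calls shown above.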

Lexer Version Differences

The lexer version determines how certain keywords are tokenized:

Keyword | XPath 1.0                 | XPath 2.0
--------|---------------------------|--------------
if      | Identifier (element name) | Reserved word
then    | Identifier (element name) | Reserved word
else    | Identifier (element name) | Reserved word
for     | Identifier (element name) | Reserved word
return  | Identifier (element name) | Reserved word
some    | Identifier (element name) | Reserved word
every   | Identifier (element name) | Reserved word

Important: Always match your lexer and parser versions. Using an XPath 1.0 lexer with an XPath 2.0 parser will cause parsing errors for 2.0-specific syntax.
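
The version-dependent behaviour can be pictured as a simple lookup. The following sketch is not the library's lexer; classifyWord and XPATH20_KEYWORDS are hypothetical names used only to show how the same word can be tokenized differently depending on the version.

```typescript
type XPathVersion = '1.0' | '2.0';
type WordKind = 'identifier' | 'reserved';

// The keywords listed above that are treated as reserved by a 2.0 lexer.
const XPATH20_KEYWORDS: ReadonlySet<string> = new Set([
    'if', 'then', 'else', 'for', 'return', 'some', 'every',
]);

// Hypothetical classifier: a 1.0 lexer sees every word as an identifier.
function classifyWord(word: string, version: XPathVersion): WordKind {
    return version === '2.0' && XPATH20_KEYWORDS.has(word) ? 'reserved' : 'identifier';
}

const in10 = classifyWord('if', '1.0');   // 'identifier'
const in20 = classifyWord('if', '2.0');   // 'reserved'
const book = classifyWord('book', '2.0'); // 'identifier'
```

This is also why a 1.0 lexer paired with a 2.0 parser fails: the parser never receives the keyword tokens it expects.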

XPath Version Reference

For detailed information about version-specific features and the implementation roadmap, see:

Custom Selectors

You can implement custom selectors by wrapping the XPath parser and lexer. This is useful when you need to integrate XPath with your own DOM implementation.

Basic Implementation

Here's how to create a custom selector class:

import { XPathLexer } from './lexer';
import { XPath10Parser } from './parser';
import { createContext } from './context';
import { XPathNode } from './node';

export class CustomXPathSelector {
    private lexer: XPathLexer;
    private parser: XPath10Parser;
    private nodeCache: WeakMap<YourNodeType, XPathNode> = new WeakMap();

    constructor() {
        // Use XPath 1.0 for most DOM use cases
        this.lexer = new XPathLexer('1.0');
        this.parser = new XPath10Parser();
    }

    public select(expression: string, contextNode: YourNodeType): YourNodeType[] {
        // 1. Tokenize the XPath expression
        const tokens = this.lexer.scan(expression);

        // 2. Parse tokens into an AST
        const ast = this.parser.parse(tokens);

        // 3. Clear cache for each selection
        this.nodeCache = new WeakMap();

        // 4. Convert your node to XPathNode
        const xpathNode = this.convertToXPathNode(contextNode);

        // 5. Create context and evaluate
        const context = createContext(xpathNode);
        const result = ast.evaluate(context);

        // 6. Convert results back to your node type
        return this.convertResult(result);
    }
}
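
YourNodeType above is a placeholder for your own DOM node type. As a starting point, here is one possible minimal shape, inferred only from the fields the conversion code in this section reads; the exact fields and their optionality are assumptions and should be adapted to your DOM implementation.

```typescript
// Hypothetical node shape; replace with your DOM implementation's own type.
interface YourNodeType {
    nodeType?: number;            // standard DOM numeric node type, if known
    nodeName: string;
    localName?: string;
    namespaceUri?: string | null;
    nodeValue?: string | null;
    childNodes?: YourNodeType[];  // attribute nodes (nodeType 2) may be mixed in
}

// A tiny <title>Dune</title> fragment expressed in that shape.
const titleText: YourNodeType = { nodeType: 3, nodeName: '#text', nodeValue: 'Dune' };
const titleNode: YourNodeType = {
    nodeType: 1,
    nodeName: 'title',
    localName: 'title',
    childNodes: [titleText],
};
```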

Node Conversion

The key to custom selectors is converting between your DOM nodes and XPathNode format:

private convertToXPathNode(node: YourNodeType): XPathNode {
    // Check cache to avoid infinite recursion
    const cached = this.nodeCache.get(node);
    if (cached) return cached;

    // Filter out attribute nodes (nodeType = 2) from children
    const childNodes = node.childNodes || [];
    const attributes = childNodes.filter(n => n.nodeType === 2);
    const elementChildren = childNodes.filter(n => n.nodeType !== 2);

    // Create XPathNode BEFORE converting children to prevent infinite recursion
    const xpathNode: XPathNode = {
        nodeType: this.getNodeType(node),
        nodeName: node.nodeName || '#document',
        localName: node.localName || node.nodeName,
        namespaceUri: node.namespaceUri || null,
        textContent: node.nodeValue,
        parentNode: null, // Avoid cycles
        childNodes: [], // Will be populated
        attributes: [], // Will be populated
        nextSibling: null,
        previousSibling: null,
        ownerDocument: null
    };

    // Cache BEFORE converting children
    this.nodeCache.set(node, xpathNode);

    // NOW convert children and attributes
    xpathNode.childNodes = elementChildren.map(child =>
        this.convertToXPathNode(child)
    );
    xpathNode.attributes = attributes.map(attr =>
        this.convertToXPathNode(attr)
    );

    return xpathNode;
}

Node Type Mapping

Map your node types to standard DOM node types:

private getNodeType(node: YourNodeType): number {
    if (node.nodeType !== undefined) return node.nodeType;

    // Map node names to standard node types
    switch (node.nodeName?.toLowerCase()) {
        case '#text':
            return 3; // TEXT_NODE
        case '#comment':
            return 8; // COMMENT_NODE
        case '#document':
            return 9; // DOCUMENT_NODE
        case '#document-fragment':
            return 11; // DOCUMENT_FRAGMENT_NODE
        default:
            return 1; // ELEMENT_NODE
    }
}

Result Conversion

Convert XPath results back to your node type:

private convertResult(result: any): YourNodeType[] {
    if (Array.isArray(result)) {
        return result.map(node => this.convertFromXPathNode(node));
    }

    if (result && typeof result === 'object' && 'nodeType' in result) {
        return [this.convertFromXPathNode(result)];
    }

    return [];
}

private convertFromXPathNode(xpathNode: XPathNode): YourNodeType {
    return {
        nodeType: xpathNode.nodeType,
        nodeName: xpathNode.nodeName,
        localName: xpathNode.localName,
        namespaceUri: xpathNode.namespaceUri,
        nodeValue: xpathNode.textContent,
        parent: xpathNode.parentNode ?
            this.convertFromXPathNode(xpathNode.parentNode) : undefined,
        children: xpathNode.childNodes ?
            Array.from(xpathNode.childNodes).map(child =>
                this.convertFromXPathNode(child)) : undefined,
        attributes: xpathNode.attributes ?
            Array.from(xpathNode.attributes).map(attr =>
                this.convertFromXPathNode(attr)) : undefined,
        nextSibling: xpathNode.nextSibling ?
            this.convertFromXPathNode(xpathNode.nextSibling) : undefined,
        previousSibling: xpathNode.previousSibling ?
            this.convertFromXPathNode(xpathNode.previousSibling) : undefined
    } as YourNodeType;
}

Usage Example

const selector = new CustomXPathSelector();

// Select all book elements
const books = selector.select('//book', documentNode);

// Select books with price > 30
const expensiveBooks = selector.select('//book[price > 30]', documentNode);

// Select first book title
const firstTitle = selector.select('//book[1]/title', documentNode);

Key Considerations

  1. Caching: Use WeakMap to cache node conversions and prevent memory leaks
  2. Recursion: Cache nodes BEFORE converting children to avoid infinite loops
  3. Attributes: Filter attributes (nodeType = 2) separately from element children
  4. Null Safety: Handle null/undefined values when converting between node types
  5. Performance: Clear the cache between selections to avoid stale references
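
To see why the cache must be filled before recursing, consider a structure where a child points back to its parent. The sketch below uses hypothetical SourceNode and TargetNode shapes, not XPathNode, and follows parent links on purpose to create a cycle; without the early cache.set call the conversion would never terminate.

```typescript
// Hypothetical source and target shapes, reduced to what the sketch needs.
interface SourceNode { name: string; parent?: SourceNode; children: SourceNode[]; }
interface TargetNode { name: string; parent: TargetNode | null; children: TargetNode[]; }

function convert(
    node: SourceNode,
    cache: WeakMap<SourceNode, TargetNode> = new WeakMap(),
): TargetNode {
    const cached = cache.get(node);
    if (cached) return cached;

    // Create the shell and register it BEFORE recursing into relatives.
    const target: TargetNode = { name: node.name, parent: null, children: [] };
    cache.set(node, target);

    // Any relative that leads back to `node` now hits the cache instead of looping.
    target.parent = node.parent ? convert(node.parent, cache) : null;
    target.children = node.children.map((child) => convert(child, cache));
    return target;
}

// rootSource -> childSource -> (parent) rootSource forms a cycle.
const rootSource: SourceNode = { name: 'root', children: [] };
const childSource: SourceNode = { name: 'child', parent: rootSource, children: [] };
rootSource.children.push(childSource);

const convertedChild = convert(childSource);
```

Because the WeakMap holds its keys weakly, the cached entries do not keep the source nodes alive once nothing else references them.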

For a complete working example, see the XPathSelector implementation in xslt-processor.

XSLT Extensions API

This library provides a pure XPath implementation; XSLT-specific functions are not part of its core. It does, however, include a clean integration API for them, allowing the xslt-processor package (or any other XSLT implementation) to extend XPath with XSLT 1.0 functions such as document(), key(), format-number(), and generate-id().

Architecture

The XSLT Extensions API follows a separation of concerns pattern:

  • This package (@designliquido/xpath): Provides type definitions, interfaces, and integration hooks
  • XSLT processor packages: Implement the actual XSLT function logic

This approach keeps the XPath library pure while enabling XSLT functionality through a well-defined extension mechanism.

Key Features

  1. Type Definitions: XSLTExtensions, XSLTExtensionFunction, XSLTFunctionMetadata interfaces
  2. Parser Integration: XPathBaseParser accepts options.extensions parameter
  3. Lexer Support: XPathLexer.registerFunctions() for dynamic function registration
  4. Context Integration: Extension functions receive XPathContext as first parameter

Basic Usage

Here's how to use XSLT extensions (typically done by the xslt-processor package):

import {
    XPath10Parser,
    XPathLexer,
    XSLTExtensions,
    XSLTFunctionMetadata,
    getExtensionFunctionNames,
    XPathContext,
} from '@designliquido/xpath';

// Define XSLT extension functions
const xsltFunctions: XSLTFunctionMetadata[] = [
    {
        name: 'generate-id',
        minArgs: 0,
        maxArgs: 1,
        implementation: (context: XPathContext, nodeSet?: any[]) => {
            const node = nodeSet?.[0] || context.node;
            // generateUniqueId is a placeholder for your own ID helper; it is not defined in this snippet
            return `id-${generateUniqueId(node)}`;
        },
        description: 'Generate unique identifier for a node',
    },
    {
        name: 'system-property',
        minArgs: 1,
        maxArgs: 1,
        implementation: (context: XPathContext, propertyName: string) => {
            const properties: Record<string, string> = {
                'xsl:version': '1.0',
                'xsl:vendor': 'Design Liquido XPath',
                'xsl:vendor-url': 'https://github.com/designliquido/xpath',
            };
            return properties[String(propertyName)] || '';
        },
        description: 'Query XSLT processor properties',
    },
];

// Create extensions bundle
const extensions: XSLTExtensions = {
    functions: xsltFunctions,
    version: '1.0',
};

// Create parser with extensions (XPath 1.0 for XSLT 1.0 compatibility)
const parser = new XPath10Parser({ extensions });

// Create lexer and register extension functions
const lexer = new XPathLexer('1.0');
lexer.registerFunctions(getExtensionFunctionNames(extensions));

// Parse expression
const tokens = lexer.scan('generate-id()');
const expression = parser.parse(tokens);

// Create context with extension functions
const context: XPathContext = {
    node: rootNode,
    functions: {
        'generate-id': xsltFunctions[0].implementation,
        'system-property': xsltFunctions[1].implementation,
    },
};

// Evaluate
const result = expression.evaluate(context);

Extension Function Signature

XSLT extension functions receive the evaluation context as their first parameter:

type XSLTExtensionFunction = (context: XPathContext, ...args: any[]) => any;

This allows extension functions to access:

  • context.node - current context node
  • context.position - position in node-set (1-based)
  • context.size - size of current node-set
  • context.variables - XPath variables
  • context.functions - other registered functions
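
As an illustration of reading these fields, the sketch below defines two small extension functions against a local stand-in for the context. SketchContext, isLast, and variableOr are hypothetical names; the library's real XPathContext may have more fields or different optionality.

```typescript
// Minimal local stand-in for the context fields listed above (assumption).
interface SketchContext {
    node: unknown;
    position: number; // 1-based position in the node-set
    size: number;     // size of the node-set
    variables: Record<string, unknown>;
}

type SketchExtensionFunction = (context: SketchContext, ...args: unknown[]) => unknown;

// Hypothetical extension: true when the context node is the last one in its node-set.
const isLast: SketchExtensionFunction = (context) => context.position === context.size;

// Hypothetical extension: reads a variable, falling back to the second argument.
const variableOr: SketchExtensionFunction = (context, variableName, fallback) => {
    const key = String(variableName);
    return key in context.variables ? context.variables[key] : fallback;
};

const sketchContext: SketchContext = {
    node: null,
    position: 3,
    size: 3,
    variables: { currency: 'EUR' },
};

const last = isLast(sketchContext);                            // true
const currency = variableOr(sketchContext, 'currency', 'USD'); // 'EUR'
const missing = variableOr(sketchContext, 'locale', 'en');     // 'en'
```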

Available Helper Functions

// Validate extensions bundle for errors
const errors = validateExtensions(extensions);
if (errors.length > 0) {
    console.error('Extension validation errors:', errors);
}

// Extract function names for lexer registration
const functionNames = getExtensionFunctionNames(extensions);
lexer.registerFunctions(functionNames);

// Create empty extensions bundle
const emptyExtensions = createEmptyExtensions('1.0');
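
This README does not spell out which checks validateExtensions performs. Purely as an illustration of the kind of checks such a validator might run, here is a self-contained sketch; validateSketch and SketchFunctionMetadata are hypothetical, and the rules shown (unique names, sensible argument counts) are assumptions rather than the library's documented behaviour.

```typescript
// Hypothetical, reduced metadata shape for this sketch.
interface SketchFunctionMetadata {
    name: string;
    minArgs: number;
    maxArgs?: number;
}

// Returns a list of human-readable problems; an empty list means no issues were found.
function validateSketch(functions: readonly SketchFunctionMetadata[]): string[] {
    const errors: string[] = [];
    const seen = new Set<string>();
    for (const fn of functions) {
        if (seen.has(fn.name)) {
            errors.push(`Duplicate function name: ${fn.name}`);
        }
        seen.add(fn.name);
        if (!Number.isInteger(fn.minArgs) || fn.minArgs < 0) {
            errors.push(`${fn.name}: minArgs must be a non-negative integer`);
        }
        if (fn.maxArgs !== undefined && fn.maxArgs < fn.minArgs) {
            errors.push(`${fn.name}: maxArgs must not be smaller than minArgs`);
        }
    }
    return errors;
}

const okErrors = validateSketch([{ name: 'generate-id', minArgs: 0, maxArgs: 1 }]);
const badErrors = validateSketch([
    { name: 'key', minArgs: 2, maxArgs: 1 }, // maxArgs below minArgs
    { name: 'key', minArgs: 0 },             // duplicate name
]);
```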

XSLT 1.0 Functions

The following XSLT 1.0 functions are designed to be implemented via this extension API:

  1. document() - Load external XML documents
  2. key() - Efficient node lookup using keys
  3. format-number() - Number formatting with patterns
  4. generate-id() - Generate unique node identifiers
  5. unparsed-entity-uri() - Get URI of unparsed entities
  6. system-property() - Query processor properties
  7. element-available() - Check XSLT element availability
  8. function-available() - Check function availability

For detailed implementation guidance, see TODO.md.

Context Extensions

XSLT functions may require additional context data beyond standard XPath context:

const context: XPathContext = {
    node: rootNode,
    functions: {
        'generate-id': generateIdImpl,
        key: keyImpl,
        'format-number': formatNumberImpl,
    },
    // XSLT-specific context extensions
    xsltVersion: '1.0',
    // For key() function
    keys: {
        'employee-id': { match: 'employee', use: '@id' },
    },
    // For document() function
    documentLoader: (uri: string) => loadXmlDocument(uri),
    // For format-number() function
    decimalFormats: {
        euro: { decimalSeparator: ',', groupingSeparator: '.' },
    },
    // For system-property() function
    systemProperties: {
        'xsl:version': '1.0',
        'xsl:vendor': 'Design Liquido',
    },
};
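
To show how an implementation might consume one of these context extensions, here is a small sketch that applies the euro decimal format from the example above. formatWithSeparators and SketchDecimalFormat are hypothetical; a real format-number() also has to interpret the picture string, which this sketch ignores in favour of a fixed number of fraction digits.

```typescript
// Hypothetical shape matching the decimalFormats entries above.
interface SketchDecimalFormat {
    decimalSeparator: string;
    groupingSeparator: string;
}

// Formats a number with groups of three digits and a configurable decimal separator.
function formatWithSeparators(
    value: number,
    format: SketchDecimalFormat,
    fractionDigits = 2,
): string {
    const sign = value < 0 ? '-' : '';
    const fixed = Math.abs(value).toFixed(fractionDigits);
    const dot = fixed.indexOf('.');
    const integerPart = dot === -1 ? fixed : fixed.slice(0, dot);
    const fractionPart = dot === -1 ? '' : fixed.slice(dot + 1);

    // Insert the grouping separator before every group of three trailing digits.
    const grouped = integerPart.replace(/\B(?=(\d{3})+(?!\d))/g, () => format.groupingSeparator);

    return fractionPart === ''
        ? `${sign}${grouped}`
        : `${sign}${grouped}${format.decimalSeparator}${fractionPart}`;
}

const euro: SketchDecimalFormat = { decimalSeparator: ',', groupingSeparator: '.' };
const price = formatWithSeparators(1234567.891, euro); // '1.234.567,89'
```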

Complete Example

For a complete implementation example, see the test suite at https://github.com/DesignLiquido/xpath/blob/main/tests/xslt-extensions.test.ts, which demonstrates:

  • Creating and validating extension bundles
  • Registering extensions with parser and lexer
  • Implementing sample XSLT functions (generate-id, system-property)
  • End-to-end evaluation with extension functions

API Migration Guide

This section documents changes to the API and how to migrate from older versions.

Migrating to Versioned Parser/Lexer API

Prior versions used abstract or unversioned parser/lexer classes. The new API uses explicit versioned classes for better clarity and type safety.

Parser Migration

// OLD (deprecated):
import { XPathBaseParser } from '@designliquido/xpath';
const parser = new XPathBaseParser(); // Error: XPathBaseParser is abstract

// NEW (recommended):
import { XPath10Parser } from '@designliquido/xpath';
const parser = new XPath10Parser();

// Or use the factory:
import { createXPathParser } from '@designliquido/xpath';
const parser = createXPathParser('1.0');

Lexer Migration

// OLD (may have defaulted to 2.0):
import { XPathLexer } from '@designliquido/xpath';
const lexer = new XPathLexer(); // Was defaulting to '2.0'

// NEW (explicit version, defaults to 1.0):
import { XPathLexer } from '@designliquido/xpath';
const lexer = new XPathLexer('1.0'); // Explicit XPath 1.0

// Or with options object:
const lexer = new XPathLexer({ version: '1.0' });

Breaking Change: Lexer Default Version

Important: The lexer default version has changed from '2.0' to '1.0' for backward compatibility with XPath 1.0/XSLT 1.0 use cases.

If your code relied on the old default and uses XPath 2.0 features, update your lexer instantiation:

// If you were using XPath 2.0 features with the old default:
const lexer = new XPathLexer(); // OLD: defaulted to 2.0

// Update to explicit 2.0:
const lexer = new XPathLexer('2.0');

Quick Reference

Old API                                 | New API
----------------------------------------|------------------------------------------------
new XPathBaseParser()                   | new XPath10Parser() or createXPathParser('1.0')
new XPathBaseParser({ version: '2.0' }) | new XPath20Parser() or createXPathParser('2.0')
new XPathLexer() (was 2.0)              | new XPathLexer('1.0') (now 1.0)
new XPathLexer('2.0')                   | new XPathLexer('2.0') (unchanged)

Compatibility Alias

For gradual migration, XPathParser is available as an alias for XPath10Parser:

import { XPathParser } from '@designliquido/xpath';
const parser = new XPathParser(); // Same as new XPath10Parser()

This alias is deprecated and will be removed in a future major version. Prefer using XPath10Parser directly.
