From 04613d02286c2ee285f738b620b7fb263cc3d1ac Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 19 Jan 2026 04:41:23 +0000
Subject: [PATCH 1/4] Initial plan

From 885b17cff993ccad85e15816f8b271c2ce5d7f19 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 19 Jan 2026 04:44:50 +0000
Subject: [PATCH 2/4] Add advanced features documentation page

Co-authored-by: jpmccu <602385+jpmccu@users.noreply.github.com>
---
 docs/advanced.md | 263 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 263 insertions(+)
 create mode 100644 docs/advanced.md

diff --git a/docs/advanced.md b/docs/advanced.md
new file mode 100644
index 0000000..ef05695
--- /dev/null
+++ b/docs/advanced.md
@@ -0,0 +1,263 @@
+# Advanced Features
+
+SETLr provides several advanced capabilities beyond basic CSV-to-RDF transformation. This guide covers specialized features for working with large XML files, custom Python code, SPARQL endpoints, and SHACL validation.
+
+## Overview
+
+- **[Streaming XML with XPath](#streaming-xml)** - Efficiently process large XML files with XPath filtering
+- **[Python Functions in Transforms](#python-functions)** - Execute custom Python code within transforms
+- **[SPARQL Support](#sparql-support)** - Load RDF to SPARQL endpoints
+- **[SHACL Validation](#shacl-validation)** - Validate output RDF against SHACL shapes
+
+## Streaming XML with XPath {#streaming-xml}
+
+For large XML files that don't fit in memory, SETLr provides streaming XML parsing with XPath filtering.
+
+### Key Features
+
+- **Memory Efficient**: Uses incremental parsing (iterparse) to process one element at a time
+- **XPath Filtering**: Extract only the elements you need
+- **Progress Tracking**: Shows a progress bar for long-running operations
+- **DTD Validation**: Optional validation against the document's DTD
+
+### Quick Example
+
+```turtle
+@prefix setl: .
+@prefix prov: .
+@prefix : .
+
+:xmlTable a setl:Table ;
+    setl:xpath "//book" ; # Extract only <book> elements
+    prov:wasGeneratedBy [
+        a setl:Extract ;
+        prov:used ;
+    ] .
+```
+
+This extracts only `<book>` elements from the XML, ignoring all other elements and reducing memory usage.
+
+### When to Use
+
+- XML files larger than 100 MB
+- Files with thousands of elements
+- Limited memory environments
+- Extracting specific elements from complex XML
+
+**→ [Full Streaming XML Documentation](streaming-xml.md)**
+
+## Python Functions in Transforms {#python-functions}
+
+Execute custom Python code within JSLDT transforms for complex processing, graph manipulation, and post-processing.
+
+### Key Features
+
+- **Graph Access**: Direct access to the RDF graph being generated
+- **Post-Processing**: Add computed triples, aggregates, and statistics
+- **Validation**: Check generated RDF for correctness
+- **Custom Logic**: Execute arbitrary Python code
+
+### Quick Example
+
+```turtle
+@prefix setl: .
+@prefix prov: .
+@prefix void: .
+@prefix : .
+
+:enrichedGraph a void:Dataset ;
+    prov:wasGeneratedBy [
+        a setl:Transform, setl:JSLDT ;
+        prov:used :dataTable ;
+        prov:used [
+            a setl:PythonScript ;
+            prov:value '''
+# Variables available: graph, setl_graph
+from rdflib.namespace import RDF
+
+# Count triples by type
+types = {}
+for s, p, o in graph.triples((None, RDF.type, None)):
+    types[str(o)] = types.get(str(o), 0) + 1
+
+print("Generated triples by type:")
+for t, count in sorted(types.items()):
+    print(f" {t}: {count}")
+'''
+        ] ;
+        prov:value '''[{
+            "@id": "http://example.com/{{row.ID}}",
+            "@type": "http://example.com/Item"
+        }]''' ;
+    ] .
+```
+
+### When to Use
+
+- Computing aggregates or statistics after transformation
+- Adding cross-references between generated entities
+- Validating generated RDF structure
+- Complex logic not easily expressed in JSLDT templates
+
+⚠️ **Security Warning**: Python scripts execute with full system access. Only run trusted SETL scripts.
+
+**→ [Full Python Functions Documentation](python-functions.md)**
+
+## SPARQL Support {#sparql-support}
+
+Load transformed RDF directly to SPARQL endpoints for integration with triple stores and semantic web applications.
+
+### Key Features
+
+- **Direct Loading**: Send RDF to SPARQL UPDATE endpoints
+- **Integration**: Works with Fuseki, GraphDB, Blazegraph, etc.
+- **SPARQL Service Description**: Uses standard W3C vocabulary
+
+### Quick Example
+
+```turtle
+@prefix setl: .
+@prefix prov: .
+@prefix sd: .
+@prefix : .
+
+# Transform data (see previous examples)
+:myGraph a void:Dataset ;
+    prov:wasGeneratedBy [
+        a setl:Transform, setl:JSLDT ;
+        # ... transform details ...
+    ] .
+
+# Load to SPARQL endpoint
+:sparql_load a setl:Load, sd:Service ;
+    sd:endpoint ;
+    prov:used :myGraph .
+```
+
+### Configuration
+
+The SPARQL endpoint URL should point to the UPDATE endpoint:
+
+- **Fuseki**: `http://localhost:3030/dataset/update`
+- **GraphDB**: `http://localhost:7200/repositories/repo/statements`
+- **Blazegraph**: `http://localhost:9999/blazegraph/namespace/kb/sparql`
+
+### When to Use
+
+- Loading data into semantic web applications
+- Integration with existing triple stores
+- Building knowledge graphs
+- Creating linked data services
+
+### Authentication
+
+For endpoints requiring authentication, use HTTP authentication in the URL or configure credentials in your environment.
+
+## SHACL Validation {#shacl-validation}
+
+Validate transformed RDF against SHACL (Shapes Constraint Language) shapes to ensure data quality and conformance to schemas.
+
+### Key Features
+
+- **W3C Standard**: Uses the SHACL specification for validation
+- **Pre-Load Validation**: Checks RDF before loading to files or endpoints
+- **Detailed Reports**: Shows which constraints failed
+- **Schema Enforcement**: Ensures data meets the required structure
+
+### Quick Example
+
+Create SHACL shapes file (`shapes.ttl`):
+
+```turtle
+@prefix sh: .
+@prefix ex: .
+@prefix foaf: .
+
+ex:PersonShape
+    a sh:NodeShape ;
+    sh:targetClass foaf:Person ;
+    sh:property [
+        sh:path foaf:name ;
+        sh:minCount 1 ;
+        sh:datatype xsd:string ;
+    ] ;
+    sh:property [
+        sh:path foaf:mbox ;
+        sh:maxCount 1 ;
+        sh:nodeKind sh:IRI ;
+    ] .
+```
+
+Run SETLr with validation:
+
+```bash
+setlr transform.setl.ttl --rdf-validation shapes.ttl
+```
+
+### Validation Process
+
+1. SETL transform executes and generates RDF
+2. Generated RDF is validated against SHACL shapes
+3. Validation report is generated
+4. If validation passes, RDF is loaded
+5. If validation fails, warnings are shown but loading continues
+
+### When to Use
+
+- Enforcing data quality standards
+- Ensuring schema conformance
+- Catching transformation errors early
+- Documenting expected RDF structure
+
+### Common Shape Constraints
+
+**Required Properties:**
+```turtle
+sh:property [
+    sh:path foaf:name ;
+    sh:minCount 1 ; # Required
+] ;
+```
+
+**Data Types:**
+```turtle
+sh:property [
+    sh:path schema:age ;
+    sh:datatype xsd:integer ;
+] ;
+```
+
+**Value Ranges:**
+```turtle
+sh:property [
+    sh:path schema:age ;
+    sh:minInclusive 0 ;
+    sh:maxInclusive 150 ;
+] ;
+```
+
+**Pattern Matching:**
+```turtle
+sh:property [
+    sh:path foaf:mbox ;
+    sh:pattern "^mailto:" ;
+] ;
+```
+
+### Installation
+
+SHACL validation requires the `pyshacl` package:
+
+```bash
+pip install setlr[validation]
+# or
+pip install pyshacl[js]
+```
+
+## See Also
+
+- [Tutorial](tutorial.md) - Step-by-step guide to SETLr basics
+- [JSLDT Template Language](jsldt.md) - Template syntax reference
+- [Python API](python-api.md) - Using SETLr from Python code
+- [CLI Reference](cli.md) - Command-line options and usage
+- [Examples](examples.md) - Complete working examples

From ab7cd1b1e9655ec34bcd94fbe64b02f0ec744f87 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 19 Jan 2026 04:46:16 +0000
Subject: [PATCH 3/4] Fix missing prefix declarations in advanced.md
 code examples

Co-authored-by: jpmccu <602385+jpmccu@users.noreply.github.com>
---
 docs/advanced.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/advanced.md b/docs/advanced.md
index ef05695..f163613 100644
--- a/docs/advanced.md
+++ b/docs/advanced.md
@@ -118,6 +118,7 @@ Load transformed RDF directly to SPARQL endpoints for integration with triple st
 ```turtle
 @prefix setl: .
 @prefix prov: .
+@prefix void: .
 @prefix sd: .
 @prefix : .
 
@@ -170,6 +171,7 @@ Create SHACL shapes file (`shapes.ttl`):
 
 ```turtle
 @prefix sh: .
+@prefix xsd: .
 @prefix ex: .
 @prefix foaf: .
 

From 91839b43c38f4be528223b773dc5155bc826e462 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 19 Jan 2026 04:47:35 +0000
Subject: [PATCH 4/4] Add missing prefix declarations to all SHACL constraint examples

Co-authored-by: jpmccu <602385+jpmccu@users.noreply.github.com>
---
 docs/advanced.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/docs/advanced.md b/docs/advanced.md
index f163613..4858854 100644
--- a/docs/advanced.md
+++ b/docs/advanced.md
@@ -215,6 +215,9 @@ setlr transform.setl.ttl --rdf-validation shapes.ttl
 
 **Required Properties:**
 ```turtle
+@prefix sh: .
+@prefix foaf: .
+
 sh:property [
     sh:path foaf:name ;
     sh:minCount 1 ; # Required
@@ -223,6 +226,10 @@ sh:property [
 
 **Data Types:**
 ```turtle
+@prefix sh: .
+@prefix schema: .
+@prefix xsd: .
+
 sh:property [
     sh:path schema:age ;
     sh:datatype xsd:integer ;
@@ -231,6 +238,9 @@ sh:property [
 
 **Value Ranges:**
 ```turtle
+@prefix sh: .
+@prefix schema: .
+
 sh:property [
     sh:path schema:age ;
     sh:minInclusive 0 ;
@@ -240,6 +250,9 @@ sh:property [
 
 **Pattern Matching:**
 ```turtle
+@prefix sh: .
+@prefix foaf: .
+
 sh:property [
     sh:path foaf:mbox ;
     sh:pattern "^mailto:" ;
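
Patches 3 and 4 both repair Turtle snippets in `docs/advanced.md` that use prefixes they never declare (`void:`, `xsd:`, `sh:`, `foaf:`, `schema:`). A check along the following lines could catch that kind of omission automatically. It is a rough, regex-based sketch rather than a Turtle parser, and it is not part of this series or of SETLr. The helper names and the `check_prefixes.py` file name are hypothetical. For each `turtle` fenced block, it blanks out string literals, `<...>` IRIs and `#` comments, then reports prefixed names whose prefix has no matching `@prefix` line in that same block.

```python
import re
import sys

# A ```turtle fenced block in Markdown; group 1 is the Turtle text.
TURTLE_FENCE_RE = re.compile(r"```turtle\n(.*?)```", re.DOTALL)

# Spans whose colons must not be mistaken for prefixed names.
LONG_STRING_RE = re.compile(r"'''.*?'''" + "|" + r'""".*?"""', re.DOTALL)
SHORT_STRING_RE = re.compile(r'"(?:[^"\\\n]|\\.)*"' + "|" + r"'(?:[^'\\\n]|\\.)*'")
IRI_RE = re.compile(r"<[^<>\s]*>")
COMMENT_RE = re.compile(r"#[^\n]*")

# "@prefix sh: ..." declares "sh"; "@prefix : ..." declares the empty prefix "".
PREFIX_DECL_RE = re.compile(r"@prefix\s+([A-Za-z][\w.-]*)?:")

# Prefixed names such as sh:path, xsd:integer or :myGraph. Simplification:
# the local part must start with a letter or underscore.
PNAME_RE = re.compile(r"(?<![\w:/.-])([A-Za-z][\w-]*)?:(?=[A-Za-z_])")


def strip_non_names(turtle: str) -> str:
    """Blank out long strings, short strings, IRIs and comments, in that order."""
    for pattern in (LONG_STRING_RE, SHORT_STRING_RE, IRI_RE, COMMENT_RE):
        turtle = pattern.sub(" ", turtle)
    return turtle


def missing_prefixes(turtle: str) -> set[str]:
    """Prefixes used in a Turtle snippet but not declared in that snippet."""
    cleaned = strip_non_names(turtle)
    declared = {m.group(1) or "" for m in PREFIX_DECL_RE.finditer(cleaned)}
    used = {m.group(1) or "" for m in PNAME_RE.finditer(cleaned)}
    return used - declared


def check_markdown(markdown: str) -> list[tuple[int, set[str]]]:
    """(1-based turtle block number, missing prefixes) for each faulty block."""
    problems = []
    for number, match in enumerate(TURTLE_FENCE_RE.finditer(markdown), start=1):
        missing = missing_prefixes(match.group(1))
        if missing:
            problems.append((number, missing))
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_prefixes.py docs/advanced.md
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for number, missing in check_markdown(handle.read()):
                print(f"{path}: turtle block {number} lacks @prefix for {sorted(missing)}")
```

Run against the page as added in patch 2, a check like this should flag the SPARQL block (`void`), the shapes block (`xsd`) and the four constraint snippets; after patch 4 it should report nothing. Known gaps in this sketch: SPARQL-style `PREFIX` declarations and local names that start with a digit are not handled.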