Python: Add support for template string literals #20708
Pull request overview
This pull request adds support for Python 3.14's template string literals (t-strings) as defined in PEP-750. The implementation introduces new AST nodes (TemplateString, JoinedTemplateString, TemplateStringPart) to represent template strings, taking a simpler approach than the existing f-string support.
Key Changes:
- Extended the Tree-sitter scanner and grammar to recognize the 't'/'T' prefix for template strings
- Added new database schema entities for template string AST nodes with proper upgrade/downgrade paths
- Created new QL classes to represent template strings in the library
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/ql/lib/upgrades/acf8d3b08ae3cfac8833d16efbfa5a10fef86819/* | Database upgrade files for adding template string support |
| python/ql/lib/semmlecode.python.dbscheme* | Schema updates defining new template string AST nodes |
| python/ql/lib/semmle/python/AstGenerated.qll | Generated QL classes for template string nodes |
| python/ql/lib/semmle/python/AstExtended.qll | Extended classes for template string functionality |
| python/extractor/tsg-python/tsp/src/scanner.cc | Scanner logic to recognize t-string prefix and handle interpolation |
| python/extractor/tsg-python/tsp/grammar.js | Grammar rules for template string syntax |
| python/extractor/tsg-python/src/main.rs | Rust extractor updates for handling t-string prefixes |
| python/extractor/semmle/python/*.py | Python extractor updates for template string AST nodes |
| python/extractor/tests/parser/template_strings_new.* | Test cases for template string parsing |
| python/downgrades/8d257a4a9bc78e39856d6cd33499389fc5148d4f/* | Downgrade path for removing template string support |
| python/ql/lib/change-notes/2025-12-04-support-template-string-literals.md | Release note documenting the new feature |
    t""
if 2:
    t"Hello, {name}!"
if 3:
    t"Value: {value:.2f}, Hex: {value:#x}"
if 4:
    t"Just a regular string."
if 5:
    t"Multiple {first} and {second} placeholders."
if 6:
    t"Implicit concatenation: " t"Hello, {name}!" t" How are you?"
Syntax Error (in Python 3).
    ""
if 2:
    f"Hello, {name}!"
if 3:
    f"Value: {value:.2f}, Hex: {value:#x}"
if 4:
    "Just a regular string."
if 5:
    f"Multiple {first} and {second} placeholders."
if 6:
    "Implicit concatenation: " + f"Hello, {name}!" + " How are you?"
This is a pretty interesting edge case from Copilot. I would have thought it could figure it out based on the PR description. I wonder whether it would still have said this if there were some overly verbose comments in this file.
But it's literally in the filename...
yoff left a comment
I think f-strings had their own delimiter at some point, but then we could get away without it. It seems like it would have been more future-proof to keep it :-)
I am actually somewhat OK with template strings being singled out since they are special: they are more complex objects with accessible fields. But I do feel a little that, with this PR, we have classified the existing code as tech debt, and we should make the grammar more symmetric at some point...
- Extends the scanner with a new token kind representing the start of a template string. This is used to distinguish template strings from regular strings (because only a template string will start with a `_template_string_start` external token).
- Cleans up the logic surrounding interpolations (and the method names) so that format strings and template strings behave the same in this case.

Finally, we add two new node types in the tree-sitter grammar:

- `template_string` behaves like format strings, but is a distinct type (mainly so that an implicit concatenation between template strings and regular strings becomes a syntax error).
- `concatenated_template_string` is the counterpart of `concatenated_string`.

However, internally, the string parts of a template string are just the same `string_content` nodes that are used in regular format strings. We will disambiguate these inside `tsg-python`.
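The concatenation rule described above can be illustrated with a small Python sketch. This is not the scanner or grammar code from this PR: the helper names `string_prefix`, `is_template`, and `check_concatenation` are hypothetical, and the sketch only models the idea that a run of adjacent literals must be either all template strings or all non-template strings.

```python
def string_prefix(literal):
    """Return the letters before the opening quote, e.g. 'rt' for rt"..."."""
    for i, ch in enumerate(literal):
        if ch in "'\"":
            return literal[:i]
    raise ValueError("not a string literal: %r" % literal)


def is_template(literal):
    """A literal counts as a template string if its prefix contains t or T."""
    return "t" in string_prefix(literal).lower()


def check_concatenation(literals):
    """Name the node kind for a run of adjacent literals (hypothetical helper).

    Mixing template strings with other string literals is rejected,
    mirroring the syntax error described in the commit message.
    """
    kinds = {is_template(lit) for lit in literals}
    if len(kinds) > 1:
        raise SyntaxError("cannot mix template strings with other string literals")
    if kinds == {True}:
        return "concatenated_template_string"
    return "concatenated_string"
```

For example, `check_concatenation(['t"a"', 't"b"'])` yields the template variant, while passing `['t"a"', '"b"']` raises `SyntaxError`.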
Adds three new AST nodes to the mix:

- `TemplateString` represents a t-string in Python 3.14
- `TemplateStringPart` represents one of the string constituents of a t-string. (The interpolated expressions are represented as `Expr` nodes, just like f-strings.)
- `JoinedTemplateString` represents an implicit concatenation of template strings.

Importantly, we _completely avoid_ the complicated construction we currently do for format strings (as well as the confusing nomenclature). No extra injection of empty strings (so that a template string is a strict alternation of strings and expressions). A `JoinedTemplateString` simply has a list of template string children, and a `TemplateString` has a list of "values" which may be either `Expr` or `TemplateStringPart` nodes.

If we ever find that we actually want the more complicated interface for these strings, then I would much rather we reconstruct this inside of QL rather than in the parser.
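To make the shape of these nodes concrete, here is a minimal sketch using plain Python dataclasses. It is only an illustration of the structure described above, not the extractor's actual classes: the `Expr` stand-in, the `text` and `source` fields, and the `strings` field name are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Expr:
    """Stand-in for an interpolated expression node (hypothetical, simplified)."""
    source: str


@dataclass
class TemplateStringPart:
    """One literal string constituent of a t-string."""
    text: str


@dataclass
class TemplateString:
    """A single t-string: a flat list of parts and expressions in source order."""
    values: List[Union[Expr, TemplateStringPart]] = field(default_factory=list)


@dataclass
class JoinedTemplateString:
    """An implicit concatenation of t-strings (field name is an assumption)."""
    strings: List[TemplateString] = field(default_factory=list)


# t"Hello, {name}!"
hello = TemplateString([
    TemplateStringPart("Hello, "),
    Expr("name"),
    TemplateStringPart("!"),
])

# t"{first}{second}": no empty TemplateStringPart is injected between
# or around the two expressions.
adjacent = TemplateString([Expr("first"), Expr("second")])

# t"Implicit concatenation: " t"Hello, {name}!"
joined = JoinedTemplateString([
    TemplateString([TemplateStringPart("Implicit concatenation: ")]),
    hello,
])
```

Because nothing is padded, a consumer cannot assume that strings and expressions strictly alternate; it has to check the type of each entry in `values`.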
We do the usual thing. Downgrade scripts remove the relevant relations; upgrade scripts do nothing.
Not actually based on any measurements, just the usual 100/1000 stuff.
jketema left a comment
Just looked at the join-order fix, which looks sensible.
Extends the parser and libraries to support the new t-string syntax introduced in Python 3.14 (cf. PEP-750).
Due to the complexity of our current handling of f-strings, I opted not to extend the existing f-string support to also handle t-strings. Instead, t-strings are a completely separate (and much simpler) construction.