A pure Starlark implementation of a BUILD file parser inspired by the Go implementation in bazelbuild/buildtools.
The parser supports only a simplified grammar subset of Starlark applicable to BUILD files (expressions, function calls, literals, etc.). It does not support function definitions (`def`), loops (`for`, `while`), or other control flow statements typically found in `*.bzl` files.
- Pure Starlark Implementation: Platform-independent and can run anywhere Bazel/Starlark runs
- Repository Rule Compatible: Can be used in repository rules for BUILD file manipulation and analysis
- No External Dependencies: Self-contained parser with no dependencies on native code or external tools
- AST Generation: Produces a structured Abstract Syntax Tree (AST) for BUILD file content
- Fail-Fast Error Handling: Expects correct BUILD file syntax and fails immediately on syntax errors

```starlark
load("@skyparse//:parser.bzl", "parse", "ast_node_types")

def _repository_rule_impl(repository_ctx):
    # Read BUILD file content from the repository
    build_content = repository_ctx.read(repository_ctx.attr.build_file)

    # Example content:
    # cc_library(
    #     name = "mylib",
    #     srcs = ["mylib.cc"],
    #     hdrs = ["mylib.h"],
    # )
    ast = parse(build_content)
    # Process the AST...
```

The parser produces an AST where each node has a `nodeType` field. The available node types are defined in `ast_node_types`:

| Node Type | Description | Fields |
|---|---|---|
| `ROOT` | Root node containing all statements | `statements` - list of top-level statement nodes |
| `CALL` | Function call expression | `callable` - function being called<br>`positional_args` - list of positional arguments<br>`keyword_args` - list of keyword arguments (`KEY_VALUE` nodes) |
| `IDENT` | Identifier (variable/function name) | `name` - identifier string |
| `STRING` | String literal | `value` - string value |
| `NUMBER` | Numeric literal | `value` - numeric value |
| `LIST` | List literal `[...]` | `elements` - list of element nodes |
| `DICT` | Dictionary literal `{...}` | `entries` - list of `KEY_VALUE` nodes |
| `TUPLE` | Tuple literal `(...)` | `elements` - list of element nodes |
| `KEY_VALUE` | Dictionary entry or keyword argument | `key` - key expression/string<br>`value` - value expression |
| `BINARY_OP` | Binary operation (e.g., `+`, `-`, `*`) | `left` - left operand<br>`op` - operator string<br>`right` - right operand |
| `UNARY_OP` | Unary operation (e.g., `not`, `-`) | `op` - operator string<br>`operand` - operand expression |
| `TERNARY_OP` | Conditional expression `x if cond else y` | `condition` - condition expression<br>`true_expr` - expression if true<br>`false_expr` - expression if false |
| `ATTR` | Attribute access `obj.attr` | `object` - object expression<br>`attr` - attribute name string |
| `INDEX` | Index operation `obj[index]` | `object` - object expression<br>`index` - index expression |
| `PARENTHESIS` | Parenthesized expression `(expr)` | `expr` - wrapped expression |
| `COMPREHENSION` | List/dict comprehension | `element` - element expression<br>`loop_var` - loop variable(s)<br>`iterable` - iterable expression<br>`condition` - optional filter condition |

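
To show how the fields in the table fit together, here is a minimal sketch of a non-recursive walk that collects every string literal in a tree. It is not part of the parser: `collect_string_values`, `CHILD_FIELDS`, and `MAX_NODES` are hypothetical names, the caller is assumed to pass `ast_node_types` as `node_types`, and fields that are absent or hold non-node values (such as a plain-string `key`) are assumed to be safe to skip. The code stays within the subset shared by Starlark and Python, so it is fenced as Python and can be tried in either.

```python
# Field names that may hold child nodes, copied from the node table.
CHILD_FIELDS = [
    "statements", "callable", "positional_args", "keyword_args",
    "elements", "entries", "key", "value", "left", "right", "operand",
    "condition", "true_expr", "false_expr", "object", "index", "expr",
    "element", "loop_var", "iterable",
]

# Hypothetical safety bound: Starlark has no `while`, so the walk is capped.
MAX_NODES = 100000

def collect_string_values(root, node_types):
    """Returns the value of every STRING node reachable from root."""
    found = []
    stack = [root]  # explicit stack instead of recursion
    for _ in range(MAX_NODES):
        if not stack:
            break
        node = stack.pop()
        if node.nodeType == node_types.STRING:
            found.append(node.value)
            continue

        # Gather child nodes from whichever fields this node type has.
        children = []
        for field in CHILD_FIELDS:
            child = getattr(node, field, None)
            if type(child) == type([]):
                for item in child:
                    if hasattr(item, "nodeType"):
                        children.append(item)
            elif hasattr(child, "nodeType"):
                children.append(child)

        # Push in reverse so the first child is popped first.
        for i in range(len(children) - 1, -1, -1):
            stack.append(children[i])
    return found
```

In a `.bzl` file this would be called as `collect_string_values(parse(content), ast_node_types)`.
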

```starlark
load("@skyparse//:parser.bzl", "parse", "ast_node_types")

content = """
cc_library(
    name = "example",
    srcs = ["example.cc"],
)
"""

ast = parse(content)
# ast.nodeType == ast_node_types.ROOT
# ast.statements[0].nodeType == ast_node_types.CALL
# ast.statements[0].callable.name == "cc_library"
```

```starlark
def extract_function_names(ast):
    """Extract all function names called in a BUILD file."""
    if ast.nodeType != ast_node_types.ROOT:
        fail("Expected ROOT node")
    names = []
    for stmt in ast.statements:
        if stmt.nodeType == ast_node_types.CALL:
            if stmt.callable.nodeType == ast_node_types.IDENT:
                names.append(stmt.callable.name)
    return names

# Example usage
ast = parse('load("@rules_cc//cc:defs.bzl", "cc_library")\ncc_library(name = "foo")')
names = extract_function_names(ast)  # ["load", "cc_library"]
```

```starlark
def find_target_names(ast):
    """Find all 'name' attributes in function calls."""
    names = []
    for stmt in ast.statements:
        if stmt.nodeType == ast_node_types.CALL:
            for kwarg in stmt.keyword_args:
                if kwarg.key == "name" and kwarg.value.nodeType == ast_node_types.STRING:
                    names.append(kwarg.value.value)
    return names

ast = parse('cc_library(name = "mylib")\ncc_test(name = "mytest")')
target_names = find_target_names(ast)  # ["mylib", "mytest"]
```

```starlark
ast = parse("x = 1 + 2 * 3")
```

Produces AST (operator precedence respected):

```
ROOT
└── BINARY_OP (=)
    ├── left: IDENT (x)
    └── right: BINARY_OP (+)
        ├── left: NUMBER (1)
        └── right: BINARY_OP (*)
            ├── left: NUMBER (2)
            └── right: NUMBER (3)
```
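
The nesting on the right-hand side follows from operator precedence: `*` binds tighter than `+`, so it ends up deeper in the tree. As an illustration only, the sketch below builds the same shape for `1 + 2 * 3` from an already tokenized expression using two stacks (a shunting-yard style approach), with no recursion and no `while`. This is not necessarily how the parser itself is implemented. `PRECEDENCE`, `build_binary_tree`, and the plain dicts standing in for AST nodes are all assumptions made for the example. It uses only syntax shared by Starlark and Python.

```python
# Hypothetical precedence table: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def _reduce(operands, operators):
    """Pops one operator and two operands, pushes the combined node."""
    op = operators.pop()
    right = operands.pop()
    left = operands.pop()
    operands.append({"nodeType": "BINARY_OP", "op": op, "left": left, "right": right})

def build_binary_tree(tokens):
    """Builds a tree from tokens of the form operand, operator, operand, ..."""
    operands = []
    operators = []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Reduce while the stacked operator binds at least as tightly
            # (this makes equal-precedence operators left-associative).
            for _ in range(len(operators)):
                if PRECEDENCE[operators[-1]] < PRECEDENCE[tok]:
                    break
                _reduce(operands, operators)
            operators.append(tok)
        else:
            operands.append(tok)

    # Fold whatever operators remain, innermost first.
    for _ in range(len(operators)):
        _reduce(operands, operators)
    return operands[0]

tree = build_binary_tree([1, "+", 2, "*", 3])
# tree["op"] == "+", and tree["right"] is the nested "*" node
```
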

```starlark
ast = parse("srcs = [f + '.cc' for f in files if f != 'main']")
```

Produces AST:

```
ROOT
└── BINARY_OP (=)
    ├── left: IDENT (srcs)
    └── right: LIST
        └── COMPREHENSION
            ├── element: BINARY_OP (+)
            │   ├── left: IDENT (f)
            │   └── right: STRING ('.cc')
            ├── loop_var: IDENT (f)
            ├── iterable: IDENT (files)
            └── condition: BINARY_OP (!=)
                ├── left: IDENT (f)
                └── right: STRING ('main')
```

Starlark has several limitations compared to full Python that affect parser implementation:
- No `while` loops: Starlark only supports `for` loops with finite iterables
- No recursion: Recursive function calls are not allowed
- No mutable closures: Nested functions cannot modify variables from outer scopes

To work around these limitations, the parser uses:
- Bounded iteration: `while True` is emulated through `utils.infinite_loop()`, see `internal/utils.bzl` for details
- Explicit call stack: Instead of recursive descent parsing, we maintain an explicit `call_stack` list that simulates function calls
- Mutable references: Variables that need to be modified in nested functions use `utils.ref_make()` which wraps values in single-element lists to enable mutation (e.g., `index_ref = utils.ref_make(0)`, then access via `utils.ref_get(index_ref)`)
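
The three workarounds can be seen together in a small sketch that flattens nested lists. This is not the code in `internal/utils.bzl`: the `ref_*` helpers are re-declared locally for illustration (`ref_set` is a hypothetical name, since only `ref_make` and `ref_get` are mentioned above), and a `for` over a large `range` stands in for `utils.infinite_loop()`, whose actual implementation is not shown here. `MAX_STEPS` and `flatten_nested` are likewise made up for the example, which uses only syntax shared by Starlark and Python.

```python
# Hypothetical upper bound that stands in for an unbounded `while True`.
MAX_STEPS = 1000000

def ref_make(value):
    """Wraps a value in a single-element list so nested functions can mutate it."""
    return [value]

def ref_get(ref):
    return ref[0]

def ref_set(ref, value):
    ref[0] = value

def flatten_nested(nested):
    """Flattens nested lists; returns (flat_list, number_of_items_visited)."""
    out = []
    steps_ref = ref_make(0)

    def visit(item, stack):
        # A nested function cannot rebind an outer variable,
        # so the counter is updated through the reference.
        ref_set(steps_ref, ref_get(steps_ref) + 1)
        if type(item) == type([]):
            # Push children in reverse so the first child is handled first.
            for i in range(len(item) - 1, -1, -1):
                stack.append(item[i])
        else:
            out.append(item)

    # The explicit stack replaces recursive calls.
    stack = [nested]
    for _ in range(MAX_STEPS):  # bounded iteration instead of `while stack:`
        if not stack:
            break
        visit(stack.pop(), stack)
    return out, ref_get(steps_ref)
```

The trade-off of the bounded loop is that input needing more than `MAX_STEPS` iterations would be cut short, so a real implementation would need to detect that case and fail explicitly.
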

This is a simplified parser with the following limitations:
- Grammar subset: Only supports BUILD file syntax (expressions, literals, function calls). Does not support function definitions (`def`), loops (`for`, `while`), `if` statements, or other control flow found in `.bzl` files
- Does not track source positions or line/column numbers
- Does not preserve or track comments
- Focuses on parsing syntax, not validating Bazel semantics
- May not support every edge case of Starlark syntax