feat: add C language support via tree-sitter-c #433
Changes from all commits
6c63bdd
6d41f4c
4a84211
74e10f2
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -97,6 +97,27 @@ def _rust_file_to_module(file_path: Path, repo_root: Path) -> list[str]: | |
| return [] | ||
|
|
||
|
|
||
| def _c_unwrap_declarator(declarator: Node | None) -> Node | None: | ||
| while declarator and declarator.type == cs.CppNodeType.POINTER_DECLARATOR: | ||
| declarator = declarator.child_by_field_name(cs.FIELD_DECLARATOR) | ||
| return declarator | ||
|
|
||
|
|
||
| def _c_get_name(node: Node) -> str | None: | ||
| if node.type in cs.C_NAME_NODE_TYPES: | ||
| name_node = node.child_by_field_name(cs.FIELD_NAME) | ||
| if name_node and name_node.text: | ||
| return name_node.text.decode(cs.ENCODING_UTF8) | ||
| elif node.type == cs.TS_CPP_FUNCTION_DEFINITION: | ||
| declarator = node.child_by_field_name(cs.FIELD_DECLARATOR) | ||
| declarator = _c_unwrap_declarator(declarator) | ||
| if declarator and declarator.type == cs.TS_CPP_FUNCTION_DECLARATOR: | ||
| name_node = declarator.child_by_field_name(cs.FIELD_DECLARATOR) | ||
| if name_node and name_node.type == cs.TS_IDENTIFIER and name_node.text: | ||
| return name_node.text.decode(cs.ENCODING_UTF8) | ||
| return _generic_get_name(node) | ||
|
Comment on lines +106 to +118 (Contributor):
Path: codebase_rag/language_spec.py
Line: 107-120
Comment:
**`union_specifier` names silently fall through to `_generic_get_name`**
`CPP_NAME_NODE_TYPES` is defined in `constants.py` (lines 2656–2660) as:
```python
CPP_NAME_NODE_TYPES = (
CppNodeType.CLASS_SPECIFIER,
TS_STRUCT_SPECIFIER,
TS_ENUM_SPECIFIER,
)
```
`TS_UNION_SPECIFIER` is **not** in this tuple. Because `_c_get_name` delegates to `_generic_get_name` for any node that is neither in `CPP_NAME_NODE_TYPES` nor a `function_definition`, union nodes take the generic path instead of the explicit struct/enum path. `_generic_get_name` likely resolves the `name` field correctly in practice, but relying on that fallback is fragile and confusing: `_c_get_name` uses a constant that is explicitly named for C++ and intentionally omits unions.
A dedicated `C_NAME_NODE_TYPES` constant should be defined in `constants.py` that includes `TS_UNION_SPECIFIER`:
```python
# in constants.py
C_NAME_NODE_TYPES = (
TS_STRUCT_SPECIFIER,
TS_UNION_SPECIFIER,
TS_ENUM_SPECIFIER,
)
```
and `_c_get_name` should reference `cs.C_NAME_NODE_TYPES` instead of `cs.CPP_NAME_NODE_TYPES`.
Author:
Added `C_NAME_NODE_TYPES` to `constants.py` with `TS_STRUCT_SPECIFIER`, `TS_UNION_SPECIFIER`, and `TS_ENUM_SPECIFIER`. `_c_get_name` now uses `cs.C_NAME_NODE_TYPES`, so union nodes are handled explicitly. Commit 74e10f2. |
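The fall-through discussed in this thread can be reproduced in isolation. The sketch below is not the project's code: the string values are the tree-sitter node type names as I understand them, and `name_path` is a hypothetical helper that only reports which branch a `_c_get_name`-style dispatch would take for a given tuple of name node types.

```python
from __future__ import annotations

# Assumed string values; the real constants live in constants.py.
CLASS_SPECIFIER = "class_specifier"
TS_STRUCT_SPECIFIER = "struct_specifier"
TS_UNION_SPECIFIER = "union_specifier"
TS_ENUM_SPECIFIER = "enum_specifier"
FUNCTION_DEFINITION = "function_definition"

# The C++ tuple quoted in the review comment (no union entry).
CPP_NAME_NODE_TYPES = (CLASS_SPECIFIER, TS_STRUCT_SPECIFIER, TS_ENUM_SPECIFIER)

# The C tuple proposed in the review comment.
C_NAME_NODE_TYPES = (TS_STRUCT_SPECIFIER, TS_UNION_SPECIFIER, TS_ENUM_SPECIFIER)


def name_path(node_type: str, name_node_types: tuple[str, ...]) -> str:
    """Hypothetical helper: report which branch the dispatch would take."""
    if node_type in name_node_types:
        return "explicit"  # read the `name` field directly
    if node_type == FUNCTION_DEFINITION:
        return "function"  # unwrap the declarator chain
    return "generic"       # fall back to the generic name lookup


if __name__ == "__main__":
    print(name_path(TS_UNION_SPECIFIER, CPP_NAME_NODE_TYPES))
    print(name_path(TS_UNION_SPECIFIER, C_NAME_NODE_TYPES))
```

With the C++ tuple a union takes the generic branch; with the dedicated C tuple it takes the explicit one, which is the behavior the fix in commit 74e10f2 is described as providing.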
||
|
|
||
|
|
||
| def _cpp_get_name(node: Node) -> str | None: | ||
| if node.type in cs.CPP_NAME_NODE_TYPES: | ||
| name_node = node.child_by_field_name(cs.FIELD_NAME) | ||
|
|
@@ -154,6 +175,13 @@ def _cpp_get_name(node: Node) -> str | None: | |
| file_to_module_parts=_generic_file_to_module, | ||
| ) | ||
|
|
||
| C_FQN_SPEC = FQNSpec( | ||
| scope_node_types=frozenset(cs.FQN_C_SCOPE_TYPES), | ||
| function_node_types=frozenset(cs.FQN_C_FUNCTION_TYPES), | ||
| get_name=_c_get_name, | ||
| file_to_module_parts=_generic_file_to_module, | ||
| ) | ||
|
|
||
| LUA_FQN_SPEC = FQNSpec( | ||
| scope_node_types=frozenset(cs.FQN_LUA_SCOPE_TYPES), | ||
| function_node_types=frozenset(cs.FQN_LUA_FUNCTION_TYPES), | ||
|
|
@@ -195,6 +223,7 @@ def _cpp_get_name(node: Node) -> str | None: | |
| cs.SupportedLanguage.TS: TS_FQN_SPEC, | ||
| cs.SupportedLanguage.RUST: RUST_FQN_SPEC, | ||
| cs.SupportedLanguage.JAVA: JAVA_FQN_SPEC, | ||
| cs.SupportedLanguage.C: C_FQN_SPEC, | ||
| cs.SupportedLanguage.CPP: CPP_FQN_SPEC, | ||
| cs.SupportedLanguage.LUA: LUA_FQN_SPEC, | ||
| cs.SupportedLanguage.GO: GO_FQN_SPEC, | ||
|
|
@@ -343,6 +372,28 @@ def _cpp_get_name(node: Node) -> str | None: | |
| type: (type_identifier) @name) @call | ||
| """, | ||
| ), | ||
| cs.SupportedLanguage.C: LanguageSpec( | ||
| language=cs.SupportedLanguage.C, | ||
| file_extensions=cs.C_EXTENSIONS, | ||
| function_node_types=cs.SPEC_C_FUNCTION_TYPES, | ||
| class_node_types=cs.SPEC_C_CLASS_TYPES, | ||
| module_node_types=cs.SPEC_C_MODULE_TYPES, | ||
| call_node_types=cs.SPEC_C_CALL_TYPES, | ||
| import_node_types=cs.IMPORT_NODES_INCLUDE, | ||
| import_from_node_types=cs.IMPORT_NODES_INCLUDE, | ||
| package_indicators=cs.SPEC_C_PACKAGE_INDICATORS, | ||
| function_query=""" | ||
| (function_definition) @function | ||
| """, | ||
| class_query=""" | ||
| (struct_specifier) @class | ||
| (union_specifier) @class | ||
| (enum_specifier) @class | ||
| """, | ||
| call_query=""" | ||
| (call_expression) @call | ||
| """, | ||
| ), | ||
| cs.SupportedLanguage.CPP: LanguageSpec( | ||
| language=cs.SupportedLanguage.CPP, | ||
| file_extensions=cs.CPP_EXTENSIONS, | ||
|
|
||
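To illustrate what `_c_unwrap_declarator` and the `function_definition` branch of `_c_get_name` do in the diff above, here is a minimal sketch that runs without tree-sitter installed. `FakeNode` is a hypothetical stand-in for `tree_sitter.Node`, and the literal strings (`"pointer_declarator"`, `"function_declarator"`, `"declarator"`, `"identifier"`) are assumed to match the values behind the `cs.*` constants. The tree is built by hand to mirror the declarator nesting expected for a C function such as `char **make_args(void)`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for tree_sitter.Node, just enough for this sketch."""

    type: str
    text: bytes = b""
    fields: dict[str, FakeNode] = field(default_factory=dict)

    def child_by_field_name(self, name: str) -> FakeNode | None:
        return self.fields.get(name)


def unwrap_declarator(declarator: FakeNode | None) -> FakeNode | None:
    # Peel off every pointer layer until a non-pointer declarator remains.
    while declarator and declarator.type == "pointer_declarator":
        declarator = declarator.child_by_field_name("declarator")
    return declarator


def function_name(node: FakeNode) -> str | None:
    # Mirrors the function_definition branch: unwrap, then read the identifier.
    declarator = unwrap_declarator(node.child_by_field_name("declarator"))
    if declarator and declarator.type == "function_declarator":
        name_node = declarator.child_by_field_name("declarator")
        if name_node and name_node.type == "identifier" and name_node.text:
            return name_node.text.decode("utf-8")
    return None


# Hand-built shape for `char **make_args(void) { ... }`:
# function_definition -> pointer -> pointer -> function_declarator -> identifier
ident = FakeNode("identifier", b"make_args")
func_decl = FakeNode("function_declarator", fields={"declarator": ident})
inner_ptr = FakeNode("pointer_declarator", fields={"declarator": func_decl})
outer_ptr = FakeNode("pointer_declarator", fields={"declarator": inner_ptr})
definition = FakeNode("function_definition", fields={"declarator": outer_ptr})

if __name__ == "__main__":
    print(function_name(definition))
```

Without the unwrapping loop, the type check against `function_declarator` would fail for any function returning a pointer, and the name would have to come from the generic fallback.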
For better clarity and maintainability, consider aliasing the reused C++ tree-sitter node types to C-specific names, for example `TS_C_TRANSLATION_UNIT = TS_CPP_TRANSLATION_UNIT`. This would make the C configuration more self-documenting and less prone to confusion with the C++ spec.

Added a dedicated `C_NAME_NODE_TYPES` constant in `constants.py` instead of aliasing. It includes `TS_STRUCT_SPECIFIER`, `TS_UNION_SPECIFIER`, and `TS_ENUM_SPECIFIER`, and `_c_get_name` now references it. Commit 74e10f2.
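For comparison, the aliasing approach suggested in the review (and not adopted in this PR) might look roughly like the sketch below. Only `TS_C_TRANSLATION_UNIT = TS_CPP_TRANSLATION_UNIT` comes from the comment; the other alias names, the string values, and `EXAMPLE_C_MODULE_TYPES` are assumptions made for illustration.

```python
# Assumed values for the existing C++-named constants in constants.py.
TS_CPP_TRANSLATION_UNIT = "translation_unit"
TS_CPP_FUNCTION_DEFINITION = "function_definition"
TS_CPP_FUNCTION_DECLARATOR = "function_declarator"

# C-specific aliases: same values, but the name documents the intent.
TS_C_TRANSLATION_UNIT = TS_CPP_TRANSLATION_UNIT
TS_C_FUNCTION_DEFINITION = TS_CPP_FUNCTION_DEFINITION  # hypothetical alias
TS_C_FUNCTION_DECLARATOR = TS_CPP_FUNCTION_DECLARATOR  # hypothetical alias

# Hypothetical C spec tuple written against the aliases, so the C
# configuration no longer mentions CPP names at all.
EXAMPLE_C_MODULE_TYPES = (TS_C_TRANSLATION_UNIT,)

if __name__ == "__main__":
    print(EXAMPLE_C_MODULE_TYPES)
```

Aliases keep a single source of truth for each node type string, while a dedicated tuple such as `C_NAME_NODE_TYPES` lets the C spec diverge from C++ where the two languages differ, as with unions here.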