Add Scala and HCL language support #155

Merged
simianhacker merged 7 commits into main from add-scala-hcl
Mar 19, 2026

Conversation

@simianhacker
Member

Summary

  • Add Scala (.scala) language support with tree-sitter parser, including symbol queries for classes, objects, traits, functions, vals/vars, and export queries
  • Add HCL (.tf, .hcl) language support with tree-sitter parser, including symbol queries for block types, block labels, attributes, and function calls
  • Register both languages in the language configuration index
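
As a rough illustration of the registration step listed above, the sketch below shows one way a language index could map file suffixes to configurations. The `LanguageConfiguration` interface, its field names, and the `languageForFile` helper are assumptions made for this example and are not taken from the repository. Only the query strings and the `.scala`, `.tf`, and `.hcl` suffixes come from this PR.

```typescript
// Hypothetical shape; the real configuration type in the repository may differ.
interface LanguageConfiguration {
  name: string;
  fileSuffixes: string[];
  queries: string[]; // tree-sitter queries that select chunk nodes
  symbolQueries: string[]; // tree-sitter queries that select symbol nodes
}

const scalaConfig: LanguageConfiguration = {
  name: 'scala',
  fileSuffixes: ['.scala'],
  queries: ['(function_definition) @function', '(class_definition) @class', '(object_definition) @object'],
  symbolQueries: [],
};

const hclConfig: LanguageConfiguration = {
  name: 'hcl',
  fileSuffixes: ['.tf', '.hcl'],
  queries: ['(block) @block', '(attribute) @attribute', '(function_call) @function_call', '(comment) @comment'],
  symbolQueries: ['(block (identifier) @block.type)'],
};

// Central index keyed by language name (assumed layout).
const languageConfigurations = { scala: scalaConfig, hcl: hclConfig };

const allConfigs: LanguageConfiguration[] = [languageConfigurations.scala, languageConfigurations.hcl];

// Resolve a configuration from a file path by comparing its suffix, case-insensitively.
function languageForFile(filePath: string): LanguageConfiguration | undefined {
  const lower = filePath.toLowerCase();
  for (const config of allConfigs) {
    for (const suffix of config.fileSuffixes) {
      if (lower.slice(-suffix.length) === suffix) {
        return config;
      }
    }
  }
  return undefined;
}
```

With this shape, `languageForFile('infra/main.tf')` resolves to the `hcl` entry, and a path with an unregistered suffix returns `undefined`.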

Test plan

  • Unit tests for Scala language registration and parsing
  • Unit tests for HCL language registration and parsing (both .tf and .hcl extensions)
  • Type check passes cleanly

🤖 Generated with Claude Code

simianhacker and others added 2 commits March 19, 2026 03:03
Register Scala and HCL language configs and extension mappings so .scala, .tf, and .hcl files are indexed via tree-sitter with coverage in parser and language unit tests.

Made-with: Cursor
- Revert unrelated test script change (OTEL env var clearing) from package.json
- Rename HCL block.name capture to block.type since it captures the block type keyword
- Add block.label symbol query to capture HCL block labels (e.g. resource type and name strings)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 19, 2026 06:49

Copilot AI left a comment

Pull request overview

Adds Tree-sitter–based language support for Scala (.scala) and HCL (.tf, .hcl) and registers both in the language configuration index, with accompanying unit tests for registration and basic parsing/recognition.

Changes:

  • Add new language configurations for Scala and HCL (Tree-sitter parsers + queries).
  • Register scala and hcl in languageConfigurations.
  • Extend unit tests to include the new languages and verify basic parsing/extension recognition.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/languages/scala.ts | Introduces Scala language config: suffixes, queries, symbol/import/export queries. |
| src/languages/hcl.ts | Introduces HCL language config: suffixes, queries, symbol queries. |
| src/languages/index.ts | Registers scala and hcl in the supported language map. |
| tests/unit/parser.test.ts | Adds Scala/HCL extension recognition parsing tests and includes them in test language list. |
| tests/unit/languages.test.ts | Adds coverage that parseLanguageNames and supported languages include scala and hcl. |
| package.json | Adds dependencies for Tree-sitter Scala and HCL grammars. |
| package-lock.json | Locks new dependency versions. |

'(return_expression) @return',
'(function_definition) @function',
'(class_definition) @class',
'(object_definition) @object',
expect(result.chunks[0].language).toBe('handlebars');
});

it('should recognize .scala file extension', () => {
@coderabbitai

coderabbitai bot commented Mar 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7853fe2f-0294-42b7-8e91-2724d7c01b6a

📥 Commits

Reviewing files that changed from the base of the PR and between cc5f1d7 and ae371e9.

⛔ Files ignored due to path filters (1)
  • tests/unit/__snapshots__/parser.test.ts.snap is excluded by !**/*.snap
📒 Files selected for processing (2)
  • src/languages/hcl.ts
  • tests/unit/parser.test.ts

📝 Walkthrough

Walkthrough

Adds Scala and HCL language support. Two tree-sitter parser dependencies were added to package.json. New language configuration modules (src/languages/scala.ts, src/languages/hcl.ts) register names, file suffixes, parsers, queries, and symbol/import/export capture rules. The language registry (src/languages/index.ts) now includes scala and hcl, expanding the LanguageName set. LanguageParser.parseWithTreeSitter normalizes captured import paths by stripping quotes and a leading import token. Tests and fixtures were added/updated to cover parsing, symbols, imports, and exports for Scala and HCL.

Sequence Diagram(s)

sequenceDiagram
    participant FS as FileSystem
    participant LR as LanguageRegistry
    participant P as LanguageParser
    participant TS as TreeSitter
    participant Out as Result

    FS->>P: open file (.scala / .tf / .hcl)
    P->>LR: determine language by extension
    LR-->>P: return languageConfig (scala / hcl)
    P->>TS: load parser from languageConfig (tree-sitter)
    TS-->>P: AST + capture matches (queries)
    P->>P: extract imports (strip quotes and leading 'import '), symbols, exports
    P->>Out: return chunks with language, symbols, imports, exports, metrics
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding support for two new languages (Scala and HCL) with their respective tree-sitter parsers. |
| Description check | ✅ Passed | The description is directly related to the changeset, detailing the additions of Scala and HCL language support, including symbol queries and test coverage. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/parser.test.ts (1)

357-382: ⚠️ Potential issue | 🟡 Minor

Remove duplicate C fixture tests at lines 357-382.

Lines 357-382 are exact duplicates of the same tests at lines 241-266 ("should parse C fixtures correctly" and "should extract symbols from C fixtures correctly"). Remove the duplicate test blocks.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a33d080e-dd35-4092-b30b-b1a181c2c755

📥 Commits

Reviewing files that changed from the base of the PR and between 6883198 and 9b0324c.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (6)
  • package.json
  • src/languages/hcl.ts
  • src/languages/index.ts
  • src/languages/scala.ts
  • tests/unit/languages.test.ts
  • tests/unit/parser.test.ts

- Fix Scala importQueries: remove broken query pattern that silently
  failed due to tree-sitter-scala using flat identifier sequences
  (no wrapper node like Java's scoped_identifier)
- Add trait_definition and enum_definition to Scala queries so
  exports for traits/enums are properly attached to chunks
- Remove duplicate C fixture tests (exact copies of earlier tests)
- Add Scala and HCL test fixtures with proper symbol and export
  extraction assertions
- Update snapshots
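
The commit message above notes that trait and enum exports only attach once chunk queries exist for those node types. A minimal sketch of a guard for that condition follows. The `@trait` and `@enum` capture names and both helper functions are hypothetical, and the pattern matching only handles simple `(node_type ...) @capture` queries like the ones quoted in this PR.

```typescript
// Chunk queries after the fix. The @trait and @enum capture names are assumptions.
const scalaQueries: string[] = [
  '(function_definition) @function',
  '(class_definition) @class',
  '(object_definition) @object',
  '(trait_definition) @trait',
  '(enum_definition) @enum',
];

// Extract the outermost node type from a simple "(node_type ...) @capture" query.
function topLevelNodeType(query: string): string | undefined {
  const match = /^\(\s*([A-Za-z_]+)/.exec(query.trim());
  return match ? match[1] : undefined;
}

// Return the required node types that no chunk query covers.
function uncoveredNodeTypes(queries: string[], required: string[]): string[] {
  const covered: string[] = [];
  for (const query of queries) {
    const nodeType = topLevelNodeType(query);
    if (nodeType !== undefined) {
      covered.push(nodeType);
    }
  }
  return required.filter((nodeType) => covered.indexOf(nodeType) === -1);
}
```

Calling `uncoveredNodeTypes(scalaQueries, ['trait_definition', 'enum_definition'])` returns an empty array for the list above. Dropping the last two queries would make it report both node types, which is the situation the commit fixes.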
@simianhacker
Member Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@simianhacker
Member Author

Integration Test Results

Deployed the updated image to the k8s indexing cluster and ran a full index of elastic/cloud to validate Scala and HCL support end-to-end.

Results

| Language | Chunks Indexed |
| --- | --- |
| Scala | 106,300 |
| HCL | 27,200 |

Total: 24,366 files and 103,244 symbols indexed across the repo.

Validation

Ran semantic searches against the indexed data via SCS and confirmed:

  • Scala: Correctly chunked and indexed Scala service code from scala-services/ — objects, traits, sealed trait hierarchies, case classes, and function definitions all captured with proper kind tags (object_definition, class_definition, trait_definition, etc.)
  • HCL: Correctly chunked Terraform infrastructure from infrastructure/: aws_security_group, aws_route53_zone, aws_elb, aws_dynamodb_table, and other resource blocks are indexed with kind: block, with full resource bodies preserved

Semantic search is returning relevant, well-structured results for both languages. Ship it. 🚀

@kapral18 kapral18 self-requested a review March 19, 2026 20:19
Collaborator

@kapral18 kapral18 left a comment

Approved, but make sure to fix coderabbit and copilot suggestions before merge :)

@simianhacker
Member Author

All review feedback has been addressed in commit 6a7777d. Summary of changes made:

CodeRabbit (CHANGES_REQUESTED on 9b0324c — now stale):

  • ✅ Broken importQueries pattern removed — the query (import_declaration path: (identifier) @import.path) silently failed due to tree-sitter-scala's hidden _namespace_expression node. Replaced with a comment explaining the limitation (same pattern as other languages where structured import extraction isn't possible).
  • ✅ Duplicate C fixture tests at lines 357-382 removed.

Copilot:

  • trait_definition and enum_definition added to scalaConfig.queries so chunks are created for those nodes and exports attach correctly.
  • ✅ Added fixture files (tests/fixtures/scala.scala, tests/fixtures/hcl.tf) and proper symbol/export extraction assertions for both Scala and HCL — no longer just extension recognition tests.

@coderabbitai resolve

@coderabbitai

coderabbitai bot commented Mar 19, 2026

@simianhacker: Thank you for the detailed summary! All the addressed items look solid:

  • Removing the broken importQueries pattern (and documenting the limitation) is the right call given the hidden _namespace_expression node in tree-sitter-scala.
  • Adding trait_definition and enum_definition to scalaConfig.queries ensures those nodes get properly chunked and their exports attach correctly.
  • The fixture-based tests with symbol/export extraction assertions are a much stronger validation than extension-only checks.
  • The integration test results against elastic/cloud — 106K Scala chunks and 27K HCL chunks across 24K files with 103K symbols — are a great real-world signal. 🚀

Resolving all open review comments now.

✅ Actions performed

Comments resolved and changes approved.

Capture the whole import_declaration node as @import.path and strip
the leading 'import ' keyword, rather than trying to capture individual
path identifiers. This produces full dotted paths like
'scala.collection.mutable.ListBuffer' correctly.

Also adds a test asserting Scala imports are extracted with full paths.
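
The normalization described in this commit message can be sketched as a small string helper. This is an illustration rather than the code in `src/utils/parser.ts`: the function name `normalizeImportPath` is hypothetical, and the `^\s*import\s+` pattern is borrowed from a later commit in this thread that handles indented declarations.

```typescript
// Hypothetical helper: turn the text of a captured import node into a bare path.
function normalizeImportPath(captureText: string): string {
  return captureText
    .replace(/^\s*import\s+/, '') // drop the leading keyword and any indentation before it
    .replace(/['"]/g, '') // drop quote characters used by string-based imports
    .trim();
}
```

For example, `normalizeImportPath('import scala.collection.mutable.ListBuffer')` yields `scala.collection.mutable.ListBuffer`. Because the pattern requires whitespace after the keyword, an identifier that merely starts with `import`, such as `important.thing`, is left unchanged.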
@simianhacker
Member Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@simianhacker
Member Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1b29bd9e-e378-43f7-b923-24a5ae3c38a3

📥 Commits

Reviewing files that changed from the base of the PR and between 9b0324c and 1e5f0af.

⛔ Files ignored due to path filters (1)
  • tests/unit/__snapshots__/parser.test.ts.snap is excluded by !**/*.snap
📒 Files selected for processing (5)
  • src/languages/scala.ts
  • src/utils/parser.ts
  • tests/fixtures/hcl.tf
  • tests/fixtures/scala.scala
  • tests/unit/parser.test.ts

- Fix chained .replace() formatting in parser.ts
- Fix misleading '// A case class' comment in scala fixture
- Fix trailing comma formatting in import assertion test
Copilot AI review requested due to automatic review settings March 19, 2026 22:08

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 19dabc80-efd9-4647-a3de-182b4c884c0c

📥 Commits

Reviewing files that changed from the base of the PR and between 1e5f0af and 2fef318.

⛔ Files ignored due to path filters (1)
  • tests/unit/__snapshots__/parser.test.ts.snap is excluded by !**/*.snap
📒 Files selected for processing (3)
  • src/utils/parser.ts
  • tests/fixtures/scala.scala
  • tests/unit/parser.test.ts


Copilot AI left a comment

Pull request overview

Adds Tree-sitter-based language support for Scala (.scala) and HCL (.tf, .hcl) to the parser/language-configuration system, along with fixtures, unit tests, and dependency updates.

Changes:

  • Add new language configurations for Scala and HCL (Tree-sitter parsers + symbol/import/export queries as applicable).
  • Update Tree-sitter import-path extraction to handle Scala import_declaration captures.
  • Add fixtures + unit tests for extension recognition, parsing, and symbol/import/export extraction; update snapshots accordingly.

Reviewed changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| src/utils/parser.ts | Normalizes import capture text (strip quotes + leading import) to support Scala import captures. |
| src/languages/scala.ts | Introduces Scala language configuration (suffixes, queries, symbols, exports, imports). |
| src/languages/hcl.ts | Introduces HCL language configuration (suffixes, queries, symbols). |
| src/languages/index.ts | Registers scala and hcl in the central language configuration map. |
| tests/unit/parser.test.ts | Adds parsing/extension/symbol/import/export tests for Scala and HCL. |
| tests/unit/languages.test.ts | Adds parseLanguageNames coverage for scala and hcl. |
| tests/fixtures/scala.scala | Scala fixture for parser/symbol/import/export assertions. |
| tests/fixtures/hcl.tf | HCL fixture for parser/symbol assertions. |
| tests/unit/__snapshots__/parser.test.ts.snap | Snapshot updates reflecting new fixtures and test ordering/output. |
| package.json / package-lock.json | Adds Tree-sitter grammar dependencies for Scala and HCL. |

queries: ['(block) @block', '(attribute) @attribute', '(function_call) @function_call', '(comment) @comment'],
symbolQueries: [
'(block (identifier) @block.type)',
'(block (string_lit) @block.label)',
Copilot AI Mar 19, 2026

HCL block label symbols are currently captured as string_lit, which includes the surrounding quotes in capture.node.text (e.g. names like "region"). This makes extracted symbol names inconsistent with other languages and harder to consume downstream. Consider capturing the unquoted string content node instead (if the grammar exposes one) or normalizing block.label symbol text to strip the surrounding quotes.

Suggested change
- '(block (string_lit) @block.label)',
+ '(block (string_lit (string_content) @block.label))',
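
Copilot's comment also mentions a second option: normalizing the captured label text instead of changing the query. A minimal sketch of that alternative follows. The helper name `stripSurroundingQuotes` is hypothetical and nothing in this thread indicates it exists in the repository; a later commit here took the query-based route instead.

```typescript
// Hypothetical helper: remove one pair of surrounding double quotes, if present.
function stripSurroundingQuotes(text: string): string {
  const quoted = text.length >= 2 && text.charAt(0) === '"' && text.charAt(text.length - 1) === '"';
  return quoted ? text.slice(1, -1) : text;
}
```

Applied to a label captured as `"aws_s3_bucket"` (quotes included), this returns `aws_s3_bucket`, while an unquoted block type such as `resource` passes through unchanged.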

- Handle indented import declarations with ^\s*import\s+ regex
- Expand HCL symbol assertions to cover block.label and attribute.name
@simianhacker
Member Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

- Fix HCL block.label symbols to capture unquoted names via
  template_literal instead of string_lit (e.g. 'aws_s3_bucket'
  instead of '"aws_s3_bucket"')
- Refactor repetitive temp-file extension tests into shared helper
- Update HCL block.label assertions to expect unquoted values
- Update snapshot
Copilot AI review requested due to automatic review settings March 19, 2026 22:33
@simianhacker
Member Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 11 changed files in this pull request and generated 2 comments.


@simianhacker simianhacker merged commit 2d02337 into main Mar 19, 2026
11 checks passed
@kapral18 kapral18 deleted the add-scala-hcl branch March 20, 2026 00:51