feat(objectscript): InterSystems IRIS ObjectScript language support #467
isc-tdyar wants to merge 1 commit
593b161 to 578d36f
@DeusData - pinging you as I think this is ready for review :) If not, please lmk what you think needs to happen.

Thanks @isc-tdyar, will check ASAP

Good news, @isc-tdyar: the ObjectScript grammars are now vendored on

Please rebase this PR onto latest

One vendoring note (no action needed on your side): each

Once it's rebased and green, I'll review the extraction code and we can land it. Thanks for your patience on the grammar-vendoring step; that was on us to do.
Vendors the two MIT-licensed grammars from intersystems/tree-sitter-objectscript @a7ffcdf (the language vendor's official grammars, ABI 15) that PR DeusData#467 needs:

  internal/cbm/vendored/grammars/objectscript_udl/ (.cls)
  internal/cbm/vendored/grammars/objectscript_routine/ (.mac/.inc/.rtn/.int)

Each directory carries the verbatim generated parser.c + scanner.c + tree_sitter/ headers + LICENSE. The one adjustment: each scanner.c's upstream `#include "../../common/scanner.h"` is repointed to a per-directory `objectscript_common.h` (a verbatim copy of the upstream common/scanner.h), because this repo's shared vendored/common/scanner.h belongs to the cfml/fsharp grammars and differs.

Updates MANIFEST.md + THIRD_PARTY.md.

These files are dormant until DeusData#467 (the grammar shims + extraction) is rebased on top; main's build is unaffected.

Refs DeusData#462, DeusData#467.

Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
6f52def to 855918f
Add ObjectScript (InterSystems IRIS / Caché) as a supported language,
covering the UDL class format (.cls), MAC/INT routines (.mac/.int/.rtn),
include/macro files (.inc), and IRIS Studio Export XML.
Definition extraction (extract_defs.c): Class, Method, ClassMethod,
Property, Parameter, Index, Trigger (with body text), XData, Storage,
and Query members as graph nodes; base classes from the Extends clause.
Call dispatch resolution (extract_calls.c) — four ObjectScript patterns
that are structurally invisible to text search:
1. ##class(Pkg.Class).Method(): explicit cross-class call
2. ..Method(): relative-dot self-call (the dominant intra-class form; large impact on CALLS completeness)
3. $$$Macro: macro expansion via a per-project table built from .inc files
4. type inference from %New/%OpenId + declared return types
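
As a rough illustration of how the first three patterns differ syntactically, the sketch below classifies a call expression by its leading token. It is a string-prefix toy, not the tree-sitter-based logic in extract_calls.c; `call_kind_t` and `classify_call` are hypothetical names, and it assumes the lowercase `##class(` spelling used in the list above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for the dispatch forms listed above. */
typedef enum {
    CALL_EXPLICIT_CLASS, /* ##class(Pkg.Class).Method() */
    CALL_RELATIVE_DOT,   /* ..Method()                  */
    CALL_MACRO,          /* $$$Macro                    */
    CALL_OTHER           /* anything else, e.g. x.Save(), which needs type inference */
} call_kind_t;

/* Classify a NUL-terminated call expression by its leading token. */
call_kind_t classify_call(const char *expr) {
    assert(expr != NULL);
    if (strncmp(expr, "##class(", 8) == 0) return CALL_EXPLICIT_CLASS;
    if (strncmp(expr, "$$$", 3) == 0)      return CALL_MACRO;
    if (strncmp(expr, "..", 2) == 0)       return CALL_RELATIVE_DOT;
    return CALL_OTHER;
}
```

Anything that lands in `CALL_OTHER`, such as `x.Save()`, carries no syntactic hint about its target class, which is why the fourth pattern relies on type inference instead.
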
Ensemble production topology (pass_ensemble_routing.c): EnsembleItem
nodes per production component and ROUTES_TO edges resolved from
ProductionDefinition XData, plus WorkMgr .Queue("##class(X).method")
dispatch — all parsed statically at index time, no live IRIS required.
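
The Queue target is a string literal, so resolving it statically comes down to splitting `##class(X).method` into a class name and a method name. A minimal sketch, assuming the target has exactly that shape; `parse_class_method` is a hypothetical helper, not a function from pass_ensemble_routing.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Split "##class(Pkg.Class).method" (optionally followed by "(...)") into
 * class and method names. Returns false if the shape does not match or a
 * destination buffer is too small. Hypothetical helper for illustration. */
bool parse_class_method(const char *target,
                        char *cls, size_t cls_cap,
                        char *method, size_t method_cap) {
    static const char prefix[] = "##class(";
    const size_t prefix_len = sizeof prefix - 1;

    assert(target != NULL && cls != NULL && method != NULL);
    if (strncmp(target, prefix, prefix_len) != 0) return false;

    const char *cls_start = target + prefix_len;
    const char *close = strchr(cls_start, ')');
    if (close == NULL || close[1] != '.') return false;

    const char *m_start = close + 2;
    const char *paren = strchr(m_start, '(');
    size_t cls_len = (size_t)(close - cls_start);
    size_t m_len = paren ? (size_t)(paren - m_start) : strlen(m_start);

    /* Reject empty names and names that would not fit with a terminator. */
    if (cls_len == 0 || m_len == 0) return false;
    if (cls_len >= cls_cap || m_len >= method_cap) return false;

    memcpy(cls, cls_start, cls_len);
    cls[cls_len] = '\0';
    memcpy(method, m_start, m_len);
    method[m_len] = '\0';
    return true;
}
```
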
Language detection (language.c): .mac/.int/.rtn map to ObjectScript
routine directly; .cls (shared with Apex) and .inc (shared with BitBake)
are disambiguated by content, defaulting to the existing language on any
doubt so neither Apex nor BitBake detection regresses.
The two new per-project tables (macros, return types) are threaded
through a new internal cbm_extract_file_ex() so the public
cbm_extract_file() signature is unchanged.
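
The wrapper arrangement described above can be sketched as follows. Every type and function name here (`demo_macro_table`, `demo_return_types`, `demo_result`, `demo_extract_file`, `demo_extract_file_ex`) is a hypothetical stand-in; the real CBM signatures are not reproduced in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two per-project tables and the result. */
typedef struct { int n_macros; } demo_macro_table;
typedef struct { int n_methods; } demo_return_types;
typedef struct { bool used_macros; bool used_return_types; } demo_result;

/* Internal entry point: either table may be NULL when it is unavailable. */
demo_result demo_extract_file_ex(const char *path,
                                 const demo_macro_table *macros,
                                 const demo_return_types *ret_types) {
    assert(path != NULL);
    demo_result r;
    r.used_macros = (macros != NULL);
    r.used_return_types = (ret_types != NULL);
    return r;
}

/* Public entry point: signature unchanged, delegates with no tables. */
demo_result demo_extract_file(const char *path) {
    return demo_extract_file_ex(path, NULL, NULL);
}
```

Existing callers keep calling the one-argument function; only code that has built the tables needs the extended form.
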
The tree-sitter grammars are NOT vendored in this PR; they are a
dependency to be vendored separately from
https://github.com/intersystems/tree-sitter-objectscript (MIT).
The build will not link until the grammar is present.
Refs DeusData#462
Signed-off-by: Thomas Dyar <tdyar@intersystems.com>
855918f to 358eb60
What does this PR do?
Adds ObjectScript (InterSystems IRIS / Caché) as a supported language, per the discussion in #462.
ObjectScript powers large healthcare, finance, and enterprise systems and has no support in CBM (or most code-graph tools). This PR makes those codebases indexable and resolves the call-dispatch patterns that are structurally invisible to text search.
Refs #462
Per your note on #462, this PR contains only the CBM source changes, not the vendored grammar. The two tree-sitter grammars come from intersystems/tree-sitter-objectscript (MIT licensed, maintained by the language vendor): `objectscript_udl` (.cls) and `objectscript_routine` (.mac/.inc/.rtn/.int).

Consequence: the build will not link until the grammar is vendored at `internal/cbm/vendored/grammars/objectscript_udl/` and `…/objectscript_routine/`; it fails on the missing `tree_sitter_objectscript_udl()` / `_routine()` symbols. CI will be red until then. That is intentional, matching your plan to audit and vendor the grammar independently. The grammar shims (`grammar_objectscript_*.c`) that declare those factories are included; only the generated `parser.c` / `scanner.c` are omitted.

What's in the PR
Definition extraction: Class, Method, ClassMethod, Property, Parameter, Index, Trigger (with body text), XData, Storage, Query members → graph nodes; base classes from the `Extends` clause.

Four call-dispatch patterns (all resolved statically at index time):

1. `##class(Pkg.Class).Method()`: explicit cross-class call
2. `..Method()`: relative-dot self-call
3. `$$$Macro`: macro expansion via a per-project table built from `.inc` files
4. `Set x = ##class(P).%New()` … `x.Save()`: type inference from `%New`/`%OpenId` + declared return types

Ensemble production topology (`pass_ensemble_routing.c`): `EnsembleItem` nodes per production component and `ROUTES_TO` edges resolved from `ProductionDefinition` XData; plus WorkMgr `.Queue("##class(X).method")` dispatch. All static, no live IRIS instance required.

Two design points for your review
Public API unchanged. ObjectScript needs two per-project tables (a `$$$macro` table and a method-return-type table) that single-file extraction can't build alone. Rather than widen the public `cbm_extract_file()` signature (which would ripple `NULL, NULL` through every call site), I added an internal `cbm_extract_file_ex()` that carries the tables; `cbm_extract_file()` is a thin wrapper that delegates with `NULL, NULL`. Only the pipeline passes that build the tables call `_ex`.

Extension collisions with Apex and BitBake (both added since I started this work). `.mac`/`.int`/`.rtn` map to ObjectScript routine directly. The two collisions are resolved by content sniffing, following the existing `cbm_disambiguate_m()` (`.m` MATLAB-vs-ObjC) pattern, and default to the existing language on any doubt so neither Apex nor BitBake regresses:

- `.cls` (vs Apex): a `Class <Uppercase…>` header line → ObjectScript UDL, else Apex. Edge case: a `.cls` whose `Class` line sits beyond the first 4 KB (e.g. a very large license banner) would fall through to Apex.
- `.inc` (vs BitBake): a `ROUTINE <Uppercase>` header or an ObjectScript preprocessor directive (`#define` / `#def1arg` / `#;`) → ObjectScript routine, else BitBake. (`#define` / `#def1arg` never collide with BitBake, which uses `#` only for `# comment`.)

These are the spots most likely to need your input; happy to adjust the heuristics or the generalization however you prefer.
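
To make the two heuristics concrete for review, here is a minimal sketch of the content sniffing as described above. It is not the code in language.c: the function names and `SNIFF_WINDOW` are hypothetical, and it assumes markers appear at the very start of a line with no leading whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SNIFF_WINDOW 4096 /* the "first 4 KB" mentioned above */

/* True when the n-byte line starts with the NUL-terminated prefix. */
static bool line_starts_with(const char *line, size_t n, const char *prefix) {
    size_t k = strlen(prefix);
    return n >= k && memcmp(line, prefix, k) == 0;
}

/* True when the line is `keyword` followed by an uppercase ASCII letter. */
static bool keyword_then_upper(const char *line, size_t n, const char *keyword) {
    size_t k = strlen(keyword);
    return line_starts_with(line, n, keyword) && n > k &&
           isupper((unsigned char)line[k]);
}

typedef bool (*line_pred)(const char *line, size_t n);

/* Apply pred to each line, looking only at bytes inside the sniff window. */
static bool any_line(const char *buf, size_t len, line_pred pred) {
    assert(buf != NULL || len == 0);
    size_t limit = len < SNIFF_WINDOW ? len : SNIFF_WINDOW;
    size_t i = 0;
    while (i < limit) {
        const char *nl = memchr(buf + i, '\n', limit - i);
        size_t n = nl ? (size_t)(nl - (buf + i)) : limit - i;
        if (pred(buf + i, n)) return true;
        if (nl == NULL) break;
        i += n + 1; /* skip past the newline to the next line start */
    }
    return false;
}

static bool is_udl_class_line(const char *line, size_t n) {
    return keyword_then_upper(line, n, "Class ");
}

static bool is_routine_inc_line(const char *line, size_t n) {
    return keyword_then_upper(line, n, "ROUTINE ") ||
           line_starts_with(line, n, "#define ") ||
           line_starts_with(line, n, "#def1arg ") ||
           line_starts_with(line, n, "#;");
}

/* .cls: ObjectScript UDL if a Class header is seen, otherwise keep Apex. */
bool sniff_cls_is_objectscript(const char *buf, size_t len) {
    return any_line(buf, len, is_udl_class_line);
}

/* .inc: ObjectScript routine if a marker is seen, otherwise keep BitBake. */
bool sniff_inc_is_objectscript(const char *buf, size_t len) {
    return any_line(buf, len, is_routine_inc_line);
}
```

Both functions return false on anything they do not recognize, which is what keeps the existing language as the default. The 4 KB edge case falls out of the window limit: a `Class` line that starts after byte 4096 is never inspected.
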
EnsembleItem label (your Q2 on #462)

I used a domain-specific `EnsembleItem` node label and `ROUTES_TO` edge type. If you'd prefer a generic label (`ServiceComponent` / `WorkflowNode`) with `ensemble_item` as a property, I'm glad to rename; just let me know before merge.

Tests
`tests/test_extraction.c` gains the ObjectScript suite: UDL class/method extraction, all four dispatch patterns, Ensemble topology parsing, macro expansion, trigger body text, and Export-XML transcoding. The Export-XML transcoder tests are grammar-independent and pass today; the grammar-dependent tests pass once the grammar is vendored. No other test files are touched.

Scope / roadmap

This PR is the foundation. If it's well received, two separate follow-up PRs would complete the story (each with its own issue): (a) cross-version `version_tag` + `diff_versions`, and (b) ObjectScript-tuned semantic embeddings. They're deliberately excluded here to keep this reviewable.

Checklist

- `git commit -s`: DCO
- `make -f Makefile.cbm test`: passes once the grammar is vendored; see note above
- `make -f Makefile.cbm lint-ci`