Emit Run 0 SARIF for accepted Type-1 and Type-2 clone pairs#173
Emit Run 0 SARIF for accepted Type-1 and Type-2 clone pairs#173
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
Token-Pass Run 0 SARIF Emission for Accepted Type-1 and Type-2 Clone PairsOverviewIntroduces deterministic SARIF Run 0 emission for accepted Type-1 and Type-2 clone candidate pairs in New Module:
|
| Cohort / File(s) | Summary |
|---|---|
Core Run0 module crates/whitaker_clones_core/src/run0/mod.rs, crates/whitaker_clones_core/src/run0/emit.rs, crates/whitaker_clones_core/src/run0/error.rs, crates/whitaker_clones_core/src/run0/score.rs, crates/whitaker_clones_core/src/run0/span.rs, crates/whitaker_clones_core/src/run0/types.rs |
Introduce Run 0 implementation: candidate acceptance, integer-only Jaccard scoring and threshold validation, profile enforcement and missing/empty fingerprint errors, deterministic SHA‑256 partial fingerprints, byte-range→SARIF region conversion with UTF‑8 boundary validation, SARIF result construction, deterministic sorting and deduplication, and typed Run0Error/Run0Result. |
Run0 re‑exports crates/whitaker_clones_core/src/lib.rs |
Re-export run0 API at crate root: accept_candidate_pairs, emit_run0 and related types (Run0Error, Run0Result, SimilarityRatio, SimilarityThreshold, TokenFragment, TokenPassConfig, AcceptedPair). |
Dependencies crates/whitaker_clones_core/Cargo.toml |
Add workspace dependencies: sha2 and whitaker_sarif. |
Unit tests crates/whitaker_clones_core/src/run0/tests.rs, crates/whitaker_clones_core/src/run0/tests_emit.rs, crates/whitaker_clones_core/src/run0/test_helpers.rs |
Add unit tests and test helpers covering threshold boundaries, Jaccard behaviour (duplicates and empty sets), region mapping and UTF‑8 boundary errors, SARIF emission shape/content, sorting and deduplication, and expected fingerprints/properties. |
BDD integration tests crates/whitaker_clones_core/tests/run0_sarif_behaviour.rs, crates/whitaker_clones_core/tests/features/run0_sarif.feature |
Add BDD scenarios exercising end‑to‑end Run 0: T1/T2 acceptance, below‑threshold rejection, empty fingerprint errors, multiline region mapping, deterministic output, and reversed input handling. |
Documentation: execplan & design docs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md, docs/whitaker-clone-detector-design.md |
Add execplan and design notes describing Run 0 contract: integer thresholding, SHA‑256 fingerprinting, SARIF contracts, region mapping rules, deterministic ordering and deduplication; update design decisions and implementation plan. |
Roadmap docs/roadmap.md |
Mark roadmap item 7.2.3 as completed for SARIF Run 0 emission. |
ADR text update docs/adr-002-dylint-expect-attribute-macro.md |
Minor phrasing and line‑wrapping adjustments; no behavioural change. |
Sequence Diagram
sequenceDiagram
participant Client
participant Fragments as Fragment Store
participant Candidates as Candidate Queue
participant Accept as accept_candidate_pairs()
participant Score as Similarity Scorer
participant Emit as emit_run0()
participant SARIF as whitaker_sarif
participant Results as SARIF Run
Client->>Fragments: Load TokenFragment instances
Client->>Candidates: Queue CandidatePair entries
Client->>Accept: call accept_candidate_pairs(fragments, candidates, config)
activate Accept
Accept->>Accept: Validate thresholds and build lookup
Accept->>Fragments: Resolve CandidatePair -> (left, right)
Accept->>Accept: Enforce same NormProfile and non-empty retained fingerprints
loop For each candidate
Accept->>Score: jaccard_similarity(left_hashes, right_hashes)
activate Score
Score-->>Accept: Return SimilarityRatio or None
deactivate Score
Accept->>Accept: select_rule_profile(ratio, thresholds)
Accept->>Accept: Accumulate AcceptedPair if accepted
end
Accept->>Accept: Sort & deduplicate AcceptedPair list
Accept-->>Client: Return Vec<AcceptedPair>
deactivate Accept
Client->>Emit: call emit_run0(fragments, accepted_pairs, config)
activate Emit
Emit->>Fragments: Resolve pairs and validate fingerprints
loop For each AcceptedPair
Emit->>Emit: Compute pair/token SHA‑256 hex fingerprints
Emit->>Emit: Convert byte ranges -> SARIF Region (validate UTF‑8)
Emit->>SARIF: build_result with rule id, locations, fingerprints, properties
SARIF-->>Results: Append SarifResult
end
Emit->>Results: Sort and deduplicate SarifResult list
Emit->>SARIF: Build Run (tool name/version, rules, results)
Emit-->>Client: Return SARIF Run
deactivate Emit
Poem
🔍 Tokens pass through similarity's gaze,
Jaccard counts align in careful ways,
SHA‑256 fingerprints steady and clear,
SARIF maps bytes to lines, precise and near,
Run 0 emits results, sorted and sincere. ✨
🚥 Pre-merge checks | ✅ 5 | ❌ 2
❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Testing | The PR introduces substantial new functionality with 11 unit tests + 6 BDD scenarios covering happy paths and many error cases, but several public API items and error paths lack adequate test coverage including SimilarityRatio accessors and Run0Error variants. | Add unit tests for SimilarityThreshold::new() with invalid inputs, SimilarityRatio::as_decimal_string(), all public accessor methods, dedicated unit test for Run0Error::EmptyFingerprintSet, and InvalidScore tests using parameterised tests to maximise coverage of boundary conditions. | |
| Developer Documentation | The run0/ module's APIs and architectural boundaries lack documentation in docs/developers-guide.md, despite being documented in Rustdoc, design documents, and ExecPlan specifications. |
Add a new section to docs/developers-guide.md documenting the whitaker_clones_core architecture, covering the run0/ module's purpose, public APIs, and token-pass layer integration with usage examples. |
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarises the main change: implementing Run 0 SARIF emission for accepted Type-1 and Type-2 clone pairs, which is the core objective of the PR. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, detailing the new run0 module, public API, tests, documentation updates, and implementation rationale. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| User-Facing Documentation | ✅ Passed | This PR adds internal infrastructure to the unpublished whitaker_clones_core crate that is not yet exposed to end users through the CLI or Dylint lints. |
| Module-Level Documentation | ✅ Passed | All newly-added modules in the run0 subtree carry explicit, clear module-level docstrings explaining their purpose and function within the token-pass acceptance and SARIF Run 0 emission pipeline. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
📝 Generate docstrings
- Create stacked PR
- Commit on current branch
🧪 Generate unit tests (beta)
- Create PR with unit tests
- Commit unit tests in branch
emit-sarif-run0-type1-type2-qn437v
Comment @coderabbitai help to get the list of available commands and usage tips.
Reviewer's GuideImplements a new run0 module in whitaker_clones_core that accepts Type-1/Type-2 candidate pairs, scores them with integer Jaccard logic, and emits deterministic SARIF Run 0 results via whitaker_sarif, including stable fingerprints, span mapping, tests, and docs/roadmap updates to mark roadmap item 7.2.3 complete. Class diagram for whitaker_clones_core run0 types and functionsclassDiagram
class TokenFragment {
+FragmentId id()
+NormProfile profile()
+&str file_uri()
+&str source_text()
+&[Fingerprint] retained_fingerprints()
+new(id: FragmentId, profile: NormProfile, file_uri: String, source_text: String) TokenFragment
+with_retained_fingerprints(retained_fingerprints: Vec_Fingerprint_) TokenFragment
}
class TokenPassConfig {
+&str tool_name()
+&str tool_version()
+usize shingle_size()
+usize winnow_window()
+SimilarityThreshold type1_threshold()
+SimilarityThreshold type2_threshold()
+new(tool_name: String, tool_version: String) TokenPassConfig
+with_shingle_size(shingle_size: usize) TokenPassConfig
+with_winnow_window(winnow_window: usize) TokenPassConfig
+with_type1_threshold(threshold: SimilarityThreshold) TokenPassConfig
+with_type2_threshold(threshold: SimilarityThreshold) TokenPassConfig
}
class AcceptedPair {
+&CandidatePair pair()
+NormProfile profile()
+SimilarityRatio score()
+new(pair: CandidatePair, profile: NormProfile, score: SimilarityRatio) AcceptedPair
}
class SimilarityRatio {
-usize intersection
-usize union
+new(intersection: usize, union: usize) SimilarityRatio
+usize intersection()
+usize union()
+String as_decimal_string()
}
class SimilarityThreshold {
-&static str name
-usize numerator
-usize denominator
+new(name: &static str, numerator: usize, denominator: usize) SimilarityThreshold
+new_unchecked(name: &static str, numerator: usize, denominator: usize) SimilarityThreshold
+usize numerator()
+usize denominator()
}
class Run0Error {
<<enum>>
MissingFragment
EmptyFingerprintSet
MixedProfiles
InvalidThreshold
InvalidFingerprintRange
InvalidUtf8Boundary
Sarif
InvalidScore
}
class run0_mod {
+accept_candidate_pairs(fragments: &[TokenFragment], candidates: &[CandidatePair], config: &TokenPassConfig) Run0Result_Vec_AcceptedPair__
+emit_run0(fragments: &[TokenFragment], accepted_pairs: &[AcceptedPair], config: &TokenPassConfig) Run0Result_Run_
}
class CandidatePair {
+FragmentId left()
+FragmentId right()
}
class Fingerprint {
+u64 hash
+Range_usize_ range
}
class FragmentId
class NormProfile {
<<enum>>
T1
T2
}
class Run {
}
class WhitakerSarifBuilders {
RunBuilder
ResultBuilder
LocationBuilder
RegionBuilder
}
%% Relationships
TokenFragment --> FragmentId
TokenFragment --> NormProfile
TokenFragment --> Fingerprint
AcceptedPair --> CandidatePair
AcceptedPair --> NormProfile
AcceptedPair --> SimilarityRatio
TokenPassConfig --> SimilarityThreshold
SimilarityThreshold ..> Run0Error
SimilarityRatio ..> Run0Error
run0_mod ..> TokenFragment
run0_mod ..> TokenPassConfig
run0_mod ..> AcceptedPair
run0_mod ..> CandidatePair
run0_mod ..> Run0Error
run0_mod ..> Run
run0_mod ..> WhitakerSarifBuilders
Run0Error ..> Run
Run ..> WhitakerSarifBuilders
Flow diagram for byte-range to SARIF region mapping in span moduleflowchart TD
A["Input: fragment_id, source_text, byte range start..end"] --> B["validate_range: check start < end and end <= source_text.len"]
B -->|invalid| E1["Run0Error::InvalidFingerprintRange"]
B -->|valid| C["check UTF-8 boundaries at start and end"]
C -->|invalid| E2["Run0Error::InvalidUtf8Boundary"]
C -->|valid| D["line_starts: build vector of line start offsets"]
D --> F["line_and_column for start: find line_index where start_offset <= byte < next_start"]
F --> G["compute start_line = index+1, start_column = byte - line_start + 1"]
G --> H["line_and_column for end-1: same logic for end_position"]
H --> I["compute end_line and end_column"]
I --> J["build Region with start_line, start_column, end_line, end_column, byteOffset=start, byteLength=end-start"]
J --> K["Return Region or propagate SarifError as Run0Error::Sarif"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. crates/whitaker_clones_core/src/run0/score.rs Comment on file }
pub(crate) fn validate(self) -> Run0Result<()> {
if self.numerator == 0 || self.denominator == 0 || self.numerator > self.denominator {❌ New issue: Complex Conditional |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. crates/whitaker_clones_core/src/run0/tests.rs Comment on file fn boundary_threshold_accepts_type1_pair() {
let fragments = vec![
fragment(FragmentInput {
id: "alpha",
profile: NormProfile::T1,
file_uri: "src/a.rs",
source_text: "fn a() {}\n",
hashes: &[(1, 0..8), (2, 0..8)],
}),
fragment(FragmentInput {
id: "beta",
profile: NormProfile::T1,
file_uri: "src/b.rs",
source_text: "fn b() {}\n",
hashes: &[(1, 0..8), (2, 0..8)],
}),
];
let accepted = accept_candidate_pairs(&fragments, &[pair("alpha", "beta")], &config())
.unwrap_or_else(|error| panic!("unexpected acceptance error: {error}"));
assert_eq!(accepted.len(), 1);
assert_eq!(accepted[0].profile(), NormProfile::T1);
assert_eq!(accepted[0].score(), SimilarityRatio::new(2, 2));
}❌ New issue: Code Duplication |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. crates/whitaker_clones_core/src/run0/tests.rs Comment on file fn emitted_result_contains_fingerprints_and_properties() {
let fragments = vec![
fragment(FragmentInput {
id: "alpha",
profile: NormProfile::T2,
file_uri: "src/a.rs",
source_text: "fn a(x: i32) {}\n",
hashes: &[(1, 0..15), (2, 0..15)],
}),
fragment(FragmentInput {
id: "beta",
profile: NormProfile::T2,
file_uri: "src/b.rs",
source_text: "fn b(y: i32) {}\n",
hashes: &[(1, 0..15), (2, 0..15)],
}),
];
let accepted = vec![AcceptedPair::new(
pair("alpha", "beta"),
NormProfile::T2,
SimilarityRatio::new(2, 2),
)];
let run = emit_run0(&fragments, &accepted, &config())
.unwrap_or_else(|error| panic!("unexpected emit error: {error}"));
let [result] = run.results.as_slice() else {
panic!("expected one result");
};
let properties = result
.properties
.as_ref()
.unwrap_or_else(|| panic!("Whitaker properties must be present"));
let extracted = WhitakerProperties::try_from(properties)
.unwrap_or_else(|error| panic!("unexpected property extraction error: {error}"));
assert_eq!(result.rule_id, WHK002_ID);
assert!(
result
.partial_fingerprints
.contains_key(WHITAKER_FRAGMENT_KEY)
);
assert!(result.partial_fingerprints.contains_key("tokenHash"));
assert_eq!(extracted.profile, "T2");
assert_eq!(extracted.k, 25);
assert_eq!(extracted.window, 16);
assert_eq!(extracted.class_size, 2);
}❌ New issue: Large Assertion Blocks |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. crates/whitaker_clones_core/tests/run0_sarif_behaviour.rs Comment on file fn then_whitaker_profile_is(world: &Run0World, profile: String) {
with_results(world, |results| {
let [result] = results else {
panic!("exactly one result must exist before checking Whitaker properties");
};
let properties = match result.properties.as_ref() {
Some(properties) => properties,
None => panic!("Whitaker properties must be present"),
};
match WhitakerProperties::try_from(properties) {
Ok(extracted) => assert_eq!(extracted.profile, profile),
Err(error) => panic!("unexpected property extraction error: {error}"),
}
});
}❌ New issue: Code Duplication |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. crates/whitaker_clones_core/src/run0/tests.rs Comment on file fn single_line_region_uses_one_based_columns() {
let region = region_for_range("alpha", "fn a() {}\n", 0..8)
.unwrap_or_else(|error| panic!("unexpected region error: {error}"));
assert_eq!(region.start_line, 1);
assert_eq!(region.start_column, Some(1));
assert_eq!(region.end_line, Some(1));
assert_eq!(region.end_column, Some(8));
assert_eq!(region.byte_offset, Some(0));
assert_eq!(region.byte_length, Some(8));
}❌ New issue: Large Assertion Blocks |
This comment was marked as resolved.
This comment was marked as resolved.
278c73b to
eb5f747
Compare
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix: Code Duplicationcrates/whitaker_clones_core/src/run0/tests.rs: What lead to degradation?The module contains 5 functions with similar structure: boundary_threshold_accepts_type1_pair,boundary_threshold_accepts_type2_pair,just_below_threshold_is_rejected,multi_line_region_tracks_trailing_newline and 1 more functions Why does this problem occur?Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health. How to fix it?A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More |
This comment was marked as resolved.
This comment was marked as resolved.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph. If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced. Code Duplicationcrates/whitaker_clones_core/src/run0/tests.rs: What lead to degradation?The module contains 5 functions with similar structure: boundary_threshold_accepts_type1_pair,boundary_threshold_accepts_type2_pair,just_below_threshold_is_rejected,multi_line_region_tracks_trailing_newline and 1 more functions Why does this problem occur?Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health. How to fix it?A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More |
This comment was marked as resolved.
This comment was marked as resolved.
Introduce a comprehensive new ExecPlan markdown document specifying roadmap item 7.2.3. This document outlines the design, constraints, tolerances, risks, and detailed implementation plan for emitting SARIF Run 0 data for accepted Type-1 and Type-2 clone pairs within `whitaker_clones_core`. It details the expected public API, testing strategy, and integration points with `whitaker_sarif`. This lays the groundwork for upcoming development stages and enforces documentation and quality standards for the feature. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
- Introduce a dedicated `run0/` module subtree for token-pass acceptance and SARIF emission - Implement acceptance of candidate pairs by profile with Jaccard similarity thresholding - Emit deterministic SARIF Run 0 results including primary and related locations - Compute SHA-256 hex digests for pair fingerprints and token hashes - Add integer-only Jaccard scoring and threshold validation - Map byte-range fingerprints to SARIF regions with line-column info - Provide comprehensive typed error handling for emission failures - Add unit tests and BDD feature scenarios for acceptance and emission - Update crate public API to expose new token-pass types and methods - Update docs and roadmap to mark 7.2.3 implementation complete Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
- Added `is_valid` method to `SimilarityThreshold` for clarity and reused it in validation. - Extracted common test logic for building pairs and accepting them into `build_pair_and_accept` helper. - Added tests ensuring emitted SARIF results contain correct Whitaker rule ID and fingerprints. - Refactored test step definitions to use a helper `with_whitaker_properties` for cleaner property assertions. - Improved test readability and reduced duplication across run0 tests and SARIF behavior tests. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Extract helper functions `assert_single_accepted` and `assert_region` to reduce duplication and improve clarity in test assertions. Update tests to use these helpers for verifying accepted pairs and region calculations. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
a8c0e86 to
3c98af7
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3c98af7edb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/whitaker_clones_core/src/run0/emit.rs`:
- Around line 48-57: The calls to ensure_non_empty are redundant with
jaccard_similarity's None result; remove the ensure_non_empty(left) and
ensure_non_empty(right) calls and keep the let Some(score) =
jaccard_similarity(left.retained_fingerprints(), right.retained_fingerprints())
else { return Err(Run0Error::EmptyFingerprintSet { fragment_id:
left.id().as_str().to_owned(), }); }; construct so empty fingerprint sets are
handled only by jaccard_similarity returning None (referencing ensure_non_empty,
jaccard_similarity, Run0Error::EmptyFingerprintSet,
left.retained_fingerprints(), right.retained_fingerprints(), left.id()).
In `@crates/whitaker_clones_core/src/run0/score.rs`:
- Around line 159-171: The repeated_division function performs O(quotient) work
by looping subtraction; replace that loop with native integer division and
modulus to get the quotient and remainder in constant time: in
repeated_division, keep the denominator==0 check returning None, then compute
quotient = numerator / denominator and remainder = numerator % denominator and
return Some((quotient, remainder)) instead of the while loop to preserve
determinism and correctness (function name: repeated_division).
In `@crates/whitaker_clones_core/src/run0/span.rs`:
- Around line 59-65: The current line_and_column computes column as a byte
offset which breaks for non-ASCII characters; change its signature to fn
line_and_column(source_text: &str, starts: &[usize], offset: usize) -> (usize,
usize), clamp the offset to source_text.len(), take the substring from the line
start to the clamped offset, and compute the column as the count of UTF-16 code
units (e.g., line_slice.encode_utf16(...).count() + 1); then update every call
site (e.g., the places computing (start_line, start_column) and (end_line,
end_column)) to pass source_text as the first argument. Ensure you still compute
line_index the same way and return 1-based line and column numbers.
In `@crates/whitaker_clones_core/src/run0/tests.rs`:
- Around line 1-401: This file exceeds 400 lines because the T2 emission tests
and their helper run were kept in the same file; extract the T2-specific pieces
into a new test module/file to stay under the limit. Create a new tests_emit.rs
(or tests_emit submodule) and move make_t2_emission_run plus the three tests
emitted_t2_result_has_correct_rule_id,
emitted_t2_result_contains_required_fingerprint_keys, and
emitted_t2_result_properties_match_config into it; ensure the new module can
access helpers fragment(), FragmentInput, config(), fragment(), and any types
(WhitakerProperties, WHK002_ID) by importing with use super::* or by moving
small helper functions (e.g., make_t2_emission_run) as pub(crate) helpers into a
shared test_helpers.rs and have both test files import them; update the parent
tests.rs to declare the new submodule (mod tests_emit;) or include the new
test_helpers module so compilation and symbol resolution for
make_t2_emission_run and the emitted_* tests remains correct.
- Around line 42-45: The test helper function pair uses
CandidatePair::new(...).unwrap_or_else(|| panic!(...)), which is verbose; change
it to call .expect("pair `{left}` and `{right}` must be distinct") on the Result
returned by CandidatePair::new so the failure message remains descriptive but
uses the preferred .expect pattern; update the pair function that constructs
CandidatePair from FragmentId::from(left) and FragmentId::from(right)
accordingly.
In `@crates/whitaker_clones_core/src/run0/types.rs`:
- Around line 218-222: Add SimilarityRatio to the top-level imports (import it
from super::score) and then replace occurrences of super::score::SimilarityRatio
with the imported SimilarityRatio type in the AcceptedPair struct definition and
in the related method signatures that currently reference
super::score::SimilarityRatio; specifically update the AcceptedPair declaration
(pair: CandidatePair, profile: NormProfile, score: SimilarityRatio) and any
functions that take or return that score type so they use the imported symbol
instead of the fully-qualified path.
In `@crates/whitaker_clones_core/tests/run0_sarif_behaviour.rs`:
- Line 31: Replace the hard-coded tool version string in the TokenPassConfig
initialization with the compile-time crate version: locate the config field
where TokenPassConfig::new("whitaker_clones_cli@token", "0.2.1") is used and
change the second argument to env!("CARGO_PKG_VERSION") so the fixture binds to
package metadata at compile time.
In
`@docs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md`:
- Around line 523-525: The two long shell command lines (the ones starting with
"set -o pipefail && cargo test -p whitaker_clones_core" and "set -o pipefail &&
cargo clippy -p whitaker_clones_core --all-targets --all-features -- -D
warnings") must be wrapped to respect the 120-column Markdown rule: break each
command with a backslash continuation and indent the wrapped continuation lines
so the code block remains equivalent (use "set -o pipefail && \" then a newline
with two-space indentation before "cargo test ..." and similarly for the "cargo
clippy" command with an additional backslash before "-D warnings" to keep the
long flag on its own wrapped line), preserving the original behavior (pipes to
tee and redirection to logs).
In `@docs/whitaker-clone-detector-design.md`:
- Around line 505-508: Update the wording to clarify error types across layers:
explain that while the index/minhash layer returns
IndexError::EmptyFingerprintSet when sketching an empty retained-fingerprint
list, the Run 0 layer (run0/emit.rs) does not propagate that error and instead
surfaces Run0Error::EmptyFingerprintSet; callers should match on
Run0Error::EmptyFingerprintSet for Run 0 operations or translate it to the index
error if normalizing across layers. Include references to
IndexError::EmptyFingerprintSet and Run0Error::EmptyFingerprintSet so readers
can locate the exact variants.
- Around line 477-483: Replace the article text that reads "a SHA-256 hex
digest" with "an SHA-256 hex digest" for the descriptions of
partialFingerprints["whitakerFragment"] and partialFingerprints["tokenHash"];
update the sentence(s) that mention those symbols so they use "an" before
"SHA-256" to fix the grammar.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 99e47aa3-895a-4b2f-a3f0-a2f0353fea2c
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (14)
crates/whitaker_clones_core/Cargo.tomlcrates/whitaker_clones_core/src/lib.rscrates/whitaker_clones_core/src/run0/emit.rscrates/whitaker_clones_core/src/run0/error.rscrates/whitaker_clones_core/src/run0/mod.rscrates/whitaker_clones_core/src/run0/score.rscrates/whitaker_clones_core/src/run0/span.rscrates/whitaker_clones_core/src/run0/tests.rscrates/whitaker_clones_core/src/run0/types.rscrates/whitaker_clones_core/tests/features/run0_sarif.featurecrates/whitaker_clones_core/tests/run0_sarif_behaviour.rsdocs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.mddocs/roadmap.mddocs/whitaker-clone-detector-design.md
- Simplify repeated_division to use native division operators with debug assertion - Fix column computation to count Unicode characters instead of bytes - Add test coverage for InvalidUtf8Boundary error path - Add test coverage for jaccard_similarity with empty fragments - Add test coverage for MissingFragment and MixedProfiles error paths - Fix documentation: use 'relatedLocations' instead of 'related_locations' - Define SARIF and CLI acronyms on first use in execplan The column computation fix ensures SARIF viewers correctly highlight non-ASCII source code by counting visual columns rather than byte offsets. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Code improvements:
- Remove redundant ensure_non_empty calls (jaccard_similarity handles this)
- Fix column computation to use UTF-16 code units for SARIF compatibility
- Split tests.rs into tests.rs and tests_emit.rs (now under 400 lines)
- Replace unwrap_or_else with expect in pair() test helper
- Import SimilarityRatio directly in types.rs (remove super:: prefix)
- Use env!("CARGO_PKG_VERSION") in run0_sarif_behaviour.rs
Documentation improvements:
- Wrap long shell commands in execplan to respect 120-column limit
- Fix article grammar: 'a SHA-256' -> 'an SHA-256' in design doc
- Clarify error types across layers (IndexError vs Run0Error)
The UTF-16 column fix ensures SARIF viewers correctly highlight code
in files with multi-byte UTF-8 characters, as SARIF columns are
UTF-16 code units per the specification.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (1)
crates/whitaker_clones_core/src/run0/tests.rs (1)
1-486: 🛠️ Refactor suggestion | 🟠 MajorSplit or parameterise the remaining test matrix.
The helpers improved readability, but the file is still 486 lines long, so it still misses the 400-line cap. Move the T2 emission block into a sibling test module and collapse the remaining acceptance and region permutations into
#[rstest(...)]cases to bring the file back under the limit.As per coding guidelines, “Files must not exceed 400 lines in length”, and “Replace duplicated tests with
#[rstest(...)]parameterised cases.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/whitaker_clones_core/src/run0/tests.rs` around lines 1 - 486, This test file exceeds the 400-line limit and duplicates test permutations; split the T2 emission tests into a separate sibling test module (move make_t2_emission_run and the emitted_t2_result_* tests into that module) and convert repeated acceptance/region cases that call build_pair_and_accept, boundary_threshold_accepts_type1_pair, boundary_threshold_accepts_type2_pair, just_below_threshold_is_rejected, duplicate_hashes_do_not_inflate_jaccard_score, jaccard_returns_none_for_empty_fragments, single_line_region_uses_one_based_columns, multi_line_region_tracks_trailing_newline, invalid_range_produces_typed_error, invalid_utf8_boundary_produces_typed_error, missing_fragment_produces_typed_error, and mixed_profiles_produces_typed_error into parameterised #[rstest(...)] cases that drive the same assertions (use the FragmentInput helper, fingerprint, fragment, pair, config, and assert_region/assert_single_accepted helpers) so duplicated scenarios are collapsed and the file drops below 400 lines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/whitaker_clones_core/src/run0/score.rs`:
- Around line 9-42: Add Rustdoc doctest examples for the public types and
methods so consumers can see runnable usage: add a "/// # Examples" block for
SimilarityRatio showing construction via SimilarityRatio::new(intersection,
union) and demonstrating formatting with as_decimal_string() (e.g., a simple
example that prints or asserts the string), and add a similar doctest for
SimilarityThreshold (constructing it and showing its behavior/formatting);
ensure examples are valid Rust code that compile and run under cargo test / make
test and reference the public symbols SimilarityRatio, SimilarityRatio::new,
SimilarityRatio::as_decimal_string, and the SimilarityThreshold type/methods so
documentation tests pass.
- Around line 82-90: The exported constructor new_unchecked on the
Threshold-like type exposes an invariant-bypassing API; restrict its visibility
to the crate by changing its signature from pub const fn new_unchecked(...) to
pub(crate) const fn new_unchecked(...), so only internal trusted defaults can
call it and external callers must use the validated public constructors/creators
(refer to the new_unchecked function in score.rs to locate the change).
In `@crates/whitaker_clones_core/src/run0/span.rs`:
- Around line 15-16: The end-position calculation currently does a raw byte
subtract (range.end.saturating_sub(1)), which can land inside a UTF-8 multibyte
character and cause the slice in line_and_column to panic; update the assignment
to compute end_position by finding the previous character boundary before
range.end (e.g., take source_text[..range.end].char_indices().next_back() and
use its index or fall back to range.start) so line_and_column receives a valid
character boundary; update the code around the end_position binding that
precedes the call to line_and_column(source_text, &starts, end_position) and
ensure errors still propagate as Run0Error where applicable.
In
`@docs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md`:
- Around line 88-89: The document introduces the unexplained term "Firecrawl
MCP"; expand the acronym on its first occurrence by replacing or augmenting
"Firecrawl MCP" with "Firecrawl MCP (Machine Communication Protocol)" or the
correct full phrase so readers see the full name on first use; ensure subsequent
mentions can keep the shortened "MCP" form and update any nearby references in
the same paragraph if needed.
- Line 1: The document is missing a top-level heading; update the first line to
be a level-1 heading by prefixing the existing title text "Emit Static Analysis
Results Interchange Format (SARIF) Run 0 for accepted Type-1 and Type-2 clone
pairs (roadmap 7.2.3)" with "# " so the file starts with "# Emit Static Analysis
Results Interchange Format (SARIF) Run 0 for accepted Type-1 and Type-2 clone
pairs (roadmap 7.2.3)".
- Line 141: Update the title-case headings to sentence case: change "Surprises &
Discoveries" to "Surprises & discoveries", "Decision Log" to "Decision log",
"Outcomes & Retrospective" to "Outcomes & retrospective", and convert all Stage
headings from Title Case to sentence case; apply the same normalization to the
other occurrences of these headings referenced in the review so all instances
match the documentation style guide.
---
Duplicate comments:
In `@crates/whitaker_clones_core/src/run0/tests.rs`:
- Around line 1-486: This test file exceeds the 400-line limit and duplicates
test permutations; split the T2 emission tests into a separate sibling test
module (move make_t2_emission_run and the emitted_t2_result_* tests into that
module) and convert repeated acceptance/region cases that call
build_pair_and_accept, boundary_threshold_accepts_type1_pair,
boundary_threshold_accepts_type2_pair, just_below_threshold_is_rejected,
duplicate_hashes_do_not_inflate_jaccard_score,
jaccard_returns_none_for_empty_fragments,
single_line_region_uses_one_based_columns,
multi_line_region_tracks_trailing_newline, invalid_range_produces_typed_error,
invalid_utf8_boundary_produces_typed_error,
missing_fragment_produces_typed_error, and mixed_profiles_produces_typed_error
into parameterised #[rstest(...)] cases that drive the same assertions (use the
FragmentInput helper, fingerprint, fragment, pair, config, and
assert_region/assert_single_accepted helpers) so duplicated scenarios are
collapsed and the file drops below 400 lines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bf481d05-c5c5-43f6-9ba7-bd1e66b27ce1
📒 Files selected for processing (4)
crates/whitaker_clones_core/src/run0/score.rscrates/whitaker_clones_core/src/run0/span.rscrates/whitaker_clones_core/src/run0/tests.rsdocs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (3)
docs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md (2)
88-89:⚠️ Potential issue | 🟡 MinorExpand
MCPon first use.Replace the first occurrence with the expanded form, then use the acronym
afterwards.Triage:
[type:docstyle]As per coding guidelines: “Expand any uncommon acronym on first use in documentation, for example, Continuous Integration (CI).”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md` around lines 88 - 89, Replace the first occurrence of the acronym "MCP" with its expanded form (e.g., "Model Clarification Protocol (MCP)" or the correct expanded term used in this project) in the document so the acronym is defined on first use, then keep using "MCP" for subsequent mentions; update the sentence containing "Firecrawl MCP" to read "Firecrawl [Expanded Form] (MCP)" and leave later references as "MCP".
141-141:⚠️ Potential issue | 🟡 MinorConvert headings to sentence case.
Rename these headings to sentence case to keep documentation style consistent.
Triage:
[type:docstyle]As per coding guidelines: “Write headings in sentence case in documentation.”
Also applies to: 169-169, 261-261, 368-368, 388-388, 414-414, 444-444, 480-480, 497-497, 502-502
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md` at line 141, Change the documentation headings from title case/ampersand form to sentence case—e.g., rename "Surprises & Discoveries" to "Surprises and discoveries"—and apply the same conversion to the other flagged headings in this markdown (replace "&" with "and" where appropriate and only capitalize the first word and proper nouns). Locate the headings in the file (e.g., the "Surprises & Discoveries" header and the other flagged headers) and update their text to sentence case to match the doc style guide.crates/whitaker_clones_core/src/run0/span.rs (1)
15-16:⚠️ Potential issue | 🔴 CriticalReplace raw byte subtraction for end-position calculation.
Fix Line 15.
range.end.saturating_sub(1)can land inside a multibyte UTF-8
code point and trigger a panic when slicing at Line 65.Patch end-position to the previous character boundary
- let end_position = range.end.saturating_sub(1); + let end_position = source_text[..range.end] + .char_indices() + .next_back() + .map_or(range.start, |(index, _)| index); let (end_line, end_column) = line_and_column(source_text, &starts, end_position);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/whitaker_clones_core/src/run0/span.rs` around lines 15 - 16, The subtraction end_position = range.end.saturating_sub(1) can land inside a UTF-8 codepoint; instead find the previous UTF-8 character boundary before range.end in the source_text and pass that byte index to line_and_column. Replace the raw subtraction with logic that walks back to the nearest char boundary (e.g., using source_text.is_char_boundary in a loop or taking source_text[..range.end].char_indices().last().map(|(i,_)| i)) to compute the safe end_position, keeping references to source_text, starts, and the call to line_and_column unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/whitaker_clones_core/src/run0/emit.rs`:
- Around line 41-54: The EmptyFingerprintSet error currently always reports
left.id() when jaccard_similarity returns None; change the logic so you check
which fragment has an empty retained_fingerprints() before returning the error
and include that fragment's id in Run0Error::EmptyFingerprintSet. Specifically,
in the code around resolve_pair(...) and the jaccard_similarity call, test
left.retained_fingerprints().is_empty() and
right.retained_fingerprints().is_empty() (or inspect the slices passed to
jaccard_similarity) and return Run0Error::EmptyFingerprintSet with fragment_id
set to the empty fragment's id (left.id().as_str().to_owned() or
right.id().as_str().to_owned()) so the error correctly names the failing
fragment.
In `@crates/whitaker_clones_core/src/run0/tests_emit.rs`:
- Around line 45-47: The test helper config() currently passes a hard-coded
version string to TokenPassConfig::new which will go stale after crate version
bumps; change the second argument to use the crate's package version at compile
time (e.g., via env!("CARGO_PKG_VERSION")) so TokenPassConfig::new in function
config() binds to the crate metadata instead of the literal "0.2.1".
In `@crates/whitaker_clones_core/src/run0/tests.rs`:
- Around line 7-12: The import block containing symbols AcceptedPair, Run0Error,
SimilarityThreshold, TokenFragment, TokenPassConfig, accept_candidate_pairs,
emit_run0, score::SimilarityRatio, and span::region_for_range is not
rustfmt-compliant; run rustfmt (cargo fmt) to reformat this use block so it
meets the project's formatting rules, then re-run cargo fmt --check to verify;
locate the use ... { ... } block that imports those symbols in the tests module
(references: AcceptedPair, Run0Error, SimilarityThreshold, TokenFragment,
TokenPassConfig, accept_candidate_pairs, emit_run0, score::SimilarityRatio,
span::region_for_range) and commit the formatted result.
---
Duplicate comments:
In `@crates/whitaker_clones_core/src/run0/span.rs`:
- Around line 15-16: The subtraction end_position = range.end.saturating_sub(1)
can land inside a UTF-8 codepoint; instead find the previous UTF-8 character
boundary before range.end in the source_text and pass that byte index to
line_and_column. Replace the raw subtraction with logic that walks back to the
nearest char boundary (e.g., using source_text.is_char_boundary in a loop or
taking source_text[..range.end].char_indices().last().map(|(i,_)| i)) to compute
the safe end_position, keeping references to source_text, starts, and the call
to line_and_column unchanged.
In
`@docs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md`:
- Around line 88-89: Replace the first occurrence of the acronym "MCP" with its
expanded form (e.g., "Model Clarification Protocol (MCP)" or the correct
expanded term used in this project) in the document so the acronym is defined on
first use, then keep using "MCP" for subsequent mentions; update the sentence
containing "Firecrawl MCP" to read "Firecrawl [Expanded Form] (MCP)" and leave
later references as "MCP".
- Line 141: Change the documentation headings from title case/ampersand form to
sentence case—e.g., rename "Surprises & Discoveries" to "Surprises and
discoveries"—and apply the same conversion to the other flagged headings in this
markdown (replace "&" with "and" where appropriate and only capitalize the first
word and proper nouns). Locate the headings in the file (e.g., the "Surprises &
Discoveries" header and the other flagged headers) and update their text to
sentence case to match the doc style guide.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9f9754b3-0f45-4f24-9391-96f8121586d0
📒 Files selected for processing (10)
crates/whitaker_clones_core/src/run0/emit.rscrates/whitaker_clones_core/src/run0/mod.rscrates/whitaker_clones_core/src/run0/span.rscrates/whitaker_clones_core/src/run0/tests.rscrates/whitaker_clones_core/src/run0/tests_emit.rscrates/whitaker_clones_core/src/run0/types.rscrates/whitaker_clones_core/tests/run0_sarif_behaviour.rsdocs/adr-002-dylint-expect-attribute-macro.mddocs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.mddocs/whitaker-clone-detector-design.md
|
@coderabbitai Have the following now been resolved? Annul any requirements that violate the en-GB-oxendict spelling (-ize / -yse / -our) conventions (for example a request to replace "normalize" with "normalise" or "artefact" with "artifact"), or where the requirement unnecessarily increases cyclomatic complexity. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 45370
|
Normalized section headers in the 7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md document for consistent styling and readability, such as changing '&' to 'and' and standardizing capitalization across sections like 'Surprises and discoveries' and 'Decision log'. This improves documentation clarity and consistency. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
crates/whitaker_clones_core/src/run0/tests.rs (1)
40-43:⚠️ Potential issue | 🟡 MinorFormat placeholders in
.expect()message are not interpolated.Same issue as in
tests_emit.rs: the{left}and{right}placeholders in the.expect()string at line 42 will not be substituted with variable values.Proposed fix (static message)
fn pair(left: &str, right: &str) -> CandidatePair { CandidatePair::new(FragmentId::from(left), FragmentId::from(right)) - .expect("pair `{left}` and `{right}` must be distinct") + .expect("pair must be distinct") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/whitaker_clones_core/src/run0/tests.rs` around lines 40 - 43, The .expect() message in the pair function isn't interpolating {left} and {right}; update the call around CandidatePair::new in function pair to build the message with formatting (e.g., use format! or .unwrap_or_else with a formatted panic message) so the actual left and right values appear (e.g., "pair `{}` and `{}` must be distinct", left, right).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/whitaker_clones_core/src/run0/score.rs`:
- Around line 180-183: The comparison in meets_threshold (using SimilarityRatio
{ intersection, union } and SimilarityThreshold { numerator, denominator })
risks overflow but currently uses saturating_mul; add a brief comment above the
meets_threshold function documenting the assumption that fingerprint counts
(intersection and union) are many orders of magnitude smaller than usize::MAX in
practical use, explain why saturating_mul is used to avoid panics on
pathological inputs, and note that this choice may produce clamped results only
in extreme/unrealistic cases so callers should validate counts if they may
approach usize::MAX.
In `@crates/whitaker_clones_core/src/run0/tests_emit.rs`:
- Around line 40-43: The expect message in function pair uses `{left}`/`{right}`
placeholders which won't be interpolated; change the call on CandidatePair::new
to either keep a static literal message or use unwrap_or_else with panic! to
include the runtime values (e.g., replace .expect("pair `{left}` and `{right}`
must be distinct") with an unwrap_or_else that calls panic! with formatted
left/right or a static .expect message), referencing the pair function and
CandidatePair::new to locate the edit.
In `@crates/whitaker_clones_core/src/run0/tests.rs`:
- Around line 45-47: The test helper config() currently hard-codes the version
"0.2.1"; update it to use the crate's package version macro like tests_emit.rs
does by replacing the literal in TokenPassConfig::new(...) with
env!("CARGO_PKG_VERSION") so config() returns
TokenPassConfig::new("whitaker_clones_cli@token", env!("CARGO_PKG_VERSION"))
ensuring both tests read the crate metadata version.
In
`@docs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md`:
- Around line 26-27: Add a comma after "WHK001 results" in the sentence that
reads "Accepted Type-1 pairs produce `WHK001` results and accepted Type-2 pairs
produce `WHK002` results" so it becomes "Accepted Type-1 pairs produce `WHK001`
results, and accepted Type-2 pairs produce `WHK002` results"; update that
sentence in the document (look for the exact quoted string) to include the comma
for correct grammar.
---
Duplicate comments:
In `@crates/whitaker_clones_core/src/run0/tests.rs`:
- Around line 40-43: The .expect() message in the pair function isn't
interpolating {left} and {right}; update the call around CandidatePair::new in
function pair to build the message with formatting (e.g., use format! or
.unwrap_or_else with a formatted panic message) so the actual left and right
values appear (e.g., "pair `{}` and `{}` must be distinct", left, right).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 34f86f28-7c12-4c34-b944-6759c9f6c04a
📒 Files selected for processing (6)
crates/whitaker_clones_core/src/run0/emit.rscrates/whitaker_clones_core/src/run0/score.rscrates/whitaker_clones_core/src/run0/span.rscrates/whitaker_clones_core/src/run0/tests.rscrates/whitaker_clones_core/src/run0/tests_emit.rsdocs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md
…safety - Introduce `meets_threshold` function to compare Jaccard similarity using cross multiplication - Use saturating multiplication to avoid panics on pathological large inputs - Document assumptions on fingerprint counts and guidance for callers Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/whitaker_clones_core/src/run0/tests_emit.rs`:
- Around line 12-47: Extract the duplicated helpers (fingerprint, FragmentInput,
fragment, pair, config) into a single shared test_helpers module and have both
test modules import them; specifically, move the functions fingerprint,
fragment, pair, config and the FragmentInput struct into a new test_helpers
module with pub(super) visibility for each symbol, update the test modules to
use super::test_helpers::{fingerprint, FragmentInput, fragment, pair, config},
and alternatively consider converting shared setup into rstest fixtures if
preferred to follow the project guideline (ensure names match exactly so
existing calls like TokenFragment::new, FragmentId::from, Fingerprint::new and
TokenPassConfig::new continue to resolve).
In
`@docs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md`:
- Line 29: The doc mixes SARIF naming styles: replace the snake_case
`partial_fingerprints` with the SARIF camelCase `partialFingerprints` to match
`relatedLocations` and the usage at line 231; update any mentions of
`partial_fingerprints` in this file (e.g., in the sentence listing
`relatedLocations`, `partial_fingerprints`, and Whitaker `properties`) to
`partialFingerprints` so all SARIF fields use consistent camelCase.
- Around line 460-463: Replace the snake_case SARIF field `partial_fingerprints`
with the camelCase `partialFingerprints` to match the SARIF spec and the other
keys in this doc; update the surrounding examples/descriptions that list
`whitakerFragment` and `tokenHash` so they appear as properties under
`partialFingerprints` (keeping `whitakerFragment` and `tokenHash` names as-is)
and ensure any explanatory text references `partialFingerprints` instead of
`partial_fingerprints`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6b0b2d1e-bffe-48f4-a5a0-001fd8eb1a1d
📒 Files selected for processing (4)
crates/whitaker_clones_core/src/run0/score.rscrates/whitaker_clones_core/src/run0/tests.rscrates/whitaker_clones_core/src/run0/tests_emit.rsdocs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md
- Move common test helper structs and functions from tests.rs and tests_emit.rs into a new test_helpers.rs module under run0. - Update mod.rs to include test_helpers module conditionally for tests. - Import helpers from test_helpers in tests.rs and tests_emit.rs, reducing code duplication and improving test maintainability. - Minor docs update to fix casing of emitted SARIF fields (partialFingerprints). This refactoring improves code organization for run0 acceptance and emission tests. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Summary
crates/whitaker_clones_core/withaccept_candidate_pairsandemit_run0, plus unit tests and BDD coverage, and updates docs and roadmap accordingly.What’s included
crates/whitaker_clones_core/src/run0/containing:emit.rs(SARIF emission logic)error.rs(typed Run 0 errors)mod.rs(module wiring)score.rs(integer-based Jaccard scoring and thresholds)span.rs(byte-range to SARIF region conversion)types.rs(public input/output types)tests.rs(unit tests for Run 0 logic)crates/whitaker_clones_core/src/lib.rsfor Run 0:AcceptedPair,Run0Error,Run0Result,SimilarityRatio,SimilarityThreshold,TokenFragment,TokenPassConfig,accept_candidate_pairs,emit_run0.crates/whitaker_clones_core/src/run0_sarif.feature(BDD feature file)crates/whitaker_clones_core/src/run0_sarif_behaviour.rs(BDD harness)crates/whitaker_clones_core/tests/features/run0_sarif.feature(feature area)crates/whitaker_clones_core/tests/run0_sarif_behaviour.rs(behaviour tests)docs/execplans/7-2-3-emit-sarif-run-0-for-accepted-type-1-and-type-2-pairs.md(status COMPLETE)docs/roadmap.mdupdated to mark 7.2.3 as donedocs/whitaker-clone-detector-design.mdupdated with dedicated 7.2.3 decisions (in-file section)Rationale
run0module reuses existing SARIF tooling (whitaker_sarif) and keeps emission logic encapsulated withinwhitaker_clones_coreas planned, enabling deterministic results and stable regions for downstream tooling.Implementation plan (high level)
## Implementation decisions (7.2.3)) (completed)docs/roadmap.md(completed)How to review
Documentation impact
Acceptance criteria
whitaker_sariftooling and carries stable fingerprints and 1-based SARIF regions.📎 Task: https://www.devboxer.com/task/81bbb2d8-2581-49d3-a21b-cba40d1d3c53
📎 Task: https://www.devboxer.com/task/3515b49a-8747-49b6-80b3-5e2575dce6dd
Summary by Sourcery
Implement token-pass Run 0 SARIF emission for accepted clone pairs and expose a public API for acceptance and emission in whitaker_clones_core.
New Features:
Enhancements:
Documentation:
Tests: