Skip to content

fix: overlapping#227

Merged
afham123 merged 6 commits into
americanexpress:mainfrom
afham123:fix/overlapping
May 22, 2026
Merged

fix: overlapping#227
afham123 merged 6 commits into
americanexpress:mainfrom
afham123:fix/overlapping

Conversation

@afham123
Copy link
Copy Markdown
Contributor

@afham123 afham123 commented May 11, 2026

Issues

  • As correctly pointed out by @BearAlliance "In practice the 'bridge' was not equivalent to prevTail + curHead, it effectively dropped the previous tail." Toggle is adding the overlapping bridge alternately.
  • Initialising tmpString inside the loop does not preserves its value across iterations, instead loosing its value with iteration.
  • Few redundant if conditions were there.
  • Few cases we are slicing a string by byte positions, this can break non ascii character words containing char like ñ or emoji. The code only works reliably with text is made of simple one-byte characters, like plain English letters, digits, and punctuation.
s := "añ🌍"
len(s)          // 8  (a=1, ñ=2, 🌍=4 bytes)
len([]rune(s))  // 3  (3 characters)

Improved overlapping logic:

  • Adding a consistent bridge overlapping.
  • Preallocating the result slice to its final size, reducing repeated reallocations during each loop iteration.
  • Removing redundant if else block and adding a clean simplified logic.
  • Adding the rune in the bridge making also.
  • Merging the two loop into just one, reducing the time-complexity from 2n to just simply n, would be useful in case of a big minified file.

Before

Screenshot 2026-05-12 at 1 16 42 PM
  • Please note that overlapping got corrupt.

After

Screenshot 2026-05-12 at 1 17 17 PM

@BearAlliance
Copy link
Copy Markdown

This is only a partial fix. It still uses the old toggle scheme, so it only builds overlap bridges for alternating chunks. With chunks A, B, C, D, it produces bridges for A+B and C+D. It still misses B+C`.

See my comment #223 (comment)

@afham123
Copy link
Copy Markdown
Contributor Author

This is only a partial fix. It still uses the old toggle scheme, so it only builds overlap bridges for alternating chunks. With chunks A, B, C, D, it produces bridges for A+B and C+D. It still misses B+C`.

See my comment #223 (comment)

Yes, make sense let me add your change, thanks for pointing out, I didn't think through.

@BearAlliance
Copy link
Copy Markdown

If you're going to move ahead with this one, you should probably also copy the tests from my PR.

Comment thread pkg/scan/scan.go
@@ -408,47 +408,40 @@ func splitJob(inJob WorkJob, worklength int) (work []WorkJob) {
// splitSubN Create the overlap string when splitting long strings
func splitSubN(s string, n int) []string {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test case for this change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test case added.

@afham123 afham123 merged commit 573753e into americanexpress:main May 22, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants