[lexer] Simplify GetLineOffsets by zherczeg · Pull Request #2735 · WebAssembly/wabt

zherczeg · 2026-04-01T16:58:03Z

The original code copies the WebAssembly source into a 64K buffer, then searches for the newline there. The buffer is refilled when the search reaches its end. The new code does not allocate a 64K buffer; it searches for the newline directly in the source.

I suspect the old code assumed that the source might not be fully loaded into memory, but in that case the lexer could not process it either.
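
For readers unfamiliar with this code, the direct scan described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `LineRange` and `FindLineRange` are hypothetical names, and the sketch assumes the whole source is already in memory as a `std::string_view` and that lines are numbered from 1.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical half-open byte range [start, end) of one line,
// with the terminating newline excluded.
struct LineRange {
  std::size_t start;
  std::size_t end;
};

// Returns the byte range of the 1-based line `line`, or std::nullopt if the
// source has fewer lines. The newline search runs directly on the in-memory
// source, so no intermediate copy buffer is needed.
std::optional<LineRange> FindLineRange(std::string_view source,
                                       std::size_t line) {
  if (line == 0) {
    return std::nullopt;
  }
  std::size_t start = 0;
  // Skip over the (line - 1) newlines that precede the requested line.
  for (std::size_t current = 1; current < line; ++current) {
    std::size_t newline = source.find('\n', start);
    if (newline == std::string_view::npos) {
      return std::nullopt;
    }
    start = newline + 1;
  }
  // The line ends at the next newline, or at the end of the source.
  std::size_t end = source.find('\n', start);
  if (end == std::string_view::npos) {
    end = source.size();
  }
  return LineRange{start, end};
}
```

In this sketch, `FindLineRange("ab\ncde\n", 2)` yields the range `[3, 6)`, which covers `cde`. A real implementation would likely also cache the offsets of lines it has already visited, which is omitted here.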

zherczeg · 2026-04-01T17:01:11Z

I have also noticed that UTF-8 characters are not recognized, so a 3-byte UTF-8 character increases the offset by 3.
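
To illustrate the difference noted above, here is a small sketch that compares a byte-based column with a code-point-based one. `ByteColumn` and `CodePointColumn` are hypothetical helpers, not wabt functions, and the sketch assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Column measured in bytes: every byte of a multi-byte character counts.
std::size_t ByteColumn(std::string_view line_prefix) {
  return line_prefix.size();
}

// Column measured in code points, assuming valid UTF-8: continuation bytes
// have the bit pattern 10xxxxxx, so only bytes that do not match that
// pattern start a new character.
std::size_t CodePointColumn(std::string_view line_prefix) {
  std::size_t count = 0;
  for (unsigned char byte : line_prefix) {
    if ((byte & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}
```

For the prefix `a€b`, where the euro sign is encoded as the three bytes `E2 82 AC`, the byte column is 5 while the code-point column is 3.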

sbc100 · 2026-04-01T17:05:17Z

What is the benefit of converting to uint32_t here? Is the unsigned part important? Or is the 32-bit part important? Or both? (i.e. why not just use uint)?

Maybe make that change separately if you think it's worthwhile?

zherczeg · 2026-04-01T17:11:49Z

Negative lines and columns have no meaning. Using unsigned doubles the range. I can split the patch if it is really needed.

zherczeg · 2026-04-01T17:13:57Z

The 32-bit part is not important; unsigned int is just too long for me to type :)

sbc100 · 2026-04-01T17:36:43Z

> The 32-bit part is not important; unsigned int is just too long for me to type :)

How about we just use Offset then?

zherczeg · 2026-04-01T17:40:10Z

using Offset = size_t;
Offset is 8 bytes long on common 64-bit systems, and that would increase the Loc struct from 16 to 24 bytes. I don't think it is worth it.
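
The size argument can be made concrete with a small sketch. The two structs below are hypothetical layouts chosen only because they reproduce the 16 and 24 byte figures on a typical 64-bit target; they are not the actual definition of the struct discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout with two 32-bit fields next to a pointer.
// On a typical 64-bit target: 8 + 4 + 4 = 16 bytes.
struct NarrowLoc {
  const char* name;
  std::uint32_t first;
  std::uint32_t last;
};

// The same hypothetical layout with size_t fields.
// On a typical 64-bit target: 8 + 8 + 8 = 24 bytes.
struct WideLoc {
  const char* name;
  std::size_t first;
  std::size_t last;
};
```

Whether the extra 8 bytes per instance matter depends on how many such objects are alive at once, which is the question raised in the reply below.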

sbc100 · 2026-04-01T17:43:33Z

> using Offset = size_t;
> Offset is 8 bytes long on common 64-bit systems, and that would increase the Loc struct from 16 to 24 bytes. I don't think it is worth it.

But shouldn't we focus on correctness here? Does anyone actually care about the sizeof Location in practice? (i.e. are there wabt users who would notice?).

Do you want to make a new Offset32 type? Or FileOffset maybe? Maybe we should also assert that no file we read is larger than 4 GB in that case? Since otherwise the offsets would wrap?
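
A rough sketch of the guard suggested above, assuming a 32-bit offset type is introduced. `Offset32` is the name floated in this comment, and `ToOffset32` is a hypothetical helper; neither is taken from the wabt sources.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Name floated in the discussion; hypothetical here.
using Offset32 = std::uint32_t;

// Narrows a 64-bit offset to 32 bits, or returns std::nullopt when the value
// does not fit. Without such a check, an offset at or beyond 4 GiB would
// silently wrap around when stored in an Offset32.
std::optional<Offset32> ToOffset32(std::uint64_t offset) {
  if (offset > std::numeric_limits<Offset32>::max()) {
    return std::nullopt;
  }
  return static_cast<Offset32>(offset);
}
```

A reader could apply this check once to the file size at load time and reject oversized inputs up front, so that every later offset into the file is known to fit.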

zherczeg · 2026-04-01T17:47:57Z

I was born in an era when programs fit into 640 KB RAM, and just cannot stop optimizing code. Anyway, just tell me what to do, and I will do that.

sbc100 · 2026-04-01T17:53:43Z

Can you split out the type change from the actual GetLineOffsets meat of this change (i.e. make this change more focused)?

zherczeg · 2026-04-02T02:38:37Z

Removed the unsigned part.

zherczeg · 2026-04-02T17:19:46Z

Is this patch ok this way?

sbc100 · 2026-04-02T17:22:01Z

Could you perhaps update the PR description with a little more context about what this change is actually doing? I don't know this part of the code very well.

The new code directly scans the input buffer.

zherczeg · 2026-04-03T04:11:04Z

The patch and description are updated.

zherczeg force-pushed the rework_line_scan branch from 3089d34 to 03a4d75 Compare April 1, 2026 17:00

zherczeg force-pushed the rework_line_scan branch from 03a4d75 to 279fd07 Compare April 1, 2026 17:36

zherczeg force-pushed the rework_line_scan branch from 279fd07 to 84bbf67 Compare April 2, 2026 02:37

zherczeg mentioned this pull request Apr 2, 2026

Add declaration limit checks to parser #2736

Merged

Simplify source line finding

131730a

The new code directly scans the input buffer.

zherczeg force-pushed the rework_line_scan branch from 84bbf67 to 131730a Compare April 3, 2026 03:15

sbc100 changed the title ~~Simplify GetLineOffsets~~ [lexer] Simplify GetLineOffsets Apr 3, 2026

sbc100 approved these changes Apr 3, 2026

zherczeg merged commit ec3eac8 into WebAssembly:main Apr 3, 2026
17 checks passed

zherczeg deleted the rework_line_scan branch April 3, 2026 17:37


2 participants
