Skip to content

[lexer] Simplify GetLineOffsets#2735

Merged
zherczeg merged 1 commit intoWebAssembly:mainfrom
zherczeg:rework_line_scan
Apr 3, 2026
Merged

[lexer] Simplify GetLineOffsets#2735
zherczeg merged 1 commit intoWebAssembly:mainfrom
zherczeg:rework_line_scan

Conversation

@zherczeg
Copy link
Copy Markdown
Collaborator

@zherczeg zherczeg commented Apr 1, 2026

The original code copies the webassembly source code into a 64K buffer, then searches the newline there. The buffer is refilled when the search reaches its end. The new code does not allocate any 64K buffer, just search the newline directly in the source code.

I suspect the old code assumed that the source code is not fully loaded, but then the lexer could not process it.

@zherczeg
Copy link
Copy Markdown
Collaborator Author

zherczeg commented Apr 1, 2026

I have also noticed that utf characters are not recognized, so a 3 byte long utf character increases the offset by 3.

@sbc100
Copy link
Copy Markdown
Member

sbc100 commented Apr 1, 2026

What is the benefit of coverting the uint32_t here? Is the unsigned part important? Or is the 32-bit part important? Or both? (i.e. why not just use uint)?

Maybe make that change separately if you think its worth while?

@zherczeg
Copy link
Copy Markdown
Collaborator Author

zherczeg commented Apr 1, 2026

Negative lines and columns have no meaning. Using unsigned doubles the range. I can split the patch if it is really needed.

@zherczeg
Copy link
Copy Markdown
Collaborator Author

zherczeg commented Apr 1, 2026

The 32 bit part is not important, unsigned int is too long for me :)

@sbc100
Copy link
Copy Markdown
Member

sbc100 commented Apr 1, 2026

The 32 bit part is not important, unsigned int is too long for me :)

How about we just use of Offset then?

@zherczeg
Copy link
Copy Markdown
Collaborator Author

zherczeg commented Apr 1, 2026

using Offset = size_t;
Offset is 8 byte long on the common 64 bit systems, and that would increase the Loc struct from 16 to 24 bytes. I don't think it is worth it.

@sbc100
Copy link
Copy Markdown
Member

sbc100 commented Apr 1, 2026

using Offset = size_t; Offset is 8 byte long on the common 64 bit systems, and that would increase the Loc struct from 16 to 24 bytes. I don't think it is worth it.

But shouldn't we focus on correctness here? Does anyone actually care about the sizeof Location in practice? (i.e. are there wabt uses who would notice?).

You wast to make a new Offset32 type? Or FileOffset maybe? Maybe we should also assert that no file we read is larger than 4gb in that case? Since otherwise the offsets would wrap?

@zherczeg
Copy link
Copy Markdown
Collaborator Author

zherczeg commented Apr 1, 2026

I was born in an era when programs fit into 640 KB RAM, and just cannot stop optimizing code. Anyway, just tell me what to do, and I will do that.

@sbc100
Copy link
Copy Markdown
Member

sbc100 commented Apr 1, 2026

Can you split out the type change from the actually GetLineOffsets meat of this change (i.e. make this change more focused).

@zherczeg
Copy link
Copy Markdown
Collaborator Author

zherczeg commented Apr 2, 2026

Removed the unsigned part.

@zherczeg
Copy link
Copy Markdown
Collaborator Author

zherczeg commented Apr 2, 2026

Is this patch ok this way?

@sbc100
Copy link
Copy Markdown
Member

sbc100 commented Apr 2, 2026

Could you perhaps update the PR description with a little more context about what this change is actually doing? I don't know this part of the code very well.

The new code directly scans the input buffer.
@zherczeg
Copy link
Copy Markdown
Collaborator Author

zherczeg commented Apr 3, 2026

Patch and description is updated.

@sbc100 sbc100 changed the title Simplify GetLineOffsets [lexer] Simplify GetLineOffsets Apr 3, 2026
@zherczeg zherczeg merged commit ec3eac8 into WebAssembly:main Apr 3, 2026
17 checks passed
@zherczeg zherczeg deleted the rework_line_scan branch April 3, 2026 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants