Skip to content

sqlite: optimize column name creation and text value encoding#61954

Open
thisalihassan wants to merge 3 commits intonodejs:mainfrom
thisalihassan:sqlite-one-byte-encoding
Open

sqlite: optimize column name creation and text value encoding#61954
thisalihassan wants to merge 3 commits intonodejs:mainfrom
thisalihassan:sqlite-one-byte-encoding

Conversation

@thisalihassan
Copy link
Contributor

@thisalihassan thisalihassan commented Feb 23, 2026

Skip the full UTF-8 decode path for text values that are pure ASCII by validating with simdutf and creating OneByte V8 strings, halving memory.Internalize column name strings so V8 shares hidden classes across row objects. Cache column names in the iterate() hot loop, invalidating on schema changes via SQLITE_STMTSTATUS_REPREPARE.

Benchmark: 30-run
sqlite-benchmark

Refs: nodejs/performance#181

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/sqlite

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. sqlite Issues and PRs related to the SQLite subsystem. labels Feb 23, 2026
@thisalihassan thisalihassan force-pushed the sqlite-one-byte-encoding branch 2 times, most recently from ff6a788 to 4fcb1ed Compare February 23, 2026 15:05
@thisalihassan
Copy link
Contributor Author

thisalihassan commented Feb 23, 2026

I didn't push the changes for sqlite-prepare-select-all-options and sqlite-prepare-select-all where I tested the iterate method because it slows down the benchmark by a lot.
Adding iterate nearly ~2.3x the total benchmark runtime from ~40 minutes to ~90 minutes, but if needed I can push that as well
Benchmark with iterate: 30 runs
sqlite-benchmark_all_2

@codecov
Copy link

codecov bot commented Feb 23, 2026

Codecov Report

❌ Patch coverage is 84.78261% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.65%. Comparing base (2422ed8) to head (3d6d7ff).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/node_sqlite.cc 84.78% 4 Missing and 3 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #61954   +/-   ##
=======================================
  Coverage   89.65%   89.65%           
=======================================
  Files         676      676           
  Lines      206325   206363   +38     
  Branches    39520    39520           
=======================================
+ Hits       184971   185008   +37     
- Misses      13477    13481    +4     
+ Partials     7877     7874    -3     
Files with missing lines Coverage Δ
src/node_sqlite.h 80.64% <ø> (ø)
src/node_sqlite.cc 80.85% <84.78%> (+0.04%) ⬆️

... and 25 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ronag ronag requested review from addaleax, anonrig and geeksilva97 and removed request for geeksilva97 February 24, 2026 09:45
Comment on lines +60 to +61
const char* data,
size_t length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just take a std::string_view input as an argument rather than this C API like function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

NewStringType::kNormal,
len);
}
return String::NewFromUtf8(isolate, data, NewStringType::kNormal, len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use kInternalized here and in line 66?

Copy link
Contributor Author

@thisalihassan thisalihassan Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utf8StringMaybeOneByte is used for text cell values (row data), not column names, text values are typically unique user data ("Ali Hassan", "ali.n@example.com", etc.) so the lookup would almost always miss

Column names are already interned separately in ColumnNameToName

if (!iter->stmt_->ColumnNameToName(i).ToLocal(&key)) return;
row_keys.emplace_back(key);
}
if (!iter->stmt_->GetCachedColumnNames(&row_keys)) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to here?

.As<Name>();
}

bool StatementSync::GetCachedColumnNames(LocalVector<Name>* keys) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to this function?

@thisalihassan thisalihassan force-pushed the sqlite-one-byte-encoding branch from ad603d4 to fd4215d Compare February 24, 2026 16:49
@thisalihassan
Copy link
Contributor Author

Hi @addaleax @geeksilva97 can I please get review on this PR when you get time? Thanks

Isolate* isolate = env()->isolate();

int reprepare_count =
sqlite3_stmt_status(statement_, SQLITE_STMTSTATUS_REPREPARE, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
sqlite3_stmt_status(statement_, SQLITE_STMTSTATUS_REPREPARE, 0);
sqlite3_stmt_status(statement_, SQLITE_STMTSTATUS_REPREPARE, false);

bool StatementSync::GetCachedColumnNames(LocalVector<Name>* keys) {
Isolate* isolate = env()->isolate();

int reprepare_count =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int reprepare_count =
const int reprepare_count =

sqlite3_stmt_status(statement_, SQLITE_STMTSTATUS_REPREPARE, 0);
if (reprepare_count != cached_column_names_reprepare_count_) {
cached_column_names_.clear();
int num_cols = sqlite3_column_count(statement_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int num_cols = sqlite3_column_count(statement_);
const int num_cols = sqlite3_column_count(statement_);

const char* v = \
reinterpret_cast<const char*>(sqlite3_##from##_text(__VA_ARGS__)); \
(result) = String::NewFromUtf8((isolate), v).As<Value>(); \
int v_len = sqlite3_##from##_bytes(__VA_ARGS__); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int v_len = sqlite3_##from##_bytes(__VA_ARGS__); \
const int v_len = sqlite3_##from##_bytes(__VA_ARGS__); \


inline MaybeLocal<String> Utf8StringMaybeOneByte(Isolate* isolate,
std::string_view input) {
int len = static_cast<int>(input.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int len = static_cast<int>(input.size());
const int len = static_cast<int>(input.size());

void StatementSync::Finalize() {
sqlite3_finalize(statement_);
statement_ = nullptr;
cached_column_names_.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider putting this in a method.

.As<Name>();
}

// Returns cached internalized column name strings for this statement,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is incorrect, it returns a boolean.

bool allow_unknown_named_params_;
std::optional<std::map<std::string, std::string>> bare_named_params_;
std::vector<v8::Global<v8::Name>> cached_column_names_;
int cached_column_names_reprepare_count_ = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using std::optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion @louwers but I benchmarked on the iterate path (30 runs) saw a consistent -1% to -2% regression after making these changes:

sqlite/sqlite-prepare-select-all.js statement='SELECT text_column FROM foo LIMIT 1' method='iterate' tableSeedSize=100000 n=100000 *** -2.06 %
sqlite/sqlite-prepare-select-all.js statement='SELECT text_column FROM foo LIMIT 100' method='iterate' tableSeedSize=100000 n=100000 *** -1.42 %

The reason might be std::optional comparison generates more instructions per row than just int

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Not worth it then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it's confirmed, but very interesting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @louwers should we merge this PR?

Use simdutf to detect ASCII text values and create them
via NewFromOneByte for compact one-byte representation.
Internalize column name strings with kInternalized so V8
shares hidden classes across row objects. Cache column
names on StatementSync for iterate(), invalidated via
SQLITE_STMTSTATUS_REPREPARE on schema changes.

Refs: nodejs/performance#181
@thisalihassan thisalihassan force-pushed the sqlite-one-byte-encoding branch from fd4215d to 3d6d7ff Compare March 3, 2026 14:06
@thisalihassan
Copy link
Contributor Author

thisalihassan commented Mar 3, 2026

Ahh I just rebased as there were conflicts now a test is failing seems like it was falky test, I don't think this test is related to my changes

Failed tests:
out/Release/node /home/runner/work/_temp/node-v26.0.0-nightly2026-03-0367f9f5895a-slim/test/parallel/test-timers-immediate-queue.js
make[1]: *** [Makefile:619: test-ci] Error 1
make: *** [Makefile:647: run-ci] Error 2
make: Leaving directory '/home/runner/work/_temp/node-v26.0.0-nightly2026-03-0367f9f5895a-slim'

@thisalihassan thisalihassan requested a review from louwers March 3, 2026 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. sqlite Issues and PRs related to the SQLite subsystem.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants