
Add new load-test command #281

Open
hannahbast wants to merge 2 commits into main from load-test-command

Conversation

@hannahbast
Collaborator

This new command launches queries in parallel against the specified endpoint and prints statistics about the response times, success rates, and so on. There are options `--request-rate` (how many queries per second to send), `--duration` (how long to run the test), and `--log-frequency` (how often to print the statistics).

Queries can be read from a TSV or YAML file (the same format as for the `benchmark-queries` command), or from a directory that contains one JSON file per query (the format used for our wikidata-query-log dataset). The option `--query-selection` determines how queries are selected (`random`, which is the default, or `cyclic`).

Copilot AI left a comment

Pull request overview

Adds a new load-test CLI command to generate concurrent SPARQL traffic against a configured endpoint and periodically report latency/error statistics. This extends the existing qlever command suite with a purpose-built load generator that can reuse existing query-file formats.

Changes:

  • Introduces LoadTestCommand with options for request rate, duration, log frequency, endpoint selection, and query selection strategy.
  • Adds a unified QuerySource abstraction to load queries from TSV, YAML, or a directory of JSON query logs.
  • Implements periodic table-style status logging plus a final HTTP status code histogram.

Comment on lines +195 to +201
subparser.add_argument(
    "--query-selection",
    type=str,
    choices=["random", "cycle"],
    default="random",
    help="How to pick the next query (default: random)",
)

Copilot AI Apr 13, 2026

`--query-selection` is documented in the PR description as supporting `cyclic`, but the CLI currently exposes `cycle` (and the `QuerySource.get_query` docstring also says "cycle"). To avoid user confusion / broken scripts, align the option name with the PR description (e.g., accept `cyclic` as an alias, or rename the choice and update help/docstrings accordingly).
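One way to keep both spellings working, as a hedged sketch: the option and its help text follow the snippet above, while the alias handling is hypothetical and not from the PR.

```python
import argparse

# Sketch: accept both spellings and normalize to one canonical value, so the
# documented name ("cyclic") and the implemented name ("cycle") stay in sync.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--query-selection",
    type=str,
    choices=["random", "cycle", "cyclic"],
    default="random",
    help="How to pick the next query (default: random)",
)
args = parser.parse_args(["--query-selection", "cyclic"])
# Normalize the alias so the rest of the code only ever sees "cycle".
if args.query_selection == "cyclic":
    args.query_selection = "cycle"
```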

Comment on lines +389 to +393
try:
    start_time = time.time()
    next_log_time = start_time + log_frequency_secs
    interval = 1.0 / args.request_rate
    next_launch_time = start_time + interval

Copilot AI Apr 13, 2026

`interval = 1.0 / args.request_rate` will raise `ZeroDivisionError` for `--request-rate 0` and behaves oddly for negative values. Also `--log-frequency 0s` will make `next_log_time` never advance, causing a tight loop that logs continuously. Consider validating `request_rate > 0` and `log_frequency_secs > 0` (and possibly `duration_secs > 0` if you want to disallow no-op runs) and return a clear error message when invalid.
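A minimal validation sketch, assuming argparse is used for these options; `positive_float` is a hypothetical helper, not from the PR.

```python
import argparse

def positive_float(s: str) -> float:
    """Argparse type that rejects zero, negative, and non-numeric values."""
    try:
        value = float(s)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {s!r}")
    if value <= 0:
        raise argparse.ArgumentTypeError(f"must be > 0, got {s!r}")
    return value

parser = argparse.ArgumentParser()
parser.add_argument("--request-rate", type=positive_float, default=10.0)
args = parser.parse_args(["--request-rate", "5"])
```

With a type like this in place, `--request-rate 0` fails at parse time with a clear error message instead of a `ZeroDivisionError` deep inside the launch loop.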

Comment on lines +319 to +340
try:
    result = subprocess.run(
        curl_cmd, shell=True, capture_output=True,
        text=True, timeout=max(duration_secs * 2, 300),
    )
    elapsed = time.time() - start
    http_code = result.stdout.strip()
    with lock:
        num_done += 1
        completed_times.append(elapsed)
        status_codes[http_code] = \
            status_codes.get(http_code, 0) + 1
        if http_code != "200":
            num_errors += 1
except Exception:
    elapsed = time.time() - start
    with lock:
        num_done += 1
        num_errors += 1
        completed_times.append(elapsed)
        status_codes["timeout"] = \
            status_codes.get("timeout", 0) + 1

Copilot AI Apr 13, 2026

`send_query` catches all exceptions and records them as `timeout`, which will misclassify failures like curl not being installed, DNS errors, or other subprocess issues. Catch `subprocess.TimeoutExpired` explicitly for timeouts, and handle other exceptions separately (e.g., record an `exception`/`failed_to_run` bucket and log at debug level).
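A sketch of the suggested split; the helper name and bucket labels are illustrative, not from the PR.

```python
import subprocess

def failure_bucket(exc: Exception) -> str:
    # Map exceptions to distinct histogram buckets so a missing curl binary
    # or other subprocess failure is not miscounted as a timeout.
    if isinstance(exc, subprocess.TimeoutExpired):
        return "timeout"
    if isinstance(exc, FileNotFoundError):
        return "curl-not-found"
    return "failed-to-run"
```

The `except Exception` handler would then do `status_codes[failure_bucket(e)] = ...` instead of hard-coding `"timeout"`.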

Comment on lines +404 to +410
thread = threading.Thread(
    target=send_query, args=(query,), daemon=True,
)
thread.start()
with lock:
    num_launched += 1
next_launch_time += interval

Copilot AI Apr 13, 2026

Threads are started as `daemon=True` and the code never waits for in-flight requests to finish. This makes the final `Done`/`Running` counts and timing stats potentially incorrect, and the process may exit while curl subprocesses are still running. Track created threads (or use a bounded executor) and join them (optionally with a grace period) before printing the final status/histogram.
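One possible shape for this, sketched with a stub worker; the grace period and names are assumptions, not from the PR.

```python
import threading
import time

def send_query_stub() -> None:
    time.sleep(0.01)  # stands in for the real curl call

# Keep references to every launched worker thread ...
threads: list[threading.Thread] = []
for _ in range(5):
    t = threading.Thread(target=send_query_stub, daemon=True)
    t.start()
    threads.append(t)

# ... and join them against a shared deadline (a grace period for in-flight
# requests) before printing the final status and histogram.
deadline = time.time() + 5.0
for t in threads:
    t.join(timeout=max(0.0, deadline - time.time()))
still_running = sum(t.is_alive() for t in threads)
```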

Comment on lines +407 to +409
thread.start()
with lock:
    num_launched += 1

Copilot AI Apr 13, 2026

There is a race where `num_done` can be incremented by the worker thread before `num_launched` is incremented (because `thread.start()` happens before `num_launched += 1`). That can temporarily make `Running = Launched - Done` negative and distort intermediate stats. Increment `num_launched` under the lock before starting the thread (or otherwise ensure launched is recorded before the worker can finish).

Suggested change: instead of

    thread.start()
    with lock:
        num_launched += 1

write

    with lock:
        num_launched += 1
    thread.start()

Comment on lines +401 to +410
# Launch queries that are due.
while next_launch_time <= now:
    query = source.get_query(args.query_selection)
    thread = threading.Thread(
        target=send_query, args=(query,), daemon=True,
    )
    thread.start()
    with lock:
        num_launched += 1
    next_launch_time += interval

Copilot AI Apr 13, 2026

Launching one new `threading.Thread` per request with no upper bound can exhaust OS resources at higher `--request-rate` or when the endpoint slows down (threads accumulate). A more robust approach is to use a bounded worker pool (e.g., `ThreadPoolExecutor` with a max size) plus a queue, or an async HTTP client, and apply backpressure when the system is saturated.
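A bounded-pool sketch; the pool size and the stub worker are illustrative, not from the PR.

```python
from concurrent.futures import ThreadPoolExecutor

def send_query_stub(query: str) -> str:
    return f"done: {query}"  # stands in for the real HTTP request

# At most max_workers requests run concurrently; further submissions queue up
# inside the executor instead of spawning one OS thread per request.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(send_query_stub, f"q{i}") for i in range(20)]
    results = [f.result() for f in futures]
```

The `with` block also gives the join-before-exit behavior for free: the executor waits for queued work to drain before the process continues.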

Comment on lines +19 to +38
def parse_duration(s: str) -> float:
    """
    Parse a duration string like "10s", "34min", "2h", "1.5h", "90" (seconds
    by default) and return the duration in seconds.
    """
    s = s.strip()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(s|sec|min|m|h|hr)?", s)
    if not match:
        raise ValueError(f"Cannot parse duration: \"{s}\" (examples: "
                         f"\"10s\", \"34min\", \"2h\", \"90\")")
    value = float(match.group(1))
    unit = match.group(2) or "s"
    if unit in ("s", "sec"):
        return value
    elif unit in ("min", "m"):
        return value * 60
    elif unit in ("h", "hr"):
        return value * 3600
    else:
        raise ValueError(f"Unknown duration unit: \"{unit}\"")

Copilot AI Apr 13, 2026

This file introduces non-trivial parsing/selection logic (`parse_duration`, `format_duration`, and `QuerySource` parsing/selection) plus a new command entrypoint. The repository has command-level pytest coverage (e.g., `test/qlever/commands/*`), but there are currently no tests for this new load-test command and its helpers. Please add unit tests for duration parsing/formatting edge cases and for `QuerySource` behavior across TSV/YAML/dir modes (including `random` vs `cycle`).
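For instance, a few edge cases for `parse_duration` worth pinning down; the function body is copied from the snippet above so this example is self-contained (it could live in `test/qlever/commands/` as assertions).

```python
import re

def parse_duration(s: str) -> float:
    """Parse "10s", "34min", "2h", "1.5h", "90" into seconds (copied above)."""
    s = s.strip()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(s|sec|min|m|h|hr)?", s)
    if not match:
        raise ValueError(f'Cannot parse duration: "{s}"')
    value = float(match.group(1))
    unit = match.group(2) or "s"
    if unit in ("s", "sec"):
        return value
    if unit in ("min", "m"):
        return value * 60
    return value * 3600

# Edge cases: default unit, fractional values, surrounding whitespace.
assert parse_duration("90") == 90.0
assert parse_duration("1.5h") == 5400.0
assert parse_duration(" 34min ") == 2040.0
```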

  • `--queries-dir` replaced by `--queries-jsonl` -> reads a single JSONL file instead of a directory of JSON files; all three formats (TSV, YAML, JSONL) now fully read queries into memory upfront
  • `--duration` replaced by `--num-queries` (default 1000) -> specify how many queries to send, not how long to run
  • `--query-selection` added -> `random` (default) or `cycle`
  • `--show-queries` added -> print the full queries in purple and exit
  • `--show-error-messages` added -> show the query and error message for failed queries

Refactor code and various minor improvements
