libpcp_web, pmproxy: support for optional labels#2591
Conversation
Added support for optional labels to be ingested into a key-value server by pmproxy. SHA1 hashes calculations are not affected. Added qa test 1744 to test optional label ingestion and display.
📝 WalkthroughWalkthroughThis PR introduces optional label support for pmseries metrics by refactoring label handling in libpcp_web, fixing a key server detection bug in QA infrastructure, and adding a comprehensive QA test to validate the label storage behavior. ChangesLabel Storage and QA Validation
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
qa/openmetrics/samples/pmseries_label_test.txt (1)
1-9:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winChange
# Typeto# TYPEper the OpenMetrics/Prometheus exposition spec.The PMDA parser uses case-sensitive matching for the TYPE directive (line 798 of pmdaopenmetrics.python:
elif lp[1] == 'TYPE':). Using# Typeinstead of# TYPEcauses the parser to silently ignore the type metadata, leaving the metric without a declared type. This impacts proper type handling in the archive.Suggested fix
-# Type test_metric1 gauge +# TYPE test_metric1 gauge @@ -# Type test_metric2 gauge +# TYPE test_metric2 gauge🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@qa/openmetrics/samples/pmseries_label_test.txt` around lines 1 - 9, The sample exposition uses lowercase "# Type" which the PMDA parser's case-sensitive check (see the token handling that inspects lp[1] == 'TYPE' in pmdaopenmetrics.python) will ignore; update the sample lines in pmseries_label_test.txt to use uppercase "# TYPE" for each metric definition (e.g., change "# Type test_metric1 gauge" and "# Type test_metric2 gauge" to "# TYPE test_metric1 gauge" and "# TYPE test_metric2 gauge") so the parser correctly recognizes and applies metric type metadata.
🧹 Nitpick comments (4)
src/libpcp_web/src/util.c (2)
460-474: 💤 Low value
idlabelsis recomputed on every call, even whencontext->labelsis cached.Previously the
pmMergeLabelSets/sds construction was guarded bycontext->labels == NULL. Nowcontext_labels_str_identifying()runs unconditionally on every invocation ofpmwebapi_context_hash(). This is fine functionally (the SHA1 input is identical to the prior identifying-only JSON), but it does mean repeated label-set merges if this is ever called more than once per context. If callers may invoke this on a hot path, consider caching the identifying-label digest input alongsidecontext->labels, or hoisting the computation to be guarded by an identity-cache flag.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/libpcp_web/src/util.c` around lines 460 - 474, pmwebapi_context_hash currently always calls context_labels_str_identifying() causing repeated merges; change it to compute and cache the identifying-label string only when it is first needed (same condition used for caching context->labels) by adding a cached sds (e.g. context->idlabels or similar) or an identity-cache flag on the context struct, populate that cache inside the block where context->labels is set (or guarded by context->labels == NULL), and then pass the cached identifying string and length to pmwebapi_source_hash instead of recomputing each call; ensure the new cache is allocated with sdsnewlen and cleaned up where context->labels is freed.
487-513: 💤 Low valueDead
lenassignment on theidbuffallback path.
idbufis consumed via%sinsdscatfmt(null-terminated), so thelen = pmsprintf(idbuf, sizeof(idbuf), "null")reassignment is never read. Either drop the assignment or rename the second variable to keep the intent obvious thatlenonly tracks the cachedmetric->labelslength. Also note that the same pattern is duplicated inpmwebapi_instance_hash()(lines 548-550) — extracting a small helper that returns an "identifying labels" C-string would remove the duplication acrosscontext_hash,metric_hash, andinstance_hash.♻️ Sketch of minimal cleanup
- len = metric_labelsets(metric, idbuf, sizeof(idbuf), labels, NULL); - if (len <= 0) - len = pmsprintf(idbuf, sizeof(idbuf), "null"); + if (metric_labelsets(metric, idbuf, sizeof(idbuf), labels, NULL) <= 0) + (void)pmsprintf(idbuf, sizeof(idbuf), "null");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/libpcp_web/src/util.c` around lines 487 - 513, The second pmsprintf assigns to the same local `len` but that value is never used (idbuf is consumed as a C-string), so either remove the redundant `len = pmsprintf(idbuf, ...)` or use a separate variable name to make the intent clear; better yet, extract a small helper (e.g. get_identifying_labels or similar) that encapsulates the logic around `metric_labelsets(...)` / fallback `pmsprintf("null")` and returns the C-string to use in sdscatfmt, then call that helper from context_hash/metric_hash/instance_hash (and from pmwebapi_instance_hash) to eliminate duplication and avoid clobbering `len`.qa/1744 (2)
27-161: 💤 Low valueInconsistent use of
$seq.fullvs$seq_full.The script mixes
$here/$seq.full(lines 27-35, 82, 143, 144, 147, 150) with the$seq_fullshell variable (lines 133, 161, 190-193). Most other PCP QA tests pick one form consistently. If$seq_fullends up unset (depending on which common helper sources it), thetee -a $seq_fullon line 161 will tee to a file literally named""and thecat >>$seq_fulllines will silently lose output. Pick one form to keep the test's debug log path predictable.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@qa/1744` around lines 27 - 161, The test mixes two log variables ($seq.full and $seq_full) causing silent loss of output; pick one name and replace all occurrences for consistency (I suggest using $seq_full since other tests use it) — update the echo/redirects that write to "$here/$seq.full" (and any uses of $seq.full) to use "$here/$seq_full" and change the tee and any concatenations that reference $seq_full if you instead choose $seq.full; ensure all references in the key-server shutdown block, pmlogger start/stop messages, archive listings, flush/verify logs, and the pmseries load tee (symbols: _cleanup, _prepare_pmda, _pmdaopenmetrics_install, pmloggerpid, flush_result, pmseries invocation) are updated so every write uses the same variable.
124-140: ⚡ Quick winRace: archive load may run before pmlogger has actually flushed.
pmsleep 3thenkill -TERMthenpmsleep 1is a fragile way to ensure the archive is complete.kill -TERMis asynchronous and pmlogger needs time to flush its final volume — on slow/loaded CI, this can yield an incomplete archive and the laterpmseries --loadwill then store fewer/no labels, causing intermittentERROR: No series found for test_metricNfailures rather than reproducible diffs. Consider$sudo $signal -s $pmloggerpid; wait $pmloggerpid(or usepmlogger -s 3for a fixed-sample run instead of background + SIGTERM).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@qa/1744` around lines 124 - 140, The test currently starts pmlogger in background (pmlogger, pmloggerpid) and uses pmsleep + kill -TERM + pmsleep which can cause a race where pmlogger hasn't flushed before pmseries --load; replace the async stop with a synchronous shutdown: after starting pmlogger (the pmlogger/pmloggerpid code) send the termination signal and then wait for pmlogger to exit (use wait on pmloggerpid) OR alternatively run pmlogger in fixed-sample mode (pmlogger -s 3) instead of background + SIGTERM so the archive is guaranteed complete before continuing; update the stop sequence that references kill -TERM, pmsleep, and pmloggerpid accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@qa/1744`:
- Around line 22-43: The cleanup function unconditionally removes
$PCP_ETC_DIR/pcp/labels/* which can delete real system files if _save_config
never ran; modify the test so that _save_config sets a guard flag (e.g.,
labels_saved=1) after it successfully completes (reference the _save_config call
and the _restore_config function) and change _cleanup to only perform the
destructive "$sudo rm -rf $PCP_ETC_DIR/pcp/labels/*" when that guard is set (or
alternatively move the rm into the same conditional block that runs the test
body), ensuring _restore_config is only relied on when a prior backup was made.
- Around line 170-187: The script sets status=1 when a series is missing (in the
checks around series1 and series2) but later unconditionally resets status=0,
masking failures; fix by preserving failure state instead of clearing it: remove
the unconditional status=0 reset (or introduce a separate boolean like
had_failure that you set to true on any "ERROR" and only set status=0 at the
very end if had_failure is false), ensure the checks in the pmseries blocks
(series1, series2) only set status=1/had_failure and never overwrite it with
success later so the exit value reflects any missing-series errors.
- Line 17: The test currently gates execution on presence of pmseries_import
(checked via the test invoking pmseries_import and calling _notrun), but the
test never calls pmseries_import; instead it uses pmseries to load archives, so
remove the pmseries_import existence check and either drop that _notrun gate
entirely or replace it with a check for pmseries (the binary actually used by
the test). Update the check that calls _notrun to reference pmseries (or remove
it) and keep the rest of the test logic unchanged.
- Around line 102-108: The FILTER line in the test config uses the label value
instead of the label name so pmdaopenmetrics (pmdaopenmetrics.python matching
logic) will never match; update the here-doc content that adds "FILTER: OPTIONAL
LABEL some_optional_label" to use the label name "optional_label" instead (i.e.,
replace some_optional_label with optional_label in the cat <<EOF block that
writes $tmp.tmp) and verify the optional-label feature is enabled (the test
comment "*** NOTE: optional labels are not implemented yet") before merging so
the test will exercise the behavior.
---
Outside diff comments:
In `@qa/openmetrics/samples/pmseries_label_test.txt`:
- Around line 1-9: The sample exposition uses lowercase "# Type" which the PMDA
parser's case-sensitive check (see the token handling that inspects lp[1] ==
'TYPE' in pmdaopenmetrics.python) will ignore; update the sample lines in
pmseries_label_test.txt to use uppercase "# TYPE" for each metric definition
(e.g., change "# Type test_metric1 gauge" and "# Type test_metric2 gauge" to "#
TYPE test_metric1 gauge" and "# TYPE test_metric2 gauge") so the parser
correctly recognizes and applies metric type metadata.
---
Nitpick comments:
In `@qa/1744`:
- Around line 27-161: The test mixes two log variables ($seq.full and $seq_full)
causing silent loss of output; pick one name and replace all occurrences for
consistency (I suggest using $seq_full since other tests use it) — update the
echo/redirects that write to "$here/$seq.full" (and any uses of $seq.full) to
use "$here/$seq_full" and change the tee and any concatenations that reference
$seq_full if you instead choose $seq.full; ensure all references in the
key-server shutdown block, pmlogger start/stop messages, archive listings,
flush/verify logs, and the pmseries load tee (symbols: _cleanup, _prepare_pmda,
_pmdaopenmetrics_install, pmloggerpid, flush_result, pmseries invocation) are
updated so every write uses the same variable.
- Around line 124-140: The test currently starts pmlogger in background
(pmlogger, pmloggerpid) and uses pmsleep + kill -TERM + pmsleep which can cause
a race where pmlogger hasn't flushed before pmseries --load; replace the async
stop with a synchronous shutdown: after starting pmlogger (the
pmlogger/pmloggerpid code) send the termination signal and then wait for
pmlogger to exit (use wait on pmloggerpid) OR alternatively run pmlogger in
fixed-sample mode (pmlogger -s 3) instead of background + SIGTERM so the archive
is guaranteed complete before continuing; update the stop sequence that
references kill -TERM, pmsleep, and pmloggerpid accordingly.
In `@src/libpcp_web/src/util.c`:
- Around line 460-474: pmwebapi_context_hash currently always calls
context_labels_str_identifying() causing repeated merges; change it to compute
and cache the identifying-label string only when it is first needed (same
condition used for caching context->labels) by adding a cached sds (e.g.
context->idlabels or similar) or an identity-cache flag on the context struct,
populate that cache inside the block where context->labels is set (or guarded by
context->labels == NULL), and then pass the cached identifying string and length
to pmwebapi_source_hash instead of recomputing each call; ensure the new cache
is allocated with sdsnewlen and cleaned up where context->labels is freed.
- Around line 487-513: The second pmsprintf assigns to the same local `len` but
that value is never used (idbuf is consumed as a C-string), so either remove the
redundant `len = pmsprintf(idbuf, ...)` or use a separate variable name to make
the intent clear; better yet, extract a small helper (e.g.
get_identifying_labels or similar) that encapsulates the logic around
`metric_labelsets(...)` / fallback `pmsprintf("null")` and returns the C-string
to use in sdscatfmt, then call that helper from
context_hash/metric_hash/instance_hash (and from pmwebapi_instance_hash) to
eliminate duplication and avoid clobbering `len`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: ff2ebaba-8485-46fa-a894-b5aff961bcb3
⛔ Files ignored due to path filters (1)
qa/1744.outis excluded by!**/*.out
📒 Files selected for processing (5)
qa/1744qa/common.checkqa/groupqa/openmetrics/samples/pmseries_label_test.txtsrc/libpcp_web/src/util.c
|
|
||
| _pmdaopenmetrics_check || _notrun "openmetrics pmda not installed" | ||
|
|
||
| test -x $PCP_BINADM_DIR/pmseries_import || _notrun "No pmseries_import script" |
There was a problem hiding this comment.
pmseries_import is checked here but never used.
The script loads archives directly via pmseries $options --load "{source.path: ...}" (line 160), and never invokes pmseries_import. This _notrun gate will spuriously skip the test on systems that ship pmseries but not pmseries_import (or are renamed in a future packaging change), without any actual dependency on that tool. Drop the check, or replace it with whatever the test actually depends on (e.g., pmseries itself).
🧹 Suggested fix
-test -x $PCP_BINADM_DIR/pmseries_import || _notrun "No pmseries_import script"
-_check_series
+_check_series🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@qa/1744` at line 17, The test currently gates execution on presence of
pmseries_import (checked via the test invoking pmseries_import and calling
_notrun), but the test never calls pmseries_import; instead it uses pmseries to
load archives, so remove the pmseries_import existence check and either drop
that _notrun gate entirely or replace it with a check for pmseries (the binary
actually used by the test). Update the check that calls _notrun to reference
pmseries (or remove it) and keep the rest of the test logic unchanged.
| _cleanup() | ||
| { | ||
| cd $here | ||
| [ -n "$pmloggerpid" ] && $sudo kill -TERM $pmloggerpid 2>/dev/null | ||
| if [ -n "$key_server_port" ]; then | ||
| echo "Shutting down key server on port $key_server_port..." >>$here/$seq.full | ||
| $keys_cli -p $key_server_port shutdown >>$here/$seq.full 2>&1 | ||
| sleep 1 | ||
| # Verify it's actually dead | ||
| if $keys_cli -p $key_server_port PING >>$here/$seq.full 2>&1; then | ||
| echo "WARNING: Key server still running after shutdown!" >>$here/$seq.full | ||
| else | ||
| echo "Key server shutdown confirmed" >>$here/$seq.full | ||
| fi | ||
| fi | ||
| $sudo rm -rf $PCP_ETC_DIR/pcp/labels/* | ||
| _restore_config $PCP_ETC_DIR/pcp/labels | ||
| _sighup_pmcd | ||
| _pmdaopenmetrics_cleanup | ||
| _restore_config $PCP_SYSCONF_DIR/pmseries | ||
| $sudo rm -rf $tmp $tmp.* | ||
| } |
There was a problem hiding this comment.
Cleanup deletes $PCP_ETC_DIR/pcp/labels/* even if _save_config failed.
Line 37 unconditionally rm -rfs everything under $PCP_ETC_DIR/pcp/labels/ before line 38 attempts _restore_config. If _save_config on line 91 was never reached (e.g., earlier _notrun / failure), the cleanup will still wipe a system directory it never backed up. Guard the destructive rm with a flag set after _save_config succeeds, or move the rm inside the same conditional that runs the test body.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@qa/1744` around lines 22 - 43, The cleanup function unconditionally removes
$PCP_ETC_DIR/pcp/labels/* which can delete real system files if _save_config
never ran; modify the test so that _save_config sets a guard flag (e.g.,
labels_saved=1) after it successfully completes (reference the _save_config call
and the _restore_config function) and change _cleanup to only perform the
destructive "$sudo rm -rf $PCP_ETC_DIR/pcp/labels/*" when that guard is set (or
alternatively move the rm into the same conditional block that runs the test
body), ensuring _restore_config is only relied on when a prior backup was made.
| # add a bunch of label filters to exercise various filtering options | ||
| cat <<EOF >>$tmp.tmp | ||
|
|
||
| FILTER: OPTIONAL LABEL some_optional_label | ||
|
|
||
| EOF | ||
| $sudo cp $tmp.tmp $PCP_PMDAS_DIR/openmetrics/config.d/$urlbase.url |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate openmetrics PMDA FILTER parsing to confirm whether the argument is a label name, value, or regex.
fd -t f 'pmda(openmetrics|.*\.python)' src/pmdas/openmetrics 2>/dev/null
rg -nP -C5 '\bFILTER\s*:?\s*OPTIONAL\b' --type=py --type=sh --type=md
rg -nP -C5 'OPTIONAL\s+LABEL'
rg -nP -C3 'PM_LABEL_OPTIONAL' --type=c --type=pyRepository: performancecopilot/pcp
Length of output: 20928
🏁 Script executed:
# Find and examine the label test file to confirm label name and value
find . -name "*pmseries_label_test*" -o -name "*label_test*" | head -20Repository: performancecopilot/pcp
Length of output: 115
🏁 Script executed:
# Search for the actual label name and value in openmetrics samples
rg -n "test_metric2|optional_label" --type=txt --type=py src/pmdas/openmetrics/ qa/Repository: performancecopilot/pcp
Length of output: 447
🏁 Script executed:
# Look for the filter parsing code in the openmetrics PMDA
fd -t f "openmetrics" src/pmdas/openmetrics/ | grep -E "\.(py|c)$"Repository: performancecopilot/pcp
Length of output: 48
🏁 Script executed:
# Find the openmetrics PMDA Python implementation
find . -path "*/pmdas/openmetrics*.python" -type fRepository: performancecopilot/pcp
Length of output: 113
🏁 Script executed:
# Search for the OPTIONAL LABEL filter parsing in Python code
rg -n "OPTIONAL|LABEL" src/pmdas/openmetrics/pmdaopenmetrics.python -A 5 -B 5Repository: performancecopilot/pcp
Length of output: 6815
🏁 Script executed:
# Check the entire config parsing section
rg -n "FILTER" src/pmdas/openmetrics/pmdaopenmetrics.python -A 3 -B 3 | head -100Repository: performancecopilot/pcp
Length of output: 1636
Fix filter to use label name instead of value: change some_optional_label to optional_label
The filter incorrectly uses the label value instead of the label name. The label is defined as optional_label="some_optional_label", where optional_label is the name and some_optional_label is the value. The PMDA filter code (pmdaopenmetrics.python line 632) matches against label names, not values, so the filter FILTER: OPTIONAL LABEL some_optional_label will never match. It should be FILTER: OPTIONAL LABEL optional_label.
Note: The test files contain a comment stating "*** NOTE: optional labels are not implemented yet" — confirm whether this feature is active before merging.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@qa/1744` around lines 102 - 108, The FILTER line in the test config uses the
label value instead of the label name so pmdaopenmetrics (pmdaopenmetrics.python
matching logic) will never match; update the here-doc content that adds "FILTER:
OPTIONAL LABEL some_optional_label" to use the label name "optional_label"
instead (i.e., replace some_optional_label with optional_label in the cat <<EOF
block that writes $tmp.tmp) and verify the optional-label feature is enabled
(the test comment "*** NOTE: optional labels are not implemented yet") before
merging so the test will exercise the behavior.
| series1=`pmseries $options 'openmetrics.pmseries_label_test.test_metric1'` | ||
| if [ -z "$series1" ]; then | ||
| echo "ERROR: No series found for test_metric1" | ||
| status=1 | ||
| else | ||
| pmseries $series1 | _filter_series | _filter | ||
| fi | ||
|
|
||
| # Query for the second metric (with optional label) | ||
| echo | ||
| echo "=== test_metric2 (with optional label) ===" | ||
| series2=`pmseries $options 'openmetrics.pmseries_label_test.test_metric2'` | ||
| if [ -z "$series2" ]; then | ||
| echo "ERROR: No series found for test_metric2" | ||
| status=1 | ||
| else | ||
| pmseries $series2 | _filter_series | _filter | ||
| fi |
There was a problem hiding this comment.
Test sets status=1 on missing series but continues to next check.
If series1 is empty on line 171, the test prints ERROR: and sets status=1 but falls through to query test_metric2. That's fine for debugging, but the trailing status=0 on line 198 unconditionally clears the failure flag, so a missed-series error on test_metric1/test_metric2 is silently masked at exit time.
🐛 Suggested fix
-# success, all done
-status=0
-exit
+# success, all done — but only if no earlier ERROR set status=1
+[ "$status" = 1 ] || status=0
+exitOr simpler: track failures with a separate flag and only clear it at the end if no failure occurred.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@qa/1744` around lines 170 - 187, The script sets status=1 when a series is
missing (in the checks around series1 and series2) but later unconditionally
resets status=0, masking failures; fix by preserving failure state instead of
clearing it: remove the unconditional status=0 reset (or introduce a separate
boolean like had_failure that you set to true on any "ERROR" and only set
status=0 at the very end if had_failure is false), ensure the checks in the
pmseries blocks (series1, series2) only set status=1/had_failure and never
overwrite it with success later so the exit value reflects any missing-series
errors.
|
@lmchilton please go through all the coderabbit reported issues and resolve/comment on each, then I'll take a look (so I don't double up on anything already reported). Thanks! |
Added support for optional labels to be ingested into a key-value server by pmproxy. SHA1 hash calculations are not affected.
libpcp_web code additions include creating a new buffer to separate hash calculation from display calculation and creating a new context hash function to separate context hash w/ identifying labels and context hash w/ identifying + optional labels.
Added qa test 1744 to test optional label ingestion and display.