fix: Add dev mode for RLA service to enable certain options only for dev#360
fix: Add dev mode for RLA service to enable certain options only for dev#360jw-nvidia merged 1 commit intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a Changes
Sequence Diagram(s)sequenceDiagram
participant User as User / CLI
participant CLI as "rla serve"
participant Env as Env Resolver
participant Validator as Config Validator
participant Service as Service.New / Start
participant Logger as Zerolog
participant Reflection as gRPC Reflection
participant Scheduler as Scheduler
participant Tool as "grpcui / grpcurl"
User->>CLI: run `./rla serve [--dev-mode]`
CLI->>Env: GetDeploymentEnv()
Env-->>CLI: return env or error
CLI->>Validator: Build Config (DevMode) -> Validate()
Validator-->>CLI: success or error
CLI->>Service: New(config) -> Start()
Service->>Logger: set level (Debug if DevMode else Info)
Service->>Reflection: register (only if DevMode)
Service->>Scheduler: schedule jobs (only if job instances non-nil)
Tool->>Service: request (requires reflection for grpcui/grpcurl)
Reflection-->>Tool: provide descriptors (only when registered)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rla/internal/service/config.go`:
- Around line 50-53: Add a fail-fast validation that enforces the comment on
DevMode: implement a Config.Validate() method (or add validation in the service
initialization path that constructs service.Config) which checks the DevMode
bool and returns/throws an error when DevMode is true while the runtime
indicates a production environment; update the service constructor (e.g.,
NewService or InitService) to call Config.Validate() and abort startup on
validation failure so reflection/debug options cannot be enabled in production.
Ensure the validation uses existing environment/flags used by the project to
determine "production" and include a clear error message referencing
service.Config.DevMode in the returned error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a04cbcf4-0629-4703-81ff-87ee72a97965
📒 Files selected for processing (4)
rla/README.mdrla/cmd/serve.gorla/internal/service/config.gorla/internal/service/service.go
✅ Files skipped from review due to trivial changes (2)
- rla/internal/service/service.go
- rla/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- rla/cmd/serve.go
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
rla/internal/service/config_test.go (1)
29-33: Make TLS-present fixtures use real temp files instead of non-empty path strings.
tlsConfig()currently treats “present” as string-populated paths only. Creating actual files int.TempDir()will keep these tests valid if TLS availability checks are strengthened to verify file existence.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rla/internal/service/config_test.go` around lines 29 - 33, Replace the current tlsConfig that returns non-existent path strings with one that creates real temp files and returns their paths: use t.TempDir() inside the test to create a temp directory, create empty files named "ca.crt", "tls.crt", and "tls.key" there, and return pkgcerts.Config with CACert, TLSCert and TLSKey set to those real file paths (update the tlsConfig helper or inline this logic where used so tests create and clean up real files before asserting IsTLSAvailable).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rla/internal/certs/certs.go`:
- Around line 52-63: The IsTLSAvailable logic returns false positives because
c.IsSet() only checks for non-empty config values and the function only verifies
existence of defaultCACert; update IsTLSAvailable (and/or c.IsSet()) to require
that all three certificate files exist: filepath.Join(certDir, defaultCACert),
filepath.Join(certDir, defaultTLSCert), and filepath.Join(certDir,
defaultTLSKey) (or if c points to explicit paths, verify those files exist
instead of just non-empty strings). Modify the function to compute certDir
(using defaultCertDir when env empty), then stat each required file and return
true only if all three os.Stat calls succeed (err == nil), otherwise return
false.
---
Nitpick comments:
In `@rla/internal/service/config_test.go`:
- Around line 29-33: Replace the current tlsConfig that returns non-existent
path strings with one that creates real temp files and returns their paths: use
t.TempDir() inside the test to create a temp directory, create empty files named
"ca.crt", "tls.crt", and "tls.key" there, and return pkgcerts.Config with
CACert, TLSCert and TLSKey set to those real file paths (update the tlsConfig
helper or inline this logic where used so tests create and clean up real files
before asserting IsTLSAvailable).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 51c9b2c9-e9b7-47b6-b45e-e29e238a5160
📒 Files selected for processing (7)
rla/README.mdrla/cmd/root.gorla/cmd/serve.gorla/internal/certs/certs.gorla/internal/service/config.gorla/internal/service/config_test.gorla/internal/service/service.go
✅ Files skipped from review due to trivial changes (1)
- rla/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- rla/cmd/serve.go
- rla/internal/service/service.go
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rla/internal/service/config.go`:
- Around line 83-85: The DevMode field's comment is inconsistent with validation
that also forbids staging; update the DevMode docstring to reflect the actual
rule (e.g., "Development only — not allowed in staging or production" or
similar) and make the same wording change for any other DevMode comments in this
file so the documentation matches the runtime validation that rejects staging
and production.
- Around line 112-115: Add a fail-fast check that rejects a partially populated
explicit CertConfig before calling certs.IsTLSAvailable: examine c.CertConfig's
explicit-path fields (the explicit certificate path fields on the CertConfig
struct) and if any are set but not all required ones are set, return an error
stating the explicit-path config is incomplete; only if the explicit-path fields
are either all empty or all present should you proceed to call
certs.IsTLSAvailable(c.CertConfig) for fallback probing (this check should live
immediately before the existing envStaging/envProduction TLS guard that uses
certs.IsTLSAvailable).
- Around line 40-43: The code currently defaults missing RLA_ENV to
"development" which makes TLS checks in Validate() opt-out; change the behavior
so that when EnvVarName (RLA_ENV) is not set the service treats that as an error
(fail closed) instead of defaulting to "development" — update the code that
reads EnvVarName and any helper that maps empty -> "development" to return an
explicit error or require the caller to set the env, and adjust any callers of
Validate() to surface that error; keep a separate local/dev entrypoint (not in
rla/internal/service/config.go) to set a safe default of "development" for local
runs if needed. Ensure references: EnvVarName, the function that reads/parses
the environment var (the current mapping block around lines ~58-61), and
Validate() are updated accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 05ef899e-010e-48d4-8d00-8db916b62490
📒 Files selected for processing (7)
rla/README.mdrla/cmd/root.gorla/cmd/serve.gorla/internal/certs/certs.gorla/internal/service/config.gorla/internal/service/config_test.gorla/internal/service/service.go
✅ Files skipped from review due to trivial changes (2)
- rla/README.md
- rla/cmd/root.go
🚧 Files skipped from review as they are similar to previous changes (3)
- rla/internal/certs/certs.go
- rla/cmd/serve.go
- rla/internal/service/service.go
There was a problem hiding this comment.
🧹 Nitpick comments (2)
rla/internal/certs/certs_test.go (1)
155-222: Add a precedence regression for explicit paths overCERTDIR.These cases prove the explicit-path branch and the
CERTDIRbranch separately, but they never assert that a broken fully populatedpkgcerts.Configdoes not fall back to a validCERTDIR. That matters becauserla/internal/service/config.goLines 122-124 useIsTLSAvailableas the staging/production startup gate.💡 Suggested subtest
t.Run("explicit paths: tls.key missing", func(t *testing.T) { dir := stubCerts(t, defaultCACert, defaultCertFile) c := pkgcerts.Config{ CACert: filepath.Join(dir, defaultCACert), TLSCert: filepath.Join(dir, defaultCertFile), TLSKey: filepath.Join(dir, defaultKeyFile), } assert.False(t, IsTLSAvailable(c)) }) + + t.Run("explicit paths take precedence over CERTDIR", func(t *testing.T) { + badDir := stubCerts(t, defaultCACert, defaultCertFile) // tls.key absent + goodDir := stubCerts(t, defaultCACert, defaultCertFile, defaultKeyFile) + t.Setenv("CERTDIR", goodDir) + + c := pkgcerts.Config{ + CACert: filepath.Join(badDir, defaultCACert), + TLSCert: filepath.Join(badDir, defaultCertFile), + TLSKey: filepath.Join(badDir, defaultKeyFile), + } + assert.False(t, IsTLSAvailable(c)) + })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rla/internal/certs/certs_test.go` around lines 155 - 222, Add a regression test that verifies explicit paths in pkgcerts.Config take precedence over CERTDIR: in certs_test.go add a subtest (e.g., "explicit paths take precedence over CERTDIR") that stubs a valid cert directory and sets t.Setenv("CERTDIR", dir) but constructs a pkgcerts.Config whose explicit CACert/TLSCert/TLSKey point to missing/invalid files, then assert IsTLSAvailable(config) is false; reference IsTLSAvailable and pkgcerts.Config so the test ensures a broken explicit config does not fall back to a valid CERTDIR.rla/internal/service/config_test.go (1)
54-57: The “unset env” rows are only testing the empty-string path.Line 231 always calls
t.Setenv, so the cases named “unset env” never exercise a truly missingRLA_ENV. Either rename those rows to “empty env” or add one real unset-path case so the table matches the contract inGetDeploymentEnv().Also applies to: 226-231
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rla/internal/service/config_test.go` around lines 54 - 57, The table-driven tests name some cases "unset env" but the loop always calls t.Setenv, so those never exercise a truly missing RLA_ENV; update the test table/loop for GetDeploymentEnv(): either rename the rows labeled "unset env" to "empty env", or add an explicit unset path by introducing a field (e.g., unsetEnv bool) in the test struct and, in the test loop, call t.Setenv("RLA_ENV", value) only when unsetEnv is false (otherwise call os.Unsetenv("RLA_ENV") or t.Unsetenv("RLA_ENV") so the env is truly absent); make the same change for the other occurrences referenced around the test loop (the code that currently always calls t.Setenv).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@rla/internal/certs/certs_test.go`:
- Around line 155-222: Add a regression test that verifies explicit paths in
pkgcerts.Config take precedence over CERTDIR: in certs_test.go add a subtest
(e.g., "explicit paths take precedence over CERTDIR") that stubs a valid cert
directory and sets t.Setenv("CERTDIR", dir) but constructs a pkgcerts.Config
whose explicit CACert/TLSCert/TLSKey point to missing/invalid files, then assert
IsTLSAvailable(config) is false; reference IsTLSAvailable and pkgcerts.Config so
the test ensures a broken explicit config does not fall back to a valid CERTDIR.
In `@rla/internal/service/config_test.go`:
- Around line 54-57: The table-driven tests name some cases "unset env" but the
loop always calls t.Setenv, so those never exercise a truly missing RLA_ENV;
update the test table/loop for GetDeploymentEnv(): either rename the rows
labeled "unset env" to "empty env", or add an explicit unset path by introducing
a field (e.g., unsetEnv bool) in the test struct and, in the test loop, call
t.Setenv("RLA_ENV", value) only when unsetEnv is false (otherwise call
os.Unsetenv("RLA_ENV") or t.Unsetenv("RLA_ENV") so the env is truly absent);
make the same change for the other occurrences referenced around the test loop
(the code that currently always calls t.Setenv).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bcde312f-54d8-4113-9a03-240e03ed1a9a
📒 Files selected for processing (8)
rla/README.mdrla/cmd/root.gorla/cmd/serve.gorla/internal/certs/certs.gorla/internal/certs/certs_test.gorla/internal/service/config.gorla/internal/service/config_test.gorla/internal/service/service.go
✅ Files skipped from review due to trivial changes (3)
- rla/cmd/root.go
- rla/README.md
- rla/internal/certs/certs.go
🚧 Files skipped from review as they are similar to previous changes (2)
- rla/internal/service/service.go
- rla/cmd/serve.go
|
/ok to test 52c3862 |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-04-09 15:45:15 UTC | Commit: 52c3862 |
- if dev mode is enabled, enable gRPC reflection and set log level to be debug - add validations: - dev mode can be enabled only on development deployment env - TLS must be available on production and staging depolymnet env - skip registering jobs with scheduler if the required services are not available Signed-off-by: Jin Wang <jinwan@nvidia.com>
|
/ok to test 295f26f |
|
|
||
| if os.Getenv(svc.EnvVarName) == "" { | ||
| log.Warn().Msgf("%s not set, defaulting to %q for local development", svc.EnvVarName, "development") | ||
| os.Setenv(svc.EnvVarName, "development") //nolint:errcheck |
There was a problem hiding this comment.
Is this intended?
This bypass the design of GetDeploymentEnv() fail-closed design when no RLA_ENV set.
There was a problem hiding this comment.
Or at least set this env when devMode == true && RLA_ENV == ""?
| } | ||
|
|
||
| // Rule 3: staging and production require TLS. | ||
| if (env == envStaging || env == envProduction) && !certs.IsTLSAvailable(c.CertConfig) { |
There was a problem hiding this comment.
Validate() calls certs.IsTLSAvailable() to gate staging/production startup. But certOption() has its own ALLOW_INSECURE_GRPC=true escape hatch that bypasses TLS entirely and logs a Fatal otherwise. In development with ALLOW_INSECURE_GRPC=true, the Validate TLS check passes (development doesn't require TLS), so this works. But the two-layer TLS check could be confusing for future maintainers — Validate and certOption express the same policy in different places with different mechanisms.
| } | ||
| if err := sched.Schedule(invJob, invTrigger, schedtypes.Skip); err != nil { | ||
| return fmt.Errorf("failed to schedule inventory sync job: %w", err) | ||
| if invJob != nil { |
There was a problem hiding this comment.
The change wraps scheduler job registration in if invJob != nil / if leakJob != nil. If the jobs returning nil without error from their constructors means a misconfigured provider silently disables the job. A log.Warn() when a job is nil would help operators diagnose why inventory sync or leak detection isn't running.
Description
Type of Change
Services Affected
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes