added rag_ingestion action, script and dockerfile #267
📝 Walkthrough
This pull request introduces Docker containerization and CI/CD automation for a RAG ingestion service. It adds a GitHub Actions workflow that builds a Docker image and pushes it to GCP Artifact Registry, AWS ECR, and Azure ACR registries, along with the corresponding Dockerfile and startup script.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Warning: Review ran into problems
🔥 Problems: Timed out fetching pipeline failures after 30000ms
    runs-on: ubuntu-latest

    steps:
      - name: "Checkout"
        uses: "actions/checkout@v3"

      - name: Get commit hash
        id: get-commit-hash
        run: echo "::set-output name=commit-hash::$(git rev-parse --short HEAD)"

      - name: Get timestamp
        id: get-timestamp
        run: echo "::set-output name=timestamp::$(date +'%Y-%m-%d-%H-%M')"

      - name: Cache Docker layers
        id: cache-docker-layers
        uses: actions/cache@v3
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-docker-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-docker-

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build Docker Image
        id: build-image
        run: |
          docker build -f wavefront/server/docker/rag_ingestion.Dockerfile -t rootflo:${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }} .
          echo "IMAGE_TAG=${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }}" >> $GITHUB_ENV

      - id: "Auth-to-GCP"
        uses: "google-github-actions/auth@v1"
        with:
          credentials_json: "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}"

      - name: "Set up Cloud SDK"
        uses: "google-github-actions/setup-gcloud@v1"

      - name: "Docker auth for GCP"
        run: |-
          gcloud auth configure-docker ${{ env.GCP_REGION }}-docker.pkg.dev --quiet

      - name: Tag and push image to GCP Artifact Registry
        run: |
          docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}
          docker push ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}

      # Configure AWS credentials and push to ECR
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Tag and push image to Amazon ECR
        run: |
          docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}
          docker push ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}

      # Configure Azure credentials and push to ACR
      - name: Login to Azure
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Docker auth for Azure ACR
        run: az acr login --name ${{ env.ACR_REGISTRY_NAME }}

      - name: Tag and push image to Azure Container Registry
        run: |
          docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }}
          docker push ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }}

      - name: Cleanup Docker images
        run: |
          docker rmi rootflo:${{ env.IMAGE_TAG }} || true
          docker rmi ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }} || true
          docker rmi ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }} || true
          docker rmi ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }} || true
Check warning
Code scanning / CodeQL
Workflow does not contain permissions Medium
Copilot Autofix
AI 11 days ago
In general, the fix is to explicitly define a least‑privilege permissions block for the workflow or individual jobs instead of relying on default GITHUB_TOKEN permissions. For this workflow, no step needs write access to repository contents or other GitHub resources, so we can safely restrict to contents: read at the workflow level. This will apply to all jobs unless they override it.
The best minimal change is to add a root-level permissions section right after the name: (before on:). This keeps existing behavior for all steps (they still can read the repo via actions/checkout), while ensuring GITHUB_TOKEN cannot perform write operations. Concretely, in .github/workflows/build-rag-ingestion-develop.yaml, insert:
permissions:
  contents: read
after line 1. No additional methods, imports, or definitions are needed.
@@ -1,5 +1,8 @@
 name: (Develop) Build and Push RAG Ingestion to AWS, GCP and Azure

+permissions:
+  contents: read
+
 on:
   workflow_dispatch:
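A quick local check for this finding could look like the sketch below. It is only an illustration: `has_top_level_permissions` is a hypothetical helper, not something in this PR, and it detects only a root-level `permissions:` key, so a job-level block would not be found by it.

```shell
# Hypothetical helper (not part of the PR): succeeds when the file has a
# top-level "permissions:" key, i.e. one that starts in column 0.
has_top_level_permissions() {
  grep -q '^permissions:' "$1"
}

# Sample workflow header mirroring the autofix diff above.
workflow="$(mktemp)"
cat > "$workflow" <<'EOF'
name: (Develop) Build and Push RAG Ingestion to AWS, GCP and Azure

permissions:
  contents: read

on:
  workflow_dispatch:
EOF

if has_top_level_permissions "$workflow"; then
  permissions_check="explicit permissions found"
else
  permissions_check="no top-level permissions block"
fi
echo "$permissions_check"
rm -f "$workflow"
```

A plain `grep` is enough here because the key has to start at the beginning of a line to be root-level; a YAML parser would be the more robust choice for anything beyond a smoke check.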
Actionable comments posted: 4
🧹 Nitpick comments (5)
wavefront/server/docker/rag_ingestion.Dockerfile (3)
3-4: Pin the uv version for reproducible builds.
Using :latest tag can cause unexpected build failures if uv introduces breaking changes. Since your pyproject.toml specifies required-version = ">=0.7.3", pin to a specific compatible version.
♻️ Proposed fix
 # Copy UV from official image
-COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+COPY --from=ghcr.io/astral-sh/uv:0.7.3 /uv /uvx /bin/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/docker/rag_ingestion.Dockerfile` around lines 3 - 4, The Dockerfile uses COPY --from=ghcr.io/astral-sh/uv:latest which is unstable; update the image tag to a specific pinned uv version compatible with pyproject.toml's required-version (e.g., ghcr.io/astral-sh/uv:0.7.3) so builds are reproducible, i.e., replace ghcr.io/astral-sh/uv:latest in the COPY line with the chosen pinned tag and ensure the pinned version satisfies required-version >=0.7.3.
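To spot unpinned images like this one before a build, something along these lines may help. It is a rough sketch: `find_latest_tags` is a hypothetical helper, and the sample Dockerfile contains only the two image references quoted in this review.

```shell
# Hypothetical helper (not part of the PR): print FROM and COPY --from
# lines whose image reference ends in ":latest", with line numbers.
find_latest_tags() {
  grep -nE '^(FROM[[:space:]]+|COPY[[:space:]]+--from=)[^[:space:]]+:latest([[:space:]]|$)' "$1"
}

# Two-line sample built from the image references quoted in this review.
dockerfile="$(mktemp)"
cat > "$dockerfile" <<'EOF'
FROM python:3.11-slim-buster
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
EOF

# Count the unpinned references; tr strips the padding some wc builds add.
unpinned_count="$(find_latest_tags "$dockerfile" | wc -l | tr -d ' ')"
find_latest_tags "$dockerfile"
echo "unpinned references: $unpinned_count"
```

The pattern does not catch references with no tag at all, which Docker also resolves to `latest`; that case would need a separate check.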
21-24: Consider consolidating RUN commands to reduce image layers.
Multiple RUN commands create separate layers. Consolidating them can slightly reduce image size and improve build efficiency.
♻️ Proposed consolidation
 # Download the tiktoken encoding file and NLTK data
-RUN mkdir -p /root/.cache/tiktoken
-RUN uv run python3 -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4')"
-RUN uv run python3 -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"
+RUN mkdir -p /root/.cache/tiktoken \
+    && uv run python3 -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4')" \
+    && uv run python3 -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/docker/rag_ingestion.Dockerfile` around lines 21 - 24, The three separate RUN statements that create /root/.cache/tiktoken and call Python (RUN mkdir -p /root/.cache/tiktoken; RUN uv run python3 -c "import tiktoken..."; RUN uv run python3 -c "import nltk...") should be consolidated into a single RUN layer: perform the mkdir -p then run both python -c calls joined with && (and optionally use set -e or set -eux for fail-fast) so the Dockerfile has one RUN instruction that creates the directory and executes the tiktoken and nltk downloads in sequence; update the Dockerfile lines containing the three RUN commands to a single RUN that preserves the same commands and ordering.
1-1: Consider using a newer base image; Debian Buster reached end-of-life.
python:3.11-slim-buster is based on Debian 10 (Buster), which reached end-of-life in June 2024 and no longer receives security updates. Consider upgrading to slim-bookworm for continued security patches.
♻️ Proposed fix
-FROM python:3.11-slim-buster
+FROM python:3.11-slim-bookworm
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/docker/rag_ingestion.Dockerfile` at line 1, The Dockerfile uses an EOL base image "python:3.11-slim-buster"; update the FROM line to a maintained Debian base such as "python:3.11-slim-bookworm" to restore security updates; locate the FROM instruction (the "python:3.11-slim-buster" token) in rag_ingestion.Dockerfile and replace it with "python:3.11-slim-bookworm", then rebuild and run smoke tests to verify compatibility.
.github/workflows/build-rag-ingestion-develop.yaml (1)
36-52: Docker layer cache is configured but not utilized.
The workflow sets up Buildx and a cache directory but uses plain docker build which doesn't leverage the cache. Either use docker buildx build with cache flags or remove the unused cache step.
♻️ Option 1: Use buildx with caching
 - name: Build Docker Image
   id: build-image
   run: |
-    docker build -f wavefront/server/docker/rag_ingestion.Dockerfile -t rootflo:${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }} .
+    docker buildx build \
+      --cache-from type=local,src=/tmp/.buildx-cache \
+      --cache-to type=local,dest=/tmp/.buildx-cache-new,mode=max \
+      --load \
+      -f wavefront/server/docker/rag_ingestion.Dockerfile \
+      -t rootflo:${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }} .
+    # Rotate cache to prevent unbounded growth
+    rm -rf /tmp/.buildx-cache
+    mv /tmp/.buildx-cache-new /tmp/.buildx-cache
     echo "IMAGE_TAG=${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }}" >> $GITHUB_ENV
♻️ Option 2: Remove unused cache step
If caching isn't needed, remove lines 36-43 to simplify the workflow.
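For Option 1, the cache rotation at the end of the build step can be tried in isolation. The sketch below is a simulation only: temporary directories stand in for /tmp/.buildx-cache and /tmp/.buildx-cache-new, and plain files stand in for the layers that `docker buildx build` would write, so no Docker installation is assumed.

```shell
# Temp directories stand in for the two cache paths used in Option 1.
work="$(mktemp -d)"
cache="$work/.buildx-cache"
cache_new="$work/.buildx-cache-new"

# Old cache as restored by actions/cache, holding a layer no longer needed.
mkdir -p "$cache"
echo "stale" > "$cache/old-layer"

# A build with --cache-to pointing at the new directory would write
# only the layers used by the current build.
mkdir -p "$cache_new"
echo "fresh" > "$cache_new/current-layer"

# Rotate: drop the old directory and promote the new one.
rm -rf "$cache"
mv "$cache_new" "$cache"

ls "$cache"
```

Writing to a second directory and swapping it in afterwards is what keeps the saved cache from accumulating layers across runs.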
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/build-rag-ingestion-develop.yaml around lines 36 - 52, The workflow declares a cache step (id: cache-docker-layers) and sets up Buildx but the Build Docker Image step (id: build-image, name: Build Docker Image) uses plain docker build so the cache is never used; fix by either switching the Build Docker Image step to use docker buildx build with appropriate cache flags (e.g., --cache-from and --cache-to pointing at /tmp/.buildx-cache or a registry) so the cache created by Set up Docker Buildx is consumed, or remove the Cache Docker layers step entirely if you don’t want caching; update the build step referenced as Build Docker Image and the cache step id cache-docker-layers accordingly.
wavefront/server/scripts/rag_ingestion/startup-rag-ingestion.sh (1)
1-6: Add error handling and use exec for proper signal handling in containers.
The script lacks error handling (set -e) and doesn't use exec for the final command. In containers, using exec ensures the Python process becomes PID 1 and receives signals (SIGTERM, etc.) directly, enabling graceful shutdown.
♻️ Proposed improvements
 #!/bin/bash
+set -e
 source /app/.venv/bin/activate
 # Run the main application for RAG Ingestion
-python rag_ingestion/main.py
+exec python rag_ingestion/main.py
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/scripts/rag_ingestion/startup-rag-ingestion.sh` around lines 1 - 6, Add strict error handling and ensure the Python process becomes PID 1 by updating the startup script: enable "set -e" (or "set -euo pipefail") after the shebang to fail fast on errors, keep the virtualenv activation via "source /app/.venv/bin/activate", and replace the final "python rag_ingestion/main.py" invocation with an "exec python rag_ingestion/main.py" so signals are forwarded to the Python process for graceful shutdown.
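The effect of exec and set -e can be observed without a container. In this minimal sketch, nested `sh -c` invocations stand in for the startup script and the Python process; `same_pid` is a hypothetical helper used only to compare the two reported PIDs.

```shell
# With exec, the second shell replaces the first, so both report the same PID.
with_exec="$(sh -c 'echo $$; exec sh -c "echo \$\$"')"

# Without exec, the second shell is a child process with its own PID.
# The trailing "true" keeps the inner shell from being the last command,
# which some shells would otherwise exec implicitly.
without_exec="$(sh -c 'echo $$; sh -c "echo \$\$"; true')"

# Hypothetical helper: $1 holds two lines of PIDs; succeed when they match.
same_pid() {
  [ "$(printf '%s\n' "$1" | sed -n 1p)" = "$(printf '%s\n' "$1" | sed -n 2p)" ]
}

if same_pid "$with_exec"; then echo "exec: PID preserved"; fi
if ! same_pid "$without_exec"; then echo "no exec: child has a new PID"; fi

# set -e stops the script at the first failing command, so the echo
# never runs and the captured output stays empty.
set_e_output="$(sh -c 'set -e; false; echo "still running"' || true)"
```

In the container, the PID that is preserved is PID 1, which is why the exec form lets SIGTERM from `docker stop` reach the Python process directly.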
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3821c593-8b4a-4a32-a441-accd9e742ccd
📒 Files selected for processing (3)
.github/workflows/build-rag-ingestion-develop.yaml
wavefront/server/docker/rag_ingestion.Dockerfile
wavefront/server/scripts/rag_ingestion/startup-rag-ingestion.sh
jobs:
  build-push-artifact:
    runs-on: ubuntu-latest
Add explicit permissions to follow the principle of least privilege.
The workflow lacks a permissions block, which means it uses default token permissions. CodeQL flagged this as a security concern. Explicitly declaring minimal permissions reduces the blast radius if the workflow is compromised.
🛡️ Proposed fix
 jobs:
   build-push-artifact:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
     steps:
📝 Committable suggestion
jobs:
  build-push-artifact:
    runs-on: ubuntu-latest
    permissions:
      contents: read
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/build-rag-ingestion-develop.yaml around lines 20 - 23, Add
an explicit permissions block to the workflow to avoid default token privileges:
insert a top-level or job-level permissions mapping (for the job named
build-push-artifact) that grants only the minimal scopes required (for example,
contents: read, packages: write, id-token: write, actions: read) instead of
leaving permissions undefined; update any steps that rely on GITHUB_TOKEN to
work with these restricted scopes and place the permissions block adjacent to
the runs-on/job definition for build-push-artifact.
      - name: "Checkout"
        uses: "actions/checkout@v3"
Update GitHub Actions to current versions.
Several actions are using outdated major versions that may have compatibility issues or security vulnerabilities. The static analyzer flagged these as potentially too old to run.
🔧 Proposed version updates
 - name: "Checkout"
-  uses: "actions/checkout@v3"
+  uses: "actions/checkout@v4"

 - name: Cache Docker layers
   id: cache-docker-layers
-  uses: actions/cache@v3
+  uses: actions/cache@v4

 - id: "Auth-to-GCP"
-  uses: "google-github-actions/auth@v1"
+  uses: "google-github-actions/auth@v2"
   with:
     credentials_json: "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}"
 - name: "Set up Cloud SDK"
-  uses: "google-github-actions/setup-gcloud@v1"
+  uses: "google-github-actions/setup-gcloud@v2"

 - name: Configure AWS credentials
-  uses: aws-actions/configure-aws-credentials@v1
+  uses: aws-actions/configure-aws-credentials@v4
   with:
     aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
     aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
     aws-region: ${{ env.AWS_REGION }}
 - name: Login to Amazon ECR
   id: login-ecr
-  uses: aws-actions/amazon-ecr-login@v1
+  uses: aws-actions/amazon-ecr-login@v2
Also applies to: 36-38, 54-60, 72-73, 79-81
🧰 Tools
🪛 actionlint (1.7.11)
[error] 26-26: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/build-rag-ingestion-develop.yaml around lines 25 - 26,
Update the GitHub Actions steps to current major releases: bump "uses" from
actions/checkout@v3 to the latest stable major (e.g., actions/checkout@v4) and
likewise update actions/setup-node, actions/upload-artifact, actions/cache, and
docker/build-push-action to their current major tags; after updating each action
(identify them by their action IDs like actions/checkout, actions/setup-node,
actions/upload-artifact, actions/cache, docker/build-push-action) run a quick
workflow lint and adjust any inputs that changed between major versions to match
new schemas or required fields.
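Before bumping versions, it can help to list every action reference a workflow uses. The following is a small sketch under stated assumptions: `list_action_refs` is a hypothetical helper, and the sample file holds only three of the steps from this workflow.

```shell
# Hypothetical helper (not part of the PR): print each "uses:" value,
# with the list dash, indentation, and optional quotes removed.
list_action_refs() {
  sed -nE 's/^[[:space:]-]*uses:[[:space:]]*"?([^"[:space:]]+)"?[[:space:]]*$/\1/p' "$1"
}

# Three steps taken from the workflow under review.
workflow="$(mktemp)"
cat > "$workflow" <<'EOF'
      - name: "Checkout"
        uses: "actions/checkout@v3"
      - name: Cache Docker layers
        uses: actions/cache@v3
      - uses: azure/login@v2
EOF

action_refs="$(list_action_refs "$workflow")"
printf '%s\n' "$action_refs"
```

The output is one `owner/action@version` reference per line, which makes it straightforward to compare against the versions proposed in the review.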
      - name: Get commit hash
        id: get-commit-hash
        run: echo "::set-output name=commit-hash::$(git rev-parse --short HEAD)"

      - name: Get timestamp
        id: get-timestamp
        run: echo "::set-output name=timestamp::$(date +'%Y-%m-%d-%H-%M')"
Replace deprecated set-output workflow commands.
The ::set-output command was deprecated in October 2022 and may stop working. Use $GITHUB_OUTPUT instead.
🔧 Proposed fix
 - name: Get commit hash
   id: get-commit-hash
-  run: echo "::set-output name=commit-hash::$(git rev-parse --short HEAD)"
+  run: echo "commit-hash=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT

 - name: Get timestamp
   id: get-timestamp
-  run: echo "::set-output name=timestamp::$(date +'%Y-%m-%d-%H-%M')"
+  run: echo "timestamp=$(date +'%Y-%m-%d-%H-%M')" >> $GITHUB_OUTPUT
🧰 Tools
🪛 actionlint (1.7.11)
[error] 30-30: workflow command "set-output" was deprecated. use echo "{name}={value}" >> $GITHUB_OUTPUT instead: https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions
(deprecated-commands)
[error] 34-34: workflow command "set-output" was deprecated. use echo "{name}={value}" >> $GITHUB_OUTPUT instead: https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions
(deprecated-commands)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/build-rag-ingestion-develop.yaml around lines 28 - 34, The
two workflow steps "Get commit hash" (id: get-commit-hash) and "Get timestamp"
(id: get-timestamp) use the deprecated ::set-output syntax; update each step to
write outputs to the $GITHUB_OUTPUT file instead (i.e., produce lines like
"commit-hash=..." and "timestamp=..." appended to $GITHUB_OUTPUT) so the steps
set outputs via the supported mechanism and keep the step names/ids unchanged.
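The replacement mechanism can be exercised locally as a rough sketch. Two things here are assumptions for the sake of a runnable example: a temporary file stands in for `$GITHUB_OUTPUT` when the variable is unset, and the commit hash is a fixed sample value where the workflow would call `git rev-parse --short HEAD`.

```shell
# Outside GitHub Actions, GITHUB_OUTPUT is unset; a temp file stands in for it.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Fixed sample value (assumption) so the example runs outside a git checkout.
commit_hash="a1b2c3d"

# Each output is one name=value line appended to the file.
echo "commit-hash=${commit_hash}" >> "$GITHUB_OUTPUT"
echo "timestamp=$(date +'%Y-%m-%d-%H-%M')" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

Later steps read these values the same way as before, through `steps.get-commit-hash.outputs.commit-hash` and `steps.get-timestamp.outputs.timestamp`, so only the two `run:` lines need to change.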
WORKDIR /app/background_jobs/rag_ingestion

# Make startup script executable
RUN chmod +x startup-rag-ingestion.sh

# Set entrypoint to run startup script
CMD ["./startup-rag-ingestion.sh"]
No newline at end of file
Run as non-root user for improved security.
The container runs as root, which is flagged by Trivy (DS-0002). Running containers as non-root limits the impact of a container compromise.
🛡️ Proposed fix to add a non-root user
 # Download the tiktoken encoding file and NLTK data
 RUN mkdir -p /root/.cache/tiktoken
 RUN uv run python3 -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4')"
 RUN uv run python3 -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"
+# Create non-root user
+RUN useradd --create-home --shell /bin/bash appuser \
+    && cp -r /root/.cache /home/appuser/.cache \
+    && chown -R appuser:appuser /app /home/appuser/.cache
+
 WORKDIR /app/background_jobs/rag_ingestion
 # Make startup script executable
 RUN chmod +x startup-rag-ingestion.sh
+USER appuser
+
 # Set entrypoint to run startup script
 CMD ["./startup-rag-ingestion.sh"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@wavefront/server/docker/rag_ingestion.Dockerfile` around lines 26 - 32,
Create and switch to a non-root user in the Dockerfile: add a step that creates
a system/group and a non-root user (e.g., "appuser"), chown the WORKDIR
(/app/background_jobs/rag_ingestion) and the startup-rag-ingestion.sh to that
user, keep the chmod +x step, and add a USER appuser instruction before the CMD
so the container runs as the non-root user; reference the existing WORKDIR,
startup-rag-ingestion.sh and CMD ["./startup-rag-ingestion.sh"] when making
these changes.
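One way to confirm the change took effect is a guard that checks the effective UID, for example near the top of the startup script. This is a sketch only: `is_non_root` is a hypothetical helper, and whether to warn or abort when running as root is left open.

```shell
# Hypothetical guard (not part of the PR): succeed only when the given
# UID is not root (0).
is_non_root() {
  [ "$1" -ne 0 ]
}

current_uid="$(id -u)"
if is_non_root "$current_uid"; then
  echo "running as UID ${current_uid}"
else
  echo "running as root; the image would need a USER instruction"
fi
```

Passing the UID as an argument keeps the helper easy to test; inside the container built from the proposed Dockerfile, `id -u` would be expected to return the UID of `appuser`.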
Summary by CodeRabbit