added rag_ingestion action, script and dockerfile #267

Merged
rootflo-hardik merged 1 commit into develop from fix_added_rag_ingestion_action
Apr 4, 2026

Conversation

rootflo-hardik (Contributor) commented Apr 3, 2026

Summary by CodeRabbit

  • Chores
    • Added automated CI/CD pipeline for building and deploying RAG Ingestion service container images to AWS, GCP, and Azure registries
    • Images are tagged with commit hash and timestamp for unique version identification and deployment flexibility across cloud platforms

coderabbitai bot commented Apr 3, 2026

Walkthrough

This pull request introduces Docker containerization and CI/CD automation for a RAG ingestion service. It adds a GitHub Actions workflow that builds a Docker image and pushes it to GCP Artifact Registry, AWS ECR, and Azure ACR registries, along with the corresponding Dockerfile and startup script.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI/CD Workflow: .github/workflows/build-rag-ingestion-develop.yaml | New manually-triggered GitHub Actions workflow that builds a Docker image with commit hash and timestamp tags, then authenticates and pushes sequentially to GCP Artifact Registry, AWS ECR, and Azure ACR. |
| Docker Configuration: wavefront/server/docker/rag_ingestion.Dockerfile | New Dockerfile building a Python 3.11 slim image with uv package manager; installs RAG ingestion dependencies, pre-downloads tiktoken and NLTK runtime data during build, and configures container entrypoint. |
| Startup Script: wavefront/server/scripts/rag_ingestion/startup-rag-ingestion.sh | New Bash script that activates Python virtual environment and runs the RAG ingestion main module. |
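
The tag scheme in the workflow summary above (short commit hash plus a minute-resolution timestamp) can be sketched in shell. The helper name `image_tag` and its optional arguments are assumptions made for illustration and are not part of the PR; the `git` and `date` invocations are the ones the workflow itself runs.

```shell
# Sketch: compose an image tag of the form <short-hash>-<timestamp>.
# image_tag is a hypothetical helper. Each argument defaults to the
# command the workflow runs for that value.
image_tag() {
  hash="${1:-$(git rev-parse --short HEAD)}"
  stamp="${2:-$(date +'%Y-%m-%d-%H-%M')}"
  printf '%s-%s\n' "$hash" "$stamp"
}
```

Called as `image_tag abc1234 2026-04-03-10-15`, this prints `abc1234-2026-04-03-10-15`. Because the timestamp has minute resolution, two builds of the same commit within one minute would receive the same tag.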

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • added azure to github actions #263: Adds Azure ACR support to GitHub Actions workflows with matching ACR environment variables and authentication/tag/push steps for multi-registry deployments.

Suggested reviewers

  • vizsatiz
  • vishnurk6247

Poem

🐰 A container for ingestion so fine,
Built with uv, across clouds it will shine,
To three registries it hops with delight,
AWS, GCP, Azure—deployed just right!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and concisely describes the three main additions: a GitHub Actions workflow for RAG ingestion, a startup script, and a Dockerfile. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Comment on lines +22 to +107
    runs-on: ubuntu-latest

    steps:
      - name: "Checkout"
        uses: "actions/checkout@v3"

      - name: Get commit hash
        id: get-commit-hash
        run: echo "::set-output name=commit-hash::$(git rev-parse --short HEAD)"

      - name: Get timestamp
        id: get-timestamp
        run: echo "::set-output name=timestamp::$(date +'%Y-%m-%d-%H-%M')"

      - name: Cache Docker layers
        id: cache-docker-layers
        uses: actions/cache@v3
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-docker-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-docker-

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build Docker Image
        id: build-image
        run: |
          docker build -f wavefront/server/docker/rag_ingestion.Dockerfile -t rootflo:${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }} .
          echo "IMAGE_TAG=${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }}" >> $GITHUB_ENV

      - id: "Auth-to-GCP"
        uses: "google-github-actions/auth@v1"
        with:
          credentials_json: "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}"

      - name: "Set up Cloud SDK"
        uses: "google-github-actions/setup-gcloud@v1"

      - name: "Docker auth for GCP"
        run: |-
          gcloud auth configure-docker ${{ env.GCP_REGION }}-docker.pkg.dev --quiet

      - name: Tag and push image to GCP Artifact Registry
        run: |
          docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}
          docker push ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}

      # Configure AWS credentials and push to ECR
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Tag and push image to Amazon ECR
        run: |
          docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}
          docker push ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}

      # Configure Azure credentials and push to ACR
      - name: Login to Azure
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Docker auth for Azure ACR
        run: az acr login --name ${{ env.ACR_REGISTRY_NAME }}

      - name: Tag and push image to Azure Container Registry
        run: |
          docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }}
          docker push ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }}

      - name: Cleanup Docker images
        run: |
          docker rmi rootflo:${{ env.IMAGE_TAG }} || true
          docker rmi ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }} || true
          docker rmi ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }} || true
          docker rmi ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }} || true

Check warning

Code scanning / CodeQL

Workflow does not contain permissions (severity: Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 11 days ago

In general, the fix is to explicitly define a least‑privilege permissions block for the workflow or individual jobs instead of relying on default GITHUB_TOKEN permissions. For this workflow, no step needs write access to repository contents or other GitHub resources, so we can safely restrict to contents: read at the workflow level. This will apply to all jobs unless they override it.

The best minimal change is to add a root-level permissions section right after the name: (before on:). This keeps existing behavior for all steps (they still can read the repo via actions/checkout), while ensuring GITHUB_TOKEN cannot perform write operations. Concretely, in .github/workflows/build-rag-ingestion-develop.yaml, insert:

permissions:
  contents: read

after line 1. No additional methods, imports, or definitions are needed.

Suggested changeset 1
.github/workflows/build-rag-ingestion-develop.yaml

Autofix patch
Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/.github/workflows/build-rag-ingestion-develop.yaml b/.github/workflows/build-rag-ingestion-develop.yaml
--- a/.github/workflows/build-rag-ingestion-develop.yaml
+++ b/.github/workflows/build-rag-ingestion-develop.yaml
@@ -1,5 +1,8 @@
 name: (Develop) Build and Push RAG Ingestion to AWS, GCP and Azure
 
+permissions:
+  contents: read
+
 on:
   workflow_dispatch:
 
EOF

coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (5)
wavefront/server/docker/rag_ingestion.Dockerfile (3)

3-4: Pin the uv version for reproducible builds.

Using :latest tag can cause unexpected build failures if uv introduces breaking changes. Since your pyproject.toml specifies required-version = ">=0.7.3", pin to a specific compatible version.

♻️ Proposed fix
 # Copy UV from official image
-COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+COPY --from=ghcr.io/astral-sh/uv:0.7.3 /uv /uvx /bin/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/docker/rag_ingestion.Dockerfile` around lines 3 - 4, The
Dockerfile uses COPY --from=ghcr.io/astral-sh/uv:latest which is unstable;
update the image tag to a specific pinned uv version compatible with
pyproject.toml's required-version (e.g., ghcr.io/astral-sh/uv:0.7.3) so builds
are reproducible, i.e., replace ghcr.io/astral-sh/uv:latest in the COPY line
with the chosen pinned tag and ensure the pinned version satisfies
required-version >=0.7.3.
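
A pinned tag only helps if it still satisfies the `required-version` constraint quoted above. A minimal sketch of that comparison in shell follows; `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version ordering), as GNU coreutils does.

```shell
# Sketch: succeed when version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper; it relies on `sort -V`, which is an
# extension provided by GNU coreutils rather than a POSIX requirement.
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}
```

For example, `version_ge 0.7.3 0.7.3` and `version_ge 0.10.0 0.7.3` succeed, while `version_ge 0.7.2 0.7.3` fails. A plain string comparison would get the second case wrong, which is why version ordering is used here.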

21-24: Consider consolidating RUN commands to reduce image layers.

Multiple RUN commands create separate layers. Consolidating them can slightly reduce image size and improve build efficiency.

♻️ Proposed consolidation
 # Download the tiktoken encoding file and NLTK data
-RUN mkdir -p /root/.cache/tiktoken
-RUN uv run python3 -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4')"
-RUN uv run python3 -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"
+RUN mkdir -p /root/.cache/tiktoken \
+    && uv run python3 -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4')" \
+    && uv run python3 -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/docker/rag_ingestion.Dockerfile` around lines 21 - 24, The
three separate RUN statements that create /root/.cache/tiktoken and call Python
(RUN mkdir -p /root/.cache/tiktoken; RUN uv run python3 -c "import tiktoken...";
RUN uv run python3 -c "import nltk...") should be consolidated into a single RUN
layer: perform the mkdir -p then run both python -c calls joined with && (and
optionally use set -e or set -eux for fail-fast) so the Dockerfile has one RUN
instruction that creates the directory and executes the tiktoken and nltk
downloads in sequence; update the Dockerfile lines containing the three RUN
commands to a single RUN that preserves the same commands and ordering.

1-1: Consider using a newer base image; Debian Buster reached end-of-life.

python:3.11-slim-buster is based on Debian 10 (Buster), whose long-term support ended in June 2024, so it no longer receives security updates. Consider upgrading to slim-bookworm for continued security patches.

♻️ Proposed fix
-FROM python:3.11-slim-buster
+FROM python:3.11-slim-bookworm
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/docker/rag_ingestion.Dockerfile` at line 1, The Dockerfile
uses an EOL base image "python:3.11-slim-buster"; update the FROM line to a
maintained Debian base such as "python:3.11-slim-bookworm" to restore security
updates—locate the FROM instruction (the "python:3.11-slim-buster" token) in
rag_ingestion.Dockerfile and replace it with "python:3.11-slim-bookworm", then
rebuild and run smoke tests to verify compatibility.
.github/workflows/build-rag-ingestion-develop.yaml (1)

36-52: Docker layer cache is configured but not utilized.

The workflow sets up Buildx and a cache directory but uses plain docker build which doesn't leverage the cache. Either use docker buildx build with cache flags or remove the unused cache step.

♻️ Option 1: Use buildx with caching
       - name: Build Docker Image
         id: build-image
         run: |
-          docker build -f wavefront/server/docker/rag_ingestion.Dockerfile -t rootflo:${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }} .
+          docker buildx build \
+            --cache-from type=local,src=/tmp/.buildx-cache \
+            --cache-to type=local,dest=/tmp/.buildx-cache-new,mode=max \
+            --load \
+            -f wavefront/server/docker/rag_ingestion.Dockerfile \
+            -t rootflo:${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }} .
+          # Rotate cache to prevent unbounded growth
+          rm -rf /tmp/.buildx-cache
+          mv /tmp/.buildx-cache-new /tmp/.buildx-cache
           echo "IMAGE_TAG=${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }}" >> $GITHUB_ENV
♻️ Option 2: Remove unused cache step

If caching isn't needed, remove lines 36-43 to simplify the workflow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/build-rag-ingestion-develop.yaml around lines 36 - 52, The
workflow declares a cache step (id: cache-docker-layers) and sets up Buildx but
the Build Docker Image step (id: build-image, name: Build Docker Image) uses
plain docker build so the cache is never used; fix by either switching the Build
Docker Image step to use docker buildx build with appropriate cache flags (e.g.,
--cache-from and --cache-to pointing at /tmp/.buildx-cache or a registry) so the
cache created by Set up Docker Buildx is consumed, or remove the Cache Docker
layers step entirely if you don’t want caching — update the build step
referenced as Build Docker Image and the cache step id cache-docker-layers
accordingly.
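
The cache rotation at the end of Option 1 (replace the old cache directory with the freshly exported one so the cache does not grow without bound) can be tried outside CI. This is a sketch under assumptions: `rotate_cache` is a hypothetical helper, and skipping the rotation when no new cache exists is a choice made here, not something the review specifies.

```shell
# Sketch: swap a freshly exported build cache into place.
# rotate_cache is a hypothetical helper: $1 is the cache directory that the
# cache step restores and saves, $2 is the directory the build exported to.
rotate_cache() {
  old_dir="$1"
  new_dir="$2"
  # If the build exported nothing, keep the existing cache untouched.
  [ -d "$new_dir" ] || return 0
  rm -rf "$old_dir"
  mv "$new_dir" "$old_dir"
}
```

In the workflow this would correspond to `rotate_cache /tmp/.buildx-cache /tmp/.buildx-cache-new` after the `docker buildx build` command.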
wavefront/server/scripts/rag_ingestion/startup-rag-ingestion.sh (1)

1-6: Add error handling and use exec for proper signal handling in containers.

The script lacks error handling (set -e) and doesn't use exec for the final command. In containers, using exec ensures the Python process becomes PID 1 and receives signals (SIGTERM, etc.) directly, enabling graceful shutdown.

♻️ Proposed improvements
 #!/bin/bash
+set -e
 
 source /app/.venv/bin/activate
 
 # Run the main application for RAG Ingestion
-python rag_ingestion/main.py 
+exec python rag_ingestion/main.py
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/scripts/rag_ingestion/startup-rag-ingestion.sh` around lines
1 - 6, Add strict error handling and ensure the Python process becomes PID 1 by
updating the startup script: enable "set -e" (or "set -euo pipefail") after the
shebang to fail fast on errors, keep the virtualenv activation via "source
/app/.venv/bin/activate", and replace the final "python rag_ingestion/main.py"
invocation with an "exec python rag_ingestion/main.py" so signals are forwarded
to the Python process for graceful shutdown.
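
The effect of `exec` described above can be observed directly: the command that is exec'd keeps the PID of the shell that launched it. The sketch below uses a second `sh` in place of Python, and `pids_around_exec` is a hypothetical helper for illustration only.

```shell
# Sketch: show that `exec` replaces the current shell instead of forking a child.
# pids_around_exec is a hypothetical helper. The inner shell prints its own PID,
# then execs another shell that prints the PID it is running under.
pids_around_exec() {
  sh -c 'echo $$; exec sh -c "echo \$\$"'
}
```

Both printed lines show the same PID, because the exec'd program takes over the existing process. In the container, the same mechanism lets the Python process inherit the startup script's PID and receive SIGTERM itself rather than through an intermediate shell.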

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3821c593-8b4a-4a32-a441-accd9e742ccd

📥 Commits

Reviewing files that changed from the base of the PR and between 207a081 and 17a1bdb.

📒 Files selected for processing (3)
  • .github/workflows/build-rag-ingestion-develop.yaml
  • wavefront/server/docker/rag_ingestion.Dockerfile
  • wavefront/server/scripts/rag_ingestion/startup-rag-ingestion.sh

Comment on lines +20 to +23
jobs:
  build-push-artifact:
    runs-on: ubuntu-latest

⚠️ Potential issue | 🟠 Major

Add explicit permissions to follow the principle of least privilege.

The workflow lacks a permissions block, which means it uses default token permissions. CodeQL flagged this as a security concern. Explicitly declaring minimal permissions reduces the blast radius if the workflow is compromised.

🛡️ Proposed fix
 jobs:
   build-push-artifact:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
 
     steps:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/build-rag-ingestion-develop.yaml around lines 20 - 23, add
an explicit permissions block to the workflow to avoid default token privileges:
insert a top-level or job-level permissions mapping (for the job named
build-push-artifact) that grants only the minimal scopes required. For this
workflow, contents: read matches the proposed fix above, since the images are
pushed to external registries with their own credentials; add further scopes
(for example, id-token: write) only if a step actually needs them. Place the
permissions block adjacent to the runs-on/job definition for build-push-artifact.
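
A quick way to confirm that a workflow file declares a permissions block at all is a text search. The sketch below is an assumption-laden shortcut rather than a YAML-aware check: `has_permissions_block` is a hypothetical helper that matches any line starting with optional whitespace followed by `permissions:`, so it cannot tell a top-level block from a job-level one.

```shell
# Sketch: succeed when the given workflow file contains a permissions key.
# has_permissions_block is a hypothetical helper. It is a plain text match,
# not a YAML parser, so commented-out or nested keys are not distinguished.
has_permissions_block() {
  grep -Eq '^[[:space:]]*permissions:' "$1"
}
```

Running `has_permissions_block .github/workflows/build-rag-ingestion-develop.yaml` would fail for the workflow as submitted and succeed once either proposed fix is applied.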

Comment on lines +25 to +26
      - name: "Checkout"
        uses: "actions/checkout@v3"

⚠️ Potential issue | 🟠 Major

Update GitHub Actions to current versions.

Several actions are using outdated major versions that may have compatibility issues or security vulnerabilities. The static analyzer flagged these as potentially too old to run.

🔧 Proposed version updates
       - name: "Checkout"
-        uses: "actions/checkout@v3"
+        uses: "actions/checkout@v4"
       - name: Cache Docker layers
         id: cache-docker-layers
-        uses: actions/cache@v3
+        uses: actions/cache@v4
       - id: "Auth-to-GCP"
-        uses: "google-github-actions/auth@v1"
+        uses: "google-github-actions/auth@v2"
         with:
           credentials_json: "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}"

       - name: "Set up Cloud SDK"
-        uses: "google-github-actions/setup-gcloud@v1"
+        uses: "google-github-actions/setup-gcloud@v2"
       - name: Configure AWS credentials
-        uses: aws-actions/configure-aws-credentials@v1
+        uses: aws-actions/configure-aws-credentials@v4
         with:
           aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
           aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
           aws-region: ${{ env.AWS_REGION }}

       - name: Login to Amazon ECR
         id: login-ecr
-        uses: aws-actions/amazon-ecr-login@v1
+        uses: aws-actions/amazon-ecr-login@v2

Also applies to: 36-38, 54-60, 72-73, 79-81

🧰 Tools
🪛 actionlint (1.7.11)

[error] 26-26: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/build-rag-ingestion-develop.yaml around lines 25 - 26,
update the GitHub Actions steps to current major releases: bump "uses" from
actions/checkout@v3 to the latest stable major (e.g., actions/checkout@v4) and
likewise update actions/cache, google-github-actions/auth,
google-github-actions/setup-gcloud, aws-actions/configure-aws-credentials, and
aws-actions/amazon-ecr-login to the major tags shown in the proposed version
updates; after updating each action, run a quick workflow lint and adjust any
inputs that changed between major versions to match new schemas or required
fields.
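
Before bumping versions it helps to list every action reference the workflow actually uses. A minimal sketch follows; `list_action_refs` is a hypothetical helper that extracts the value of each `uses:` line with `sed`, and it assumes one `uses:` key per line with an optional pair of double quotes around the value.

```shell
# Sketch: print each distinct "owner/repo@ref" referenced by a workflow file.
# list_action_refs is a hypothetical helper. It handles `uses: x@v1`,
# `uses: "x@v1"`, and `- uses: x@v1`, and ignores every other line.
list_action_refs() {
  sed -n 's/^[[:space:]-]*uses:[[:space:]]*"\{0,1\}\([^"[:space:]]*\)"\{0,1\}[[:space:]]*$/\1/p' "$1" | sort -u
}
```

The sorted output makes it easy to compare the pinned majors against the versions proposed in the review.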

Comment on lines +28 to +34
      - name: Get commit hash
        id: get-commit-hash
        run: echo "::set-output name=commit-hash::$(git rev-parse --short HEAD)"

      - name: Get timestamp
        id: get-timestamp
        run: echo "::set-output name=timestamp::$(date +'%Y-%m-%d-%H-%M')"

⚠️ Potential issue | 🟠 Major

Replace deprecated set-output workflow commands.

The ::set-output command was deprecated in October 2022 and may stop working. Use $GITHUB_OUTPUT instead.

🔧 Proposed fix
       - name: Get commit hash
         id: get-commit-hash
-        run: echo "::set-output name=commit-hash::$(git rev-parse --short HEAD)"
+        run: echo "commit-hash=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT

       - name: Get timestamp
         id: get-timestamp
-        run: echo "::set-output name=timestamp::$(date +'%Y-%m-%d-%H-%M')"
+        run: echo "timestamp=$(date +'%Y-%m-%d-%H-%M')" >> $GITHUB_OUTPUT
🧰 Tools
🪛 actionlint (1.7.11)

[error] 30-30: workflow command "set-output" was deprecated. use echo "{name}={value}" >> $GITHUB_OUTPUT instead: https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions

(deprecated-commands)


[error] 34-34: workflow command "set-output" was deprecated. use echo "{name}={value}" >> $GITHUB_OUTPUT instead: https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions

(deprecated-commands)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/build-rag-ingestion-develop.yaml around lines 28 - 34, The
two workflow steps "Get commit hash" (id: get-commit-hash) and "Get timestamp"
(id: get-timestamp) use the deprecated ::set-output syntax; update each step to
write outputs to the $GITHUB_OUTPUT file instead (i.e., produce lines like
"commit-hash=..." and "timestamp=..." appended to $GITHUB_OUTPUT) so the steps
set outputs via the supported mechanism and keep the step names/ids unchanged.
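
The replacement mechanism is an append to the file named by `$GITHUB_OUTPUT`, one `name=value` line per output. A minimal sketch that can be tried locally follows; `set_output` is a hypothetical helper, and since `GITHUB_OUTPUT` is only set on Actions runners, it has to be pointed at a scratch file by hand when running elsewhere.

```shell
# Sketch: append a step output in the name=value form read from $GITHUB_OUTPUT.
# set_output is a hypothetical helper. The ${VAR:?} expansion makes the call
# fail with a message if GITHUB_OUTPUT is not set, instead of writing nowhere.
set_output() {
  printf '%s=%s\n' "$1" "$2" >> "${GITHUB_OUTPUT:?GITHUB_OUTPUT is not set}"
}
```

With `GITHUB_OUTPUT` pointing at a temporary file, `set_output commit-hash abc1234` appends the line `commit-hash=abc1234`, which is the form later steps read through `steps.<id>.outputs.commit-hash`.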

Comment on lines +26 to +32
WORKDIR /app/background_jobs/rag_ingestion

# Make startup script executable
RUN chmod +x startup-rag-ingestion.sh

# Set entrypoint to run startup script
CMD ["./startup-rag-ingestion.sh"]

⚠️ Potential issue | 🟠 Major

Run as non-root user for improved security.

The container runs as root, which is flagged by Trivy (DS-0002). Running containers as non-root limits the impact of a container compromise.

🛡️ Proposed fix to add a non-root user
 # Download the tiktoken encoding file and NLTK data
 RUN mkdir -p /root/.cache/tiktoken
 RUN uv run python3 -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4')"
 RUN uv run python3 -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"

+# Create non-root user
+RUN useradd --create-home --shell /bin/bash appuser \
+    && cp -r /root/.cache /home/appuser/.cache \
+    && chown -R appuser:appuser /app /home/appuser/.cache
+
 WORKDIR /app/background_jobs/rag_ingestion

 # Make startup script executable
 RUN chmod +x startup-rag-ingestion.sh

+USER appuser
+
 # Set entrypoint to run startup script
 CMD ["./startup-rag-ingestion.sh"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/docker/rag_ingestion.Dockerfile` around lines 26 - 32,
create and switch to a non-root user in the Dockerfile: add a step that creates
a group and a non-root user (e.g., "appuser"), chown the WORKDIR
(/app/background_jobs/rag_ingestion) and the startup-rag-ingestion.sh to that
user, keep the chmod +x step, and add a USER appuser instruction before the CMD
so the container runs as the non-root user; reference the existing WORKDIR,
startup-rag-ingestion.sh and CMD ["./startup-rag-ingestion.sh"] when making
these changes.
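
Whether the `USER` instruction took effect can be checked from inside the container. The sketch below is one possible guard, not part of the PR: `running_as_root` is a hypothetical helper that tests a given UID, or the current one from `id -u` when no argument is passed.

```shell
# Sketch: succeed when the given UID (default: the current user's) is root.
# running_as_root is a hypothetical helper that a startup script could call
# to warn or abort if the container was started as UID 0.
running_as_root() {
  [ "${1:-$(id -u)}" -eq 0 ]
}
```

A startup script could use it as `if running_as_root; then echo "warning: running as root" >&2; fi` before launching the application.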

@rootflo-hardik rootflo-hardik merged commit 732fc46 into develop Apr 4, 2026
9 checks passed