Merged
107 changes: 107 additions & 0 deletions .github/workflows/build-rag-ingestion-develop.yaml
@@ -0,0 +1,107 @@
name: (Develop) Build and Push RAG Ingestion to AWS, GCP and Azure

on:
  workflow_dispatch:

env:
  PROJECT_ID: aesy-330511
  GCP_REGION: asia-south1
  GAR_LOCATION: asia-south1-docker.pkg.dev/aesy-330511/root-hub
  IMAGE_NAME: auraflo-rag-ingestion

  AWS_REGION: ap-south-1
  ECR_REGISTRY: 025066241490.dkr.ecr.ap-south-1.amazonaws.com
  ECR_REPOSITORY: rootflo/auraflo-rag-ingestion

  ACR_REGISTRY_NAME: rootflo
  ACR_REGISTRY: rootflo.azurecr.io
  ACR_REPOSITORY: auraflo-rag-ingestion

jobs:
  build-push-artifact:
    runs-on: ubuntu-latest

Comment on lines +20 to +23

⚠️ Potential issue | 🟠 Major

Add explicit permissions to follow the principle of least privilege.

The workflow lacks a permissions block, which means it uses default token permissions. CodeQL flagged this as a security concern. Explicitly declaring minimal permissions reduces the blast radius if the workflow is compromised.

🛡️ Proposed fix
 jobs:
   build-push-artifact:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
 
     steps:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/build-rag-ingestion-develop.yaml around lines 20 - 23, Add
an explicit permissions block to the workflow to avoid default token privileges:
insert a top-level or job-level permissions mapping (for the job named
build-push-artifact) that grants only the minimal scopes required (for example,
contents: read, packages: write, id-token: write, actions: read) instead of
leaving permissions undefined; update any steps that rely on GITHUB_TOKEN to
work with these restricted scopes and place the permissions block adjacent to
the runs-on/job definition for build-push-artifact.

    steps:
      - name: "Checkout"
        uses: "actions/checkout@v3"
Comment on lines +25 to +26

⚠️ Potential issue | 🟠 Major

Update GitHub Actions to current versions.

Several actions are using outdated major versions that may have compatibility issues or security vulnerabilities. The static analyzer flagged these as potentially too old to run.

🔧 Proposed version updates
       - name: "Checkout"
-        uses: "actions/checkout@v3"
+        uses: "actions/checkout@v4"
       - name: Cache Docker layers
         id: cache-docker-layers
-        uses: actions/cache@v3
+        uses: actions/cache@v4
       - id: "Auth-to-GCP"
-        uses: "google-github-actions/auth@v1"
+        uses: "google-github-actions/auth@v2"
         with:
           credentials_json: "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}"

       - name: "Set up Cloud SDK"
-        uses: "google-github-actions/setup-gcloud@v1"
+        uses: "google-github-actions/setup-gcloud@v2"
       - name: Configure AWS credentials
-        uses: aws-actions/configure-aws-credentials@v1
+        uses: aws-actions/configure-aws-credentials@v4
         with:
           aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
           aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
           aws-region: ${{ env.AWS_REGION }}

       - name: Login to Amazon ECR
         id: login-ecr
-        uses: aws-actions/amazon-ecr-login@v1
+        uses: aws-actions/amazon-ecr-login@v2

Also applies to: 36-38, 54-60, 72-73, 79-81

🧰 Tools
🪛 actionlint (1.7.11)

[error] 26-26: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/build-rag-ingestion-develop.yaml around lines 25 - 26,
update the GitHub Actions steps to current major releases: bump "uses" from
actions/checkout@v3 to actions/checkout@v4 and likewise update the other actions
this workflow uses (actions/cache, google-github-actions/auth,
google-github-actions/setup-gcloud, aws-actions/configure-aws-credentials, and
aws-actions/amazon-ecr-login) to the major tags listed in the proposed version
updates; after updating each action, run a quick workflow lint and adjust any
inputs that changed between major versions to match new schemas or required
fields.
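
Before bumping versions, it can help to see every action reference the workflows currently pin. The following is a minimal sketch using only grep, sed, tr and sort; the helper name `list_action_refs` and the demo file are assumptions made for illustration, and the check is a plain text scan rather than a YAML parse.

```shell
#!/bin/bash
set -eu

# list_action_refs DIR (hypothetical helper): print each distinct `uses:`
# reference found in files under DIR, one "owner/action@ref" per line.
list_action_refs() {
  grep -rhE '^[[:space:]-]*uses:' "$1" \
    | sed -E 's/^[[:space:]-]*uses:[[:space:]]*//' \
    | tr -d "\"'" \
    | sort -u
}

# Demo input: a throwaway file with a few steps copied from the workflow above.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/build.yaml" <<'EOF'
    steps:
      - name: "Checkout"
        uses: "actions/checkout@v3"
      - id: "Auth-to-GCP"
        uses: "google-github-actions/auth@v1"
      - uses: azure/login@v2
EOF

refs="$(list_action_refs "$demo_dir")"
printf '%s\n' "$refs"
```

In a real checkout the same function could be pointed at `.github/workflows` to get the list of tags to compare against each action's current release.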


      - name: Get commit hash
        id: get-commit-hash
        run: echo "::set-output name=commit-hash::$(git rev-parse --short HEAD)"

      - name: Get timestamp
        id: get-timestamp
        run: echo "::set-output name=timestamp::$(date +'%Y-%m-%d-%H-%M')"
Comment on lines +28 to +34

⚠️ Potential issue | 🟠 Major

Replace deprecated set-output workflow commands.

The ::set-output command was deprecated in October 2022 and may stop working. Use $GITHUB_OUTPUT instead.

🔧 Proposed fix
       - name: Get commit hash
         id: get-commit-hash
-        run: echo "::set-output name=commit-hash::$(git rev-parse --short HEAD)"
+        run: echo "commit-hash=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT

       - name: Get timestamp
         id: get-timestamp
-        run: echo "::set-output name=timestamp::$(date +'%Y-%m-%d-%H-%M')"
+        run: echo "timestamp=$(date +'%Y-%m-%d-%H-%M')" >> $GITHUB_OUTPUT
🧰 Tools
🪛 actionlint (1.7.11)

[error] 30-30: workflow command "set-output" was deprecated. use echo "{name}={value}" >> $GITHUB_OUTPUT instead: https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions

(deprecated-commands)


[error] 34-34: workflow command "set-output" was deprecated. use echo "{name}={value}" >> $GITHUB_OUTPUT instead: https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions

(deprecated-commands)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/build-rag-ingestion-develop.yaml around lines 28 - 34, The
two workflow steps "Get commit hash" (id: get-commit-hash) and "Get timestamp"
(id: get-timestamp) use the deprecated ::set-output syntax; update each step to
write outputs to the $GITHUB_OUTPUT file instead (i.e., produce lines like
"commit-hash=..." and "timestamp=..." appended to $GITHUB_OUTPUT) so the steps
set outputs via the supported mechanism and keep the step names/ids unchanged.
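
The replacement mechanism described above is just appending `name=value` lines to the file named by `$GITHUB_OUTPUT`. A minimal sketch that can be run outside a runner follows; the temp-file fallback, the `unknown` placeholder and the helper name `set_step_output` are assumptions added so the script works locally, and are not part of the workflow.

```shell
#!/bin/bash
set -eu

# On a GitHub-hosted runner GITHUB_OUTPUT is provided by the runner.
# The temp-file fallback is an assumption so the sketch also runs locally.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# set_step_output NAME VALUE (hypothetical helper): append one name=value line.
set_step_output() {
  printf '%s=%s\n' "$1" "$2" >> "$GITHUB_OUTPUT"
}

set_step_output "timestamp" "$(date +'%Y-%m-%d-%H-%M')"
# Outside a git checkout, fall back to a placeholder so the sketch still runs.
set_step_output "commit-hash" "$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")"

cat "$GITHUB_OUTPUT"
```

Later steps read these values exactly as before, through `steps.get-commit-hash.outputs.commit-hash` and `steps.get-timestamp.outputs.timestamp`. This one-line form only suits single-line values; multiline values need the delimiter syntax described in the workflow commands documentation.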


      - name: Cache Docker layers
        id: cache-docker-layers
        uses: actions/cache@v3
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-docker-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-docker-

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build Docker Image
        id: build-image
        run: |
          docker build -f wavefront/server/docker/rag_ingestion.Dockerfile -t rootflo:${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }} .
          echo "IMAGE_TAG=${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }}" >> $GITHUB_ENV

      - id: "Auth-to-GCP"
        uses: "google-github-actions/auth@v1"
        with:
          credentials_json: "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}"

      - name: "Set up Cloud SDK"
        uses: "google-github-actions/setup-gcloud@v1"

      - name: "Docker auth for GCP"
        run: |-
          gcloud auth configure-docker ${{ env.GCP_REGION }}-docker.pkg.dev --quiet

      - name: Tag and push image to GCP Artifact Registry
        run: |
          docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}
          docker push ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}

      # Configure AWS credentials and push to ECR
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Tag and push image to Amazon ECR
        run: |
          docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}
          docker push ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}

      # Configure Azure credentials and push to ACR
      - name: Login to Azure
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Docker auth for Azure ACR
        run: az acr login --name ${{ env.ACR_REGISTRY_NAME }}

      - name: Tag and push image to Azure Container Registry
        run: |
          docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }}
          docker push ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }}

      - name: Cleanup Docker images
        run: |
          docker rmi rootflo:${{ env.IMAGE_TAG }} || true
          docker rmi ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }} || true
          docker rmi ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }} || true
          docker rmi ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }} || true
Comment on lines +22 to +107

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 11 days ago

In general, the fix is to explicitly define a least‑privilege permissions block for the workflow or individual jobs instead of relying on default GITHUB_TOKEN permissions. For this workflow, no step needs write access to repository contents or other GitHub resources, so we can safely restrict to contents: read at the workflow level. This will apply to all jobs unless they override it.

The best minimal change is to add a root-level permissions section right after the name: (before on:). This keeps existing behavior for all steps (they still can read the repo via actions/checkout), while ensuring GITHUB_TOKEN cannot perform write operations. Concretely, in .github/workflows/build-rag-ingestion-develop.yaml, insert:

permissions:
  contents: read

after line 1. No additional methods, imports, or definitions are needed.
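
A quick local check for the condition CodeQL reports could look like the sketch below. It only tests for an unindented `permissions:` key with grep, so it is a rough text check and not a YAML parse, and it will not notice job-level blocks. The helper name `has_top_level_permissions` and the two demo files are assumptions for illustration.

```shell
#!/bin/bash
set -eu

# has_top_level_permissions FILE (hypothetical helper): succeeds when FILE
# contains an unindented `permissions:` key. Plain text check, not a YAML
# parse; job-level permissions blocks are not detected.
has_top_level_permissions() {
  grep -qE '^permissions:' "$1"
}

# Demo input: one workflow with the block and one without it.
demo_dir="$(mktemp -d)"
printf 'name: with\n\npermissions:\n  contents: read\n\non:\n  workflow_dispatch:\n' \
  > "$demo_dir/with.yaml"
printf 'name: without\n\non:\n  workflow_dispatch:\n' \
  > "$demo_dir/without.yaml"

for f in "$demo_dir"/*.yaml; do
  if has_top_level_permissions "$f"; then
    echo "ok: $(basename "$f")"
  else
    echo "missing permissions: $(basename "$f")"
  fi
done
```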

Suggested changeset 1
.github/workflows/build-rag-ingestion-develop.yaml

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/build-rag-ingestion-develop.yaml b/.github/workflows/build-rag-ingestion-develop.yaml
--- a/.github/workflows/build-rag-ingestion-develop.yaml
+++ b/.github/workflows/build-rag-ingestion-develop.yaml
@@ -1,5 +1,8 @@
 name: (Develop) Build and Push RAG Ingestion to AWS, GCP and Azure
 
+permissions:
+  contents: read
+
 on:
   workflow_dispatch:
 
EOF
32 changes: 32 additions & 0 deletions wavefront/server/docker/rag_ingestion.Dockerfile
@@ -0,0 +1,32 @@
FROM python:3.11-slim-buster

# Copy UV from official image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# Set working directory
WORKDIR /app

# Copy project files
COPY wavefront/server/pyproject.toml wavefront/server/uv.lock ./
COPY wavefront/server/background_jobs/rag_ingestion ./background_jobs/rag_ingestion/
COPY wavefront/server/packages/flo_cloud ./packages/flo_cloud/
COPY wavefront/server/packages/flo_utils ./packages/flo_utils/
COPY wavefront/server/modules/db_repo_module ./modules/db_repo_module/
COPY wavefront/server/modules/common_module ./modules/common_module/
COPY wavefront/server/scripts/rag_ingestion/startup-rag-ingestion.sh ./background_jobs/rag_ingestion/

# Install dependencies
RUN uv sync --package rag-ingestion --frozen --no-dev

# Download the tiktoken encoding file and NLTK data
RUN mkdir -p /root/.cache/tiktoken
RUN uv run python3 -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4')"
RUN uv run python3 -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"

WORKDIR /app/background_jobs/rag_ingestion

# Make startup script executable
RUN chmod +x startup-rag-ingestion.sh

# Set entrypoint to run startup script
CMD ["./startup-rag-ingestion.sh"]
Comment on lines +26 to +32

⚠️ Potential issue | 🟠 Major

Run as non-root user for improved security.

The container runs as root, which is flagged by Trivy (DS-0002). Running containers as non-root limits the impact of a container compromise.

🛡️ Proposed fix to add a non-root user
 # Download the tiktoken encoding file and NLTK data
 RUN mkdir -p /root/.cache/tiktoken
 RUN uv run python3 -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4')"
 RUN uv run python3 -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"

+# Create non-root user
+RUN useradd --create-home --shell /bin/bash appuser \
+    && cp -r /root/.cache /home/appuser/.cache \
+    && chown -R appuser:appuser /app /home/appuser/.cache
+
 WORKDIR /app/background_jobs/rag_ingestion

 # Make startup script executable
 RUN chmod +x startup-rag-ingestion.sh

+USER appuser
+
 # Set entrypoint to run startup script
 CMD ["./startup-rag-ingestion.sh"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/docker/rag_ingestion.Dockerfile` around lines 26 - 32,
Create and switch to a non-root user in the Dockerfile: add a step that creates
a system/group and a non-root user (e.g., "appuser"), chown the WORKDIR
(/app/background_jobs/rag_ingestion) and the startup-rag-ingestion.sh to that
user, keep the chmod +x step, and add a USER appuser instruction before the CMD
so the container runs as the non-root user; reference the existing WORKDIR,
startup-rag-ingestion.sh and CMD ["./startup-rag-ingestion.sh"] when making
these changes.
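
Whether the `USER` change took effect can be confirmed from inside the container by looking at the effective UID. The sketch below is one possible check, assuming it is run inside the image (for example at the top of the startup script); the helper name `is_root_uid` and the warning text are hypothetical and not part of this PR.

```shell
#!/bin/bash
set -eu

# is_root_uid UID (hypothetical helper): succeeds when UID is 0.
is_root_uid() {
  [ "$1" -eq 0 ]
}

# id -u prints the effective user ID of the current process.
current_uid="$(id -u)"
if is_root_uid "$current_uid"; then
  echo "warning: running as root (uid 0)" >&2
else
  echo "running as uid $current_uid"
fi
```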

@@ -0,0 +1,6 @@
#!/bin/bash

source /app/.venv/bin/activate

# Run the main application for RAG Ingestion
python rag_ingestion/main.py