added rag_ingestion action, script and dockerfile #267
| Original file line number | Diff line number | Diff line change | |||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,107 @@ | |||||||||||||||||||||||||||||
| name: (Develop) Build and Push RAG Ingestion to AWS, GCP and Azure | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| on: | |||||||||||||||||||||||||||||
| workflow_dispatch: | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| env: | |||||||||||||||||||||||||||||
| PROJECT_ID: aesy-330511 | |||||||||||||||||||||||||||||
| GCP_REGION: asia-south1 | |||||||||||||||||||||||||||||
| GAR_LOCATION: asia-south1-docker.pkg.dev/aesy-330511/root-hub | |||||||||||||||||||||||||||||
| IMAGE_NAME: auraflo-rag-ingestion | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| AWS_REGION: ap-south-1 | |||||||||||||||||||||||||||||
| ECR_REGISTRY: 025066241490.dkr.ecr.ap-south-1.amazonaws.com | |||||||||||||||||||||||||||||
| ECR_REPOSITORY: rootflo/auraflo-rag-ingestion | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| ACR_REGISTRY_NAME: rootflo | |||||||||||||||||||||||||||||
| ACR_REGISTRY: rootflo.azurecr.io | |||||||||||||||||||||||||||||
| ACR_REPOSITORY: auraflo-rag-ingestion | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| jobs: | |||||||||||||||||||||||||||||
| build-push-artifact: | |||||||||||||||||||||||||||||
| runs-on: ubuntu-latest | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| steps: | |||||||||||||||||||||||||||||
| - name: "Checkout" | |||||||||||||||||||||||||||||
| uses: "actions/checkout@v3" | |||||||||||||||||||||||||||||
|
Comment on lines +25 to +26

Update GitHub Actions to current versions.

Several actions use outdated major versions that may have compatibility issues or security vulnerabilities. The static analyzer flagged these as potentially too old to run.

🔧 Proposed version updates

      - name: "Checkout"
    -   uses: "actions/checkout@v3"
    +   uses: "actions/checkout@v4"

      - name: Cache Docker layers
        id: cache-docker-layers
    -   uses: actions/cache@v3
    +   uses: actions/cache@v4

      - id: "Auth-to-GCP"
    -   uses: "google-github-actions/auth@v1"
    +   uses: "google-github-actions/auth@v2"
        with:
          credentials_json: "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}"

      - name: "Set up Cloud SDK"
    -   uses: "google-github-actions/setup-gcloud@v1"
    +   uses: "google-github-actions/setup-gcloud@v2"

      - name: Configure AWS credentials
    -   uses: aws-actions/configure-aws-credentials@v1
    +   uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
    -   uses: aws-actions/amazon-ecr-login@v1
    +   uses: aws-actions/amazon-ecr-login@v2

Also applies to: 36-38, 54-60, 72-73, 79-81

🧰 Tools
🪛 actionlint (1.7.11)
[error] 26-26: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)
|
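To see at a glance which action versions a workflow pins, the `uses:` references can be listed with standard text tools. The following is a minimal sketch, not part of the PR: `list_action_refs` is a hypothetical helper, and the sample file only mirrors a few steps from the workflow under review.

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_action_refs (hypothetical helper) prints every "owner/repo@ref"
# found on a "uses:" line of the given workflow file, de-duplicated.
list_action_refs() {
  local workflow_file="$1"
  grep -Eo 'uses:[[:space:]]*"?[A-Za-z0-9_./-]+@[A-Za-z0-9_.-]+' "$workflow_file" \
    | sed -E 's/^uses:[[:space:]]*"?//' \
    | sort -u
}

# Throwaway sample that mirrors a few steps from the workflow in this PR.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
steps:
  - name: "Checkout"
    uses: "actions/checkout@v3"
  - name: Cache Docker layers
    uses: actions/cache@v3
  - name: Set up Docker Buildx
    uses: docker/setup-buildx-action@v3
EOF

list_action_refs "$sample"
```

This is a plain text match rather than a YAML parse, so it only helps with spotting pinned versions; deciding which major version is current still requires checking each action's releases.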
|||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Get commit hash | |||||||||||||||||||||||||||||
| id: get-commit-hash | |||||||||||||||||||||||||||||
| run: echo "::set-output name=commit-hash::$(git rev-parse --short HEAD)" | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Get timestamp | |||||||||||||||||||||||||||||
| id: get-timestamp | |||||||||||||||||||||||||||||
| run: echo "::set-output name=timestamp::$(date +'%Y-%m-%d-%H-%M')" | |||||||||||||||||||||||||||||
|
Comment on lines +28 to +34

Replace the deprecated `set-output` command.

The `::set-output` workflow command is deprecated. Write step outputs to the `$GITHUB_OUTPUT` file instead.

🔧 Proposed fix

      - name: Get commit hash
        id: get-commit-hash
    -   run: echo "::set-output name=commit-hash::$(git rev-parse --short HEAD)"
    +   run: echo "commit-hash=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT

      - name: Get timestamp
        id: get-timestamp
    -   run: echo "::set-output name=timestamp::$(date +'%Y-%m-%d-%H-%M')"
    +   run: echo "timestamp=$(date +'%Y-%m-%d-%H-%M')" >> $GITHUB_OUTPUT

🧰 Tools
🪛 actionlint (1.7.11)
[error] 30-30: workflow command "set-output" was deprecated. use `echo "{name}={value}" >> $GITHUB_OUTPUT` instead (deprecated-commands)
[error] 34-34: workflow command "set-output" was deprecated. use `echo "{name}={value}" >> $GITHUB_OUTPUT` instead (deprecated-commands)
|
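The `$GITHUB_OUTPUT` mechanism is just a file of `name=value` lines that the runner reads after the step finishes. A minimal sketch of the two steps above, runnable outside of Actions: `set_output` is a hypothetical helper, and the fallbacks to a temporary file and to `unknown` are assumptions added only so the script works locally.

```shell
#!/usr/bin/env bash
set -euo pipefail

# On a GitHub runner GITHUB_OUTPUT is already set; fall back to a temporary
# file (an assumption for local runs) when it is not.
: "${GITHUB_OUTPUT:=$(mktemp)}"

# set_output (hypothetical helper) appends one name=value line to the
# step-output file, which is what the proposed fix does inline.
set_output() {
  local name="$1" value="$2"
  printf '%s=%s\n' "$name" "$value" >> "$GITHUB_OUTPUT"
}

# Same values the workflow computes; "unknown" is a local-only fallback
# for when the script is not run inside a git checkout.
commit_hash="$(git rev-parse --short HEAD 2>/dev/null || echo unknown)"
timestamp="$(date +'%Y-%m-%d-%H-%M')"

set_output "commit-hash" "$commit_hash"
set_output "timestamp" "$timestamp"

cat "$GITHUB_OUTPUT"
```

Later steps would still read the values as `steps.get-commit-hash.outputs.commit-hash` and `steps.get-timestamp.outputs.timestamp`, so the build step in this workflow needs no change.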
|||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Cache Docker layers | |||||||||||||||||||||||||||||
| id: cache-docker-layers | |||||||||||||||||||||||||||||
| uses: actions/cache@v3 | |||||||||||||||||||||||||||||
| with: | |||||||||||||||||||||||||||||
| path: /tmp/.buildx-cache | |||||||||||||||||||||||||||||
| key: ${{ runner.os }}-docker-${{ github.sha }} | |||||||||||||||||||||||||||||
| restore-keys: | | |||||||||||||||||||||||||||||
| ${{ runner.os }}-docker- | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Set up Docker Buildx | |||||||||||||||||||||||||||||
| uses: docker/setup-buildx-action@v3 | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Build Docker Image | |||||||||||||||||||||||||||||
| id: build-image | |||||||||||||||||||||||||||||
| run: | | |||||||||||||||||||||||||||||
| docker build -f wavefront/server/docker/rag_ingestion.Dockerfile -t rootflo:${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }} . | |||||||||||||||||||||||||||||
| echo "IMAGE_TAG=${{ steps.get-commit-hash.outputs.commit-hash }}-${{ steps.get-timestamp.outputs.timestamp }}" >> $GITHUB_ENV | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - id: "Auth-to-GCP" | |||||||||||||||||||||||||||||
| uses: "google-github-actions/auth@v1" | |||||||||||||||||||||||||||||
| with: | |||||||||||||||||||||||||||||
| credentials_json: "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}" | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: "Set up Cloud SDK" | |||||||||||||||||||||||||||||
| uses: "google-github-actions/setup-gcloud@v1" | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: "Docker auth for GCP" | |||||||||||||||||||||||||||||
| run: |- | |||||||||||||||||||||||||||||
| gcloud auth configure-docker ${{ env.GCP_REGION }}-docker.pkg.dev --quiet | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Tag and push image to GCP Artifact Registry | |||||||||||||||||||||||||||||
| run: | | |||||||||||||||||||||||||||||
| docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }} | |||||||||||||||||||||||||||||
| docker push ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }} | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| # Configure AWS credentials and push to ECR | |||||||||||||||||||||||||||||
| - name: Configure AWS credentials | |||||||||||||||||||||||||||||
| uses: aws-actions/configure-aws-credentials@v1 | |||||||||||||||||||||||||||||
| with: | |||||||||||||||||||||||||||||
| aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} | |||||||||||||||||||||||||||||
| aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} | |||||||||||||||||||||||||||||
| aws-region: ${{ env.AWS_REGION }} | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Login to Amazon ECR | |||||||||||||||||||||||||||||
| id: login-ecr | |||||||||||||||||||||||||||||
| uses: aws-actions/amazon-ecr-login@v1 | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Tag and push image to Amazon ECR | |||||||||||||||||||||||||||||
| run: | | |||||||||||||||||||||||||||||
| docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }} | |||||||||||||||||||||||||||||
| docker push ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }} | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| # Configure Azure credentials and push to ACR | |||||||||||||||||||||||||||||
| - name: Login to Azure | |||||||||||||||||||||||||||||
| uses: azure/login@v2 | |||||||||||||||||||||||||||||
| with: | |||||||||||||||||||||||||||||
| creds: ${{ secrets.AZURE_CREDENTIALS }} | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Docker auth for Azure ACR | |||||||||||||||||||||||||||||
| run: az acr login --name ${{ env.ACR_REGISTRY_NAME }} | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Tag and push image to Azure Container Registry | |||||||||||||||||||||||||||||
| run: | | |||||||||||||||||||||||||||||
| docker tag rootflo:${{ env.IMAGE_TAG }} ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }} | |||||||||||||||||||||||||||||
| docker push ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }} | |||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| - name: Cleanup Docker images | |||||||||||||||||||||||||||||
| run: | | |||||||||||||||||||||||||||||
| docker rmi rootflo:${{ env.IMAGE_TAG }} || true | |||||||||||||||||||||||||||||
| docker rmi ${{ env.GAR_LOCATION }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }} || true | |||||||||||||||||||||||||||||
| docker rmi ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }} || true | |||||||||||||||||||||||||||||
| docker rmi ${{ env.ACR_REGISTRY }}/${{ env.ACR_REPOSITORY }}:${{ env.IMAGE_TAG }} || true | |||||||||||||||||||||||||||||
|
Comment on lines +22 to +107

Check warning
Code scanning / CodeQL
Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix (AI, 11 days ago)

In general, the fix is to explicitly define a least-privilege `permissions` block. The best minimal change is to add a root-level `permissions:` block with `contents: read` after line 1. No additional methods, imports, or definitions are needed.

Suggested changeset (1 file): .github/workflows/build-rag-ingestion-develop.yaml
|
|||||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| FROM python:3.11-slim-buster | ||
|
|
||
| # Copy UV from official image | ||
| COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/ | ||
|
|
||
| # Set working directory | ||
| WORKDIR /app | ||
|
|
||
| # Copy project files | ||
| COPY wavefront/server/pyproject.toml wavefront/server/uv.lock ./ | ||
| COPY wavefront/server/background_jobs/rag_ingestion ./background_jobs/rag_ingestion/ | ||
| COPY wavefront/server/packages/flo_cloud ./packages/flo_cloud/ | ||
| COPY wavefront/server/packages/flo_utils ./packages/flo_utils/ | ||
| COPY wavefront/server/modules/db_repo_module ./modules/db_repo_module/ | ||
| COPY wavefront/server/modules/common_module ./modules/common_module/ | ||
| COPY wavefront/server/scripts/rag_ingestion/startup-rag-ingestion.sh ./background_jobs/rag_ingestion/ | ||
|
|
||
| # Install dependencies | ||
| RUN uv sync --package rag-ingestion --frozen --no-dev | ||
|
|
||
| # Download the tiktoken encoding file and NLTK data | ||
| RUN mkdir -p /root/.cache/tiktoken | ||
| RUN uv run python3 -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4')" | ||
| RUN uv run python3 -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')" | ||
|
|
||
| WORKDIR /app/background_jobs/rag_ingestion | ||
|
|
||
| # Make startup script executable | ||
| RUN chmod +x startup-rag-ingestion.sh | ||
|
|
||
| # Set entrypoint to run startup script | ||
| CMD ["./startup-rag-ingestion.sh"] | ||
|
Comment on lines +26 to +32

Run as non-root user for improved security.

The container runs as root, which is flagged by Trivy (DS-0002). Running containers as non-root limits the impact of a container compromise.

🛡️ Proposed fix to add a non-root user

      # Download the tiktoken encoding file and NLTK data
      RUN mkdir -p /root/.cache/tiktoken
      RUN uv run python3 -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4')"
      RUN uv run python3 -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"

    + # Create non-root user
    + RUN useradd --create-home --shell /bin/bash appuser \
    +     && cp -r /root/.cache /home/appuser/.cache \
    +     && chown -R appuser:appuser /app /home/appuser/.cache
    +
      WORKDIR /app/background_jobs/rag_ingestion

      # Make startup script executable
      RUN chmod +x startup-rag-ingestion.sh

    + USER appuser
    +
      # Set entrypoint to run startup script
      CMD ["./startup-rag-ingestion.sh"]
|
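One way to keep the non-root requirement from regressing is a guard in the startup script that refuses to continue when the effective UID is 0. This is only a sketch under that assumption: `assert_non_root` is a hypothetical helper that does not exist in the PR, and the UIDs passed below are illustrative values.

```shell
#!/usr/bin/env bash
set -euo pipefail

# assert_non_root (hypothetical helper) fails when the given numeric UID
# is 0, i.e. when the process would run as root.
assert_non_root() {
  local uid="$1"
  if [ "$uid" -eq 0 ]; then
    echo "refusing to run as root (uid 0)" >&2
    return 1
  fi
}

# Illustrative calls with fixed UIDs; a startup script would instead call
#   assert_non_root "$(id -u)"
assert_non_root 1000 && echo "uid 1000 accepted"

if ! assert_non_root 0 2>/dev/null; then
  echo "uid 0 rejected"
fi
```

With the proposed `USER appuser` line in place, the guard would pass silently; without it, the container would exit early instead of running the ingestion job as root.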
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| #!/bin/bash | ||
|
|
||
| source /app/.venv/bin/activate | ||
|
|
||
| # Run the main application for RAG Ingestion | ||
| python rag_ingestion/main.py |
Add explicit permissions to follow the principle of least privilege.

The workflow lacks a `permissions` block, which means it uses default token permissions. CodeQL flagged this as a security concern. Explicitly declaring minimal permissions reduces the blast radius if the workflow is compromised.

🛡️ Proposed fix

      jobs:
        build-push-artifact:
          runs-on: ubuntu-latest
    +     permissions:
    +       contents: read
          steps:
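A quick pre-merge check for this finding can be approximated with a text search over the workflow file. The sketch below is an assumption-laden stand-in for CodeQL, not a replacement: `has_permissions_block` is a hypothetical helper, and the two sample files are throwaway fragments modeled on this workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# has_permissions_block (hypothetical helper) succeeds when the file has a
# "permissions:" key at any indentation, covering both a root-level and a
# job-level block. It is a text match, not a YAML parse.
has_permissions_block() {
  grep -Eq '^[[:space:]]*permissions:' "$1"
}

# Two throwaway workflow fragments for demonstration.
without_block="$(mktemp)"
with_block="$(mktemp)"

cat > "$without_block" <<'EOF'
jobs:
  build-push-artifact:
    runs-on: ubuntu-latest
EOF

cat > "$with_block" <<'EOF'
permissions:
  contents: read
jobs:
  build-push-artifact:
    runs-on: ubuntu-latest
EOF

for f in "$without_block" "$with_block"; do
  if has_permissions_block "$f"; then
    echo "$f: permissions declared"
  else
    echo "$f: no permissions block"
  fi
done
```

The check says nothing about whether the declared scopes are minimal; it only catches the case flagged here, where no block is present at all.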