Merged
29 changes: 29 additions & 0 deletions build/Dockerfile
@@ -0,0 +1,29 @@
```dockerfile
FROM python:3.12-slim

# Install uv.
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

# Set working directory
WORKDIR /app

# Copy the workspace configuration files
COPY pyproject.toml uv.lock ./

# Copy the package directories
COPY packages/ ./packages/

# Install the dependencies, strictly from the lockfile
RUN uv sync --frozen --no-dev --no-install-project

# Install the project itself
RUN uv sync --frozen --no-dev

# Place the virtualenv in the PATH
ENV PATH="/app/.venv/bin:$PATH"

# Expose the API port
EXPOSE 5000

# Set the entrypoint to the CLI
ENTRYPOINT ["datacommons", "api"]
CMD ["--host", "0.0.0.0", "--port", "5000"]
```
Comment on lines +1 to +29

Contributor (severity: security-medium)
The Dockerfile does not specify a non-root user via the `USER` instruction. By default, Docker containers run as root, which violates the principle of least privilege and poses a significant security risk. The code suggestion below remediates this by creating a dedicated non-root user and switching to it before the entrypoint. Additionally, for improved build performance and reproducibility, consider optimizing Docker layer caching and pinning the `uv` version instead of using `:latest`.

```dockerfile
FROM python:3.12-slim

# Install uv.
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

# Set working directory
WORKDIR /app

# Copy the workspace configuration files
COPY pyproject.toml uv.lock ./

# Copy the package directories
COPY packages/ ./packages/

# Install the dependencies, strictly from the lockfile
RUN uv sync --frozen --no-dev --no-install-project

# Install the project itself
RUN uv sync --frozen --no-dev

# Place the virtualenv in the PATH
ENV PATH="/app/.venv/bin:$PATH"

# Create a non-root user and switch to it
RUN groupadd -r datacommons && useradd -r -g datacommons datacommons
USER datacommons

# Expose the API port
EXPOSE 5000

# Set the entrypoint to the CLI
ENTRYPOINT ["datacommons", "api"]
CMD ["--host", "0.0.0.0", "--port", "5000"]
```
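A quick way to confirm the `USER` fix took effect is to check the uid the container reports at runtime. The sketch below factors that check into a small shell function; the `docker run` invocation in the comment is illustrative, and `999` is a stand-in uid, not a value from this repo:

```shell
# Sketch of a smoke check for the USER fix. In practice the uid would come
# from the container itself, e.g.:
#   uid=$(docker run --rm --entrypoint id datacommons-platform:local -u)
check_nonroot() {
  uid="$1"
  if [ "$uid" -eq 0 ]; then
    echo "FAIL: container runs as root"
  else
    echo "ok: running as non-root (uid=$uid)"
  fi
}

check_nonroot 999   # → ok: running as non-root (uid=999)
```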

12 changes: 12 additions & 0 deletions build/cloudbuild.yaml
@@ -0,0 +1,12 @@
```yaml
steps:
# Build the container image
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/datcom-ci/datacommons-platform:latest', '-t', 'gcr.io/datcom-ci/datacommons-platform:$COMMIT_SHA', '-f', 'build/Dockerfile', '.']
# Push the container image to Container Registry
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/datcom-ci/datacommons-platform:latest']
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/datcom-ci/datacommons-platform:$COMMIT_SHA']
images:
- 'gcr.io/datcom-ci/datacommons-platform:latest'
- 'gcr.io/datcom-ci/datacommons-platform:$COMMIT_SHA'
```
Comment on lines +1 to +12

Contributor (severity: medium)

This Cloud Build configuration is well-defined. I have two suggestions for improvement:

1. **Use Artifact Registry:** The PR description and documentation mention deploying to Artifact Registry, but this configuration uses the legacy Container Registry (GCR). It is a Google Cloud best practice to use Artifact Registry for new projects; you would need to update the image path (e.g., to `us-central1-docker.pkg.dev/datcom-ci/datacommons/datacommons-platform`).
2. **Use substitutions:** To improve maintainability and avoid repetition, define the image name as a substitution variable.

The suggestion below applies the substitution pattern. You will still need to update the image path to point to your Artifact Registry repository.

```yaml
substitutions:
  _IMAGE_NAME: 'gcr.io/datcom-ci/datacommons-platform'
steps:
  # Build the container image
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', '${_IMAGE_NAME}:latest', '-t', '${_IMAGE_NAME}:$COMMIT_SHA', '-f', 'build/Dockerfile', '.']
  # Push the container image to Container Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', '${_IMAGE_NAME}:latest']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', '${_IMAGE_NAME}:$COMMIT_SHA']
images:
  - '${_IMAGE_NAME}:latest'
  - '${_IMAGE_NAME}:$COMMIT_SHA'
```
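Cloud Build expands `${_IMAGE_NAME}` and `$COMMIT_SHA` at build time. Mirroring that expansion in plain shell is a cheap way to sanity-check the resulting tags before submitting a build; the values below are examples, not the real defaults:

```shell
# Mirror Cloud Build's substitution expansion locally (example values).
_IMAGE_NAME='gcr.io/datcom-ci/datacommons-platform'
COMMIT_SHA='abc1234'

latest_tag="${_IMAGE_NAME}:latest"
sha_tag="${_IMAGE_NAME}:${COMMIT_SHA}"

echo "$latest_tag"   # → gcr.io/datcom-ci/datacommons-platform:latest
echo "$sha_tag"      # → gcr.io/datcom-ci/datacommons-platform:abc1234
```

At submit time, either value can also be overridden on the command line with `gcloud builds submit --substitutions=_IMAGE_NAME=...`.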

46 changes: 46 additions & 0 deletions docs/deploy_artifacts.md
@@ -0,0 +1,46 @@
# Deploying Data Commons Platform Artifacts

> **Internal Process Only**
> This document describes the process for deploying the Data Commons Platform docker artifacts to Google's managed Artifact Registry. These instructions are not intended for general users or external deployments.
Contributor (severity: medium)

This document states that artifacts are deployed to "Google's managed Artifact Registry", but all subsequent examples and references (e.g., on lines 30 and 46) use `gcr.io`, which belongs to the older Google Container Registry. This is inconsistent and should be corrected. Please update the documentation to use Artifact Registry paths, which aligns with the recommendation for the `cloudbuild.yaml` configuration.


## Prerequisites

- [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) installed and authenticated.
- Access to the `datcom-ci` GCP project.
- `docker` installed locally (optional, for local builds).

## Building Locally

To build the Docker image locally, you **must run the command from the repository root**, pointing to the Dockerfile in `build/`.

```bash
docker build -f build/Dockerfile -t datacommons-platform:local .
```

To run the container locally:

```bash
docker run -p 5000:5000 datacommons-platform:local
```

Access the API at `http://localhost:5000`.

## Deploying via Cloud Build

We use Google Cloud Build to build and push images to Google Container Registry (GCR).

### Manual Deployment

You can manually trigger a build from your local machine using the `gcloud` CLI. You must provide the `COMMIT_SHA` substitution manually.

```bash
gcloud builds submit --config build/cloudbuild.yaml \
  --substitutions=COMMIT_SHA=$(git rev-parse HEAD) \
  --project=datcom-ci \
  .
```

This will:
1. Upload your current workspace (files in `.`) to Cloud Build.
2. Execute steps in `build/cloudbuild.yaml`.
3. Push images to `gcr.io/datcom-ci/datacommons-platform:latest` and `gcr.io/datcom-ci/datacommons-platform:$COMMIT_SHA`.
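Note that `COMMIT_SHA` is only injected automatically for trigger-based builds; a manual submission with an empty value would silently produce a bare `:` tag. A small guard, sketched below, catches that before calling `gcloud` (the helper function is hypothetical, not part of this repo):

```shell
# Guard against an empty COMMIT_SHA before a manual `gcloud builds submit`.
resolve_commit_sha() {
  sha="$1"   # in practice: sha=$(git rev-parse HEAD)
  if [ -z "$sha" ]; then
    echo "error: COMMIT_SHA is empty; refusing to submit" >&2
    return 1
  fi
  echo "$sha"
}

resolve_commit_sha deadbeef   # → deadbeef
```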