feat: dockerfile for backend #13
📝 Walkthrough
Containerization and production compose manifests were added: multi-stage Dockerfiles for backend and frontend, compose files for dev and prod with Kafka/Mongo/Qdrant and app services, environment examples, Dockerignore files, and an Nginx config for frontend routing (startup healthchecks and persistent volumes included).
Sequence Diagram(s)
sequenceDiagram
autonumber
participant User
participant Frontend
participant Backend
participant Kafka
participant MongoDB
participant Qdrant
User->>Frontend: HTTP request / SPA navigation
Frontend->>Backend: API request (VITE_API_BASE_URL)
Backend->>MongoDB: Read/Write documents
Backend->>Qdrant: Index / query embeddings
Backend->>Kafka: Publish/Consume events
Kafka-->>Backend: Deliver messages to consumer service
Backend->>Frontend: API response
Frontend-->>User: Rendered content
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (5)
backend/.dockerignore (1)
1-9: LGTM, reasonable ignore set.
Core offenders (`node_modules`, `dist`, `coverage`, `uploads`, `.env*`, `.git`, debug logs) are all excluded, which keeps the build context small and prevents local secrets from leaking into the image.
Optional additions for a tighter context: `*.log`, `.DS_Store`, editor folders (`.vscode`, `.idea`), `test/*.spec.ts`, and the compose/docs files (`docker-compose.yml`, `README.md`), since they aren't needed for the build.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/.dockerignore` around lines 1 - 9, The .dockerignore is good but add optional entries to further shrink build context: append patterns like *.log, .DS_Store, editor dirs (.vscode, .idea), test artifacts (test, *.spec.ts) and non-build docs (docker-compose.yml, README.md) to the existing file; update the .dockerignore by adding these patterns so they are excluded during docker build while keeping current entries (node_modules, dist, coverage, uploads, .env*, .git, npm/pnpm debug logs) intact.
backend/docker-compose.yml (2)
85-85: Use distinct `KAFKA_CLIENT_ID` for producer vs consumer.
Both services default to `paperstack-backend`. Not a functional bug (consumer coordination keys off `groupId`), but observability suffers: client metrics and broker-side logs can't distinguish the two. Consider `paperstack-api` and `paperstack-consumer` defaults.
Also applies to: 121-121
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/docker-compose.yml` at line 85, The KAFKA_CLIENT_ID environment variable is set to the same default ("paperstack-backend") for both producer and consumer services, making client-side metrics and broker logs indistinguishable; update the KAFKA_CLIENT_ID default for each service so they are unique (e.g., use "paperstack-api" or "paperstack-producer" for the API/producer service and "paperstack-consumer" for the consumer service) by changing the corresponding KAFKA_CLIENT_ID entries in the compose file so each service has its own distinct default.
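A minimal sketch of the distinct client IDs suggested above. Only the `KAFKA_CLIENT_ID` entries are shown, and the `backend` and `consumer` service names are taken from the review; every other key in the real compose file is omitted here.

```yaml
# Sketch only: unrelated service keys are left out.
services:
  backend:
    environment:
      # API/producer side gets its own default client ID.
      KAFKA_CLIENT_ID: ${KAFKA_CLIENT_ID:-paperstack-api}
  consumer:
    environment:
      # Consumer side gets a different default so broker logs can tell them apart.
      KAFKA_CLIENT_ID: ${KAFKA_CLIENT_ID:-paperstack-consumer}
```

Note that if `KAFKA_CLIENT_ID` is set in the shell or in `.env`, both services would still receive the same value. Using a separate variable name for the consumer (a hypothetical `CONSUMER_KAFKA_CLIENT_ID`, for instance) would avoid that.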
63-63: `platform: linux/amd64` forces emulation on ARM hosts.
`node:22-bookworm-slim` is a multi-arch image, and the code is pure TS/JS (no native pins that require amd64). Pinning `linux/amd64` means Apple Silicon / ARM CI runners will run via qemu with a significant performance penalty and occasional flakiness. If there's a specific dependency forcing this (e.g., a prebuilt native binary), please add a comment; otherwise drop the platform pin and let buildx pick the host arch.
Also applies to: 100-100
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/docker-compose.yml` at line 63, Remove the explicit platform pin "platform: linux/amd64" that forces amd64 emulation; since the base image "node:22-bookworm-slim" is multi-arch and the project is pure TS/JS, allow buildx to select the host arch by deleting that platform line (or, if a specific native dependency truly requires amd64, replace the line with a short explanatory comment referencing that dependency and why emulation is required). Also apply this same change to the other occurrence mentioned ("100-100") to avoid ARM emulation across the compose files.
backend/dockerfile (2)
41-43: `/pnpm` copy and `PNPM_HOME` are unused at runtime.
The final `CMD` invokes `node` directly, never `pnpm`. Copying `/pnpm` from `prod-deps` and setting `PNPM_HOME`/`PATH` in `runtime-base` adds weight without benefit. Drop them unless you plan to run pnpm scripts inside the container.
♻️ Suggested fix
-COPY --from=prod-deps /pnpm /pnpm
-ENV PNPM_HOME=/pnpm
-ENV PATH=$PNPM_HOME:$PATH
-
 COPY --from=prod-deps /app/node_modules ./node_modules
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/dockerfile` around lines 41 - 43, Remove the unused pnpm artifacts added to the runtime image: delete the COPY --from=prod-deps /pnpm /pnpm line and the ENV PNPM_HOME=/pnpm and ENV PATH=$PNPM_HOME:$PATH lines from the Dockerfile since the final CMD runs node directly and the container does not use pnpm at runtime; if you do intend to run pnpm inside the container instead, keep them and switch CMD/entrypoint to invoke pnpm scripts rather than node.
45-49: Avoid recursive `chown` over `node_modules`; use `COPY --chown` instead.
`chown -R node:node /app` after copying `node_modules` and `dist` rewrites every file's ownership in a new layer, which roughly doubles the on-disk size for those paths (the previous layer with root-owned files still exists). For a typical NestJS + pnpm tree this is tens to hundreds of MB of avoidable bloat.
♻️ Suggested fix
-COPY --from=prod-deps /app/node_modules ./node_modules
-COPY --from=build /app/dist ./dist
-COPY package.json ./
-
-RUN mkdir -p /app/uploads && chown -R node:node /app
+COPY --chown=node:node --from=prod-deps /app/node_modules ./node_modules
+COPY --chown=node:node --from=build /app/dist ./dist
+COPY --chown=node:node package.json ./
+
+RUN mkdir -p /app/uploads && chown node:node /app /app/uploads
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/dockerfile` around lines 45 - 49, The Dockerfile currently runs a recursive chown (chown -R node:node /app) which re-writes ownership for all copied layers; instead, change the COPY instructions for node_modules and dist to use the --chown=node:node flag (i.e. COPY --chown=node:node --from=prod-deps /app/node_modules ./node_modules and COPY --chown=node:node --from=build /app/dist ./dist and COPY --chown=node:node package.json ./), and then remove the global chown -R call; keep creating the uploads directory but set ownership only for that path (mkdir -p /app/uploads && chown node:node /app/uploads) so you avoid rewriting ownership for entire /app.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/docker-compose.yml`:
- Around line 52-60: The Qdrant service in docker-compose is exposed without an
API key; either set the QDRANT__SERVICE__API_KEY env to a strong secret (so it
matches the backend's QDRANT_API_KEY read in vectordb.service.ts) or
remove/unpublish the host ports (6333/6334) if only in-network access is
required; update the qdrant service block (service name "qdrant" in the diff) to
add the environment variable QDRANT__SERVICE__API_KEY with a secure value and
ensure the backend's QDRANT_API_KEY is set to the same secret, or delete the
ports section to avoid publishing to the host.
- Around line 38-50: The docker-compose mongo service is exposed without auth
and publishes port '27017:27017'; secure it by adding MONGO_INITDB_ROOT_USERNAME
and MONGO_INITDB_ROOT_PASSWORD environment variables to the "mongo" service and
remove the host port binding (the ports: mapping) for internal-only deployments,
or if you must expose the port keep it but enable auth and strong credentials;
lastly update the application's MONGO_DB_URI default to use the credentials
(e.g. mongodb://<user>:<pass>@mongo:27017/<db>?authSource=admin) so the app
connects with authentication.
- Line 87: The docker-compose entry currently supplies a dangerous default for
JWT_SECRET; remove the fallback and require the variable so the service fails
fast if missing. Replace the current value for the JWT_SECRET environment
variable with the shell-style required expansion (e.g., use
${JWT_SECRET:?JWT_SECRET is required}) so docker compose will error and not
start when JWT_SECRET is unset; update any README or env example to document
setting JWT_SECRET before booting.
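The three inline findings above could be addressed together along these lines. This is a sketch under assumptions, not the PR's actual file: the `paperstack` database name is borrowed from elsewhere in this review, the host `ports:` sections for `qdrant` and `mongo` are simply left out (the internal-only option), and all other service keys are omitted.

```yaml
# Sketch only: internal-only Qdrant and Mongo with auth, plus a required JWT secret.
services:
  qdrant:
    environment:
      # Must be the same secret the backend reads as QDRANT_API_KEY.
      QDRANT__SERVICE__API_KEY: ${QDRANT_API_KEY:?QDRANT_API_KEY is required}
    # No ports: section, so Qdrant is reachable only on the compose network.
  mongo:
    environment:
      MONGO_INITDB_ROOT_USERNAME: ${MONGO_INITDB_ROOT_USERNAME:?MONGO_INITDB_ROOT_USERNAME is required}
      MONGO_INITDB_ROOT_PASSWORD: ${MONGO_INITDB_ROOT_PASSWORD:?MONGO_INITDB_ROOT_PASSWORD is required}
    # No ports: section here either.
  backend:
    environment:
      # Compose aborts with this message when JWT_SECRET is unset or empty.
      JWT_SECRET: ${JWT_SECRET:?JWT_SECRET is required}
      QDRANT_API_KEY: ${QDRANT_API_KEY:?QDRANT_API_KEY is required}
      MONGO_DB_URI: mongodb://${MONGO_INITDB_ROOT_USERNAME}:${MONGO_INITDB_ROOT_PASSWORD}@mongo:27017/paperstack?authSource=admin
```

A password containing characters such as `@` or `/` would need to be URL-encoded before it is placed in the connection string.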
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 987890dd-400e-45a4-a6e2-72119f410771
📒 Files selected for processing (3)
- backend/.dockerignore
- backend/docker-compose.yml
- backend/dockerfile
Actionable comments posted: 5
🧹 Nitpick comments (3)
backend/docker-compose.yml (2)
88-137: Large duplication between `backend` and `consumer` env blocks; consider YAML anchors.
The `environment`, `depends_on`, and `volumes` sections are almost identical between the two services (and the same pattern repeats in `docker-compose.prod.yml`). A YAML anchor / merge key significantly reduces drift risk when a new variable is added. Today, only `LLM_PROVIDER`/`GROQ_API_KEY` being API-only is the intentional delta; everything else should stay in sync.
Sketch
x-backend-env: &backend-env
  NODE_ENV: production
  UPLOAD_DIR: /app/uploads
  MONGO_DB_URI: ${MONGO_DB_URI:-mongodb://${MONGO_INITDB_ROOT_USERNAME:-paperstack_admin}:${MONGO_INITDB_ROOT_PASSWORD:-change-me-mongo-password}@mongo:27017/paperstack?authSource=admin}
  QDRANT_URL: ${QDRANT_URL:-http://qdrant:6333}
  QDRANT_API_KEY: ${QDRANT_API_KEY:-}
  KAFKA_BROKERS: ${KAFKA_BROKERS:-kafka:29092}
  KAFKA_CLIENT_ID: ${KAFKA_CLIENT_ID:-paperstack-backend}
  KAFKA_TOPIC: ${KAFKA_TOPIC:-document-processing}
  EMBEDDING_PROVIDER: ${EMBEDDING_PROVIDER:-fastembed}
  GEMINI_API_KEY: ${GEMINI_API_KEY:-}
x-app-depends: &app-depends
  kafka:
    condition: service_healthy
  mongo:
    condition: service_healthy
  qdrant:
    condition: service_started
Then `backend` merges `<<: *backend-env` and adds `PORT`, `JWT_*`, `AUTH_COOKIE_NAME`, `LLM_PROVIDER`, `GROQ_API_KEY`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/docker-compose.yml` around lines 88 - 137, The backend and consumer services duplicate nearly identical environment, depends_on, and volumes blocks; extract a shared anchor (e.g., x-backend-env: &backend-env) containing the common env vars (NODE_ENV, UPLOAD_DIR, MONGO_DB_URI, QDRANT_*, KAFKA_*, EMBEDDING_PROVIDER, GEMINI_API_KEY) and a shared depends anchor (e.g., x-app-depends: &app-depends) for the kafka/mongo/qdrant conditions, then merge them into each service using <<: *backend-env and <<: *app-depends, keeping service-specific keys by adding PORT, JWT_SECRET/JWT_EXPIRATION_TIME/AUTH_COOKIE_NAME and LLM_PROVIDER/GROQ_API_KEY only to the backend service and leaving consumer-specific differences as-is; apply the same anchor/merge refactor to the corresponding production compose file to eliminate duplication and reduce drift between the backend and consumer services.
105-106: Hardcoded host port is inconsistent with the prod compose.
`docker-compose.prod.yml` parameterizes this as `'${BACKEND_HOST_PORT:-8001}:8001'`, but here it is hardcoded. Consider aligning for consistency so developers can override the host port when 8001 is already in use locally.
Proposed change
 ports:
-  - '8001:8001'
+  - '${BACKEND_HOST_PORT:-8001}:8001'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/docker-compose.yml` around lines 105 - 106, The ports mapping currently uses a hardcoded entry `- '8001:8001'` (under the service's ports key); replace it with the same parameterized form used in production so local hosts can override the port, i.e. use '${BACKEND_HOST_PORT:-8001}:8001' for the ports mapping, and ensure BACKEND_HOST_PORT is documented or present in .env.example if you want developers to be able to override it easily.
docker-compose.prod.yml (1)
87-99: Frontend starts before backend HTTP server is ready; add backend healthcheck with `service_healthy` condition.
Currently, `depends_on: - backend` waits only for the container to start, not for Nest to boot and listen on port 8001. A browser request hitting the frontend immediately after `docker compose up` completes will fail on API calls until the backend finishes initialization. Add a healthcheck to the backend service (e.g., `curl -f http://localhost:8001/health`) and change frontend's dependency to `depends_on: backend: condition: service_healthy` for proper ordering.
Note: `VITE_API_BASE_URL` is baked into the SPA at build time; the frontend/dockerfile and README.md already document this, but be aware the image must be rebuilt if the API URL changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.prod.yml` around lines 87 - 99, Add a healthcheck to the backend service and make the frontend depend on the backend becoming healthy: modify the backend service definition to include a healthcheck that polls the Nest health endpoint (for example using curl -f http://localhost:8001/health with appropriate interval/retries/start-period) and set the frontend's depends_on to use the service_healthy condition (replace the simple "depends_on: - backend" with "depends_on: backend: condition: service_healthy"). Ensure the healthcheck command, and service names "backend" and "frontend" match the existing docker-compose keys so the frontend waits until the Nest server is ready before starting.
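One way the healthcheck and dependency above might look. It assumes the `/health` endpoint named in the review actually exists, and it probes with Node's built-in `fetch` rather than `curl`, since slim Node base images typically do not ship `curl`. The timing values are placeholders.

```yaml
# Sketch only: gate the frontend on a healthy backend.
services:
  backend:
    healthcheck:
      # Exits 0 on a 2xx response from the (assumed) /health endpoint, 1 otherwise.
      test:
        - CMD
        - node
        - -e
        - "fetch('http://127.0.0.1:8001/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 20s
  frontend:
    depends_on:
      backend:
        condition: service_healthy
```

If the backend image does include `curl`, the `curl -f` form from the comment is shorter and equivalent for this purpose.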
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/.env.example`:
- Around line 5-7: The MONGO_DB_URI uses ${VAR} expansion but
ConfigModule.forRoot() is not enabling dotenv variable expansion, so add
expandVariables: true to the ConfigModule.forRoot() calls (look for
ConfigModule.forRoot(...) in the AppModule and ConsumerModule — e.g., the
functions/classes named ConfigModule.forRoot in app.module.ts and
consumer.module.ts) so MONGO_DB_URI resolves ENV vars, or alternatively replace
the ${MONGO_INITDB_ROOT_USERNAME}/${MONGO_INITDB_ROOT_PASSWORD} placeholders
with actual credentials in the .env/.env.example so the URI is literal; ensure
MONGO_DB_URI and the MONGO_INITDB_ROOT_* vars remain consistent after the
change.
In `@backend/docker-compose.yml`:
- Around line 66-71: Add a Docker healthcheck to the qdrant service and update
backend and consumer to depend on qdrant with condition: service_healthy so they
wait until Qdrant is actually ready on :6333; specifically, add a HEALTHCHECK
that probes the /readyz HTTP endpoint (or a TCP/HTTP probe that validates Qdrant
readiness) to the qdrant service definition and change the backend and consumer
depends_on entries to use service_healthy instead of service_started; if the
official qdrant image lacks curl, implement the probe using a shell /dev/tcp
based check or build a minimal derivative image (FROM qdrant/qdrant + install
curl) or use a sidecar healthcheck container so the HEALTHCHECK can reliably
validate QDRANT_URL readiness.
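A sketch of the `/dev/tcp` variant mentioned above. It assumes `bash` is available in the Qdrant image, and it only confirms that something accepts connections on port 6333, which is a weaker signal than an HTTP probe of `/readyz`. The timing values are placeholders.

```yaml
# Sketch only: TCP-level readiness probe for Qdrant without curl.
services:
  qdrant:
    healthcheck:
      # Succeeds once a TCP connection to 6333 can be opened inside the container.
      test: ["CMD-SHELL", "bash -c ':> /dev/tcp/127.0.0.1/6333' || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 10s
  backend:
    depends_on:
      qdrant:
        condition: service_healthy
  consumer:
    depends_on:
      qdrant:
        condition: service_healthy
```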
In `@docker-compose.prod.yml`:
- Around line 18-19: The compose service currently publishes Kafka to the host
via the ports mapping '- '${KAFKA_HOST_PORT:-19092}:9092'', which exposes an
unauthenticated PLAINTEXT_HOST listener; either remove the ports: entry for the
kafka service entirely (so services use internal kafka:29092) or change the
mapping to bind only to loopback (e.g., bind to 127.0.0.1) to avoid exposing
Kafka on 0.0.0.0; apply the same change/consideration to the BACKEND_HOST_PORT
ports mapping used for the backend service to ensure no unintended public
exposure.
- Around line 16-17: Replace short-form depends_on entries with the long-form
service conditions and add healthchecks so services wait for readiness: add a
healthcheck block to the kafka service (and mongo if missing) that checks broker
availability (e.g., kafka-topics.sh or a TCP check on advertised port) and
change dependent services (backend, consumer, frontend) to use depends_on: {
kafka: { condition: service_healthy } } (and mongo: { condition: service_healthy
} where applicable); update all short-form depends_on occurrences referencing
kafka/mongo to the long-form with condition: service_healthy so containers are
gated by health rather than mere start.
- Around line 45-57: Replace bare interpolations that allow empty values with
Docker Compose fail-fast assertions for required secrets: change MONGO_DB_URI,
QDRANT_URL and JWT_SECRET in the service env block to use the ${VAR:?VAR is
required} form (e.g., ${JWT_SECRET:?JWT_SECRET is required}) so the process
fails at compose time if missing, and apply the same pattern to the consumer
service's MONGO_DB_URI and QDRANT_URL envs and to frontend build args like
VITE_API_BASE_URL; leave optional keys (e.g., QDRANT_API_KEY, GROQ_API_KEY,
GEMINI_API_KEY) as-is or document optionality.
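The loopback binding and fail-fast interpolation described in the findings above might be sketched as follows. Only the relevant keys are shown. Binding the backend to `127.0.0.1` assumes something on the host (a reverse proxy, for example) fronts it, which the review does not state.

```yaml
# Sketch only: loopback-only host ports and required variables.
services:
  kafka:
    ports:
      # Published on the host's loopback interface only, not 0.0.0.0.
      - '127.0.0.1:${KAFKA_HOST_PORT:-19092}:9092'
  backend:
    ports:
      - '127.0.0.1:${BACKEND_HOST_PORT:-8001}:8001'
    environment:
      # Compose refuses to start when any of these is unset or empty.
      MONGO_DB_URI: ${MONGO_DB_URI:?MONGO_DB_URI is required}
      QDRANT_URL: ${QDRANT_URL:?QDRANT_URL is required}
      JWT_SECRET: ${JWT_SECRET:?JWT_SECRET is required}
      # Optional key: an empty default keeps it unset without failing.
      QDRANT_API_KEY: ${QDRANT_API_KEY:-}
```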
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b0559016-5712-4882-b541-2b4886333ef5
📒 Files selected for processing (7)
- .env.prod.example
- backend/.env.example
- backend/docker-compose.yml
- docker-compose.prod.yml
- frontend/.dockerignore
- frontend/dockerfile
- frontend/nginx.conf
✅ Files skipped from review due to trivial changes (3)
- frontend/nginx.conf
- frontend/.dockerignore
- frontend/dockerfile
Summary by CodeRabbit