
Commit 6f19648

feat: add doccano-django sample for keploy postgres-v3 bind regression
Minimum reproducer for the polymorphic-resourcetype failure that motivated keploy/integrations#177. The sample wraps doccano v1.8.5 + django-rest-polymorphic + postgres 13.3, the same stack on which the bug originally surfaced (keploy/enterprise PRs #1889 and #1964, pipelines 3556 / 3572).

The sample is a thin orchestration layer around the upstream doccano backend image. The upstream version pin lives in this directory's Dockerfile, so future doccano releases that change the bug-triggering shape are addressed by retagging here, not by hunting through lane scripts in the keploy CI tree.

Contents per the keploy-ci-debug skill anatomy:

* `Dockerfile`: `FROM doccano/doccano:backend`, version pin in comments. The wrapper exists so a future patch (e.g. backporting a doccano fix that breaks the bug repro) is a one-line change in this repo.
* `docker-compose.yml`: postgres-13.3-alpine + the sample's doccano image on a fixed subnet, env-driven so the lane scripts can override port / IP / network as needed. Two-phase boot (DOCCANO_SKIP_BOOTSTRAP=0 → migrations + admin user; volume retained; DOCCANO_SKIP_BOOTSTRAP=1 → gunicorn-only against the populated volume) so record/replay see a deterministic DB state.
* `flow.sh`: two subcommands. `bootstrap` logs in as admin and installs a fixed authtoken_token row so record-time and replay-time API calls share the same Authorization header (without this, every replay run would diff on a fresh random token). `record-traffic` drives ~15 HTTP calls after a 16-request worker warmup: POST a polymorphic TextClassificationProject, GET it back, PATCH it, plus dependent category-types / examples / metrics reads that exercise the multi-bind django_content_type lookups the bug FIFO-collapses.
* `README.md`: bug shape, how to run locally, and pointers to the CI lanes that consume the sample.

Lanes that will pin to this sample in the next two PRs:

* keploy/integrations `.woodpecker/doccano-postgres.yml` (three-way matrix, depends-on prepare-and-run).
* keploy/enterprise `.woodpecker/doccano-linux.yml` (already exists; will be migrated from inline compose generation to cloning this sample once the integrations lane lands).

Signed-off-by: Akash Kumar <meakash7902@gmail.com>
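
The fixed token is written into `authtoken_token.key`, which in Django REST framework's `authtoken` app is a 40-character column; keys DRF generates itself are 40 lowercase hex digits (20 random bytes, hex-encoded). A minimal pre-flight sketch, assuming bash and a hypothetical helper `validate_fixed_token` that is not part of `flow.sh`, lets a lane reject a mistyped `DOCCANO_FIXED_TOKEN` override before `bootstrap` runs. It is deliberately stricter than the database: an over-long value would already fail the psql UPDATE with a length error, but a short one would be silently accepted.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of flow.sh): confirm that a
# token has the shape DRF generates for authtoken keys before it is
# written into authtoken_token.key.
validate_fixed_token() {
  local token="$1"
  # Explicit character list instead of [a-f] avoids locale-dependent ranges.
  local re='^[0123456789abcdef]{40}$'
  if [[ "$token" =~ $re ]]; then
    return 0
  fi
  echo "DOCCANO_FIXED_TOKEN must be 40 lowercase hex characters (got ${#token})" >&2
  return 1
}
```

Usage in a lane script: `validate_fixed_token "${DOCCANO_FIXED_TOKEN:-ac38262065f0ae1476b6a707d9d697a101764a6b}" && ./flow.sh bootstrap`.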
1 parent 57856de commit 6f19648

4 files changed

Lines changed: 374 additions & 0 deletions


doccano-django/Dockerfile

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Thin wrapper around doccano's official backend image at the version
# this sample tracks. Pinning here (rather than in each lane script
# under keploy/integrations / keploy/enterprise) means a future
# doccano release that changes the bug-triggering shape is a one-line
# retag in this repo, not a hunt across the CI tree.
#
# Upstream tag: doccano/doccano:backend (the rolling backend tag)
# Source pin:   doccano/doccano @ v1.8.5
#               https://github.com/doccano/doccano/releases/tag/v1.8.5
#
# Note: the FROM line below uses the rolling tag, so the v1.8.5 pin is
# documentary; a later push to :backend changes what this builds. To
# make the pin binding, append the image digest to the tag
# (FROM doccano/doccano:backend@sha256:<digest>).
#
# v1.8.5 was the version exercised on keploy/enterprise pipeline 3556
# (PR #1889) and pipeline 3572 (PR #1964 minimal repro), where the
# bug originally manifested.
FROM doccano/doccano:backend
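
Because `FROM doccano/doccano:backend` follows a rolling tag, one way to make the v1.8.5 pin binding is to resolve the tag to a content digest once (for example with `docker buildx imagetools inspect doccano/doccano:backend`) and write it into the FROM line, since `image:tag@sha256:…` is valid FROM syntax. The sketch below assumes bash and a hypothetical helper `pin_from_digest`; it only rewrites text and does not contact a registry.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite FROM lines of the form "FROM image[:tag]"
# or "FROM image[:tag]@sha256:<old>" to "FROM image[:tag]@<digest>".
# Every matching FROM line is rewritten with the same digest, so this
# suits single-stage Dockerfiles like this one; "AS name" and --flags
# are not handled.
pin_from_digest() {
  local dockerfile="$1" digest="$2"
  local re='^sha256:[0123456789abcdef]{64}$'
  if [[ ! "$digest" =~ $re ]]; then
    echo "expected sha256:<64 hex chars>, got: ${digest}" >&2
    return 1
  fi
  # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
  # Any existing @sha256 suffix is replaced, so re-running is idempotent.
  sed -i.bak -E \
    "s|^FROM ([^@[:space:]]+)(@sha256:[0123456789abcdef]{64})?[[:space:]]*\$|FROM \1@${digest}|" \
    "$dockerfile" && rm -f "${dockerfile}.bak"
}
```

Usage: `pin_from_digest doccano-django/Dockerfile "sha256:<digest from the registry>"`, then commit the rewritten FROM line alongside the existing version comment.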

doccano-django/README.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# doccano-django: keploy postgres-v3 simple-Query bind regression sample

Minimal reproducer for the doccano polymorphic-resourcetype failure
that motivated [keploy/integrations#177](https://github.com/keploy/integrations/pull/177)
("fix(postgres-v3): extract simple-Query literals into bindValues").

The sample wraps doccano (Django + django-rest-polymorphic + psycopg2)
at version `v1.8.5` against postgres `13.3-alpine`. The shape under
test is a polymorphic Django model (`Project`, with subclass
`TextClassificationProject`) that is created over the REST API and
re-read through DRF's polymorphic queryset. Without the integrations
fix, every `SELECT … FROM django_content_type WHERE app_label = $1 AND model = $2`
at replay returns the same recorded mock: when the bind signature is
empty, the matcher's `pickSessionFallback` collapses every variant
onto the first recording in FIFO order. The polymorphic serializer
then can't resolve the project's subclass, and `resourcetype` flips
from `"TextClassificationProject"` to `"Project"`.
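
A toy model of that collapse, in bash, may make the mechanism concrete. It is not keploy's matcher (which lives in keploy/integrations); `match_fifo`, `match_by_binds` and the recorded entries are hypothetical, and the app labels are illustrative. Three recordings share one normalized query shape; with an empty bind signature every replayed lookup gets the first recording, while keying on extracted literals returns the matching one.

```bash
#!/usr/bin/env bash
# Toy model only. Recordings of one normalized query shape,
#   SELECT ... FROM django_content_type WHERE app_label = $1 AND model = $2
# kept in recording order as "bind-signature|response".
recorded=(
  "projects,project|projects.project"
  "projects,textclassificationproject|projects.textclassificationproject"
  "examples,example|examples.example"
)

# Without extracted literals every recording has the same (empty) bind
# signature, so the fallback hands out the first recording whatever
# app_label/model the replayed query asked for.
match_fifo() {
  local first="${recorded[0]}"
  printf '%s\n' "${first#*|}"
}

# With literals extracted into bind values the signature discriminates,
# so the lookup returns the recording whose binds match.
match_by_binds() {
  local want="$1,$2" entry
  for entry in "${recorded[@]}"; do
    if [[ "${entry%%|*}" == "$want" ]]; then
      printf '%s\n' "${entry#*|}"
      return 0
    fi
  done
  return 1
}
```

`match_fifo projects textclassificationproject` answers with the base `projects.project` row, which mirrors the `resourcetype` flip to `"Project"`; `match_by_binds` with the same arguments returns the subclass row.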

The bug is in keploy's recorder and replayer simple-Query path; doccano
is just the vehicle. The same pattern would reproduce on any Django app
that:

* Uses a polymorphic ORM (django-polymorphic / django-rest-polymorphic).
* Sends parameterised reads via psycopg2's simple-Query mode
  (literals interpolated into the SQL text rather than carried in a
  separate Bind packet).
* Exercises the polymorphic queryset across multiple HTTP requests
  against the same recorded backend.
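
What "extract simple-Query literals into bindValues" means can be sketched in a few lines of bash: walk the SQL text and emit the single-quoted literals in order, so two queries that differ only in their literals get different bind signatures. This is an illustration under simplifying assumptions, not the parser from keploy/integrations#177; `extract_literals` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each single-quoted string literal in a
# simple-Query SQL text, one per line, in order of appearance.
# Simplifications: a doubled '' inside a literal is an escaped quote;
# numeric literals are not extracted; backslash escapes in E'...'
# strings, dollar quoting and SQL comments are not handled; an
# unterminated literal is dropped.
extract_literals() {
  local sql="$1" lit="" in_str=0 i ch
  for (( i = 0; i < ${#sql}; i++ )); do
    ch="${sql:i:1}"
    if (( in_str )); then
      if [[ "$ch" == "'" ]]; then
        if [[ "${sql:i+1:1}" == "'" ]]; then
          lit+="'"                # '' inside a literal: keep one quote
          i=$(( i + 1 ))          # and skip the second
        else
          printf '%s\n' "$lit"    # closing quote: one bind value
          lit="" in_str=0
        fi
      else
        lit+="$ch"
      fi
    elif [[ "$ch" == "'" ]]; then
      in_str=1
    fi
  done
}
```

For `SELECT id FROM django_content_type WHERE app_label = 'projects' AND model = 'textclassificationproject'` this prints `projects` and `textclassificationproject` on separate lines, which is the discriminating signature the FIFO fallback lacked.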

## What's in here

* `Dockerfile`: thin wrapper around `doccano/doccano:backend` pinning
  the upstream version this sample tracks. Future doccano releases
  that change the bug-triggering shape are addressed by retagging
  here, not by scattering version pins across the lane scripts in
  `keploy/integrations` / `keploy/enterprise`.
* `docker-compose.yml`: the orchestration, with postgres-13 alongside
  the doccano backend on a fixed subnet so the lane scripts can
  rely on stable IPs across record/replay phases.
* `flow.sh`: the minimum reproducer traffic, ~15 HTTP calls after a
  16-request worker warmup. It POSTs `/v1/projects` (creating a
  `TextClassificationProject`), then does GET list / GET single /
  PATCH single and a few dependent reads. The GET / PATCH responses
  are what diverge under the bug; the POST passes either way because
  the in-memory subclass instance shapes the response without
  consulting the DB.
* `keploy.yml.template` (not included in this commit): keploy config
  skeleton (proxy port, DNS port, container name placeholders) that
  lane scripts in `keploy/integrations` and `keploy/enterprise`
  `envsubst` into a per-job copy.

## Running locally

```sh
# Phase 1: bring doccano up (migrations + admin user) and install the
# fixed admin token.
docker compose up -d
./flow.sh bootstrap

# Stop the stack but keep the named DB volume (no -v), then switch the
# backend to gunicorn-only mode for the record and replay runs.
docker compose down
export DOCCANO_SKIP_BOOTSTRAP=1

# Record
keploy record \
  -c "docker compose up" \
  --container-name doccano_backend \
  --proxy-port 18081 --dns-port 18082

# (in another shell, while keploy record is up)
./flow.sh record-traffic
# → SIGINT keploy when traffic returns

# Replay
keploy test \
  -c "docker compose up" \
  --container-name doccano_backend \
  --apiTimeout 60 --delay 20 \
  --proxy-port 18081 --dns-port 18082
```
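
The compose file's two-phase boot can also be written as a small function that a lane script might call. This is a sketch under stated assumptions: the `run` wrapper and the `DRY_RUN` switch are hypothetical (not part of the sample) and exist so the sequence can be printed without Docker. The key detail is that `docker compose down` without `-v` keeps the `doccano-db-data` named volume, which carries the migrated state into phase 2.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN=1, run it otherwise.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Phase 1 boots with migrations and installs the fixed token; the stack
# is then stopped without -v, so the named volume survives. Phase 2's
# DOCCANO_SKIP_BOOTSTRAP=1 is exported for the keploy record/test runs,
# which start the stack themselves via -c "docker compose up".
two_phase_boot() {
  run env DOCCANO_SKIP_BOOTSTRAP=0 docker compose up -d
  run ./flow.sh bootstrap
  run docker compose down
  export DOCCANO_SKIP_BOOTSTRAP=1
}
```

`DRY_RUN=1 two_phase_boot` prints the three phase-1 commands; without `DRY_RUN` it executes them and leaves `DOCCANO_SKIP_BOOTSTRAP=1` exported in the calling shell.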

Expected outcome with the integrations fix in place: 0 failures, with
`is_text_project: true` / `resourcetype: "TextClassificationProject"`
in every project-read response.

Expected outcome **without** the fix: the tests covering GET-after-POST
project reads fail with `is_text_project: true → false` and
`resourcetype: "TextClassificationProject" → "Project"`.
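
When triaging a replay report or a captured response body, a rough bash check of the `resourcetype` field is enough to tell the two outcomes apart. The helper name `classify_project_response` is hypothetical, and it is regex-based rather than a JSON parser, so it only inspects the first `resourcetype` it finds; `jq -r '.resourcetype'` (flow.sh already depends on jq) is the sturdier option for single-object bodies.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: classify one project-read response body by
# its resourcetype field. Regex-based, not a JSON parser; only the first
# occurrence of the field is inspected.
classify_project_response() {
  local body="$1"
  local re='"resourcetype"[[:space:]]*:[[:space:]]*"([A-Za-z]+)"'
  if [[ "$body" =~ $re ]]; then
    case "${BASH_REMATCH[1]}" in
      TextClassificationProject) echo ok ;;   # subclass resolved
      Project) echo bug ;;                    # collapsed onto the base class
      *) echo "other:${BASH_REMATCH[1]}" ;;
    esac
  else
    echo missing
    return 1
  fi
}
```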

## CI lanes that consume this sample

* `keploy/integrations`: `.woodpecker/doccano-postgres.yml` /
  `.ci/scripts/python/doccano/doccano-linux.sh`. Three-way matrix
  (record-build × replay-build, record-latest × replay-build,
  record-build × replay-latest); the cross-binary cells stay red
  until both keploy releases pick up the bind-extraction fix.
* `keploy/enterprise`: `.woodpecker/doccano-linux.yml` /
  `.ci/scripts/doccano-linux.sh`. Same three-way matrix wired to
  the enterprise compat-matrix harness.

Both clone this directory at the branch / tag pinned by the
respective lane script.
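
The three matrix cells and their expected status can be enumerated as record/replay binary pairs. This sketch simplifies the release condition to a single assumed switch, `LATEST_HAS_FIX` (hypothetical, not a variable the lanes define), meaning the released keploy binary includes the bind-extraction fix; until it does, only the build × build cell is expected to pass.

```bash
#!/usr/bin/env bash
# Hypothetical enumeration of the three-way matrix. LATEST_HAS_FIX=1 is an
# assumed switch meaning the released binary ships the fix.
matrix_cells() {
  local cell record replay expect
  for cell in build:build latest:build build:latest; do
    record="${cell%%:*}"
    replay="${cell#*:}"
    if [[ "$cell" == build:build || "${LATEST_HAS_FIX:-0}" == 1 ]]; then
      expect=green
    else
      expect=red
    fi
    printf 'record=%s replay=%s expect=%s\n' "$record" "$replay" "$expect"
  done
}
```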

## Related

* [keploy/integrations#177](https://github.com/keploy/integrations/pull/177): the fix this sample exercises (its tests fail without it).
* [keploy/enterprise#1889](https://github.com/keploy/enterprise/pull/1889): original failing PR where the bug surfaced.
* [django-rest-polymorphic](https://github.com/apirobot/django-rest-polymorphic): the upstream library whose serialisation path the bug breaks.

doccano-django/docker-compose.yml

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# doccano-django sample compose. Postgres + doccano backend on a
# fixed subnet so the lane scripts in keploy/integrations and
# keploy/enterprise can pin the DB IP without runtime discovery.
#
# Two-phase boot pattern (used by the lane scripts but valid for
# local runs too):
#
#   1. DOCCANO_SKIP_BOOTSTRAP=0 → backend runs migrations and creates
#      the admin user; `flow.sh bootstrap` then installs the fixed auth
#      token. Once that returns we `compose down` the stack but keep
#      the named volume so the DB state persists.
#   2. DOCCANO_SKIP_BOOTSTRAP=1 → backend re-launches in
#      gunicorn-only mode against the populated volume; recording and
#      replay run against this incarnation.
#
# The split is what gives the lane a deterministic DB starting state
# without paying the migration cost on every record/replay invocation.
services:
  backend:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: doccano_backend
    init: true
    stop_grace_period: 5s
    ports:
      - "${DOCCANO_APP_PORT:-18080}:8000"
    environment:
      ADMIN_USERNAME: ${DOCCANO_ADMIN_USER:-admin}
      ADMIN_PASSWORD: ${DOCCANO_ADMIN_PASSWORD:-password}
      ADMIN_EMAIL: ${DOCCANO_ADMIN_EMAIL:-admin@example.com}
      DATABASE_URL: postgres://doccano:doccano@${DOCCANO_DB_IP:-172.34.0.10}:5432/doccano?sslmode=disable
      ALLOW_SIGNUP: "False"
      DEBUG: "False"
      DJANGO_SETTINGS_MODULE: config.settings.production
      DOCCANO_SKIP_BOOTSTRAP: "${DOCCANO_SKIP_BOOTSTRAP:-0}"
    depends_on:
      postgres:
        condition: service_healthy
    networks:
      - doccano-net

  postgres:
    image: postgres:13.3-alpine
    container_name: doccano_db
    stop_grace_period: 5s
    environment:
      POSTGRES_USER: doccano
      POSTGRES_PASSWORD: doccano
      POSTGRES_DB: doccano
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U doccano -d doccano"]
      interval: 5s
      timeout: 5s
      retries: 20
    volumes:
      - doccano-db-data:/var/lib/postgresql/data
    networks:
      doccano-net:
        ipv4_address: ${DOCCANO_DB_IP:-172.34.0.10}

networks:
  doccano-net:
    driver: bridge
    ipam:
      config:
        - subnet: ${DOCCANO_NETWORK_SUBNET:-172.34.0.0/24}

volumes:
  doccano-db-data:
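
Two checks are worth running when a lane overrides `DOCCANO_NETWORK_SUBNET` or `DOCCANO_DB_IP`: the static database IP has to fall inside the bridge subnet (otherwise Docker refuses the `ipv4_address`), and it helps to know whether the range is private. The default 172.34.0.0/24 lies outside the RFC 1918 block 172.16.0.0/12 (172.16.0.0 to 172.31.255.255), so on a host that routes to real 172.34.x.x addresses the bridge would shadow them. A sketch with hypothetical helpers `ip_to_int`, `ip_in_cidr` and `is_rfc1918`:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking subnet/IP overrides. No input
# validation: arguments are assumed to be well-formed dotted quads and
# NET/PREFIX strings without leading zeros.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_cidr IP NET/PREFIX: succeeds when IP lies inside NET/PREFIX.
ip_in_cidr() {
  local ip net prefix mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix="${2#*/}"
  # bash arithmetic is 64-bit, so shifting then masking to 32 bits is safe.
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# RFC 1918 private IPv4 ranges.
is_rfc1918() {
  ip_in_cidr "$1" 10.0.0.0/8 ||
    ip_in_cidr "$1" 172.16.0.0/12 ||
    ip_in_cidr "$1" 192.168.0.0/16
}
```

Usage: `ip_in_cidr "${DOCCANO_DB_IP:-172.34.0.10}" "${DOCCANO_NETWORK_SUBNET:-172.34.0.0/24}" || echo "DB IP outside subnet" >&2`.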

doccano-django/flow.sh

Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
#!/usr/bin/env bash
#
# Minimum-reproducer traffic for the keploy postgres-v3 simple-Query
# bind regression on doccano. Two subcommands:
#
#   bootstrap        Log in as admin and replace the random
#                    authtoken_token row with a fixed token so
#                    record-time and replay-time API calls share
#                    the same Authorization header. Runs once,
#                    against the DOCCANO_SKIP_BOOTSTRAP=0 launch.
#                    Optional argument: timeout in seconds
#                    (default 180).
#   record-traffic   Drive the actual recording: POST a polymorphic
#                    project, read it back via the list and detail
#                    endpoints, PATCH it, plus a few dependent
#                    reads. The GET / PATCH responses are what
#                    diverge under the bug.
#
# Inputs (all overridable; defaults match the docker-compose.yml in
# this directory):
#
#   DOCCANO_APP_PORT        host-side port the backend is exposed on
#   DOCCANO_ADMIN_USER      admin login (set on first boot)
#   DOCCANO_ADMIN_PASSWORD  admin password
#   DOCCANO_FIXED_TOKEN     deterministic auth token to install
#   DOCCANO_DB_CONTAINER    postgres container name (for psql)
#   DOCCANO_PHASE           label spliced into the project name so
#                           record/replay phase logs are
#                           distinguishable; safe to omit for local
#                           runs.
set -Eeuo pipefail

DOCCANO_APP_PORT="${DOCCANO_APP_PORT:-18080}"
DOCCANO_ADMIN_USER="${DOCCANO_ADMIN_USER:-admin}"
DOCCANO_ADMIN_PASSWORD="${DOCCANO_ADMIN_PASSWORD:-password}"
DOCCANO_FIXED_TOKEN="${DOCCANO_FIXED_TOKEN:-ac38262065f0ae1476b6a707d9d697a101764a6b}"
DOCCANO_DB_CONTAINER="${DOCCANO_DB_CONTAINER:-doccano_db}"
DOCCANO_PHASE="${DOCCANO_PHASE:-local}"

base="http://127.0.0.1:${DOCCANO_APP_PORT}"
h_token="Authorization: Token ${DOCCANO_FIXED_TOKEN}"
h_json='Content-Type: application/json'
41+
# Login + fixed-token install. Deterministic auth header is what lets
42+
# the recorded HTTP test cases match at replay — without it, every
43+
# replay run would carry a fresh random token in the headers and the
44+
# matcher would diff on the Authorization line.
45+
doccano_bootstrap_token() {
46+
local timeout=${1:-180}
47+
local start_ts
48+
start_ts=$(date +%s)
49+
50+
while true; do
51+
local code
52+
code=$(curl -sS -o /tmp/doccano-login.json -w '%{http_code}' \
53+
-H 'Content-Type: application/json' \
54+
-X POST "${base}/v1/auth/login/" \
55+
-d "{\"username\":\"${DOCCANO_ADMIN_USER}\",\"password\":\"${DOCCANO_ADMIN_PASSWORD}\"}" || true)
56+
if [ "$code" = "200" ] && jq -e '.key' /tmp/doccano-login.json >/dev/null 2>&1; then
57+
break
58+
fi
59+
if [ $(( $(date +%s) - start_ts )) -ge "$timeout" ]; then
60+
echo "Timed out waiting for doccano login (last code: ${code})" >&2
61+
cat /tmp/doccano-login.json >&2 || true
62+
return 1
63+
fi
64+
sleep 2
65+
done
66+
67+
docker exec -i "$DOCCANO_DB_CONTAINER" psql -U doccano -d doccano -v ON_ERROR_STOP=1 <<SQL
68+
UPDATE authtoken_token
69+
SET key='${DOCCANO_FIXED_TOKEN}'
70+
WHERE user_id=(SELECT id FROM auth_user WHERE username='${DOCCANO_ADMIN_USER}');
71+
SQL
72+
73+
# Confirm the fixed token is live before returning.
74+
start_ts=$(date +%s)
75+
while true; do
76+
local code
77+
code=$(curl -sS -o /tmp/doccano-me.json -w '%{http_code}' \
78+
-H "$h_token" "${base}/v1/me" || true)
79+
if [ "$code" = "200" ] && jq -e ".username == \"${DOCCANO_ADMIN_USER}\"" /tmp/doccano-me.json >/dev/null 2>&1; then
80+
return 0
81+
fi
82+
if [ $(( $(date +%s) - start_ts )) -ge "$timeout" ]; then
83+
echo "Timed out waiting for fixed token (last code: ${code})" >&2
84+
cat /tmp/doccano-me.json >&2 || true
85+
return 1
86+
fi
87+
sleep 2
88+
done
89+
}

# Record traffic: hits exactly the endpoints whose responses diverge
# under the bug, plus the dependent reads needed to make the
# polymorphic resolver fire its multi-bind django_content_type lookups
# (the actual root cause this sample falsifies).
doccano_record_traffic() {
  local project_resp project_id
  local label_resp label_id
  local example_resp example_id
  local p

  # Worker-cache warmup. doccano runs 4 gunicorn workers; each worker
  # keeps its own per-process Django ContentType cache and populates it
  # lazily on the first polymorphic-resolver query it handles.
  # Recording lanes that terminate with SIGINT (rather than waiting on
  # a long --record-timer) need every worker's cache warmed before the
  # explicit test traffic fires. Otherwise cold workers issue their own
  # django_content_type lookups at replay time, find empty perTest
  # cohorts, and the dependent endpoints return HTTP 500.
  #
  # 16 calls (4 workers × 4 requests) are meant to absorb gunicorn's
  # uneven request dispatch. /v1/me is the cheapest authenticated
  # endpoint and exercises the same auth-token + ContentType chain as
  # the real test calls, so each call warms the cache of whichever
  # worker serves it.
  local warm_idx
  for warm_idx in $(seq 1 16); do
    curl -sS -H "$h_token" "$base/v1/me" >/dev/null 2>&1 || true
  done

  curl -sS -H "$h_token" "$base/v1/users" >/dev/null || true
  curl -sS "$base/v1/health/" >/dev/null || true

  # POST a polymorphic project. resourcetype="TextClassificationProject"
  # is the polymorphic discriminator that django-rest-polymorphic uses
  # to instantiate the right subclass. The bug shows up on the GET /
  # PATCH side, not on this POST: the in-memory subclass instance
  # shapes the response without consulting the DB.
  project_resp=$(curl -fsS -H "$h_token" -H "$h_json" -X POST "$base/v1/projects" \
    -d "{\"name\":\"keploy-${DOCCANO_PHASE}-project\",\"project_type\":\"DocumentClassification\",\"description\":\"sample project\",\"guideline\":\"label the text\",\"resourcetype\":\"TextClassificationProject\"}")
  project_id=$(printf '%s' "$project_resp" | jq -r '.id')
  # Explicit check: under `set -e`, a bare `test && test` list does not
  # abort the script when its first test fails.
  if [ -z "$project_id" ] || [ "$project_id" = "null" ]; then
    echo "project creation returned no id: ${project_resp}" >&2
    return 1
  fi

  p="$base/v1/projects/${project_id}"

  # The reads that fail under the bug: GET list, GET single and PATCH
  # single all return resourcetype="Project" instead of
  # "TextClassificationProject", because the polymorphic queryset can't
  # resolve the subclass without working bind discrimination on
  # django_content_type.
  curl -sS -H "$h_token" "$base/v1/projects" >/dev/null || true
  curl -sS -H "$h_token" "$p" >/dev/null || true
  curl -sS -H "$h_token" -H "$h_json" -X PATCH "$p" \
    -d '{"description":"updated by sample"}' >/dev/null || true

  # Dependent reads: exercise the polymorphic resolver on the nested
  # resources so the recorded cohort contains multiple variants of the
  # django_content_type lookup. Without these, the recording wouldn't
  # capture the multi-bind shape and the falsifying half of the matrix
  # would have nothing to fail on.
  curl -sS -H "$h_token" "$p/my-role" >/dev/null || true
  curl -sS -H "$h_token" "$p/members" >/dev/null || true

  label_resp=$(curl -sS -H "$h_token" -H "$h_json" -X POST "$p/category-types" \
    -d '{"text":"positive","background_color":"#00ff00","text_color":"#ffffff"}' 2>/dev/null || true)
  label_id=$(jq -r '.id // empty' <<<"$label_resp" 2>/dev/null || true)
  curl -sS -H "$h_token" "$p/category-types" >/dev/null || true

  example_resp=$(curl -fsS -H "$h_token" -H "$h_json" -X POST "$p/examples" \
    -d '{"text":"Keploy CI sample text","meta":{"source":"sample"}}')
  example_id=$(jq -r '.id' <<<"$example_resp")
  if [ -n "$example_id" ] && [ "$example_id" != "null" ]; then
    curl -sS -H "$h_token" "$p/examples/${example_id}" >/dev/null || true
    if [ -n "$label_id" ]; then
      curl -sS -H "$h_token" -H "$h_json" -X POST "$p/examples/${example_id}/categories" \
        -d "{\"label\":${label_id}}" >/dev/null || true
    fi
  fi

  # Metrics endpoints: additional polymorphic queries.
  curl -sS -H "$h_token" "$p/metrics/progress" >/dev/null || true
  curl -sS -H "$h_token" "$p/metrics/member-progress" >/dev/null || true
}

case "${1:-}" in
  bootstrap)
    doccano_bootstrap_token "${2:-180}"
    ;;
  record-traffic)
    doccano_record_traffic
    ;;
  *)
    echo "usage: $0 {bootstrap [timeout-seconds]|record-traffic}" >&2
    exit 2
    ;;
esac
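
The warmup loop's 16 `/v1/me` calls are meant to reach all 4 gunicorn workers. One way to size that loop is a coupon-collector estimate, under the simplifying assumption that each request lands on a uniformly random worker (real `accept()` dispatch is not uniform and can favour recently active workers, so treat the numbers as a rough guide). Under that assumption, 16 calls leave at least one of 4 workers cold roughly 4% of the time, and about 29 calls bring that below 0.1%. The helpers below are hypothetical and not part of `flow.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sizing helpers for the warmup loop. Assumption: uniform
# random dispatch of requests to workers. Inclusion-exclusion:
#   miss(w, n) = sum_{j=1..w} (-1)^(j+1) * C(w, j) * (1 - j/w)^n
_miss_awk='function miss(w, n,    j, c, p, term) {
  p = 0; c = 1
  for (j = 1; j <= w; j++) {
    c = c * (w - j + 1) / j        # running binomial coefficient C(w, j)
    term = c * (1 - j / w) ^ n
    p += (j % 2 == 1) ? term : -term
  }
  return p
}'

# Probability that at least one of W workers receives none of N requests.
warmup_miss_probability() {
  awk -v w="$1" -v n="$2" "$_miss_awk"' BEGIN { printf "%.6f\n", miss(w, n) }'
}

# Smallest request count N whose miss probability is at most TARGET.
warmup_calls_for() {
  awk -v w="$1" -v t="$2" "$_miss_awk"' BEGIN {
    for (n = w; n <= 1000; n++) if (miss(w, n) <= t) { print n; exit }
    exit 1
  }'
}
```

Usage: `warmup_miss_probability 4 16` for the current loop, or `warmup_calls_for 4 0.001` to pick a loop length for a target miss rate.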
