[Docs] Update the `ARCHITECTURE.md`, `METHODOLOGY.md`, etc. files to reflect current architecture by nojibe · Pull Request #18 · weval-org/app

nojibe · 2026-05-05T13:22:10Z

Description

Realigns ARCHITECTURE.md, METHODOLOGY.md, BLUEPRINT_FORMAT.md, and INTER_AGREEMENT_PLAN.md with the current system. Several diagrams and sections had drifted from the code.

Summary

`ARCHITECTURE.md`

Automated evaluations now attributed to GitHub Actions + Railway (was Netlify), in both the diagram and the "Automated Workflow Components" section.
Storage diagram: adds workshop/ and pr-evals/, plus ndeltas/, vibes/, and compass/ under live/models/, with descriptions.
Documents the previously undocumented api/internal/ background routes (execute-pr-evaluation-background, execute-api-evaluation-background, generate-pairs-background, …).
Interactive workflow diagram: splits sandbox polling into its own Status API node, corrects the S3 path /sandbox-runs/ → live/sandbox/runs/, and corrects the dispatch edge label: callBackgroundFunction awaits a response for up to 30s; if that window is exceeded, Railway continues running the handler to completion.
Pipeline diagram: replaces "5-point scale" with "10-class experimental ordinal scale"; links METHODOLOGY.md.
Fixes search_index.json → search-index.json (matches SEARCH_INDEX_FILENAME constant in storageService.ts).
Clarifies that calculateHybridScore is defined in calculationUtils.ts; summaryCalculationUtils.ts orchestrates it but does not define it.
Removes redundant prose: one-line descriptions that repeated adjacent headings, and a core.json lazy-loading paragraph that duplicated the Section 4 architectural concept.
Fixes markdownlint violations (MD022, MD007, MD032): blank lines around headings/lists, 2-space list indent.
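
For reviewers, the corrected dispatch semantics (await up to a timeout, but do not cancel the work) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the actual `callBackgroundFunction` implementation: `awaitWithTimeout` and `DispatchOutcome` are hypothetical names, and only the 30s figure comes from the description above.

```typescript
// Hypothetical sketch: the caller waits for a bounded window, while the
// underlying work keeps running even after the caller stops waiting.
type DispatchOutcome<T> =
  | { status: "completed"; value: T }
  | { status: "timed_out" };

async function awaitWithTimeout<T>(
  work: Promise<T>,
  timeoutMs: number,
): Promise<DispatchOutcome<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<DispatchOutcome<T>>((resolve) => {
    timer = setTimeout(() => resolve({ status: "timed_out" }), timeoutMs);
  });
  try {
    // Whichever settles first wins. Losing the race does NOT cancel `work`;
    // it simply means the caller is no longer waiting on it.
    // If `work` rejects before the timeout, the rejection propagates.
    return await Promise.race([
      work.then((value): DispatchOutcome<T> => ({ status: "completed", value })),
      timeout,
    ]);
  } finally {
    // Avoid leaving a pending timer behind once the race is decided.
    clearTimeout(timer);
  }
}
```

With a 30s window this would be called as `awaitWithTimeout(dispatch, 30_000)`, where `dispatch` stands in for whatever promise the real dispatcher returns. A `timed_out` outcome only says the caller gave up waiting, which is why "fire-and-forget" was an inaccurate label.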

`METHODOLOGY.md`

Updates the experimental classification scale to the current 10-class ordinal.
Krippendorff's α note now references the active scale instead of hardcoding the old values.
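
To illustrate why the α note should follow the active scale rather than hardcoded values, here is a rough sketch of the textbook ordinal Krippendorff's α, parameterised by the number of classes. It is an assumption-laden illustration, not the repository's implementation: `krippendorffAlphaOrdinal`, the `ratings` layout, and `numClasses` are hypothetical names, and class labels are assumed to be 0-based rank indices.

```typescript
// Hypothetical sketch of ordinal Krippendorff's alpha.
// ratings[u][j] = class index (0..numClasses-1) that judge j gave to unit u.
function krippendorffAlphaOrdinal(ratings: number[][], numClasses: number): number {
  // Coincidence matrix: o[c][k] counts ordered pairs (c, k) within a unit,
  // each weighted by 1 / (m - 1) where m is the number of ratings in the unit.
  const o: number[][] = Array.from({ length: numClasses }, () =>
    new Array<number>(numClasses).fill(0),
  );
  for (const unit of ratings) {
    const m = unit.length;
    if (m < 2) continue; // a single rating cannot be paired
    for (let i = 0; i < m; i++) {
      for (let j = 0; j < m; j++) {
        if (i !== j) o[unit[i]][unit[j]] += 1 / (m - 1);
      }
    }
  }

  // Marginal totals per class and overall number of pairable ratings.
  const nc = o.map((row) => row.reduce((a, b) => a + b, 0));
  const n = nc.reduce((a, b) => a + b, 0);

  // Ordinal distance: squared count of ratings lying between two ranks,
  // with the two endpoint classes counted at half weight.
  const delta2 = (c: number, k: number): number => {
    if (c === k) return 0;
    const lo = Math.min(c, k);
    const hi = Math.max(c, k);
    let sum = 0;
    for (let g = lo; g <= hi; g++) sum += nc[g];
    const d = sum - (nc[lo] + nc[hi]) / 2;
    return d * d;
  };

  let observed = 0;
  let expected = 0;
  for (let c = 0; c < numClasses; c++) {
    for (let k = 0; k < numClasses; k++) {
      observed += o[c][k] * delta2(c, k);
      expected += nc[c] * nc[k] * delta2(c, k);
    }
  }
  // Undefined when every rating falls in one class (no expected disagreement).
  if (expected === 0) return NaN;
  return 1 - ((n - 1) * observed) / expected;
}
```

For the scale described in this PR, `numClasses` would be 10. Because the ordinal distance depends on rank order and marginal counts, any note that hardcodes the old class values would silently go stale when the scale changes.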

`BLUEPRINT_FORMAT.md` & `INTER_AGREEMENT_PLAN.md`

Corrects useExperimentalScale description from "9-point" to "10-class"; notes that FORCE_EXPERIMENTAL = true currently makes the flag a no-op.
Refreshes default-judges example (stale qwen/GPT-OSS models → current set); adds cross-family consensus rationale.
Fixes backup-judge trigger condition ("any" primary judge fails, not "all") and corrects the model identifier.
Fixes stale Netlify Blobs reference in INTER_AGREEMENT_PLAN.md — storage uses storageService (S3 in production, local FS in dev).
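
The corrected backup-judge condition ("any", not "all") amounts to the difference between `some` and `every`. A minimal sketch, assuming a hypothetical `JudgeResult` shape and helper name that are not taken from the codebase:

```typescript
// Hypothetical result shape: a judge either returns a score or fails.
interface JudgeResult {
  judgeId: string;
  score: number | null;
  error?: string;
}

// The backup judge is triggered as soon as ANY primary judge fails.
// Using `every` here would encode the old, incorrect "all fail" wording.
function needsBackupJudge(primaryResults: JudgeResult[]): boolean {
  return primaryResults.some((r) => r.score === null || r.error !== undefined);
}
```

With two primary judges where one errors out, `needsBackupJudge` returns `true` even though the other judge succeeded.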

Testing

Cross-referenced every changed claim against source: workflow files in .github/workflows/, handlers in app/api/internal/, and S3 path constants in storageService.ts.
Re-ran markdownlint locally; no remaining violations.
Previewed both docs on GitHub to confirm diagrams and anchor links render.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

- search_index.json → search-index.json (matches SEARCH_INDEX_FILENAME constant in storageService.ts)
- Clarify that calculateHybridScore is defined in calculationUtils.ts; summaryCalculationUtils.ts orchestrates it, not defines it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Cut sentences that repeated adjacent headings or duplicated content already stated in Section 4 (workflow descriptions under diagram subheadings, filler section intros, core.json lazy-loading paragraph).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

callBackgroundFunction awaits a response with a 30s timeout; it is not truly fire-and-forget. For long evaluations the caller's connection drops but Railway continues running the handler to completion. Replace fire-and-forget language with accurate description throughout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

nojibe added 2 commits May 5, 2026 08:42

docs: Refine descriptions of automated and interactive workflows, and…

ce6cc5e

… improve storage architecture section with new data organization insights.

docs: Update methodology for judge evaluation approaches, clarify def…

67477eb

…ault judges, and enhance classification scales with new 10-class experimental mapping.

nojibe marked this pull request as ready for review May 5, 2026 13:29

claude Bot reviewed May 5, 2026

docs: Clarify the use of the experimental 10-class classification sca…

7da0017

…le and update storage persistence details for judge agreement in evaluation configurations.

nojibe changed the title ~~[Docs] Update the ARCHITECTURE.md and METHODOLOGY.md file to reflect current architecture~~ [Docs] Update the ARCHITECTURE.md, METHODOLOGY.md, etc. files to reflect current architecture May 5, 2026

nojibe and others added 5 commits May 5, 2026 18:06

Refine evaluation paths in ARCHITECTURE.md

1d2ff7b

Clarified the description of evaluation paths in the Weval architecture.

Fix formatting in ARCHITECTURE.md

ae6cc81

nojibe wants to merge 8 commits into weval-org:main from nojibe:ken/docs/update-architecture-docs
