[Docs] Update the ARCHITECTURE.md, METHODOLOGY.md, etc. files to reflect current architecture #18

Open: nojibe wants to merge 8 commits into weval-org:main from nojibe:ken/docs/update-architecture-docs

nojibe commented on May 5, 2026

Description

Realigns ARCHITECTURE.md, METHODOLOGY.md, BLUEPRINT_FORMAT.md, and INTER_AGREEMENT_PLAN.md with the current system. Several diagrams and sections had drifted from the code.

Summary

ARCHITECTURE.md

  • Automated evaluations now attributed to GitHub Actions + Railway (was Netlify), in both the diagram and the "Automated Workflow Components" section.
  • Storage diagram: adds workshop/ and pr-evals/, plus ndeltas/, vibes/, and compass/ under live/models/, with descriptions.
  • Documents the previously undocumented api/internal/ background routes (execute-pr-evaluation-background, execute-api-evaluation-background, generate-pairs-background, …).
  • Interactive workflow diagram: splits sandbox polling into its own Status API node, corrects the S3 path (/sandbox-runs/ → live/sandbox/runs/), and corrects the dispatch edge label. callBackgroundFunction awaits a response for up to 30s; if that window
    is exceeded, Railway continues running the handler to completion.
  • Pipeline diagram: replaces "5-point scale" with "10-class experimental ordinal scale"; links METHODOLOGY.md.
  • Fixes search_index.json → search-index.json (matches SEARCH_INDEX_FILENAME constant in storageService.ts).
  • Clarifies that calculateHybridScore is defined in calculationUtils.ts; summaryCalculationUtils.ts orchestrates it but does not define it.
  • Removes redundant prose: one-line descriptions that repeated adjacent headings, and a core.json lazy-loading paragraph that duplicated the Section 4 architectural concept.
  • Fixes markdownlint violations (MD022, MD007, MD032): blank lines around headings/lists, 2-space list indent.
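
The corrected dispatch semantics (await a response for a bounded window, then stop waiting while the work itself keeps running) can be sketched roughly as below. This is only an illustration under stated assumptions: `awaitWithTimeout` and `DispatchResult` are hypothetical names, not the actual `callBackgroundFunction` implementation, and the timeout is passed in rather than fixed at 30s.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not from the Weval codebase.
type DispatchResult<T> =
  | { status: "completed"; value: T }
  | { status: "timed-out" };

// Wait for `work` for at most `timeoutMs`. On timeout the caller stops
// waiting, but `work` is not cancelled and continues to completion.
async function awaitWithTimeout<T>(
  work: Promise<T>,
  timeoutMs: number,
): Promise<DispatchResult<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<DispatchResult<T>>((resolve) => {
    timer = setTimeout(() => resolve({ status: "timed-out" }), timeoutMs);
  });
  try {
    return await Promise.race([
      work.then((value): DispatchResult<T> => ({ status: "completed", value })),
      timeout,
    ]);
  } finally {
    // Clear the timer so it does not keep the process alive after `work` wins.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

This is why "fire-and-forget" is not an accurate label: the caller does wait, and a timeout only ends the waiting, not the handler.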

METHODOLOGY.md

  • Updates the experimental classification scale to the current 10-class ordinal.
  • Krippendorff's α note now references the active scale instead of hardcoding the old values.

BLUEPRINT_FORMAT.md & INTER_AGREEMENT_PLAN.md

  • Corrects useExperimentalScale description from "9-point" to "10-class"; notes that FORCE_EXPERIMENTAL = true currently makes the flag a no-op.
  • Refreshes default-judges example (stale qwen/GPT-OSS models → current set); adds cross-family consensus rationale.
  • Fixes backup-judge trigger condition ("any" primary judge fails, not "all") and corrects the model identifier.
  • Fixes stale Netlify Blobs reference in INTER_AGREEMENT_PLAN.md — storage uses storageService (S3 in production, local FS in dev).
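
To make the corrected backup-judge condition concrete, here is a minimal sketch of the "any" versus "all" distinction. The `JudgeOutcome` shape and the `needsBackupJudge` helper are hypothetical and exist only for illustration; the real types and trigger logic live in the codebase and may differ.

```typescript
// Hypothetical shape for illustration only; not the project's actual type.
interface JudgeOutcome {
  judgeId: string;
  score: number | null; // null: the judge failed to return a usable score
}

// The backup judge is triggered when ANY primary judge fails,
// not only when ALL of them fail.
function needsBackupJudge(primary: JudgeOutcome[]): boolean {
  return primary.some((outcome) => outcome.score === null);
}
```

Under the old "all" wording, a single failed primary judge out of several would not have triggered the backup; with `some`, it does.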

Testing

  • Cross-referenced every changed claim against source: workflow files in .github/workflows/, handlers in app/api/internal/, and S3
    path constants in storageService.ts.
  • Re-ran markdownlint locally; no remaining violations.
  • Previewed both docs on GitHub to confirm diagrams and anchor links render.

nojibe added 2 commits May 5, 2026 08:42
… improve storage architecture section with new data organization insights.
…ault judges, and enhance classification scales with new 10-class experimental mapping.
nojibe marked this pull request as ready for review May 5, 2026 13:29

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

…le and update storage persistence details for judge agreement in evaluation configurations.
nojibe changed the title from "[Docs] Update the ARCHITECTURE.md and METHODOLOGY.md file to reflect current architecture" to "[Docs] Update the ARCHITECTURE.md, METHODOLOGY.md, etc. files to reflect current architecture" May 5, 2026
nojibe and others added 5 commits May 5, 2026 18:06
Clarified the description of evaluation paths in the Weval architecture.
- search_index.json → search-index.json (matches SEARCH_INDEX_FILENAME
  constant in storageService.ts)
- Clarify that calculateHybridScore is defined in calculationUtils.ts;
  summaryCalculationUtils.ts orchestrates it, not defines it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cut sentences that repeated adjacent headings or duplicated content
already stated in Section 4 (workflow descriptions under diagram
subheadings, filler section intros, core.json lazy-loading paragraph).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
callBackgroundFunction awaits a response with a 30s timeout — not
truly fire-and-forget. For long evaluations the caller's connection
drops but Railway continues running the handler to completion.
Replace fire-and-forget language with accurate description throughout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>