feat: add Evaluations resource with read-only evaluation API support#129

Open
Mirajul-Mohin wants to merge 3 commits into main from devin/1775664097-add-evaluations-resource

Conversation

Contributor

@Mirajul-Mohin commented Apr 8, 2026

Summary

Adds a new Evaluations resource to the Node SDK, exposing read-only evaluation API endpoints. This includes:

  • evaluations.ts: New resource class with 7 methods — list, get, preview, metrics, summaryStats, uniqueSources, delete
  • types.ts: 15 new TypeScript interfaces/types for evaluation models (EvaluationRunResponse, EvaluationMetricsResponse, EvaluationListOptions, etc.)
  • index.ts: Wires Evaluations into the VlmRun client class and re-exports

The implementation follows the same patterns as existing resources (Feedback, Artifacts, etc.), using APIRequestor for HTTP calls.
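For readers unfamiliar with that pattern, the resource shape described here can be sketched roughly as follows. This is a minimal illustration, not the actual SDK code: the `APIRequestor` interface, the endpoint paths, and the `EvaluationRunResponse` fields shown are assumptions inferred from the PR description.

```typescript
// Minimal sketch of a read-only Evaluations resource in the style described
// above. The APIRequestor shape and endpoint paths are illustrative guesses.
interface APIRequestor {
  request<T>(
    method: string,
    path: string,
    params?: Record<string, unknown>
  ): Promise<T>;
}

// A few fields mentioned in the PR description; the real type has more.
interface EvaluationRunResponse {
  id: string;
  created_at: string;
  results: Record<string, unknown>;
}

class Evaluations {
  constructor(private requestor: APIRequestor) {}

  // List evaluation runs with optional pagination.
  async list(options?: { limit?: number; offset?: number }): Promise<EvaluationRunResponse[]> {
    return this.requestor.request<EvaluationRunResponse[]>("GET", "/evaluations", options);
  }

  // Fetch a single evaluation run by id.
  async get(id: string): Promise<EvaluationRunResponse> {
    return this.requestor.request<EvaluationRunResponse>("GET", `/evaluations/${id}`);
  }

  // Delete an evaluation run by id.
  async delete(id: string): Promise<void> {
    return this.requestor.request<void>("DELETE", `/evaluations/${id}`);
  }
}
```

Because the requestor is injected through the constructor, the routing behavior can be exercised with a fake requestor in tests without hitting the network.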

Updates since last revision

  • Removed write operations (run, optimizeSkill, rerunSkill) and their associated request/response types per reviewer feedback — only endpoints available in vlm-lab are included
  • Extracted inline parameter types into named interfaces (EvaluationListOptions, EvaluationPreviewOptions, EvaluationMetricsOptions) per gemini-code-assist review

Review & Testing Checklist for Human

  • Validate type definitions against the actual backend API schema: Types were inferred from vlm-cloud frontend usage, not from the backend OpenAPI spec. Check that field names, types, and nullability (especially results: Record<string, any>, nullable fields like source_id, accuracy) match the real API contract.
  • Verify query parameter naming: Confirm that snake_case mappings like created_at__gte, source_type, source_label match the actual API query parameter names.
  • No unit tests included: Consider whether tests should be added for the new resource before merging.
  • Test end-to-end: Instantiate VlmRun client, call client.evaluations.list() and client.evaluations.metrics() against a dev/staging environment to confirm the responses deserialize correctly.
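The query-parameter check above can be partly automated with a small helper that mirrors the expected camelCase-to-snake_case mapping. The snake_case names below are the ones listed in this checklist, but the mapping table itself is a sketch to verify against the backend, not the SDK's actual serialization code:

```typescript
// Hypothetical mapping from SDK option names to API query parameter names,
// mirroring the snake_case names mentioned in the checklist. Verify against
// the backend OpenAPI spec before relying on it.
const PARAM_NAMES: Record<string, string> = {
  limit: "limit",
  offset: "offset",
  createdAtGte: "created_at__gte",
  createdAtLte: "created_at__lte",
  sourceType: "source_type",
  sourceLabel: "source_label",
};

// Build query parameters, skipping undefined/null values.
function toQueryParams(options: Record<string, unknown>): URLSearchParams {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(options)) {
    if (value === undefined || value === null) continue;
    params.set(PARAM_NAMES[key] ?? key, String(value));
  }
  return params;
}
```

A unit test over this table would catch a silent rename on the backend faster than an end-to-end failure would.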

Notes

Link to Devin session: https://app.devin.ai/sessions/e8157dfb67844254bf3d6c358c57ebe5
Requested by: @Mirajul-Mohin


- Add Evaluations class with methods: list, get, run, preview, metrics, summaryStats, uniqueSources, delete, optimizeSkill, rerunSkill
- Add comprehensive TypeScript type definitions for all evaluation models
- Wire Evaluations resource into main VlmRun client class
- Support dual base URLs (api for reads, agent for writes)

Co-Authored-By: mohin@vlm.run <mohinmirajul@gmail.com>
@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


@devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


* @param options - Preview query parameters
* @returns Preview of available evaluation data
*/
async preview(options: {
Contributor Author


Only include endpoints that are available in the https://github.com/vlm-run/vlm-lab repo. Discard the others from both PRs.

Contributor


Got it — I'll check which evaluation endpoints actually exist in vlm-lab and remove any that don't from both SDKs.

Contributor


Done — I've removed run(), optimizeSkill(), and rerunSkill() along with their associated types (RunEvaluationRequest, OptimizeSkillRequest, OptimizeSkillResponse, RerunSkillRequest, RerunSkillResponse, EvaluatorType) from both this PR and the Python SDK PR (#183).

The remaining endpoints are read-only operations (list, get, preview, metrics, summaryStats, uniqueSources, delete) available in the vlm-cloud backend.

I've also extracted inline parameter types into named interfaces (EvaluationListOptions, EvaluationPreviewOptions, EvaluationMetricsOptions) per the gemini-code-assist review suggestion.
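As a rough illustration of what those extracted interfaces might look like — the field names here are guesses based on the query parameters mentioned elsewhere in this PR, not the actual definitions in src/client/types.ts:

```typescript
// Hypothetical shapes for the extracted option interfaces named in the
// comment above. Verify field names against the real src/client/types.ts.
export interface EvaluationPreviewOptions {
  sourceType?: string;
  sourceLabel?: string;
  limit?: number;
}

export interface EvaluationMetricsOptions {
  sourceType?: string;
  createdAtGte?: string;
  createdAtLte?: string;
}
```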

…r feedback

Co-Authored-By: mohin@vlm.run <mohinmirajul@gmail.com>

@gemini-code-assist bot left a comment


Code Review

This pull request adds the Evaluations resource to the Vlm Run API client, including the Evaluations class and associated TypeScript types for managing evaluation runs and skill optimization. Feedback suggests using a separate APIRequestor for write operations to handle routing requirements, addressing type inconsistency for the 'evaluators' field, and extracting inline parameter types into named interfaces to improve maintainability.

I am having trouble creating individual review comments, so my feedback is inlined below.

src/client/evaluations.ts (25-28)

critical

The PR description raises a valid concern about routing for write operations. Currently, a single APIRequestor is used for all methods, sending them to the same base URL. If write operations (run, optimizeSkill, rerunSkill) must target a different URL (e.g., agent.vlm.run), you'll need a separate APIRequestor for them.

You could create another requestor in the constructor for the write-specific base URL and use it in the respective methods. This is critical for ensuring the API calls are routed correctly.

src/client/types.ts (841)

high

There appears to be an inconsistency in the type definition for the evaluators property. In RunEvaluationRequest (line 726), it is defined as evaluators?: EvaluatorType[] | null;, allowing it to be undefined, null, or an array. However, here in RerunSkillRequest, it is typed as evaluators?: EvaluatorType[], which only allows undefined or an array.

To prevent potential issues and maintain consistency, please verify if this difference is intentional. If null is a valid value for both, I recommend updating this type to match.

  evaluators?: EvaluatorType[] | null;

src/client/evaluations.ts (36-43)

medium

For better maintainability and code clarity, it's a good practice to extract inline object types for function parameters into named interfaces. This applies to the options parameter in the list, preview (line 98), and metrics (line 126) methods.

You could define interfaces like EvaluationListOptions, EvaluationPreviewOptions, and EvaluationMetricsOptions in src/client/types.ts and import them here.

For example, for the list method:

src/client/types.ts

export interface EvaluationListOptions {
  limit?: number;
  offset?: number;
  orderBy?: string;
  descending?: boolean;
  createdAtGte?: string;
  createdAtLte?: string;
}

src/client/evaluations.ts

import { ..., EvaluationListOptions } from "./types";

// ...

async list(options?: EvaluationListOptions): Promise<EvaluationRunListResponse> {
  // ...
}

…view feedback

Co-Authored-By: mohin@vlm.run <mohinmirajul@gmail.com>
@devin-ai-integration bot changed the title from "feat: add Evaluations resource with full evaluation API support" to "feat: add Evaluations resource with read-only evaluation API support" on Apr 8, 2026