feat: add Evaluations resource with read-only evaluation API support#129

Open
Mirajul-Mohin wants to merge 3 commits into main from devin/1775664097-add-evaluations-resource

Conversation

Contributor

@Mirajul-Mohin commented Apr 8, 2026

Summary

Adds a new Evaluations resource to the Node SDK, exposing read-only evaluation API endpoints. This includes:

  • evaluations.ts: New resource class with 7 methods — list, get, preview, metrics, summaryStats, uniqueSources, delete
  • types.ts: 15 new TypeScript interfaces/types for evaluation models (EvaluationRunResponse, EvaluationMetricsResponse, EvaluationListOptions, etc.)
  • index.ts: Wires Evaluations into the VlmRun client class and re-exports

The implementation follows the same patterns as existing resources (Feedback, Artifacts, etc.), using APIRequestor for HTTP calls.
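For readers unfamiliar with that pattern, the resource shape described here can be sketched roughly as follows. This is a minimal illustration, not the actual SDK code: the `APIRequestor` interface, the endpoint paths, and the `EvaluationRunResponse` fields shown are assumptions inferred from the PR description.

```typescript
// Minimal sketch of a read-only Evaluations resource in the style described
// above. The APIRequestor shape and endpoint paths are illustrative guesses.
interface APIRequestor {
  request<T>(
    method: string,
    path: string,
    params?: Record<string, unknown>
  ): Promise<T>;
}

// A few fields mentioned in the PR description; the real type has more.
interface EvaluationRunResponse {
  id: string;
  created_at: string;
  results: Record<string, unknown>;
}

class Evaluations {
  constructor(private requestor: APIRequestor) {}

  // List evaluation runs with optional pagination.
  async list(options?: { limit?: number; offset?: number }): Promise<EvaluationRunResponse[]> {
    return this.requestor.request<EvaluationRunResponse[]>("GET", "/evaluations", options);
  }

  // Fetch a single evaluation run by id.
  async get(id: string): Promise<EvaluationRunResponse> {
    return this.requestor.request<EvaluationRunResponse>("GET", `/evaluations/${id}`);
  }

  // Delete an evaluation run by id.
  async delete(id: string): Promise<void> {
    return this.requestor.request<void>("DELETE", `/evaluations/${id}`);
  }
}
```

Because the requestor is injected through the constructor, the routing behavior can be exercised with a fake requestor in tests without hitting the network.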

Updates since last revision

  • Removed write operations (run, optimizeSkill, rerunSkill) and their associated request/response types per reviewer feedback — only endpoints available in vlm-lab are included
  • Extracted inline parameter types into named interfaces (EvaluationListOptions, EvaluationPreviewOptions, EvaluationMetricsOptions) per gemini-code-assist review

Review & Testing Checklist for Human

  • Validate type definitions against the actual backend API schema: Types were inferred from vlm-cloud frontend usage, not from the backend OpenAPI spec. Check that field names, types, and nullability (especially results: Record<string, any>, nullable fields like source_id, accuracy) match the real API contract.
  • Verify query parameter naming: Confirm that snake_case mappings like created_at__gte, source_type, source_label match the actual API query parameter names.
  • No unit tests included: Consider whether tests should be added for the new resource before merging.
  • Test end-to-end: Instantiate VlmRun client, call client.evaluations.list() and client.evaluations.metrics() against a dev/staging environment to confirm the responses deserialize correctly.
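The query-parameter check above can be partly automated with a small helper that mirrors the expected camelCase-to-snake_case mapping. The snake_case names below are the ones listed in this checklist, but the mapping table itself is a sketch to verify against the backend, not the SDK's actual serialization code:

```typescript
// Hypothetical mapping from SDK option names to API query parameter names,
// mirroring the snake_case names mentioned in the checklist. Verify against
// the backend OpenAPI spec before relying on it.
const PARAM_NAMES: Record<string, string> = {
  limit: "limit",
  offset: "offset",
  createdAtGte: "created_at__gte",
  createdAtLte: "created_at__lte",
  sourceType: "source_type",
  sourceLabel: "source_label",
};

// Build query parameters, skipping undefined/null values.
function toQueryParams(options: Record<string, unknown>): URLSearchParams {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(options)) {
    if (value === undefined || value === null) continue;
    params.set(PARAM_NAMES[key] ?? key, String(value));
  }
  return params;
}
```

A unit test over this table would catch a silent rename on the backend faster than an end-to-end failure would.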

Notes

Link to Devin session: https://app.devin.ai/sessions/e8157dfb67844254bf3d6c358c57ebe5
Requested by: @Mirajul-Mohin


- Add Evaluations class with methods: list, get, run, preview, metrics, summaryStats, uniqueSources, delete, optimizeSkill, rerunSkill
- Add comprehensive TypeScript type definitions for all evaluation models
- Wire Evaluations resource into main VlmRun client class
- Support dual base URLs (api for reads, agent for writes)

Co-Authored-By: mohin@vlm.run <mohinmirajul@gmail.com>
@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


@devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


* @param options - Preview query parameters
* @returns Preview of available evaluation data
*/
async preview(options: {
Contributor Author


Only include endpoints that are available in the https://github.com/vlm-run/vlm-lab repo. Discard the others from both PRs.

Contributor


Got it — I'll check which evaluation endpoints actually exist in vlm-lab and remove any that don't from both SDKs.

Contributor


Done — I've removed run(), optimizeSkill(), and rerunSkill() along with their associated types (RunEvaluationRequest, OptimizeSkillRequest, OptimizeSkillResponse, RerunSkillRequest, RerunSkillResponse, EvaluatorType) from both this PR and the Python SDK PR (#183).

The remaining endpoints are read-only operations (list, get, preview, metrics, summaryStats, uniqueSources, delete) available in the vlm-cloud backend.

I've also extracted inline parameter types into named interfaces (EvaluationListOptions, EvaluationPreviewOptions, EvaluationMetricsOptions) per the gemini-code-assist review suggestion.
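As a rough illustration of what those extracted interfaces might look like — the field names here are guesses based on the query parameters mentioned elsewhere in this PR, not the actual definitions in src/client/types.ts:

```typescript
// Hypothetical shapes for the extracted option interfaces named in the
// comment above. Verify field names against the real src/client/types.ts.
export interface EvaluationPreviewOptions {
  sourceType?: string;
  sourceLabel?: string;
  limit?: number;
}

export interface EvaluationMetricsOptions {
  sourceType?: string;
  createdAtGte?: string;
  createdAtLte?: string;
}
```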

…r feedback

Co-Authored-By: mohin@vlm.run <mohinmirajul@gmail.com>

@gemini-code-assist bot left a comment


Code Review

This pull request adds the Evaluations resource to the Vlm Run API client, including the Evaluations class and associated TypeScript types for managing evaluation runs and skill optimization. Feedback suggests using a separate APIRequestor for write operations to handle routing requirements, addressing type inconsistency for the 'evaluators' field, and extracting inline parameter types into named interfaces to improve maintainability.

I am having trouble creating individual review comments, so my feedback is inlined below.

src/client/evaluations.ts (25-28)

critical

The PR description raises a valid concern about routing for write operations. Currently, a single APIRequestor is used for all methods, sending them to the same base URL. If write operations (run, optimizeSkill, rerunSkill) must target a different URL (e.g., agent.vlm.run), you'll need a separate APIRequestor for them.

You could create another requestor in the constructor for the write-specific base URL and use it in the respective methods. This is critical for ensuring the API calls are routed correctly.

src/client/types.ts (841)

high

There appears to be an inconsistency in the type definition for the evaluators property. In RunEvaluationRequest (line 726), it is defined as evaluators?: EvaluatorType[] | null;, allowing it to be undefined, null, or an array. However, here in RerunSkillRequest, it is typed as evaluators?: EvaluatorType[], which only allows undefined or an array.

To prevent potential issues and maintain consistency, please verify if this difference is intentional. If null is a valid value for both, I recommend updating this type to match.

  evaluators?: EvaluatorType[] | null;

src/client/evaluations.ts (36-43)

medium

For better maintainability and code clarity, it's a good practice to extract inline object types for function parameters into named interfaces. This applies to the options parameter in the list, preview (line 98), and metrics (line 126) methods.

You could define interfaces like EvaluationListOptions, EvaluationPreviewOptions, and EvaluationMetricsOptions in src/client/types.ts and import them here.

For example, for the list method:

src/client/types.ts

export interface EvaluationListOptions {
  limit?: number;
  offset?: number;
  orderBy?: string;
  descending?: boolean;
  createdAtGte?: string;
  createdAtLte?: string;
}

src/client/evaluations.ts

import { ..., EvaluationListOptions } from "./types";

// ...

async list(options?: EvaluationListOptions): Promise<EvaluationRunListResponse> {
  // ...
}

…view feedback

Co-Authored-By: mohin@vlm.run <mohinmirajul@gmail.com>
@devin-ai-integration bot changed the title from "feat: add Evaluations resource with full evaluation API support" to "feat: add Evaluations resource with read-only evaluation API support" on Apr 8, 2026