127 changes: 127 additions & 0 deletions src/services/componentSearchIndex.test.ts
@@ -593,6 +593,133 @@ describe("lexicalSearch", () => {
    expect(results.map((result) => result.digest)).toEqual(["target"]);
  });

  describe("search quality expectations", () => {
    const qualityIndex = buildSearchIndex([
      makeSourced({
        digest: "train-test-split",
        spec: {
          name: "train_test_split",
          description: "Split a dataset into train and test partitions.",
          inputs: [{ name: "dataset", type: "Dataset" }],
          outputs: [{ name: "train" }, { name: "test" }],
          implementation: { container: { image: "x" } },
        },
      }),
      makeSourced({
        digest: "train-model",
        spec: {
          name: "train_model",
          description: "Fit a classifier on tabular data.",
          inputs: [{ name: "table", type: "Dataset" }],
          outputs: [{ name: "model", type: { artifact: "Model" } }],
          implementation: { container: { image: "x" } },
        },
      }),
      makeSourced({
        digest: "filter-rows",
        spec: {
          name: "filter_rows",
          description: "Filter dataset rows with a boolean condition.",
          inputs: [{ name: "dataset" }],
          outputs: [{ name: "filtered_dataset" }],
          implementation: { container: { image: "x" } },
        },
      }),
      makeSourced({
        digest: "load-csv",
        spec: {
          name: "load_csv_file",
          description: "Read a CSV file into a tabular dataframe.",
          inputs: [{ name: "path", type: "String" }],
          outputs: [{ name: "table", type: "Dataset" }],
          implementation: { container: { image: "x" } },
        },
      }),
      makeSourced({
        digest: "local-upload",
        spec: {
          name: "upload_file",
          description: "Upload a file to a local directory.",
          inputs: [{ name: "file" }],
          outputs: [{ name: "path" }],
          implementation: { container: { image: "x" } },
        },
      }),
      makeSourced({
        digest: "gcs-upload",
        spec: {
          name: "upload_to_gcs",
          description: "Upload a file to GCS cloud storage.",
          inputs: [{ name: "file" }],
          outputs: [{ name: "gcs_uri" }],
          implementation: { container: { image: "x" } },
        },
      }),
      makeSourced({
        digest: "predict-labels",
        spec: {
          name: "predict_labels",
          description: "Infer labels from examples using a trained model.",
          inputs: [{ name: "model" }, { name: "examples" }],
          outputs: [{ name: "predictions" }],
          implementation: { container: { image: "x" } },
        },
      }),
      makeSourced({
        digest: "text-embeddings",
        spec: {
          name: "create_text_embeddings",
          description: "Create vector embeddings for text documents.",
          inputs: [{ name: "documents" }],
          outputs: [{ name: "embeddings", type: "EmbeddingVector" }],
          implementation: { container: { image: "x" } },
        },
      }),
    ]);

    it.each([

🤖 This is an AI-generated code review comment.

The suite has no ambiguous multi-match case and no empty or nonsense-query case. Optional: add a query that should return multiple relevant components and assert that both appear in the top N.
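
One possible shape for those two checks, as a self-contained sketch. `topNContainsAll` is a hypothetical helper, not something in this PR, and the result arrays are made-up stand-ins for `lexicalSearch(qualityIndex, query).map((result) => result.digest)`, not real search output.

```typescript
// Hypothetical helper (not in the PR): true when every expected digest
// appears within the first `n` results, in any order.
function topNContainsAll(
  results: string[],
  expected: string[],
  n: number,
): boolean {
  const top = results.slice(0, n);
  return expected.every((digest) => top.indexOf(digest) !== -1);
}

// Made-up result lists standing in for real lexicalSearch output.
const multiMatchResults: string[] = ["gcs-upload", "local-upload", "load-csv"];
const nonsenseResults: string[] = [];

// Multi-match case: both upload components should be in the top 2,
// without pinning which of the two ranks first.
const bothUploadsInTop2 = topNContainsAll(
  multiMatchResults,
  ["local-upload", "gcs-upload"],
  2,
);

// Nonsense-query case: nothing should be returned.
const nonsenseIsEmpty = nonsenseResults.length === 0;

console.log(bothUploadsInTop2, nonsenseIsEmpty);
```

Checking membership rather than exact order keeps the multi-match case from failing on a tie-break change between two equally good components.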

      {
        query: "split dataset into train and test",
        expectedDigests: ["train-test-split"],
      },
      {
        query: "fit model on tabular data",
        expectedDigests: ["train-model"],
      },
      {
        query: "read csv file",
        expectedDigests: ["load-csv"],
      },
      {
        query: "filtr dataset rows",
        expectedDigests: ["filter-rows"],
      },
      {
        query: "infer labels from model",
        expectedDigests: ["predict-labels"],
      },
      {
        query: "make vector embeddings for text",

🤖 This is an AI-generated code review comment.

No query in this suite requires synonym expansion: each shares a literal token/stem with its target (this one matches on literal vector/embeddings/text), so the synonym feature is never isolated and the suite would not catch it regressing. Add 1-2 synonym-only cases, e.g. "vectorize text documents" → ["text-embeddings"] and "store a file in a bucket" → ["gcs-upload"].
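
A quick way to check whether a candidate query really avoids literal overlap with its target, as a sketch. `tokenizeLiteral` and `literalOverlap` are hypothetical helpers written for this comment. The real index may stem, drop stop words, or split identifiers differently, so this only approximates what the search sees.

```typescript
// Hypothetical tokenizer for illustration only: lowercases and splits on
// anything that is not a letter or digit. No stemming, no stop-word removal.
function tokenizeLiteral(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Distinct query tokens that also occur literally in the target text.
function literalOverlap(query: string, targetText: string): string[] {
  const targetTokens = tokenizeLiteral(targetText);
  return tokenizeLiteral(query).filter(
    (token, i, all) =>
      targetTokens.indexOf(token) !== -1 && all.indexOf(token) === i,
  );
}

// Name and description of the text-embeddings fixture from this suite.
const embeddingsText =
  "create_text_embeddings Create vector embeddings for text documents.";

const existingOverlap = literalOverlap(
  "make vector embeddings for text",
  embeddingsText,
);
const suggestedOverlap = literalOverlap(
  "vectorize text documents",
  embeddingsText,
);

console.log(existingOverlap); // ["vector", "embeddings", "for", "text"]
console.log(suggestedOverlap); // ["text", "documents"]
```

Under this literal tokenization, "vectorize text documents" still shares `text` and `documents` with the target, so a case meant to isolate synonym expansion may need wording that avoids those tokens as well.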

        expectedDigests: ["text-embeddings"],
      },
      {
        query: "upload a file but not to GCS",

🤖 This is an AI-generated code review comment.

This case passes even if negative-constraint parsing is removed entirely: the plain query "upload a file" already ranks local-upload #1 (the name upload_file contains the token "file"; the name upload_to_gcs does not), and the assertion only pins rank #1, because slice(0, expectedDigests.length) is slice(0, 1) here. It does not exercise the exclusion. Assert the exclusion directly (e.g. expect(results.map(r => r.digest)).not.toContain("gcs-upload")), or shape the fixture so gcs-upload would out-rank local-upload absent the negative clause.
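
To make the gap concrete, a self-contained sketch. `leakedExclusions` is a hypothetical helper, and `withoutNegation` is a made-up result list imagining a search that ignores the negative clause. It is not actual `lexicalSearch` output.

```typescript
// Hypothetical helper (not in the PR): digests from `excluded` that still
// appear anywhere in `results`.
function leakedExclusions(results: string[], excluded: string[]): string[] {
  return results.filter((digest) => excluded.indexOf(digest) !== -1);
}

// Made-up output for "upload a file but not to GCS" from a search that
// ignores the "but not to GCS" clause.
const withoutNegation: string[] = ["local-upload", "gcs-upload", "load-csv"];

// The current style of check: only rank #1 is compared, so it passes
// even though the exclusion was never applied.
const rankOnePasses = withoutNegation.slice(0, 1).join(",") === "local-upload";

// The direct exclusion check: reports gcs-upload as still present.
const leaked = leakedExclusions(withoutNegation, ["gcs-upload"]);

console.log(rankOnePasses, leaked);
```

In the test itself, the second check corresponds to the `not.toContain("gcs-upload")` assertion on the full result list rather than on the first element only.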

        expectedDigests: ["local-upload"],
      },
    ])(
      "returns expected results for '$query'",
      ({ query, expectedDigests }) => {
        const results = lexicalSearch(qualityIndex, query).map(
          (result) => result.digest,
        );

        expect(results.slice(0, expectedDigests.length)).toEqual(

🤖 This is an AI-generated code review comment.

These assertions pin only rank #1. They check neither the ordering of close competitors nor that irrelevant components stay out of the visible top 5. Optional: add a couple of two-element expectedDigests cases where the secondary match is stable (deterministic tie-break, so the test is not flaky).
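
A sketch of what a stricter per-case check could look like. `rankingProblems` is a hypothetical helper, the visible count of 5 is taken from the top-5 mentioned above, and the result lists are made up for illustration rather than produced by `lexicalSearch`.

```typescript
// Hypothetical helper (not in the PR): returns a list of human-readable
// problems; an empty list means the ranking satisfies both constraints.
function rankingProblems(
  results: string[],
  expectedPrefix: string[],
  irrelevant: string[],
  visibleCount: number,
): string[] {
  const problems: string[] = [];

  // Constraint 1: the first results must match the expected prefix in order.
  expectedPrefix.forEach((digest, i) => {
    if (results[i] !== digest) {
      problems.push(`rank ${i + 1}: expected ${digest}, got ${results[i]}`);
    }
  });

  // Constraint 2: irrelevant digests must not appear in the visible window.
  const visible = results.slice(0, visibleCount);
  irrelevant.forEach((digest) => {
    if (visible.indexOf(digest) !== -1) {
      problems.push(`${digest} appears in top ${visibleCount}`);
    }
  });

  return problems;
}

// Made-up result lists for a query such as "read csv file".
const stableOrder: string[] = ["load-csv", "local-upload", "gcs-upload"];
const swappedOrder: string[] = ["local-upload", "load-csv", "text-embeddings"];

const stableProblems = rankingProblems(
  stableOrder,
  ["load-csv", "local-upload"],
  ["text-embeddings"],
  5,
);
const swappedProblems = rankingProblems(
  swappedOrder,
  ["load-csv", "local-upload"],
  ["text-embeddings"],
  5,
);

console.log(stableProblems.length, swappedProblems.length);
```

Returning problems as strings instead of throwing on the first mismatch makes a failing case show every violated constraint at once.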

          expectedDigests,
        );
      },
    );
  });

  it("does not special-case single-letter non-stop-word tokens", () => {
    const index = buildSearchIndex([
      makeSourced({