fix: strip embedding fields from search responses #60
Merged
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Pull Request
Summary
Prevent OpenSearch and MCP responses from exposing `embedding` vectors in returned source metadata.
Type of Change
Changes Made
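- Excluded `embedding` from BM25 and vector search `_source` payloads.
- Excluded `embedding` from term query `_source` payloads.
- Added defensive stripping in the MCP response so that `include_metadata=true` never returns `embedding`.
Motivation and Context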
Search responses do not need to expose raw vector payloads, and returning them increases response size while leaking internal indexing details.
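The query-side exclusion listed under Changes Made can be sketched with OpenSearch source filtering (`_source.excludes`). This is a minimal illustration, not the code from this PR: the helper name `buildBM25Query`, the `content` field, and the map-based request body are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// excludedSourceFields lists index fields that should not be returned to callers.
var excludedSourceFields = []string{"embedding"}

// buildBM25Query is a hypothetical helper. It builds a match query body whose
// _source filter asks OpenSearch to omit the embedding field from every hit.
func buildBM25Query(field, text string, size int) map[string]any {
	return map[string]any{
		"size": size,
		"_source": map[string]any{
			"excludes": excludedSourceFields,
		},
		"query": map[string]any{
			"match": map[string]any{
				field: text,
			},
		},
	}
}

func main() {
	// Print the JSON body that would be sent to the search endpoint.
	body, err := json.MarshalIndent(buildBM25Query("content", "vector search", 10), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Filtering at query time keeps the vectors off the wire entirely, which addresses the response-size concern as well as the exposure concern.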
How Has This Been Tested?
- `go test ./...`
Test Configuration
Executed:
- `go test ./internal/pkg/opensearch -timeout 120s`
- `go test ./internal/mcpserver -timeout 120s`
- `go test ./tests/unit/... -timeout 120s`
Impact Analysis
Components Affected
- `cmd/`
- `internal/vectorizer/`
- `internal/opensearch/`
- `internal/s3vector/`
- `internal/slackbot/`
- `internal/embedding/`
- `internal/config/`
Additional affected slice:
- `internal/mcpserver/`
AWS Resources Impact
Breaking Changes
Migration Guide
Not required.
Dependencies
Documentation
Checklist
- `go fmt ./...` and `go vet ./...`
- Ran `go mod tidy` to clean up dependencies
Performance Considerations
Additional Notes
This change adds exclusion at both query-construction time and MCP response-construction time so the protection holds even if one layer changes later.
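The response-side half of this two-layer approach could look like the following. It is a sketch under assumptions: the function name `stripEmbedding` and the `map[string]any` metadata shape are hypothetical and are not taken from the PR's diff.

```go
package main

import "fmt"

// stripEmbedding returns a shallow copy of source without the "embedding" key.
// Copying leaves the caller's map untouched, so the original hit can still be
// used elsewhere (for example, for logging or rescoring).
func stripEmbedding(source map[string]any) map[string]any {
	if source == nil {
		return nil
	}
	cleaned := make(map[string]any, len(source))
	for key, value := range source {
		if key == "embedding" {
			continue // drop the raw vector from the outgoing metadata
		}
		cleaned[key] = value
	}
	return cleaned
}

func main() {
	// A hit that still carries its vector, as if the query-side filter were missing.
	hit := map[string]any{
		"title":     "Deploy guide",
		"embedding": []float64{0.12, -0.08, 0.33},
	}
	fmt.Println(stripEmbedding(hit))
}
```

Running the strip unconditionally when metadata is built means a future query that forgets the `_source` filter still cannot leak vectors through the MCP response.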
Screenshots/Logs
N/A
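Pull Request (Japanese version, translated)
Summary
Keep `embedding` vectors out of OpenSearch and MCP responses to prevent exposure of unnecessary internal data.
Type of Change
Changes Made
- Excluded `embedding` from the `_source` of BM25 and vector search results.
- Excluded `embedding` from the `_source` of term query results.
- Added a defensive step that also strips `embedding` from MCP responses when `include_metadata=true`.
Motivation and Context
Search responses have no need to return raw vectors; doing so bloats responses and exposes internal implementation details.
How Has This Been Tested?
- `go test ./...`
Test Configuration
Commands executed:
- `go test ./internal/pkg/opensearch -timeout 120s`
- `go test ./internal/mcpserver -timeout 120s`
- `go test ./tests/unit/... -timeout 120s`
Impact Analysis
Components Affected
- `cmd/`
- `internal/vectorizer/`
- `internal/opensearch/`
- `internal/s3vector/`
- `internal/slackbot/`
- `internal/embedding/`
- `internal/config/`
Additional impact:
- `internal/mcpserver/`
AWS Resources Impact
Breaking Changes
Migration Guide
Not required.
Dependencies
Documentation
Checklist
- `go fmt ./...` and `go vet ./...`
- Ran `go mod tidy` to clean up dependencies
Performance Considerations
Additional Notes
Because `embedding` is excluded both when the query is built and when the MCP response is built, the protection remains even if one of the two layers changes later.
Screenshots/Logs
N/A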