nxank4/agent-platform-benchmark

Vertex Agent Smoke

Minimal Python smoke-test scaffold for evaluating Vertex AI Agent Engine Runtime platform capabilities.

This repo is not a full benchmark. It provides a dry-run-friendly local path and an optional cloud path that deploys a tiny custom Python agent to Vertex AI Agent Engine, invokes it once, and collects evidence.

What This Smoke Test Verifies

  • A local custom agent can run.
  • The agent exposes at least two tools:
    • retrieve_document(query: str) -> dict
    • submit_report(report: str, sensitive: bool = False, approval: bool = False) -> dict
  • The agent routes prompts to the expected tool.
  • submit_report blocks sensitive submissions unless approval=True.
  • Structured JSONL audit logs, checkpoint events, and trace events are written.
  • Metric variables are recorded for Reliability, Recoverability, Governance, Observability, and Portability.
  • Evidence artifacts are generated under evidence/.
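
As a rough illustration of the tool contract and the approval gate listed above, the two tools could look like the sketch below. Only the function signatures come from this README; the returned dictionary keys, the stub document content, and the keyword-based `route` helper are assumptions made for the example and may not match the code in the repo.

```python
def retrieve_document(query: str) -> dict:
    """Return a stub document for the query (placeholder content, assumed shape)."""
    return {
        "tool": "retrieve_document",
        "query": query,
        "content": f"stub document for: {query}",
    }


def submit_report(report: str, sensitive: bool = False, approval: bool = False) -> dict:
    """Submit a report, blocking sensitive content unless approval=True."""
    if sensitive and not approval:
        # Governance policy: sensitive submissions need explicit approval.
        return {"tool": "submit_report", "status": "blocked", "reason": "approval_required"}
    return {"tool": "submit_report", "status": "submitted", "length": len(report)}


def route(prompt: str) -> str:
    """Pick a tool name by keyword (hypothetical routing rule for illustration)."""
    text = prompt.lower()
    if "submit" in text or "report" in text:
        return "submit_report"
    return "retrieve_document"
```

With this sketch, `submit_report("draft", sensitive=True)` returns a blocked result, and the same call with `approval=True` goes through.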

Local Simulation

These parts run locally and do not call Google Cloud:

  • Tool routing and tool execution.
  • Governance policy check for sensitive report submission.
  • Checkpoint simulation via results/checkpoints.jsonl.
  • Audit simulation via results/audit_log.jsonl.
  • Trace simulation via results/trace_events.jsonl.
  • Smoke metrics in results/smoke_metrics.json and results/smoke_metrics.csv.
  • Evidence copies in evidence/.
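
The audit, checkpoint, and trace files above are all JSONL, meaning one JSON object per line. A minimal sketch of appending and reading such events follows. The field names `ts` and `event` and the helper names `append_event` and `read_events` are assumptions for illustration, not the repo's actual schema.

```python
import json
import time
from pathlib import Path


def append_event(path: Path, event_type: str, **fields) -> dict:
    """Append one JSON object as a single line to a JSONL file."""
    record = {"ts": time.time(), "event": event_type, **fields}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_events(path: Path) -> list[dict]:
    """Read a JSONL file back into a list of dictionaries, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending rather than rewriting the file keeps earlier events intact if a later step fails, which suits the simulated failure in the local runner.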

The local runner executes:

  • T01: retrieve document and summarize.
  • T02: call submit_report with non-sensitive content.
  • T03: block submit_report with sensitive content.
  • T04: simulate failure after checkpoint and resume.
  • T05: simulate duplicate prevention using an idempotency key.

Google Cloud Requirements

Google Cloud is required only when you are ready to test actual Vertex AI Agent Runtime / Agent Engine deployment and managed platform behavior.

Cloud-side work may include:

  • Creating or selecting a Google Cloud project.
  • Enabling required APIs.
  • Authenticating with gcloud.
  • Packaging an ADK-compatible agent.
  • Deploying to Agent Runtime / Agent Engine.
  • Collecting managed Cloud Logging, Monitoring, and Trace evidence.

This scaffold does not hardcode credentials. Cloud scripts run only when you invoke them explicitly, so billable resources are created only when you choose to run them.

Cost warning:

  • Enabling APIs is usually not the expensive part.
  • Creating a Cloud Storage staging bucket can incur storage and operation charges.
  • Deploying an Agent Engine runtime can create billable managed resources.
  • Clean up the deployed agent and staging bucket after testing.

Create a Google Cloud Free Trial Project

  1. Go to https://cloud.google.com/free.
  2. Start the free trial and complete the account setup.
  3. Open the Google Cloud Console.
  4. Create a new project and note the project ID.
  5. Open Cloud Shell for a browser-based environment with gcloud preinstalled.

Review Google Cloud pricing and free trial terms before deploying resources.

Enable APIs

From Cloud Shell or a workstation with gcloud installed:

export PROJECT_ID="your-project-id"
export REGION="us-central1" # optional; defaults to us-central1 in cloud scripts
./scripts/setup_gcloud.sh

The setup script enables:

  • aiplatform.googleapis.com
  • logging.googleapis.com
  • monitoring.googleapis.com
  • cloudtrace.googleapis.com
  • artifactregistry.googleapis.com
  • cloudbuild.googleapis.com

Run Local Smoke

Use Python 3.11 or newer.

cd vertex-agent-smoke
python --version
./scripts/run_local_smoke.sh

You can also run it as a module:

PYTHONPATH=src python -m vertex_agent_smoke.smoke_runner

Generated outputs:

  • results/audit_log.jsonl
  • results/checkpoints.jsonl
  • results/trace_events.jsonl
  • results/smoke_metrics.json
  • results/smoke_metrics.csv
  • evidence/smoke_evidence.json
  • copied evidence files under evidence/

Cloud Setup

Install the optional cloud dependency in a Python 3.11+ environment:

python -m pip install -U 'google-cloud-aiplatform[agent_engines]'

Authenticate without hardcoding credentials:

gcloud auth login
gcloud config set project "${PROJECT_ID}"

For local development outside Cloud Shell, you may also need Application Default Credentials:

gcloud auth application-default login

Set the required environment variables:

export PROJECT_ID="your-project-id"
export REGION="us-central1"
export AGENT_DISPLAY_NAME="vertex-agent-smoke"
export STAGING_BUCKET="gs://your-globally-unique-agent-staging-bucket"

Create Staging Bucket

Create the staging bucket explicitly:

./scripts/create_staging_bucket.sh

The script uses gcloud storage buckets create and skips creation if the bucket is already visible in the configured project.

Deploy To Agent Engine

The cloud deploy path follows the client-based Vertex AI SDK pattern shown in the Agent Engine docs:

  • from google.cloud.aiplatform import vertexai
  • client = vertexai.Client(project=..., location=...)
  • client.agent_engines.create(...)

Run:

./scripts/deploy_agent_runtime.sh

The script verifies:

  • PROJECT_ID
  • REGION, defaulting to us-central1
  • STAGING_BUCKET
  • AGENT_DISPLAY_NAME
  • active gcloud authentication
  • current gcloud project
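
The environment-variable part of these checks can be sketched in Python as follows. The function name `load_cloud_config`, the returned key names, and the `gs://` prefix check are assumptions for illustration. The shell script's actual checks may differ, and the gcloud authentication and project checks are not covered here.

```python
import os

REQUIRED = ("PROJECT_ID", "STAGING_BUCKET", "AGENT_DISPLAY_NAME")


def load_cloud_config(env=None) -> dict:
    """Validate required variables and apply the us-central1 default for REGION."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required environment variables: " + ", ".join(missing))
    bucket = env["STAGING_BUCKET"]
    if not bucket.startswith("gs://"):
        # Assumed check, based on the gs:// form used in the Cloud Setup example.
        raise ValueError("STAGING_BUCKET must start with gs://")
    return {
        "project": env["PROJECT_ID"],
        "location": env.get("REGION") or "us-central1",
        "staging_bucket": bucket,
        "display_name": env["AGENT_DISPLAY_NAME"],
    }
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.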

On success it writes:

  • evidence/deployed_agent_resource.txt
  • evidence/deployed_agent_deploy_metadata.json

If the installed SDK does not expose the documented Agent Engine API, the script fails with instructions to upgrade instead of trying an undocumented fallback.
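
One way to implement a fail-fast check like this is to look for the client entry point before attempting any deployment. The sketch below is an assumption about how such a check might be written, not the script's actual logic. The helper name `find_agent_engine_client`, the default module name `vertexai`, and the error messages are all illustrative.

```python
import importlib


def find_agent_engine_client(module_name: str = "vertexai"):
    """Return the SDK's Client class, or exit with upgrade instructions."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        raise SystemExit(
            f"{module_name} is not installed; run: "
            "python -m pip install -U 'google-cloud-aiplatform[agent_engines]'"
        )
    client_cls = getattr(module, "Client", None)
    if client_cls is None:
        # No fallback on purpose: fail with instructions instead of guessing.
        raise SystemExit(
            f"{module_name} has no Client class; upgrade with: "
            "python -m pip install -U 'google-cloud-aiplatform[agent_engines]'"
        )
    return client_cls
```

Failing before any cloud call keeps a version mismatch from leaving partially created resources behind.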

Invoke Deployed Agent

Invoke the deployed cloud agent once:

./scripts/invoke_deployed_agent.sh

Or pass a custom prompt:

./scripts/invoke_deployed_agent.sh "retrieve document for cloud smoke evidence"

The response is written to:

  • evidence/deployed_agent_response.json

Collect Cloud Evidence

Collect basic Google Cloud evidence:

./scripts/collect_cloud_evidence.sh

Generated files:

  • evidence/gcloud_enabled_services.txt
  • evidence/gcloud_config.txt
  • evidence/cloud_logging_recent.json
  • evidence/cloud_trace_monitoring_todos.txt

Cloud Trace and Cloud Monitoring exports are left as TODOs because the relevant CLI filters and metric type names can vary by deployed runtime and SDK version. The script records that gap explicitly instead of producing misleading evidence.

Cleanup

Delete the Agent Engine deployment after testing:

./scripts/delete_deployed_agent.sh

The script uses the documented client-based SDK deletion path:

client.agent_engines.delete(name=RESOURCE_NAME, force=True)

The resource name comes from:

cat evidence/deployed_agent_resource.txt

Then delete the staging bucket when you no longer need deployment artifacts:

gcloud storage rm -r "${STAGING_BUCKET}" --project="${PROJECT_ID}"

This permanently deletes bucket contents. Review the bucket before running the cleanup command.

Official Docs

The cloud path is based on the official Google Cloud documentation for Vertex AI Agent Engine.

Legacy Placeholder Removed

scripts/deploy_agent_runtime.sh now performs an optional SDK deployment. It is no longer a placeholder.

Metrics

The smoke runner records these variables:

T, Ts, C, Cc, R, V, S, Sc, F, Fr, A, Ad, P, Pe, E, El, W, Wa, Q, Qc, D, Dref, O, Ov, M, Ms, Tm, Tref, N, Np, Rsc, Rcsc, Gsc, Osc, Psc, OS

Score formulas:

Rsc = 1/3 * (Ts/T + Cc/C + V/R)
Rcsc = 1/3 * (Sc/S + Fr/F + (1 - Ad/A))
Gsc = 1/3 * (Pe/P + El/E + Wa/W)
Osc = 1/3 * (Qc/Q + (1 - D/Dref) + Ov/O)
Psc = 1/3 * (Ms/M + (1 - Tm/Tref) + (1 - Np/N))
OS = 1/5 * (Rsc + Rcsc + Gsc + Osc + Psc)

Division by zero is handled safely by returning 0.0 for undefined ratios.
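
The formulas above translate directly into Python. This is a sketch under stated assumptions rather than the runner's code: the helper names `safe_ratio` and `compute_scores` are hypothetical, and it reads "returning 0.0 for undefined ratios" as applying to each ratio individually, so a term such as `1 - Ad/A` evaluates to 1.0 when `A` is zero.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def compute_scores(m: dict) -> dict:
    """Compute the five dimension scores and the overall score OS."""
    r = safe_ratio
    scores = {
        "Rsc": (r(m["Ts"], m["T"]) + r(m["Cc"], m["C"]) + r(m["V"], m["R"])) / 3,
        "Rcsc": (r(m["Sc"], m["S"]) + r(m["Fr"], m["F"]) + (1 - r(m["Ad"], m["A"]))) / 3,
        "Gsc": (r(m["Pe"], m["P"]) + r(m["El"], m["E"]) + r(m["Wa"], m["W"])) / 3,
        "Osc": (r(m["Qc"], m["Q"]) + (1 - r(m["D"], m["Dref"])) + r(m["Ov"], m["O"])) / 3,
        "Psc": (r(m["Ms"], m["M"]) + (1 - r(m["Tm"], m["Tref"])) + (1 - r(m["Np"], m["N"]))) / 3,
    }
    # OS is the mean of the five dimension scores computed above.
    scores["OS"] = sum(scores.values()) / 5
    return scores
```

Each dimension score is a mean of three terms, so every score stays between 0 and 1 as long as each ratio does.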

Evidence Checklist

After a local run, verify:

  • evidence/smoke_evidence.json shows all capability flags as true.
  • evidence/audit_log.jsonl contains tool selection, policy block, and duplicate prevention events.
  • evidence/checkpoints.jsonl contains checkpoint save and resume events.
  • evidence/trace_events.jsonl contains trace start, end, and simulated error events.
  • evidence/smoke_metrics.json contains all metric variables and computed scores.
  • evidence/smoke_metrics.csv contains the same variables in tabular form.
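
Part of this checklist can be automated. The sketch below only confirms that each listed file exists and that the JSON and JSONL files parse. It does not inspect capability flags or event types, because their exact field names are not documented here. The helper name `check_local_evidence` is hypothetical.

```python
import json
from pathlib import Path

LOCAL_EVIDENCE = (
    "smoke_evidence.json",
    "audit_log.jsonl",
    "checkpoints.jsonl",
    "trace_events.jsonl",
    "smoke_metrics.json",
    "smoke_metrics.csv",
)


def check_local_evidence(evidence_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every file looks sane."""
    problems = []
    for name in LOCAL_EVIDENCE:
        path = evidence_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
            continue
        text = path.read_text(encoding="utf-8")
        try:
            if name.endswith(".jsonl"):
                for line in text.splitlines():
                    if line.strip():
                        json.loads(line)  # each non-blank line must be valid JSON
            elif name.endswith(".json"):
                json.loads(text)
        except json.JSONDecodeError:
            problems.append(f"invalid JSON: {name}")
    return problems
```

Running `check_local_evidence(Path("evidence"))` after a local smoke run gives a quick pass or fail signal before the manual review of the file contents.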

After a cloud run, verify:

  • evidence/deployed_agent_resource.txt contains the Agent Engine resource name.
  • evidence/deployed_agent_response.json contains one successful remote response.
  • evidence/gcloud_enabled_services.txt contains enabled Google Cloud services.
  • evidence/gcloud_config.txt captures the active gcloud configuration.
  • evidence/cloud_logging_recent.json contains recent logs or a clear warning.
