https://manager.headyio.com/api
Request:
{"task_type":"deployment"}Description: Creates a new Monte Carlo simulation plan for the specified task type. The simulation analyzes potential outcomes and recommends optimal execution strategies based on historical patterns and current system state.
Parameters:
task_type(string, required): Type of task to simulate (e.g., "deployment", "scaling", "rollback", "migration")config(object, optional): Additional configuration parametersiterations(integer): Number of simulation runs (default: 1000, max: 10000)confidence_level(float): Desired confidence level (default: 0.95, range: 0.80-0.99)parallel(boolean): Enable parallel execution (default: true)timeout_ms(integer): Maximum simulation duration in milliseconds (default: 30000)risk_threshold(float): Acceptable risk level (default: 0.1, range: 0.0-1.0)historical_window_days(integer): Days of historical data to analyze (default: 30)optimization_target(string): Primary optimization goal - "speed", "reliability", "cost" (default: "reliability")fallback_strategy(string): Strategy if primary plan fails - "conservative", "aggressive", "adaptive" (default: "adaptive")async_mode(boolean): Run simulation asynchronously (default: false, recommended for iterations > 5000)constraints(object, optional): Execution constraintsmax_concurrent_operations(integer): Maximum parallel operations (default: 5)resource_limits(object): CPU/memory limitscpu_cores(integer): Maximum CPU cores to allocatememory_gb(integer): Maximum memory in GB
maintenance_window(string): ISO 8601 time window for execution
notifications(object, optional): Alert configurationwebhook_url(string): Callback URL for plan completionemail(string): Email address for notificationsslack_channel(string): Slack channel for alerts
Rate Limit: 100 requests per minute per API key
Authentication: Requires Bearer token in Authorization header
Headers:
Authorization: Bearer <your_api_token>
Content-Type: application/json
X-Request-ID: <optional_unique_request_id>
X-Idempotency-Key: <optional_idempotency_key>Error Codes:
400: Invalid task_type or configuration parameters401: Missing or invalid authentication token403: Insufficient permissions for requested task type422: Unprocessable entity - configuration conflicts detected429: Rate limit exceeded503: Simulation service temporarily unavailable504: Simulation timeout - consider reducing iterations or increasing timeout_ms
Example Request (Basic):
{
"task_type": "deployment"
}Example Request (Advanced Configuration):
{
"task_type": "deployment",
"config": {
"iterations": 5000,
"confidence_level": 0.98,
"parallel": true,
"async_mode": true,
"timeout_ms": 45000,
"risk_threshold": 0.05,
"historical_window_days": 60,
"optimization_target": "reliability",
"fallback_strategy": "adaptive",
"constraints": {
"max_concurrent_operations": 3,
"resource_limits": {
"cpu_cores": 4,
"memory_gb": 8
},
"maintenance_window": "2024-01-15T22:00:00Z/2024-01-16T02:00:00Z"
},
"notifications": {
"webhook_url": "https://hooks.example.com/monte-carlo",
"email": "ops@example.com",
"slack_channel": "#deployments"
}
}
}Example Success Response (Synchronous):
{
"plan_id": "plan_123",
"strategy": "fast_parallel",
"estimated_duration_ms": 1200,
"confidence_score": 0.95,
"risk_assessment": "low",
"simulations_run": 1000,
"success_probability": 0.97,
"recommended_actions": [
"pre_warm_cache",
"enable_health_checks",
"configure_circuit_breakers"
],
"alternative_strategies": [
{
"name": "conservative_sequential",
"confidence_score": 0.98,
"estimated_duration_ms": 3500,
"success_probability": 0.99
},
{
"name": "aggressive_parallel",
"confidence_score": 0.89,
"estimated_duration_ms": 800,
"success_probability": 0.92
}
],
"resource_requirements": {
"estimated_cpu_cores": 2,
"estimated_memory_gb": 4,
"estimated_network_mbps": 100
},
"risk_factors": [
{
"factor": "high_traffic_period",
"impact": "medium",
"mitigation": "schedule_outside_peak_hours"
}
],
"created_at": "2024-01-15T10:30:00Z",
"expires_at": "2024-01-15T11:30:00Z",
"metadata": {
"model_version": "v2.1.0",
"simulation_engine": "monte-carlo-pro"
}
}Example Success Response (Asynchronous):
{
"plan_id": "plan_123",
"status": "processing",
"estimated_completion_time": "2024-01-15T10:31:00Z",
"created_at": "2024-01-15T10:30:00Z",
"poll_url": "https://manager.headyio.com/api/monte-carlo/plan/plan_123"
}Example Error Response:
{
"error": {
"code": "INVALID_PARAMETER",
"message": "confidence_level must be between 0.80 and 0.99",
"field": "config.confidence_level",
"provided_value": 1.05,
"documentation_url": "https://docs.headyio.com/api/errors#INVALID_PARAMETER"
},
"request_id": "req_abc123",
"timestamp": "2024-01-15T10:30:00Z"
}Idempotency:
- Requests with the same
X-Idempotency-Keywithin 24 hours return the same plan_id - Useful for retrying failed requests without creating duplicate plans
- Idempotency keys are case-sensitive and should be unique per intended operation
Webhooks:
When a webhook_url is provided, the following payload is sent on plan completion:
{
"event": "plan.completed",
"plan_id": "plan_123",
"status": "ready",
"timestamp": "2024-01-15T10:30:15Z",
"plan_url": "https://manager.headyio.com/api/monte-carlo/plan/plan_123"
}
**Best Practices**:
- Use `async_mode: true` for simulations with > 5000 iterations to avoid timeout issues
- Set appropriate `confidence_level` based on task criticality (0.95 for standard, 0.98+ for critical)
- Implement webhook handlers with proper error handling and logging for async operations
- Use `X-Idempotency-Key` for all deployment-related requests to ensure safe retries
- Monitor `risk_assessment` and `success_probability` before executing plans with < 0.90 confidence
- Configure `maintenance_window` constraints during off-peak hours for production deployments
- Review `alternative_strategies` when primary strategy shows medium/high risk assessment
- Set `historical_window_days` to at least 60 for more accurate predictions in production environments
- Use `X-Request-ID` for tracking requests across distributed systems and debugging
- Cache plan results locally when making similar requests to reduce API calls and improve performance
- Start with conservative `risk_threshold` values (0.05-0.10) and adjust based on observed outcomes
- Validate `resource_limits` against available infrastructure capacity before submitting plans
- Use `optimization_target: "reliability"` for production deployments and `"speed"` for development environments
- Implement circuit breakers when executing recommended actions to prevent cascading failures
- Store `plan_id` references for audit trails and post-execution analysis
- Test plans in staging environment before production deployment to validate strategy effectiveness
- Set up monitoring alerts for plans with `risk_assessment: "high"` or `success_probability < 0.85`
- Use `fallback_strategy: "adaptive"` for dynamic environments with variable load patterns
- Document rationale for custom `config` parameters in deployment runbooks for team knowledge sharing
- Implement retry logic with jitter to prevent thundering herd problems during high-traffic periods
- Use structured logging with correlation IDs to trace plan execution across microservices
- Implement plan versioning to track configuration changes and enable rollback capabilities
- Set up automated alerts for plan expiration (< 10 minutes remaining) to prevent execution failures
- Use blue-green or canary deployment patterns when executing high-risk plans in production
- Maintain a library of proven configurations for common task types to accelerate future planning
- Validate plan recommendations against organizational policies before execution
- Implement plan approval workflows for production deployments requiring multiple stakeholders
- Use feature flags to gradually roll out new simulation strategies in production
- Monitor plan execution metrics (success rate, duration, resource usage) to refine future configurations
- Establish SLOs for plan generation time and success rates to track API reliability
- Create runbooks for common failure scenarios based on `risk_factors` analysis
- Use semantic versioning for plan configurations to track breaking changes
- Implement automated testing for webhook endpoints before deploying to production
- Archive historical plans for compliance and trend analysis (recommended retention: 1 year)
- Use plan templates for standardized deployments to ensure consistency across teams
**Security Considerations**:
- API tokens should be rotated every 90 days and immediately upon suspected compromise
- Use environment variables or secure vaults (HashiCorp Vault, AWS Secrets Manager) for storing credentials
- Never commit API tokens to version control; use `.gitignore` for credential files
- Webhook URLs must use HTTPS and implement signature verification using HMAC-SHA256
- Validate webhook signatures using the `X-Heady-Signature` header before processing payloads
- Rate limits are enforced per API key; contact support for increased limits in production
- Plans containing sensitive data are automatically encrypted at rest using AES-256
- Implement IP whitelisting for production API keys when possible
- Use separate API keys for development, staging, and production environments
- Monitor API key usage patterns for anomalies that may indicate compromised credentials
- Enable audit logging for all API requests in production environments
- Restrict API key permissions to minimum required scopes (principle of least privilege)
- Implement request signing for additional authentication layer beyond API tokens
- Use short-lived tokens (JWT) with refresh mechanism for enhanced security in high-risk environments
- Enable MFA for accounts with API key management permissions
- Regularly scan webhook endpoints for vulnerabilities (OWASP Top 10)
- Implement rate limiting on webhook receivers to prevent DoS attacks
- Use TLS 1.3 or higher for all API communications; TLS 1.2 is minimum requirement
- Sanitize all user inputs in `config` parameters to prevent injection attacks
- Implement Content Security Policy (CSP) headers on webhook receiver endpoints
- Use certificate pinning for critical production API connections
- Enable API request/response encryption for sensitive task types beyond transport layer security
- Implement automatic token revocation on multiple failed authentication attempts (> 5 within 15 minutes)
- Use dedicated service accounts with limited permissions for automated API integrations
- Maintain an incident response plan for API key compromise scenarios
- Implement zero-trust architecture with mutual TLS (mTLS) for high-security environments
- Enable real-time security monitoring and alerting for suspicious API activity patterns
- Use API gateways with WAF (Web Application Firewall) capabilities for additional protection
- Implement data residency controls to comply with GDPR, CCPA, and other regulations
- Encrypt sensitive fields in plan configurations before transmission (client-side encryption)
- Use hardware security modules (HSMs) for cryptographic key management in regulated industries
- Implement session timeout and automatic logout for inactive API sessions
- Enable detailed security event logging for SOC2, ISO 27001, and PCI-DSS compliance
- Perform regular penetration testing and security audits on API infrastructure
- Use API versioning to deprecate insecure endpoints gracefully without breaking existing integrations
**Performance Tips**:
- Reduce `iterations` for faster results during development/testing (use 100-500)
- Enable `parallel: true` for simulations with > 1000 iterations (can reduce execution time by 60-80%)
- Use cached results when available (5-minute TTL) by making identical requests
- Implement exponential backoff when receiving 429 rate limit errors (start with 1s, max 60s)
- For bulk operations, batch requests and use async mode with webhooks to avoid blocking
- Leverage `poll_url` for async operations instead of repeatedly calling the main endpoint
- Set realistic `timeout_ms` values based on `iterations` count (estimate ~30ms per 1000 iterations)
- Use connection pooling and HTTP keep-alive for multiple sequential API calls
- Compress request payloads when sending large `config` objects (gzip supported)
- Monitor `estimated_completion_time` in async responses to optimize polling intervals
- Consider using GraphQL subscriptions for real-time plan status updates (beta feature)
- Profile API latency by region and route traffic to nearest endpoint for optimal performance
- Use CDN or edge caching for frequently accessed plan results in geographically distributed teams
- Implement client-side request deduplication to prevent redundant API calls
- Pre-warm connections during application startup to reduce first-request latency
- Use HTTP/2 for multiplexing multiple requests over single connection
- Optimize `historical_window_days` based on task type (30 days sufficient for frequent operations)
- Enable response compression (Accept-Encoding: gzip, br) to reduce bandwidth usage by ~70%
- Implement adaptive polling intervals based on plan complexity (longer intervals for high-iteration plans)
- Use database connection pooling for storing plan results to prevent connection exhaustion
- Enable DNS caching to reduce lookup latency for repeated API calls
- Implement request coalescing to merge duplicate in-flight requests
- Use streaming responses for large plan results to reduce memory footprint
- Leverage browser/client caching with appropriate Cache-Control headers for static plan data
- Use persistent HTTP connections to reduce TLS handshake overhead
- Implement request prioritization to handle critical plans before lower-priority requests
- Use lazy loading for plan details to reduce initial payload size
- Enable server-side result pagination for plans with large `alternative_strategies` arrays
- Implement query result caching at application layer for frequently accessed plans
- Use asynchronous I/O operations to prevent blocking on slow network calls
- Profile and optimize JSON serialization/deserialization for large plan objects
- Use WebSocket connections for long-polling scenarios to reduce connection overhead
- Implement request batching windows (e.g., 100ms) to group multiple plan requests
- Use read replicas for fetching historical plan data to reduce load on primary database
- Enable query optimization with proper indexing on `plan_id`, `task_type`, and `created_at` fields
**Webhook Retry Policy**:
- Failed webhook deliveries are retried up to 3 times with exponential backoff
- Retry intervals: 1s, 5s, 25s
- Webhooks must respond with 2xx status code within 5 seconds
**Notes**:
- Plans expire after 1 hour and must be executed or recreated
- Historical data analysis may take longer for first-time task types
- Parallel simulations require additional compute resources and may affect pricing
- Plans can be retrieved using `GET /monte-carlo/plan/{plan_id}` endpoint
- Simulation results are cached for 5 minutes to improve performance on similar requests
- For long-running simulations (iterations > 5000), use `async_mode: true` with webhooks
- The `X-Request-ID` header is useful for correlating logs and debugging issues
- Maximum payload size: 1MB
**Response**:
```json
{"plan_id":"plan_123","strategy":"fast_parallel"}Request:
{"plan_id":"plan_123", "success":true, "latency_ms":1200}Response:
{"status":"recorded"}Query Params:
category=performancestate=active
Response:
[
{"pattern_id": "perf-123", "description":"High latency in deployment tasks"}
]Description: Retrieves aggregated metrics and health status for Monte Carlo simulations
Query Parameters:
time_range(string): Time window for metrics - "1h", "24h", "7d", "30d" (default: "24h")task_type(string, optional): Filter metrics by specific task typeaggregation(string): Aggregation method - "avg", "p50", "p95", "p99" (default: "avg")
Response:
{
"success_rate": 0.97,
"avg_simulation_time_ms": 1850,
"total_plans_created": 15420,
"active_simulations": 23,
"percentiles": {
"p50": 1200,
"p95": 3400,
"p99": 5800
},
"error_rate": 0.03,
"cache_hit_rate": 0.68
}Description: Real-time status tracking for async plan execution
Response:
{
"plan_id": "plan_123",
"status": "in_progress",
"progress_percentage": 67,
"current_iteration": 3350,
"total_iterations": 5000,
"estimated_completion_time": "2024-01-15T10:35:22Z",
"partial_results": {
"current_success_probability": 0.94,
"current_confidence_score": 0.92
}
}Description: System health check and service availability status
Response:
{
"status": "healthy",
"version": "v2.1.0",
"uptime_seconds": 2847392,
"services": {
"simulation_engine": "operational",
"database": "operational",
"cache": "operational",
"webhook_delivery": "operational"
},
"performance": {
"avg_response_time_ms": 145,
"requests_per_second": 234
}
}Description: Retrieve detailed execution logs for a specific plan
Query Parameters:
level(string): Log level filter - "debug", "info", "warn", "error" (default: "info")limit(integer): Maximum number of log entries (default: 100, max: 1000)
Response:
{
"plan_id": "plan_123",
"logs": [
{
"timestamp": "2024-01-15T10:30:05Z",
"level": "info",
"message": "Simulation started with 5000 iterations",
"metadata": {"engine_version": "v2.1.0"}
},
{
"timestamp": "2024-01-15T10:30:12Z",
"level": "warn",
"message": "High resource utilization detected",
"metadata": {"cpu_usage": 0.87, "memory_usage": 0.72}
}
],
"total_entries": 156
}Description: Estimate API costs before executing expensive simulations
Query Parameters:
iterations(integer): Number of planned iterationstask_type(string): Type of taskparallel(boolean): Whether parallel execution is enabled
Response:
{
"estimated_cost_usd": 0.045,
"cost_breakdown": {
"base_cost": 0.01,
"iteration_cost": 0.03,
"parallel_overhead": 0.005
},
"alternative_options": [
{
"iterations": 1000,
"estimated_cost_usd": 0.015,
"confidence_tradeoff": -0.03
}
]
}Description: Retrieve API usage statistics and billing information
Query Parameters:
start_date(string): ISO 8601 start date (default: 30 days ago)end_date(string): ISO 8601 end date (default: now)group_by(string): Grouping dimension - "day", "week", "month", "task_type" (default: "day")
Response:
{
"total_cost_usd": 12.45,
"total_plans": 1523,
"total_iterations": 3847500,
"usage_by_period": [
{
"period": "2024-01-01",
"cost_usd": 0.89,
"plans": 45,
"iterations": 112500
}
],
"cost_by_task_type": {
"deployment": 7.23,
"scaling": 3.12,
"rollback": 2.10
}
}Description: Submit multiple plan requests in a single API call
Request:
{
"plans": [
{"task_type": "deployment", "config": {"iterations": 1000}},
{"task_type": "scaling", "config": {"iterations": 500}},
{"task_type": "rollback", "config": {"iterations": 2000}}
],
"batch_config": {
"parallel_execution": true,
"fail_fast": false,
"webhook_url": "https://hooks.example.com/batch-complete"
}
}Response:
{
"batch_id": "batch_456",
"plan_ids": ["plan_123", "plan_124", "plan_125"],
"status": "processing",
"total_plans": 3,
"estimated_completion_time": "2024-01-15T10:40:00Z"
}Description: Retrieve status and results for a batch operation
Response:
{
"batch_id": "batch_456",
"status": "completed",
"total_plans": 3,
"completed_plans": 3,
"failed_plans": 0,
"results": [
{
"plan_id": "plan_123",
"status": "ready",
"strategy": "fast_parallel"
},
{
"plan_id": "plan_124",
"status": "ready",
"strategy": "conservative_sequential"
},
{
"plan_id": "plan_125",
"status": "ready",
"strategy": "adaptive"
}
],
"created_at": "2024-01-15T10:35:00Z",
"completed_at": "2024-01-15T10:40:15Z"
}Description: AI-powered insights from historical simulation data
Query Parameters:
task_type(string, optional): Filter by task typelookback_days(integer): Historical window (default: 90, max: 365)min_confidence(float): Minimum confidence threshold (default: 0.90)
Response:
{
"insights": [
{
"type": "optimization_opportunity",
"description": "Deployments during 2-4 AM show 15% higher success rate",
"confidence": 0.94,
"potential_improvement": "12% reduction in failure rate",
"recommendation": "Schedule critical deployments during this window"
},
{
"type": "risk_pattern",
"description": "Parallel deployments with >3 concurrent ops show degraded performance",
"confidence": 0.91,
"affected_plans": 234,
"recommendation": "Set max_concurrent_operations to 3 or lower"
}
],
"trend_analysis": {
"success_rate_trend": "improving",
"avg_duration_trend": "stable",
"confidence_score_trend": "improving"
}
}Description: Retrieve historical plan execution data for analysis
Query Parameters:
task_type(string, optional): Filter by task typestart_date(string): ISO 8601 start dateend_date(string): ISO 8601 end datestatus(string, optional): Filter by status - "ready", "failed", "expired"limit(integer): Maximum results (default: 100, max: 1000)offset(integer): Pagination offset (default: 0)
Response:
{
"plans": [
{
"plan_id": "plan_123",
"task_type": "deployment",
"strategy": "fast_parallel",
"success_probability": 0.97,
"confidence_score": 0.95,
"created_at": "2024-01-15T10:30:00Z",
"status": "ready"
}
],
"total_count": 1523,
"has_more": true
}Description: Save frequently used configurations as reusable templates
Request:
{
"template_name": "production_deployment",
"description": "Standard configuration for production deployments",
"task_type": "deployment",
"config": {
"iterations": 5000,
"confidence_level": 0.98,
"parallel": true,
"optimization_target": "reliability",
"constraints": {
"max_concurrent_operations": 3,
"maintenance_window": "weekends"
}
}
}Response:
{
"template_id": "tmpl_789",
"template_name": "production_deployment",
"created_at": "2024-01-15T10:30:00Z"
}Description: List all saved plan templates
Query Parameters:
task_type(string, optional): Filter by task typelimit(integer): Maximum results (default: 50, max: 200)
Response:
{
"templates": [
{
"template_id": "tmpl_789",
"template_name": "production_deployment",
"task_type": "deployment",
"description": "Standard configuration for production deployments",
"created_at": "2024-01-15T10:30:00Z",
"usage_count": 234
}
],
"total_count": 12
}Description: Create a new plan from an existing template
Request:
{
"template_id": "tmpl_789",
"overrides": {
"config": {
"iterations": 3000
}
}
}Response:
{
"plan_id": "plan_456",
"template_id": "tmpl_789",
"strategy": "fast_parallel",
"confidence_score": 0.95
}Description: Delete a saved template
Response:
{
"status": "deleted",
"template_id": "tmpl_789"
}Description: Register a webhook endpoint for event notifications
Request:
{
"url": "https://hooks.example.com/monte-carlo",
"events": ["plan.completed", "plan.failed", "batch.completed"],
"secret": "webhook_secret_key",
"active": true
}Response:
{
"webhook_id": "wh_123",
"url": "https://hooks.example.com/monte-carlo",
"events": ["plan.completed", "plan.failed", "batch.completed"],
"created_at": "2024-01-15T10:30:00Z"
}Description: List all registered webhooks
Response:
{
"webhooks": [
{
"webhook_id": "wh_123",
"url": "https://hooks.example.com/monte-carlo",
"events": ["plan.completed", "plan.failed"],
"active": true,
"last_triggered": "2024-01-15T10:35:00Z",
"success_rate": 0.98
}
]
}Description: Unregister a webhook endpoint
Response:
{
"status": "deleted",
"webhook_id": "wh_123"
}Python:
from heady import MonteCarloClient
client = MonteCarloClient(api_key="your_api_key")
# Create plan with error handling
try:
plan = client.create_plan(
task_type="deployment",
iterations=5000,
async_mode=True,
webhook_url="https://hooks.example.com/mc"
)
print(f"Plan created: {plan.plan_id}")
except HeadyAPIError as e:
print(f"Error: {e.code} - {e.message}")
# Use templates
template = client.create_template(
name="prod_deploy",
task_type="deployment",
config={"iterations": 5000, "confidence_level": 0.98}
)
plan = client.create_plan_from_template(
template_id=template.template_id,
overrides={"config": {"iterations": 3000}}
)
# Batch operations
batch = client.create_batch([
{"task_type": "deployment", "config": {"iterations": 1000}},
{"task_type": "scaling", "config": {"iterations": 500}}
])JavaScript/TypeScript:
import { MonteCarloClient } from '@heady/sdk';
const client = new MonteCarloClient({ apiKey: 'your_api_key' });
// Create plan with promises
const plan = await client.createPlan({
taskType: 'deployment',
config: {
iterations: 5000,
confidenceLevel: 0.98,
asyncMode: true
}
});
// Poll for completion
const result = await client.waitForCompletion(plan.planId, {
pollingInterval: 2000,
timeout: 60000
});
// Get insights
const insights = await client.getInsights({
taskType: 'deployment',
lookbackDays: 90,
minConfidence: 0.90
});
// Batch operations
const batch = await client.createBatch({
plans: [
{ taskType: 'deployment', config: { iterations: 1000 } },
{ taskType: 'scaling', config: { iterations: 500 } }
],
batchConfig: { parallelExecution: true }
});Go: Go:
package main
import (
"context"
"fmt"
"time"
"github.com/heady/sdk-go"
)
func main() {
client := heady.NewMonteCarloClient("your_api_key")
// Create plan with context and timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
plan, err := client.CreatePlan(ctx, &heady.PlanRequest{
TaskType: "deployment",
Config: &heady.PlanConfig{
Iterations: 5000,
ConfidenceLevel: 0.98,
AsyncMode: true,
Parallel: true,
},
})
if err != nil {
fmt.Printf("Error creating plan: %v\n", err)
return
}
// Wait for completion with polling
result, err := client.WaitForCompletion(ctx, plan.PlanID, &heady.WaitOptions{
PollingInterval: 2 * time.Second,
Timeout: 60 * time.Second,
})
if err != nil {
fmt.Printf("Error waiting for plan: %v\n", err)
return
}
fmt.Printf("Plan completed: %s (confidence: %.2f)\n",
result.Strategy, result.ConfidenceScore)
// Get insights
insights, err := client.GetInsights(ctx, &heady.InsightsRequest{
TaskType: "deployment",
LookbackDays: 90,
MinConfidence: 0.90,
})
if err != nil {
fmt.Printf("Error getting insights: %v\n", err)
return
}
fmt.Printf("Insights: %d patterns found\n", len(insights.Patterns))
// Create template
template, err := client.CreateTemplate(ctx, &heady.TemplateRequest{
Name: "prod_deployment",
Description: "Production deployment template",
TaskType: "deployment",
Config: &heady.PlanConfig{
Iterations: 5000,
ConfidenceLevel: 0.98,
},
})
if err != nil {
fmt.Printf("Error creating template: %v\n", err)
return
}
fmt.Printf("Template created: %s\n", template.TemplateID)
// Batch operations
batch, err := client.CreateBatch(ctx, &heady.BatchRequest{
Plans: []*heady.PlanRequest{
{TaskType: "deployment", Config: &heady.PlanConfig{Iterations: 1000}},
{TaskType: "scaling", Config: &heady.PlanConfig{Iterations: 500}},
},
BatchConfig: &heady.BatchConfig{
ParallelExecution: true,
FailFast: false,
},
})
if err != nil {
fmt.Printf("Error creating batch: %v\n", err)
return
}
fmt.Printf("Batch created: %s (%d plans)\n", batch.BatchID, batch.TotalPlans)
// Register webhook
webhook, err := client.RegisterWebhook(ctx, &heady.WebhookRequest{
URL: "https://hooks.example.com/monte-carlo",
Events: []string{"plan.completed", "plan.failed"},
Secret: "webhook_secret_key",
})
if err != nil {
fmt.Printf("Error registering webhook: %v\n", err)
return
}
fmt.Printf("Webhook registered: %s\n", webhook.WebhookID)
}cURL:
# Create plan
curl -X POST https://manager.headyio.com/api/monte-carlo/plan \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-H "X-Idempotency-Key: unique_key_123" \
-d '{
"task_type": "deployment",
"config": {
"iterations": 5000,
"confidence_level": 0.98,
"parallel": true
}
}'
# Get plan status
curl -X GET https://manager.headyio.com/api/monte-carlo/plan/plan_123/status \
-H "Authorization: Bearer your_api_key"
# Create batch
curl -X POST https://manager.headyio.com/api/monte-carlo/batch \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{
"plans": [
{"task_type": "deployment", "config": {"iterations": 1000}},
{"task_type": "scaling", "config": {"iterations": 500}}
],
"batch_config": {
"parallel_execution": true
}
}'
# Get insights
curl -X GET "https://manager.headyio.com/api/monte-carlo/insights?task_type=deployment&lookback_days=90" \
-H "Authorization: Bearer your_api_key"
# Register webhook
curl -X POST https://manager.headyio.com/api/monte-carlo/webhooks \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://hooks.example.com/monte-carlo",
"events": ["plan.completed", "plan.failed"],
"secret": "webhook_secret_key"
}'
# List webhooks
curl -X GET https://manager.headyio.com/api/monte-carlo/webhooks \
-H "Authorization: Bearer your_api_key"
# Delete webhook
curl -X DELETE https://manager.headyio.com/api/monte-carlo/webhooks/wh_123 \
-H "Authorization: Bearer your_api_key"
# List templates
curl -X GET https://manager.headyio.com/api/monte-carlo/templates \
-H "Authorization: Bearer your_api_key"
# Create plan from template
curl -X POST https://manager.headyio.com/api/monte-carlo/plan/from-template \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{
"template_id": "tmpl_789",
"overrides": {
"config": {
"iterations": 3000
}
}
}'
# Delete template
curl -X DELETE https://manager.headyio.com/api/monte-carlo/templates/tmpl_789 \
-H "Authorization: Bearer your_api_key"
# Get metrics
curl -X GET https://manager.headyio.com/api/monte-carlo/metrics \
-H "Authorization: Bearer your_api_key"- Rotate API keys every 90 days
- Use separate keys for development, staging, and production environments
- Store keys in secure vaults (e.g., AWS Secrets Manager, HashiCorp Vault)
- Never commit API keys to version control
- Use environment variables or secure configuration management
- Implement IP whitelisting for production API keys
- Monitor API key usage for anomalous patterns
- All data is encrypted in transit (TLS 1.3) and at rest (AES-256)
- Plan data is retained for 90 days, then automatically purged
- GDPR compliant - request data deletion via support
- SOC 2 Type II certified
- HIPAA compliant for healthcare deployments
- No PII is stored in plan configurations or results
- Verify webhook signatures using HMAC-SHA256
- Webhooks include
X-Heady-Signatureheader for validation - Implement webhook URL validation and HTTPS-only endpoints
- Use webhook secrets to prevent unauthorized requests
- Implement replay attack prevention with timestamp validation
Webhook Signature Validation (Python):
import hmac
import hashlib
def verify_webhook(payload, signature, secret):
expected = hmac.new(
secret.encode(),
payload.encode(),
hashlib.sha256
).hexdigest()
return hmac.compare_digest(expected, signature)Webhook Signature Validation (Go):
import (
"crypto/hmac"
"crypto/sha256"
"encoding/hex"
)
func verifyWebhook(payload, signature, secret string) bool {
h := hmac.New(sha256.New, []byte(secret))
h.Write([]byte(payload))
expected := hex.EncodeToString(h.Sum(nil))
return hmac.Equal([]byte(expected), []byte(signature))
}Webhook Signature Validation (JavaScript):
const crypto = require('crypto');
function verifyWebhook(payload, signature, secret) {
const expected = crypto
.createHmac('sha256', secret)
.update(payload)
.digest('hex');
return crypto.timingSafeEqual(
Buffer.from(expected),
Buffer.from(signature)
);
}Current Version: v2.1.0
Breaking Changes:
- v2.0.0: Changed
confidence_levelrange from 0-1 to 0.80-0.99 - v2.0.0: Removed deprecated
simple_modeparameter - v2.1.0: Added required
X-Request-IDheader for audit logging
Deprecation Notice:
GET /patternsendpoint will be deprecated in v3.0.0 (use/monte-carlo/insightsinstead)fallback_strategy: "simple"will be removed in v2.2.0
-
Documentation: https://docs.headyio.com
-
API Status: https://status.headyio.com
-
Support Email: api-support@headyio.com
-
Community Forum: https://community.headyio.com
-
GitHub Issues: https://github.com/heady/sdk/issues
-
Slack Community: https://heady-community.slack.com
-
Rate Limits:
- Standard tier: 100 requests/minute per API key
- Premium tier: 1000 requests/minute per API key
- Enterprise tier: Custom limits available
- Rate limit headers included in all responses:
X-RateLimit-Limit: Maximum requests per windowX-RateLimit-Remaining: Remaining requests in current windowX-RateLimit-Reset: Unix timestamp when limit resets
- HTTP 429 response when rate limit exceeded with
Retry-Afterheader - Burst allowance: 20% above tier limit for short periods (< 10 seconds)
- Rate limits apply per endpoint; different endpoints have independent quotas
- GraphQL API has separate rate limiting based on query complexity points
- Webhook deliveries do not count against API rate limits
- Template-based plan creation shares quota with standard plan endpoint
-
SLA & Uptime:
- 99.9% uptime guarantee for production endpoints
- 99.99% uptime for Enterprise tier with multi-region failover
- Planned maintenance windows announced 72 hours in advance
- Real-time status updates at https://status.headyio.com
- Historical uptime metrics available via
/monte-carlo/metrics - Automated failover to secondary regions within 30 seconds
- Service credits available for SLA violations (consult terms of service)
- Status page includes per-endpoint availability metrics
- Incident post-mortems published within 5 business days
-
Cost Optimization:
- Use template-based plans to reduce API calls by up to 60%
- Batch operations cost 30% less than individual requests
- Cache plan results locally for 5 minutes when appropriate
- Monitor
estimated_cpu_coresandestimated_memory_gbto optimize resource allocation - Set
historical_window_daysto minimum required value (default: 60 days) - Reduce
iterationsduring development/testing (100-500 vs production 5000+) - Use
async_modewith webhooks to avoid long-running HTTP connections - Leverage CDN caching for frequently accessed plan results
- Enable response compression to reduce bandwidth costs by ~70%
- Use read replicas for historical data queries to reduce primary database load
- Implement request coalescing to merge duplicate in-flight requests
- Use lazy loading for plan details to reduce initial payload size
- Enable server-side result pagination for plans with large alternative strategies
- Profile and optimize JSON serialization for large plan objects
- Use streaming responses for large plan results to reduce memory footprint
-
Error Recovery Patterns:
- Implement exponential backoff for 429 and 503 errors (start 1s, max 60s)
- Use circuit breaker pattern for webhook endpoints
- Store
request_idfrom error responses for support tickets - Implement dead letter queue for failed webhook deliveries
- Log all API errors with correlation IDs for debugging
- Use idempotency keys (
X-Idempotency-Key) for safe request retries - Implement timeout handling with graceful degradation
- Use fallback strategies when primary simulation strategy fails
- Monitor error rates and alert on anomalies (> 5% error rate)
- Implement retry logic with jitter to prevent thundering herd problems
- Use fallback to cached results when API is unavailable
- Implement request timeout with appropriate values based on iteration count
- Store webhook payloads for manual replay after service recovery
- Use health check endpoints to detect service degradation early
-
Testing & Development:
- Sandbox environment available at
https://sandbox.manager.headyio.com/api - Test API keys prefixed with
test_for development - Mock webhook endpoints available for integration testing
- Postman collection: https://www.postman.com/heady-api/workspace/monte-carlo
- OpenAPI specification: https://manager.headyio.com/api/openapi.json
- Swagger UI documentation: https://manager.headyio.com/api/docs
- SDK examples available for Python, JavaScript, Go, Java, Ruby, .NET
- Integration test suites available in GitHub repository
- Load testing guidelines and benchmarks in documentation
- Sandbox data automatically reset every 24 hours
- Test webhook signature verification with provided mock payloads
- CI/CD integration examples for GitHub Actions, GitLab CI, Jenkins
- Sandbox environment available at
-
Performance Tuning:
- Use
parallel: truefor simulations with > 1000 iterations (2-3x faster, 60-80% reduction) - Enable connection pooling in SDK clients (recommended: 10-20 connections)
- Implement request compression (gzip, brotli) for payloads > 1KB
- Use HTTP/2 for multiplexed requests when available
- Set appropriate
timeout_msbased on iteration count (recommended: iterations * 10ms, ~30ms per 1000) - Leverage
poll_urlfor async operations instead of repeated endpoint calls - Use WebSocket connections for long-polling scenarios
- Implement client-side request deduplication
- Pre-warm connections during application startup
- Profile API latency by region and route to nearest endpoint
- Use database connection pooling for storing plan results
- Enable DNS caching to reduce lookup latency
- Implement adaptive polling intervals based on
estimated_completion_time - Use streaming responses for large plan results
- Enable browser/client caching with Cache-Control headers
- Use persistent HTTP connections with keep-alive to reduce TLS handshake overhead
- Implement request batching windows (e.g., 100ms) to group multiple plan requests
- Use asynchronous I/O operations to prevent blocking on slow network calls
- Enable edge caching for frequently accessed plan results in distributed teams
- Use
-
Security Best Practices:
- Rotate API keys every 90 days for enhanced security
- Use separate keys for development, staging, and production environments
- Store keys in secure vaults (AWS Secrets Manager, HashiCorp Vault)
- Never commit API keys to version control; use
.gitignorefor credential files - Implement IP whitelisting for production API keys when possible
- Monitor API key usage for anomalous patterns
- Webhook URLs must use HTTPS with TLS 1.3 or higher
- Validate webhook signatures using
X-Heady-Signatureheader with HMAC-SHA256 - Enable MFA for accounts with API key management permissions
- Restrict API key permissions to minimum required scopes (least privilege)
- Use short-lived tokens (JWT) with refresh mechanism for high-risk environments
- Regularly scan webhook endpoints for vulnerabilities (OWASP Top 10)
- Implement rate limiting on webhook receivers to prevent DoS attacks
- Sanitize all user inputs in
configparameters to prevent injection attacks - Implement Content Security Policy (CSP) headers on webhook receiver endpoints
- Enable audit logging for all API requests in production environments
- Use request signing for additional authentication layer beyond API tokens
-
Compliance & Certifications:
- SOC 2 Type II certified
- GDPR compliant with data residency options (US, EU, APAC)
- HIPAA compliant tier available for healthcare applications
- ISO 27001 certified for information security management
- PCI DSS compliant for payment-related simulations
- CCPA compliant for California data privacy requirements
- Data retention: 90 days automatic purging (configurable for enterprise)
- Right to deletion requests processed within 30 days
- Audit logging available for SOC2 and compliance requirements
- Encryption: TLS 1.3 in transit, AES-256 at rest
- No PII stored in plan configurations or results
- Data processing agreements (DPA) available for GDPR compliance
- Regular third-party security audits and penetration testing
- Vulnerability disclosure program with responsible disclosure policy
-
Advanced Features:
- Multi-region deployment support for low-latency access
- Custom simulation models available for enterprise customers
- Real-time collaboration features for team-based planning
- Integration with CI/CD pipelines via GitHub Actions, GitLab CI, Jenkins
- Terraform provider available:
terraform-provider-heady - Kubernetes operator for automated plan execution:
heady-k8s-operator - GraphQL API with subscriptions for real-time updates (beta)
- Webhook retry policy: 3 attempts with exponential backoff (1s, 5s, 25s)
- Template management for reusable plan configurations
- Batch processing with parallel execution support
- Insights API for historical trend analysis and pattern detection
- Alternative strategy recommendations with risk assessment
- Adaptive fallback strategies for dynamic environments
- Resource limit validation against infrastructure capacity
- Monitoring and observability via
/monte-carlo/metricsendpoint - Idempotency support with 24-hour key retention
- Custom webhook headers for authentication and routing
- Plan versioning and rollback capabilities for tracking configuration changes over time
- Automated plan archival and retention policies for compliance requirements
- Plan comparison tools to analyze differences between versions
- Template inheritance for building complex configurations from base templates
- Custom simulation algorithms for domain-specific optimization scenarios
- Multi-tenancy support with isolated plan execution environments
- Advanced analytics dashboard for visualizing plan performance trends
- Machine learning-based recommendations for optimal iteration counts
- Disaster recovery automation with automated failover testing
- Blue-green deployment simulation with traffic split analysis
- Canary deployment planning with gradual rollout strategies
- A/B testing simulation for feature flag deployments
- Cost forecasting and budget alerts based on historical usage patterns
- Integration with monitoring tools (Datadog, New Relic, Prometheus, Grafana)
- Custom metrics collection for business-specific KPIs
- Scheduled plan execution with cron-like syntax
- Plan dependencies and execution ordering for complex workflows
- Dry-run mode for validating plans without actual execution
- Rollback simulation to assess risk before reverting changes
- Multi-cloud support (AWS, Azure, GCP) with provider-specific optimizations
- Infrastructure-as-Code integration (Terraform, Pulumi, CloudFormation)
- Service mesh integration for microservices deployment planning
- Database migration simulation with downtime estimation
- Capacity planning recommendations based on growth projections
- SLO/SLA validation against proposed changes
- Chaos engineering integration for resilience testing
- Feature flag impact analysis before rollout
- Dependency graph visualization for complex deployments
- Risk scoring with configurable thresholds and escalation policies
- Approval workflows with multi-stage gates for production deployments
- Distributed tracing integration with OpenTelemetry for request correlation
- Rate limiting and throttling configuration for API protection
- Circuit breaker pattern simulation for fault tolerance testing
- Load testing integration with k6/Artillery for performance validation
- Query optimization recommendations for database operations
- CDN configuration planning for static asset delivery
- WebSocket connection simulation for real-time features
- OAuth2/JWT authentication strategy validation
- RBAC policy simulation for access control testing
- GDPR/CCPA compliance validation for data handling workflows
- Automated security scanning integration (Snyk, Trivy)
- Container image optimization recommendations
- Kubernetes resource quota and limit planning
- Auto-scaling threshold optimization based on historical patterns
- Memory leak detection and prevention strategies
- Session management optimization with Redis clustering
- Email delivery simulation with SendGrid/Postmark integration
- File upload optimization for S3-compatible storage
- Image optimization pipeline planning with CDN integration
- Database connection pooling optimization (max 20 connections)
- Async job processing simulation with Bull/BullMQ
- Webhook delivery reliability testing with retry logic (3 attempts: 1s, 5s, 25s)
- API versioning strategy validation (v1, v2, v3)
- GraphQL query cost analysis and optimization
- GraphQL introspection security controls (development only)
- Response compression strategy (gzip/brotli) optimization
- TLS 1.3 certificate management and renewal automation
- DDoS protection and rate limiting configuration
- IPv6 support validation and migration planning
- HTTP/2 and HTTP/3 (QUIC) performance optimization
- Edge caching strategy with configurable TTL
- Slack/Discord notification integration for deployment events
- GitHub Actions CI/CD pipeline optimization
- Pre-deployment smoke test configuration
- Post-deployment validation hook execution
- Automatic rollback on health check failure simulation
- Multi-environment management (dev/staging/prod) orchestration
- Environment variable templating and secrets management
- HashiCorp Vault integration for secure credential storage
- Database schema migration automation (Prisma/TypeORM)
- Schema version control and rollback capabilities
- Read replica configuration for database scaling
- Query performance monitoring and optimization recommendations
- Cron job scheduling optimization for background tasks
- Content Security Policy (CSP) header configuration
- X-Frame-Options and HSTS security header validation
- OpenAPI/Swagger documentation auto-generation
- SDK generation for multiple programming languages (Python, JavaScript, Go, Java, Ruby, .NET)
- Mock server configuration for development/testing
- Carbon footprint tracking for sustainability metrics
- Incident response playbook automation
- Business continuity planning with disaster recovery procedures
- SLA monitoring and enforcement automation
- Audit logging for SOC2, ISO27001, PCI-DSS compliance
- Data residency controls for multi-region deployments (US, EU, APAC)
- Penetration testing automation and vulnerability scanning
- Zero-trust architecture validation with mTLS
- Hardware security module (HSM) integration for key management
- Client-side encryption for sensitive plan configurations
- API gateway and WAF integration for additional protection
- Real-time security monitoring and anomaly detection
- Session timeout and automatic logout configuration
- Certificate pinning for critical production connections
- Idempotency key management with 24-hour retention
- Request deduplication and conflict resolution strategies
- Connection pooling optimization (10-20 connections recommended)
- HTTP keep-alive and persistent connection management
- DNS caching configuration for reduced lookup latency
- Request batching windows (100ms) for grouped plan submissions
- Streaming response optimization for large result sets
- Browser/client caching strategy with Cache-Control headers
- Edge caching for distributed team collaboration
- Adaptive polling intervals based on completion estimates
- WebSocket long-polling optimization for real-time updates
- Pre-warming connection pools during application startup
- Regional routing for nearest endpoint selection
- Asynchronous I/O operation optimization
- Dead letter queue configuration for failed operations
- Health check endpoint monitoring (30s intervals) for service degradation
- Fallback to cached results (5 minute TTL) during API unavailability
- Graceful degradation strategies for partial outages
- Graceful shutdown handling (SIGTERM) for zero-downtime deployments
- Sandbox environment testing with automatic 24-hour data reset
- Mock webhook endpoint validation with signature verification
- Integration test suite automation for CI/CD pipelines
- Load testing benchmark configuration and performance baselines
- Postman collection and OpenAPI specification generation
- Request timeout optimization (30s default, configurable based on iterations)
- Container image caching for faster deployment cycles
- Horizontal pod autoscaling based on CPU/memory thresholds
- Log aggregation with structured JSON logging for observability
- Correlation ID tracking across distributed systems (X-Request-ID)
- Exponential backoff retry strategy for 429 and 503 errors (1s to 60s max)
- Thundering herd prevention with jittered retry intervals
- Point-in-time recovery for database backups
- Custom domain support with automatic DNS configuration
- Cost optimization recommendations based on resource usage analytics
- Resource usage forecasting for capacity planning
- Infrastructure-as-Code (IaC) validation and linting
- Container registry security scanning
- Canary deployment traffic shifting strategies
- API token rotation policies (90-day recommended)
- Secrets rotation automation and encryption at rest
- CORS configuration for cross-origin request handling
- Compliance reporting automation (SOC2, ISO27001)
- Vulnerability disclosure program integration
- Data processing agreements (DPA) for GDPR compliance
- Right to deletion request automation (30-day SLA)
- Automatic data purging after 90 days (configurable for enterprise)
- Multi-stage approval gates for high-risk production changes
- Team collaboration features with role-based access control
- Plan template library for standardized deployments
- Historical trend analysis via Insights API
- Alternative strategy recommendations with risk assessment
- Adaptive fallback strategies for dynamic load environments
- Resource limit validation against infrastructure capacity
- Monitoring and observability via
/monte-carlo/metricsendpoint - Custom webhook headers for authentication and routing
- Terraform provider (
terraform-provider-heady) for IaC workflows - Kubernetes operator (
heady-k8s-operator) for automated execution - GraphQL API with real-time subscriptions (beta)
- Batch processing with parallel execution support
- Plan expiration management (1-hour TTL with renewal options)
- First-time task type analysis with extended processing time
- Result caching (5 minutes) for similar requests
- Maximum payload size enforcement (1MB limit)
- Asynchronous mode for long-running simulations (>5000 iterations)
- Poll URL tracking for async operation status
- Estimated completion time calculation for progress monitoring
- HIPAA compliance tier for healthcare applications
- PCI DSS compliance for payment-related simulations
- Third-party security audit integration
- Responsible disclosure policy for vulnerability reporting
- Real-time collaboration features for team-based planning
- Semantic versioning for plan configuration tracking
- Runbook documentation for common failure scenarios
- Feature flag gradual rollout infrastructure
- Request/response encryption beyond transport layer
- Automatic token revocation on failed authentication (>5 attempts in 15 minutes)
- Service account management with least privilege permissions
- Incident response plan templates for API key compromise
- Security event logging for compliance and audit trails
- API deprecation strategy with graceful migration paths
- Success rate monitoring with configurable SLO thresholds
- Plan execution duration tracking and optimization
- Resource usage analytics per plan type
- Webhook signature validation using HMAC-SHA256
- MFA enforcement for API key management accounts
- Short-lived JWT tokens with refresh mechanism
- OWASP Top 10 vulnerability scanning for webhook endpoints
- Input sanitization for injection attack prevention
- Audit trail maintenance with 1-year retention recommendation
- Cost estimation and budget tracking features
- Team collaboration with role-based access control (RBAC)