feat: add Azure provider support with full infra, provisioning, and Bastion workflows #161
Merged
**Added:**
- Documentation for Azure authentication validation proof-of-concept, including
prerequisites, usage, and validation steps - `infra/azure/README.md`
- Terragrunt configuration to deploy a single Windows Server 2022 VM in Azure for
authentication testing, with local state and tagging -
`infra/azure/eastus/auth-validation/terragrunt.hcl`
- Reusable Terraform module to provision a Windows VM in Azure with:
  - Self-contained networking: VNet, subnet, NSG, NIC
  - Resource group and all resources named from a common prefix
  - Configurable admin username, VM size, address space, subnet, image, and tags
  - Random password generation for the admin user
  - Sensitive outputs for admin credentials and resource metadata
  - Provider, version, and variable definitions
  - Comprehensive README and autogenerated documentation
  - Source files: `main.tf`, `network.tf`, `outputs.tf`, `variables.tf`,
    `versions.tf`, and `README.md` in `modules/terraform-azure-vm/`
…labs
**Added:**
- Azure provider implementation for dreadgoad CLI, including:
  - Azure provider registration and interface implementation in Go
  - Azure VM lifecycle operations (discovery, start/stop/destroy, run command)
  - CLI integration for `apply`, `destroy`, `output`, and `validate` actions
- Modularized Azure infrastructure for GOAD labs:
  - `terraform-azure-instance-factory` module for per-VM provisioning with bootstrap
  - `terraform-azure-net` module for VNet, subnets, NAT gateway, and NSG
- Example Terragrunt structure for Azure lab deployments, including sample
  environment and region config files, and PowerShell bootstrap templates
- Pre-commit hook for `terraform_docs` to enforce module documentation
**Changed:**
- CLI provider flag and help text to include "azure" as a supported provider
- Provider factory and config logic to support Azure region resolution
- Terragrunt host registry path in GOAD deployment to use absolute path for Azure compatibility
**Removed:**
- Legacy `terraform-azure-vm` module in favor of the new composable instance
and network modules
- Old Azure proof-of-concept Terragrunt configuration for auth validation
…I support
**Added:**
- Azure Bastion support:
  - New `terraform-azure-bastion` Terraform module for optional Bastion host deployment
  - CLI command group `bastion` with subcommands for status, SSH, RDP, and port tunneling
  - Bastion discovery and connection logic in Go (`cli/internal/azure/bastion.go`)
  - Terragrunt configuration for Bastion module with opt-in gating
  - Prerequisite and usage documentation for Bastion workflows
- In-VNet Ansible controller support:
  - New `terraform-azure-controller` Terraform module for an SSH-accessible Ansible controller VM in a private subnet
  - Terragrunt configuration for controller, gated via environment variable or flag
  - Cloud-init template to bootstrap Ansible and dependencies on controller VM
- CLI command group `runcmd` for Azure Run Command:
  - Subcommands to run PowerShell commands across instances or open a REPL-like shell per host
  - Interactive shell simulation over Run Command with `$PWD` persistence, output capping, and cancellation support
  - Hostname/resource ID resolution for Azure VMs via inventory and live discovery
- Provider interface `InteractiveShell` for abstracting interactive shell support
- Compile-time interface checks for new provider capabilities
- Azure-specific checks in `doctor` for CLI, Bastion extension, and SSH extension
**Changed:**
- Azure provider implementation:
  - Added `StartInteractiveShell` to enable interactive shell sessions via Run Command
  - VM discovery now exposes Azure resource tags for downstream use (e.g., controller key auto-selection)
- AWS provider implementation:
  - Renamed `StartInteractiveSession` to `StartInteractiveShell` for interface consistency
  - Registered as `InteractiveShell` provider
- `infra_cmd.go`:
  - Added `--with-bastion` and `--with-controller` flags to Terragrunt commands for Azure
  - Azure infra actions set environment variables for module gating
  - Refactored Terragrunt module execution to support both AWS and Azure via shared logic
- `doctor`:
  - Runs Azure-specific prerequisite checks when provider is Azure, including CLI, login, Bastion, and SSH extension validation
- Documentation:
  - Expanded Azure provider docs with Bastion and controller workflows, `runcmd` usage, and REPL caveats
  - Updated module README to describe new Azure modules and usage patterns
- Various Terragrunt configurations for Azure:
  - Updated to include or reference new Bastion and controller modules
  - Standardized `include` blocks for root config consistency
  - Added local variables for Bastion/controller options in environment configs
**Removed:**
- N/A (no logical removals detected; only code refactoring and new features introduced)
**Added:**
- Introduced `azure.ProvisionTunnel`, which chains Azure Bastion port-forwarding and a SOCKS5 proxy to enable WinRM connectivity from the local machine to private Azure VMs via the controller (`cli/internal/azure/provision_tunnel.go`)
- Added provider-aware SOCKS5 tunnel selection to provisioning logic, enabling support for Azure environments that require Bastion relays for connectivity
- Provided logic to discover the Ansible controller VM and automatically locate its ephemeral SSH key for tunneling in Azure
**Changed:**
- Refactored provisioning logic to use a generic `closableTunnel` interface, supporting both Ludus and Azure SOCKS5 tunnels
- Updated Azure instance HCL definitions to explicitly specify Windows Server 2016 or 2019 Datacenter images instead of 2022, improving compatibility
- Improved documentation and comments for provisioning workflows, especially around tunnel setup and Ansible connection variables
- Updated Ludus SSH client configuration to support `InsecureIgnoreHostKey` and `IdentitiesOnly` flags, enabling reliable SSH through ephemeral Bastion tunnels and avoiding SSH agent key exhaustion
- Modified bootstrap script template for Azure VMs to remove redundant TLS 1.2 SCHANNEL registry tweaks (no longer needed for 2019/2016) and simplified the WinRM and firewall setup steps
- Enhanced Azure VM bootstrap extension to use a script hash in the public settings, ensuring script re-execution when the template changes
**Removed:**
- Removed unnecessary TLS 1.2 registry fixups from Azure bootstrap script, since Windows Server 2019/2016 images have appropriate defaults
- Eliminated redundant or outdated comments in provisioning and bootstrap logic
…ll logic
**Added:**
- Implemented per-VM serialization for Azure Run Command to avoid 409 errors when concurrent commands target the same instance (`cli/internal/azure/provider.go`)
- Added instance-level mutex management for AzureProvider to ensure only one Run Command runs at a time per VM
**Changed:**
- Rewrote DSC module installation logic to install modules sequentially instead of checking and installing in parallel, unblocking files after install and improving reliability on Windows hosts (`ansible/roles/common/tasks/main.yml`)
- Updated documentation to describe the new sequential DSC module installation approach (`ansible/roles/common/README.md`)
- Enhanced `runChecks` in Validator to flush check output to stdout as soon as each check completes, giving operators real-time progress feedback instead of buffering in submission order (`cli/internal/validate/validator.go`)
- Improved bootstrap script for Azure VMs to reliably rename the built-in Administrator (SID-500) account to 'administrator' before provisioning, ensuring compatibility with GOAD playbooks and idempotency across reboots (`infra/azure/goad-deployment/test/centralus/goad/templates/bootstrap.ps1.tpl`)
**Removed:**
- Removed parallel DSC module install steps and logic for async status polling in favor of the new sequential approach (`ansible/roles/common/tasks/main.yml`)
- Removed obsolete steps and documentation regarding checking and installing all required modules in parallel (`ansible/roles/common/README.md`)
…d improve resource cleanup
**Added:**
- Introduced Azure SDK-based implementation for VM discovery, lifecycle, and Run Command, replacing most uses of the `az` CLI
- Added internal WinRM runner for Azure VMs, enabling fast, parallel validator checks via WinRM/NTLM tunneled through Bastion and controller
- Implemented per-VM and global concurrency limits for Azure Managed Run Commands to respect ARM API and VM resource limits
- Added `Drainer` interface to provider abstraction and implemented it for Azure to ensure cleanup goroutines complete before exit
- Provided `SOCKSAddr()` in `ProvisionTunnel` for direct SOCKS5 access by Go WinRM client
- New tests for Azure SDK-based VM lifecycle and Run Command operations (`lifecycle_sdk_test.go`, `runcommand_sdk_test.go`, `vm_sdk_test.go`, `credentials_test.go`)
- Azure provider now accepts and wires through environment name and inventory path for side-channel state required by WinRM
- Terraform controller module: added support for deploying from a custom `source_image_id` (Shared Image Gallery, managed image, etc.)
**Changed:**
- VM lifecycle (start, stop, delete) and discovery now use Azure SDK clients with improved parallelism and error handling
- PowerShell validator timeout increased to 180s to account for Azure Run Command tail latency under concurrency
- Azure Run Command now uses Managed Run Command subresources for safe, parallel execution; old `az vm run-command invoke` path removed from hot path
- AzureProvider refactored to always route validator checks through the new WinRM runner; managed Run Commands remain available for ad-hoc use
- Provider construction now passes environment and inventory path to Azure for WinRM tunnel/inventory integration
- Inventory parser now extracts `ansible_password` and handles quoted values for WinRM authentication
- Documentation and comments updated to clarify the new Azure provider architecture and Terraform controller image logic
**Removed:**
- Eliminated most uses of `az` CLI for hot-path operations in Azure provider
- Removed per-VM mutex/serialization for Azure Run Command (no longer needed with Managed Run Command subresources)
- Deprecated legacy group membership parsing in inventory parser for host lines with key-value pairs (now treated as host definitions)
**Added:**
- Added `useNTLM` flag to host credentials to select WinRM transport based on whether the host is a domain controller or member server
- Implemented `isDomainController` helper to detect DCs from inventory groups
- Added debug logging for WinRM client initialization showing auth method used
**Changed:**
- Updated credential loading to set `useNTLM` based on host group, ensuring the correct authentication protocol is selected for each host type
- Modified credential construction to strip the `.\` prefix from usernames, preventing issues with authentication headers and aligning with WinRM library expectations
- Adjusted WinRM client creation to select NTLM or Basic transport dynamically according to the `useNTLM` flag, improving compatibility with both DCs and member servers
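A minimal sketch of the credential logic described above. The inventory group name `dc`, the `hostCreds` struct, `newHostCreds`, and the choice of Basic for domain controllers versus NTLM for member servers are all assumptions made for illustration; the direction of that mapping is not spelled out above.

```go
package main

import (
	"fmt"
	"strings"
)

// hostCreds is a hypothetical credential record for one WinRM target.
type hostCreds struct {
	User    string
	Pass    string
	UseNTLM bool
}

// isDomainController reports whether any inventory group marks the host as
// a DC. The group name "dc" is an assumption about the inventory layout.
func isDomainController(groups []string) bool {
	for _, g := range groups {
		if strings.EqualFold(g, "dc") {
			return true
		}
	}
	return false
}

// newHostCreds strips the local-account prefix `.\` and picks a transport.
// Assumption for illustration: member servers use NTLM, DCs use Basic.
func newHostCreds(user, pass string, groups []string) hostCreds {
	return hostCreds{
		User:    strings.TrimPrefix(user, `.\`),
		Pass:    pass,
		UseNTLM: !isDomainController(groups),
	}
}

// transport names the auth scheme a WinRM client would be built with.
func (c hostCreds) transport() string {
	if c.UseNTLM {
		return "ntlm"
	}
	return "basic"
}

func main() {
	dc := newHostCreds(`.\administrator`, "secret", []string{"dc", "domain"})
	srv := newHostCreds(`.\administrator`, "secret", []string{"server"})
	fmt.Println(dc.User, dc.transport())   // administrator basic
	fmt.Println(srv.User, srv.transport()) // administrator ntlm
}
```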
**Changed:**
- Clarified VM OS support to include generic Linux, not just Ubuntu 24.04 LTS
- Documented support for booting from Shared Image Gallery images via `source_image_id` for prebuilt attacker images
- Noted that Ansible dependency setup via cloud-init can be skipped when using gallery images that already include them
- Improved accuracy and flexibility of feature descriptions in the bastion module README
**Added:**
- Added installation and version check steps for terraform-docs v0.20.0 in the pre-commit GitHub Actions workflow to ensure docs generation is available
**Changed:**
- Updated inventory parser to strip quotes from `ansible_user` value, ensuring consistency with how `ansible_password` is handled
**Changed:**
- Updated the test to ensure each check's output is a contiguous block (header and results) and not interleaved, regardless of check completion order
- Clarified test name and comments to reflect that output is grouped by check, not strictly in submission order
- Simplified assertions to check grouping and contiguity of output for each check
**Removed:**
- Removed the setup step for terraform-docs in the pre-commit GitHub Actions workflow to streamline dependencies
…ace fix
**Added:**
- Introduced `terraform-azure-vnet-peering` module with support for bidirectional VNet peering, optional remote NSG rules, and configurable inputs/outputs
- Added Terraform files: `main.tf`, `variables.tf`, `outputs.tf`, `versions.tf`, and a comprehensive `README.md` with usage and input documentation
**Changed:**
- Added per-VM serialization to WinRM runner to prevent NTLM handshake races, using a new `vmLocks` map and locking in `runPS`
- Modified `runScriptText` in script runner to propagate run errors and return partial output, ensuring callers can differentiate between template/rendering bugs and transport failures
**Removed:**
- Cleared `vmLocks` in WinRM runner's `close()` method to avoid stale locks after shutdown
**Added:**
- Added `TFD_VERSION` environment variable to specify terraform-docs version
- Introduced step to download, extract, and install terraform-docs binary in pre-commit workflow, ensuring terraform-docs is available for documentation generation and linting
**Changed:**
- Bumped `TFD_VERSION` environment variable from v0.20.0 to v0.22.0 in the pre-commit GitHub Actions workflow to ensure use of the latest terraform-docs features and fixes
…ment
**Added:**
- Introduced `.terraform.lock.hcl` files for all Terraform modules to ensure consistent provider versions and improve reproducibility across environments
- Locked AWS providers for `terraform-aws-instance-factory` and `terraform-aws-net` modules, including `aws`, `http`, and `random` as required
- Locked AzureRM provider for all Azure-related modules, specifying compatible versions per module (`terraform-azure-bastion`, `terraform-azure-controller`, `terraform-azure-instance-factory`, `terraform-azure-net`, and `terraform-azure-vnet-peering`)
- Included additional provider locks for `local` and `tls` in `terraform-azure-controller` to match its dependencies
**Changed:**
- Added new provider hashes for AWS, AzureRM, HTTP, Local, Random, and TLS providers in multiple `.terraform.lock.hcl` files to ensure compatibility with updated provider releases and improve supply chain verification. This change affects module lock files for `terraform-aws-instance-factory`, `terraform-aws-net`, `terraform-azure-bastion`, `terraform-azure-controller`, `terraform-azure-instance-factory`, `terraform-azure-net`, and `terraform-azure-vnet-peering`. No provider versions or constraints were changed; only additional hash entries were added for integrity.
**Added:**
- Added task to check all required DSC modules and identify missing ones
- Implemented parallel installation of missing DSC modules using async jobs
- Introduced async status polling to wait for module installations to complete
**Changed:**
- Replaced sequential DSC module installation with a check-and-parallel-install workflow to improve efficiency and reliability
- Updated documentation to reflect new module installation logic and async process
**Removed:**
- Removed the old sequential DSC module installation task and related looping logic
l50 added a commit that referenced this pull request on May 1, 2026
…astion workflows (#161)
**Key Changes:**
- Implemented Azure provider for infrastructure, provisioning, and validation
- Added new Terraform modules for Azure networking, VM factory, Bastion, and controller
- Introduced `runcmd` and `bastion` CLI verbs for Azure-native host access
- Extended CLI, inventory parsing, and doctor checks for Azure support
**Added:**
- Azure CLI provider (`internal/azure`) implementing all provider interfaces, including VM discovery, lifecycle, and fast WinRM-based command execution
- Azure-specific CLI commands:
  - `cli/cmd/runcmd.go`: Azure Run Command (stateless, SSM-like shell and command runner)
  - `cli/cmd/bastion.go`: Native Bastion SSH/RDP/tunnel workflows
- `cli/internal/azure` package:
  - Native VM, Run Command, and Bastion management via Azure SDK and CLI
  - WinRM runner with SOCKS5 tunnel through Bastion → controller for fast parallel provisioning and validation
  - Comprehensive unit/integration tests for Azure flows
- New Terraform modules:
  - `terraform-azure-net`: VNet, subnets, NAT gateway, NSG
  - `terraform-azure-instance-factory`: Windows VM with bootstrap, identity
  - `terraform-azure-bastion`: Optional Bastion host with SKU/tunnel support
  - `terraform-azure-controller`: In-VNet Ansible controller with ephemeral key handling
- Azure Terragrunt configuration under `infra/azure/goad-deployment/` for full lab deployment
- Cloud-init and bootstrap templates for controller and lab hosts
- Documentation for Azure usage, CLI workflows, and tips
**Changed:**
- CLI provider selection and infra commands:
  - Added `azure` as a supported provider and updated flags, help, and region logic
  - `infra apply|plan|destroy` now support Azure region and optional Bastion/controller flags
  - Unified Terragrunt runner for AWS/Azure, including opt-in module flags
- Provisioning logic:
  - Added Azure SOCKS5 tunnel setup for in-VNet WinRM/PSRP via Bastion and controller
  - Genericized the SOCKS tunnel helper to work for both Ludus and Azure
- Inventory parser:
  - Now supports `ansible_password` for WinRM auth in Azure
  - Improved parsing of host/group entries for compatibility with Azure-generated inventories
- Doctor checks:
  - Added Azure CLI, Bastion extension, and SSH extension checks
  - Provider-specific help text and environment validation
- Validator:
  - Increased PowerShell timeout for Azure's longer Run Command latency
  - Output is now streamed as checks complete, not in submission order, for better UX with slow providers
- SSM/SessionManager interface:
  - Abstracted interactive shell interface so both AWS SSM and Azure Run Command can implement native shells
- Pre-commit hooks and documentation:
  - Added terraform-docs hook for Azure modules
  - Updated module documentation and top-level Azure provider docs
**Removed:**
- Parallel DSC module installation in Ansible Windows role (now installs sequentially to avoid race conditions with Azure/WinRM)
- Redundant/legacy AWS-specific region and provider logic where Azure now applies