Expanding the Security Assessment Playbook
How to Evaluate DevOps and AI Systems with Modern Threat Models and Testing Techniques
Why Traditional Assessments Fall Short
In March 2023, the 3CX supply chain attack exposed a critical vulnerability in modern software delivery: the build environment itself. Attackers compromised 3CX's build infrastructure, injecting malware into the company's legitimate desktop application. The poisoned software was signed with valid certificates, passed all security checks, and was distributed to over 600,000 companies globally, including critical infrastructure providers, financial institutions, and government agencies. Traditional security assessments had validated 3CX's application security, code quality, and compliance posture. Yet none of these caught the compromise of the build pipeline that produced the software.
This wasn't an isolated incident. It followed similar attacks on CircleCI (January 2023), Codecov (2021), and others. The pattern is clear: attackers are targeting the infrastructure that builds and deploys software, not just the software itself.
Legacy assessments fail modern organizations because:
SDLC-only thinking misses pipeline realities. Traditional reviews focus on code, testing, and production, but modern teams ship code hundreds of times daily through automated pipelines with admin-level production access. These pipelines manage secrets and make deployment decisions, yet they're rarely included in security reviews. In the 3CX attack, the application code was fine; the build system that compiled it was compromised.
Compliance frameworks lag behind. SOC 2 and ISO 27001 check boxes but don't ask: "Can a compromised GitHub Action access production? Do ML pipelines have unrestricted data access? Can an attacker modify your build artifacts without detection?" Compliance provides a baseline, not security. 3CX had passed security audits, but those audits didn't include their build infrastructure.
Ephemeral infrastructure is invisible. Containers and serverless functions exist briefly and disappear. Build agents spin up, compile code, and terminate. Traditional tools struggle with constantly shifting attack surfaces where the infrastructure executing your most critical processes may only exist for minutes.
AI/ML systems are uncharted territory. Most security teams can't assess ML pipelines, model training security, or vector database risks. As organizations rush to deploy AI, with models trained on sensitive data and LLMs integrated into production systems, these gaps become critical vulnerabilities.
The result? Organizations pass audits while harboring critical exposures: hardcoded secrets in CI/CD configs, over-permissioned service accounts, unmonitored ML pipelines with access to customer data, and build systems that could be compromised to inject malicious code at massive scale.
Modern threats exploit modern infrastructure. If your assessment playbook doesn't cover CI/CD, containers, IaC, and AI systems, you're not assessing your actual attack surface; you're auditing documentation while attackers target the pipelines that deliver your software to customers.
What Modern Assessments Must Include
A comprehensive DevOps security assessment must evaluate the full technology stack that builds, delivers, and operates modern applications, not just the applications themselves. This means extending your security review beyond traditional application security to include the CI/CD pipelines, container infrastructure, cloud-native architectures, and increasingly, AI/ML systems that have become critical production components. Here's what that means in practice:
Cloud-Native and Microservices
Container security: image provenance, vulnerability scanning, runtime policies
Service mesh security: mTLS, authorization, rate limiting
Serverless: IAM permissions, secret handling, event validation
Kubernetes: RBAC, pod security, network policies, admission controllers
Infrastructure as Code and CI/CD
IaC security: Scan Terraform/CloudFormation for overly permissive IAM (a minimal scan is sketched after this list), review state file security, detect drift
CI/CD pipelines: Secure runners, pipeline-as-code review, approval gates, artifact signing (SLSA)
Secrets management: Vault/cloud-native solutions (never environment variables), rotation policies, least-privilege access
Container orchestration: RBAC, network policies, etcd encryption, audit logging
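Most of these checks can be automated. As a minimal sketch, the Python script below inspects a Terraform plan (exported with `terraform show -json plan.out`) for IAM policy statements that grant wildcard actions or resources. The file path, resource-type filter, and policy layout are assumptions about a typical AWS setup; a real review would lean on tools like Checkov or tfsec rather than this script.

```python
import json
import sys

def find_wildcard_iam(plan_json_path: str) -> list[str]:
    """Flag Terraform-managed IAM resources whose policy documents grant
    wildcard actions or resources. Assumes AWS-style policy JSON strings."""
    with open(plan_json_path) as f:
        plan = json.load(f)

    findings = []
    for rc in plan.get("resource_changes", []):
        if not rc.get("type", "").startswith("aws_iam"):
            continue
        after = (rc.get("change") or {}).get("after") or {}
        policy = after.get("policy")
        if not policy:
            continue
        try:
            doc = json.loads(policy) if isinstance(policy, str) else policy
        except ValueError:
            continue
        statements = doc.get("Statement", [])
        statements = [statements] if isinstance(statements, dict) else statements
        for stmt in statements:
            actions = stmt.get("Action", [])
            actions = [actions] if isinstance(actions, str) else actions
            resources = stmt.get("Resource", [])
            resources = [resources] if isinstance(resources, str) else resources
            if "*" in actions or "*" in resources:
                findings.append(rc["address"])
    return findings

if __name__ == "__main__":
    hits = find_wildcard_iam(sys.argv[1])
    for address in hits:
        print("Wildcard IAM permissions in:", address)
    sys.exit(1 if hits else 0)
```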
The AI/ML Assessment Delta
Data pipelines and training:
Training data access controls and provenance tracking
Training environment isolation from production
Compute permissions scoped to minimum requirements
Experiment tracking security (MLflow, Weights & Biases)
Model serving and inference:
Model access controls and signing
Input validation and output filtering (prompt injection, adversarial inputs)
Rate limiting and authentication on inference endpoints
Model API protection
AI tooling and agents:
LLM coding assistant permissions (GitHub Copilot access scope)
Autonomous agent permission boundaries
Vector database access controls and encryption
Model registry security
Key questions: If an attacker compromises training, what data leaks? If they poison a model, how do you detect it? If inference is compromised, what's the blast radius? AI infrastructure needs the same rigor as production databases.
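These key questions translate directly into tests. Below is a minimal sketch, assuming a hypothetical HTTPS inference endpoint and test token, that probes two of the controls listed above: whether unauthenticated requests are rejected and whether rapid-fire requests eventually hit a rate limit. Run it only against systems you own, in a test environment.

```python
import requests

# Hypothetical endpoint and token, for illustration only.
ENDPOINT = "https://ml.example.internal/v1/models/support-bot:predict"
TEST_TOKEN = "REDACTED_TEST_TOKEN"

def probe_inference_endpoint() -> None:
    # 1. Unauthenticated requests should be rejected outright (401/403).
    resp = requests.post(ENDPOINT, json={"inputs": "ping"}, timeout=10)
    print(f"No auth header -> HTTP {resp.status_code} (expect 401 or 403)")

    # 2. A burst of authenticated requests should trip rate limiting (429).
    headers = {"Authorization": f"Bearer {TEST_TOKEN}"}
    statuses = []
    for _ in range(50):
        r = requests.post(ENDPOINT, json={"inputs": "ping"},
                          headers=headers, timeout=10)
        statuses.append(r.status_code)
    print("Rate limit observed (HTTP 429):", 429 in statuses)

if __name__ == "__main__":
    probe_inference_endpoint()
```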
Red Teaming and Adversarial Simulation
The most effective way to validate your DevOps security posture is to attack it. Red teaming exercises simulate real-world adversary behavior, exposing weaknesses that paper assessments and compliance checklists miss. Here's how to structure adversarial simulations that test the security of modern DevOps pipelines, AI systems, and cloud infrastructure.
Attack Scenarios
Git repository compromise:
Attack path: Stolen credentials → push malicious code or modify GitHub Actions to exfiltrate secrets
Test: Branch protection, code review enforcement, secret scanning
CI/CD runner compromise:
Attack path: Compromised runner → access secrets, modify artifacts, pivot using cloud IAM credentials
Test: Runner isolation, secret injection, IAM scope, artifact signing
Training pipeline compromise:
Attack path: Access ML training → poison data, exfiltrate datasets, modify models, pivot to cloud resources
Test: Data access controls, environment isolation, model signing
Inference endpoint exploitation:
Attack path: Prompt injection, model inversion, adversarial inputs, resource exhaustion
Test: Input validation, rate limiting, output filtering, monitoring
Cloud IAM abuse:
Attack path: Compromise pipeline with excessive AWS permissions → access databases, create backdoor accounts
Test: Least-privilege IAM, audit logs
Tools and Techniques
Offensive tools: Nuclei, Semgrep, TruffleHog (secret scanning), ScoutSuite (cloud auditing), Peirates (K8s testing), Garak (LLM vulnerability scanning)
Goal: Demonstrate realistic attack paths from initial access to business impact. Document every permission that enabled the attack and every control that could prevent it.
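A practical way to document "every permission that enabled the attack" is to enumerate, from inside a compromised (or simulated) CI runner, what its ambient cloud credentials can reach. The sketch below is a hedged example using boto3 against AWS; run it only in a lab or test account during an authorized exercise, and treat the specific API calls as illustrative rather than a complete enumeration.

```python
import boto3

def enumerate_runner_blast_radius() -> None:
    """Show what a CI runner's ambient AWS credentials can see.
    Intended for lab/test accounts during an authorized exercise."""
    identity = boto3.client("sts").get_caller_identity()
    print("Runner is acting as:", identity["Arn"])

    # Every secret the runner can list is a candidate for exfiltration.
    secrets = boto3.client("secretsmanager").list_secrets()
    print("Visible secrets:", [s["Name"] for s in secrets.get("SecretList", [])])

    # Every bucket the runner can list is a candidate exfiltration target.
    buckets = boto3.client("s3").list_buckets()
    print("Visible buckets:", [b["Name"] for b in buckets.get("Buckets", [])])

if __name__ == "__main__":
    enumerate_runner_blast_radius()
```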
Threat Modeling with STRIDE
STRIDE is Microsoft's threat modeling framework for identifying threats across six categories: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. It's particularly well suited to DevOps because it focuses on data flows, trust boundaries, and process interactions, which is exactly what CI/CD pipelines are made of. Let's walk through each pipeline stage and identify threats using STRIDE:
Mapping STRIDE to CI/CD Stages
Source (Code Repo)
Trust boundary: Developer workstation → Git repository → CI/CD trigger
Spoofing:
Attacker impersonates developer (stolen credentials, session hijacking)
Commit signature spoofing (unsigned or improperly verified commits)
Mitigation: MFA, commit signing (GPG), IP allowlisting for Git access
Tampering:
Malicious code injection via compromised developer account
Direct push to protected branches (bypassing PR review)
Rewriting git history to hide malicious changes
Mitigation: Branch protection, required reviews, signed commits, audit logs
Repudiation:
Developer claims "I didn't commit that malicious code"
Lack of audit trail for who approved/merged PRs
Mitigation: Commit signing, immutable audit logs, PR approval tracking
Information Disclosure:
Secrets hardcoded in code (API keys, passwords)
Sensitive data in commit history
Public repo accidentally containing proprietary code
Mitigation: Secret scanning (GitHub Advanced Security, TruffleHog), pre-commit hooks, private repos with access controls
Denial of Service:
Repo filled with large files (Git LFS abuse)
Excessive webhook triggers overwhelming CI/CD system
Mitigation: Repo size limits, webhook rate limiting, CI/CD job quotas
Elevation of Privilege:
Attacker gains write access to repo → can now trigger CI/CD with elevated permissions
Mitigation: Least-privilege repo access, separate CI/CD credentials from repo access
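Several of the mitigations above (branch protection, required reviews, signed commits) can be verified automatically. Here's a minimal sketch using the GitHub REST API; the org, repo, and branch names are placeholders, and the exact response fields may vary by API version, so treat it as a starting point rather than a definitive check.

```python
import os
import requests

# Placeholder repository details for illustration.
OWNER, REPO, BRANCH = "example-org", "example-app", "main"
TOKEN = os.environ["GITHUB_TOKEN"]  # a read-only token is sufficient

def check_branch_protection() -> None:
    url = (f"https://api.github.com/repos/{OWNER}/{REPO}"
           f"/branches/{BRANCH}/protection")
    resp = requests.get(url, timeout=10, headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
    })
    if resp.status_code == 404:
        print(f"{BRANCH} has no branch protection configured")
        return
    protection = resp.json()
    reviews = protection.get("required_pull_request_reviews", {})
    print("Required approving reviews:",
          reviews.get("required_approving_review_count", 0))
    print("Signed commits required:",
          protection.get("required_signatures", {}).get("enabled", False))
    print("Force pushes allowed:",
          protection.get("allow_force_pushes", {}).get("enabled", False))

if __name__ == "__main__":
    check_branch_protection()
```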
Build (CI/CD)
Trust boundary: Git repo → CI/CD runner → Build artifacts
Spoofing:
Attacker submits malicious PR that appears legitimate
Fake build artifacts uploaded to registry
Mitigation: PR review, artifact signing, provenance attestation (SLSA)
Tampering:
Malicious code injected during build (dependency confusion, compromised build script)
Build cache poisoning
Modification of artifacts before upload
Mitigation: Dependency pinning/vendoring, reproducible builds, artifact signing, isolated build environments
Repudiation:
"This artifact came from CI/CD, trust it" but no provenance
Can't prove what source code produced what artifact
Mitigation: SLSA provenance, signed build logs, SBOM generation
Information Disclosure:
Secrets leaked in build logs
Proprietary code exposed through build artifacts
CI/CD runner can access excessive cloud resources (list all S3 buckets, read all secrets)
Mitigation: Secret masking in logs, least-privilege IAM for runners, ephemeral runners, log access controls
Denial of Service:
Resource-exhaustive builds (crypto mining in CI/CD)
Infinite loop in build script
Filling artifact storage with junk
Mitigation: Build timeouts, resource quotas, cost monitoring
Elevation of Privilege:
CI/CD runner has admin-level cloud IAM permissions
Build process can modify production infrastructure
Attacker compromises build → inherits excessive permissions
Mitigation: Least-privilege IAM, separate build and deploy permissions, approval gates before production changes
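Full SLSA provenance and signing (for example with Sigstore/cosign) is the goal, but even a simple digest check catches artifacts swapped or modified after the build. The sketch below assumes a hypothetical JSON manifest of SHA-256 digests written by the build job; it's a minimal stand-in for proper signing, not a replacement.

```python
import hashlib
import json
import sys

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifact(artifact_path: str, manifest_path: str) -> bool:
    """Compare an artifact's digest with the value recorded at build time.
    The manifest layout (path -> hex digest) is a hypothetical convention."""
    with open(manifest_path) as f:
        expected = json.load(f)[artifact_path]
    return sha256_of(artifact_path) == expected

if __name__ == "__main__":
    ok = verify_artifact(sys.argv[1], sys.argv[2])
    print("Artifact digest matches build manifest:", ok)
    sys.exit(0 if ok else 1)
```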
Test Stage (Automated Testing)
Trust boundary: Build artifacts → Test environment → Test results
Spoofing:
Fake test results (tests claim to pass but were never run)
Compromised test dependencies
Mitigation: Signed test reports, immutable test infrastructure, verified test framework versions
Tampering:
Test data manipulation to hide vulnerabilities
Modification of test configs to skip security tests
Disabling code coverage or vulnerability scanning
Mitigation: Policy enforcement (required tests can't be disabled), test result integrity checks, audit logs
Repudiation:
"Tests passed" but no audit trail of what was actually tested
Mitigation: Detailed test reports, logs of all tests run, version-controlled test configurations
Information Disclosure:
Test environments using production data without sanitization
Test results exposing sensitive business logic or vulnerabilities
Mitigation: Synthetic test data, test environment isolation, access controls on test reports
Denial of Service:
Resource exhaustion in test environment
Long-running tests blocking deployment pipeline
Mitigation: Test timeouts, parallel testing, test environment resource limits
Elevation of Privilege:
Test environment has access to production secrets/data
Attacker compromises test job → accesses production
Mitigation: Separate test and production environments, least-privilege access, network segmentation
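The "required tests can't be disabled" mitigation is easy to enforce with a small policy check that runs before anything else. The sketch below assumes a GitLab-CI-style YAML file where top-level keys are job names, plus a hypothetical list of mandatory security jobs; adapt both to your pipeline format.

```python
import sys

import yaml  # pip install pyyaml

# Hypothetical jobs that policy says must always exist in the pipeline.
REQUIRED_JOBS = {"sast-scan", "dependency-audit", "container-scan"}

def check_required_jobs(ci_config_path: str) -> bool:
    """Fail fast if a required security job was removed or renamed.
    Assumes top-level YAML keys are job names (GitLab CI convention)."""
    with open(ci_config_path) as f:
        cfg = yaml.safe_load(f) or {}
    missing = REQUIRED_JOBS - set(cfg.keys())
    if missing:
        print("Required security jobs missing from pipeline:", sorted(missing))
        return False
    return True

if __name__ == "__main__":
    sys.exit(0 if check_required_jobs(".gitlab-ci.yml") else 1)
```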
Deploy Stage (Release to Production)
Trust boundary: Test results → Deployment system → Production environment
Spoofing:
Deployment triggered by unauthorized user
Fake deployment approval
Mitigation: Authenticated deployment requests, approval workflows, audit logs
Tampering:
Artifact swapped between test and deploy stages
IaC templates modified during deployment
Configuration drift (deployed resources don't match IaC definitions)
Mitigation: Artifact signing and verification, immutable artifacts, drift detection
Repudiation:
"Who deployed this broken change?" with no audit trail
Mitigation: Deployment logs, immutable audit records, integration with change management systems
Information Disclosure:
Deployment logs exposing production secrets or architecture details
Deployment system can read all production secrets
Mitigation: Secret masking, least-privilege deployment credentials, log access controls
Denial of Service:
Malicious deployment takes down production
Deployment process lacks rollback capability
Mitigation: Blue-green or canary deployments, automated rollback, health checks before traffic shift
Elevation of Privilege:
Deployment system has excessive production permissions (can modify IAM, delete databases)
Compromised deployment pipeline → full production control
Mitigation: Least-privilege deployment IAM, approval gates for sensitive changes, separate control plane access
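Drift detection is one of the simpler deploy-stage mitigations to automate. `terraform plan -detailed-exitcode` exits 0 when deployed state matches the configuration and 2 when it has drifted, which the wrapper below turns into a scheduled check; the working-directory argument is an assumption about your repository layout.

```python
import subprocess
import sys

def detect_drift(workdir: str) -> int:
    """Exit code 0: no drift. Exit code 2: deployed state differs from IaC.
    Anything else: terraform itself failed."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir, capture_output=True, text=True,
    )
    if result.returncode == 2:
        print("Drift detected between IaC and deployed resources:")
        print(result.stdout)
    elif result.returncode == 0:
        print("No drift detected.")
    else:
        print("terraform plan failed:", result.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(detect_drift(sys.argv[1] if len(sys.argv) > 1 else "."))
```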
Runtime Stage (Production Operation)
Trust boundary: Production infrastructure → Application/services → External users
Spoofing:
Service impersonation (unauthorized service claims to be legitimate microservice)
Compromised identity provider
Mitigation: mTLS between services, service mesh authentication, strong identity verification
Tampering:
Configuration drift from IaC definitions
Runtime modification of container images or binaries
Data tampering in databases or message queues
Mitigation: Immutable infrastructure, runtime integrity monitoring, encryption at rest
Repudiation:
Insufficient logging of user/service actions
Logs can be modified or deleted
Mitigation: Centralized logging, immutable log storage, log integrity verification
Information Disclosure:
Exposed APIs or dashboards (Kubernetes dashboard, database admin panels)
Cloud storage misconfiguration (public S3 buckets)
Excessive logging of sensitive data
Mitigation: Network policies, authentication on all admin interfaces, data classification, log sanitization
Denial of Service:
Application vulnerabilities (unpatched CVEs)
Resource exhaustion from malicious traffic
Mitigation: WAF, rate limiting, auto-scaling, vulnerability management
Elevation of Privilege:
Container escape to host
Compromised pod accessing Kubernetes API with excessive RBAC
Service account abuse
Mitigation: Pod security standards, least-privilege RBAC, network policies, runtime security tools (Falco)
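Pod security issues like the ones above are visible directly from the Kubernetes API. The following sketch, using the official Python client, flags containers that run privileged or don't explicitly set a non-root user; it assumes kubeconfig access to the cluster and is a spot check, not a substitute for admission control or runtime tooling like Falco.

```python
from kubernetes import client, config  # pip install kubernetes

def find_risky_pods() -> None:
    """Flag containers that run privileged, or that don't explicitly set a
    non-root user (which usually means they default to the image's user)."""
    config.load_kube_config()  # or config.load_incluster_config() in-cluster
    core = client.CoreV1Api()
    for pod in core.list_pod_for_all_namespaces().items:
        for container in pod.spec.containers:
            sc = container.security_context
            privileged = bool(sc and sc.privileged)
            no_explicit_user = sc is None or sc.run_as_user in (None, 0)
            if privileged or no_explicit_user:
                print(f"{pod.metadata.namespace}/{pod.metadata.name} "
                      f"[{container.name}] privileged={privileged} "
                      f"run_as_user={None if sc is None else sc.run_as_user}")

if __name__ == "__main__":
    find_risky_pods()
```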
Critical Trust Boundaries
Developer workstation ↔ Git: MFA, commit signing
Git ↔ CI/CD: Webhook authentication, least-privilege access
CI/CD ↔ Cloud: Minimal IAM, network isolation
Staging ↔ Production: Network segmentation, separate credentials, approval gates
Service ↔ Service: mTLS, authorization
Principle: Every boundary crossing requires authentication, authorization, and audit logging.
Validation and Evidence Collection
Threat modeling identifies potential vulnerabilities, but validation proves they exist (or don't). This section covers how to systematically collect evidence during a DevOps security assessment, what logs to review, what configurations to inspect, and how to test whether your detection capabilities actually work when an attack occurs.
Key Log Sources
AWS CloudTrail / Azure Activity / GCP Audit:
Red flags: Unexpected IAM changes, excessive API calls, secret access from wrong systems (a sample CloudTrail query is sketched after this list)
GitHub/GitLab Audit Logs:
Red flags: Branch protection changes, new deploy keys, webhook creation, failed auth attempts
Kubernetes Audit Logs:
Red flags: Exec into pods, secret reads by wrong accounts, RBAC changes, privileged pods
CI/CD Run Histories:
Red flags: Secrets in logs, unexpected network connections, unusual job durations
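As a concrete starting point for the CloudTrail red flags above, the sketch below uses boto3's `lookup_events` to pull recent IAM-change events. The list of suspicious event names is an assumption to extend for your environment, and high-volume accounts will want Athena or a SIEM instead of the LookupEvents API.

```python
from datetime import datetime, timedelta, timezone

import boto3

# Event names that commonly indicate IAM tampering; extend for your needs.
SUSPICIOUS_EVENTS = ["PutRolePolicy", "AttachRolePolicy",
                     "CreateAccessKey", "UpdateAssumeRolePolicy"]

def recent_iam_changes(hours: int = 24) -> None:
    cloudtrail = boto3.client("cloudtrail")
    start = datetime.now(timezone.utc) - timedelta(hours=hours)
    for event_name in SUSPICIOUS_EVENTS:
        resp = cloudtrail.lookup_events(
            LookupAttributes=[{"AttributeKey": "EventName",
                               "AttributeValue": event_name}],
            StartTime=start,
        )
        for event in resp.get("Events", []):
            print(event["EventTime"], event["EventName"],
                  event.get("Username", "unknown"))

if __name__ == "__main__":
    recent_iam_changes()
```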
Configuration Review
IaC (Terraform, CloudFormation): Check for IAM wildcards, overly permissive security groups, hardcoded secrets, public resource access
CI/CD Configs: Review secret handling, runner specifications, approval requirements, third-party actions
Kubernetes Manifests: Validate security contexts, resource limits, RBAC bindings, network policies
IAM Permissions: Identify admin-level permissions for CI/CD/apps, unused permissions, long-lived credentials
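For the RBAC piece of the configuration review, a quick way to surface over-broad bindings is to ask the cluster directly which subjects hold cluster-admin. The sketch below uses the official Kubernetes Python client and assumes kubeconfig access; extend it to any other roles you consider sensitive.

```python
from kubernetes import client, config  # pip install kubernetes

def list_cluster_admin_subjects() -> None:
    """Print every subject bound to the cluster-admin ClusterRole."""
    config.load_kube_config()
    rbac = client.RbacAuthorizationV1Api()
    for binding in rbac.list_cluster_role_binding().items:
        if binding.role_ref.name != "cluster-admin":
            continue
        for subject in binding.subjects or []:
            print(f"{binding.metadata.name}: {subject.kind} "
                  f"{subject.name} (ns={subject.namespace})")

if __name__ == "__main__":
    list_cluster_admin_subjects()
```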
Testing Detection Capabilities
Log injection: Can attackers hide activity?
Model misfire: Do anomalous ML outputs trigger alerts?
Pipeline poisoning: Are malicious pipeline changes detected?
Secret exposure: How fast from exposure to revocation?
Unauthorized access: Do suspicious patterns trigger alerts?
The Assessment Playbook
This playbook provides a structured, repeatable approach to conducting DevOps security assessments. Use it to ensure comprehensive coverage whether you have limited time for a focused review or weeks for deep analysis. The key is consistency. Applying the same framework lets you measure improvement over time and compare security posture across teams.
2-Day Assessment (High-Risk Focus aka The Sniff Test)
Day 1:
Morning: Interview key personnel (1 DevOps lead, 1 Security person) to understand architecture
Afternoon: Focus on highest-risk areas:
Review IAM permissions for CI/CD runners and production service accounts
Audit production Kubernetes RBAC and pod security
Check for secrets in code/logs (automated scan with TruffleHog/Semgrep)
Review recent CloudTrail/audit logs for anomalies
Day 2:
Morning: Red team simulation (1-2 attack scenarios, e.g., compromised GitHub Action, container escape attempt)
Afternoon: Document findings, prioritize top 5 critical issues, deliver brief with recommendations
Output: Executive summary with critical issues, recommended immediate actions, and proposal for deeper assessment
2-Week Assessment (Comprehensive Coverage)
Week 1:
Days 1-2: Discovery and documentation
Architecture review meetings with DevOps, Security, and ML teams
Document all CI/CD pipelines, cloud environments, and AI/ML systems
Collect configurations, policies, and access documentation
Days 3-5: Automated scanning and configuration review
Run comprehensive security scans (IaC, containers, cloud configs)
Review IAM permissions, RBAC policies, network configurations
Analyze audit logs for past 90 days
Document all findings with severity ratings
Week 2:
Days 6-8: Red team simulation and adversarial testing
Execute 5-7 attack scenarios across different trust boundaries
Test AI/ML pipeline security, model inference endpoints
Validate detection and response capabilities
Days 9-10: Analysis, reporting, and remediation planning
Consolidate findings, eliminate false positives
Prioritize issues by risk (likelihood × impact)
Create detailed remediation roadmap with timelines
Present comprehensive findings to stakeholders
Output: Detailed assessment report, risk register, remediation roadmap, executive briefing
What to Prioritize Based on Time:
Always cover (regardless of time):
Production access controls (who can deploy/modify production?)
Secret management (are secrets hardcoded anywhere?)
CI/CD runner permissions (do they have excessive cloud access?)
Add if you have 3-7 days:
Container security (image scanning, runtime policies)
Network segmentation (can dev access prod? Can prod egress freely?)
Kubernetes security (RBAC, pod security, network policies)
Add if you have 1-2 weeks:
AI/ML pipeline security
Comprehensive red teaming
Detection and response validation
Supply chain security (dependencies, third-party integrations)
Add if you have 2+ weeks:
Threat modeling workshops with development teams
Custom tooling development for automated testing
Policy-as-code implementation
Compliance mapping (SOC 2, ISO 27001, etc.)
Tools to Inspect by Layer
Source Code Management (GitHub, GitLab, Bitbucket)
Tools: Git-secrets, TruffleHog, Gitleaks
What to check:
Branch protection rules
Required reviewers and status checks
Commit signing enforcement
Repository visibility settings
Deploy keys and webhook configurations
Access audit logs
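For context on what the secret-scanning tools above are doing, here's a deliberately simplified sketch of the same idea: pattern matching files in the working tree for credential-shaped strings. Real tools use far larger rule sets, entropy checks, and full git history scans, so treat this as illustration, not a replacement.

```python
import re
from pathlib import Path

# Two illustrative patterns; Gitleaks/TruffleHog ship hundreds more.
PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_tree(root: str = ".") -> None:
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                print(f"{path}: possible {name}")

if __name__ == "__main__":
    scan_tree()
```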
CI/CD (Jenkins, GitHub Actions, GitLab CI, CircleCI)
Tools: Semgrep, CodeQL, custom scripts
What to check:
Pipeline-as-code files (.github/workflows/, .gitlab-ci.yml, Jenkinsfile)
Secret management (are secrets in code or properly vaulted?)
Third-party integrations (actions, plugins)
Runner/agent configurations
Build logs for secret exposure
Approval workflows
Container Images and Registries (ECR, GCR, Docker Hub, ACR)
Tools: Trivy, Grype, Clair, Anchore
What to check:
Vulnerability scan results
Image signatures and attestations
Base image sources
Image layers (secrets baked in?)
Registry access controls
Image pull/push logs
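Scan results are easiest to act on when they gate the pipeline. As a hedged sketch, the wrapper below shells out to Trivy (assumed installed) and fails the build on HIGH or CRITICAL findings; the JSON field names reflect recent Trivy versions and may need adjusting.

```python
import json
import subprocess
import sys

def scan_image(image: str) -> int:
    """Run Trivy against an image and fail on HIGH/CRITICAL findings."""
    result = subprocess.run(
        ["trivy", "image", "--format", "json",
         "--severity", "HIGH,CRITICAL", image],
        capture_output=True, text=True, check=False,
    )
    report = json.loads(result.stdout or "{}")
    findings = [vuln
                for target in report.get("Results", [])
                for vuln in (target.get("Vulnerabilities") or [])]
    print(f"{image}: {len(findings)} HIGH/CRITICAL vulnerabilities")
    return 1 if findings else 0

if __name__ == "__main__":
    sys.exit(scan_image(sys.argv[1]))
```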
Infrastructure as Code (Terraform, CloudFormation, Pulumi)
Tools: Checkov, tfsec, Terrascan, Sentinel
What to check:
IAM policies (wildcards, excessive permissions)
Security group rules
Encryption settings
Public access configurations
State file security
Drift detection (does prod match IaC?)
Kubernetes (EKS, GKE, AKS, self-managed)
Tools: Kube-bench, Kube-hunter, Kubesec, Polaris, Falco
What to check:
Pod security standards/policies
RBAC configurations
Network policies
Service account permissions
Secrets management (external secrets operator?)
Audit logging configuration
Admission controllers
Cloud Platforms (AWS, Azure, GCP)
Tools: Prowler, ScoutSuite, CloudSploit, Cloud Custodian
What to check:
IAM policies and roles
Resource-based policies (S3 bucket policies, etc.)
Network configurations (VPCs, security groups, NACLs)
Encryption settings (at-rest, in-transit)
Logging and monitoring (CloudTrail, VPC Flow Logs, etc.)
Publicly accessible resources
AI/ML Infrastructure (MLflow, Kubeflow, SageMaker, Vertex AI)
Tools: Garak (LLM testing), custom scripts, cloud provider tools
What to check:
Experiment tracking access controls
Model registry permissions
Training job IAM permissions (ensure least-privilege access; audit with AWS IAM Access Analyzer, GCP Policy Analyzer)
Inference endpoint authentication
Data access controls
Model versioning and signing
Vector database security (if using RAG)
Runtime Security (Production Applications)
Tools: Falco, Sysdig, Aqua, Datadog Security, Wiz
What to check:
Runtime behavior anomalies
Container escapes or privilege escalation attempts
Network connections (unexpected egress)
File integrity monitoring
Process execution monitoring
Next Up: Deep Dive – Securing the Build-to-Deploy Chain
In the next post, we'll take everything we've learned about assessing DevOps security and apply it to the most critical attack surface: the build-to-deploy chain.