Expanding the Security Assessment Playbook

How to Evaluate DevOps and AI Systems with Modern Threat Models and Testing Techniques

Why Traditional Assessments Fall Short

In March 2023, the 3CX supply chain attack exposed a critical vulnerability in modern software delivery: the build environment itself. Attackers compromised 3CX's build infrastructure and injected malware into the company's legitimate desktop application. The poisoned software was signed with valid certificates, passed security checks, and shipped through official update channels to a customer base of more than 600,000 companies worldwide, including critical infrastructure providers, financial institutions, and government agencies. Traditional security assessments had validated 3CX's application security, code quality, and compliance posture. Yet none of them caught the compromise of the build pipeline that produced the software.

This wasn't an isolated incident. It followed similar attacks on CircleCI (January 2023), Codecov (2021), and others. The pattern is clear: attackers are targeting the infrastructure that builds and deploys software, not just the software itself.

Legacy assessments fail modern organizations because:

SDLC-only thinking misses pipeline realities. Traditional reviews focus on code, testing, and production, but modern teams ship code hundreds of times daily through automated pipelines with admin-level production access. These pipelines manage secrets and make deployment decisions, yet they're rarely included in security reviews. In the 3CX attack, the application code was fine; the build system that compiled it was compromised.

Compliance frameworks lag behind. SOC 2 and ISO 27001 check boxes but don't ask: "Can a compromised GitHub Action access production? Do ML pipelines have unrestricted data access? Can an attacker modify your build artifacts without detection?" Compliance provides a baseline, not security. 3CX had passed security audits, but those audits didn't include their build infrastructure.

Ephemeral infrastructure is invisible. Containers and serverless functions exist briefly and disappear. Build agents spin up, compile code, and terminate. Traditional tools struggle with constantly shifting attack surfaces where the infrastructure executing your most critical processes may only exist for minutes.

AI/ML systems are uncharted territory. Most security teams can't assess ML pipelines, model training security, or vector database risks. As organizations rush to deploy AI, with models trained on sensitive data and LLMs integrated into production systems, these gaps become critical vulnerabilities.

The result? Organizations pass audits while harboring critical exposures: hardcoded secrets in CI/CD configs, over-permissioned service accounts, unmonitored ML pipelines with access to customer data, and build systems that could be compromised to inject malicious code at massive scale.

Modern threats exploit modern infrastructure. If your assessment playbook doesn't cover CI/CD, containers, IaC, and AI systems, you're not assessing your actual attack surface; you're auditing documentation while attackers target the pipelines that deliver your software to customers.

What Modern Assessments Must Include

A comprehensive DevOps security assessment must evaluate the full technology stack that builds, delivers, and operates modern applications, not just the applications themselves. This means extending your security review beyond traditional application security to include the CI/CD pipelines, container infrastructure, cloud-native architectures, and increasingly, AI/ML systems that have become critical production components. Here's what that means in practice:

Cloud-Native and Microservices

  • Container security: image provenance, vulnerability scanning, runtime policies

  • Service mesh security: mTLS, authorization, rate limiting

  • Serverless: IAM permissions, secret handling, event validation

  • Kubernetes: RBAC, pod security, network policies, admission controllers

Infrastructure as Code and CI/CD

  • IaC security: Scan Terraform/CloudFormation for overly permissive IAM, review state file security, detect drift

  • CI/CD pipelines: Secure runners, pipeline-as-code review, approval gates, artifact signing (SLSA)

  • Secrets management: Vault/cloud-native solutions (never environment variables), rotation policies, least-privilege access

  • Container orchestration: RBAC, network policies, etcd encryption, audit logging
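
As a concrete example of the secrets-management and pipeline checks above, here is a minimal sketch of a hardcoded-secret scan you might run against a repository checkout during an assessment. The regex patterns, skipped directories, and output format are illustrative assumptions; dedicated tools such as TruffleHog or Gitleaks ship far more complete rule sets.

```python
import re
from pathlib import Path

# Illustrative patterns only; real scanners ship far more comprehensive rules.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key header": re.compile(r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----"),
    "Generic credential assignment": re.compile(
        r"""(?i)(api[_-]?key|secret|token)\s*[:=]\s*['"][A-Za-z0-9/+=_-]{16,}['"]"""
    ),
}

SKIP_DIRS = {".git", "node_modules", ".terraform"}

def scan_repo(root: str) -> list[tuple[str, int, str]]:
    """Walk a checkout and report (path, line number, pattern name) for likely secrets."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or any(part in SKIP_DIRS for part in path.parts):
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append((str(path), lineno, name))
    return findings

if __name__ == "__main__":
    for path, lineno, name in scan_repo("."):
        print(f"{path}:{lineno}: possible {name}")
```

In practice you'd point this (or a real scanner) at both application repos and pipeline-as-code repos, since CI/CD configuration files are a common place for credentials to hide.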

The AI/ML Assessment Delta

Data pipelines and training:

  • Training data access controls and provenance tracking

  • Training environment isolation from production

  • Compute permissions scoped to minimum requirements

  • Experiment tracking security (MLflow, Weights & Biases)

Model serving and inference:

  • Model access controls and signing

  • Input validation and output filtering (prompt injection, adversarial inputs)

  • Rate limiting and authentication on inference endpoints

  • Model API protection

AI tooling and agents:

  • LLM coding assistant permissions (GitHub Copilot access scope)

  • Autonomous agent permission boundaries

  • Vector database access controls and encryption

  • Model registry security

Key questions: If an attacker compromises training, what data leaks? If they poison a model, how do you detect it? If inference is compromised, what's the blast radius? AI infrastructure needs the same rigor as production databases.
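
To make the inference-endpoint items concrete, here is a minimal sketch of the kind of spot check an assessor can run: does the endpoint reject unauthenticated calls, and does a burst of requests hit a rate limit? The endpoint URL, header, and thresholds are placeholders, and the expected status codes (401/403/429) are assumptions about how the target is configured.

```python
import time
import requests

ENDPOINT = "https://ml.example.internal/v1/predict"  # placeholder inference endpoint

def check_requires_auth() -> bool:
    """An unauthenticated request should be rejected (401/403), not served."""
    resp = requests.post(ENDPOINT, json={"inputs": "ping"}, timeout=10)
    return resp.status_code in (401, 403)

def check_rate_limiting(calls: int = 50) -> bool:
    """Fire a burst of requests and look for 429 responses as evidence of rate limiting."""
    saw_429 = False
    for _ in range(calls):
        resp = requests.post(
            ENDPOINT,
            json={"inputs": "ping"},
            headers={"Authorization": "Bearer REDACTED"},  # placeholder test credential
            timeout=10,
        )
        if resp.status_code == 429:
            saw_429 = True
            break
        time.sleep(0.05)
    return saw_429

if __name__ == "__main__":
    print("rejects unauthenticated requests:", check_requires_auth())
    print("rate limiting observed under burst:", check_rate_limiting())
```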

Red Teaming and Adversarial Simulation

The most effective way to validate your DevOps security posture is to attack it. Red teaming exercises simulate real-world adversary behavior, exposing weaknesses that paper assessments and compliance checklists miss. Here's how to structure adversarial simulations that test the security of modern DevOps pipelines, AI systems, and cloud infrastructure.

Attack Scenarios

Git repository compromise:

  • Attack path: Stolen credentials → push malicious code or modify GitHub Actions to exfiltrate secrets

  • Test: Branch protection, code review enforcement, secret scanning

CI/CD runner compromise:

  • Attack path: Compromised runner → access secrets, modify artifacts, pivot using cloud IAM credentials

  • Test: Runner isolation, secret injection, IAM scope, artifact signing

Training pipeline compromise:

  • Attack path: Access ML training → poison data, exfiltrate datasets, modify models, pivot to cloud resources

  • Test: Data access controls, environment isolation, model signing

Inference endpoint exploitation:

  • Attack path: Prompt injection, model inversion, adversarial inputs, resource exhaustion

  • Test: Input validation, rate limiting, output filtering, monitoring

Cloud IAM abuse:

  • Attack path: Compromise pipeline with excessive AWS permissions → access databases, create backdoor accounts

  • Test: Least-privilege IAM, audit logs

Tools and Techniques

Offensive tools: Nuclei, Semgrep, TruffleHog (secret scanning), ScoutSuite (cloud auditing), Peirates (K8s testing), Garak (LLM vulnerability scanning)

Goal: Demonstrate realistic attack paths from initial access to business impact. Document every permission that enabled the attack and every control that could prevent it.
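
For the cloud IAM abuse scenario above, part of the evidence you want is a complete picture of what a compromised pipeline identity could do. Below is a minimal sketch using boto3 to enumerate the managed and inline policies attached to a CI/CD role; the role name is a placeholder, and pagination is omitted for brevity.

```python
import boto3

def dump_role_policies(role_name: str) -> None:
    """List managed and inline policies attached to a role, e.g. the CI/CD runner role."""
    iam = boto3.client("iam")

    # Managed policies attached to the role.
    for policy in iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]:
        meta = iam.get_policy(PolicyArn=policy["PolicyArn"])["Policy"]
        version = iam.get_policy_version(
            PolicyArn=policy["PolicyArn"], VersionId=meta["DefaultVersionId"]
        )["PolicyVersion"]
        print(policy["PolicyName"], version["Document"])

    # Inline policies defined directly on the role.
    for name in iam.list_role_policies(RoleName=role_name)["PolicyNames"]:
        doc = iam.get_role_policy(RoleName=role_name, PolicyName=name)["PolicyDocument"]
        print(name, doc)

if __name__ == "__main__":
    dump_role_policies("ci-runner-role")  # placeholder role name
```

Documenting the full permission set of the pipeline identity makes the post-exercise report concrete: every finding maps back to a specific grant that enabled it.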

Threat Modeling with STRIDE

STRIDE is Microsoft's threat modeling framework for identifying threats across six categories: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. It's particularly well suited to DevOps because it focuses on data flows, trust boundaries, and process interactions: exactly what CI/CD pipelines are made of. Let's walk through each pipeline stage and identify threats using STRIDE.

Mapping STRIDE to CI/CD Stages

Source (Code Repo)

Trust boundary: Developer workstation → Git repository → CI/CD trigger

Spoofing:

  • Attacker impersonates developer (stolen credentials, session hijacking)

  • Commit signature spoofing (unsigned or improperly verified commits)

  • Mitigation: MFA, commit signing (GPG), IP allowlisting for Git access

Tampering:

  • Malicious code injection via compromised developer account

  • Direct push to protected branches (bypassing PR review)

  • Rewriting git history to hide malicious changes

  • Mitigation: Branch protection, required reviews, signed commits, audit logs

Repudiation:

  • Developer claims "I didn't commit that malicious code"

  • Lack of audit trail for who approved/merged PRs

  • Mitigation: Commit signing, immutable audit logs, PR approval tracking

Information Disclosure:

  • Secrets hardcoded in code (API keys, passwords)

  • Sensitive data in commit history

  • Public repo accidentally containing proprietary code

  • Mitigation: Secret scanning (GitHub Advanced Security, TruffleHog), pre-commit hooks, private repos with access controls

Denial of Service:

  • Repo filled with large files (Git LFS abuse)

  • Excessive webhook triggers overwhelming CI/CD system

  • Mitigation: Repo size limits, webhook rate limiting, CI/CD job quotas

Elevation of Privilege:

  • Attacker gains write access to repo → can now trigger CI/CD with elevated permissions

  • Mitigation: Least-privilege repo access, separate CI/CD credentials from repo access
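
Several of these source-stage mitigations can be verified directly from the GitHub REST API. Here is a minimal sketch that pulls branch protection settings and prints the fields an assessor cares about; the org, repo, and token handling are placeholders, and the script assumes the token is allowed to read branch protection.

```python
import os
import requests

GITHUB_API = "https://api.github.com"

def branch_protection(owner: str, repo: str, branch: str = "main") -> dict:
    """Fetch branch protection settings; a 404 means the branch isn't protected at all."""
    resp = requests.get(
        f"{GITHUB_API}/repos/{owner}/{repo}/branches/{branch}/protection",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    protection = branch_protection("example-org", "example-repo")  # placeholders
    reviews = protection.get("required_pull_request_reviews", {})
    print("required approving reviews:", reviews.get("required_approving_review_count", 0))
    print("signed commits required:", protection.get("required_signatures", {}).get("enabled", False))
    print("force pushes allowed:", protection.get("allow_force_pushes", {}).get("enabled", True))
```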

Build (CI/CD)

Trust boundary: Git repo → CI/CD runner → Build artifacts

Spoofing:

  • Attacker submits malicious PR that appears legitimate

  • Fake build artifacts uploaded to registry

  • Mitigation: PR review, artifact signing, provenance attestation (SLSA)

Tampering:

  • Malicious code injected during build (dependency confusion, compromised build script)

  • Build cache poisoning

  • Modification of artifacts before upload

  • Mitigation: Dependency pinning/vendoring, reproducible builds, artifact signing, isolated build environments

Repudiation:

  • "This artifact came from CI/CD, trust it" but no provenance

  • Can't prove what source code produced what artifact

  • Mitigation: SLSA provenance, signed build logs, SBOM generation

Information Disclosure:

  • Secrets leaked in build logs

  • Proprietary code exposed through build artifacts

  • CI/CD runner can access excessive cloud resources (list all S3 buckets, read all secrets)

  • Mitigation: Secret masking in logs, least-privilege IAM for runners, ephemeral runners, log access controls

Denial of Service:

  • Resource-exhaustive builds (crypto mining in CI/CD)

  • Infinite loop in build script

  • Filling artifact storage with junk

  • Mitigation: Build timeouts, resource quotas, cost monitoring

Elevation of Privilege:

  • CI/CD runner has admin-level cloud IAM permissions

  • Build process can modify production infrastructure

  • Attacker compromises build → inherits excessive permissions

  • Mitigation: Least-privilege IAM, separate build and deploy permissions, approval gates before production changes
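
To illustrate the artifact-integrity idea behind these build-stage mitigations, here is a minimal sketch that records an artifact's SHA-256 digest at build time and verifies it again before deployment, so a swapped or tampered artifact fails the check. Paths and the manifest format are assumptions; real supply-chain controls (cosign signatures, SLSA provenance attestations) go well beyond this, and the digest manifest itself must live somewhere the build can write but attackers cannot.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: str) -> str:
    """Stream the file so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_digest(artifact: str, manifest: str = "artifact-digests.json") -> None:
    """Run in the build job, right after the artifact is produced."""
    digests = json.loads(Path(manifest).read_text()) if Path(manifest).exists() else {}
    digests[artifact] = sha256_of(artifact)
    Path(manifest).write_text(json.dumps(digests, indent=2))

def verify_digest(artifact: str, manifest: str = "artifact-digests.json") -> bool:
    """Run in the deploy job, before the artifact is released."""
    expected = json.loads(Path(manifest).read_text()).get(artifact)
    return expected is not None and expected == sha256_of(artifact)

if __name__ == "__main__":
    record_digest("dist/app.tar.gz")  # placeholder artifact path
    print("artifact verified:", verify_digest("dist/app.tar.gz"))
```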

Test Stage (Automated Testing)

Trust boundary: Build artifacts → Test environment → Test results

Spoofing:

  • Fake test results (tests claim to pass but were never run)

  • Compromised test dependencies

  • Mitigation: Signed test reports, immutable test infrastructure, verified test framework versions

Tampering:

  • Test data manipulation to hide vulnerabilities

  • Modification of test configs to skip security tests

  • Disabling code coverage or vulnerability scanning

  • Mitigation: Policy enforcement (required tests can't be disabled), test result integrity checks, audit logs

Repudiation:

  • "Tests passed" but no audit trail of what was actually tested

  • Mitigation: Detailed test reports, logs of all tests run, version-controlled test configurations

Information Disclosure:

  • Test environments using production data without sanitization

  • Test results exposing sensitive business logic or vulnerabilities

  • Mitigation: Synthetic test data, test environment isolation, access controls on test reports

Denial of Service:

  • Resource exhaustion in test environment

  • Long-running tests blocking deployment pipeline

  • Mitigation: Test timeouts, parallel testing, test environment resource limits

Elevation of Privilege:

  • Test environment has access to production secrets/data

  • Attacker compromises test job → accesses production

  • Mitigation: Separate test and production environments, least-privilege access, network segmentation
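
One way to enforce the "required tests can't be disabled" mitigation is to compare the pipeline-as-code file against a policy list of jobs that must always exist. Below is a minimal sketch assuming a GitHub Actions-style workflow layout and PyYAML; the job names and file path are illustrative.

```python
import yaml  # PyYAML: pip install pyyaml

# Jobs the security policy says must exist in every pipeline (illustrative names).
REQUIRED_JOBS = {"sast-scan", "dependency-audit", "container-scan"}

def missing_required_jobs(workflow_path: str) -> set[str]:
    """Return required job names absent from a GitHub Actions-style workflow file."""
    with open(workflow_path) as f:
        workflow = yaml.safe_load(f)
    defined_jobs = set((workflow or {}).get("jobs", {}).keys())
    return REQUIRED_JOBS - defined_jobs

if __name__ == "__main__":
    missing = missing_required_jobs(".github/workflows/ci.yml")  # placeholder path
    if missing:
        raise SystemExit(f"policy violation, missing required jobs: {sorted(missing)}")
    print("all required security jobs present")
```

Running a check like this in a separate, protected pipeline (or as an admission policy) keeps a compromised developer account from quietly deleting the security jobs along with the code they gate.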

Deploy Stage (Release to Production)

Trust boundary: Test results → Deployment system → Production environment

Spoofing:

  • Deployment triggered by unauthorized user

  • Fake deployment approval

  • Mitigation: Authenticated deployment requests, approval workflows, audit logs

Tampering:

  • Artifact swapped between test and deploy stages

  • IaC templates modified during deployment

  • Configuration drift (deployed resources don't match IaC definitions)

  • Mitigation: Artifact signing and verification, immutable artifacts, drift detection

Repudiation:

  • "Who deployed this broken change?" with no audit trail

  • Mitigation: Deployment logs, immutable audit records, integration with change management systems

Information Disclosure:

  • Deployment logs exposing production secrets or architecture details

  • Deployment system can read all production secrets

  • Mitigation: Secret masking, least-privilege deployment credentials, log access controls

Denial of Service:

  • Malicious deployment takes down production

  • Deployment process lacks rollback capability

  • Mitigation: Blue-green or canary deployments, automated rollback, health checks before traffic shift

Elevation of Privilege:

  • Deployment system has excessive production permissions (can modify IAM, delete databases)

  • Compromised deployment pipeline → full production control

  • Mitigation: Least-privilege deployment IAM, approval gates for sensitive changes, separate control plane access
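
For drift detection specifically, Terraform's own exit codes give a quick signal: with -detailed-exitcode, terraform plan exits 0 when deployed infrastructure matches the configuration and 2 when changes are pending. A minimal sketch follows; the working directory is a placeholder, and it assumes terraform init has already run there.

```python
import subprocess

def detect_drift(workdir: str) -> bool:
    """Return True if `terraform plan` reports pending changes (possible drift)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return False   # infrastructure matches configuration
    if result.returncode == 2:
        return True    # changes pending: drift or unapplied configuration
    raise RuntimeError(f"terraform plan failed:\n{result.stderr}")

if __name__ == "__main__":
    print("drift detected:", detect_drift("infra/production"))  # placeholder directory
```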

Runtime Stage (Production Operation)

Trust boundary: Production infrastructure → Application/services → External users

Spoofing:

  • Service impersonation (unauthorized service claims to be legitimate microservice)

  • Compromised identity provider

  • Mitigation: mTLS between services, service mesh authentication, strong identity verification

Tampering:

  • Configuration drift from IaC definitions

  • Runtime modification of container images or binaries

  • Data tampering in databases or message queues

  • Mitigation: Immutable infrastructure, runtime integrity monitoring, encryption at rest

Repudiation:

  • Insufficient logging of user/service actions

  • Logs can be modified or deleted

  • Mitigation: Centralized logging, immutable log storage, log integrity verification

Information Disclosure:

  • Exposed APIs or dashboards (Kubernetes dashboard, database admin panels)

  • Cloud storage misconfiguration (public S3 buckets)

  • Excessive logging of sensitive data

  • Mitigation: Network policies, authentication on all admin interfaces, data classification, log sanitization

Denial of Service:

  • Application vulnerabilities (unpatched CVEs)

  • Resource exhaustion from malicious traffic

  • Mitigation: WAF, rate limiting, auto-scaling, vulnerability management

Elevation of Privilege:

  • Container escape to host

  • Compromised pod accessing Kubernetes API with excessive RBAC

  • Service account abuse

  • Mitigation: Pod security standards, least-privilege RBAC, network policies, runtime security tools (Falco)
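
As a runtime spot check for the elevation-of-privilege items above, the Kubernetes Python client can flag pods running privileged containers or on the host network. This is a read-only sketch assuming a working kubeconfig; tools like kube-bench, Polaris, and Falco cover far more ground.

```python
from kubernetes import client, config  # pip install kubernetes

def risky_pods() -> list[str]:
    """Flag pods with privileged containers or host networking across all namespaces."""
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    v1 = client.CoreV1Api()
    findings = []
    for pod in v1.list_pod_for_all_namespaces().items:
        name = f"{pod.metadata.namespace}/{pod.metadata.name}"
        if pod.spec.host_network:
            findings.append(f"{name}: hostNetwork enabled")
        for container in pod.spec.containers:
            sc = container.security_context
            if sc and sc.privileged:
                findings.append(f"{name}: privileged container {container.name}")
    return findings

if __name__ == "__main__":
    for finding in risky_pods():
        print(finding)
```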

Critical Trust Boundaries

  1. Developer workstation ↔ Git: MFA, commit signing

  2. Git ↔ CI/CD: Webhook authentication, least-privilege access

  3. CI/CD ↔ Cloud: Minimal IAM, network isolation

  4. Staging ↔ Production: Network segmentation, separate credentials, approval gates

  5. Service ↔ Service: mTLS, authorization

Principle: Every boundary crossing requires authentication, authorization, and audit logging.

Validation and Evidence Collection

Threat modeling identifies potential vulnerabilities, but validation proves they exist (or don't). This section covers how to systematically collect evidence during a DevOps security assessment: what logs to review, what configurations to inspect, and how to test whether your detection capabilities actually work when an attack occurs.

Key Log Sources

AWS CloudTrail / Azure Activity / GCP Audit:

  • Red flags: Unexpected IAM changes, excessive API calls, secret access from wrong systems

GitHub/GitLab Audit Logs:

  • Red flags: Branch protection changes, new deploy keys, webhook creation, failed auth attempts

Kubernetes Audit Logs:

  • Red flags: Exec into pods, secret reads by wrong accounts, RBAC changes, privileged pods

CI/CD Run Histories:

  • Red flags: Secrets in logs, unexpected network connections, unusual job durations
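
On AWS, CloudTrail's lookup_events API is a quick way to pull the IAM-related red flags above during evidence collection. A minimal boto3 sketch follows; the event names are a small illustrative subset, and CloudTrail accepts only one lookup attribute per call, hence the loop.

```python
from datetime import datetime, timedelta, timezone

import boto3

# A few IAM write events worth reviewing; real assessments cast a wider net.
IAM_EVENTS = ["PutRolePolicy", "AttachRolePolicy", "CreateAccessKey", "CreateUser"]

def recent_iam_changes(days: int = 7) -> None:
    """Print recent IAM-related CloudTrail events (one lookup attribute per call)."""
    ct = boto3.client("cloudtrail")
    start = datetime.now(timezone.utc) - timedelta(days=days)
    for event_name in IAM_EVENTS:
        events = ct.lookup_events(
            LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": event_name}],
            StartTime=start,
        )["Events"]
        for event in events:
            print(event["EventTime"], event_name, event.get("Username", "unknown"))

if __name__ == "__main__":
    recent_iam_changes()
```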

Configuration Review

IaC (Terraform, CloudFormation): Check for IAM wildcards, overly permissive security groups, hardcoded secrets, public resource access

CI/CD Configs: Review secret handling, runner specifications, approval requirements, third-party actions

Kubernetes Manifests: Validate security contexts, resource limits, RBAC bindings, network policies

IAM Permissions: Identify admin-level permissions for CI/CD/apps, unused permissions, long-lived credentials
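
A fast win in configuration review is flagging wildcard actions and resources in IAM policy documents. The sketch below walks exported policy JSON files (the policies/ directory layout is an assumption); scanners such as Checkov and Prowler do this more thoroughly.

```python
import json
from pathlib import Path

def wildcard_statements(policy_doc: dict) -> list[dict]:
    """Return Allow statements that grant '*' actions or apply to '*' resources."""
    statements = policy_doc.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if "*" in actions or any(a.endswith(":*") for a in actions) or "*" in resources:
            flagged.append(stmt)
    return flagged

if __name__ == "__main__":
    # Placeholder layout: one exported IAM policy document per JSON file.
    for path in Path("policies").glob("*.json"):
        for stmt in wildcard_statements(json.loads(path.read_text())):
            print(f"{path}: over-broad statement: {stmt}")
```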

Testing Detection Capabilities

  • Log injection: Can attackers hide activity?

  • Model misfire: Do anomalous ML outputs trigger alerts?

  • Pipeline poisoning: Are malicious pipeline changes detected?

  • Secret exposure: How fast from exposure to revocation?

  • Unauthorized access: Do suspicious patterns trigger alerts?

The Assessment Playbook

This playbook provides a structured, repeatable approach to conducting DevOps security assessments. Use it to ensure comprehensive coverage whether you have a couple of days for a focused review or weeks for deep analysis. The key is consistency: applying the same framework lets you measure improvement over time and compare security posture across teams.

2-Day Assessment (High-Risk Focus aka The Sniff Test)

Day 1:

  • Morning: Interview key personnel (1 DevOps lead, 1 Security person) to understand architecture

  • Afternoon: Focus on highest-risk areas:

    • Review IAM permissions for CI/CD runners and production service accounts

    • Audit production Kubernetes RBAC and pod security

    • Check for secrets in code/logs (automated scan with TruffleHog/Semgrep)

    • Review recent CloudTrail/audit logs for anomalies

Day 2:

  • Morning: Red team simulation (1-2 attack scenarios, e.g., compromised GitHub Action, container escape attempt)

  • Afternoon: Document findings, prioritize top 5 critical issues, deliver brief with recommendations

Output: Executive summary with critical issues, recommended immediate actions, and proposal for deeper assessment

2-Week Assessment (Comprehensive Coverage)

Week 1:

  • Days 1-2: Discovery and documentation

    • Architecture review meetings with DevOps, Security, and ML teams

    • Document all CI/CD pipelines, cloud environments, and AI/ML systems

    • Collect configurations, policies, and access documentation

  • Days 3-5: Automated scanning and configuration review

    • Run comprehensive security scans (IaC, containers, cloud configs)

    • Review IAM permissions, RBAC policies, network configurations

    • Analyze audit logs for past 90 days

    • Document all findings with severity ratings

Week 2:

  • Days 6-8: Red team simulation and adversarial testing

    • Execute 5-7 attack scenarios across different trust boundaries

    • Test AI/ML pipeline security, model inference endpoints

    • Validate detection and response capabilities

  • Days 9-10: Analysis, reporting, and remediation planning

    • Consolidate findings, eliminate false positives

    • Prioritize issues by risk (likelihood × impact)

    • Create detailed remediation roadmap with timelines

    • Present comprehensive findings to stakeholders

Output: Detailed assessment report, risk register, remediation roadmap, executive briefing

What to Prioritize Based on Time

Always cover (regardless of time):

  • Production access controls (who can deploy/modify production?)

  • Secret management (are secrets hardcoded anywhere?)

  • CI/CD runner permissions (do they have excessive cloud access?)

Add if you have 3-7 days:

  • Container security (image scanning, runtime policies)

  • Network segmentation (can dev access prod? Can prod egress freely?)

  • Kubernetes security (RBAC, pod security, network policies)

Add if you have 1-2 weeks:

  • AI/ML pipeline security

  • Comprehensive red teaming

  • Detection and response validation

  • Supply chain security (dependencies, third-party integrations)

Add if you have 2+ weeks:

  • Threat modeling workshops with development teams

  • Custom tooling development for automated testing

  • Policy-as-code implementation

  • Compliance mapping (SOC 2, ISO 27001, etc.)

Tools to Inspect by Layer

Source Code Management (GitHub, GitLab, Bitbucket)

  • Tools: Git-secrets, TruffleHog, Gitleaks

  • What to check:

    • Branch protection rules

    • Required reviewers and status checks

    • Commit signing enforcement

    • Repository visibility settings

    • Deploy keys and webhook configurations

    • Access audit logs

CI/CD (Jenkins, GitHub Actions, GitLab CI, CircleCI)

  • Tools: Semgrep, CodeQL, custom scripts

  • What to check:

    • Pipeline-as-code files (.github/workflows/, .gitlab-ci.yml, Jenkinsfile)

    • Secret management (are secrets in code or properly vaulted?)

    • Third-party integrations (actions, plugins)

    • Runner/agent configurations

    • Build logs for secret exposure

    • Approval workflows

Container Images and Registries (ECR, GCR, Docker Hub, ACR)

  • Tools: Trivy, Grype, Clair, Anchore

  • What to check:

    • Vulnerability scan results

    • Image signatures and attestations

    • Base image sources

    • Image layers (secrets baked in?)

    • Registry access controls

    • Image pull/push logs

Infrastructure as Code (Terraform, CloudFormation, Pulumi)

  • Tools: Checkov, tfsec, Terrascan, Sentinel

  • What to check:

    • IAM policies (wildcards, excessive permissions)

    • Security group rules

    • Encryption settings

    • Public access configurations

    • State file security

    • Drift detection (does prod match IaC?)

Kubernetes (EKS, GKE, AKS, self-managed)

  • Tools: Kube-bench, Kube-hunter, Kubesec, Polaris, Falco

  • What to check:

    • Pod security standards/policies

    • RBAC configurations

    • Network policies

    • Service account permissions

    • Secrets management (external secrets operator?)

    • Audit logging configuration

    • Admission controllers

Cloud Platforms (AWS, Azure, GCP)

  • Tools: Prowler, ScoutSuite, CloudSploit, Cloud Custodian

  • What to check:

    • IAM policies and roles

    • Resource-based policies (S3 bucket policies, etc.)

    • Network configurations (VPCs, security groups, NACLs)

    • Encryption settings (at-rest, in-transit)

    • Logging and monitoring (CloudTrail, VPC Flow Logs, etc.)

    • Publicly accessible resources
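
For the "publicly accessible resources" item on AWS, a simple pass over S3 public access block settings highlights buckets that deserve a closer look. A minimal boto3 sketch follows; Prowler and ScoutSuite perform this check along with hundreds of others.

```python
import boto3
from botocore.exceptions import ClientError

def buckets_without_full_public_block() -> list[str]:
    """List buckets where any public access block setting is missing or disabled."""
    s3 = boto3.client("s3")
    flagged = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            cfg = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
            if not all(cfg.values()):
                flagged.append(name)
        except ClientError as err:
            if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
                flagged.append(name)  # no public access block configured at all
            else:
                raise
    return flagged

if __name__ == "__main__":
    for name in buckets_without_full_public_block():
        print("review public access settings for bucket:", name)
```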

AI/ML Infrastructure (MLflow, Kubeflow, SageMaker, Vertex AI)

  • Tools: Garak (LLM testing), custom scripts, cloud provider tools

  • What to check:

    • Experiment tracking access controls

    • Model registry permissions

    • Training job IAM permissions (ensure least-privilege access; audit with AWS IAM Access Analyzer, GCP Policy Analyzer)

    • Inference endpoint authentication

    • Data access controls

    • Model versioning and signing

    • Vector database security (if using RAG)

Runtime Security (Production Applications)

  • Tools: Falco, Sysdig, Aqua, Datadog Security, Wiz

  • What to check:

    • Runtime behavior anomalies

    • Container escapes or privilege escalation attempts

    • Network connections (unexpected egress)

    • File integrity monitoring

    • Process execution monitoring

Next Up: Deep Dive – Securing the Build-to-Deploy Chain

In the next post, we'll take everything we've learned about assessing DevOps security and apply it to the most critical attack surface: the build-to-deploy chain.
