Expanding the Security Assessment Playbook

How to Evaluate DevOps and AI Systems with Modern Threat Models and Testing Techniques

Why Traditional Assessments Fall Short

In March 2023, the 3CX supply chain attack exposed a critical vulnerability in modern software delivery: the build environment itself. Attackers compromised 3CX's build infrastructure and injected malware into the company's legitimate desktop application. The poisoned software was signed with valid certificates, passed security checks, and shipped through official update channels to a customer base of more than 600,000 companies worldwide, including critical infrastructure providers, financial institutions, and government agencies. Traditional security assessments had validated 3CX's application security, code quality, and compliance posture. Yet none of them caught the compromise of the build pipeline that produced the software.

This wasn't an isolated incident. It followed similar attacks on CircleCI (January 2023), Codecov (2021), and others. The pattern is clear: attackers are targeting the infrastructure that builds and deploys software, not just the software itself.

Legacy assessments fail modern organizations because:

SDLC-only thinking misses pipeline realities. Traditional reviews focus on code, testing, and production, but modern teams ship code hundreds of times daily through automated pipelines with admin-level production access. These pipelines manage secrets and make deployment decisions, yet they're rarely included in security reviews. In the 3CX attack, the application code was fine; the build system that compiled it was compromised.

Compliance frameworks lag behind. SOC 2 and ISO 27001 check boxes but don't ask: "Can a compromised GitHub Action access production? Do ML pipelines have unrestricted data access? Can an attacker modify your build artifacts without detection?" Compliance provides a baseline, not security. 3CX had passed security audits, but those audits didn't include their build infrastructure.

Ephemeral infrastructure is invisible. Containers and serverless functions exist briefly and disappear. Build agents spin up, compile code, and terminate. Traditional tools struggle with constantly shifting attack surfaces where the infrastructure executing your most critical processes may only exist for minutes.

AI/ML systems are uncharted territory. Most security teams can't assess ML pipelines, model training security, or vector database risks. As organizations rush to deploy AI, with models trained on sensitive data and LLMs integrated into production systems, these gaps become critical vulnerabilities.

The result? Organizations pass audits while harboring critical exposures: hardcoded secrets in CI/CD configs, over-permissioned service accounts, unmonitored ML pipelines with access to customer data, and build systems that could be compromised to inject malicious code at massive scale.

Modern threats exploit modern infrastructure. If your assessment playbook doesn't cover CI/CD, containers, IaC, and AI systems, you're not assessing your actual attack surface; you're auditing documentation while attackers target the pipelines that deliver your software to customers.

What Modern Assessments Must Include

A comprehensive DevOps security assessment must evaluate the full technology stack that builds, delivers, and operates modern applications, not just the applications themselves. This means extending your security review beyond traditional application security to include the CI/CD pipelines, container infrastructure, cloud-native architectures, and increasingly, AI/ML systems that have become critical production components. Here's what that means in practice:

Cloud-Native and Microservices

  • Container security: image provenance, vulnerability scanning, runtime policies

  • Service mesh security: mTLS, authorization, rate limiting

  • Serverless: IAM permissions, secret handling, event validation

  • Kubernetes: RBAC, pod security, network policies, admission controllers

Infrastructure as Code and CI/CD

  • IaC security: Scan Terraform/CloudFormation for overly permissive IAM, review state file security, detect drift

  • CI/CD pipelines: Secure runners, pipeline-as-code review, approval gates, artifact signing (SLSA)

  • Secrets management: Vault/cloud-native solutions (never environment variables), rotation policies, least-privilege access

  • Container orchestration: RBAC, network policies, etcd encryption, audit logging
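
As a concrete example of the secrets-management and pipeline checks above, here is a minimal sketch of a hardcoded-secret scan you might run against a repository checkout during an assessment. The regex patterns, skipped directories, and output format are illustrative assumptions; dedicated tools such as TruffleHog or Gitleaks ship far more complete rule sets.

```python
import re
from pathlib import Path

# Illustrative patterns only; real scanners ship far more comprehensive rules.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key header": re.compile(r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----"),
    "Generic credential assignment": re.compile(
        r"""(?i)(api[_-]?key|secret|token)\s*[:=]\s*['"][A-Za-z0-9/+=_-]{16,}['"]"""
    ),
}

SKIP_DIRS = {".git", "node_modules", ".terraform"}

def scan_repo(root: str) -> list[tuple[str, int, str]]:
    """Walk a checkout and report (path, line number, pattern name) for likely secrets."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or any(part in SKIP_DIRS for part in path.parts):
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append((str(path), lineno, name))
    return findings

if __name__ == "__main__":
    for path, lineno, name in scan_repo("."):
        print(f"{path}:{lineno}: possible {name}")
```

In practice you'd point this (or a real scanner) at both application repos and pipeline-as-code repos, since CI/CD configuration files are a common place for credentials to hide.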

The AI/ML Assessment Delta

Data pipelines and training:

  • Training data access controls and provenance tracking

  • Training environment isolation from production

  • Compute permissions scoped to minimum requirements

  • Experiment tracking security (MLflow, Weights & Biases)

Model serving and inference:

  • Model access controls and signing

  • Input validation and output filtering (prompt injection, adversarial inputs)

  • Rate limiting and authentication on inference endpoints

  • Model API protection

AI tooling and agents:

  • LLM coding assistant permissions (GitHub Copilot access scope)

  • Autonomous agent permission boundaries

  • Vector database access controls and encryption

  • Model registry security

Key questions: If an attacker compromises training, what data leaks? If they poison a model, how do you detect it? If inference is compromised, what's the blast radius? AI infrastructure needs the same rigor as production databases.
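
To make the inference-endpoint items concrete, here is a minimal sketch of the kind of spot check an assessor can run: does the endpoint reject unauthenticated calls, and does a burst of requests hit a rate limit? The endpoint URL, header, and thresholds are placeholders, and the expected status codes (401/403/429) are assumptions about how the target is configured.

```python
import time
import requests

ENDPOINT = "https://ml.example.internal/v1/predict"  # placeholder inference endpoint

def check_requires_auth() -> bool:
    """An unauthenticated request should be rejected (401/403), not served."""
    resp = requests.post(ENDPOINT, json={"inputs": "ping"}, timeout=10)
    return resp.status_code in (401, 403)

def check_rate_limiting(calls: int = 50) -> bool:
    """Fire a burst of requests and look for 429 responses as evidence of rate limiting."""
    saw_429 = False
    for _ in range(calls):
        resp = requests.post(
            ENDPOINT,
            json={"inputs": "ping"},
            headers={"Authorization": "Bearer REDACTED"},  # placeholder test credential
            timeout=10,
        )
        if resp.status_code == 429:
            saw_429 = True
            break
        time.sleep(0.05)
    return saw_429

if __name__ == "__main__":
    print("rejects unauthenticated requests:", check_requires_auth())
    print("rate limiting observed under burst:", check_rate_limiting())
```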

Red Teaming and Adversarial Simulation

The most effective way to validate your DevOps security posture is to attack it. Red teaming exercises simulate real-world adversary behavior, exposing weaknesses that paper assessments and compliance checklists miss. Here's how to structure adversarial simulations that test the security of modern DevOps pipelines, AI systems, and cloud infrastructure.

Attack Scenarios

Git repository compromise:

  • Attack path: Stolen credentials → push malicious code or modify GitHub Actions to exfiltrate secrets

  • Test: Branch protection, code review enforcement, secret scanning

CI/CD runner compromise:

  • Attack path: Compromised runner → access secrets, modify artifacts, pivot using cloud IAM credentials

  • Test: Runner isolation, secret injection, IAM scope, artifact signing

Training pipeline compromise:

  • Attack path: Access ML training → poison data, exfiltrate datasets, modify models, pivot to cloud resources

  • Test: Data access controls, environment isolation, model signing

Inference endpoint exploitation:

  • Attack path: Prompt injection, model inversion, adversarial inputs, resource exhaustion

  • Test: Input validation, rate limiting, output filtering, monitoring

Cloud IAM abuse:

  • Attack path: Compromise pipeline with excessive AWS permissions → access databases, create backdoor accounts

  • Test: Least-privilege IAM, audit logs

Tools and Techniques

Offensive tools: Nuclei, Semgrep, TruffleHog (secret scanning), ScoutSuite (cloud auditing), Peirates (K8s testing), Garak (LLM vulnerability scanning)

Goal: Demonstrate realistic attack paths from initial access to business impact. Document every permission that enabled the attack and every control that could prevent it.
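
For the cloud IAM abuse scenario above, part of the evidence you want is a complete picture of what a compromised pipeline identity could do. Below is a minimal sketch using boto3 to enumerate the managed and inline policies attached to a CI/CD role; the role name is a placeholder, and pagination is omitted for brevity.

```python
import boto3

def dump_role_policies(role_name: str) -> None:
    """List managed and inline policies attached to a role, e.g. the CI/CD runner role."""
    iam = boto3.client("iam")

    # Managed policies attached to the role.
    for policy in iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]:
        meta = iam.get_policy(PolicyArn=policy["PolicyArn"])["Policy"]
        version = iam.get_policy_version(
            PolicyArn=policy["PolicyArn"], VersionId=meta["DefaultVersionId"]
        )["PolicyVersion"]
        print(policy["PolicyName"], version["Document"])

    # Inline policies defined directly on the role.
    for name in iam.list_role_policies(RoleName=role_name)["PolicyNames"]:
        doc = iam.get_role_policy(RoleName=role_name, PolicyName=name)["PolicyDocument"]
        print(name, doc)

if __name__ == "__main__":
    dump_role_policies("ci-runner-role")  # placeholder role name
```

Documenting the full permission set of the pipeline identity makes the post-exercise report concrete: every finding maps back to a specific grant that enabled it.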

Threat Modeling with STRIDE

STRIDE is Microsoft's threat modeling framework for identifying threats across six categories: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. It's particularly well suited to DevOps because it focuses on data flows, trust boundaries, and process interactions: exactly what CI/CD pipelines are made of. Let's walk through each pipeline stage and identify threats using STRIDE.

Mapping STRIDE to CI/CD Stages

Source (Code Repo)

Trust boundary: Developer workstation → Git repository → CI/CD trigger

Spoofing:

  • Attacker impersonates developer (stolen credentials, session hijacking)

  • Commit signature spoofing (unsigned or improperly verified commits)

  • Mitigation: MFA, commit signing (GPG), IP allowlisting for Git access

Tampering:

  • Malicious code injection via compromised developer account

  • Direct push to protected branches (bypassing PR review)

  • Rewriting git history to hide malicious changes

  • Mitigation: Branch protection, required reviews, signed commits, audit logs

Repudiation:

  • Developer claims "I didn't commit that malicious code"

  • Lack of audit trail for who approved/merged PRs

  • Mitigation: Commit signing, immutable audit logs, PR approval tracking

Information Disclosure:

  • Secrets hardcoded in code (API keys, passwords)

  • Sensitive data in commit history

  • Public repo accidentally containing proprietary code

  • Mitigation: Secret scanning (GitHub Advanced Security, TruffleHog), pre-commit hooks, private repos with access controls

Denial of Service:

  • Repo filled with large files (Git LFS abuse)

  • Excessive webhook triggers overwhelming CI/CD system

  • Mitigation: Repo size limits, webhook rate limiting, CI/CD job quotas

Elevation of Privilege:

  • Attacker gains write access to repo → can now trigger CI/CD with elevated permissions

  • Mitigation: Least-privilege repo access, separate CI/CD credentials from repo access
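
Several of these source-stage mitigations can be verified directly from the GitHub REST API. Here is a minimal sketch that pulls branch protection settings and prints the fields an assessor cares about; the org, repo, and token handling are placeholders, and the script assumes the token is allowed to read branch protection.

```python
import os
import requests

GITHUB_API = "https://api.github.com"

def branch_protection(owner: str, repo: str, branch: str = "main") -> dict:
    """Fetch branch protection settings; a 404 means the branch isn't protected at all."""
    resp = requests.get(
        f"{GITHUB_API}/repos/{owner}/{repo}/branches/{branch}/protection",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    protection = branch_protection("example-org", "example-repo")  # placeholders
    reviews = protection.get("required_pull_request_reviews", {})
    print("required approving reviews:", reviews.get("required_approving_review_count", 0))
    print("signed commits required:", protection.get("required_signatures", {}).get("enabled", False))
    print("force pushes allowed:", protection.get("allow_force_pushes", {}).get("enabled", True))
```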

Build (CI/CD)

Trust boundary: Git repo → CI/CD runner → Build artifacts

Spoofing:

  • Attacker submits malicious PR that appears legitimate

  • Fake build artifacts uploaded to registry

  • Mitigation: PR review, artifact signing, provenance attestation (SLSA)

Tampering:

  • Malicious code injected during build (dependency confusion, compromised build script)

  • Build cache poisoning

  • Modification of artifacts before upload

  • Mitigation: Dependency pinning/vendoring, reproducible builds, artifact signing, isolated build environments

Repudiation:

  • "This artifact came from CI/CD, trust it" but no provenance

  • Can't prove what source code produced what artifact

  • Mitigation: SLSA provenance, signed build logs, SBOM generation

Information Disclosure:

  • Secrets leaked in build logs

  • Proprietary code exposed through build artifacts

  • CI/CD runner can access excessive cloud resources (list all S3 buckets, read all secrets)

  • Mitigation: Secret masking in logs, least-privilege IAM for runners, ephemeral runners, log access controls

Denial of Service:

  • Resource-exhaustive builds (crypto mining in CI/CD)

  • Infinite loop in build script

  • Filling artifact storage with junk

  • Mitigation: Build timeouts, resource quotas, cost monitoring

Elevation of Privilege:

  • CI/CD runner has admin-level cloud IAM permissions

  • Build process can modify production infrastructure

  • Attacker compromises build → inherits excessive permissions

  • Mitigation: Least-privilege IAM, separate build and deploy permissions, approval gates before production changes
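
To illustrate the artifact-integrity idea behind these build-stage mitigations, here is a minimal sketch that records an artifact's SHA-256 digest at build time and verifies it again before deployment, so a swapped or tampered artifact fails the check. Paths and the manifest format are assumptions; real supply-chain controls (cosign signatures, SLSA provenance attestations) go well beyond this, and the digest manifest itself must live somewhere the build can write but attackers cannot.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: str) -> str:
    """Stream the file so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_digest(artifact: str, manifest: str = "artifact-digests.json") -> None:
    """Run in the build job, right after the artifact is produced."""
    digests = json.loads(Path(manifest).read_text()) if Path(manifest).exists() else {}
    digests[artifact] = sha256_of(artifact)
    Path(manifest).write_text(json.dumps(digests, indent=2))

def verify_digest(artifact: str, manifest: str = "artifact-digests.json") -> bool:
    """Run in the deploy job, before the artifact is released."""
    expected = json.loads(Path(manifest).read_text()).get(artifact)
    return expected is not None and expected == sha256_of(artifact)

if __name__ == "__main__":
    record_digest("dist/app.tar.gz")  # placeholder artifact path
    print("artifact verified:", verify_digest("dist/app.tar.gz"))
```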

Test Stage (Automated Testing)

Trust boundary: Build artifacts → Test environment → Test results

Spoofing:

  • Fake test results (tests claim to pass but were never run)

  • Compromised test dependencies

  • Mitigation: Signed test reports, immutable test infrastructure, verified test framework versions

Tampering:

  • Test data manipulation to hide vulnerabilities

  • Modification of test configs to skip security tests

  • Disabling code coverage or vulnerability scanning

  • Mitigation: Policy enforcement (required tests can't be disabled), test result integrity checks, audit logs

Repudiation:

  • "Tests passed" but no audit trail of what was actually tested

  • Mitigation: Detailed test reports, logs of all tests run, version-controlled test configurations

Information Disclosure:

  • Test environments using production data without sanitization

  • Test results exposing sensitive business logic or vulnerabilities

  • Mitigation: Synthetic test data, test environment isolation, access controls on test reports

Denial of Service:

  • Resource exhaustion in test environment

  • Long-running tests blocking deployment pipeline

  • Mitigation: Test timeouts, parallel testing, test environment resource limits

Elevation of Privilege:

  • Test environment has access to production secrets/data

  • Attacker compromises test job → accesses production

  • Mitigation: Separate test and production environments, least-privilege access, network segmentation
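
One way to enforce the "required tests can't be disabled" mitigation is to compare the pipeline-as-code file against a policy list of jobs that must always exist. Below is a minimal sketch assuming a GitHub Actions-style workflow layout and PyYAML; the job names and file path are illustrative.

```python
import yaml  # PyYAML: pip install pyyaml

# Jobs the security policy says must exist in every pipeline (illustrative names).
REQUIRED_JOBS = {"sast-scan", "dependency-audit", "container-scan"}

def missing_required_jobs(workflow_path: str) -> set[str]:
    """Return required job names absent from a GitHub Actions-style workflow file."""
    with open(workflow_path) as f:
        workflow = yaml.safe_load(f)
    defined_jobs = set((workflow or {}).get("jobs", {}).keys())
    return REQUIRED_JOBS - defined_jobs

if __name__ == "__main__":
    missing = missing_required_jobs(".github/workflows/ci.yml")  # placeholder path
    if missing:
        raise SystemExit(f"policy violation, missing required jobs: {sorted(missing)}")
    print("all required security jobs present")
```

Running a check like this in a separate, protected pipeline (or as an admission policy) keeps a compromised developer account from quietly deleting the security jobs along with the code they gate.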

Deploy Stage (Release to Production)

Trust boundary: Test results → Deployment system → Production environment

Spoofing:

  • Deployment triggered by unauthorized user

  • Fake deployment approval

  • Mitigation: Authenticated deployment requests, approval workflows, audit logs

Tampering:

  • Artifact swapped between test and deploy stages

  • IaC templates modified during deployment

  • Configuration drift (deployed resources don't match IaC definitions)

  • Mitigation: Artifact signing and verification, immutable artifacts, drift detection

Repudiation:

  • "Who deployed this broken change?" with no audit trail

  • Mitigation: Deployment logs, immutable audit records, integration with change management systems

Information Disclosure:

  • Deployment logs exposing production secrets or architecture details

  • Deployment system can read all production secrets

  • Mitigation: Secret masking, least-privilege deployment credentials, log access controls

Denial of Service:

  • Malicious deployment takes down production

  • Deployment process lacks rollback capability

  • Mitigation: Blue-green or canary deployments, automated rollback, health checks before traffic shift

Elevation of Privilege:

  • Deployment system has excessive production permissions (can modify IAM, delete databases)

  • Compromised deployment pipeline → full production control

  • Mitigation: Least-privilege deployment IAM, approval gates for sensitive changes, separate control plane access
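
For drift detection specifically, Terraform's own exit codes give a quick signal: with -detailed-exitcode, terraform plan exits 0 when deployed infrastructure matches the configuration and 2 when changes are pending. A minimal sketch follows; the working directory is a placeholder, and it assumes terraform init has already run there.

```python
import subprocess

def detect_drift(workdir: str) -> bool:
    """Return True if `terraform plan` reports pending changes (possible drift)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return False   # infrastructure matches configuration
    if result.returncode == 2:
        return True    # changes pending: drift or unapplied configuration
    raise RuntimeError(f"terraform plan failed:\n{result.stderr}")

if __name__ == "__main__":
    print("drift detected:", detect_drift("infra/production"))  # placeholder directory
```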

Runtime Stage (Production Operation)

Trust boundary: Production infrastructure → Application/services → External users

Spoofing:

  • Service impersonation (unauthorized service claims to be legitimate microservice)

  • Compromised identity provider

  • Mitigation: mTLS between services, service mesh authentication, strong identity verification

Tampering:

  • Configuration drift from IaC definitions

  • Runtime modification of container images or binaries

  • Data tampering in databases or message queues

  • Mitigation: Immutable infrastructure, runtime integrity monitoring, encryption at rest

Repudiation:

  • Insufficient logging of user/service actions

  • Logs can be modified or deleted

  • Mitigation: Centralized logging, immutable log storage, log integrity verification

Information Disclosure:

  • Exposed APIs or dashboards (Kubernetes dashboard, database admin panels)

  • Cloud storage misconfiguration (public S3 buckets)

  • Excessive logging of sensitive data

  • Mitigation: Network policies, authentication on all admin interfaces, data classification, log sanitization

Denial of Service:

  • Application vulnerabilities (unpatched CVEs)

  • Resource exhaustion from malicious traffic

  • Mitigation: WAF, rate limiting, auto-scaling, vulnerability management

Elevation of Privilege:

  • Container escape to host

  • Compromised pod accessing Kubernetes API with excessive RBAC

  • Service account abuse

  • Mitigation: Pod security standards, least-privilege RBAC, network policies, runtime security tools (Falco)
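
As a runtime spot check for the elevation-of-privilege items above, the Kubernetes Python client can flag pods running privileged containers or on the host network. This is a read-only sketch assuming a working kubeconfig; tools like kube-bench, Polaris, and Falco cover far more ground.

```python
from kubernetes import client, config  # pip install kubernetes

def risky_pods() -> list[str]:
    """Flag pods with privileged containers or host networking across all namespaces."""
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    v1 = client.CoreV1Api()
    findings = []
    for pod in v1.list_pod_for_all_namespaces().items:
        name = f"{pod.metadata.namespace}/{pod.metadata.name}"
        if pod.spec.host_network:
            findings.append(f"{name}: hostNetwork enabled")
        for container in pod.spec.containers:
            sc = container.security_context
            if sc and sc.privileged:
                findings.append(f"{name}: privileged container {container.name}")
    return findings

if __name__ == "__main__":
    for finding in risky_pods():
        print(finding)
```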

Critical Trust Boundaries

  1. Developer workstation ↔ Git: MFA, commit signing

  2. Git ↔ CI/CD: Webhook authentication, least-privilege access

  3. CI/CD ↔ Cloud: Minimal IAM, network isolation

  4. Staging ↔ Production: Network segmentation, separate credentials, approval gates

  5. Service ↔ Service: mTLS, authorization

Principle: Every boundary crossing requires authentication, authorization, and audit logging.

Validation and Evidence Collection

Threat modeling identifies potential vulnerabilities, but validation proves they exist (or don't). This section covers how to systematically collect evidence during a DevOps security assessment: what logs to review, what configurations to inspect, and how to test whether your detection capabilities actually work when an attack occurs.

Key Log Sources

AWS CloudTrail / Azure Activity / GCP Audit:

  • Red flags: Unexpected IAM changes, excessive API calls, secret access from wrong systems

GitHub/GitLab Audit Logs:

  • Red flags: Branch protection changes, new deploy keys, webhook creation, failed auth attempts

Kubernetes Audit Logs:

  • Red flags: Exec into pods, secret reads by wrong accounts, RBAC changes, privileged pods

CI/CD Run Histories:

  • Red flags: Secrets in logs, unexpected network connections, unusual job durations
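
On AWS, CloudTrail's lookup_events API is a quick way to pull the IAM-related red flags above during evidence collection. A minimal boto3 sketch follows; the event names are a small illustrative subset, and CloudTrail accepts only one lookup attribute per call, hence the loop.

```python
from datetime import datetime, timedelta, timezone

import boto3

# A few IAM write events worth reviewing; real assessments cast a wider net.
IAM_EVENTS = ["PutRolePolicy", "AttachRolePolicy", "CreateAccessKey", "CreateUser"]

def recent_iam_changes(days: int = 7) -> None:
    """Print recent IAM-related CloudTrail events (one lookup attribute per call)."""
    ct = boto3.client("cloudtrail")
    start = datetime.now(timezone.utc) - timedelta(days=days)
    for event_name in IAM_EVENTS:
        events = ct.lookup_events(
            LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": event_name}],
            StartTime=start,
        )["Events"]
        for event in events:
            print(event["EventTime"], event_name, event.get("Username", "unknown"))

if __name__ == "__main__":
    recent_iam_changes()
```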

Configuration Review

IaC (Terraform, CloudFormation): Check for IAM wildcards, overly permissive security groups, hardcoded secrets, public resource access

CI/CD Configs: Review secret handling, runner specifications, approval requirements, third-party actions

Kubernetes Manifests: Validate security contexts, resource limits, RBAC bindings, network policies

IAM Permissions: Identify admin-level permissions for CI/CD/apps, unused permissions, long-lived credentials
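
A fast win in configuration review is flagging wildcard actions and resources in IAM policy documents. The sketch below walks exported policy JSON files (the policies/ directory layout is an assumption); scanners such as Checkov and Prowler do this more thoroughly.

```python
import json
from pathlib import Path

def wildcard_statements(policy_doc: dict) -> list[dict]:
    """Return Allow statements that grant '*' actions or apply to '*' resources."""
    statements = policy_doc.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if "*" in actions or any(a.endswith(":*") for a in actions) or "*" in resources:
            flagged.append(stmt)
    return flagged

if __name__ == "__main__":
    # Placeholder layout: one exported IAM policy document per JSON file.
    for path in Path("policies").glob("*.json"):
        for stmt in wildcard_statements(json.loads(path.read_text())):
            print(f"{path}: over-broad statement: {stmt}")
```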

Testing Detection Capabilities

  • Log injection: Can attackers hide activity?

  • Model misfire: Do anomalous ML outputs trigger alerts?

  • Pipeline poisoning: Are malicious pipeline changes detected?

  • Secret exposure: How fast from exposure to revocation?

  • Unauthorized access: Do suspicious patterns trigger alerts?

The Assessment Playbook

This playbook provides a structured, repeatable approach to conducting DevOps security assessments. Use it to ensure comprehensive coverage whether you have a couple of days for a focused review or weeks for deep analysis. The key is consistency: applying the same framework lets you measure improvement over time and compare security posture across teams.

2-Day Assessment (High-Risk Focus aka The Sniff Test)

Day 1:

  • Morning: Interview key personnel (1 DevOps lead, 1 Security person) to understand architecture

  • Afternoon: Focus on highest-risk areas:

    • Review IAM permissions for CI/CD runners and production service accounts

    • Audit production Kubernetes RBAC and pod security

    • Check for secrets in code/logs (automated scan with TruffleHog/Semgrep)

    • Review recent CloudTrail/audit logs for anomalies

Day 2:

  • Morning: Red team simulation (1-2 attack scenarios, e.g., compromised GitHub Action, container escape attempt)

  • Afternoon: Document findings, prioritize top 5 critical issues, deliver brief with recommendations

Output: Executive summary with critical issues, recommended immediate actions, and proposal for deeper assessment

2-Week Assessment (Comprehensive Coverage)

Week 1:

  • Days 1-2: Discovery and documentation

    • Architecture review meetings with DevOps, Security, and ML teams

    • Document all CI/CD pipelines, cloud environments, and AI/ML systems

    • Collect configurations, policies, and access documentation

  • Days 3-5: Automated scanning and configuration review

    • Run comprehensive security scans (IaC, containers, cloud configs)

    • Review IAM permissions, RBAC policies, network configurations

    • Analyze audit logs for past 90 days

    • Document all findings with severity ratings

Week 2:

  • Days 6-8: Red team simulation and adversarial testing

    • Execute 5-7 attack scenarios across different trust boundaries

    • Test AI/ML pipeline security, model inference endpoints

    • Validate detection and response capabilities

  • Days 9-10: Analysis, reporting, and remediation planning

    • Consolidate findings, eliminate false positives

    • Prioritize issues by risk (likelihood × impact)

    • Create detailed remediation roadmap with timelines

    • Present comprehensive findings to stakeholders

Output: Detailed assessment report, risk register, remediation roadmap, executive briefing

What to Prioritize Based on Time

Always cover (regardless of time):

  • Production access controls (who can deploy/modify production?)

  • Secret management (are secrets hardcoded anywhere?)

  • CI/CD runner permissions (do they have excessive cloud access?)

Add if you have 3-7 days:

  • Container security (image scanning, runtime policies)

  • Network segmentation (can dev access prod? Can prod egress freely?)

  • Kubernetes security (RBAC, pod security, network policies)

Add if you have 1-2 weeks:

  • AI/ML pipeline security

  • Comprehensive red teaming

  • Detection and response validation

  • Supply chain security (dependencies, third-party integrations)

Add if you have 2+ weeks:

  • Threat modeling workshops with development teams

  • Custom tooling development for automated testing

  • Policy-as-code implementation

  • Compliance mapping (SOC 2, ISO 27001, etc.)

Tools to Inspect by Layer

Source Code Management (GitHub, GitLab, Bitbucket)

  • Tools: Git-secrets, TruffleHog, Gitleaks

  • What to check:

    • Branch protection rules

    • Required reviewers and status checks

    • Commit signing enforcement

    • Repository visibility settings

    • Deploy keys and webhook configurations

    • Access audit logs

CI/CD (Jenkins, GitHub Actions, GitLab CI, CircleCI)

  • Tools: Semgrep, CodeQL, custom scripts

  • What to check:

    • Pipeline-as-code files (.github/workflows/, .gitlab-ci.yml, Jenkinsfile)

    • Secret management (are secrets in code or properly vaulted?)

    • Third-party integrations (actions, plugins)

    • Runner/agent configurations

    • Build logs for secret exposure

    • Approval workflows

Container Images and Registries (ECR, GCR, Docker Hub, ACR)

  • Tools: Trivy, Grype, Clair, Anchore

  • What to check:

    • Vulnerability scan results

    • Image signatures and attestations

    • Base image sources

    • Image layers (secrets baked in?)

    • Registry access controls

    • Image pull/push logs

Infrastructure as Code (Terraform, CloudFormation, Pulumi)

  • Tools: Checkov, tfsec, Terrascan, Sentinel

  • What to check:

    • IAM policies (wildcards, excessive permissions)

    • Security group rules

    • Encryption settings

    • Public access configurations

    • State file security

    • Drift detection (does prod match IaC?)

Kubernetes (EKS, GKE, AKS, self-managed)

  • Tools: Kube-bench, Kube-hunter, Kubesec, Polaris, Falco

  • What to check:

    • Pod security standards/policies

    • RBAC configurations

    • Network policies

    • Service account permissions

    • Secrets management (external secrets operator?)

    • Audit logging configuration

    • Admission controllers

Cloud Platforms (AWS, Azure, GCP)

  • Tools: Prowler, ScoutSuite, CloudSploit, Cloud Custodian

  • What to check:

    • IAM policies and roles

    • Resource-based policies (S3 bucket policies, etc.)

    • Network configurations (VPCs, security groups, NACLs)

    • Encryption settings (at-rest, in-transit)

    • Logging and monitoring (CloudTrail, VPC Flow Logs, etc.)

    • Publicly accessible resources
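
For the "publicly accessible resources" item on AWS, a simple pass over S3 public access block settings highlights buckets that deserve a closer look. A minimal boto3 sketch follows; Prowler and ScoutSuite perform this check along with hundreds of others.

```python
import boto3
from botocore.exceptions import ClientError

def buckets_without_full_public_block() -> list[str]:
    """List buckets where any public access block setting is missing or disabled."""
    s3 = boto3.client("s3")
    flagged = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            cfg = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
            if not all(cfg.values()):
                flagged.append(name)
        except ClientError as err:
            if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
                flagged.append(name)  # no public access block configured at all
            else:
                raise
    return flagged

if __name__ == "__main__":
    for name in buckets_without_full_public_block():
        print("review public access settings for bucket:", name)
```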

AI/ML Infrastructure (MLflow, Kubeflow, SageMaker, Vertex AI)

  • Tools: Garak (LLM testing), custom scripts, cloud provider tools

  • What to check:

    • Experiment tracking access controls

    • Model registry permissions

    • Training job IAM permissions (ensure least-privilege access; audit with AWS IAM Access Analyzer, GCP Policy Analyzer)

    • Inference endpoint authentication

    • Data access controls

    • Model versioning and signing

    • Vector database security (if using RAG)

Runtime Security (Production Applications)

  • Tools: Falco, Sysdig, Aqua, Datadog Security, Wiz

  • What to check:

    • Runtime behavior anomalies

    • Container escapes or privilege escalation attempts

    • Network connections (unexpected egress)

    • File integrity monitoring

    • Process execution monitoring

Next Up: Deep Dive – Securing the Build-to-Deploy Chain

In the next post, we'll take everything we've learned about assessing DevOps security and apply it to the most critical attack surface: the build-to-deploy chain.
