Building a Burp Extension for Passive Analysis: The Parts Nobody Writes About

Jun 1

By Nathan Gabriel Wang, Associate Consultant, Artais

Introduction

Most Burp extension tutorials stop at "Hello World." They show you how to register a handler, print to the output tab, and call it a day. That's the easy part.

The hard part is what happens when you run your extension against a real application with thousands of requests. Your handler blocks the proxy. You get 500 duplicate findings for the same issue. Your exports contain credentials. The UI freezes when you try to scroll the findings table.

This post covers what actually matters for passive analysis extensions: thread safety, deduplication, evidence capture, export formats, and performance at scale. We'll build a minimal CORS misconfiguration detector that demonstrates these patterns without the complexity of a full framework.

The complete source is available here.

Authorization note: Only run extensions against systems you own or have explicit permission to test. The examples use placeholder data and detect issues passively without modifying traffic.

The Misconceptions Extension Tutorials Create

Extension tutorials encourage a few assumptions that don't hold up in practice:

"Registration is the hard part." Implementing ProxyResponseHandler takes five lines. Making it usable takes fifty. The real work is everything around the handler: thread-safe state, dedupe logic, and export.
"If it works on ten requests, it works on ten thousand." Small tests hide concurrency bugs. That HashSet for tracking seen URLs? It'll corrupt silently under load. That synchronous export? It'll freeze the proxy for seconds.
"More detection is better." Extensions that flag everything become noise generators. One finding per issue type per endpoint is useful. Five hundred findings for the same CORS wildcard is not.
"UI makes the extension professional." UI makes the extension complex. Most passive extensions don't need tables, filters, or custom tabs. Log to output, export on demand, build UI only when you hit real limitations.

Let's dig into each area where this plays out.

Architecture Mental Model

Burp extensions hook into tools that handle traffic. For passive analysis, implement ProxyResponseHandler—you get called for every proxied response, do your analysis, and return immediately.

Data flows like this: request arrives → response arrives → your handler runs → finding created → output logged. You have access to host, path, status code, headers, and tool source.

The key architectural principle: keep your core logic (analysis + finding model) separate from I/O (exporters, UI). Don't tie your detection logic to how findings get displayed. This makes testing easier and lets you swap output formats without touching analysis code.

public class PassiveAuditor implements BurpExtension, ProxyResponseHandler {
    @Override
    public void initialize(MontoyaApi api) {
        api.extension().setName("Passive CORS Auditor");
        api.proxy().registerResponseHandler(this);
    }

    @Override
    public ProxyResponseReceivedAction handleResponseReceived(InterceptedResponse response) {
        // Analysis here—keep it fast
        return ProxyResponseReceivedAction.continueWith(response);
    }
}

What tutorials get wrong here: They show the handler in isolation. In practice, you need scope checking (don't analyze out-of-scope traffic), state management (track what you've seen), and output handling (store findings somewhere useful). The handler is the entry point, not the whole extension.
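
One way to make that separation concrete is to keep the analysis a pure function over plain strings and reduce the Burp handler to a thin adapter. The sketch below only illustrates the layering and is not the reference extension's code: CorsCheck, Result, and Sink are hypothetical names, and only the wildcard case is shown.

```java
// Hypothetical names throughout; this illustrates the layering only.
final class CorsCheck {
    // Plain value object: no Burp types, so it can be unit-tested without Burp.
    static final class Result {
        final String host;
        final String path;
        final String issueType;

        Result(String host, String path, String issueType) {
            this.host = host;
            this.path = path;
            this.issueType = issueType;
        }
    }

    // Output side: a logger, an exporter, or a table model can each implement this.
    interface Sink {
        void accept(Result result);
    }

    // Pure analysis: returns null when there is nothing to report.
    static Result analyze(String host, String path, String acao) {
        if ("*".equals(acao)) {
            return new Result(host, path, "CORS Wildcard");
        }
        return null;
    }

    // Glue: the proxy handler would extract the three strings and call this.
    static void analyzeAndEmit(String host, String path, String acao, Sink sink) {
        Result r = analyze(host, path, acao);
        if (r != null) {
            sink.accept(r);
        }
    }
}
```

Because analyze() takes no Burp types, the same method can back the live handler, a history benchmark, and unit tests.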

Thread Safety

Burp processes requests across multiple threads. Your handler gets called concurrently. Track state wrong and you'll report duplicates, corrupt data, or crash.

// Wrong: race condition on concurrent access
private Set<String> seen = new HashSet<>();

public void check(String key) {
    if (!seen.contains(key)) {  // Threads A and B can both pass this check
        seen.add(key);           // before either add() takes effect
        report();                // so both report the same finding
    }
}

// Right: atomic check-and-add
private final Set<String> seen = ConcurrentHashMap.newKeySet();

public void check(String key) {
    if (seen.add(key)) {  // Returns true only if actually added
        report();          // Only one thread reports
    }
}

The ConcurrentHashMap.newKeySet() pattern gives you a thread-safe set with atomic add(). The return value tells you whether the element was new, eliminating the check-then-act race condition.

What tutorials get wrong here: They use synchronized blocks everywhere or ignore threading entirely. synchronized works but creates contention—every thread waits for every other thread. ConcurrentHashMap allows concurrent reads and usually concurrent writes to different keys. For a dedupe set, it's the right tool.
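
To see the atomic add() behave as described, the following standalone sketch lines up several threads on the same key and counts how many of them would have reported. DedupeRaceDemo and the key string are hypothetical; only standard java.util.concurrent classes are used.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class DedupeRaceDemo {
    // Starts `threads` workers that all try to claim the same key at once
    // and returns how many of them won the add() (i.e. would have reported).
    static int reportsForSameKey(int threads) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        AtomicInteger reports = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);

        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    start.await();                 // line every worker up first
                    if (seen.add("example.com|/api/users|CORS Wildcard")) {
                        reports.incrementAndGet(); // stands in for report()
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }

        start.countDown();                         // release all workers together
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return reports.get();
    }
}
```

However many threads race, exactly one add() call returns true for a given key, so the count is always one.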

Keep callback work minimal. If you need heavy parsing (regex on large bodies, JSON parsing, network calls), queue work to a separate thread pool. For simple header checks, inline is fine.
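
A minimal sketch of queueing heavy work off the proxy thread, using only java.util.concurrent. AnalysisQueue is a hypothetical helper, and the pool size, queue size, and drop-when-full policy are assumptions to tune, not values from the reference extension.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class AnalysisQueue {
    // Two workers and a bounded queue; both sizes are illustrative assumptions.
    // DiscardPolicy silently drops new work when the queue is full instead of
    // blocking the caller, which keeps the proxy thread from stalling.
    private final ExecutorService pool = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1_000),
            new ThreadPoolExecutor.DiscardPolicy());

    // Called from the handler: hands the task off and returns immediately.
    void submit(Runnable heavyAnalysis) {
        pool.execute(heavyAnalysis);
    }

    // Call when the extension unloads so worker threads do not outlive it.
    // Returns true if all queued work finished within the timeout.
    boolean shutdownAndWait(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The trade-off is deliberate: under sustained overload this design loses some analysis rather than slowing traffic. Copy whatever the task needs (header values, a body snippet) before submitting it, and shut the pool down from the extension's unloading hook.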

When Analysis Throws

A single bad response should not take down the extension. Malformed messages, unexpected nulls, or Montoya API edge cases can throw from header accessors or URL parsing. If that exception propagates out of handleResponseReceived, Burp may disable the extension mid-assessment with little visible feedback.

Wrap the body of your handler in a try/catch, log a short message to the extension error stream, and always return continueWith(response) so the proxy keeps flowing:

@Override
public ProxyResponseReceivedAction handleResponseReceived(InterceptedResponse response) {
    try {
        considerRecording(response.initiatingRequest(), response);
    } catch (Exception ex) {
        // Log ex rather than ex.getMessage(): the message is often null (e.g. NullPointerException)
        api.logging().logToError("Passive CORS Auditor: " + ex);
    }
    return ProxyResponseReceivedAction.continueWith(response);
}

Do not rethrow. One failure on one message is not worth losing every subsequent passive check.

Deduplication Logic

Without dedupe, you'll report the same issue hundreds of times as users browse. The key design matters more than the data structure:

Too narrow: /api/users?id=1 and /api/users?id=2 are different findings. You'll flood the output with duplicates that differ only in query parameters.
Too broad: One finding per host regardless of endpoint. You'll miss issues on different paths.

Reasonable default: host + normalized_path + issue_type. Strip query strings, lowercase the path.

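// Requires: import java.util.Locale;
private String normalizePath(String path) {
    int q = path.indexOf('?');
    // q >= 0 (not q > 0) so a value that starts with '?' is stripped too
    return (q >= 0 ? path.substring(0, q) : path).toLowerCase(Locale.ROOT);
}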

String key = host + "|" + normalizePath(path) + "|" + issueType;

This reports once per issue type per endpoint. Five requests to /api/users with the same CORS wildcard become one finding. Five requests to /api/users and /api/admin with the same issue become two findings.

When to adjust the key:

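Include query param names (not values) if the application routes differently based on which parameters are present: /api?userId=1 vs /api?reportId=1. If a single parameter's value selects the action (/api?action=getUser vs /api?action=deleteUser), names alone produce the same key, so include that one parameter's value as well.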
Include HTTP method if GET and POST to the same path have different behaviors
Exclude path entirely if you only care about host-level issues like missing security headers
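
As a sketch of the first adjustment, the helper below folds sorted query parameter names (never values) into the key. DedupeKeys and withParamNames are hypothetical names, and the parsing is deliberately naive: it splits on & and = without URL-decoding.

```java
import java.util.Locale;
import java.util.TreeSet;

final class DedupeKeys {
    // Builds host|path|sorted-param-names|issueType. Values are discarded so
    // ?id=1 and ?id=2 collapse into one key, and so no parameter value
    // (which may be a token) ends up stored in the dedupe set.
    static String withParamNames(String host, String pathAndQuery, String issueType) {
        int q = pathAndQuery.indexOf('?');
        String path = q >= 0 ? pathAndQuery.substring(0, q) : pathAndQuery;
        String query = q >= 0 ? pathAndQuery.substring(q + 1) : "";

        // TreeSet sorts and de-duplicates, so parameter order does not matter.
        TreeSet<String> names = new TreeSet<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            names.add(eq >= 0 ? pair.substring(0, eq) : pair);
        }

        return host.toLowerCase(Locale.ROOT) + "|"
                + path.toLowerCase(Locale.ROOT) + "|"
                + String.join(",", names) + "|"
                + issueType;
    }
}
```

Adding the HTTP method works the same way: append one more field between the separators.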

The dedupe set grows unbounded during a session. For most assessments this is fine—a few thousand unique keys use negligible memory. For very long sessions or automated scanning, consider periodic clearing or an LRU cache.
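
If you do want a cap, one possible shape for the LRU option is a LinkedHashMap in access order wrapped as a synchronized set. This is a sketch under stated assumptions: BoundedSeenSet is a hypothetical name, and the single lock trades some of ConcurrentHashMap's concurrency for a hard size limit.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class BoundedSeenSet {
    // Returns a thread-safe set that holds at most maxEntries keys and
    // evicts the least recently used key once that limit is exceeded.
    static Set<String> create(final int maxEntries) {
        Map<String, Boolean> lru = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
        // add() still returns true only for keys not currently in the set,
        // so the if (seen.add(key)) report(); pattern keeps working.
        return Collections.synchronizedSet(Collections.newSetFromMap(lru));
    }
}
```

An evicted key will be reported again if it reappears, so size the cap well above the number of distinct endpoints you expect in a session.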

Evidence Capture

Capture what's needed to understand and verify the finding, nothing more:

URL (with query string stripped or redacted)
Relevant header values (ACAO, ACAC for CORS issues)
Issue classification
Timestamp (optional, useful for correlation)

Don't store: full request/response bodies, cookies, authorization headers, or anything that could leak credentials into exports. Even "safe" headers can contain tokens in edge cases.

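private static class Finding {
    final String host;
    final String path;           // Query string stripped
    final String issueType;
    final String acao;           // The specific header value
    final boolean withCredentials;

    // No: full body, cookies, auth headers

    Finding(String host, String path, String issueType, String acao, boolean withCredentials) {
        this.host = host;
        this.path = path;
        this.issueType = issueType;
        this.acao = acao;
        this.withCredentials = withCredentials;
    }
}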

Truncate long values. A 10KB Access-Control-Allow-Origin header (yes, this happens) doesn't need to be stored in full.
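
A minimal truncation helper might look like the following. Evidence, truncate, and the 200-character limit are illustrative assumptions rather than values from the reference code.

```java
final class Evidence {
    // Illustrative limit; pick whatever keeps one finding readable on one line.
    static final int MAX_HEADER_CHARS = 200;

    // Returns the value unchanged if it fits, otherwise the first maxChars
    // characters plus a marker recording the original length.
    static String truncate(String value, int maxChars) {
        if (value == null) {
            return "";
        }
        if (value.length() <= maxChars) {
            return value;
        }
        return value.substring(0, maxChars)
                + "...[truncated, " + value.length() + " chars total]";
    }
}
```

Recording the original length in the marker keeps the oddity visible in the export (a 10KB header is itself worth noting) without storing the whole value.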

What tutorials get wrong here: They store the entire HttpRequestResponse object for "completeness." This bloats memory, slows exports, and risks leaking sensitive data. Store the minimum needed to understand the finding.

The One Feature: CORS Misconfiguration

We'll detect three header patterns commonly associated with CORS misconfiguration. Each is a signal worth investigating, not a confirmed vulnerability. Exploitability depends on what the endpoint returns and how the application uses credentials.

Access-Control-Allow-Origin: * — any origin is permitted; impactful when the endpoint returns sensitive data without relying on cookies (browsers block the wildcard-plus-credentials combination)
Access-Control-Allow-Origin: null — the null origin can be obtained from sandboxed iframes and some local contexts
Origin reflection — the response echoes the request Origin in Access-Control-Allow-Origin. With Access-Control-Allow-Credentials: true, this is the strongest signal (weak validation on a credentialed endpoint). Without credentials, reflection is still flagged at lower severity as an indicator of weak origin validation

// evaluateCors — classification only; no dedupe or UI
if ("*".equals(acao)) {
    issueType = "CORS Wildcard";
    severity = "Medium";
} else if ("null".equalsIgnoreCase(acao)) {
    issueType = "CORS Null Origin";
    severity = withCreds ? "High" : "Medium";
} else {
    String origin = request.headerValue("Origin");
    if (origin != null && origin.equals(acao)) {
        if (withCreds) {
            issueType = "CORS Origin Reflection";
            severity = "High";
        } else {
            issueType = "CORS Origin Reflection (no credentials)";
            severity = "Low";
        }
    }
}

What we don't detect: Whether the endpoint returns sensitive data, whether cookies are actually sent, or whether origin validation has bypassable logic. That's active testing territory. Passive detection surfaces the header pattern; manual verification determines exploitability.

Export Formats

Findings that can't leave Burp don't get used. Support at minimum:

CSV for quick triage in spreadsheets:

host,path,issue,acao,credentials
example.com,/api/data,CORS Wildcard,*,false
example.com,/api/user,CORS Origin Reflection,https://attacker.com,true

JSON for tooling integration:

[
  {
    "host": "example.com",
    "path": "/api/data",
    "issue": "CORS Wildcard",
    "acao": "*",
    "credentials": false
  }
]

Escape output properly. A path containing commas or quotes will corrupt CSV. A header value containing quotes will corrupt JSON. Use proper escaping, not string concatenation:

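private String escapeCsv(String s) {
    // Quote when the field contains a delimiter, a quote, or any line break
    if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }
    return s;
}

private String escapeJson(String s) {
    StringBuilder sb = new StringBuilder(s.length() + 8);
    for (char c : s.toCharArray()) {
        if (c == '"' || c == '\\') {
            sb.append('\\').append(c);
        } else if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));  // raw control characters are invalid in JSON strings
        } else {
            sb.append(c);
        }
    }
    return sb.toString();
}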

Keep exports clean—no secrets, no unnecessary fields. If you wouldn't want the export emailed to a client, don't include it.
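
Putting the CSV side together, a whole-file export can be sketched as below. CsvExport is a hypothetical name, rows are plain string arrays in the column order shown earlier (host, path, issue, acao, credentials), and the field escaping follows the same quoting rule as escapeCsv.

```java
import java.util.List;

final class CsvExport {
    // Quotes a field when it contains a comma, a quote, or a line break,
    // doubling any embedded quotes.
    static String field(String s) {
        if (s == null) {
            return "";
        }
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Builds the full document: one header line, then one line per finding.
    static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("host,path,issue,acao,credentials\n");
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(field(row[i]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Building the string in memory is fine for a few thousand findings. For much larger exports, writing each row to a BufferedWriter on a background thread avoids holding the whole file at once.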

UI/UX for Triage

Most passive extensions don't need custom UI. The default output tab handles logging. Export methods handle extraction. Build UI only when you hit real limitations.

When logging is enough:

Small number of finding types
Findings are self-explanatory from the log line
Export handles bulk analysis

When you need UI:

Many finding types that need filtering
Findings need in-Burp verification (clicking to see request/response)
Triage workflow requires marking findings as reviewed/ignored

If you do build UI, follow these principles:

Don't block the event dispatch thread. Swing runs on EDT. Long operations freeze the UI. Use SwingWorker or background threads for anything slow.
Don't interrupt users. No popups. No modal dialogs. No stealing focus. Log findings quietly; let users check when they're ready.
Make export obvious. Context menu on right-click. "Export" button in the tab. Don't make users hunt for it.
Support filtering by severity/type. A flat list of 500 findings is useless. Let users filter to "CORS with credentials" or "this host only."
Show finding count. Users want to know "how many issues" at a glance. Update the tab title or add a status line.

Minimal UI pattern (table + export):

// Model for findings table
private DefaultTableModel tableModel = new DefaultTableModel(
    new String[]{"Host", "Path", "Issue", "ACAO", "Credentials"}, 0
);

// Add finding to table (safe to call from any thread: invokeLater runs the update on the EDT)
SwingUtilities.invokeLater(() -> {
    tableModel.addRow(new Object[]{
        finding.host, finding.path, finding.issueType,
        finding.acao, finding.withCredentials
    });
});
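
For the filtering principle, standard Swing already provides most of what is needed through TableRowSorter and RowFilter. The sketch below is one way to wire a case-insensitive substring filter onto a findings model; FindingTableFilter and its method names are hypothetical.

```java
import java.util.regex.Pattern;
import javax.swing.RowFilter;
import javax.swing.table.DefaultTableModel;
import javax.swing.table.TableRowSorter;

final class FindingTableFilter {
    // One sorter per table; it provides column sorting and row filtering.
    static TableRowSorter<DefaultTableModel> sorterFor(DefaultTableModel model) {
        return new TableRowSorter<>(model);
    }

    // Shows only rows where some column contains `text`, ignoring case.
    // Pattern.quote treats the input as a literal, so "*" or "." typed by
    // the user is matched as-is instead of being parsed as a regex.
    static void apply(TableRowSorter<DefaultTableModel> sorter, String text) {
        if (text == null || text.isEmpty()) {
            sorter.setRowFilter(null);   // an empty filter shows every row
            return;
        }
        sorter.setRowFilter(RowFilter.regexFilter("(?i)" + Pattern.quote(text)));
    }
}
```

Attach the sorter with table.setRowSorter(sorter) and call apply() from the filter field's listener. Because that listener already runs on the EDT, no extra thread hand-off is needed.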

Reference implementation: The extension/ project registers a suite tab with a sortable table, a substring filter, explicit export buttons, and a status line showing finding and dedupe-key counts. It also adds a "Benchmark proxy history" action that walks api.proxy().history() using the same pure evaluation path as the handler. The timed loop does not mutate dedupe state or the UI, so it stays repeatable on large histories.

What tutorials get wrong here: They build elaborate UIs for extensions that would work fine with just logging. UI adds complexity, threading concerns, and maintenance burden. Start with output logging; add UI when users complain.

Benchmarking Performance

Performance matters because your handler runs on Burp's proxy threads. Block them and the proxy freezes. Slow them and users notice lag.

What to Measure

Per-message processing time — How long does your handler take? Target: <1ms for simple checks.
Memory growth — Does your dedupe set grow unbounded? Does storing findings consume excessive memory?
Proxy responsiveness under load — Does Burp stay responsive while your extension runs?
Export time — How long does exporting 10,000 findings take?

Methodology for Large Proxy Histories

To benchmark against realistic traffic:

Option 1: Replay from proxy history

Prefer timing classification only (no writes to dedupe sets, no Swing updates) so the number reflects proxy-history throughput, not UI cost. The reference extension’s benchmark button calls evaluateCors() only—not considerRecording(), which is where seen.add() runs. The seen field exists for live traffic dedupe; the benchmark loop never touches it, the findings list, or the table.

// Benchmark: evaluateCors only (no seen.add(), no UI)
for (ProxyHttpRequestResponse item : api.proxy().history()) {
    HttpResponse resp = item.response();
    if (resp == null) continue;
    if (evaluateCors(item.request(), resp) != null) {
        positives++;
    }
}

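// Illustrative timing wrapper (same idea as the benchmark button)
public void benchmarkFromHistory() {
    long start = System.nanoTime();
    int processed = 0;
    int positives = 0;

    for (ProxyHttpRequestResponse item : api.proxy().history()) {
        HttpResponse resp = item.response();
        if (resp == null) continue;  // history can hold requests that never got a response
        processed++;
        if (evaluateCors(item.request(), resp) != null) {
            positives++;
        }
    }

    long elapsed = System.nanoTime() - start;
    // Divide by items processed (not positives), and guard the empty-history case
    double perItemMs = processed == 0 ? 0.0 : (double) elapsed / processed / 1_000_000;
    api.logging().logToOutput(String.format(
        "Processed %d items (%d positives) in %d ms (%.4f ms/item)",
        processed, positives, elapsed / 1_000_000, perItemMs
    ));
}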

Option 2: Synthetic load test

Generate synthetic dedupe keys to test set throughput without needing real traffic:

// Synthetic keys only: this exercises the dedupe set, not header parsing
// Note: 100 divides 1000, so the loop yields 1,000 unique keys and 99,000 duplicate adds
for (int i = 0; i < 100_000; i++) {
    String key = "host" + (i % 1000) + "|/path" + (i % 100) + "|CORS Wildcard";
    seen.add(key);  // Benchmark dedupe set performance
}

Option 3: Profile during real browsing

Add timing to your handler and log statistics periodically:

private final AtomicLong totalTime = new AtomicLong();
private final AtomicInteger callCount = new AtomicInteger();

@Override
public ProxyResponseReceivedAction handleResponseReceived(InterceptedResponse response) {
    long start = System.nanoTime();

    // Your analysis...

    totalTime.addAndGet(System.nanoTime() - start);
    int count = callCount.incrementAndGet();

    if (count % 1000 == 0) {
        api.logging().logToOutput(String.format(
            "Avg processing time: %.3f ms (%d calls)",
            (double) totalTime.get() / count / 1_000_000, count
        ));
    }

    return ProxyResponseReceivedAction.continueWith(response);
}

Performance Characteristics of This Extension

For the CORS auditor, the hot path is:

Scope check — isInScope() call, O(1) against scope rules
Header lookup — headerValue() call, O(n) where n is header count (typically <50)
String comparison — Comparing ACAO value, O(1)
Set insertion — add() on the ConcurrentHashMap.newKeySet() set, O(1) amortized

Total: Sub-millisecond for typical responses. No regex, no body parsing, no network calls.

Memory: The dedupe set stores one string key per unique finding. At 100 bytes per key and 10,000 unique findings, that's ~1MB. The findings list stores one object per finding with truncated fields. Negligible for typical assessments.

What tutorials get wrong here: They don't mention performance at all, or they over-engineer with complex profiling frameworks. Simple timing with System.nanoTime() and periodic logging is enough for most extensions.

Common Mistakes

Blocking the callback thread with regex on large bodies, JSON parsing, or network calls. If it might be slow, queue it to a background thread.
Capturing too much and leaking credentials into exports. Store the minimum needed to understand the finding.
No dedupe turning 10 issues into 500 findings. Users will disable your extension.
Overclaiming severity on findings that need manual verification. "CORS Wildcard detected" is accurate. "Critical vulnerability allowing account takeover" is not (without verification).
Building UI first instead of getting the analysis right. UI is polish. Correct detection is the product.
Ignoring scope. Analyzing every response regardless of scope wastes cycles and floods output with irrelevant findings.
Using synchronized everywhere. It works but creates contention. Use concurrent collections designed for your access pattern.

Conclusion

The practical parts of passive extension development: thread-safe state (ConcurrentHashMap.newKeySet()), sensible dedupe keys (host + path + issue), minimal evidence capture (no bodies, no credentials), clean exports (CSV/JSON with proper escaping), and performance awareness (sub-millisecond handlers, simple benchmarking).

Get these right and the extension works. Everything else—UI, persistence, multi-rule frameworks—is optional complexity. Add features when you hit real limitations, not because tutorials show elaborate examples.

Mark Hammond