Introduction

mitm2openapi converts mitmproxy flow dumps and HAR files into OpenAPI 3.0 specifications. It ships as a single static binary — no Python, no virtual environment, no runtime dependencies.

It is a Rust rewrite of mitmproxy2swagger by @alufers, who pioneered the "capture traffic, extract API spec" workflow. Credit to the original project for the idea and reference implementation.

Why?

The Python original works well but requires Python, pip, and mitmproxy installed in the environment. For CI pipelines, slim Docker images, security audits, and one-off usage, that dependency chain is friction.

mitm2openapi ships as a single ~5 MB static binary. Drop it into any environment and run. Same OpenAPI 3.0 output, plus first-class HAR support and glob-based filters for fully unattended pipelines.

Features

  • Fast — pure Rust, ~17× faster than the Python original (benchmarks)
  • Single static binary — no Python, no venv, no pip, no runtime dependencies
  • Two-format support — mitmproxy flow dumps (v19/v20/v21) and HAR 1.2
  • Three-step workflow — discover finds endpoints, you curate, generate emits OpenAPI 3.0
  • Glob filters — --exclude-patterns and --include-patterns for automated pipelines
  • Error recovery — skips corrupt flows, continues processing
  • Auto-detection — heuristic format detection from file content
  • Resource limits — configurable caps prevent denial-of-service on untrusted input
  • Strict mode — treat warnings as errors for CI gates
  • Structured reports — --report outputs machine-readable JSON processing summaries
  • Battle-tested — integration tests against Swagger Petstore and OWASP crAPI
  • Cross-platform — Linux, macOS, Windows pre-built binaries

How it works

The tool uses a three-step workflow:

  1. Discover — scan captured traffic and list all observed API endpoints
  2. Curate — review the list and select which endpoints to include
  3. Generate — produce a clean OpenAPI 3.0 spec from the selected endpoints

This separates endpoint selection from spec generation, giving you full control over what ends up in the final spec.

Installation

From binary releases

Download a pre-built binary for your platform from GitHub Releases.

Binaries are available for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x86_64).

# Example: Linux x86_64 — replace <VERSION> with the release tag (e.g. v0.5.1)
curl -L "https://github.com/Arkptz/mitm2openapi/releases/download/<VERSION>/mitm2openapi-<VERSION>-x86_64-unknown-linux-gnu.tar.gz" \
  | tar xz
sudo mv mitm2openapi /usr/local/bin/

From source (via Cargo)

If you have a Rust toolchain installed:

cargo install --git https://github.com/Arkptz/mitm2openapi

Or from crates.io:

cargo install mitm2openapi

Verify installation

mitm2openapi --version

Shell completions

mitm2openapi uses clap for argument parsing. Shell completions are not yet bundled, but you can generate them for most shells via clap_complete if building from source.

Quick start

This walkthrough takes you from a traffic capture to a complete OpenAPI spec in under a minute.

Prerequisites

  • mitm2openapi installed (see installation)
  • A captured traffic file — either a mitmproxy .flow dump or a .har export from browser DevTools

If you do not have a capture yet, see capturing traffic for setup instructions.

Step 1: Discover endpoints

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com"

This scans every request in capture.flow that matches the prefix https://api.example.com and writes a templates file listing all observed URL paths.

Step 2: Curate the templates

Open templates.yaml. Each path is prefixed with ignore: by default:

x-path-templates:
- ignore:/api/users
- ignore:/api/users/{id}
- ignore:/api/products
- ignore:/static/bundle.js

Remove the ignore: prefix from paths you want in the final spec:

x-path-templates:
- /api/users
- /api/users/{id}
- /api/products
- ignore:/static/bundle.js

Paths still prefixed with ignore: are excluded from the generated spec.
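
When the curation rule is mechanical, the edit can be scripted instead of done by hand. A minimal sketch with sed, assuming the templates format shown above (the file contents here are illustrative):

```shell
# Sample templates file in the format produced by discover (illustrative).
cat > templates.yaml <<'EOF'
x-path-templates:
- ignore:/api/users
- ignore:/api/users/{id}
- ignore:/static/bundle.js
EOF

# Activate every /api/ path by stripping its ignore: prefix;
# non-API paths keep the prefix and stay excluded.
sed -i.bak 's|^- ignore:/api/|- /api/|' templates.yaml
cat templates.yaml
```

After the edit, /api/users and /api/users/{id} are active while /static/bundle.js remains ignored.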

Step 3: Generate the OpenAPI spec

mitm2openapi generate \
  -i capture.flow \
  -t templates.yaml \
  -o openapi.yaml \
  -p "https://api.example.com"

The resulting openapi.yaml contains a valid OpenAPI 3.0 spec with paths, methods, parameters, request bodies, and response schemas inferred from the captured traffic.

Skip the manual edit

If you already know which paths matter, use glob filters to automate curation:

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --exclude-patterns '/static/**,/images/**,*.css,*.js,*.svg' \
  --include-patterns '/api/**,/v2/**'

mitm2openapi generate \
  -i capture.flow \
  -t templates.yaml \
  -o openapi.yaml \
  -p "https://api.example.com"

Paths matching --include-patterns are auto-activated (no ignore: prefix). Paths matching --exclude-patterns are dropped entirely. Everything else still gets ignore: for manual review.

See filtering endpoints for the full glob syntax reference.

HAR files

The same workflow works with HAR files — just point -i at a .har file. The format is auto-detected:

mitm2openapi discover \
  -i capture.har \
  -o templates.yaml \
  -p "https://api.example.com"

See HAR files for details on exporting HARs from browser DevTools.

Capturing traffic

Before you can generate an OpenAPI spec, you need a captured traffic file. This chapter covers the most common ways to capture HTTP traffic.

mitmproxy is a free, open-source HTTPS proxy. It captures traffic in its own binary flow format that mitm2openapi reads natively.

Install mitmproxy

# macOS
brew install mitmproxy

# Linux (pip)
pip install mitmproxy

# Or download from https://mitmproxy.org/

See the mitmproxy installation docs for platform-specific instructions.

Capture with mitmdump

mitmdump is the non-interactive version of mitmproxy, ideal for scripted captures:

# Start the proxy and write all traffic to a flow file
mitmdump -w capture.flow

# In another terminal, route your HTTP client through the proxy:
curl --proxy http://localhost:8080 https://api.example.com/users

The default proxy port is 8080. Use -p to change it:

mitmdump -w capture.flow -p 9090

Capture with mitmweb

mitmweb provides a browser-based UI for inspecting traffic in real time:

mitmweb -w capture.flow
# Open http://localhost:8081 in your browser to inspect traffic

HTTPS traffic

For HTTPS, you need to install the mitmproxy CA certificate on the client machine. After starting mitmproxy, navigate to http://mitm.it from the proxied client to download and install the certificate.

See the mitmproxy certificate docs for detailed instructions.

Tips

  • Use mitmdump --set flow_detail=0 for minimal console output during long captures
  • Combine with --set save_stream_filter to capture only specific hosts
  • The flow format is versioned (v19/v20/v21) — mitm2openapi supports all three

Option 2: Browser DevTools (HAR export)

All modern browsers can export captured network traffic as HAR (HTTP Archive) files.

Chrome / Chromium

  1. Open DevTools (F12 or Ctrl+Shift+I)
  2. Switch to the Network tab
  3. Ensure recording is active (red circle icon)
  4. Perform the actions you want to capture
  5. Right-click in the request list → Save all as HAR with content

Firefox

  1. Open DevTools (F12)
  2. Switch to the Network tab
  3. Perform the actions you want to capture
  4. Click the gear icon → Save All As HAR

Safari

  1. Enable the Develop menu in Preferences → Advanced
  2. Open Web Inspector (Cmd+Option+I)
  3. Switch to the Network tab
  4. Perform the actions
  5. Click Export in the toolbar

Note

HAR files from browser DevTools contain the full request and response bodies. Sensitive data (cookies, tokens, passwords) will be present in the export. Sanitize before sharing.
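
As a sketch of such sanitization, the snippet below scrubs cookies and Authorization headers from a HAR before it is shared. It assumes python3 is available, uses an illustrative capture, and does not touch secrets embedded in bodies or URLs; extend the field list for your data:

```shell
# A minimal HAR as it might come out of DevTools (illustrative).
cat > capture.har <<'EOF'
{"log":{"version":"1.2","entries":[{"request":{"method":"GET",
"url":"https://api.example.com/users",
"cookies":[{"name":"session","value":"s3cret"}],
"headers":[{"name":"Authorization","value":"Bearer abc"},
           {"name":"Accept","value":"*/*"}]},
"response":{"status":200,"cookies":[],"headers":[],
"content":{"text":"[]"}}}]}}
EOF

# Drop cookies and auth-related headers from every entry.
python3 - <<'EOF'
import json
with open("capture.har") as f:
    har = json.load(f)
for entry in har["log"]["entries"]:
    for side in ("request", "response"):
        msg = entry[side]
        msg["cookies"] = []
        msg["headers"] = [h for h in msg["headers"]
                          if h["name"].lower() not in
                          ("authorization", "cookie", "set-cookie")]
with open("capture.sanitized.har", "w") as f:
    json.dump(har, f, indent=2)
EOF
```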

Option 3: Other HTTP proxies

Any tool that produces HAR 1.2 output works with mitm2openapi:

  • Charles Proxy — export sessions as HAR via File → Export
  • Fiddler — File → Export Sessions → HTTPArchive
  • Proxyman — export as HAR from the session menu

What to capture

For the best OpenAPI spec, capture diverse traffic:

  • Multiple endpoints — the more paths covered, the more complete the spec
  • Different HTTP methods — GET, POST, PUT, DELETE on the same resource
  • Various response codes — 200, 400, 404, 500 responses produce richer schemas
  • Query parameters — include requests with different query strings
  • Request bodies — POST/PUT with different payloads improve body schema inference

Next steps

Once you have a capture file, proceed to the quick start or learn about the full discover → curate → generate pipeline.

Discover, curate, generate

mitm2openapi uses a three-step pipeline to convert captured HTTP traffic into an OpenAPI specification. This chapter explains each step in detail.

Overview

graph LR
    A[Traffic capture] --> B[discover]
    B --> C[Templates file]
    C --> D[Curate]
    D --> E[generate]
    E --> F[OpenAPI 3.0 spec]

The pipeline separates endpoint discovery from spec generation, giving you an explicit curation step where you choose which endpoints appear in the final spec.

Step 1: Discover

The discover command scans a traffic capture and extracts all unique URL paths that match a given prefix.

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com"

What happens internally

  1. The input file is read incrementally (streaming — memory usage stays bounded)
  2. Each request's URL is checked against the --prefix filter
  3. Matching paths are collected and deduplicated
  4. Path segments that look like IDs (UUIDs, numeric strings) are replaced with {id} placeholders (or {id1}, {id2}, ... when a path has multiple parameters)
  5. The result is written to the templates file

Templates file format

The output is a YAML file with path templates under an x-path-templates key:

x-path-templates:
- ignore:/api/users
- ignore:/api/users/{id}
- ignore:/api/products
- ignore:/api/products/{id}/reviews
- ignore:/static/bundle.js

Every path is prefixed with ignore: by default. This is intentional — it forces you to explicitly opt in to each endpoint.

Automatic parameterization

The discover step detects path segments that vary across requests and replaces them with named parameters:

Observed paths | Template
/api/users/42, /api/users/99 | /api/users/{id}
/api/orders/abc-def-123 | /api/orders/{id}

UUID-like and numeric segments are detected automatically. More complex patterns require manual editing of the templates file.
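
As a rough shell approximation of that heuristic (a sketch, not the tool's actual rules), numeric and UUID-shaped segments can be rewritten like this:

```shell
# Replace purely numeric and 36-character UUID-shaped path segments
# with an {id} placeholder (simplified regexes).
normalize() {
  echo "$1" \
    | sed -E 's#/[0-9]+(/|$)#/{id}\1#g' \
    | sed -E 's#/[0-9a-fA-F-]{36}(/|$)#/{id}\1#g'
}
normalize /api/users/42   # -> /api/users/{id}
normalize /api/orders/3f8a1c2e-9d4b-4b6e-8a2f-1c9d4b6e8a2f   # -> /api/orders/{id}
```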

Step 2: Curate

Open the templates file in any text editor. For each path:

  • Remove ignore: to include the endpoint in the generated spec
  • Leave ignore: to exclude it
  • Delete the line to exclude it permanently

# Before curation
x-path-templates:
- ignore:/api/users
- ignore:/api/users/{id}
- ignore:/static/bundle.js

# After curation
x-path-templates:
- /api/users
- /api/users/{id}
- ignore:/static/bundle.js

You can also edit parameter names. The default {id} placeholder can be renamed to something more descriptive like {userId}:

- /api/users/{userId}

Automating curation with glob filters

For CI pipelines or large captures, manual curation is impractical. Use --include-patterns and --exclude-patterns during the discover step instead:

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --include-patterns '/api/**' \
  --exclude-patterns '/static/**,*.css,*.js'

Paths matching --include-patterns are emitted without the ignore: prefix (auto-activated). Paths matching --exclude-patterns are dropped entirely. Everything else gets ignore: for manual review.

See filtering endpoints for the full glob syntax.

Step 3: Generate

The generate command re-reads the traffic capture and produces an OpenAPI spec using the curated templates as a guide:

mitm2openapi generate \
  -i capture.flow \
  -t templates.yaml \
  -o openapi.yaml \
  -p "https://api.example.com"

What happens internally

  1. The templates file is loaded and the ignore: entries are filtered out
  2. Each template path is compiled into a regex for matching
  3. The traffic capture is streamed again, matching each request against the templates
  4. For each matched request:
    • Path parameters are extracted
    • Query parameters are collected
    • Request body schema is inferred (JSON, form data)
    • Response status code and body schema are recorded
  5. When multiple requests match the same template, their schemas are merged:
    • Different status codes (200, 400, 404) produce separate response entries
    • Request body is taken from the first observation; subsequent same-endpoint observations only contribute response schemas
  6. The final OpenAPI 3.0 document is written as YAML

Customizing output

The generate command accepts several options to tune the output:

mitm2openapi generate \
  -i capture.flow \
  -t templates.yaml \
  -o openapi.yaml \
  -p "https://api.example.com" \
  --openapi-title "My API" \
  --openapi-version "2.0.0" \
  --exclude-headers "authorization,cookie" \
  --ignore-images

See the CLI reference for all available options.

Worked example

Starting from a mitmproxy capture of a pet store API:

# Discover all endpoints under the API prefix
mitm2openapi discover \
  -i petstore.flow \
  -o templates.yaml \
  -p "http://petstore:8080" \
  --exclude-patterns '/static/**' \
  --include-patterns '/api/**'

# Templates file now has API paths auto-activated:
#   - /api/v3/pet
#   - /api/v3/pet/{id}
#   - /api/v3/pet/findByStatus
#   - /api/v3/store/inventory
#   - ignore:/static/swagger-ui.css

# Generate the spec
mitm2openapi generate \
  -i petstore.flow \
  -t templates.yaml \
  -o openapi.yaml \
  -p "http://petstore:8080"

# Result: openapi.yaml with paths, methods, schemas

The generated openapi.yaml is a valid OpenAPI 3.0 document that can be opened in Swagger UI, imported into Postman, or used as a contract for API testing.

Filtering endpoints

The discover command supports glob-based filters to automate endpoint curation. This is useful for CI pipelines or large captures where manual editing is impractical.

Glob syntax

Filters use git-style glob patterns (powered by the globset crate):

Pattern | Matches | Does not match
* | Single path segment | Segments with /
** | Any number of path segments | (matches everything)
? | Any single character |
[abc] | Character class |
{a,b} | Alternation |

--exclude-patterns

Paths matching any exclude glob are dropped entirely — they do not appear in the templates file at all.

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --exclude-patterns '/static/**,/images/**,*.css,*.js,*.svg,*.png'

Multiple patterns are comma-separated. A path is excluded if it matches any pattern.

--include-patterns

Paths matching any include glob are emitted without the ignore: prefix — they are auto-activated for the generate step.

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --include-patterns '/api/**,/v2/**'

Combining filters

When both are specified:

  1. Exclude runs first — matching paths are dropped entirely
  2. Include runs second — matching paths among the survivors are auto-activated
  3. Everything else gets the ignore: prefix for manual review

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --exclude-patterns '/static/**,*.css,*.js' \
  --include-patterns '/api/**'

Result:

  • /static/bundle.js — excluded (dropped)
  • /api/users — included (auto-activated)
  • /dashboard — neither matched (gets ignore: prefix)

Examples

API-only spec

--include-patterns '/api/**' \
--exclude-patterns '/api/internal/**,/api/debug/**'

Strip static assets

--exclude-patterns '/static/**,/assets/**,*.css,*.js,*.svg,*.png,*.jpg,*.gif,*.ico,*.woff,*.woff2'

Multiple API versions

--include-patterns '/v1/**,/v2/**,/v3/**'

Pattern tips

  • Patterns match against the URL path only (after the prefix is stripped), not the full URL
  • Leading / is recommended for clarity but not required
  • Patterns are case-sensitive
  • Use ** sparingly — it matches everything, including deeply nested paths

Resource limits

To prevent denial-of-service when processing untrusted captures, mitm2openapi enforces several configurable and fixed limits.

Configurable limits

These limits can be adjusted via CLI flags:

Flag | Default | Purpose
--max-input-size | 2 GiB | Reject files larger than this before reading
--max-payload-size | 256 MiB | Cap on individual tnetstring payload allocation
--max-depth | 256 | Recursion depth limit for nested tnetstring structures
--max-body-size | 64 MiB | Maximum request/response body considered during schema inference
--allow-symlinks | off | By default, symlinked inputs are rejected

Adjusting limits

Increase --max-input-size if you work with captures larger than 2 GiB:

mitm2openapi discover \
  -i large-capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --max-input-size 8GiB

Size suffixes are supported: KiB, MiB, GiB.

The other limits rarely need tuning. The defaults are designed to handle real-world captures while rejecting pathological inputs.

By default, symlinked input files are rejected to prevent path-traversal attacks on shared CI runners. If you need to process a symlinked file:

mitm2openapi discover \
  -i /path/to/symlinked-capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --allow-symlinks

Fixed per-field limits

These limits are applied unconditionally and cannot be changed via CLI flags:

Field | Cap | Behaviour when exceeded
Header name | 8 KiB | Header dropped (other headers still processed)
Header value | 64 KiB | Value truncated to cap
Form fields per request | 1,000 | Excess fields ignored
URL scheme | http / https only | Non-HTTP flows silently skipped
Port number | 1-65,535 | Out-of-range port drops the request
HTTP status code | 100-599 | Invalid codes treated as no response

UTF-8 validation

Identity fields (scheme, host, path, method, header names) require valid UTF-8. Flows with non-UTF-8 identity bytes are skipped to prevent data aliasing through replacement-character collisions.

Control characters (0x00-0x1F, 0x7F) in paths are stripped automatically.
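
The effect is the same as deleting the control range with tr (illustrative only):

```shell
# Strip ASCII control characters (0x00-0x1F and 0x7F) from a path.
printf '/api/\tusers\r\n' | tr -d '\000-\037\177'
# -> /api/users
```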

Streaming and memory

Both mitmproxy flow files and HAR files are processed incrementally. Memory usage stays bounded regardless of input size — there is no need to load the entire capture into memory.

Peak RSS is proportional to the size of the largest single flow in the capture, not the total file size. For typical captures, expect 5-15 MB of memory usage.

When limits fire

When a per-field limit is exceeded (header too large, body too large, form fields over cap), the affected field is skipped or truncated and processing continues with the remaining data.

When a tnetstring parse error occurs, the iterator halts and the rest of the file is not processed — valid flows parsed before the error are still emitted. There is no resync because binary payloads can contain bytes that mimic valid length prefixes.

In both cases a warn-level log message is emitted with details.

Use strict mode to treat these warnings as errors, or processing reports to capture them as structured data.

Strict mode

Pass --strict to either discover or generate to treat warning-level events as hard failures. The process exits with code 2 if the processing report records any counted events.

Currently, the only event counter populated at runtime is parse_error — triggered when flows cannot be deserialized (corrupt tnetstring data, malformed HAR JSON). The cap_fired and rejected counters exist in the report schema but are not yet wired to the reader pipelines; they will be connected in a future release.

In practice, --strict today catches:

  • Parse errors during flow deserialization (tnetstring or HAR)
  • Errors counted by the streaming iterator wrapper in discover mode

Usage

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --strict
mitm2openapi generate \
  -i capture.flow \
  -t templates.yaml \
  -o openapi.yaml \
  -p "https://api.example.com" \
  --strict

CI usage pattern

Strict mode is designed for CI gates where silent degradation is unacceptable:

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --strict \
  || { echo "FAIL: corrupt or over-limit flows detected"; exit 1; }

Without --strict

Without the flag, parse errors are logged at warn level and processing continues with exit code 0. Affected flows are skipped, but the output file is still produced. Other warning-level events (cap fires, scheme rejections, etc.) are always logged but do not currently increment the report counters that --strict checks.

Exit codes

Code | Meaning
0 | Success (no warnings, or --strict not set)
1 | Fatal error (I/O failure, missing required arguments)
2 | Strict mode violation (warnings detected with --strict)
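
A CI script can branch on these codes. The stub below is a hypothetical stand-in for a real mitm2openapi invocation so the control flow runs anywhere; swap in the actual command:

```shell
# Hypothetical stub standing in for `mitm2openapi generate ... --strict`;
# returning 2 simulates a strict-mode violation.
mitm2openapi_stub() { return 2; }

rc=0
mitm2openapi_stub || rc=$?
case $rc in
  0) echo "spec generated cleanly" ;;
  1) echo "fatal error" >&2 ;;
  2) echo "strict violation: inspect the report" >&2 ;;
esac
```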

Combining with reports

For CI pipelines that need both strict enforcement and structured diagnostics:

mitm2openapi generate \
  -i capture.flow \
  -t templates.yaml \
  -o openapi.yaml \
  -p "https://api.example.com" \
  --strict \
  --report report.json

The report is written even when --strict causes a non-zero exit, capturing the full details of what went wrong.

Processing reports

Pass --report <PATH> to either discover or generate to write a JSON processing summary. This is useful for CI pipelines that need structured data instead of log scraping.

Usage

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --report report.json

Report schema

{
  "report_version": 1,
  "tool_version": "0.5.1",
  "input": {
    "path": "capture.flow",
    "format": "Auto",
    "size_bytes": 102400
  },
  "result": {
    "flows_read": 150,
    "flows_emitted": 148,
    "paths_in_spec": 12
  },
  "events": {
    "parse_error": {
      "TNetString parse error at byte 98304: unexpected end of input": 1
    }
  }
}

Fields

Field | Type | Description
report_version | integer | Schema version (currently 1)
tool_version | string | mitm2openapi version that produced the report
input.path | string | Input file path
input.format | string | Detected or specified format (Auto, Mitmproxy, Har)
input.size_bytes | integer | Input file size in bytes
result.flows_read | integer | Total flows/entries parsed from input
result.flows_emitted | integer | Flows that passed all filters and were processed
result.paths_in_spec | integer | Unique paths in the output (for generate)
events | object | Map of event categories to message counts
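
For example, the difference between flows_read and flows_emitted gives the number of skipped flows. A sketch using python3 against an illustrative report:

```shell
# Sample report matching the schema above (illustrative data).
cat > report.json <<'EOF'
{"report_version":1,"tool_version":"0.5.1",
 "input":{"path":"capture.flow","format":"Auto","size_bytes":102400},
 "result":{"flows_read":150,"flows_emitted":148,"paths_in_spec":12},
 "events":{"parse_error":{"unexpected end of input":1}}}
EOF

# Flows read minus flows emitted = flows skipped by filters or errors.
python3 - <<'EOF'
import json
r = json.load(open("report.json"))["result"]
print(f"skipped flows: {r['flows_read'] - r['flows_emitted']}")
EOF
# -> skipped flows: 2
```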

Event categories

Category | Meaning | Status
parse_error | Corrupt data encountered (tnetstring errors, malformed HAR entries) | Populated
cap_fired | A resource limit was triggered (body too large, depth exceeded) | Reserved — not yet populated at runtime
rejected | A flow was skipped (invalid UTF-8, unsupported scheme, bad port/status) | Reserved — not yet populated at runtime

The cap_fired and rejected categories are present in the report schema and will be connected to the reader pipelines in a future release. Currently, only parse_error events are counted.

CI integration

Parse the report in CI to make decisions based on processing quality:

mitm2openapi generate \
  -i capture.flow \
  -t templates.yaml \
  -o openapi.yaml \
  -p "https://api.example.com" \
  --report report.json

# Check if any events occurred
if jq -e '.events | length > 0' report.json > /dev/null 2>&1; then
  echo "Warning: processing had events"
  jq '.events' report.json
fi

Report with strict mode

The report is written even when --strict causes a non-zero exit code. This lets you capture full diagnostics while still failing the CI job:

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --strict \
  --report report.json \
  || { jq '.' report.json; exit 1; }

CLI reference

Warning

This reference was last synced with mitm2openapi --help at version 0.5.1. If you notice a flag missing from your local --help output, the tool may be ahead of these docs. Open an issue to prompt an update.

mitm2openapi discover

Scan captured traffic and produce a templates file listing all observed endpoints.

mitm2openapi discover [OPTIONS] -i <INPUT> -o <OUTPUT> -p <PREFIX>

Required arguments

Option | Description
-i, --input <PATH> | Input file (flow dump or HAR)
-o, --output <PATH> | Output YAML templates file
-p, --prefix <URL> | API prefix URL to filter requests

Optional arguments

Option | Default | Description
--format <FORMAT> | auto | Input format: auto, har, mitmproxy
--exclude-patterns <GLOBS> | | Comma-separated globs; matching paths are dropped entirely
--include-patterns <GLOBS> | | Comma-separated globs; matching paths are auto-activated
--max-input-size <BYTES> | 2GiB | Maximum input file size. Accepts KiB, MiB, GiB suffixes
--allow-symlinks | off | Allow symlinked input files
--strict | off | Treat warnings as errors (exit code 2)
--report <PATH> | | Write structured JSON processing report

mitm2openapi generate

Generate an OpenAPI 3.0 spec from captured traffic using a curated templates file.

mitm2openapi generate [OPTIONS] -i <INPUT> -t <TEMPLATES> -o <OUTPUT> -p <PREFIX>

Required arguments

Option | Description
-i, --input <PATH> | Input file (flow dump or HAR)
-t, --templates <PATH> | Templates YAML file (from discover)
-o, --output <PATH> | Output OpenAPI YAML file
-p, --prefix <URL> | API prefix URL

Optional arguments

Option | Default | Description
--format <FORMAT> | auto | Input format: auto, har, mitmproxy
--openapi-title <TITLE> | | Custom title for the spec
--openapi-version <VER> | 1.0.0 | Custom spec version
--exclude-headers <LIST> | | Comma-separated headers to exclude from spec
--exclude-cookies <LIST> | | Comma-separated cookies to exclude from spec
--include-headers | off | Include request headers in the spec
--ignore-images | off | Ignore image content types
--suppress-params | off | Suppress parameter suggestions
--tags-overrides <JSON> | | JSON string for tag overrides
--max-input-size <BYTES> | 2GiB | Maximum input file size
--max-payload-size <BYTES> | 256MiB | Maximum tnetstring payload size
--max-depth <N> | 256 | Maximum tnetstring nesting depth
--max-body-size <BYTES> | 64MiB | Maximum request/response body size
--allow-symlinks | off | Allow symlinked input files
--strict | off | Treat warnings as errors (exit code 2)
--report <PATH> | | Write structured JSON processing report

Common flag details

--format

By default, the input format is auto-detected from a combination of file extension and content sniffing:

  • .flow extension or content starting with a tnetstring length prefix → mitmproxy format
  • .har extension or content starting with { → HAR format

Use --format mitmproxy or --format har to override auto-detection.

--prefix

The prefix URL filters which requests are processed. Only requests whose URL starts with the prefix are included. The prefix is stripped from paths in the generated spec.

Example: with --prefix https://api.example.com, a request to https://api.example.com/users/42 produces path /users/42 in the spec.
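
The same stripping can be mimicked with shell parameter expansion:

```shell
# Strip the API prefix from a captured URL, leaving the spec path.
prefix="https://api.example.com"
url="https://api.example.com/users/42"
echo "${url#"$prefix"}"   # -> /users/42
```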

--strict

See strict mode for details on exit codes and CI usage.

--report

See processing reports for the JSON schema and CI integration examples.

Exit codes

Code | Meaning
0 | Success
1 | Fatal error (I/O failure, missing arguments, invalid input)
2 | Strict mode violation (warnings with --strict enabled)

Environment variables

Variable | Description
RUST_LOG | Controls log verbosity. Default: warn. Set to info or debug for more output.

RUST_LOG=info mitm2openapi discover -i capture.flow -o templates.yaml -p "https://api.example.com"

mitmproxy flow dumps

mitm2openapi reads mitmproxy's native binary flow format. This is the recommended input format — it captures the richest data and is produced directly by mitmdump and mitmweb.

Supported versions

Flow format version | mitmproxy version | Status
v19 | mitmproxy 8.x | Supported
v20 | mitmproxy 9.x | Supported
v21 | mitmproxy 10.x | Supported

The flow format is auto-detected from file content. No version flag is needed.

How flow files work

Flow files use the tnetstring serialization format. Each flow is a sequence of key-value pairs representing a complete HTTP request-response cycle.

A typical flow contains:

  • Request: method, URL (scheme, host, port, path), headers, body
  • Response: status code, headers, body
  • Metadata: timestamps, flow ID, client/server addresses

mitm2openapi extracts the request and response data relevant to OpenAPI spec generation and discards metadata.
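
To illustrate the length-prefixed layout, here is a toy decoder for scalar tnetstring values (assuming python3 is available). The real parser also handles dicts, lists, booleans, and null, and enforces the resource caps described elsewhere in these docs:

```shell
# Toy tnetstring scalar decoder: values look like "<length>:<payload><type>",
# where type "," marks a string and "#" an integer.
python3 - <<'EOF'
def decode(data: bytes):
    length, rest = data.split(b":", 1)
    n = int(length)
    payload, tag = rest[:n], rest[n:n + 1]
    if tag == b",":
        return payload.decode()
    if tag == b"#":
        return int(payload)
    raise ValueError(f"unsupported type tag: {tag!r}")

print(decode(b"5:hello,"))   # hello
print(decode(b"3:420#"))     # 420
EOF
```

The length prefix is why a corrupt byte is unrecoverable: once a bogus length is read, every subsequent offset is wrong, and arbitrary payload bytes can mimic further valid prefixes.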

Capturing flow files

# Record all traffic through the proxy
mitmdump -w capture.flow

# Record only traffic to a specific host
mitmdump -w capture.flow --set flow_detail=0 \
  --set save_stream_filter='~d api.example.com'

See capturing traffic for full setup instructions.

Directory input

If you pass a directory path to -i, mitm2openapi reads all .flow files in that directory (non-recursive). This is useful when you have traffic split across multiple capture sessions.

Known limitations

  • No WebSocket frames — WebSocket upgrade requests are captured, but frame-level data is not used for spec generation
  • No gRPC — binary protocol buffers inside HTTP/2 frames are not decoded
  • Corrupt files — when the tnetstring parser hits corruption, it stops and reports the byte offset. No resync is attempted because binary payloads can contain bytes that mimic valid tnetstring length prefixes. See diagnostics for details.
  • Large payloads — individual tnetstring payloads are capped at 256 MiB by default (adjustable via --max-payload-size)

HAR files

mitm2openapi reads HAR (HTTP Archive) files — the standard format for exporting browser network traffic. HAR version 1.2 is supported.

Producing HAR files

Browser DevTools

All modern browsers export HAR from their Network tab:

  • Chrome/Chromium: DevTools → Network → right-click → "Save all as HAR with content"
  • Firefox: DevTools → Network → gear icon → "Save All As HAR"
  • Safari: Web Inspector → Network → Export button

HTTP proxies

Several proxy tools export HAR, including Charles Proxy, Fiddler, and Proxyman; see capturing traffic for the export steps.

Programmatic generation

Libraries like puppeteer and playwright can produce HAR files from automated browser sessions:

// Playwright example
const context = await browser.newContext({
  recordHar: { path: 'capture.har' }
});
// ... run your test
await context.close(); // HAR is written on close

Usage

mitm2openapi discover \
  -i capture.har \
  -o templates.yaml \
  -p "https://api.example.com"

Format is auto-detected. Use --format har to force HAR parsing if auto-detection fails.

HAR vs mitmproxy flows

Aspect | mitmproxy flow | HAR
Source | mitmproxy proxy | Browser DevTools, HTTP proxies
Format | Binary (tnetstring) | JSON
Response bodies | Always present | Sometimes base64-encoded
HTTPS | Decrypted by proxy | Decrypted by browser
File size | Compact binary | Larger (JSON overhead)
Streaming | Native | Incremental JSON parsing

Both formats produce equivalent OpenAPI specs. Choose based on your capture workflow:

  • mitmproxy flows for server-side proxying, CI pipelines, and automated captures
  • HAR files for browser-based testing, manual exploration, and when you already have DevTools open

Incremental parsing

HAR files are parsed incrementally — the entire JSON is not loaded into memory at once. This means memory usage stays bounded even for large HAR exports (hundreds of megabytes).

Known limitations

  • Base64-encoded bodies — some HAR exporters base64-encode response bodies. Decode failures are logged as warnings and the body is skipped (not silently dropped).
  • Compressed content — if the HAR exporter did not decompress response bodies, mitm2openapi sees the compressed bytes. Most browser DevTools decompress automatically.
  • Timing data — HAR timing information (DNS, connect, TLS) is ignored; only request and response data is used for spec generation.

Performance & Benchmarks

Results are regenerated weekly by the benchmark workflow. See the workflow for the reproducible methodology.

Benchmark results

Run: 2026-04-22 22:31 UTC, commit 22ef2faa, runner: Linux 6.17.0-1011-azure

Fixture: 89 MB, 40k requests across 8 endpoint shapes (bench-fixtures-v1).

Timing

| Command | Mean (s) | Min (s) | Max (s) | Relative |
|---|---|---|---|---|
| Python mitmproxy2swagger | 44.757 ± 0.219 | 44.384 | 44.965 | 16.80 ± 0.26 |
| Rust mitm2openapi | 2.663 ± 0.039 | 2.618 | 2.712 | 1.00 |

Peak RSS

| Tool | Peak RSS |
|---|---|
| Python mitmproxy2swagger | 46104 KB |
| Rust mitm2openapi | 6168 KB |

Security model

mitm2openapi processes untrusted binary input (traffic captures from unknown sources). The security model is designed to prevent denial-of-service, data corruption, and information leakage when handling adversarial input.

Threat model

The primary threat is a malicious capture file — a .flow or .har file crafted to exploit the parser. Scenarios include:

  • CI pipelines processing captures from untrusted contributors
  • Shared analysis servers where multiple users submit captures
  • Automated pipelines where the capture source is not fully controlled

Input validation layers

File-level checks

Before reading any content:

  1. File type — only regular files are accepted. Symlinks, FIFOs, device files, and directories are rejected unless --allow-symlinks is explicitly set.
  2. File size — files exceeding --max-input-size (default 2 GiB) are rejected before any bytes are read.
  3. TOCTOU caveat — file metadata is checked via the path before reading to reject symlinks, non-regular files, and oversized inputs. There is a small TOCTOU window between the metadata check and the file open; mitigation via fd-based recheck after open is a future enhancement.

Parser-level caps

During parsing:

| Cap | Default | Purpose |
|---|---|---|
| Payload size | 256 MiB | Prevents OOM from oversized tnetstring values |
| Nesting depth | 256 | Prevents stack overflow from deeply nested structures |
| JSON depth | 64 | Prevents stack overflow in schema inference |
| Body size | 64 MiB | Limits memory for individual request/response bodies |

These caps trigger warn-level events and skip the affected data. Use --strict to treat them as hard errors.

Field-level validation

For every flow:

  • Scheme whitelist — only http and https are accepted. Other schemes (e.g., javascript:, data:) are silently skipped.
  • UTF-8 strictness — identity fields (method, scheme, host, path, header names) must be valid UTF-8. Invalid bytes cause the flow to be skipped, preventing data aliasing through replacement-character collisions.
  • Port range — port numbers must be 1–65,535. Out-of-range values drop the request.
  • Status code range — HTTP status codes must be 100–599.
  • Control character stripping — bytes 0x00–0x1F and 0x7F in URL paths are removed.
  • Header caps — header names over 8 KiB are dropped; values over 64 KiB are truncated.
  • Form field count — at most 1,000 form fields per request are processed.
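
The checks above can be sketched in a few lines. This is an illustrative sketch only; the function names and signatures are hypothetical, not mitm2openapi's internal API:

```rust
// Illustrative sketch of the field-level checks described above. Names
// and signatures are hypothetical, not the tool's actual API.
fn is_valid_flow(scheme: &str, port: u32, status: u32) -> bool {
    let scheme_ok = matches!(scheme, "http" | "https"); // scheme whitelist
    let port_ok = (1..=65_535).contains(&port); // port range
    let status_ok = (100..=599).contains(&status); // status code range
    scheme_ok && port_ok && status_ok
}

// Strip 0x00-0x1F and 0x7F from a URL path.
fn strip_control_chars(path: &str) -> String {
    path.chars().filter(|c| !c.is_ascii_control()).collect()
}
```

Note that a failed check skips the whole flow rather than repairing it, which is what prevents data aliasing.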

Output safety

  • Atomic writes — output files are written via a temporary file and renamed. If the write fails (disk full, permission denied), the target path is left untouched.
  • No resync on corruption — when the tnetstring parser encounters corrupt data, it halts immediately. It does not scan forward looking for the next valid frame, because binary payloads can contain bytes that look like valid length prefixes.

Streaming architecture

Both mitmproxy and HAR inputs are processed incrementally. At no point is the entire capture loaded into memory. This bounds peak RSS to the size of the largest single flow, regardless of total file size.

Glob pattern safety

The --exclude-patterns and --include-patterns flags use the globset crate, which compiles patterns into a DFA. This eliminates exponential backtracking that was possible with the original recursive glob matcher.

Recommendations

For processing untrusted captures:

  1. Do not use --allow-symlinks unless you control the filesystem
  2. Keep --max-input-size at the default (2 GiB) or lower
  3. Run with --strict to fail fast on any anomaly
  4. Use --report to capture processing diagnostics for audit trails
  5. Run in a sandboxed environment (container, VM) when processing captures from unknown sources

Diagnostics

mitm2openapi uses structured logging to report issues during processing. This chapter covers how to interpret warnings, errors, and the structured report output.

Log levels

Control verbosity with the RUST_LOG environment variable:

# Default: warnings only
mitm2openapi discover -i capture.flow -o templates.yaml -p "https://api.example.com"

# More detail
RUST_LOG=info mitm2openapi discover -i capture.flow -o templates.yaml -p "https://api.example.com"

# Full debug output
RUST_LOG=debug mitm2openapi discover -i capture.flow -o templates.yaml -p "https://api.example.com"

Common warnings

Parse errors (tnetstring)

WARN TNetString parse error at byte 98304: unexpected end of input (148 flows parsed successfully)

This means the mitmproxy flow file contains corrupt data starting at byte 98,304. The parser halts immediately and the remaining bytes in the file are not processed. The 148 flows parsed before the corruption are still emitted.

No resync is attempted. Binary payloads can contain bytes that mimic valid tnetstring length prefixes, so scanning forward would produce phantom flows with fabricated data.
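
The halt-and-report behavior can be illustrated with a toy length-prefixed frame reader. This is a deliberate simplification of tnetstrings, not the actual parser:

```rust
// Toy frame reader: frames look like "LEN:PAYLOAD," (e.g. "5:hello,").
// On the first malformed byte the reader stops and reports the offset
// instead of scanning forward for the next plausible frame.
fn read_frames(input: &[u8]) -> (Vec<Vec<u8>>, Option<usize>) {
    let mut frames = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let start = pos;
        // Parse the decimal length prefix.
        let mut len: usize = 0;
        while pos < input.len() && input[pos].is_ascii_digit() {
            len = len * 10 + usize::from(input[pos] - b'0');
            pos += 1;
        }
        if pos == start || pos >= input.len() || input[pos] != b':' {
            return (frames, Some(start)); // corrupt prefix: halt here
        }
        pos += 1; // skip ':'
        if pos + len >= input.len() || input[pos + len] != b',' {
            return (frames, Some(start)); // truncated frame: halt here
        }
        frames.push(input[pos..pos + len].to_vec());
        pos += len + 1; // skip payload and trailing ','
    }
    (frames, None)
}
```

Feeding it `"5:hello,XYZ"` yields one frame and a halt offset of 8, mirroring the warning format above: flows parsed so far are kept, and the byte offset of the corruption is reported.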

What to do:

  • If the file was truncated during transfer, re-capture or re-download
  • The 148 successfully parsed flows are still usable
  • Use --report to capture the exact byte offset for debugging

Cap-fired events

WARN body size 68157440 exceeds cap 67108864, truncating
WARN header name exceeds 8192 bytes, dropping
WARN form field count 1247 exceeds cap 1000, ignoring excess

These indicate that a specific field in a flow exceeded the built-in or configured limit. The affected field is truncated or dropped, but processing continues.

What to do:

  • Usually safe to ignore — the caps exist to prevent abuse, not normal traffic
  • If you need the full data, increase the relevant --max-* flag
  • Use --strict to fail on these if you need guaranteed completeness

Flow rejection events

WARN skipping flow: scheme "javascript" not in whitelist [http, https]
WARN skipping flow: invalid UTF-8 in host field
WARN skipping flow: port 0 out of valid range 1-65535

These mean an entire flow was skipped because it failed validation.

What to do:

  • Non-HTTP flows (WebSocket upgrades, CONNECT tunnels) are expected to be skipped
  • UTF-8 errors suggest the capture contains binary protocol data, not HTTP traffic
  • Invalid port/status usually indicates corrupt flow data

Structured reports

For machine-readable diagnostics, use --report:

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --report report.json

See processing reports for the full JSON schema.

Event categories in reports

| Category | Examples |
|---|---|
| parse_error | Tnetstring corruption, HAR JSON syntax errors |
| cap_fired | Body too large, depth exceeded, form field count exceeded |
| rejected | Invalid scheme, non-UTF-8 identity fields, bad port/status |
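
The full schema is documented in the processing reports chapter. As rough orientation, a report contains at least an events object keyed by these categories and a result object with flow counts; the per-event fields in this hypothetical minimal example are illustrative:

```json
{
  "result": {
    "flows_read": 150,
    "flows_emitted": 148
  },
  "events": {
    "parse_error": [
      { "message": "unexpected end of input", "offset": 98304 }
    ],
    "cap_fired": [],
    "rejected": []
  }
}
```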

Using reports in CI

# Fail if any parse errors occurred
if jq -e '.events.parse_error | length > 0' report.json > /dev/null 2>&1; then
  echo "Parse errors detected"
  exit 1
fi

# Check flows-read vs flows-emitted ratio
RATIO=$(jq '.result.flows_emitted / .result.flows_read' report.json)
if (( $(echo "$RATIO < 0.9" | bc -l) )); then
  echo "Warning: more than 10% of flows were dropped"
fi

Strict mode interaction

With --strict, any warning-level event causes exit code 2. This converts the "informational" diagnostics above into hard failures:

mitm2openapi discover \
  -i capture.flow \
  -o templates.yaml \
  -p "https://api.example.com" \
  --strict \
  --report report.json

# Exit code 2 if ANY warning was emitted
# report.json still written for post-mortem

See strict mode for details.

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

0.5.2 - 2026-04-24

Fixed

  • (har) apply header size caps consistent with mitmproxy reader
  • (reader) reject symlinked directory inputs and entries

Other

  • (security) cover symlink directory and entry rejection
  • (readme) trim content migrated to book, add docs badge
  • (book) add mdBook scaffold with book.toml and all chapter content
  • adjust CHANGELOG/CONTRIBUTING headings for mdBook inclusion
  • regenerate demo.gif skip ci

0.5.1 - 2026-04-22

Other

  • (bench) refresh benchmark results
  • (bench) drop small fixture tier
  • (readme) add benchmarks section linking to automated results
  • (bench) seed benchmarks.md with methodology and placeholders
  • regenerate demo.gif skip ci

0.5.0 - 2026-04-22

Added

  • (cli) add --strict flag to escalate warnings to errors

Other

  • (readme) document --strict flag and benchmark CI
  • (strict) verify strict mode exit codes

0.4.1 - 2026-04-22

Fixed

  • (builder) use .get() in dedup_schema_variants to satisfy indexing_slicing lint
  • (reader) warn on skipped directory entries and malformed overrides
  • (schema) union array element schemas and tighten dict heuristic

Other

  • (lint) deny clippy::indexing_slicing at crate level
  • extract is_numeric_string and is_uuid to shared module
  • (output) lazy-init regex via LazyLock
  • (error) replace guarded unwrap sites with pattern matching

0.4.0 - 2026-04-22

Added

  • feat!(builder): merge response schemas per status code
  • feat!(cli): remove unused --param-regex flag

Other

  • (readme) remove --param-regex mention from CLI reference
  • (cli) verify --param-regex is rejected as unknown argument
  • (builder) cover multi-status response aggregation
  • refactor!(error): mark Error enum as non_exhaustive
  • regenerate demo.gif skip ci

0.3.0 - 2026-04-22

Added

  • (report) track cap firings and parse errors in processing report
  • (cli) add --report flag for structured processing summary
  • (tnetstring) emit byte offset and error kind on parse halt

Other

  • (readme) document --report flag and parse halt diagnostics
  • (report) verify report file schema and contents
  • (tnetstring) verify parse halt diagnostics and no-resync on binary payload

0.2.6 - 2026-04-22

Fixed

  • (test) gate Unix-specific path-failure test behind cfg(unix)
  • (output) write YAML via tempfile and atomic rename

Other

  • (output) verify atomic write preserves target on failure
  • (deps) move tempfile to runtime dependencies

0.2.5 - 2026-04-22

Fixed

  • (builder) skip requests with unknown HTTP methods instead of aliasing to GET

Other

  • (builder) verify unknown method is skipped and standard methods preserved

0.2.4 - 2026-04-22

Fixed

  • (params) preserve multi-byte UTF-8 in urlencoding_decode

Other

  • (params) add UTF-8 roundtrip and overlong rejection cases
  • regenerate demo.gif skip ci

0.2.3 - 2026-04-22

Fixed

  • (builder) cap form-field count per request at 1000
  • (har) validate schemes and status codes, log base64 failures, cap bodies
  • (reader) validate port/status ranges, enforce strict UTF-8, and cap field sizes

Other

  • (readme) document per-field size and validation limits

0.2.2 - 2026-04-22

Added

  • (har) add streaming HAR entry iterator

Other

  • (readme) mention HAR streaming in resource limits and supported formats
  • (har) verify streaming does not materialize all entries
  • (reader) switch HAR dispatch to streaming iterator
  • regenerate demo.gif skip ci

0.2.1 - 2026-04-22

Added

  • (reader) add stream_mitmproxy_file and stream_mitmproxy_dir
  • (tnetstring) add streaming iterator TNetStringIter

Other

  • (readme) document resource-limit flags and streaming behavior
  • (main) switch discover and generate to streaming pipeline
  • (path_matching) cache compiled regexes in CompiledTemplates
  • (builder) add discover_paths_streaming variant
  • regenerate demo.gif skip ci

0.2.0 - 2026-04-22

Added

  • (path_matching) validate path parameter identifiers
  • (cli) expose --max-input-size, --max-payload-size, --max-depth, --max-body-size, --allow-symlinks
  • (reader) reject symlinks, non-regular files, and oversized inputs
  • (schema) enforce 64-level JSON recursion depth limit
  • (tnetstring) enforce 256-level recursion depth limit
  • (tnetstring) cap payload size at 256 MiB
  • (error) add typed variants for parse and input limits

Fixed

  • (test) gate symlink and FIFO tests behind cfg(unix)

Other

  • update Cargo.lock for globset dependency
  • (security) cover symlink, FIFO, and oversize input rejection
  • (har) bound format-detection read to 4 KiB
  • (builder) replace custom glob matcher with globset

0.1.2 - 2026-04-22

Other

  • add tests/fixtures/poc placeholder directory (P0.2)
  • regenerate demo.gif skip ci

0.1.1 - 2026-04-20

Other

  • (readme) add Why? section explaining the Python-vs-Rust trade-off
  • (deps) bump assert_cmd from 2.2.0 to 2.2.1 in the all-cargo group (#7)
  • regenerate demo.gif skip ci

0.1.0 - TBD

Initial release.

Contributing

This document covers how to run the three test tracks locally.

Prerequisites

| Tool | Required for | Install |
|---|---|---|
| Rust toolchain | Build + unit tests | rustup.rs |
| Docker + Compose | All integration tests | docs.docker.com |
| oasdiff | Level 1 diff validation | go install github.com/tufin/oasdiff@latest or brew install oasdiff |
| Node.js + npm | Level 2 | nodejs.org |
| Playwright | Level 2 | npx playwright install --with-deps chromium |
| VHS | Demo GIF | brew install vhs or charm apt repo |
| ffmpeg, gifski, gifsicle | Demo GIF optimization | System package manager |

Build

cargo build --release
# Binary: target/release/mitm2openapi

Unit Tests

cargo test

Level 1 — Petstore Golden Test (~2 min)

Full pipeline (compose up, seed, discover, generate, diff, teardown):

tests/integration/level1/run.sh

Strict mode (--fail-on WARN instead of BREAKING):

tests/integration/level1/run.sh --strict

Manual step-by-step:

cd tests/integration/level1
docker compose up -d
# Wait for petstore healthcheck...
../../ci/petstore/seed.sh
# Run mitm2openapi discover/generate against the proxy
# ...
docker compose down -v

Gotcha: The seed script sends requests through the mitmproxy proxy to petstore:8080 (Docker service name), not localhost. This is intentional — traffic must flow through the proxy to be captured.

Level 2 — crAPI + Playwright (~8 min)

cd tests/integration/level2

# Start crAPI stack (identity + community + workshop + web + mongo + postgres + mailhog + mitmproxy)
make up
# No seed needed — crAPI auto-seeds on first boot

# Run Playwright scenarios
npm install
npx playwright install --with-deps chromium
npx playwright test

# Cleanup
make down

Port conflict: Level 1 and Level 2 both use port 8080 (for different services). Do not run both stacks simultaneously.

Demo GIF (Phase 2 terminal recording)

cd ci/demo
make phase2        # VHS recording
make gif           # gifski + gifsicle optimization
make clean         # remove outputs

Phase 2 uses a real capture, not a committed fixture. The GHA workflow copies tests/integration/level2/out/crapi.flow (produced by Phase 1) into ci/demo/out/demo.flow before running the tape. Locally, do the same: run Phase 1 first, then cp tests/integration/level2/out/crapi.flow ci/demo/out/demo.flow.

Filtering discover output

Captures from real apps include static assets (/static/css/main.*.css, /images/*.svg, etc.) which bloat the generated OpenAPI spec. Two flags on discover handle this:

mitm2openapi discover \
  -i capture.flow -o templates.yaml -p http://api.example.com \
  --exclude-patterns '/static/**,/images/**,*.css,*.js,*.svg,*.png,*.jpg' \
  --include-patterns '/api/**,/v2/**'

  • --exclude-patterns: paths matching any glob are dropped entirely (not even emitted as ignore:).
  • --include-patterns: paths matching any glob are emitted without the ignore: prefix (i.e. auto-activated for generate). Everything else still gets ignore: for manual review.

Globs: * matches a single path segment, ** matches any subtree.

Together they let generate run with no intermediate sed or editor step — useful for automated pipelines like the demo GIF.
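
As a toy model of those segment semantics (illustrative only — the real matcher is the globset crate, which also handles within-segment wildcards like *.css):

```rust
// Toy matcher for the glob semantics above: '*' matches exactly one path
// segment, '**' matches any number of segments (including zero).
fn glob_match(pattern: &str, path: &str) -> bool {
    fn seg_match(pat: &[&str], segs: &[&str]) -> bool {
        match (pat.first(), segs.first()) {
            (None, None) => true,
            // '**' matches zero segments, or consumes one and retries.
            (Some(&"**"), _) => {
                seg_match(&pat[1..], segs)
                    || (!segs.is_empty() && seg_match(pat, &segs[1..]))
            }
            (Some(&p), Some(&s)) if p == "*" || p == s => {
                seg_match(&pat[1..], &segs[1..])
            }
            _ => false,
        }
    }
    let pat: Vec<&str> = pattern.trim_matches('/').split('/').collect();
    let segs: Vec<&str> = path.trim_matches('/').split('/').collect();
    seg_match(&pat, &segs)
}
```

Because matching is segment-based, /api/* matches /api/users but not /api/v2/users, while /api/** matches both.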

Ports Reference

| Stack | Service | Port |
|---|---|---|
| Level 1 | Petstore | 8080 |
| Level 1 | mitmproxy | 8081 |
| Level 2 | crAPI web | 8888 |
| Level 2 | mailhog | 8025 |
| Level 2 | mitmproxy | 8080 |
| Demo | Swagger UI | 8088 |

Cleanup

All compose stacks use docker compose down -v to remove containers and volumes.

CI Workflows

| Workflow | Trigger | Notes |
|---|---|---|
| integration-level1.yml | Every PR | Naive (required) + strict (informational) |
| integration-level2.yml | Nightly + manual dispatch | Full crAPI + Playwright |
| demo-gif.yml | Push to main (path-filtered) + manual dispatch | Terminal recording |