Introduction
mitm2openapi converts mitmproxy flow dumps and HAR files into OpenAPI 3.0 specifications. It ships as a single static binary — no Python, no virtual environment, no runtime dependencies.
It is a Rust rewrite of mitmproxy2swagger by @alufers, who pioneered the "capture traffic, extract API spec" workflow. Credit to the original project for the idea and reference implementation.
Why?
The Python original works well but requires Python, pip, and mitmproxy installed in the
environment. For CI pipelines, slim Docker images, security audits, and one-off usage, that
dependency chain is friction.
mitm2openapi ships as a single ~5 MB static binary. Drop it into any environment and run.
Same OpenAPI 3.0 output, plus first-class HAR support and glob-based filters for fully
unattended pipelines.
Features
- Fast — pure Rust, ~17× faster than the Python original (benchmarks)
- Single static binary — no Python, no venv, no pip, no runtime dependencies
- Two-format support — mitmproxy flow dumps (v19/v20/v21) and HAR 1.2
- Two-command workflow — `discover` finds endpoints, you curate, `generate` emits OpenAPI 3.0
- Glob filters — `--exclude-patterns` and `--include-patterns` for automated pipelines
- Error recovery — skips corrupt flows, continues processing
- Auto-detection — heuristic format detection from file content
- Resource limits — configurable caps prevent denial-of-service on untrusted input
- Strict mode — treat warnings as errors for CI gates
- Structured reports — `--report` outputs machine-readable JSON processing summaries
- Battle-tested — integration tests against Swagger Petstore and OWASP crAPI
- Cross-platform — Linux, macOS, Windows pre-built binaries
How it works
The tool uses a three-step workflow:
- Discover — scan captured traffic and list all observed API endpoints
- Curate — review the list and select which endpoints to include
- Generate — produce a clean OpenAPI 3.0 spec from the selected endpoints
This separates endpoint selection from spec generation, giving you full control over what ends up in the final spec.
Next steps
Continue with installation, then follow the quick start.
Installation
From binary releases
Download a pre-built binary for your platform from GitHub Releases.
Binaries are available for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x86_64).
# Example: Linux x86_64 — replace <VERSION> with the release tag (e.g. v0.5.1)
curl -L "https://github.com/Arkptz/mitm2openapi/releases/download/<VERSION>/mitm2openapi-<VERSION>-x86_64-unknown-linux-gnu.tar.gz" \
| tar xz
sudo mv mitm2openapi /usr/local/bin/
From source (via Cargo)
If you have a Rust toolchain installed:
cargo install --git https://github.com/Arkptz/mitm2openapi
Or from crates.io:
cargo install mitm2openapi
Verify installation
mitm2openapi --version
Shell completions
mitm2openapi uses clap for argument parsing. Shell completions
are not yet bundled, but you can generate them for most shells via clap_complete if building
from source.
Quick start
This walkthrough takes you from a traffic capture to a complete OpenAPI spec in under a minute.
Prerequisites
- `mitm2openapi` installed (see installation)
- A captured traffic file — either a mitmproxy `.flow` dump or a `.har` export from browser DevTools
If you do not have a capture yet, see capturing traffic for setup instructions.
Step 1: Discover endpoints
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com"
This scans every request in capture.flow that matches the prefix https://api.example.com
and writes a templates file listing all observed URL paths.
Step 2: Curate the templates
Open templates.yaml. Each path is prefixed with ignore: by default:
x-path-templates:
- ignore:/api/users
- ignore:/api/users/{id}
- ignore:/api/products
- ignore:/static/bundle.js
Remove the ignore: prefix from paths you want in the final spec:
x-path-templates:
- /api/users
- /api/users/{id}
- /api/products
- ignore:/static/bundle.js
Paths still prefixed with ignore: are excluded from the generated spec.
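The `ignore:` convention is easy to script against. A minimal Python sketch (a hypothetical helper, not part of the tool) that extracts only the activated paths from a templates file:

```python
# The activated paths are simply the entries without the "ignore:"
# prefix. Plain string handling here; a real script might parse YAML.
templates = """x-path-templates:
- /api/users
- /api/users/{id}
- /api/products
- ignore:/static/bundle.js
""".splitlines()

active = [
    line[2:] for line in templates
    if line.startswith("- ") and not line.startswith("- ignore:")
]
print(active)  # ['/api/users', '/api/users/{id}', '/api/products']
```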
Step 3: Generate the OpenAPI spec
mitm2openapi generate \
-i capture.flow \
-t templates.yaml \
-o openapi.yaml \
-p "https://api.example.com"
The resulting openapi.yaml contains a valid OpenAPI 3.0 spec with paths, methods,
parameters, request bodies, and response schemas inferred from the captured traffic.
Skip the manual edit
If you already know which paths matter, use glob filters to automate curation:
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--exclude-patterns '/static/**,/images/**,*.css,*.js,*.svg' \
--include-patterns '/api/**,/v2/**'
mitm2openapi generate \
-i capture.flow \
-t templates.yaml \
-o openapi.yaml \
-p "https://api.example.com"
Paths matching --include-patterns are auto-activated (no ignore: prefix). Paths matching
--exclude-patterns are dropped entirely. Everything else still gets ignore: for manual
review.
See filtering endpoints for the full glob syntax reference.
HAR files
The same workflow works with HAR files — just point -i at a .har file. The format is
auto-detected:
mitm2openapi discover \
-i capture.har \
-o templates.yaml \
-p "https://api.example.com"
See HAR files for details on exporting HARs from browser DevTools.
Capturing traffic
Before you can generate an OpenAPI spec, you need a captured traffic file. This chapter covers the most common ways to capture HTTP traffic.
Option 1: mitmproxy (recommended)
mitmproxy is a free, open-source HTTPS proxy. It captures traffic
in its own binary flow format that mitm2openapi reads natively.
Install mitmproxy
# macOS
brew install mitmproxy
# Linux (pip)
pip install mitmproxy
# Or download from https://mitmproxy.org/
See the mitmproxy installation docs for platform-specific instructions.
Capture with mitmdump
mitmdump is the non-interactive version of mitmproxy, ideal for scripted captures:
# Start the proxy and write all traffic to a flow file
mitmdump -w capture.flow
# In another terminal, route your HTTP client through the proxy:
curl --proxy http://localhost:8080 https://api.example.com/users
The default proxy port is 8080. Use -p to change it:
mitmdump -w capture.flow -p 9090
Capture with mitmweb
mitmweb provides a browser-based UI for inspecting traffic in real time:
mitmweb -w capture.flow
# Open http://localhost:8081 in your browser to inspect traffic
HTTPS traffic
For HTTPS, you need to install the mitmproxy CA certificate on the client machine.
After starting mitmproxy, navigate to http://mitm.it from the proxied client to
download and install the certificate.
See the mitmproxy certificate docs for detailed instructions.
Tips
- Use `mitmdump --set flow_detail=0` for minimal console output during long captures
- Combine with `--set save_stream_filter` to capture only specific hosts
- The flow format is versioned (v19/v20/v21) — `mitm2openapi` supports all three
Option 2: Browser DevTools (HAR export)
All modern browsers can export captured network traffic as HAR (HTTP Archive) files.
Chrome / Chromium
- Open DevTools (`F12` or `Ctrl+Shift+I`)
- Switch to the Network tab
- Ensure recording is active (red circle icon)
- Perform the actions you want to capture
- Right-click in the request list → Save all as HAR with content
Firefox
- Open DevTools (`F12`)
- Switch to the Network tab
- Perform the actions you want to capture
- Click the gear icon → Save All As HAR
Safari
- Enable the Develop menu in Preferences → Advanced
- Open Web Inspector (`Cmd+Option+I`)
- Switch to the Network tab
- Perform the actions
- Click Export in the toolbar
HAR files from browser DevTools contain the full request and response bodies. Sensitive data (cookies, tokens, passwords) will be present in the export. Sanitize before sharing.
Option 3: Other HTTP proxies
Any tool that produces HAR 1.2 output works with mitm2openapi:
- Charles Proxy — export sessions as HAR via File → Export
- Fiddler — File → Export Sessions → HTTPArchive
- Proxyman — export as HAR from the session menu
What to capture
For the best OpenAPI spec, capture diverse traffic:
- Multiple endpoints — the more paths covered, the more complete the spec
- Different HTTP methods — GET, POST, PUT, DELETE on the same resource
- Various response codes — 200, 400, 404, 500 responses produce richer schemas
- Query parameters — include requests with different query strings
- Request bodies — POST/PUT with different payloads improve body schema inference
Next steps
Once you have a capture file, proceed to the quick start or learn about the full discover → curate → generate pipeline.
Discover, curate, generate
mitm2openapi uses a three-step pipeline to convert captured HTTP traffic into an OpenAPI
specification. This chapter explains each step in detail.
Overview
graph LR
A[Traffic capture] --> B[discover]
B --> C[Templates file]
C --> D[Curate]
D --> E[generate]
E --> F[OpenAPI 3.0 spec]
The pipeline separates endpoint discovery from spec generation, giving you an explicit curation step where you choose which endpoints appear in the final spec.
Step 1: Discover
The discover command scans a traffic capture and extracts all unique URL paths that match
a given prefix.
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com"
What happens internally
- The input file is read incrementally (streaming — memory usage stays bounded)
- Each request's URL is checked against the `--prefix` filter
- Matching paths are collected and deduplicated
- Path segments that look like IDs (UUIDs, numeric strings) are replaced with `{id}` placeholders (or `{id1}`, `{id2}`, ... when a path has multiple parameters)
- The result is written to the templates file
Templates file format
The output is a YAML file with path templates under an x-path-templates key:
x-path-templates:
- ignore:/api/users
- ignore:/api/users/{id}
- ignore:/api/products
- ignore:/api/products/{id}/reviews
- ignore:/static/bundle.js
Every path is prefixed with ignore: by default. This is intentional — it forces you to
explicitly opt in to each endpoint.
Automatic parameterization
The discover step detects path segments that vary across requests and replaces them with named parameters:
| Observed paths | Template |
|---|---|
| `/api/users/42`, `/api/users/99` | `/api/users/{id}` |
| `/api/orders/abc-def-123` | `/api/orders/{id}` |
UUID-like and numeric segments are detected automatically. More complex patterns require manual editing of the templates file.
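As a rough model of this detection (an illustrative sketch, not the tool's actual heuristic), numeric and UUID-shaped segments can be recognized like this:

```python
import re

# Canonical 8-4-4-4-12 hex UUID shape.
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)

def parameterize(path: str) -> str:
    # Replace numeric and UUID-shaped segments with {id}; number the
    # placeholders ({id1}, {id2}, ...) when there is more than one.
    segments = path.strip("/").split("/")
    out = ["{id}" if s.isdigit() or UUID_RE.match(s) else s for s in segments]
    if out.count("{id}") > 1:
        n = 0
        for i, s in enumerate(out):
            if s == "{id}":
                n += 1
                out[i] = "{id%d}" % n
    return "/" + "/".join(out)
```

For example, `parameterize("/api/users/42")` yields `/api/users/{id}`, while a path with two variable segments gets numbered placeholders.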
Step 2: Curate
Open the templates file in any text editor. For each path:
- Remove `ignore:` to include the endpoint in the generated spec
- Leave `ignore:` to exclude it
- Delete the line to exclude it permanently
# Before curation
x-path-templates:
- ignore:/api/users
- ignore:/api/users/{id}
- ignore:/static/bundle.js
# After curation
x-path-templates:
- /api/users
- /api/users/{id}
- ignore:/static/bundle.js
You can also edit parameter names. The default {id} placeholder can be renamed to
something more descriptive like {userId}:
- /api/users/{userId}
Automating curation with glob filters
For CI pipelines or large captures, manual curation is impractical. Use --include-patterns
and --exclude-patterns during the discover step instead:
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--include-patterns '/api/**' \
--exclude-patterns '/static/**,*.css,*.js'
Paths matching --include-patterns are emitted without the ignore: prefix (auto-activated).
Paths matching --exclude-patterns are dropped entirely. Everything else gets ignore: for
manual review.
See filtering endpoints for the full glob syntax.
Step 3: Generate
The generate command re-reads the traffic capture and produces an OpenAPI spec using the
curated templates as a guide:
mitm2openapi generate \
-i capture.flow \
-t templates.yaml \
-o openapi.yaml \
-p "https://api.example.com"
What happens internally
- The templates file is loaded and the `ignore:` entries are filtered out
- Each template path is compiled into a regex for matching
- The traffic capture is streamed again, matching each request against the templates
- For each matched request:
- Path parameters are extracted
- Query parameters are collected
- Request body schema is inferred (JSON, form data)
- Response status code and body schema are recorded
- When multiple requests match the same template, their schemas are merged:
- Different status codes (200, 400, 404) produce separate response entries
- Request body is taken from the first observation; subsequent same-endpoint observations only contribute response schemas
- The final OpenAPI 3.0 document is written as YAML
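The merge rule described above can be sketched as follows (hypothetical helper names and a simplified schema shape; the real implementation differs):

```python
def merge_observation(op: dict, obs: dict) -> dict:
    # Request body: kept from the first observation only.
    if "requestBody" not in op and "requestBody" in obs:
        op["requestBody"] = obs["requestBody"]
    # Responses: one entry per distinct status code; the first schema
    # seen for a given status wins in this simplified model.
    responses = op.setdefault("responses", {})
    for status, schema in obs.get("responses", {}).items():
        responses.setdefault(status, schema)
    return op

op = {}
merge_observation(op, {"requestBody": {"type": "object"},
                       "responses": {"200": {"type": "object"}}})
merge_observation(op, {"responses": {"404": {"type": "string"}}})
```

After both observations, `op` has separate `200` and `404` response entries and the request body from the first observation.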
Customizing output
The generate command accepts several options to tune the output:
mitm2openapi generate \
-i capture.flow \
-t templates.yaml \
-o openapi.yaml \
-p "https://api.example.com" \
--openapi-title "My API" \
--openapi-version "2.0.0" \
--exclude-headers "authorization,cookie" \
--ignore-images
See the CLI reference for all available options.
Worked example
Starting from a mitmproxy capture of a pet store API:
# Discover all endpoints under the API prefix
mitm2openapi discover \
-i petstore.flow \
-o templates.yaml \
-p "http://petstore:8080" \
--exclude-patterns '/static/**' \
--include-patterns '/api/**'
# Templates file now has API paths auto-activated:
# - /api/v3/pet
# - /api/v3/pet/{id}
# - /api/v3/pet/findByStatus
# - /api/v3/store/inventory
# - ignore:/static/swagger-ui.css
# Generate the spec
mitm2openapi generate \
-i petstore.flow \
-t templates.yaml \
-o openapi.yaml \
-p "http://petstore:8080"
# Result: openapi.yaml with paths, methods, schemas
The generated openapi.yaml is a valid OpenAPI 3.0 document that can be opened in
Swagger UI, imported into Postman, or used
as a contract for API testing.
Filtering endpoints
The discover command supports glob-based filters to automate endpoint curation.
This is useful for CI pipelines or large captures where manual editing is impractical.
Glob syntax
Filters use git-style glob patterns (powered by the globset crate):
| Pattern | Matches | Does not match |
|---|---|---|
| `*` | Single path segment | Segments with `/` |
| `**` | Any number of path segments | (matches everything) |
| `?` | Any single character | |
| `[abc]` | Character class | |
| `{a,b}` | Alternation | |
--exclude-patterns
Paths matching any exclude glob are dropped entirely — they do not appear in the templates file at all.
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--exclude-patterns '/static/**,/images/**,*.css,*.js,*.svg,*.png'
Multiple patterns are comma-separated. A path is excluded if it matches any pattern.
--include-patterns
Paths matching any include glob are emitted without the ignore: prefix — they are
auto-activated for the generate step.
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--include-patterns '/api/**,/v2/**'
Combining filters
When both are specified:
- Exclude runs first — matching paths are dropped entirely
- Include runs second — matching paths among the survivors are auto-activated
- Everything else gets the
ignore:prefix for manual review
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--exclude-patterns '/static/**,*.css,*.js' \
--include-patterns '/api/**'
Result:
- `/static/bundle.js` — excluded (dropped)
- `/api/users` — included (auto-activated)
- `/dashboard` — neither matched (gets `ignore:` prefix)
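The precedence can be sketched in Python. This models the documented semantics for `*`, `**`, and `?` only (character classes and alternation omitted); it is not the tool's actual globset-based matcher:

```python
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    # Translate the documented glob subset into a regex:
    #   ** -> any number of segments, * -> within one segment,
    #   ?  -> one character (never "/").
    out, i = [], 0
    while i < len(pattern):
        if pattern[i : i + 2] == "**":
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")

def classify(path, include_patterns, exclude_patterns):
    # Exclude runs first (dropped), include second (auto-activated),
    # everything else keeps the ignore: prefix.
    if any(glob_to_regex(p).match(path) for p in exclude_patterns):
        return "dropped"
    if any(glob_to_regex(p).match(path) for p in include_patterns):
        return "active"
    return "ignore"
```

With the patterns from the example above, `classify` reproduces the three outcomes: dropped, active, and ignore.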
Examples
API-only spec
--include-patterns '/api/**' \
--exclude-patterns '/api/internal/**,/api/debug/**'
Strip static assets
--exclude-patterns '/static/**,/assets/**,*.css,*.js,*.svg,*.png,*.jpg,*.gif,*.ico,*.woff,*.woff2'
Multiple API versions
--include-patterns '/v1/**,/v2/**,/v3/**'
Pattern tips
- Patterns match against the URL path only (after the prefix is stripped), not the full URL
- Leading `/` is recommended for clarity but not required
- Patterns are case-sensitive
- Use `**` sparingly — it matches everything, including deeply nested paths
Resource limits
To prevent denial-of-service when processing untrusted captures, mitm2openapi enforces
several configurable and fixed limits.
Configurable limits
These limits can be adjusted via CLI flags:
| Flag | Default | Purpose |
|---|---|---|
| `--max-input-size` | 2 GiB | Reject files larger than this before reading |
| `--max-payload-size` | 256 MiB | Cap on individual tnetstring payload allocation |
| `--max-depth` | 256 | Recursion depth limit for nested tnetstring structures |
| `--max-body-size` | 64 MiB | Maximum request/response body considered during schema inference |
| `--allow-symlinks` | off | By default, symlinked inputs are rejected |
Adjusting limits
Increase --max-input-size if you work with captures larger than 2 GiB:
mitm2openapi discover \
-i large-capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--max-input-size 8GiB
Size suffixes are supported: KiB, MiB, GiB.
The other limits rarely need tuning. The defaults are designed to handle real-world captures while rejecting pathological inputs.
Symlink rejection
By default, symlinked input files are rejected to prevent path-traversal attacks on shared CI runners. If you need to process a symlinked file:
mitm2openapi discover \
-i /path/to/symlinked-capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--allow-symlinks
Fixed per-field limits
These limits are applied unconditionally and cannot be changed via CLI flags:
| Field | Cap | Behaviour when exceeded |
|---|---|---|
| Header name | 8 KiB | Header dropped (other headers still processed) |
| Header value | 64 KiB | Value truncated to cap |
| Form fields per request | 1,000 | Excess fields ignored |
| URL scheme | http / https only | Non-HTTP flows silently skipped |
| Port number | 1–65,535 | Out-of-range port drops the request |
| HTTP status code | 100–599 | Invalid codes treated as no response |
UTF-8 validation
Identity fields (scheme, host, path, method, header names) require valid UTF-8. Flows with non-UTF-8 identity bytes are skipped to prevent data aliasing through replacement-character collisions.
Control characters (0x00–0x1F, 0x7F) in paths are stripped automatically.
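The stripping rule is equivalent to this one-liner (a sketch of the documented behaviour, not the tool's code):

```python
def strip_control_chars(path: str) -> str:
    # Drop ASCII control characters 0x00-0x1F and DEL (0x7F),
    # keeping everything else (including non-ASCII) intact.
    return "".join(ch for ch in path if ord(ch) > 0x1F and ord(ch) != 0x7F)
```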
Streaming and memory
Both mitmproxy flow files and HAR files are processed incrementally. Memory usage stays bounded regardless of input size — there is no need to load the entire capture into memory.
Peak RSS is proportional to the size of the largest single flow in the capture, not the total file size. For typical captures, expect 5–15 MB of memory usage.
When limits fire
When a per-field limit is exceeded (header too large, body too large, form fields over cap), the affected field is skipped or truncated and processing continues with the remaining data.
When a tnetstring parse error occurs, the iterator halts and the rest of the file is not processed — valid flows parsed before the error are still emitted. There is no resync because binary payloads can contain bytes that mimic valid length prefixes.
In both cases a warn-level log message is emitted with details.
Use strict mode to treat these warnings as errors, or processing reports to capture them as structured data.
Strict mode
Pass --strict to either discover or generate to treat warning-level events as
hard failures. The process exits with code 2 if the processing report records any
counted events.
Currently, the only event counter populated at runtime is parse_error — triggered when
flows cannot be deserialized (corrupt tnetstring data, malformed HAR JSON). The
cap_fired and rejected counters exist in the report schema but are not yet wired to
the reader pipelines; they will be connected in a future release.
In practice, --strict today catches:
- Parse errors during flow deserialization (tnetstring or HAR)
- Errors counted by the streaming iterator wrapper in `discover` mode
Usage
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--strict
mitm2openapi generate \
-i capture.flow \
-t templates.yaml \
-o openapi.yaml \
-p "https://api.example.com" \
--strict
CI usage pattern
Strict mode is designed for CI gates where silent degradation is unacceptable:
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--strict \
|| { echo "FAIL: corrupt or over-limit flows detected"; exit 1; }
Without --strict
Without the flag, parse errors are logged at warn level and processing continues with
exit code 0. Affected flows are skipped, but the output file is still produced. Other
warning-level events (cap fires, scheme rejections, etc.) are always logged but do not
currently increment the report counters that --strict checks.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success (no warnings, or --strict not set) |
| 1 | Fatal error (I/O failure, missing required arguments) |
| 2 | Strict mode violation (warnings detected with --strict) |
Combining with reports
For CI pipelines that need both strict enforcement and structured diagnostics:
mitm2openapi generate \
-i capture.flow \
-t templates.yaml \
-o openapi.yaml \
-p "https://api.example.com" \
--strict \
--report report.json
The report is written even when --strict causes a non-zero exit, capturing
the full details of what went wrong.
Processing reports
Pass --report <PATH> to either discover or generate to write a JSON processing
summary. This is useful for CI pipelines that need structured data instead of log scraping.
Usage
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--report report.json
Report schema
{
"report_version": 1,
"tool_version": "0.5.1",
"input": {
"path": "capture.flow",
"format": "Auto",
"size_bytes": 102400
},
"result": {
"flows_read": 150,
"flows_emitted": 148,
"paths_in_spec": 12
},
"events": {
"parse_error": {
"TNetString parse error at byte 98304: unexpected end of input": 1
}
}
}
Fields
| Field | Type | Description |
|---|---|---|
| `report_version` | integer | Schema version (currently 1) |
| `tool_version` | string | mitm2openapi version that produced the report |
| `input.path` | string | Input file path |
| `input.format` | string | Detected or specified format (`Auto`, `Mitmproxy`, `Har`) |
| `input.size_bytes` | integer | Input file size in bytes |
| `result.flows_read` | integer | Total flows/entries parsed from input |
| `result.flows_emitted` | integer | Flows that passed all filters and were processed |
| `result.paths_in_spec` | integer | Unique paths in the output (for generate) |
| `events` | object | Map of event categories to message counts |
Event categories
| Category | Meaning | Status |
|---|---|---|
| `parse_error` | Corrupt data encountered (tnetstring errors, malformed HAR entries) | Populated |
| `cap_fired` | A resource limit was triggered (body too large, depth exceeded) | Reserved — not yet populated at runtime |
| `rejected` | A flow was skipped (invalid UTF-8, unsupported scheme, bad port/status) | Reserved — not yet populated at runtime |
The cap_fired and rejected categories are present in the report schema and will be
connected to the reader pipelines in a future release. Currently, only parse_error
events are counted.
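Because the report is plain JSON, a CI script can make decisions from it directly. A minimal Python sketch (the JSON is inlined here for self-containment; a real pipeline would read the `--report` file, and the variable names are invented):

```python
import json

# Inlined sample matching the schema above; normally loaded from
# the file written by --report.
report = json.loads("""
{
  "result": {"flows_read": 150, "flows_emitted": 148, "paths_in_spec": 12},
  "events": {"parse_error": {"TNetString parse error at byte 98304: unexpected end of input": 1}}
}
""")

# Total events across categories (each category maps message -> count).
error_count = sum(sum(msgs.values()) for msgs in report["events"].values())
# Flows read but not emitted were skipped by filters or error recovery.
dropped = report["result"]["flows_read"] - report["result"]["flows_emitted"]
print(error_count, dropped)  # 1 2
```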
CI integration
Parse the report in CI to make decisions based on processing quality:
mitm2openapi generate \
-i capture.flow \
-t templates.yaml \
-o openapi.yaml \
-p "https://api.example.com" \
--report report.json
# Check if any events occurred
if jq -e '.events | length > 0' report.json > /dev/null 2>&1; then
echo "Warning: processing had events"
jq '.events' report.json
fi
Report with strict mode
The report is written even when --strict causes a non-zero exit code. This lets you
capture full diagnostics while still failing the CI job:
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--strict \
--report report.json \
|| { jq '.' report.json; exit 1; }
CLI reference
This reference was last synced with mitm2openapi --help at version 0.5.1.
If you notice a flag missing from your local --help output, the tool may be ahead of these
docs. Open an issue to prompt an update.
mitm2openapi discover
Scan captured traffic and produce a templates file listing all observed endpoints.
mitm2openapi discover [OPTIONS] -i <INPUT> -o <OUTPUT> -p <PREFIX>
Required arguments
| Option | Description |
|---|---|
| `-i, --input <PATH>` | Input file (flow dump or HAR) |
| `-o, --output <PATH>` | Output YAML templates file |
| `-p, --prefix <URL>` | API prefix URL to filter requests |
Optional arguments
| Option | Default | Description |
|---|---|---|
| `--format <FORMAT>` | auto | Input format: auto, har, mitmproxy |
| `--exclude-patterns <GLOBS>` | | Comma-separated globs; matching paths are dropped entirely |
| `--include-patterns <GLOBS>` | | Comma-separated globs; matching paths are auto-activated |
| `--max-input-size <BYTES>` | 2 GiB | Maximum input file size. Accepts KiB, MiB, GiB suffixes |
| `--allow-symlinks` | off | Allow symlinked input files |
| `--strict` | off | Treat warnings as errors (exit code 2) |
| `--report <PATH>` | | Write structured JSON processing report |
mitm2openapi generate
Generate an OpenAPI 3.0 spec from captured traffic using a curated templates file.
mitm2openapi generate [OPTIONS] -i <INPUT> -t <TEMPLATES> -o <OUTPUT> -p <PREFIX>
Required arguments
| Option | Description |
|---|---|
| `-i, --input <PATH>` | Input file (flow dump or HAR) |
| `-t, --templates <PATH>` | Templates YAML file (from discover) |
| `-o, --output <PATH>` | Output OpenAPI YAML file |
| `-p, --prefix <URL>` | API prefix URL |
Optional arguments
| Option | Default | Description |
|---|---|---|
| `--format <FORMAT>` | auto | Input format: auto, har, mitmproxy |
| `--openapi-title <TITLE>` | | Custom title for the spec |
| `--openapi-version <VER>` | 1.0.0 | Custom spec version |
| `--exclude-headers <LIST>` | | Comma-separated headers to exclude from spec |
| `--exclude-cookies <LIST>` | | Comma-separated cookies to exclude from spec |
| `--include-headers` | off | Include request headers in the spec |
| `--ignore-images` | off | Ignore image content types |
| `--suppress-params` | off | Suppress parameter suggestions |
| `--tags-overrides <JSON>` | | JSON string for tag overrides |
| `--max-input-size <BYTES>` | 2 GiB | Maximum input file size |
| `--max-payload-size <BYTES>` | 256 MiB | Maximum tnetstring payload size |
| `--max-depth <N>` | 256 | Maximum tnetstring nesting depth |
| `--max-body-size <BYTES>` | 64 MiB | Maximum request/response body size |
| `--allow-symlinks` | off | Allow symlinked input files |
| `--strict` | off | Treat warnings as errors (exit code 2) |
| `--report <PATH>` | | Write structured JSON processing report |
Common flag details
--format
By default, the input format is auto-detected from a combination of file extension and content sniffing:
- `.flow` extension or content starting with a tnetstring length prefix → mitmproxy format
- `.har` extension or content starting with `{` → HAR format
Use --format mitmproxy or --format har to override auto-detection.
--prefix
The prefix URL filters which requests are processed. Only requests whose URL starts with the prefix are included. The prefix is stripped from paths in the generated spec.
Example: with --prefix https://api.example.com, a request to
https://api.example.com/users/42 produces path /users/42 in the spec.
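A sketch of this behaviour (a model of the documented semantics, with an invented helper name):

```python
def spec_path(url: str, prefix: str):
    # Requests outside the prefix are filtered out entirely (None);
    # otherwise the prefix is stripped and the remainder is the spec path.
    if not url.startswith(prefix):
        return None
    return url[len(prefix):] or "/"
```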
--strict
See strict mode for details on exit codes and CI usage.
--report
See processing reports for the JSON schema and CI integration examples.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Fatal error (I/O failure, missing arguments, invalid input) |
| 2 | Strict mode violation (warnings with --strict enabled) |
Environment variables
| Variable | Description |
|---|---|
| `RUST_LOG` | Controls log verbosity. Default: warn. Set to info or debug for more output. |
RUST_LOG=info mitm2openapi discover -i capture.flow -o templates.yaml -p "https://api.example.com"
mitmproxy flow dumps
mitm2openapi reads mitmproxy's native binary flow format. This is the recommended input
format — it captures the richest data and is produced directly by mitmdump and mitmweb.
Supported versions
| Flow format version | mitmproxy version | Status |
|---|---|---|
| v19 | mitmproxy 8.x | Supported |
| v20 | mitmproxy 9.x | Supported |
| v21 | mitmproxy 10.x | Supported |
The flow format is auto-detected from file content. No version flag is needed.
How flow files work
Flow files use the tnetstring serialization format. Each flow is a sequence of key-value pairs representing a complete HTTP request-response cycle.
A typical flow contains:
- Request: method, URL (scheme, host, port, path), headers, body
- Response: status code, headers, body
- Metadata: timestamps, flow ID, client/server addresses
mitm2openapi extracts the request and response data relevant to OpenAPI spec generation
and discards metadata.
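To make the framing concrete, here is a toy tnetstring reader in Python covering strings, integers, and dicts only — an illustration of the `LENGTH:PAYLOAD<tag>` format, not the tool's parser:

```python
def parse_tnetstring(data: bytes):
    # Each value is "LENGTH:PAYLOAD" followed by a one-byte type tag.
    # Returns (value, remaining bytes).
    head, _, rest = data.partition(b":")
    length = int(head)
    payload = rest[:length]
    tag = rest[length : length + 1]
    remainder = rest[length + 1 :]
    if tag == b",":            # raw string / bytes
        return payload, remainder
    if tag == b"#":            # integer
        return int(payload), remainder
    if tag == b"}":            # dict: alternating key/value tnetstrings
        value = {}
        while payload:
            key, payload = parse_tnetstring(payload)
            val, payload = parse_tnetstring(payload)
            value[key] = val
        return value, remainder
    raise ValueError(f"unsupported type tag {tag!r}")
```

For example, `b"12:3:foo,3:bar,}"` decodes to the dict `{b"foo": b"bar"}`.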
Capturing flow files
# Record all traffic through the proxy
mitmdump -w capture.flow
# Record only traffic to a specific host
mitmdump -w capture.flow --set flow_detail=0 \
--set save_stream_filter='~d api.example.com'
See capturing traffic for full setup instructions.
Directory input
If you pass a directory path to -i, mitm2openapi reads all .flow files in that
directory (non-recursive). This is useful when you have traffic split across multiple
capture sessions.
Known limitations
- No WebSocket frames — WebSocket upgrade requests are captured, but frame-level data is not used for spec generation
- No gRPC — binary protocol buffers inside HTTP/2 frames are not decoded
- Corrupt files — when the tnetstring parser hits corruption, it stops and reports the byte offset. No resync is attempted because binary payloads can contain bytes that mimic valid tnetstring length prefixes. See diagnostics for details.
- Large payloads — individual tnetstring payloads are capped at 256 MiB by default (adjustable via `--max-payload-size`)
HAR files
mitm2openapi reads HAR (HTTP Archive)
files — the standard format for exporting browser network traffic. HAR version 1.2 is supported.
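The HAR 1.2 shape the tool consumes is plain JSON: a `log` object with an `entries` array, each entry holding a `request` and a `response`. A minimal sketch (entry inlined for self-containment; a real capture would be read from the `.har` file):

```python
import json

# Single-entry HAR illustrating the fields relevant to spec generation.
har = json.loads("""
{"log": {"version": "1.2", "entries": [
  {"request": {"method": "GET", "url": "https://api.example.com/users"},
   "response": {"status": 200}}
]}}
""")

for entry in har["log"]["entries"]:
    req, resp = entry["request"], entry["response"]
    print(req["method"], req["url"], "->", resp["status"])
```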
Producing HAR files
Browser DevTools
All modern browsers export HAR from their Network tab:
- Chrome/Chromium: DevTools → Network → right-click → "Save all as HAR with content"
- Firefox: DevTools → Network → gear icon → "Save All As HAR"
- Safari: Web Inspector → Network → Export button
HTTP proxies
Several proxy tools export HAR:
- Charles Proxy — File → Export Session → HAR
- Fiddler — File → Export Sessions → HTTPArchive
- Proxyman — Export as HAR
Programmatic generation
Libraries like puppeteer and playwright
can produce HAR files from automated browser sessions:
// Playwright example
const context = await browser.newContext({
recordHar: { path: 'capture.har' }
});
// ... run your test
await context.close(); // HAR is written on close
Usage
mitm2openapi discover \
-i capture.har \
-o templates.yaml \
-p "https://api.example.com"
Format is auto-detected. Use --format har to force HAR parsing if auto-detection fails.
HAR vs mitmproxy flows
| Aspect | mitmproxy flow | HAR |
|---|---|---|
| Source | mitmproxy proxy | Browser DevTools, HTTP proxies |
| Format | Binary (tnetstring) | JSON |
| Response bodies | Always present | Sometimes base64-encoded |
| HTTPS | Decrypted by proxy | Decrypted by browser |
| File size | Compact binary | Larger (JSON overhead) |
| Streaming | Native | Incremental JSON parsing |
Both formats produce equivalent OpenAPI specs. Choose based on your capture workflow:
- mitmproxy flows for server-side proxying, CI pipelines, and automated captures
- HAR files for browser-based testing, manual exploration, and when you already have DevTools open
Incremental parsing
HAR files are parsed incrementally — the entire JSON is not loaded into memory at once. This means memory usage stays bounded even for large HAR exports (hundreds of megabytes).
Known limitations
- Base64-encoded bodies — some HAR exporters base64-encode response bodies. Decode failures are logged as warnings and the body is skipped (not silently dropped).
- Compressed content — if the HAR exporter did not decompress response bodies, `mitm2openapi` sees the compressed bytes. Most browser DevTools decompress automatically.
- Timing data — HAR timing information (DNS, connect, TLS) is ignored; only request and response data is used for spec generation.
Performance & Benchmarks
Results are regenerated weekly by the benchmark workflow. See the workflow for the reproducible methodology.
Benchmark results
Run: 2026-04-22 22:31 UTC, commit 22ef2faa, runner: Linux 6.17.0-1011-azure
Fixture: 89 MB, 40k requests across 8 endpoint shapes (bench-fixtures-v1).
Timing
| Command | Mean (s) | Min (s) | Max (s) | Relative |
|---|---|---|---|---|
| Python mitmproxy2swagger | 44.757 ± 0.219 | 44.384 | 44.965 | 16.80 ± 0.26 |
| Rust mitm2openapi | 2.663 ± 0.039 | 2.618 | 2.712 | 1.00 |
Peak RSS
| Tool | RSS |
|---|---|
| Python mitmproxy2swagger | 46104 KB |
| Rust mitm2openapi | 6168 KB |
Security model
- Threat model
- Input validation layers
- Streaming architecture
- Glob pattern safety
- Recommendations
- Related
mitm2openapi processes untrusted binary input (traffic captures from unknown sources).
The security model is designed to prevent denial-of-service, data corruption, and
information leakage when handling adversarial input.
Threat model
The primary threat is a malicious capture file — a .flow or .har file crafted to
exploit the parser. Scenarios include:
- CI pipelines processing captures from untrusted contributors
- Shared analysis servers where multiple users submit captures
- Automated pipelines where the capture source is not fully controlled
Input validation layers
File-level checks
Before reading any content:
- File type — only regular files are accepted. Symlinks, FIFOs, device files, and directories are rejected unless `--allow-symlinks` is explicitly set.
- File size — files exceeding `--max-input-size` (default 2 GiB) are rejected before any bytes are read.
- TOCTOU caveat — file metadata is checked via the path before reading to reject symlinks, non-regular files, and oversized inputs. There is a small TOCTOU window between the metadata check and the file open; mitigation via fd-based recheck after open is a future enhancement.
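The same pre-read gate can be approximated in shell — an illustrative sketch assuming GNU coreutils (`stat -c`); mitm2openapi performs these checks internally before opening the file:

```shell
# Reject symlinks, non-regular files, and oversized inputs before reading.
check_input() {
  f=$1
  [ -L "$f" ] && { echo "reject: symlink"; return 1; }
  [ -f "$f" ] || { echo "reject: not a regular file"; return 1; }
  max=$((2 * 1024 * 1024 * 1024))   # 2 GiB, the --max-input-size default
  size=$(stat -c %s "$f")           # GNU stat
  [ "$size" -le "$max" ] || { echo "reject: exceeds max input size"; return 1; }
  echo "ok"
}
```

Usage: `check_input capture.flow` prints `ok` or a rejection reason and returns a nonzero status on rejection.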
Parser-level caps
During parsing:
| Cap | Default | Purpose |
|---|---|---|
| Payload size | 256 MiB | Prevents OOM from oversized tnetstring values |
| Nesting depth | 256 | Prevents stack overflow from deeply nested structures |
| JSON depth | 64 | Prevents stack overflow in schema inference |
| Body size | 64 MiB | Limits memory for individual request/response bodies |
These caps trigger warn-level events and skip the affected data. Use --strict to
treat them as hard errors.
Field-level validation
For every flow:
- Scheme whitelist — only `http` and `https` are accepted. Other schemes (e.g., `javascript:`, `data:`) are silently skipped.
- UTF-8 strictness — identity fields (method, scheme, host, path, header names) must be valid UTF-8. Invalid bytes cause the flow to be skipped, preventing data aliasing through replacement-character collisions.
- Port range — port numbers must be 1–65535. Out-of-range values drop the request.
- Status code range — HTTP status codes must be 100–599.
- Control character stripping — `0x00`–`0x1F` and `0x7F` in URL paths are removed.
- Header caps — header names over 8 KiB are dropped; values over 64 KiB are truncated.
- Form field count — at most 1,000 form fields per request are processed.
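The aliasing risk behind strict UTF-8 handling is easy to demonstrate in shell. Here lossy clean-up with `tr` stands in for replacement-character decoding (illustrative only; mitm2openapi skips such flows rather than sanitizing them):

```shell
# Two different raw byte sequences for a host field collapse to the same
# string once invalid bytes are dropped — the collision that strict
# UTF-8 validation prevents.
a=$(printf 'h\377ost' | tr -cd 'a-z')   # 0xFF dropped
b=$(printf 'h\376ost' | tr -cd 'a-z')   # 0xFE dropped
[ "$a" = "$b" ] && echo "aliased: $a"   # → aliased: host
```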
Output safety
- Atomic writes — output files are written via a temporary file and renamed. If the write fails (disk full, permission denied), the target path is left untouched.
- No resync on corruption — when the tnetstring parser encounters corrupt data, it halts immediately. It does not scan forward looking for the next valid frame, because binary payloads can contain bytes that look like valid length prefixes.
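The atomic-write behavior is the standard tempfile-plus-rename idiom, sketched here in shell for illustration (the function name is made up for this example):

```shell
# Write stdin to a temp file in the target's directory, then rename into
# place. rename(2) is atomic on a single filesystem, so readers see either
# the old file or the complete new one — never a partial write.
atomic_write() {
  target=$1
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  cat > "$tmp" && mv -f "$tmp" "$target"
}

printf 'openapi: 3.0.0\n' | atomic_write openapi.yaml
```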
Streaming architecture
Both mitmproxy and HAR inputs are processed incrementally. At no point is the entire capture loaded into memory. This bounds peak RSS to the size of the largest single flow, regardless of total file size.
Glob pattern safety
The --exclude-patterns and --include-patterns flags use the
globset crate, which compiles patterns into a DFA. This eliminates
exponential backtracking that was possible with the original recursive glob matcher.
Recommendations
For processing untrusted captures:
- Do not use `--allow-symlinks` unless you control the filesystem
- Keep `--max-input-size` at the default (2 GiB) or lower
- Run with `--strict` to fail fast on any anomaly
- Use `--report` to capture processing diagnostics for audit trails
- Run in a sandboxed environment (container, VM) when processing captures from unknown sources
Related
- Resource limits — configuring the caps
- Strict mode — CI enforcement
- Diagnostics — interpreting warnings and errors
Diagnostics
mitm2openapi uses structured logging to report issues during processing. This chapter
covers how to interpret warnings, errors, and the structured report output.
Log levels
Control verbosity with the RUST_LOG environment variable:
# Default: warnings only
mitm2openapi discover -i capture.flow -o templates.yaml -p "https://api.example.com"
# More detail
RUST_LOG=info mitm2openapi discover -i capture.flow -o templates.yaml -p "https://api.example.com"
# Full debug output
RUST_LOG=debug mitm2openapi discover -i capture.flow -o templates.yaml -p "https://api.example.com"
Common warnings
Parse errors (tnetstring)
WARN TNetString parse error at byte 98304: unexpected end of input (148 flows parsed successfully)
This means the mitmproxy flow file contains corrupt data starting at byte 98,304. The parser halts immediately and the remaining bytes in the file are not processed. The 148 flows parsed before the corruption are still emitted.
No resync is attempted. Binary payloads can contain bytes that mimic valid tnetstring length prefixes, so scanning forward would produce phantom flows with fabricated data.
What to do:
- If the file was truncated during transfer, re-capture or re-download
- The 148 successfully parsed flows are still usable
- Use `--report` to capture the exact byte offset for debugging
Cap-fired events
WARN body size 68157440 exceeds cap 67108864, truncating
WARN header name exceeds 8192 bytes, dropping
WARN form field count 1247 exceeds cap 1000, ignoring excess
These indicate that a specific field in a flow exceeded the built-in or configured limit. The affected field is truncated or dropped, but processing continues.
What to do:
- Usually safe to ignore — the caps exist to prevent abuse, not normal traffic
- If you need the full data, increase the relevant `--max-*` flag
- Use `--strict` to fail on these if you need guaranteed completeness
Flow rejection events
WARN skipping flow: scheme "javascript" not in whitelist [http, https]
WARN skipping flow: invalid UTF-8 in host field
WARN skipping flow: port 0 out of valid range 1-65535
These mean an entire flow was skipped because it failed validation.
What to do:
- Non-HTTP flows (WebSocket upgrades, CONNECT tunnels) are expected to be skipped
- UTF-8 errors suggest the capture contains binary protocol data, not HTTP traffic
- Invalid port/status usually indicates corrupt flow data
Structured reports
For machine-readable diagnostics, use --report:
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--report report.json
See processing reports for the full JSON schema.
Event categories in reports
| Category | Examples |
|---|---|
| `parse_error` | Tnetstring corruption, HAR JSON syntax errors |
| `cap_fired` | Body too large, depth exceeded, form field count exceeded |
| `rejected` | Invalid scheme, non-UTF-8 identity fields, bad port/status |
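A per-category event count can be pulled with jq. Only the field names used in this chapter's examples are assumed here; the full layout is documented in the processing reports chapter. The `report.json` below is a minimal mock:

```shell
# Minimal mock report for illustration.
cat > report.json <<'EOF'
{"events":{"parse_error":[],"cap_fired":[{"cap":"body_size"}],"rejected":[]},
 "result":{"flows_read":150,"flows_emitted":148}}
EOF

# Count events per category.
jq '{parse_errors: (.events.parse_error | length),
     caps_fired:   (.events.cap_fired   | length),
     rejected:     (.events.rejected    | length)}' report.json
```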
Using reports in CI
```shell
# Fail if any parse errors occurred
if jq -e '.events.parse_error | length > 0' report.json > /dev/null 2>&1; then
  echo "Parse errors detected"
  exit 1
fi

# Check flows-read vs flows-emitted ratio (integer math, no bc needed;
# also guards against division by zero when flows_read is 0)
READ=$(jq '.result.flows_read' report.json)
EMITTED=$(jq '.result.flows_emitted' report.json)
if [ "$READ" -gt 0 ] && [ $((EMITTED * 10)) -lt $((READ * 9)) ]; then
  echo "Warning: more than 10% of flows were dropped"
fi
```
Strict mode interaction
With --strict, any warning-level event causes exit code 2. This converts the
"informational" diagnostics above into hard failures:
mitm2openapi discover \
-i capture.flow \
-o templates.yaml \
-p "https://api.example.com" \
--strict \
--report report.json
# Exit code 2 if ANY warning was emitted
# report.json still written for post-mortem
See strict mode for details.
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
0.5.2 - 2026-04-24
Fixed
- (har) apply header size caps consistent with mitmproxy reader
- (reader) reject symlinked directory inputs and entries
Other
- (security) cover symlink directory and entry rejection
- (readme) trim content migrated to book, add docs badge
- (book) add mdBook scaffold with book.toml and all chapter content
- adjust CHANGELOG/CONTRIBUTING headings for mdBook inclusion
- regenerate demo.gif skip ci
0.5.1 - 2026-04-22
Other
- (bench) refresh benchmark results
- (bench) drop small fixture tier
- (readme) add benchmarks section linking to automated results
- (bench) seed benchmarks.md with methodology and placeholders
- regenerate demo.gif skip ci
0.5.0 - 2026-04-22
Added
- (cli) add --strict flag to escalate warnings to errors
Other
- (readme) document --strict flag and benchmark CI
- (strict) verify strict mode exit codes
0.4.1 - 2026-04-22
Fixed
- (builder) use .get() in dedup_schema_variants to satisfy indexing_slicing lint
- (reader) warn on skipped directory entries and malformed overrides
- (schema) union array element schemas and tighten dict heuristic
Other
- (lint) deny clippy::indexing_slicing at crate level
- extract is_numeric_string and is_uuid to shared module
- (output) lazy-init regex via LazyLock
- (error) replace guarded unwrap sites with pattern matching
0.4.0 - 2026-04-22
Added
- feat!(builder): merge response schemas per status code
- feat!(cli): remove unused --param-regex flag
Other
- (readme) remove --param-regex mention from CLI reference
- (cli) verify --param-regex is rejected as unknown argument
- (builder) cover multi-status response aggregation
- refactor!(error): mark Error enum as non_exhaustive
- regenerate demo.gif skip ci
0.3.0 - 2026-04-22
Added
- (report) track cap firings and parse errors in processing report
- (cli) add --report flag for structured processing summary
- (tnetstring) emit byte offset and error kind on parse halt
Other
- (readme) document --report flag and parse halt diagnostics
- (report) verify report file schema and contents
- (tnetstring) verify parse halt diagnostics and no-resync on binary payload
0.2.6 - 2026-04-22
Fixed
- (test) gate Unix-specific path-failure test behind cfg(unix)
- (output) write YAML via tempfile and atomic rename
Other
- (output) verify atomic write preserves target on failure
- (deps) move tempfile to runtime dependencies
0.2.5 - 2026-04-22
Fixed
- (builder) skip requests with unknown HTTP methods instead of aliasing to GET
Other
- (builder) verify unknown method is skipped and standard methods preserved
0.2.4 - 2026-04-22
Fixed
- (params) preserve multi-byte UTF-8 in urlencoding_decode
Other
- (params) add UTF-8 roundtrip and overlong rejection cases
- regenerate demo.gif skip ci
0.2.3 - 2026-04-22
Fixed
- (builder) cap form-field count per request at 1000
- (har) validate schemes and status codes, log base64 failures, cap bodies
- (reader) validate port/status ranges, enforce strict UTF-8, and cap field sizes
Other
- (readme) document per-field size and validation limits
0.2.2 - 2026-04-22
Added
- (har) add streaming HAR entry iterator
Other
- (readme) mention HAR streaming in resource limits and supported formats
- (har) verify streaming does not materialize all entries
- (reader) switch HAR dispatch to streaming iterator
- regenerate demo.gif skip ci
0.2.1 - 2026-04-22
Added
- (reader) add stream_mitmproxy_file and stream_mitmproxy_dir
- (tnetstring) add streaming iterator TNetStringIter
Other
- (readme) document resource-limit flags and streaming behavior
- (main) switch discover and generate to streaming pipeline
- (path_matching) cache compiled regexes in CompiledTemplates
- (builder) add discover_paths_streaming variant
- regenerate demo.gif skip ci
0.2.0 - 2026-04-22
Added
- (path_matching) validate path parameter identifiers
- (cli) expose --max-input-size, --max-payload-size, --max-depth, --max-body-size, --allow-symlinks
- (reader) reject symlinks, non-regular files, and oversized inputs
- (schema) enforce 64-level JSON recursion depth limit
- (tnetstring) enforce 256-level recursion depth limit
- (tnetstring) cap payload size at 256 MiB
- (error) add typed variants for parse and input limits
Fixed
- (test) gate symlink and FIFO tests behind cfg(unix)
Other
- update Cargo.lock for globset dependency
- (security) cover symlink, FIFO, and oversize input rejection
- (har) bound format-detection read to 4 KiB
- (builder) replace custom glob matcher with globset
0.1.2 - 2026-04-22
Other
- add tests/fixtures/poc placeholder directory (P0.2)
- regenerate demo.gif skip ci
0.1.1 - 2026-04-20
Other
- (readme) add Why? section explaining the Python-vs-Rust trade-off
- (deps) bump assert_cmd from 2.2.0 to 2.2.1 in the all-cargo group (#7)
- regenerate demo.gif skip ci
0.1.0 - TBD
Initial release.
Contributing
This document covers how to run the three test tracks locally.
Prerequisites
| Tool | Required for | Install |
|---|---|---|
| Rust toolchain | Build + unit tests | rustup.rs |
| Docker + Compose | All integration tests | docs.docker.com |
| oasdiff | Level 1 diff validation | `go install github.com/tufin/oasdiff@latest` or `brew install oasdiff` |
| Node.js + npm | Level 2 | nodejs.org |
| Playwright | Level 2 | npx playwright install --with-deps chromium |
| VHS | Demo GIF | brew install vhs or charm apt repo |
| ffmpeg, gifski, gifsicle | Demo GIF optimization | System package manager |
Build
cargo build --release
# Binary: target/release/mitm2openapi
Unit Tests
cargo test
Level 1 — Petstore Golden Test (~2 min)
Full pipeline (compose up, seed, discover, generate, diff, teardown):
tests/integration/level1/run.sh
Strict mode (--fail-on WARN instead of BREAKING):
tests/integration/level1/run.sh --strict
Manual step-by-step:
cd tests/integration/level1
docker compose up -d
# Wait for petstore healthcheck...
../../ci/petstore/seed.sh
# Run mitm2openapi discover/generate against the proxy
# ...
docker compose down -v
Gotcha: The seed script sends requests through the mitmproxy proxy to `petstore:8080` (the Docker service name), not `localhost`. This is intentional — traffic must flow through the proxy to be captured.
Level 2 — crAPI + Playwright (~8 min)
cd tests/integration/level2
# Start crAPI stack (identity + community + workshop + web + mongo + postgres + mailhog + mitmproxy)
make up
# No seed needed — crAPI auto-seeds on first boot
# Run Playwright scenarios
npm install
npx playwright install --with-deps chromium
npx playwright test
# Cleanup
make down
Port conflict: Level 1 and Level 2 both use port 8080 (for different services). Do not run both stacks simultaneously.
Demo GIF (Phase 2 terminal recording)
cd ci/demo
make phase2 # VHS recording
make gif # gifski + gifsicle optimization
make clean # remove outputs
Phase 2 uses a real capture, not a committed fixture. The GHA workflow copies `tests/integration/level2/out/crapi.flow` (produced by Phase 1) into `ci/demo/out/demo.flow` before running the tape. Locally, do the same: run Phase 1 first, then `cp tests/integration/level2/out/crapi.flow ci/demo/out/demo.flow`.
Filtering discover output
Captures from real apps include static assets (/static/css/main.*.css,
/images/*.svg, etc.) which bloat the generated OpenAPI spec. Two flags
on discover handle this:
mitm2openapi discover \
-i capture.flow -o templates.yaml -p http://api.example.com \
--exclude-patterns '/static/**,/images/**,*.css,*.js,*.svg,*.png,*.jpg' \
--include-patterns '/api/**,/v2/**'
- `--exclude-patterns`: paths matching any glob are dropped entirely (not even emitted as `ignore:`).
- `--include-patterns`: paths matching any glob are emitted without the `ignore:` prefix (i.e. auto-activated for `generate`). Everything else still gets `ignore:` for manual review.
Globs: `*` matches a single path segment, `**` matches any subtree.
Together they let `generate` run with no intermediate `sed` or editor
step — useful for automated pipelines like the demo GIF.
Ports Reference
| Stack | Service | Port |
|---|---|---|
| Level 1 | Petstore | 8080 |
| Level 1 | mitmproxy | 8081 |
| Level 2 | crAPI web | 8888 |
| Level 2 | mailhog | 8025 |
| Level 2 | mitmproxy | 8080 |
| Demo | Swagger UI | 8088 |
Cleanup
All compose stacks use docker compose down -v to remove containers and volumes.
CI Workflows
| Workflow | Trigger | Notes |
|---|---|---|
| `integration-level1.yml` | Every PR | Naive (required) + strict (informational) |
| `integration-level2.yml` | Nightly + manual dispatch | Full crAPI + Playwright |
| `demo-gif.yml` | Push to main (path-filtered) + manual dispatch | Terminal recording |