Configuration
WebArc is configurable via a YAML file. The configuration controls how archives are stored, refreshed, and captured, down to individual domains and URL paths.
Top-Level Options
route_internal: bool
- Rewrite links inside archived pages to point back to the local archive.
- Prevents browsers from accidentally fetching live content.
enable_fetch: bool
- Enable on-demand fetching of missing resources.
- Useful for hybrid capture/replay setups.
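
As a minimal sketch, the two top-level options could appear in the YAML file as follows. The values are illustrative only and are not documented defaults.

```yaml
# Top-level options (illustrative values, not documented defaults)
route_internal: true   # rewrite links in archived pages to point at the local archive
enable_fetch: true     # fetch missing resources on demand
```

Setting both to true corresponds to the hybrid capture/replay setup mentioned above: pages are replayed from the archive, and anything missing is fetched when requested.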
Requests Section
The requests section controls what gets captured and how. It defines:
- Global defaults for all domains
- Domain-specific rules
- Path-specific overrides
Blacklisted Domains
- Domains matching these regexes are never fetched or archived.
- Useful for dynamic, private, or otherwise problematic sites.
Global Defaults
Global defaults apply everywhere unless overridden.
The available options are described in the RequestConfigValues section.
Domain Configuration
- domain: the domain this config applies to
- global: overrides global defaults for this domain (see RequestConfigValues)
- path_match: optional path-specific rules
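WebArc uses cascading rules: global defaults apply first, domain-level settings override them, and path-specific rules override both.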
Path-Specific Overrides
- path: regex to match the URL path
- apply: RequestConfigValues applied to matching requests
This lets you handle dynamic pages, skip irrelevant content, or force re-fetches.
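
A hedged sketch of a domain entry with two path rules, one that skips content and one that forces re-fetches. The domain `example.org`, both path patterns, and the values are placeholders. The patterns assume the regex is searched against the URL path, in the same style as the Arch Linux example in this document.

```yaml
# Hypothetical domain entry; domain name and patterns are placeholders
- domain: "example.org"
  global:
    keep_n: 3
  path_match:
    # Skip temporary files entirely: never fetch or store them
    - path: "\\.tmp$"
      apply:
        drop: true
    # Always fetch the feed fresh instead of returning it from the archive
    - path: "/feed\\.xml$"
      apply:
        always_fetch: true
```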
RequestConfigValues
RequestConfigValues is the set of options that can be applied globally, per-domain, or per-path.
| Option | Type | Description |
|---|---|---|
| outdated | duration string | How long before a cached response is considered stale (e.g., 10d, 30month) |
| keep_n | integer | Number of snapshots to retain per resource |
| always_fetch | bool | If true, always fetch fresh content; never return from archive |
| drop | bool | If true, skip this resource entirely; never fetch or store |
Examples:
- Global default:
- Domain override:
- Path-specific override:
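
A minimal sketch of each case, written as three separate YAML documents. The domain `example.org`, the path pattern, and all values are placeholders. The first fragment shows only the option values, because this page does not name the key that holds the global defaults inside the requests section.

```yaml
# 1. Global default: option values only (enclosing key not shown, see note above)
outdated: "30month"
keep_n: 5
---
# 2. Domain override: a hypothetical domain that goes stale sooner
- domain: "example.org"
  global:
    outdated: "10d"
---
# 3. Path-specific override: keep fewer snapshots for one subtree of that domain
- domain: "example.org"
  global:
    outdated: "10d"
  path_match:
    - path: "/static/"
      apply:
        keep_n: 1
```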
Real-World Example: Arch Linux Mirror

    - domain: "geo.mirror.pkgbuild.com"
      global:
        outdated: "10d"
        keep_n: 3
      path_match:
        - path: "\\.db(\\.sig)?$"
          apply:
            always_fetch: true
            outdated: "10s"
            keep_n: 3

- Regular files: refreshed every 10 days, keep 3 snapshots
- DB files: always fetched, keep 3 snapshots