WebArc
WebArc is a local-first web archiving system designed to capture, preserve, and replay HTTP content in an extensible way.
Unlike traditional crawlers that fetch pages in isolation, WebArc focuses on recording real HTTP traffic, storing it in an archive, and making that archive usable through multiple interfaces.
About
WebArc is a toolchain for:
- Capturing HTTP(S) traffic into a persistent archive
- Replaying archived content locally as if it were still online
- Inspecting archived data at different abstraction levels
- Integrating with existing tools and workflows
WebArc is local-first by design: archives live on your machine, and you control how they are created, served, and accessed.
Core Concepts
At a high level, WebArc consists of three ideas:
- Capture: HTTP traffic is intercepted or fetched and written into an archive.
- Archive: The archive is the authoritative record of requests, responses, and metadata.
- Access: Archived content can be accessed in multiple ways:
  - Queried or processed programmatically
You don’t need to use every component; WebArc is intentionally modular.
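To make the capture, archive, and access split concrete, here is a minimal in-memory sketch in Python. It is not WebArc's API or storage format: the `Record` and `Archive` classes, their fields, and the `capture`, `replay`, and `query` methods are all hypothetical names chosen for illustration, and a real archive would persist to disk rather than hold records in a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """One captured exchange (hypothetical shape, not WebArc's format)."""
    method: str
    url: str
    status: int
    body: bytes
    captured_at: str  # ISO 8601 timestamp supplied by the caller


@dataclass
class Archive:
    """Append-only store that acts as the authoritative record."""
    records: List[Record] = field(default_factory=list)

    def capture(self, record: Record) -> None:
        # Capture: write an observed exchange into the archive.
        self.records.append(record)

    def replay(self, method: str, url: str) -> Optional[Record]:
        # Access (replay): return the most recent matching capture, if any.
        for record in reversed(self.records):
            if record.method == method and record.url == url:
                return record
        return None

    def query(self, status: int) -> List[Record]:
        # Access (programmatic): filter records by response status.
        return [r for r in self.records if r.status == status]


archive = Archive()
archive.capture(
    Record("GET", "https://example.org/", 200, b"<h1>hi</h1>", "2024-01-01T00:00:00Z")
)
archive.capture(
    Record("GET", "https://example.org/missing", 404, b"", "2024-01-01T00:00:05Z")
)
```

In this sketch, `archive.replay("GET", "https://example.org/")` returns the stored response without any network access, and `archive.query(404)` returns the single failed request. Keeping the three roles as separate methods on one store mirrors the modularity described above: a caller can use capture and query without ever replaying anything.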