WebArc

WebArc is a local-first web archiving system designed to capture, preserve, and replay HTTP content in an extensible way.

Unlike traditional crawlers that fetch pages in isolation, WebArc focuses on recording real HTTP traffic, storing it in an archive, and making that archive usable through multiple interfaces.

About

WebArc is a toolchain for:

  • Capturing HTTP(S) traffic into a persistent archive
  • Replaying archived content locally as if it were still online
  • Inspecting archived data at different abstraction levels
  • Integrating with existing tools and workflows

WebArc is local-first by design: archives live on your machine, and you control how they are created, served, and accessed.

Core Concepts

At a high level, WebArc is built around three ideas:

  1. Capture
    HTTP traffic is intercepted or fetched and written into an archive.

  2. Archive
    The archive is the authoritative record of requests, responses, and metadata.

  3. Access
    Archived content can be accessed in multiple ways:

      • Served over HTTP
      • Proxied to other tools
      • Mounted as a filesystem
      • Queried or processed programmatically

You don’t need to use every component: WebArc is intentionally modular, so you can adopt capture, the archive, or any access interface on its own.
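
To make the capture, archive, and access split concrete, here is a minimal Python sketch of the idea. It is not WebArc's actual API or storage format: the `Record` and `Archive` classes, their fields, and the `capture` and `replay` methods are hypothetical names chosen for illustration, and the archive is held in memory rather than on disk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One captured exchange (hypothetical shape, not WebArc's real schema)."""
    method: str
    url: str
    status: int
    headers: dict
    body: bytes
    captured_at: str  # ISO 8601 timestamp supplied by the caller


@dataclass
class Archive:
    """In-memory stand-in for the archive; a real one would persist to disk."""
    records: list = field(default_factory=list)

    def capture(self, record: Record) -> None:
        # Capture: append the exchange; earlier records are never modified.
        self.records.append(record)

    def replay(self, method: str, url: str) -> Optional[Record]:
        # Access: return the most recent capture matching the request, if any.
        for record in reversed(self.records):
            if record.method == method and record.url == url:
                return record
        return None


archive = Archive()
archive.capture(Record(
    method="GET",
    url="https://example.org/",
    status=200,
    headers={"Content-Type": "text/html"},
    body=b"<h1>hello</h1>",
    captured_at="2024-01-01T00:00:00Z",
))

hit = archive.replay("GET", "https://example.org/")
miss = archive.replay("GET", "https://example.org/missing")
print(hit.status if hit else None)  # prints 200
print(miss)                         # prints None
```

In this sketch the archive is the only shared state: capture only appends to it and access only reads from it, which is what would let each side be used, or swapped out, independently.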