Getting Started

This guide walks you through your first complete WebArc workflow: capturing a website locally and replaying it in your browser.

By the end, you will:

- Run WebArc locally
- Capture real HTTP traffic into an archive
- Browse the archived content without accessing the original site

Build

git clone https://git.hydrar.de/jmarya/webarc
cd webarc
cargo build --release

Start WebArc

For a first run, WebArc can start all essential components at once:

# Set environment variables for archive
export DB_URL="postgres://user:password@127.0.0.1:5432/webarc" # Metadata DB
export S3_URL="https://s3.example.com/bucket" # Blob Store
# Auth for S3
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

webarc serve

This starts:

- A local server (default: localhost:8000)
- A local HTTP proxy (default: localhost:3000)
- An archive (created automatically if it does not exist)

WebArc will log what it is doing; keep this process running.
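
Because the startup step depends on four environment variables, a small pre-flight check can catch a missing one before the server starts. The following is a minimal sketch, not part of WebArc: check_webarc_env is a hypothetical helper, and it only tests that the variables named in this guide are non-empty, not that the database or blob store is reachable.

```shell
# Hypothetical helper (not shipped with WebArc): report which of the
# environment variables used in this guide are unset or empty.
check_webarc_env() {
  missing=0
  for name in DB_URL S3_URL AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Return 0 when everything is set, 1 otherwise.
  return "$missing"
}

# Possible usage: only start the server when the check passes.
#   check_webarc_env && webarc serve
```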

Capture a Website

To capture content, route traffic through the WebArc proxy.

Option A: Using your browser

Configure your browser to use an HTTP proxy:
- Host: localhost
- Port: 3000 (the proxy default listed above, not the server port)

Then visit any website (for example, a documentation page or a blog).

As you browse, WebArc records the HTTP requests and responses into the archive.

Option B: Using command-line tools

You can also capture traffic using tools like curl or wget:

https_proxy=http://localhost:3000 curl https://example.org
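
To capture several pages in one go, the same idea can be wrapped in a loop. This is a sketch under assumptions: capture_urls is a hypothetical helper, and PROXY defaults to the proxy address listed under Start WebArc (localhost:3000), so adjust it if your proxy listens elsewhere. The response bodies are discarded, since the point is only to let the proxy record each exchange.

```shell
# Assumed proxy address; override by setting PROXY before loading this.
PROXY="${PROXY:-http://localhost:3000}"

# Hypothetical helper: fetch every URL read from stdin through the proxy.
capture_urls() {
  while IFS= read -r url; do
    # Skip blank lines and lines starting with '#'.
    case "$url" in
      ''|'#'*) continue ;;
    esac
    # --proxy routes the request through WebArc; --output /dev/null
    # throws the body away after the proxy has seen it.
    curl --silent --show-error --proxy "$PROXY" --output /dev/null "$url" \
      || echo "failed: $url" >&2
  done
}

# Possible usage, with one URL per line in a file of your choosing:
#   capture_urls < urls.txt
```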

Warning

For HTTPS sites, WebArc creates a self-signed certificate for the proxy. As with any MITM proxy, your client must trust that certificate. See TLS.
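
For curl, one way to trust the proxy's certificate for a single request is the --cacert option. The sketch below assumes you have the certificate as a PEM file; this guide does not say where WebArc writes it, so the path is passed in as an argument and capture_https is a hypothetical helper. See TLS for the actual location and for browser setup.

```shell
# Hypothetical helper: capture one HTTPS URL while trusting the proxy's
# certificate for this request only.
#   $1 = URL to fetch
#   $2 = path to the proxy certificate in PEM format (location not
#        specified in this guide; see TLS)
capture_https() {
  if [ ! -r "$2" ]; then
    echo "certificate not readable: $2" >&2
    return 1
  fi
  # --cacert tells curl which certificate file to use when verifying
  # the TLS peer; the proxy address is an assumed default.
  curl --silent --show-error \
    --proxy "${PROXY:-http://localhost:3000}" \
    --cacert "$2" \
    --output /dev/null "$1"
}

# Possible usage (the certificate path is a placeholder):
#   capture_https https://example.org /path/to/webarc-cert.pem
```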

What Just Happened?

In this short session, WebArc:

- Intercepted real HTTP traffic via the proxy
- Stored requests, responses, and metadata in an archive
- Served that archive back to you through the proxy server

The archive is now a persistent record of what you captured.