
Internet Archive Analyser

Use cases

  • Planning migrations by understanding old site structure
  • Recovering lost URLs and content from archives
  • Analysing competitor site evolution over time
  • Finding historical content for link reclamation

Queries the Wayback Machine CDX server to analyse how a site evolved over time.
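As a rough illustration of what such a query involves, here is a minimal sketch that builds a CDX request URL and turns the JSON response into records. It assumes the public endpoint `https://web.archive.org/cdx/search/cdx` and its documented `url`, `matchType`, `output`, `fl`, `from` and `to` parameters. The helper names `build_cdx_url` and `parse_cdx_json` are hypothetical and not taken from the app's source.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_url(domain: str, from_year: int | None = None, to_year: int | None = None) -> str:
    """Build a CDX query URL covering captures under a domain (hypothetical helper)."""
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains; "host" restricts to the exact host
        "output": "json",       # the first row of the JSON response is the field-name header
        "fl": "timestamp,original,statuscode,mimetype,digest",
    }
    if from_year is not None:
        params["from"] = str(from_year)
    if to_year is not None:
        params["to"] = str(to_year)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def parse_cdx_json(body: str) -> list[dict[str, str]]:
    """Turn the CDX array-of-arrays JSON into a list of dicts keyed by field name."""
    rows = json.loads(body)
    if not rows:  # an empty result set comes back as []
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]
```

Fetching is deliberately left out; something like `urllib.request.urlopen(url).read().decode()` would supply the body to `parse_cdx_json`. Large domains can return a very large number of rows, so a real tool would likely narrow the date range or paginate.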

Tracks folder structure changes, HTTP status codes, frequently modified pages, and robots.txt history with diff-based version comparison.
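The folder and status-code views amount to grouping capture records by year. The sketch below shows one way to do that. It assumes records shaped like CDX JSON rows (`timestamp`, `original`, `statuscode`) and buckets each URL by its first path segment. The function names are hypothetical, and the app's own bucketing rules may differ.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

def top_folder(url: str) -> str:
    """Return the first path segment, e.g. '/blog/' for 'http://x.com/blog/post'."""
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    if not segments:
        return "/"  # the site root itself
    if len(segments) == 1 and not path.endswith("/"):
        return "/"  # a file directly in the root, e.g. /about.html
    return f"/{segments[0]}/"

def status_class(code: str) -> str:
    """Map '301' -> '3xx'; non-numeric values such as '-' become 'unknown'."""
    if len(code) == 3 and code.isdigit() and code[0] in "12345":
        return f"{code[0]}xx"
    return "unknown"

def folders_by_year(records):
    """Count captures per top-level folder for each year (timestamp is YYYYMMDDhhmmss)."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r["timestamp"][:4]][top_folder(r["original"])] += 1
    return counts

def status_classes_by_year(records):
    """Count captures per status-code class (1xx-5xx) for each year."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r["timestamp"][:4]][status_class(r["statuscode"])] += 1
    return counts
```

Keeping only the N largest folders per year (for example with `Counter.most_common(n)`) would correspond to the "top folders count" setting.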

Includes Plotly visualisations and CSV export.
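For the CSV side, a minimal sketch with the standard `csv` module might look like this. It assumes the same CDX-style record dicts, and the chosen columns are illustrative rather than the app's actual export schema.

```python
import csv
import io

def records_to_csv(records, fields=("timestamp", "original", "statuscode", "mimetype")) -> str:
    """Serialise record dicts to CSV text; extra keys are dropped, missing ones left blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

In a Streamlit app, the returned string could be passed as the `data` argument of `st.download_button` so the browser offers it as a file.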

Streamlit App

Platform

Browser-based (no installation required)

Input

Domain name

Analysis settings

Output

Historical analysis with charts and CSV export


Features

  • Folder structure evolution tracking (annual breakdown)
  • HTTP status code analysis (1xx-5xx groupings over time)
  • Frequently changed pages identification
  • robots.txt timeline with diff-based version comparison
  • Stacked line/bar chart visualisations (Plotly)
  • Configurable filters (All files, HTML only, HTML + Images)
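One plausible way to identify frequently changed pages is to count how many distinct content digests the archive holds for each URL, since CDX `digest` values only change when the captured content changes. The file-type filters can be approximated by matching on `mimetype`. The sketch below illustrates both. The filter table and function names are hypothetical and only mirror the option labels listed above.

```python
from collections import defaultdict

# Hypothetical filter table mirroring the option labels; each maps a mimetype to keep/drop
FILTERS = {
    "All files": lambda mime: True,
    "HTML only": lambda mime: mime == "text/html",
    "HTML + Images": lambda mime: mime == "text/html" or mime.startswith("image/"),
}

def apply_filter(records, name):
    """Keep only records whose mimetype passes the named filter."""
    keep = FILTERS[name]
    return [r for r in records if keep(r.get("mimetype", ""))]

def most_changed_pages(records, top_n=10):
    """Rank URLs by their number of distinct content digests (ties broken alphabetically)."""
    digests = defaultdict(set)
    for r in records:
        digests[r["original"]].add(r["digest"])
    ranked = sorted(digests.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(url, len(ds)) for url, ds in ranked[:top_n]]
```

The same page can appear under several `original` spellings (http vs https, with or without `www.`), so grouping on a normalised key such as the CDX `urlkey` field may give more accurate counts.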

How to use

  1. Enter the domain you want to analyse
  2. Select visualisation type and top folders count
  3. Choose file type filter
  4. Run the query against the Wayback Machine CDX server
  5. Explore tabs: Folder Structure, Status Codes, Changed Pages, robots.txt
  6. Compare robots.txt versions with highlighted diffs
  7. Export filtered URL list as CSV
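The robots.txt comparison in step 6 can be approximated with the standard `difflib` module. The sketch below assumes the robots.txt captures have already been listed via a CDX query. It keeps only captures whose digest changed, builds a raw-content snapshot URL using the Wayback `id_` modifier, and produces a unified diff between two versions. These helpers are hypothetical and are not the app's code.

```python
import difflib

def distinct_versions(records):
    """Keep captures whose digest differs from the previous capture, in timestamp order."""
    versions, last = [], None
    for r in sorted(records, key=lambda r: r["timestamp"]):
        if r["digest"] != last:
            versions.append(r)
            last = r["digest"]
    return versions

def snapshot_url(timestamp: str, original: str) -> str:
    """URL for the archived bytes; 'id_' asks Wayback for the unmodified original content."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"

def robots_diff(old: str, new: str, old_label: str, new_label: str) -> list[str]:
    """Line-by-line unified diff of two robots.txt bodies ('+' added, '-' removed)."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))
```

Only the changed captures need fetching. Diffing neighbouring versions then shows exactly which `Disallow` or `Allow` rules appeared or disappeared between them.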

Let's work together

Monthly retainers or one-off projects. No lengthy reports that sit in a drawer.
