Content Extractor

Use cases

  • Content audits
  • Striking distance keyword analysis
  • Competitor content extraction
  • Bulk content gathering for analysis

Uses BeautifulSoup to remove script/style/nav/footer/header elements, replaces <br> with newlines, and normalises whitespace.
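The cleaning step described above could look roughly like the following. This is a minimal sketch, not the app's actual source: the function name `extract_text`, the `html.parser` backend and the space separator passed to `get_text` are assumptions made for illustration.

```python
import re

from bs4 import BeautifulSoup

# Tags removed before text extraction (the list named in the description above).
STRIP_TAGS = ["script", "style", "nav", "footer", "header"]


def extract_text(html: str) -> str:
    """Return the visible text of a page with boilerplate elements removed."""
    soup = BeautifulSoup(html, "html.parser")

    # Drop each unwanted element together with everything inside it.
    for tag in soup(STRIP_TAGS):
        tag.decompose()

    # Turn explicit line breaks into newline characters.
    for br in soup.find_all("br"):
        br.replace_with("\n")

    text = soup.get_text(separator=" ")

    # Normalise whitespace: collapse runs of spaces and tabs,
    # trim each line, and discard lines that end up empty.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

A space separator keeps text from adjacent block elements apart, at the cost of occasionally splitting a word that is broken up by inline tags.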

Runs requests concurrently with ThreadPoolExecutor (1-10 workers).
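As a rough illustration of the concurrency model, the sketch below fans a URL list out over a thread pool and collects one result per URL. The helper name `run_concurrently`, the `fetch` callback and the shape of the error dictionary are hypothetical; only the use of `ThreadPoolExecutor` and the 1-10 worker range come from the description above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_concurrently(
    urls: list[str],
    fetch: Callable[[str], dict],
    max_workers: int = 5,
) -> list[dict]:
    """Call fetch(url) for every URL in parallel and return results in input order."""
    workers = max(1, min(10, max_workers))  # keep within the 1-10 range
    results: dict[str, dict] = {}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failing URL should not abort the whole batch.
                results[url] = {"url": url, "status": "error", "error": str(exc)}

    return [results[url] for url in urls]
```

Passing the fetch function in as a parameter keeps the pool logic independent of the HTTP client, which also makes it easy to exercise without network access.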

Randomised rate limiting: each request waits between 0.5x and 1.5x the configured delay, which reduces the chance of being blocked.
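The randomised delay can be sketched in a few lines. The function names `jittered_delay` and `polite_sleep` are hypothetical, and a uniform distribution is an assumption; the description only states the 0.5-1.5x range.

```python
import random
import time


def jittered_delay(base_delay: float) -> float:
    """Return a wait time between 0.5x and 1.5x the configured delay."""
    return base_delay * random.uniform(0.5, 1.5)


def polite_sleep(base_delay: float) -> None:
    """Pause for a randomised interval before the next request."""
    time.sleep(jittered_delay(base_delay))
```

With a configured delay of 2.0 seconds, for example, each pause falls somewhere between 1.0 and 3.0 seconds, so requests do not arrive at a perfectly regular rhythm.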

The User-Agent header is customisable and defaults to a Chrome 120 string.
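One way to apply a custom User-Agent is shown below using the standard library. The exact default string the app sends is not given here, so `DEFAULT_USER_AGENT` is an assumed, typical Chrome 120 value, and `build_request` is a hypothetical helper; the app itself may use a different HTTP client.

```python
from typing import Optional
from urllib.request import Request

# Assumed default: a typical Chrome 120 User-Agent string on Windows.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str, user_agent: Optional[str] = None) -> Request:
    """Build a GET request that carries the chosen User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent or DEFAULT_USER_AGENT})
```

The resulting object can be passed to `urllib.request.urlopen(request, timeout=10)`, where the timeout corresponds to the app's request timeout setting.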

Streamlit App

Platform

Browser-based (no installation required)

Input

URLs via text area (one per line) or CSV upload

Output

CSV or Excel export with the columns URL, Title, H1, Content Length, Status and Error. The app displays extraction progress while it runs.
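For reference, a CSV with these columns could be produced as follows. This is only a sketch using the standard `csv` module: the helper name `results_to_csv` and the short "Error" column label are assumptions, and the app's Excel export goes through openpyxl, which is not shown.

```python
import csv
import io

# Column order as listed in the output description.
COLUMNS = ["URL", "Title", "H1", "Content Length", "Status", "Error"]


def results_to_csv(rows: list[dict]) -> str:
    """Serialise extraction results to CSV text, leaving missing fields blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        restval="",             # blank cell when a row lacks a column
        extrasaction="ignore",  # silently drop keys that are not columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Returning a string rather than writing to disk suits a browser-based app, where the result is handed to a download control.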

Features

  • BeautifulSoup HTML cleaning (removes scripts, nav, footer)
  • ThreadPoolExecutor concurrent requests (1-10 workers)
  • Randomised rate limiting (0.5-1.5x delay)
  • Request timeout slider (5-30 seconds)
  • Customisable User-Agent header
  • CSV and Excel (.xlsx) export via openpyxl

How to use

  1. Enter URLs or upload CSV and select URL column
  2. Configure request delay (0.5-5.0 seconds)
  3. Set concurrent workers (1-10) and timeout (5-30s)
  4. Optionally customise User-Agent
  5. Run extraction
  6. Download CSV or Excel with results
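The settings chosen in steps 2 to 4 can be thought of as one validated configuration object. The sketch below is hypothetical: the class name `ExtractionSettings` and the default values are assumptions, while the allowed ranges are the ones listed in the steps above.

```python
from dataclasses import dataclass
from typing import Optional


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError if value lies outside the inclusive range [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class ExtractionSettings:
    delay: float = 1.0                # seconds between requests (assumed default)
    workers: int = 5                  # concurrent workers (assumed default)
    timeout: int = 10                 # request timeout in seconds (assumed default)
    user_agent: Optional[str] = None  # None means use the app's default

    def __post_init__(self) -> None:
        _check_range("delay", self.delay, 0.5, 5.0)
        _check_range("workers", self.workers, 1, 10)
        _check_range("timeout", self.timeout, 5, 30)
```

For example, `ExtractionSettings(delay=2.0, workers=8, timeout=20)` is accepted, while `ExtractionSettings(workers=11)` raises a `ValueError`.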
