
Template Fingerprinting

Use cases

  • Auditing large sites by template instead of individual pages
  • Finding pages using wrong or outdated templates
  • Understanding site structure and page types
  • Prioritising template-level technical SEO fixes

Classifies pages into template groups using TF-IDF vectorisation and K-Means clustering on HTML structure.

Extracts four feature dimensions: tag counts, CSS classes, ID attributes, and meta tags.
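
The app's own extraction code isn't reproduced here. As a rough sketch of what the four feature dimensions could look like, the hypothetical `extract_features` helper below walks a page with Python's standard-library `html.parser` and turns it into one token string. Each tag is repeated once per occurrence, so term frequency reflects tag counts. Every CSS class, `id` value and meta `name`/`property` becomes its own token. The prefixes (`tag:`, `class:`, `id:`, `meta:`) are assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collects structural tokens from an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here via handle_startendtag
        self.tags[tag] += 1
        attrs = dict(attrs)
        # CSS classes: one token per class name
        for cls in (attrs.get("class") or "").split():
            self.tokens.append(f"class:{cls}")
        # ID attributes
        if attrs.get("id"):
            self.tokens.append(f"id:{attrs['id']}")
        # Meta tags: keyed by name or property, ignoring the content value
        if tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key:
                self.tokens.append(f"meta:{key}")


def extract_features(html: str) -> str:
    """Return a space-separated token string covering the four feature dimensions."""
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    # Repeat each tag token by its count so term frequency reflects tag counts
    tag_tokens = [f"tag:{t}" for t, n in parser.tags.items() for _ in range(n)]
    return " ".join(tag_tokens + parser.tokens)
```

Content text is ignored on purpose: two product pages with different copy but the same markup should produce near-identical token strings.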

Uses 5 clusters by default, with a fixed random state (42) so that repeated runs produce the same results.
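
Once every page is a token string, TF-IDF turns it into a weighted vector and K-Means groups similar vectors. The dependency-free sketch below assumes whitespace-separated tokens. Its IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1 with L2-normalised rows, the same defaults as scikit-learn's `TfidfVectorizer`. `kmeans` is a plain Lloyd's algorithm seeded with 42. Both function names are hypothetical. The seed plays the same role as the app's random state but will not reproduce its exact clusters. In practice a library implementation such as scikit-learn's `KMeans(n_clusters=5, random_state=42)` would be the usual choice.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """TF-IDF on whitespace tokens: raw counts x smoothed IDF, then L2-normalised rows."""
    tokenised = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n = len(docs)
    # Document frequency: how many pages contain each token at least once
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against empty pages
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=5, random_state=42, max_iter=100):
    """Lloyd's K-Means; the fixed seed makes the initial centroids, and so the result, reproducible."""
    rng = random.Random(random_state)
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each page joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no page changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member pages
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Calling `kmeans(tfidf_vectors(docs))` twice on the same input returns identical labels, which is the property the fixed random state is there to guarantee.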

Streamlit App

Platform

Browser-based (no installation required)

Input

Crawl CSV with URLs

Output

CSV with template cluster assignments


Features

  • TF-IDF vectorisation of HTML structural features
  • K-Means clustering (configurable cluster count, default 5)
  • Four feature dimensions: tag counts, CSS classes, IDs, meta tags
  • Reproducible results (random state 42)
  • Bulk URL fetching with progress indicator
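
A minimal sketch of bulk fetching, assuming plain sequential requests with Python's `urllib`. The helper names `fetch_html` and `fetch_all`, the timeout and the User-Agent string are all hypothetical. The app's progress indicator is stood in for by a printed `[i/total]` line. The `fetch` parameter lets you swap in another downloader or a stub for testing.

```python
import urllib.request
from typing import Callable, Dict, List, Optional


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it; the User-Agent value is a placeholder."""
    request = urllib.request.Request(url, headers={"User-Agent": "template-fingerprint-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(
    urls: List[str], fetch: Callable[[str], str] = fetch_html
) -> Dict[str, Optional[str]]:
    """Fetch URLs one by one, printing "[i/total]" progress; failed URLs map to None."""
    results: Dict[str, Optional[str]] = {}
    for i, url in enumerate(urls, start=1):
        try:
            results[url] = fetch(url)
            status = "ok"
        except (OSError, ValueError, LookupError) as exc:  # URLError, timeouts, bad URLs/charsets
            results[url] = None
            status = f"failed ({exc})"
        print(f"[{i}/{len(urls)}] {status}: {url}")
    return results
```

A failed request is recorded as `None` rather than aborting the whole batch, so those URLs can be excluded from clustering or retried.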

How to use

  1. Upload CSV with URL list (requires "Address" column)
  2. Set number of template clusters to detect
  3. Run analysis (fetches HTML, extracts features, clusters)
  4. Review cluster assignments (Type 0, Type 1, etc.)
  5. Download CSV with original data plus Cluster and Page Type columns
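
The final step of the workflow above can be sketched with the standard-library `csv` module. The hypothetical `add_cluster_columns` helper checks for the required "Address" column. It then appends a numeric `Cluster` column and a `Page Type` label in the "Type 0", "Type 1" style. It assumes `labels` holds one cluster number per row, in the same order as the CSV. The real app's export may differ in column order or formatting.

```python
import csv
import io
from typing import List


def add_cluster_columns(csv_text: str, labels: List[int]) -> str:
    """Return the original crawl CSV with Cluster and Page Type columns appended."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if "Address" not in fieldnames:
        raise ValueError('Input CSV needs an "Address" column')
    rows = list(reader)
    if len(rows) != len(labels):
        raise ValueError("Expected one cluster label per CSV row")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames + ["Cluster", "Page Type"])
    writer.writeheader()
    for row, label in zip(rows, labels):
        row["Cluster"] = label
        row["Page Type"] = f"Type {label}"  # human-readable label, e.g. "Type 0"
        writer.writerow(row)
    return out.getvalue()
```

Keeping every original column means the export can be joined back onto other crawl data by `Address`.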
