Why does the bigrams download only contain 10 rows?

The code keeps only the 10 most common word pairs (Counter.most_common(10)) from the combined page content, so the bigrams CSV is a top-10 shortlist, not an exhaustive frequency table. Only two-word bigrams are generated; there is no setting for longer n-grams.
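The truncation can be illustrated with a minimal sketch. The function name `top_bigrams` and the sample tokens are hypothetical and not taken from the app's source; only `Counter.most_common(10)` comes from the description above.

```python
from collections import Counter

def top_bigrams(tokens, limit=10):
    """Count adjacent word pairs and keep only the `limit` most frequent."""
    # Pair each word with its successor to form two-word bigrams.
    pairs = zip(tokens, tokens[1:])
    # most_common(limit) returns at most `limit` (bigram, count) tuples,
    # ordered from highest to lowest count.
    return Counter(pairs).most_common(limit)

# Illustrative tokens; in the app these would come from the combined page content.
sample = "mig welding guide mig welding tips mig welding guide".split()
shortlist = top_bigrams(sample)
```

However many distinct pairs the content produces, the returned list never exceeds ten entries, which is why the CSV stops at ten rows.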

How many API credits does one analysis use?

One ValueSERP search per submit. The pages slider is multiplied by 10 and sent as the num parameter on a single request (page=1), so selecting 3 pages fetches the top 30 results in one call rather than making three requests. ValueSERP gives 100 free searches on signup, per the app's own notes.
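As a rough sketch of how the slider maps to a single request, the helper below builds the query parameters without sending anything. The function name is hypothetical, and the parameter names other than `num`, `page` and `location_auto` are assumptions about the ValueSERP API that should be checked against its documentation.

```python
def build_search_params(api_key, keyword, location, device, pages):
    """Build the query parameters for one ValueSERP search request."""
    return {
        "api_key": api_key,        # assumed parameter name
        "q": keyword,              # assumed parameter name
        "location": location,      # assumed parameter name
        "location_auto": "true",
        "device": device,          # assumed parameter name
        "page": 1,                 # always the first page, so one request only
        "num": pages * 10,         # slider value scaled to a result count
    }

params = build_search_params(
    "YOUR_API_KEY", "mig welding", "United Kingdom", "desktop", pages=3
)
```

With `pages=3` the request asks for 30 results on page 1, so the run still costs one search credit.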

Why do fewer URLs appear in the results than in the SERP?

Pages that fail to download or extract (blocked bots, 403s, timeouts) are silently skipped; there is no error report. The remaining pages are extracted with Trafilatura with its extraction timeout disabled, so a very heavy page can also make the run feel stuck rather than fail.
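The skipping behaviour can be sketched as below. The function `extract_pages` and the stand-in fetch and extract helpers are hypothetical; with Trafilatura the two callables would typically be `trafilatura.fetch_url` and `trafilatura.extract`, both of which return `None` when they fail. The app's actual code may differ.

```python
def extract_pages(urls, fetch, extract):
    """Return {url: text} for pages that both download and extract."""
    texts = {}
    for url in urls:
        html = fetch(url)
        if html is None:       # download failed (blocked, 403, timeout): skip
            continue
        text = extract(html)
        if not text:           # nothing extractable: skip
            continue
        texts[url] = text
    return texts

# Stand-ins so the sketch runs without network access.
FAKE_PAGES = {
    "https://example.com/ok": "Some article text",
    "https://example.com/empty": "   ",
}

def fake_fetch(url):
    return FAKE_PAGES.get(url)   # unknown URLs behave like a failed download

def fake_extract(html):
    return html.strip() or None

texts = extract_pages(
    ["https://example.com/ok", "https://example.com/blocked", "https://example.com/empty"],
    fake_fetch,
    fake_extract,
)
```

Three URLs go in and one comes out, with nothing recorded about the two that were dropped. Collecting the skipped URLs in a second list would be a small change if an error report were wanted.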

How are the title keywords counted?

All fetched page titles are concatenated, special characters are replaced with spaces, and single words appearing more than once are kept. Titles are not lowercased first, so the count is case-sensitive: Welding and welding are tallied separately. Page content, by contrast, is lowercased and has NLTK English stopwords removed before bigram counting.
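A minimal sketch of that title count follows. The function name, the space used to join titles and the exact regular expression are assumptions; the case-sensitive counting and the "more than once" threshold follow the description above.

```python
import re
from collections import Counter

def title_keywords(titles):
    """Count single words across all titles, keeping those seen more than once."""
    combined = " ".join(titles)
    # Replace anything that is not a letter, digit or whitespace with a space.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", combined)
    # No lowercasing here, so "Welding" and "welding" are separate keys.
    counts = Counter(cleaned.split())
    return {word: n for word, n in counts.items() if n > 1}

titles = [
    "Welding Guide: MIG welding basics",
    "MIG Welding for Beginners",
    "Best welding helmets",
]
keywords = title_keywords(titles)
```

Here `Welding` and `welding` each reach a count of 2 and are reported as separate keywords, whereas lowercasing first would have merged them into a single entry with a count of 4.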

Which locations and devices can I search from?

Eight preset regions (United Kingdom, United States, Australia, France, Canada, Germany, Italy, Spain) with ValueSERP's location_auto enabled, and three device types: Desktop, Mobile or Tablet.


SERP N-gram Extractor

Use cases

Content gap analysis
Page title optimisation
Understanding SERP content patterns
Competitive content research

Fetches SERP results via the ValueSERP API and extracts page content using Trafilatura with its extraction timeout disabled.

Generates bigrams via custom find_ngrams() using zip iteration.

Uses NLTK English stopwords filtering and collections.Counter for frequency analysis.

Normalises text with special character removal and lowercase conversion.
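The content pipeline described above can be sketched as follows. The `find_ngrams` name comes from the tool's description, but its signature here is an assumption, as are `normalise`, the regular expression and the tiny stand-in stopword set (the app uses NLTK's English stopwords instead).

```python
import re

# Stand-in stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "for", "of", "to", "in", "is"}

def normalise(text):
    """Strip special characters, lowercase, and drop stopwords."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return [word for word in text.lower().split() if word not in STOPWORDS]

def find_ngrams(tokens, n):
    """Return n-grams by zipping the token list against shifted copies of itself."""
    return list(zip(*[tokens[i:] for i in range(n)]))

tokens = normalise("The best MIG welder for beginners: a guide")
bigrams = find_ngrams(tokens, 2)
```

Because stopwords are removed before pairing, words that were separated by a stopword in the original text become adjacent: `welder` and `beginners` form a bigram here even though `for` sat between them.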

Platform

Python script (requires Python 3.x)

Input

ValueSERP API key

Target search keyword

Geographic location

Device type (Desktop, Mobile, Tablet)

Output

Three CSVs: top 10 content bigrams with frequency counts, title keywords (frequency > 1), SERP titles with URLs.
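For a sense of what such an export might look like, here is a small sketch that serialises rows to CSV text in memory. The helper name and the column headers are hypothetical; the app's actual file names and columns are not stated on this page.

```python
import csv
import io

def to_csv(header, rows):
    """Serialise a header and rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical column names and sample rows for the bigrams export.
bigrams_csv = to_csv(
    ["bigram", "count"],
    [("mig welding", 3), ("welding guide", 2)],
)
```

The same helper would cover the title keywords and the titles-with-URLs exports by changing the header and rows.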


Features

ValueSERP API integration (one search request per run)
Trafilatura content extraction with timeout disabled
Bigram (two-word) generation from combined page content
NLTK English stopwords filtering
Top 10 bigrams by frequency exported to CSV
8 preset search regions; Desktop, Mobile or Tablet device

How to use

1. Enter your ValueSERP API key
2. Input target keyword and select a region
3. Choose device type and how many top results to fetch (10-100)
4. Click Submit to fetch the SERP in a single API request
5. Trafilatura extracts text from each ranking page
6. Review bigrams, title keywords and extracted titles
7. Download the three CSV files for content planning

Want me to run this for you?

I run this tool as a managed service, or build something custom around your data. You get the insights without touching the code.

Need something built for your business?

This tool started as bespoke client work. I build custom scripts, data pipelines, and full apps for SEO and product data problems that off-the-shelf tools don't solve.