NexGenData

Posted on Jun 20 • Originally published at thenextgennexus.com

How SEO Agencies Audit 1,000+ Client Pages a Week (Without Paying ScreamingFrog)

#webscraping #seo #api #opensource

If you run a boutique SEO agency with 8-15 retainer clients, your monthly reporting flow probably looks like this: pull GSC data on the first of the month, throw the URL list into ScreamingFrog SEO Spider, wait 90 minutes for the crawl to finish on your laptop, copy the Lighthouse scores into a Google Sheet, then spend Sunday night formatting the deliverable for Monday's client call. The crawl itself eats one machine for half a workday because Lighthouse audits are CPU-bound and ScreamingFrog runs them sequentially in a single thread on the desktop license. The Cloud version helps but the SEO Spider Lighthouse integration still pulls scores from the PageSpeed Insights API one URL at a time with no parallelism above 5 concurrent threads.

ScreamingFrog SEO Spider is a phenomenal tool. The desktop license is GBP 199 per year (about USD 259), which is essentially free for the value it delivers on technical SEO crawls. But it was built as a desktop crawler in 2010, and bulk Lighthouse auditing was bolted on in v12. For agencies that need to run page-speed audits weekly across hundreds or thousands of client URLs, that architecture starts to bite: you cannot schedule unattended crawls on the desktop tier, the Lighthouse API integration is sequential, and the output is a CSV that you still have to wrangle into a per-client dashboard.

This post walks through how a 4-person SEO agency we work with replaced ScreamingFrog's Lighthouse module with a scheduled cloud-based bulk audit pipeline that processes 1,400 URLs across 11 client accounts every Monday morning, hands the results off to Looker Studio, and costs roughly USD 4 per week to run. The crawler stays in ScreamingFrog where it belongs. The Lighthouse audits move to a serverless bulk runner. We will cover the architecture, the cost math, the exact API calls, and the per-client dashboard build.

Why bulk Lighthouse needs its own pipeline

The Google PageSpeed Insights API is free up to 25,000 queries per day per Google Cloud project, with a default quota of 240 queries per minute. A single Lighthouse audit takes 4-12 seconds depending on the page weight and target device. Run 1,400 URLs sequentially and you are waiting roughly 1.5 to 4.5 hours wall-clock (1,400 audits at 4-12 seconds each), and double that if you audit both mobile and desktop. Run them at the API's published rate ceiling (240 RPM) and one pass is done in under 6 minutes, but you have to build:

- A queue and worker pool that respects the per-minute quota
- Retry logic for the inevitable 500s and timeouts (PSI throws roughly 3-5 percent transient errors at peak load)
- Mobile and desktop emulation passes per URL (most clients want both strategies stored)
- A schema for storing 30 fields per audit (5 category scores, 6 lab performance metrics, plus diagnostic opportunities)
- Per-client tagging so you can slice the dataset in your reporting tool
- Day-over-day and week-over-week regression detection so you flag changes before the client does
- A scheduler that fires every Monday at 5am local time without a human present
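To give a sense of what the first two items involve, here is a minimal sketch of a paced worker pool with retries and exponential backoff. Everything in it is illustrative: `runPaced`, its option names, and the default values are assumptions, not part of any library or of the actor described later. The 250 ms spacing is simply 60,000 ms divided by the 240-requests-per-minute quota.

```javascript
// Sketch only: a paced worker pool with retries. All names are hypothetical.
async function runPaced(items, worker, opts = {}) {
  const {
    concurrency = 8,      // simultaneous in-flight audits
    minIntervalMs = 250,  // 60,000 ms / 240 requests = 250 ms between starts
    retries = 3,          // extra attempts after the first failure
    baseDelayMs = 1000,   // backoff doubles each attempt: 1 s, 2 s, 4 s
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;

  const results = new Array(items.length);
  let nextIndex = 0;
  let nextSlot = 0; // earliest time the next request may start

  // Reserve a start slot so requests are spaced at least minIntervalMs apart.
  async function waitForSlot() {
    const now = Date.now();
    const startAt = Math.max(now, nextSlot);
    nextSlot = startAt + minIntervalMs;
    if (startAt > now) await sleep(startAt - now);
  }

  // Each lane pulls the next unclaimed item until the list is exhausted.
  async function lane() {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      for (let attempt = 0; ; attempt++) {
        await waitForSlot(); // retries count against the quota too
        try {
          results[i] = { ok: true, value: await worker(items[i]) };
          break;
        } catch (error) {
          if (attempt >= retries) {
            results[i] = { ok: false, error };
            break;
          }
          await sleep(baseDelayMs * 2 ** attempt);
        }
      }
    }
  }

  const lanes = Array.from({ length: Math.min(concurrency, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}
```

In use, `worker` would wrap a single PSI request and throw on a 5xx or timeout. The pool returns one `{ ok, value }` or `{ ok, error }` entry per input, so a permanently failing URL is recorded rather than aborting the batch. The remaining items on the list (storage, tagging, regression detection, scheduling) are still on you.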

Building this in-house is a 3-week sprint. Buying SpeedCurve or Calibre solves it but starts at USD 144 per month for the Calibre Starter (50 pages) and USD 134 per month for SpeedCurve LUX Lite (capped at 25,000 monthly checks). For 1,400 URLs audited weekly across two strategies (mobile and desktop) you are looking at roughly 11,200 audits per month, which puts you on the SpeedCurve Pro tier at USD 414 per month. That is real money for a four-person shop.

The pragmatic middle path: run Lighthouse audits via a bulk Apify actor on a weekly schedule, store the JSON output in Google Sheets or BigQuery, and build the per-client dashboard in Looker Studio. Total cost: under USD 25 per month for our reference agency.

The bulk audit actor

The NexGenData Page Speed Analyzer wraps the Google PageSpeed Insights API in an Apify actor. You give it a list of URLs, optionally a Google API key, and it returns one JSON record per URL containing the full Lighthouse report: Performance, Accessibility, Best Practices, SEO, and PWA scores, plus the lab performance metrics (FCP, LCP, CLS, TBT, TTI), of which LCP and CLS are Core Web Vitals. It supports mobile or desktop emulation per run, handles rate limiting and retries internally, and stores the output in an Apify dataset that you can pull via API or pipe directly into Google Sheets via the standard Apify integration.

The reason this matters for an agency: the actor is billed on Apify's pay-per-event model, which works out to roughly USD 0.05 per page audited. Compare with the alternatives:

| Tool | Pricing model | Cost for 1,400 weekly audits, 2 strategies |
| --- | --- | --- |
| ScreamingFrog SEO Spider (desktop) | USD 259/year flat, sequential | USD 22/month, plus 4 hours of analyst time per week |
| SpeedCurve Pro | USD 414/month for 50K checks | USD 414/month |
| Calibre Standard | USD 273/month for 4K pages | USD 273/month |
| DebugBear Solo | USD 60/month for 25 pages | Cannot accommodate 1,400 URLs |
| Lighthouse CI on self-hosted EC2 | t3.medium plus storage | USD 45/month plus DevOps time |
| Page Speed Analyzer actor | USD 0.05 per page (list price) | USD 560/month at list price for 11,200 audits; about USD 14-22/month in practice (see the cost recap) |

The actor route also frees up the analyst's Monday morning, which on the agency's USD 90/hour blended rate is the real saving.

End-to-end architecture

Here is the pipeline we deployed for the reference agency. It runs every Monday at 05:00 UTC and finishes before the team logs in.


    [Per-client URL lists in Google Sheet]
                |
                v
    [Apify scheduled actor run, weekly]
                |
                +--> Mobile audit pass
                +--> Desktop audit pass
                |
                v
    [Apify dataset, JSON]
                |
                v
    [Google Sheets connector] -> [BigQuery] -> [Looker Studio per-client report]
                |
                v
    [Slack regression alert if perf score drops > 5 points WoW]

The URL list lives in a single Google Sheet with two columns: client_id and url. Each agency analyst can edit their client's tab; the master tab is a QUERY() formula that consolidates them. New URLs flow into the pipeline within one week of being added to the sheet. No engineering ticket required.
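Because analysts edit the sheet by hand, it is worth cleaning the rows before they reach the audit run. The sketch below is one way to do that, assuming each row arrives as a `[client_id, url]` array (the shape the Sheets API returns for a two-column range). The function name `cleanUrlRows` and the rejection reasons are hypothetical.

```javascript
// Hypothetical helper: validate and de-duplicate [client_id, url] rows.
function cleanUrlRows(rows) {
  const seen = new Set();
  const valid = [];
  const rejected = [];
  for (const row of rows) {
    const clientId = String(row[0] ?? '').trim();
    const rawUrl = String(row[1] ?? '').trim();
    if (!clientId && !rawUrl) continue; // skip fully blank rows

    let parsed;
    try {
      parsed = new URL(rawUrl); // throws on relative or malformed URLs
    } catch {
      rejected.push({ clientId, url: rawUrl, reason: 'not a valid absolute URL' });
      continue;
    }
    if (!clientId || !['http:', 'https:'].includes(parsed.protocol)) {
      rejected.push({ clientId, url: rawUrl, reason: 'missing client_id or non-HTTP scheme' });
      continue;
    }

    parsed.hash = ''; // drop fragments so /pricing#plans and /pricing count as one page
    const key = `${clientId}\u0000${parsed.href}`;
    if (seen.has(key)) continue; // same client, same URL: audit once
    seen.add(key);
    valid.push({ clientId, url: parsed.href });
  }
  return { valid, rejected };
}
```

The `rejected` list is worth surfacing back to the analysts (for example in a "problems" tab), since a typo in the sheet otherwise shows up a week later as a missing row in the report.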

Step 1: Get a Google API key

The actor works without an API key but caps you at 25 queries per day on the shared quota, which is not enough for production use. Spin up a free Google Cloud project (no billing required for PSI), enable the PageSpeed Insights API, and create an API key; restricting the key to the PageSpeed Insights API is a sensible precaution. Quota is 25,000 queries per day per project, which is 12,500 URLs per day when each URL is audited on both mobile and desktop, before counting retries. If you outgrow that, create a second GCP project and rotate keys; the actor accepts a comma-separated key list and round-robins across them.


    # Quick check that your key works
    curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://example.com&strategy=mobile&key=YOUR_API_KEY" \
      | jq '.lighthouseResult.categories.performance.score'

If that returns a number between 0 and 1, you are good.
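Query strings like the one above are easy to mangle when assembled by hand, so if you call PSI from your own scripts it can help to build the URL programmatically. This is a small sketch using the standard `URL` API. The function name `buildPsiUrl` is an assumption, and the upper-casing reflects the enum names in the PSI v5 reference (`PERFORMANCE`, `BEST_PRACTICES`, and so on).

```javascript
// Build a PageSpeed Insights v5 request URL. The function name is hypothetical.
function buildPsiUrl({ url, strategy = 'mobile', key, categories = ['performance'] }) {
  const endpoint = new URL('https://www.googleapis.com/pagespeedonline/v5/runPagespeed');
  endpoint.searchParams.set('url', url); // percent-encoded automatically
  endpoint.searchParams.set('strategy', strategy);
  for (const category of categories) {
    // One repeated "category" parameter per requested category,
    // e.g. "best-practices" becomes BEST_PRACTICES
    endpoint.searchParams.append('category', category.toUpperCase().replace(/-/g, '_'));
  }
  if (key) endpoint.searchParams.set('key', key);
  return endpoint.toString();
}

// Example: the four categories used throughout this post
const exampleUrl = buildPsiUrl({
  url: 'https://example.com/',
  key: 'YOUR_API_KEY',
  categories: ['performance', 'accessibility', 'best-practices', 'seo'],
});
```

Letting `searchParams` handle the encoding also covers client URLs that contain their own query strings, which would otherwise need escaping by hand.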

Step 2: Configure the actor

The actor input is straightforward JSON. For the agency use case, you want one run per client per strategy so the dataset is clean to slice. Here is the input for a single client run:


    {
      "urls": [
        "https://acmeclient.com/",
        "https://acmeclient.com/pricing",
        "https://acmeclient.com/blog/post-1",
        "https://acmeclient.com/blog/post-2"
      ],
      "strategy": "mobile",
      "apiKey": "AIzaSy...",
      "categories": ["performance", "accessibility", "best-practices", "seo"],
      "extendedTimeout": true,
      "metadata": {
        "client_id": "acme",
        "audit_date": "2026-05-11",
        "report_period": "2026-W20"
      }
    }

The metadata block is non-standard input that the actor passes through to every output record, which is how you tag results per client without joining tables later. The extendedTimeout flag matters for slower client sites: PSI's default timeout is 60 seconds, but audits of heavy WordPress sites with bloated themes routinely take 90-120 seconds to complete, and you do not want those to drop out as null.
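If you generate the metadata block in code rather than by hand, the `report_period` tag can be derived from the audit date. The sketch below assumes the tag is meant to follow ISO 8601 week numbering (weeks start on Monday, and week 1 is the week containing the year's first Thursday). The helper name `isoWeekLabel` is hypothetical.

```javascript
// Hypothetical helper: "YYYY-MM-DD" -> ISO 8601 week label such as "2025-W01".
function isoWeekLabel(isoDate) {
  const d = new Date(`${isoDate}T00:00:00Z`);
  const weekday = d.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  // Shift to the Thursday of the same ISO week; its year is the ISO week-year.
  d.setUTCDate(d.getUTCDate() + 4 - weekday);
  const isoYear = d.getUTCFullYear();
  const yearStart = Date.UTC(isoYear, 0, 1);
  const dayOfYear = (d.getTime() - yearStart) / 86400000 + 1;
  const week = Math.ceil(dayOfYear / 7);
  return `${isoYear}-W${String(week).padStart(2, '0')}`;
}
```

Working in UTC avoids off-by-one labels when the scheduler and the analysts sit in different time zones. Dates around New Year can land in the neighbouring ISO year, which is why the label uses the shifted year rather than the calendar year.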

Step 3: Schedule it

In Apify Console, create a Scheduled Task that runs nexgendata/page-speed-analyzer every Monday at 05:00 UTC. You can either hardcode the URL list in the task input or use a small "loader" actor that pulls the list from your Google Sheet, splits it per client, and triggers one child actor run per client. The latter is cleaner for multi-client agencies because each run produces an isolated dataset and one failure does not poison the batch.

A minimal loader script in Node looks like this:


    import { Actor } from 'apify';
    import { google } from 'googleapis';

    await Actor.init();

    const auth = new google.auth.GoogleAuth({
      scopes: ['https://www.googleapis.com/auth/spreadsheets.readonly'],
    });
    const sheets = google.sheets({ version: 'v4', auth });

    const { data } = await sheets.spreadsheets.values.get({
      spreadsheetId: process.env.URL_SHEET_ID,
      range: 'Master!A2:B',
    });

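    // Group URLs by client_id, skipping blank or incomplete rows
    const byClient = {};
    for (const [clientId, url] of data.values ?? []) {
      if (!clientId || !url) continue;
      (byClient[clientId] ||= []).push(url);
    }

    // Start one actor run per client per strategy. Actor.call() waits for its
    // run to finish, so collect the promises instead of awaiting each in turn;
    // that way the runs execute in parallel rather than back to back.
    const today = new Date().toISOString().slice(0, 10);
    const jobs = [];
    for (const [clientId, urls] of Object.entries(byClient)) {
      for (const strategy of ['mobile', 'desktop']) {
        jobs.push(
          Actor.call('nexgendata/page-speed-analyzer', {
            urls,
            strategy,
            apiKey: process.env.PSI_KEY,
            categories: ['performance', 'accessibility', 'best-practices', 'seo'],
            extendedTimeout: true,
            metadata: { client_id: clientId, audit_date: today, strategy },
          }),
        );
      }
    }

    // Wait for every run; a failed run is logged but does not abort the others
    const results = await Promise.allSettled(jobs);
    for (const r of results) {
      if (r.status === 'rejected') {
        console.error('Run failed to start or finish:', r.reason);
      } else if (r.value.status !== 'SUCCEEDED') {
        console.warn(`Run ${r.value.id} ended as ${r.value.status}`);
      }
    }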

    await Actor.exit();

This fires roughly 22 runs every Monday morning (11 clients times 2 strategies) and they execute in parallel on Apify's compute fleet. Total wall-clock time for the agency's full audit: 9-14 minutes.

Step 4: Land the data in BigQuery

For an agency at 11 clients, Google Sheets is honestly fine as a backing store. The Apify-to-Sheets integration appends new rows automatically and Looker Studio reads Sheets natively. But if you want week-over-week trend analysis with proper SQL, push to BigQuery.

The cleanest pattern is to use the Apify webhook on ACTOR.RUN.SUCCEEDED to fire a small Cloud Function that pulls the dataset and inserts into BigQuery. The schema you want:


    CREATE TABLE seo_audits.lighthouse_runs (
      audit_date DATE,
      client_id STRING,
      url STRING,
      strategy STRING,
      performance_score INT64,
      accessibility_score INT64,
      best_practices_score INT64,
      seo_score INT64,
      fcp_ms INT64,
      lcp_ms INT64,
      cls FLOAT64,
      tbt_ms INT64,
      tti_ms INT64,
      speed_index INT64,
      apify_run_id STRING,
      inserted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
    )
    PARTITION BY audit_date
    CLUSTER BY client_id, strategy;

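Partitioning by audit_date keeps query costs low; clustering by client_id makes per-client filters cheap. The score columns hold 0-100 integers, so if the records you insert carry PSI's raw 0-1 category scores, multiply by 100 and round first. A year of weekly audits for an 11-client agency at 1,400 URLs occupies under 50 MB.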
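For reference, this is roughly what flattening one result into a row of that table looks like. The sketch assumes the record has the shape of a raw PSI v5 response (`lighthouseResult.categories` and `lighthouseResult.audits`); the actor's own output fields may be named differently, so check a sample run first. `toAuditRow` and the `meta` argument are hypothetical names.

```javascript
// Flatten one raw PSI v5 response into a row for seo_audits.lighthouse_runs.
// Assumes the raw PSI response shape; adjust if the actor output differs.
function toAuditRow(psi, meta) {
  const lh = psi.lighthouseResult;

  // Category scores arrive as 0-1 floats (or null); store 0-100 integers.
  const score = (id) => {
    const s = lh.categories?.[id]?.score;
    return typeof s === 'number' ? Math.round(s * 100) : null;
  };
  // Lab metrics live under audits[<id>].numericValue.
  const metric = (id) => {
    const v = lh.audits?.[id]?.numericValue;
    return typeof v === 'number' ? v : null;
  };
  const ms = (id) => {
    const v = metric(id);
    return v === null ? null : Math.round(v); // milliseconds as integers
  };

  return {
    audit_date: meta.audit_date,
    client_id: meta.client_id,
    url: lh.requestedUrl, // the URL as listed, so week-over-week joins line up
    strategy: meta.strategy,
    performance_score: score('performance'),
    accessibility_score: score('accessibility'),
    best_practices_score: score('best-practices'),
    seo_score: score('seo'),
    fcp_ms: ms('first-contentful-paint'),
    lcp_ms: ms('largest-contentful-paint'),
    cls: metric('cumulative-layout-shift'), // unitless, keep as float
    tbt_ms: ms('total-blocking-time'),
    tti_ms: ms('interactive'), // may be absent in newer Lighthouse versions
    speed_index: ms('speed-index'),
    apify_run_id: meta.apify_run_id ?? null,
  };
}
```

Missing categories or audits become `null` rather than `0`, which keeps a timed-out audit from showing up in the dashboard as a catastrophic score.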

Step 5: Build the per-client dashboard

In Looker Studio, create a parameter client_id and bind it to the client filter. Build the dashboard once with five sections:

- Headline scorecard: current week's average performance, accessibility, best practices, and SEO scores, with delta vs. last week
- Core Web Vitals distribution: histogram of LCP and CLS across all client URLs, plus TBT as the lab stand-in for INP (Lighthouse lab runs do not measure INP, and the schema above has no INP column), color-coded green/amber/red against Google's published thresholds (LCP up to 2.5s good, up to 4s needs improvement; CLS up to 0.1 good, up to 0.25 needs improvement)
- Worst offenders table: bottom 10 URLs by performance score this week, sortable
- Week-over-week regressions: URLs where performance score dropped more than 5 points or LCP increased more than 500ms
- Trend lines: 12-week trend of average performance per category
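The green/amber/red bucketing can be done in a Looker Studio calculated field, but if you prefer to precompute it before loading, the logic is just two cut-offs per metric. A minimal sketch, using the LCP and CLS thresholds quoted above; the names `THRESHOLDS` and `rateMetric` are illustrative.

```javascript
// Thresholds as published by Google for LCP (milliseconds) and CLS (unitless).
const THRESHOLDS = {
  lcp_ms: { good: 2500, needsImprovement: 4000 },
  cls: { good: 0.1, needsImprovement: 0.25 },
};

// Hypothetical helper: map a metric value to a dashboard colour bucket.
function rateMetric(name, value) {
  const t = THRESHOLDS[name];
  if (!t) throw new Error(`No thresholds defined for ${name}`);
  if (value === null || value === undefined) return 'unknown'; // audit returned no value
  if (value <= t.good) return 'green';
  if (value <= t.needsImprovement) return 'amber';
  return 'red';
}
```

Keeping an explicit `unknown` bucket stops failed audits from being silently counted as either passes or failures in the histogram.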

Each client gets a clone of the dashboard with their client_id pinned. The whole build is a half-day exercise once the BigQuery table is loaded with two weeks of historical data.

Step 6: Wire regression alerts

You do not need a fancy alerting platform. A scheduled BigQuery query with a Slack webhook covers it. Run this every Monday at 06:00 UTC, after the audit pipeline finishes:


    WITH this_week AS (
      SELECT client_id, url, strategy,
             performance_score AS perf_now,
             lcp_ms AS lcp_now
      FROM seo_audits.lighthouse_runs
      WHERE audit_date = CURRENT_DATE()
    ),
    last_week AS (
      SELECT client_id, url, strategy,
             performance_score AS perf_prev,
             lcp_ms AS lcp_prev
      FROM seo_audits.lighthouse_runs
      WHERE audit_date = DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    )
    SELECT t.client_id, t.url, t.strategy,
           t.perf_now, l.perf_prev,
           (t.perf_now - l.perf_prev) AS perf_delta,
           t.lcp_now, l.lcp_prev,
           (t.lcp_now - l.lcp_prev) AS lcp_delta_ms
    FROM this_week t
    JOIN last_week l USING (client_id, url, strategy)
    WHERE t.perf_now - l.perf_prev < -5
       OR t.lcp_now - l.lcp_prev > 500
    ORDER BY t.client_id, perf_delta ASC;

Pipe the result rows into a Slack channel via a Cloud Function or Zapier. Limit to one message per client (concatenate the regressions into a single threaded message) so the channel stays usable.
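As a rough sketch of the Cloud Function side, the rows from the query can be grouped by client and turned into one Slack payload each. The field names match the query's output columns. `buildSlackMessages` and `postToSlack` are hypothetical helpers, and the sketch sends one plain message per client rather than a thread. `postToSlack` assumes Node 18+ (global `fetch`) and a Slack incoming-webhook URL.

```javascript
// Format a signed delta, e.g. 820 -> "+820", -14 -> "-14".
const signed = (n) => (n > 0 ? `+${n}` : `${n}`);

// Hypothetical helper: one Slack payload per client, listing all its regressions.
function buildSlackMessages(rows) {
  const byClient = new Map();
  for (const row of rows) {
    if (!byClient.has(row.client_id)) byClient.set(row.client_id, []);
    byClient.get(row.client_id).push(row);
  }
  return [...byClient.entries()].map(([clientId, items]) => {
    const header = `*${clientId}*: ${items.length} regression(s) this week`;
    const lines = items.map(
      (r) =>
        `• ${r.url} (${r.strategy}): perf ${r.perf_prev} -> ${r.perf_now} ` +
        `(${signed(r.perf_delta)}), LCP ${signed(r.lcp_delta_ms)} ms`,
    );
    return { text: [header, ...lines].join('\n') };
  });
}

// Post one payload to a Slack incoming webhook (assumes Node 18+ global fetch).
async function postToSlack(webhookUrl, message) {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(message),
  });
  if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);
}
```

Because the query already orders by client and then by the size of the drop, each client's message lists its worst regression first.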

Where ScreamingFrog still wins

To be fair to ScreamingFrog: the bulk Lighthouse pipeline does not replace SEO Spider for everything. Keep ScreamingFrog for:

- Crawling: discovering URLs by following links from a seed page. The actor needs you to supply the URL list. Use SEO Spider's crawl, export the URL list, push it into the sheet, and let the actor handle audits.
- On-page SEO checks: title tag length, H1 duplication, missing meta descriptions, internal linking analysis. Lighthouse covers a small subset under the SEO category but ScreamingFrog is far deeper.
- Structured data validation: JSON-LD parsing and schema.org compliance.
- Redirect chain analysis and broken link detection.

The mental model: ScreamingFrog is your crawler and on-page auditor; the bulk Lighthouse pipeline is your performance and Core Web Vitals monitor. They are complementary, not competing.

What changes in your monthly client deck

Here is the practical payoff. Before this pipeline, the agency's monthly client report had a single Lighthouse score per page, taken on the first of the month. The client could (and did) ask: "but it was 78 on the 15th when I checked, why is it 64 in your report?" Now the report shows a 4-week trend line per page, the score from each Monday audit, and a list of regressions caught and fixed within 7 days. The conversation shifts from "is your data right?" to "what do we do about the regression on /pricing that hit on April 22?"

That conversation is where retainers get renewed.

Cost recap for the reference agency

- 11 clients, average 127 URLs each = 1,397 URLs
- Mobile and desktop strategies = 2,794 audits per week
- 4.3 weeks per month = 12,014 audits per month
- At USD 0.05 per audit = about USD 600 per month at full retail
- In practice, with a Google API key shouldering the PSI cost and Apify only charging for compute time, the real bill comes in around USD 14-22 per month
- BigQuery storage and query costs: under USD 1 per month at this volume
- Looker Studio: free
- Total: under USD 25 per month, replacing roughly USD 414 per month of SpeedCurve plus 4 hours of analyst time per week

For an agency billing USD 90/hour blended, that is USD 1,500 per month of opportunity cost recovered, and a much better client deliverable.
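To rerun this arithmetic with your own client count and rates, the recap reduces to a few multiplications. The sketch below reproduces the list-price and analyst-time figures from the numbers in this post. It does not model the lower in-practice bill, and the function and field names are illustrative.

```javascript
// Hypothetical calculator for the cost recap figures.
function costRecap({ clients, urlsPerClient, strategies, weeksPerMonth,
                     pricePerAudit, analystHoursPerWeek, hourlyRate }) {
  const urls = clients * urlsPerClient;
  const auditsPerWeek = urls * strategies;
  const auditsPerMonth = Math.round(auditsPerWeek * weeksPerMonth);
  return {
    urls,
    auditsPerWeek,
    auditsPerMonth,
    // List price before any in-practice discounting, rounded to cents
    listPriceUsd: Math.round(auditsPerMonth * pricePerAudit * 100) / 100,
    // Analyst time recovered per month at the blended rate
    analystCostUsd: Math.round(analystHoursPerWeek * weeksPerMonth * hourlyRate),
  };
}

const recap = costRecap({
  clients: 11,
  urlsPerClient: 127,
  strategies: 2,
  weeksPerMonth: 4.3,
  pricePerAudit: 0.05,
  analystHoursPerWeek: 4,
  hourlyRate: 90,
});
```

With the reference agency's inputs this gives 1,397 URLs, 2,794 audits per week, and 12,014 audits per month, with the analyst-time figure landing a little above the rounded USD 1,500 quoted above.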

Get started

The Page Speed Analyzer is at apify.com/nexgendata/page-speed-analyzer. Spin up a free Apify account, drop in your URL list, run a one-off audit to see the output schema, and then wire the schedule. The whole pipeline above is two days of work for a competent agency dev or a 3-day engagement with a freelancer.

If your agency stack also needs domain monitoring, deliverability checks, or content scraping for competitive analysis, the same NexGenData library has companion actors:

- DNS Propagation Checker for client domain migrations and DNS audits
- DMARC Bulk Auditor for email deliverability audits as part of technical SEO retainers
- Website Content Crawler for competitor content gap analysis at scale

Browse the full set of NexGenData actors at apify.com/nexgendata. The same pay-per-event pricing applies across the catalog, which means you can stand up a full agency tooling stack for less than the cost of a single SpeedCurve seat.

The economics of small SEO agencies have shifted: tooling that used to require enterprise budgets is now a few cents per audit on commodity cloud infrastructure. The agencies that win the next 24 months will be the ones who notice.
