Dev Tools

Web Scraping That
Doesn't Break Silently

Headless Playwright automation with pagination, selector fallbacks, deduplication, and canary detection. Point it at a page, get structured data back.

Get the Agent
See How It Works
The Problem
Scrapers fail quietly.
A site changes its markup, your scraper returns zero results, and nobody notices until the downstream system breaks.

Empty results, no errors

Most scrapers treat "page loaded but nothing matched" as success. They return an empty array and move on. Your pipeline keeps running on stale data for days before someone checks.

Canary detection built in

If a page loads but yields zero product cards, the scraper throws a CanaryError instead of returning empty results. You find out immediately when a site change breaks extraction, not when a customer complains.
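A minimal sketch of that check, assuming extraction returns an array of records per page. `CanaryError` matches the name used above, but the constructor shape and the `assertNonEmpty` helper are illustrative, not the shipped API:

```typescript
// Thrown when a page loads but extraction yields zero records,
// so silent selector breakage surfaces as a hard failure.
class CanaryError extends Error {
  constructor(url: string) {
    super(`Canary tripped: 0 records extracted from ${url}`);
    this.name = "CanaryError";
  }
}

// Guard applied after each page's extraction step (name is illustrative).
function assertNonEmpty<T>(records: T[], url: string): T[] {
  if (records.length === 0) throw new CanaryError(url);
  return records;
}
```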

{
  "title": "A Light in the Attic",
  "price": 51.77,
  "currency": "GBP",
  "availability": "in_stock",
  "rating": 3,
  "detailUrl": "https://..."
}
What You Get
Resilient extraction, not fragile selectors.

Selector Fallback

Tries data-testid attributes first, falls back to semantic HTML selectors automatically. Your scrape survives markup changes without code edits.
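The fallback idea can be sketched as an ordered selector list tried against a match-count callback (with Playwright that callback would be something like `page.locator(sel).count()`). The selector strings and helper name below are illustrative:

```typescript
// Selectors in priority order: stable data-testid hooks first,
// then semantic HTML fallbacks that survive attribute churn.
const CARD_SELECTORS = [
  '[data-testid="product-card"]', // preferred: explicit test hook
  "article.product_pod",          // fallback: semantic markup
  "li.product",                   // last resort: generic list item
];

// Returns the first selector that matches at least one element,
// or null if every candidate comes up empty.
async function firstMatching(
  selectors: string[],
  count: (sel: string) => Promise<number>, // e.g. page.locator(sel).count()
): Promise<string | null> {
  for (const sel of selectors) {
    if ((await count(sel)) > 0) return sel;
  }
  return null;
}
```

A `null` result here is exactly the zero-match condition the canary check turns into a hard error.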

Automatic Pagination

Follows next-page links up to a configurable cap. Rate-limits between requests (default 300ms). No manual page loop needed.
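The loop can be sketched independently of Playwright by abstracting "fetch one page" into a callback. Function and parameter names are assumptions; only the 300ms default comes from the copy above:

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

interface PageResult<T> {
  items: T[];
  nextUrl: string | null; // null when there is no next-page link
}

// Follows next-page links up to maxPages, pausing rateLimitMs between requests.
async function paginate<T>(
  startUrl: string,
  fetchPage: (url: string) => Promise<PageResult<T>>,
  maxPages = 10,
  rateLimitMs = 300,
): Promise<T[]> {
  const all: T[] = [];
  let url: string | null = startUrl;
  for (let n = 0; url !== null && n < maxPages; n++) {
    const { items, nextUrl } = await fetchPage(url);
    all.push(...items);
    url = nextUrl;
    if (url !== null) await sleep(rateLimitMs); // rate-limit before the next hop
  }
  return all;
}
```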

Deduplication

Tracks detail URLs across all pages. Same product on page 2 and page 5? You get it once. Duplicates never reach your dataset.
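A sketch of URL-keyed dedup, assuming each record carries the `detailUrl` field shown in the sample output (the function name is illustrative):

```typescript
// Keeps the first occurrence of each detail URL, drops later repeats.
function dedupeByUrl<T extends { detailUrl: string }>(records: T[]): T[] {
  const seen = new Set<string>();
  return records.filter((record) => {
    if (seen.has(record.detailUrl)) return false;
    seen.add(record.detailUrl);
    return true;
  });
}
```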

Structured Output

Prices come back as numbers, not strings with currency symbols. Ratings are integers. Stock status is normalized. Clean data from the start.
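The normalization can be sketched as small pure parsers. The symbol-to-currency map and function names are assumptions; the `GBP` and `in_stock` values match the sample record above:

```typescript
// Minimal symbol map; extend as target sites require.
const CURRENCY_BY_SYMBOL: Record<string, string> = { "£": "GBP", "$": "USD", "€": "EUR" };

// "£51.77" -> { price: 51.77, currency: "GBP" }; null when the format is unexpected.
function parsePrice(raw: string): { price: number; currency: string } | null {
  const m = raw.trim().match(/^([£$€])\s*(\d+(?:\.\d+)?)$/);
  return m ? { price: Number(m[2]), currency: CURRENCY_BY_SYMBOL[m[1]] } : null;
}

// Collapse free-text stock labels to a two-state enum.
function normalizeAvailability(raw: string): "in_stock" | "out_of_stock" {
  return /in\s*stock/i.test(raw) ? "in_stock" : "out_of_stock";
}
```

Returning `null` for unparseable prices (rather than `NaN` or the raw string) keeps bad values out of downstream arithmetic.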

Navigation Retry

Each page navigation gets one automatic retry on failure. Configurable timeouts. Transient network issues don't kill the run.
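The one-retry policy can be sketched as a wrapper around any navigation call (with Playwright, e.g. `page.goto(url, { timeout })`); the wrapper name is illustrative:

```typescript
// Runs fn once; on failure, retries exactly once and lets the second error propagate.
async function withOneRetry<T>(fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch {
    return await fn(); // single automatic retry for transient failures
  }
}
```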

How It Works
Three lines to structured data.
1. Install Playwright

Playwright is the only dependency: run npm install playwright and you're set. The scraper drives headless Chromium under the hood.

2. Point at a URL

Call the scrape function with a target URL and optional config (max pages, rate limit, timeout). Works with live sites and file:// URLs for local testing.
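The option shape can be sketched as a config object merged over defaults. The option names and default values below are assumptions (only the 300ms rate limit is stated elsewhere on this page), not the shipped API:

```typescript
interface ScrapeConfig {
  maxPages: number;    // pagination cap
  rateLimitMs: number; // delay between page requests
  timeoutMs: number;   // per-navigation timeout
}

// Assumed defaults; only the 300ms rate limit comes from the copy.
const DEFAULTS: ScrapeConfig = { maxPages: 10, rateLimitMs: 300, timeoutMs: 30_000 };

// Callers pass only the options they care about.
function resolveConfig(overrides: Partial<ScrapeConfig> = {}): ScrapeConfig {
  return { ...DEFAULTS, ...overrides };
}
```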

3. Get JSON back

Structured records with typed fields. Prices are numeric, ratings are integers, URLs are absolute. Pipe it straight into your database or API.

Pricing
Scraping that tells you when it breaks.

One-time purchase. Full source code.

Solo

$1,500
one-time
  • Full Playwright scraper source
  • Selector fallback system
  • Canary detection
  • Pagination + dedup
  • Offline test fixtures
  • Commercial license (single user)
Get Started

What do you need scraped?

Send us the target URL and the fields you need. We'll tell you what's feasible and how long it takes.

Get in Touch