v0.8.0 — Local regex extraction, Android swipe UI, improved GwSS

WebScraper Pro

The most powerful Firefox extension for web scraping. Regex-based data extraction, interactive graph visualization, batch queuing, and one-click HuggingFace upload.

58+ CLI Commands
9 Export Formats
7 Popup Tabs
5 Data Types

Everything You Need to Scrape the Web

From simple text extraction to AI-powered structured data mining, WebScraper Pro handles it all.

📄

Smart Text Extraction

Multi-strategy extraction from SPAs, shadow DOM, web components, JSON-LD, microdata, and dynamically-loaded content.

🎥

Video & Media Scraping

Extract video sources, embeds, posters, tracks, and subtitles. YouTube filtering toggle, audio capture, and image download.

🤖

AI-Powered Extraction

NuExtract-2.0-2B for structured data. 9 built-in templates, custom JSON schemas, local regex fallback, batch processing.
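To give a rough idea of what a local regex fallback can look like, the sketch below pulls a few field types out of plain text with Python's `re` module. The patterns, the field names, and the `extract_fields` helper are assumptions made for illustration; they are not WebScraper Pro's built-in templates.

```python
import re

# Hypothetical patterns for illustration; production templates would need
# broader coverage (international prices, unusual TLDs, and so on).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    "price": re.compile(r"[$€£]\s?\d+(?:[.,]\d{2})?"),
}


def extract_fields(text: str) -> dict[str, list[str]]:
    """Return every match for each pattern, de-duplicated in first-seen order."""
    result = {}
    for name, pattern in PATTERNS.items():
        # dict.fromkeys drops duplicates while preserving order.
        result[name] = list(dict.fromkeys(pattern.findall(text)))
    return result


sample = "Contact sales@example.com or visit https://example.com/pricing for the $19.99 plan."
fields = extract_fields(sample)
```

A fallback of this kind needs no model and runs instantly, at the cost of missing anything the patterns do not anticipate.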

📸

Screenshot Extract (ASE)

Take screenshots, run OCR with Tesseract, and advance to the next page automatically. Works on KDE, Hyprland, GNOME, and Cinnamon desktops.

📊

GwSS Visualization

Interactive force-directed graph of all scraped domains with live physics, unique composite edge patterns, favicons, and SSDg diagrams.

🚀

HuggingFace Upload

One-click upload with automatic JSONL sharding, MLA/APA citations, version-aware README, and community dataset support.
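JSONL sharding in general means splitting one stream of records into several size-bounded files. The sketch below shows one way to do that in Python; the `write_jsonl_shards` name, the 50 MB default, and the `data-00000.jsonl` naming scheme are assumptions, not the extension's actual behavior.

```python
import json
from pathlib import Path


def write_jsonl_shards(records, out_dir, max_bytes=50_000_000, prefix="data"):
    """Write records to numbered .jsonl files, starting a new file whenever
    adding a line would push the current shard past max_bytes.

    The size limit and file naming are illustrative assumptions.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths, handle, size = [], None, 0
    try:
        for record in records:
            line = json.dumps(record, ensure_ascii=False) + "\n"
            n = len(line.encode("utf-8"))
            # Open the first shard, or roll over when the limit would be exceeded.
            # A single oversized record still gets a shard of its own.
            if handle is None or (size > 0 and size + n > max_bytes):
                if handle is not None:
                    handle.close()
                path = out / f"{prefix}-{len(paths):05d}.jsonl"
                handle = path.open("w", encoding="utf-8", newline="\n")
                paths.append(path)
                size = 0
            handle.write(line)
            size += n
    finally:
        if handle is not None:
            handle.close()
    return paths
```

Because each shard is split on line boundaries, every file remains valid JSONL and can be loaded independently.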

Batch Queue & Auto-Scan

Queue multiple URLs for background scraping. Auto-scan crawls pages with configurable rate limiting and domain filtering.
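The combination of domain filtering and rate limiting can be sketched in a few lines of Python. Everything here is hypothetical: `allowed`, `process_queue`, the `min_interval` parameter, and the per-host policy are assumptions for illustration and say nothing about how the extension schedules requests internally.

```python
import time
from urllib.parse import urlparse


def allowed(url, allow_domains):
    """True if the URL's host equals, or is a subdomain of, an allowed domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allow_domains)


def process_queue(urls, allow_domains, fetch, min_interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Call fetch(url) for each allowed URL, keeping at least min_interval
    seconds between consecutive requests to the same host.

    clock and sleep are injectable so the pacing logic can be tested
    without actually waiting.
    """
    last_hit = {}  # host -> time of the most recent request
    results = []
    for url in urls:
        if not allowed(url, allow_domains):
            continue  # filtered out by the domain allow-list
        host = urlparse(url).hostname
        if host in last_hit:
            wait = min_interval - (clock() - last_hit[host])
            if wait > 0:
                sleep(wait)
        results.append((url, fetch(url)))
        last_hit[host] = clock()
    return results
```

Tracking the last request time per host keeps one slow site from throttling the rest of the queue.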

🛡

Privacy & Security

XSS sanitization, robots.txt compliance, PII/API key/slur filtering with configurable redaction modes.
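As an illustration of configurable redaction modes, the following Python sketch replaces matches with a label, masks them with asterisks, or removes them entirely. The mode names (`label`, `mask`, `remove`), the two patterns, and the `redact` function are assumptions; the extension's real filters and option names may differ.

```python
import re

# Illustrative patterns only; real PII and key detection needs far broader coverage.
RULES = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\b(?:sk|hf)_[A-Za-z0-9]{16,}\b"),
}

MODES = {"label", "mask", "remove"}  # hypothetical mode names


def redact(text, mode="label"):
    """Redact every rule match in text.

    label  -> replace with a tag such as [EMAIL]
    mask   -> replace with asterisks of the same length
    remove -> delete the match
    """
    if mode not in MODES:
        raise ValueError(f"unknown redaction mode: {mode!r}")
    for label, pattern in RULES.items():
        if mode == "label":
            text = pattern.sub(f"[{label}]", text)
        elif mode == "mask":
            text = pattern.sub(lambda m: "*" * len(m.group()), text)
        else:
            text = pattern.sub("", text)
    return text
```

Labeling keeps the text readable for downstream training data, while masking preserves character offsets.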

💻

Python CLI (58+ Commands)

Full CLI for data management, AI model serving, Parquet export, screenshot OCR, HuggingFace ops, and system diagnostics.

Up and Running in Minutes

Install the extension and start scraping in four easy steps.

1. Download: Get the latest .xpi from GitHub Releases.

2. Install: Open about:addons in Firefox, click the gear icon, and choose "Install Add-on From File...".

3. Configure: Set up your HuggingFace token, data format, and scraping preferences in Settings.

4. Scrape: Click the popup, select a mode (Full Page, Auto-Scan, Queue), and start collecting data.

Export in Any Format

Your data, your format. Export to whatever works best for your workflow.

JSONL
JSON
CSV
Parquet
XML
Markdown
PNG / WebP / JPEG
BMP / SVG
WAV (audio)

Powerful CLI Companion

58+ commands for data management, AI serving, exporting, and system diagnostics.

Key Commands

  • Export to JSONL, JSON, CSV, Parquet, XML, Markdown
  • AI extraction server with GPU/CPU auto-detect
  • Screenshot OCR extraction (Linux)
  • HuggingFace upload with auto-sharding
  • System diagnostics and environment check
  • Session management and data merging
  • Batch URL processing and rate limiting
  • One-command install and update

# Install the CLI
python install.py

# Export data in various formats
scrape export jsonl
scrape export parquet --compression zstd

# AI extraction server
scrape ai.serve --gpu
scrape ai.setup

# Screenshot OCR (Linux)
scrape ai.screenshot --pages 5

# HuggingFace operations
scrape hf.push
scrape hf.status

# System info
scrape env
scrape doctor

Keyboard Shortcuts

Fast access to all scraping modes without opening the popup.

Select Area: Ctrl+Shift+S
Full Page: Ctrl+Shift+P
Scroll & Scrape: Ctrl+Shift+L
Auto-Scan: Ctrl+Shift+A
Stop All: Ctrl+Shift+X