Scrape

Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling.

Source: GitHub, 2026-03-24

## Pre-Scrape Compliance Checklist

Before writing any scraping code:

1. **robots.txt**: Fetch `{domain}/robots.txt` and check whether the target path is disallowed. If it is, stop.
2. **Terms of Service**: Check `/terms`, `/tos`, `/legal`. An explicit scraping prohibition means you need permission.
3. **Data type**: Public factual data (prices
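The first two checklist items can be sketched in Python using only the standard library. This is a minimal illustration, not the skill's own implementation: the helper names (`robots_url`, `is_allowed`, `tos_candidates`), the `example.com` URLs, and the sample robots.txt body are all assumptions made for the example.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def _site_root(page_url: str) -> str:
    """Return scheme://host for the site that hosts page_url."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}"


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the site that hosts page_url."""
    return _site_root(page_url) + "/robots.txt"


def is_allowed(robots_txt: str, page_url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


def tos_candidates(page_url: str) -> list[str]:
    """List the Terms of Service locations named in the checklist."""
    root = _site_root(page_url)
    return [root + path for path in ("/terms", "/tos", "/legal")]


# Hypothetical robots.txt body, used here so the example needs no network access.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    target = "https://example.com/private/data"
    print(robots_url(target))
    if not is_allowed(SAMPLE_ROBOTS, target):
        print("Disallowed by robots.txt: stop.")
    for url in tos_candidates(target):
        print("Review manually:", url)
```

In a live run, the robots.txt text would come from the site itself, for example via `RobotFileParser.set_url(...)` followed by `read()`. The Terms of Service pages still need a human read; the sketch only lists where to look.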

Related Skills

agent-autonomy-kit

Stop waiting for prompts. Keep working.

agent-builder

Build high-performing OpenClaw agents end-to-end. Use when you want to design a new agent (persona + operating rules) and generate the required OpenClaw workspace files (SOUL.md, IDENTITY.md, AGENTS.md, USER.md, HEARTBEAT.md, optional MEMORY.md + memory/YYYY-MM-DD.md). Also use to iterate on an existing agent’s behavior, guardrails, autonomy model, heartbeat plan, and skill roster.

agent-content-pipeline

Safe content workflow (drafts/reviewed/revised/approved/posted) with human-in-the-loop approval, plus CLI to list/move/review and post to LinkedIn/X. Use when setting up a content pipeline, drafting content, managing review threads, or posting approved content.