Render dynamic sites and extract content

Playwright_Web_Navigator · skill · setup L20
tribeti/InkMD-Assets
What it does

Render dynamic, JavaScript-heavy websites in a headless browser and extract their content

Best for

Extracting content from React/Vue/Angular single-page applications (SPAs), where static HTTP requests return empty or skeleton HTML

Inputs
  • target URL
  • CSS selector (optional, defaults to body)
  • wait_for: extra wait in milliseconds (optional, gives JavaScript time to hydrate the page)
  • action type: scrape, screenshot, or pdf
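The four inputs can be modeled as a small validated request object. This is a minimal sketch under stated assumptions: the class name `RenderRequest`, the field names other than `wait_for`, and the defaults of 0 ms for `wait_for` and `scrape` for the action are hypothetical, not the skill's documented interface.

```python
from dataclasses import dataclass

# Hypothetical request type; only `wait_for`, the "body" default selector,
# and the three action names come from the skill card.
VALID_ACTIONS = ("scrape", "screenshot", "pdf")


@dataclass(frozen=True)
class RenderRequest:
    url: str
    selector: str = "body"   # card: defaults to body
    wait_for: int = 0        # extra milliseconds for JS hydration (default assumed)
    action: str = "scrape"   # default action assumed

    def __post_init__(self) -> None:
        # Reject malformed requests before any browser is launched.
        if not self.url:
            raise ValueError("url is required")
        if self.action not in VALID_ACTIONS:
            raise ValueError(f"action must be one of {VALID_ACTIONS}, got {self.action!r}")
        if self.wait_for < 0:
            raise ValueError("wait_for must be a non-negative number of milliseconds")
        if not self.selector.strip():
            raise ValueError("selector must not be empty")


# Usage: scrape only the main article, allowing 1.5 s for hydration.
req = RenderRequest("https://example.com", selector="main article", wait_for=1500)
```

Validating up front keeps typos such as `action="screnshot"` from costing a full browser launch.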
Outputs
  • extracted text (scrape action)
  • base64-encoded screenshot (screenshot action)
  • PDF bytes (pdf action)
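A minimal sketch of the three actions using Playwright's synchronous Python API, assuming Chromium and assuming the selector applies only to `scrape`. The names `render` and `package_result` are hypothetical and are not the skill's `browser_utils` API. The sketch extracts text with Playwright's `all_inner_texts()` in place of the html2text conversion the skill lists, so its text formatting will differ.

```python
import base64
from typing import Any, Union

Result = Union[str, bytes]


def package_result(action: str, raw: Any) -> Result:
    """Convert raw Playwright output into the output types listed on the card."""
    if action == "scrape":
        # raw: list of inner texts, one per element matched by the selector
        return "\n".join(t.strip() for t in raw if t.strip())
    if action == "screenshot":
        # raw: PNG bytes from page.screenshot(), returned base64-encoded
        return base64.b64encode(raw).decode("ascii")
    if action == "pdf":
        # raw: PDF bytes from page.pdf(), returned unchanged
        return raw
    raise ValueError(f"unknown action: {action!r}")


def render(url: str, selector: str = "body", wait_for: int = 0, action: str = "scrape") -> Result:
    # Imported lazily so package_result() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # page.pdf() is only supported in headless Chromium.
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            if wait_for:
                page.wait_for_timeout(wait_for)  # extra time for client-side hydration
            if action == "scrape":
                raw = page.locator(selector).all_inner_texts()
            elif action == "screenshot":
                raw = page.screenshot(full_page=True)
            elif action == "pdf":
                raw = page.pdf()
            else:
                raise ValueError(f"unknown action: {action!r}")
        finally:
            browser.close()
    return package_result(action, raw)
```

Splitting the post-processing into `package_result` keeps the encoding rules testable without a browser.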
Requires
  • playwright >=1.39.0
  • html2text >=2020.1.16
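To confirm an environment meets these minimums before launching a browser, a stdlib-only check like the following can help. It is a simplified sketch: `check_requirements` and `meets_minimum` are hypothetical names, and the parser compares only leading numeric components, so pre-release suffixes such as `a1` are ignored (`packaging.version.Version` is the more robust choice). Playwright also needs its browser binaries (`playwright install chromium`), which a package-version check cannot detect.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions listed on the skill card.
REQUIREMENTS = {"playwright": "1.39.0", "html2text": "2020.1.16"}


def as_tuple(v: str) -> tuple[int, ...]:
    """Parse leading numeric components, e.g. '1.40.0a1' -> (1, 40, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = as_tuple(installed), as_tuple(minimum)
    # Pad with zeros so "1.39" compares equal to "1.39.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements() -> dict[str, str]:
    """Return {package: problem} for every unmet requirement (empty if all are met)."""
    problems = {}
    for name, minimum in REQUIREMENTS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    print(check_requirements() or "all requirements satisfied")
```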
Preconditions

The target URL is publicly accessible; JavaScript execution is enabled; the selector matches existing DOM elements
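The "publicly accessible" precondition can be partly checked before navigation. This hypothetical `check_public_url` helper is a sketch under stated assumptions: it rejects non-HTTP schemes, `localhost`, and literal non-global IP addresses, but it does not resolve DNS, so a public-looking hostname may still point at a private address, and it cannot tell whether a page needs login.

```python
import ipaddress
from urllib.parse import urlparse


def check_public_url(url: str) -> str:
    """Return the hostname if the URL looks publicly reachable; raise ValueError otherwise."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    host = parsed.hostname  # lowercased; IPv6 brackets stripped
    if not host:
        raise ValueError("URL has no host")
    if host == "localhost" or host.endswith(".localhost"):
        raise ValueError("localhost is not publicly accessible")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return host  # a DNS name; reachability is only known after the request
    if not ip.is_global:
        # Covers loopback, private (10/8, 192.168/16, ...), link-local, etc.
        raise ValueError(f"{host} is not a public address")
    return host
```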

Failure modes

A selector mismatch returns empty content; page loads time out on slow sites; large SPAs can exhaust memory; pages that require cookies or authentication fail when none are provided
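Because a selector mismatch fails silently, it helps to turn empty results and load timeouts into explicit errors. A minimal sketch, assuming a Playwright `Page` (or any object with the same `locator()` API) is passed in; `ScrapeError`, `load`, and `scrape_text` are hypothetical names, while `Locator.count()`, `Locator.all_inner_texts()`, `page.goto(..., timeout=...)`, and `playwright.sync_api.TimeoutError` are real Playwright APIs.

```python
class ScrapeError(RuntimeError):
    """Raised instead of silently returning empty content."""


def load(page, url: str, timeout_ms: int = 30_000) -> None:
    # Lazy import so scrape_text() works (and can be tested) without Playwright.
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

    try:
        page.goto(url, timeout=timeout_ms)
    except PlaywrightTimeoutError as exc:
        raise ScrapeError(f"{url} did not load within {timeout_ms} ms") from exc


def scrape_text(page, selector: str = "body") -> str:
    """Extract text from an already-loaded page, failing loudly on a selector mismatch."""
    locator = page.locator(selector)
    if locator.count() == 0:
        raise ScrapeError(f"selector {selector!r} matched no elements")
    text = "\n".join(t.strip() for t in locator.all_inner_texts() if t.strip())
    if not text:
        raise ScrapeError(f"selector {selector!r} matched only empty elements")
    return text
```

Distinguishing "no match" from "matched but empty" tells the caller whether to fix the selector or wait longer for hydration.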

Trust signals
  • Explicit import pattern shown (from scripts import browser_utils)
  • Three action types (scrape, screenshot, pdf) documented with use cases
  • Token-saving guidance: prefer specific selectors over a full-body scrape
  • Privacy/safety warning against navigating to illegal content