Toolkit Marketplace
Discover curated collections of AI tools and workflows
Use LlamaParse to turn messy PDFs into clean text, then use GPT-4o to output strict invoice JSON. Log results to Google Sheets, archive originals in Google Drive, and send a review ping to Telegram.
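One way to enforce the "strict invoice JSON" step is to validate the model's reply before anything is logged. The sketch below is illustrative only: the field names in `INVOICE_FIELDS` and the `parse_invoice` helper are assumptions, not the toolkit's actual schema, and it uses only the Python standard library.

```python
import json

# Hypothetical schema: these field names and types are illustrative
# assumptions, not taken from the toolkit itself.
INVOICE_FIELDS = {
    "invoice_number": str,
    "vendor": str,
    "invoice_date": str,
    "currency": str,
    "total": (int, float),
}


def parse_invoice(raw: str) -> dict:
    """Parse model output and reject anything that is not exactly the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = INVOICE_FIELDS.keys() - data.keys()
    extra = data.keys() - INVOICE_FIELDS.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected in INVOICE_FIELDS.items():
        value = data[name]
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {name!r} has the wrong type")
    return data
```

A reply that fails this check could be routed to the Telegram review ping instead of being written to the sheet.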
This SOP extracts product data from web pages without relying on brittle XPath/CSS selectors, using screenshots as the primary source of truth. You keep a URL list in Google Sheets, capture a full-page screenshot with ScrapingBee, then ask Gemini 1.5 Pro to read the screenshot and return strict JSON.

If a page is ambiguous (dynamic price blocks, small text, variants), fall back to HTML extraction and re-run the extraction against the same JSON schema. To control token cost, convert the HTML into a compact markdown representation before sending it to the model. The workflow is built for e-commerce scraping, but it generalizes to directories, marketplaces, and SaaS pricing pages.
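The screenshot-first flow with an HTML fallback can be sketched as follows. This is a minimal illustration under stated assumptions: the `REQUIRED` schema, the helper names, and the three injected callables (`ask_screenshot`, `fetch_html`, `ask_text`) are hypothetical stand-ins for the ScrapingBee and Gemini calls, and a reply that fails schema validation is treated as the "ambiguous page" signal. The HTML compaction uses only the standard library and is deliberately crude.

```python
import json
import re
from html.parser import HTMLParser
from typing import Callable, Optional

# Hypothetical product schema, shared by both extraction paths.
REQUIRED = {"name": str, "price": (int, float), "currency": str}


def validate(raw: str) -> Optional[dict]:
    """Return the parsed object if it matches REQUIRED exactly, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(REQUIRED):
        return None
    for key, expected in REQUIRED.items():
        if isinstance(data[key], bool) or not isinstance(data[key], expected):
            return None
    return data


class _Compactor(HTMLParser):
    """Keep visible text only; mark headings and list items markdown-style."""
    SKIP = {"script", "style", "noscript", "svg"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip_depth = 0
        self._prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            self._prefix = self.HEADINGS[tag]
        elif tag == "li":
            self._prefix = "- "

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.HEADINGS or tag == "li":
            self._prefix = ""

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data).strip()
        if text and not self._skip_depth:
            self.parts.append(self._prefix + text)
            self._prefix = ""


def html_to_markdown(html: str) -> str:
    parser = _Compactor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def extract_product(
    url: str,
    ask_screenshot: Callable[[str], str],  # screenshot + vision model, returns raw reply
    fetch_html: Callable[[str], str],      # returns the page HTML
    ask_text: Callable[[str], str],        # text model on compact markdown, returns raw reply
) -> Optional[dict]:
    result = validate(ask_screenshot(url))
    if result is None:
        # Fallback: same schema, cheaper input thanks to the compaction step.
        result = validate(ask_text(html_to_markdown(fetch_html(url))))
    return result
```

Because both paths go through the same `validate` function, downstream consumers such as the Google Sheets log see one shape regardless of which path produced the row.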