Data Tools
Explore the best new Data tools and products curated by the community.
Forsy captures workflow data from the agents you already use (OpenClaw, Claude, Codex, Hermes, etc.) and turns it into sellable structured data. It creates a marketplace for authentic, high-fidelity workflow data with licensing and privacy built in. Forsy is building the infrastructure for a new agent data economy, where real agent workflows become training data for RL and more capable future agents.
HasData is the managed web scraping service for data pipelines and AI agents. Send any URL, get clean JSON or Markdown back in one API call. We handle proxies, browser rendering, retries, and anti-bot. 50+ ready scrapers cover Google Search, Maps, News, Zillow, Indeed, and major e-commerce. AI extraction handles any other URL from a plain-text prompt. Use it from Claude, ChatGPT, or your own AI agent via MCP. CLI for everything else.
PHBench: the first public benchmark predicting Series A funding from Product Hunt launch signals. We analyzed 67,292 featured launches over 7 years, linked to 528 verified Series A rounds via Crunchbase. Champion model: 4.7x lift over random. Team size × community engagement is the strongest signal; B2B (API, Payments, Fintech) converts at 3x baseline; Rank #1 raises at 2.2x unranked. Dataset, code, and baselines open. Submit at phbench.com and subscribe for weekly high-probability launches.
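To put the PHBench numbers in perspective, here is a minimal arithmetic sketch. It assumes "lift over random" means the model's precision divided by the overall base rate; the blurb does not state the exact definition or the cutoff at which lift was measured, so treat the implied precision as illustrative only. The function names are hypothetical.

```python
# Figures quoted in the PHBench blurb.
FEATURED_LAUNCHES = 67_292
SERIES_A_ROUNDS = 528
CHAMPION_LIFT = 4.7


def base_rate(positives: int, total: int) -> float:
    """Share of launches that went on to a verified Series A."""
    return positives / total


def implied_precision(lift: float, rate: float) -> float:
    """Precision implied by a lift, ASSUMING lift = precision / base rate."""
    return lift * rate


rate = base_rate(SERIES_A_ROUNDS, FEATURED_LAUNCHES)
precision = implied_precision(CHAMPION_LIFT, rate)

print(f"base rate: {rate:.2%}")
print(f"implied precision at {CHAMPION_LIFT}x lift: {precision:.2%}")
```

Under that assumption, fewer than 1 in 100 featured launches reach a Series A, and a 4.7x lift would correspond to a few in 100 among the launches the model selects.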
Phone browser to command line, Apple to Android, any device to any device. If it has a browser or a CLI, it works. P2P, end-to-end encrypted. No cookies, no logins, free.
RNDA is a data protocol where raw input is encoded to 256 bytes and permanently discarded. Not encrypted — gone. The data can't be breached because it doesn't exist. Proven across 31 data types: genomics (140,835x), quantum circuits on IBM hardware (351,939x), medical imaging, AV sensors, oil & gas. SSL made unencrypted traffic obsolete. JWT made session storage obsolete. RNDA makes raw data storage obsolete. Multiple patents filed.
Basedash is now an MCP server. Connect Claude, Cursor, ChatGPT, or any MCP-compatible client and your AI agent can ask Basedash anything about your data — across every database, warehouse, and SaaS tool you've already connected to your workspace. It can pull live numbers, compare cohorts, generate charts, and dig into trends, all governed by the same access controls your team already uses. Your data analyst, inside every tool you ship in.
Seeknal is an all-in-one CLI for data & AI/ML engineering. Define pipelines in YAML or Python, run a safe draft → dry-run → apply workflow, materialize to PostgreSQL and Iceberg, and query your data in natural language. Three verbs: Organize (transform raw data, point-in-time joins, incrementals), Expose (dashboards, features, NL query), Action (insight → report → API → alert). Built for the agent world.
DecisionBox is an autonomous AI agent that writes SQL against your warehouse and ships validated findings. Enterprise runs it fully air-gapped: self-hosted LLMs via Ollama, open-source base models fine-tuned on your schema, SSO, RBAC, three-layer data governance, full audit log. Plugin architecture on an open-source AGPL v3 core — zero fork, zero outbound calls, zero bytes leave your network.
Deletion has always been a claim: "trust us, it's gone." Zombie Delete makes it a receipt. Drop a file reference, get a SHA-256 tombstone anchored on Internet Computer mainnet in under two seconds, plus a signed PDF your auditor can verify forever - even if we disappear. Built for GDPR, the EU AI Act, California DELETE Act, India DPDP, and DORA. No wallets. No tokens. No blockchain knowledge required.
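The "SHA-256 tombstone" idea can be sketched with Python's standard library. This is not Zombie Delete's actual format or API: the record layout, the `make_tombstone` and `verify_tombstone` helpers, and the sample file reference are all hypothetical, and the on-chain anchoring and signed PDF are not shown.

```python
import hashlib


def make_tombstone(file_reference: str, deleted_at: str) -> dict:
    """Build a hypothetical deletion record: a SHA-256 digest plus a timestamp."""
    digest = hashlib.sha256(file_reference.encode("utf-8")).hexdigest()
    return {"sha256": digest, "deleted_at": deleted_at}


def verify_tombstone(file_reference: str, tombstone: dict) -> bool:
    """Recompute the digest and compare it with the one stored in the record."""
    digest = hashlib.sha256(file_reference.encode("utf-8")).hexdigest()
    return digest == tombstone["sha256"]


# Hypothetical file reference and timestamp, for illustration only.
record = make_tombstone("s3://example-bucket/users/42/export.csv", "2025-01-01T00:00:00Z")
print(record["sha256"])
print(verify_tombstone("s3://example-bucket/users/42/export.csv", record))
```

Anyone holding the original reference can recompute the digest later and check it against the published record, which is the property an auditor would rely on.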
Marmot is an open-source data catalog designed for teams who want powerful data discovery without enterprise complexity. Catalog every data asset, enrich it with the context that matters and make it accessible to your team and your AI tools.
Panorama analyzes your workplace data to recommend AI workflows your team can run together. Instead of building automations from scratch, discover what to automate and execute collaboratively in one place.
DataSieve helps you turn unstructured text into clean, usable data in seconds. Drop in text, files, folders, or even archives, and extract what you need in one pass. Emails, phone numbers, URLs, dates, financial data, and more. Everything runs locally on your device, with no cloud and no tracking.
What you can do:
- Extract multiple data types at once
- Process text, PDFs, EPUBs, CSV, JSON, Word files, and more
- Export results to JSON, XLSX, DOCX, and more
- Define your own custom extractors
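Extracting several data types in one pass, with room for custom extractors, might look roughly like the sketch below. It is not DataSieve's implementation: the `EXTRACTORS` table, the `extract_all` function, and the deliberately simple regular expressions are assumptions made for illustration, and real-world emails, URLs, and dates need more careful patterns.

```python
import re

# Hypothetical extractor table: name -> compiled pattern (simplified on purpose).
EXTRACTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "url": re.compile(r"https?://[^\s<>\"')]+"),
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_all(text: str, extractors: dict = EXTRACTORS) -> dict:
    """Run every extractor over the text and collect matches per type."""
    return {name: pattern.findall(text) for name, pattern in extractors.items()}


sample = "Contact ana@example.com by 2024-05-01, see https://example.com/docs for details."
results = extract_all(sample)
print(results)

# A custom extractor is just another entry in the table.
custom = {**EXTRACTORS, "hashtag": re.compile(r"#\w+")}
custom_results = extract_all("Launch notes #data #tools", custom)
print(custom_results["hashtag"])
```

Keeping extractors in a plain dictionary means adding a new data type does not require touching the extraction loop.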
Context.dev (previously Brand.dev) gives your AI agents and apps real-time access to structured web data, no brittle scraping infrastructure needed. Scrape any URL as clean markdown or HTML, extract brand data (logos, colors, fonts, socials) from any domain, crawl sitemaps, resolve transaction descriptors, and more. Typed SDKs for TypeScript, Python, and Ruby. Trusted by 5,000+ businesses including Mintlify, Daily.dev, Ferndesk.com, and more. Most teams integrate in under 10 minutes.
Most AI agents & complex automations fail because they’re operating in the dark. Boost.space provides the persistent context layer that turns siloed LLMs into an integrated business intelligence system. Give your automations & agents a "Shared Brain" so all workflows have the full context of your business, from past interactions to live database states, allowing workflows to compound instead of breaking.
PredictLeads Technographics Dataset provides structured data on what technologies companies use, sourced from company websites, job descriptions, DNS records, cookies, and more. Each detection includes first/last seen timestamps and the signals used, so you can track adoption curves, technology migrations, and competitive shifts over time. Available via API, flat files, and webhooks, with an MCP server for AI agents.
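As a rough illustration of how first/last seen timestamps support adoption tracking, the sketch below counts how many companies were using a technology on a given date. The record fields, company names, and dates are invented for the example and do not reflect the actual PredictLeads schema or data.

```python
from datetime import date

# Hypothetical detection records; field names are assumptions, not the real schema.
detections = [
    {"company": "acme.example", "technology": "Postgres",
     "first_seen": date(2022, 1, 10), "last_seen": date(2024, 3, 1)},
    {"company": "globex.example", "technology": "Postgres",
     "first_seen": date(2023, 6, 5), "last_seen": date(2024, 3, 1)},
    {"company": "initech.example", "technology": "MySQL",
     "first_seen": date(2021, 2, 1), "last_seen": date(2023, 1, 15)},
]


def active_count(records: list, technology: str, on: date) -> int:
    """Count companies whose detection window for the technology covers the date."""
    return sum(
        1
        for r in records
        if r["technology"] == technology and r["first_seen"] <= on <= r["last_seen"]
    )


# Sampling the count at several dates gives a simple adoption curve.
checkpoints = [date(2022, 6, 1), date(2023, 12, 1)]
curve = [active_count(detections, "Postgres", d) for d in checkpoints]
print(curve)
```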
Fundable is a startup, investor, and people dataset (like Crunchbase) with a few improvements: it surfaces new deals before other platforms, provides sources for every datapoint, supports natural-language deal alerts ("coding agent startups in SF looking to hire"), and offers a much better UI at a lower price. First month is free. Hit us up if you want access to our API, Datafeed, or MCP!
Stacksync powers real-time, bidirectional data synchronization between CRMs (e.g. Salesforce, HubSpot, or SAP) and databases (e.g. Postgres, Google BigQuery). Edits made in your CRM instantly update in your database, and vice versa. To set up a sync, users connect the two chosen apps in one click and select the tables they want to sync, with no code required. Stacksync reduces implementation delays from months to minutes for CRM integration projects.
Firecrawl /agent is a magic API that searches, navigates, and gathers data from even the most complex websites. Describe what data you want and /agent handles the rest. Find information in hard-to-reach places and return single datapoints or entire datasets at scale.
Turn your blood work into actionable insights.
Control real web browsers with a simple API.
Querri transforms how teams work with data, making it easy to connect, clean, analyze, and visualize, all in one place. With new integrations and interactive, drag-and-drop dashboards, anyone can now build automated workflows and shareable insights, with no technical expertise required.