unweb

10 Weeks of UnWeb in Production: What We Didn't Expect

We launched UnWeb, an API that converts messy web pages into clean, LLM-ready Markdown, in March. We had a clear hypothesis: developers building AI pipelines are wasting context window budget on HTML noise, and they’ll pay to fix that. Ten weeks and a few hundred API keys later, here’s what we actually learned.

The Use Cases We Didn’t Build For

We designed UnWeb for RAG pipelines: the classic “fetch a URL, get clean content, embed it, done” workflow.

What HTML Does to Your LLM Context Window (And What to Do About It)

Most LLM pipelines have a data quality problem that nobody talks about at conferences. You’re fetching web content — documentation, knowledge base articles, competitor pages, product data — and feeding it directly into your AI pipeline. The content looks fine in your browser. But what your model actually receives is something else entirely. It’s a context window full of <div class="wrapper"><div class="inner"><div class="content">, navigation menus, cookie banners, JavaScript snippets, tracking pixels, and somewhere in the middle, the three paragraphs of actual content you wanted.
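To make that gap concrete, here is a minimal sketch that measures how much of a raw HTML string is readable text. It uses only Python's standard-library `html.parser`. The `NOISE_TAGS` set, the `content_ratio` helper, and the sample page are all assumptions made up for illustration. None of this is UnWeb's actual extraction logic, and a character ratio is only a rough proxy for token waste.

```python
from html.parser import HTMLParser

# Tags whose contents count as noise in this sketch.
# This list is an assumption for illustration, not a standard.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "noscript"}


class VisibleTextExtractor(HTMLParser):
    """Collects text that sits outside any of the NOISE_TAGS."""

    def __init__(self):
        super().__init__()
        self._noise_depth = 0  # how many noise tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if self._noise_depth == 0 and stripped:
            self.chunks.append(stripped)


def content_ratio(html):
    """Return (visible_text, share of raw characters that are visible text)."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return text, len(text) / max(len(html), 1)


# A made-up page in the shape described above: wrappers, nav, scripts, one paragraph.
SAMPLE_HTML = """<html><head><script>track("pageview");</script>
<style>.wrapper{margin:0}</style></head>
<body><nav><a href="/">Home</a><a href="/pricing">Pricing</a></nav>
<div class="wrapper"><div class="inner"><div class="content">
<p>The actual content you wanted.</p>
</div></div></div>
<footer>Accept cookies</footer></body></html>"""

if __name__ == "__main__":
    text, ratio = content_ratio(SAMPLE_HTML)
    print(f"visible text: {text!r}")
    print(f"share of raw characters that are content: {ratio:.0%}")
```

Run against a real page you already fetch, a low ratio is a quick signal that most of what you send to the model is markup rather than content.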
