Policy

Cloudflare Flips the Default on AI Crawlers

The content delivery giant will block mixed-use bots on ad-supported pages starting in September, pushing search engines to separate indexing from model training.

Arjun S. Mehta

Staff Writer · Singapore

Jul 3, 2026


Photo: Samuel Boivin / Shutterstock

The New Baseline

Cloudflare is reversing the burden of proof for AI web crawlers. Starting this September, the platform will automatically block bots that simultaneously index sites for search engines and harvest data for AI training or agent use - unless those crawlers give publishers granular control. The shift marks a departure from the company's earlier opt-in filtering tools, which let customers choose whether to block AI scrapers. Now, protection becomes the default state for new accounts and any fresh sites added by existing subscribers.

The policy applies specifically to pages that carry advertising. Cloudflare's logic is straightforward: if a publisher monetizes through ads, the value exchange depends on human eyeballs or at least identifiable bot traffic that respects commercial terms. Mixed-use crawlers that pull content for model training or answer synthesis without compensation break that exchange. Free-tier users will also inherit the new defaults unless they explicitly opt out before the September 15 cutoff.
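The default described above can be read as a simple decision rule. The following is a minimal sketch in Python, not Cloudflare's implementation: the `Crawler` fields, the purpose labels, and the `offers_granular_control` flag are hypothetical names introduced here for illustration, and the sketch ignores account age and publisher overrides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crawler:
    name: str
    purposes: frozenset            # hypothetical labels, e.g. {"search", "ai_training"}
    offers_granular_control: bool  # hypothetical: publisher can allow search but refuse AI use


# Purposes the article groups together as AI use (assumed labels).
AI_PURPOSES = frozenset({"ai_training", "ai_agent"})


def is_mixed_use(crawler: Crawler) -> bool:
    """A crawler is mixed-use if it indexes for search and also feeds AI systems."""
    return "search" in crawler.purposes and bool(crawler.purposes & AI_PURPOSES)


def blocked_by_default(crawler: Crawler, page_has_ads: bool) -> bool:
    """Default decision as the article describes it, before any publisher override."""
    return page_has_ads and is_mixed_use(crawler) and not crawler.offers_granular_control


search_only = Crawler("search-only-bot", frozenset({"search"}), False)
mixed = Crawler("mixed-bot", frozenset({"search", "ai_training"}), False)

print(blocked_by_default(mixed, page_has_ads=True))        # True
print(blocked_by_default(search_only, page_has_ads=True))  # False
print(blocked_by_default(mixed, page_has_ads=False))       # False
```

Under this reading, a mixed-use bot has two ways to avoid the block: drop one of its purposes, or give publishers the granular control the policy asks for.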

Matthew Prince, Cloudflare's co-founder and chief executive, framed the change as a response to a fundamental shift in the composition of internet traffic. By his account, non-human traffic now accounts for the majority of requests across the web, and much of that volume serves AI systems rather than end users. In that environment, Prince argued, platforms must act faster to prevent a collapse of the content ecosystem. The goal is to create space for both website owners and AI companies, but only those AI companies whose bots signal clear, separable intent.

Pay Per Use, Not Pay Per Crawl

Alongside the default-blocking policy, Cloudflare is evolving its commercial framework for AI access. The company introduced Pay Per Crawl in 2025, a system that let publishers block AI bots unless the companies behind them paid for scraping rights. The new iteration, called Pay Per Use, shifts the revenue trigger from the act of crawling to the downstream use of content. Publishers will now receive payment when their material appears in answers generated by AI chatbots, not merely when a bot visits a page.

At launch, Cloudflare has partnerships with Ceramic.AI and You.com, two smaller players in the AI search and answer space. The model hinges on broader adoption by larger language model providers, and Cloudflare's announcement suggests the company expects more firms to join as publishers enable the feature. The shift from crawl-based to usage-based compensation aligns incentives more closely with the actual value extracted: a page visited but never cited generates no reader engagement and, under the new system, no payment.
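The difference between the two triggers can be illustrated with a toy event log. This is a sketch under stated assumptions rather than Cloudflare's accounting: the event names, the bot names, and the log itself are made up for illustration, and no prices are modeled.

```python
from collections import Counter

# Hypothetical event log: (ai_company, page, event_kind)
events = [
    ("bot-a", "/story-1", "crawl"),
    ("bot-a", "/story-2", "crawl"),
    ("bot-a", "/story-1", "cited_in_answer"),
    ("bot-a", "/story-1", "cited_in_answer"),
    ("bot-b", "/story-2", "crawl"),
]


def billable_events(log, model):
    """Count billable events per page under the given compensation model."""
    # Pay Per Crawl bills on the visit; Pay Per Use bills on downstream use.
    trigger = {"pay_per_crawl": "crawl", "pay_per_use": "cited_in_answer"}[model]
    return Counter(page for _company, page, kind in log if kind == trigger)


per_crawl = billable_events(events, "pay_per_crawl")
per_use = billable_events(events, "pay_per_use")

print(per_crawl["/story-1"], per_crawl["/story-2"])  # 1 2
print(per_use["/story-1"], per_use["/story-2"])      # 2 0
```

In this toy log, `/story-2` is crawled twice but never cited, so it is billable under the crawl-based model and earns nothing under the usage-based one.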

For publishers, the appeal is twofold. They gain visibility into which AI systems are surfacing their content, and they secure a potential revenue stream that reflects real utility rather than speculative scraping. For AI companies willing to play by these rules, the benefit is reputational and operational - clearer relationships with content owners and reduced risk of legal or platform-level blocking.

The Google Problem

Cloudflare's announcement does not name Google explicitly, but the targeting is transparent. The company notes that the largest search engine enjoys access to roughly twice the information available to leading AI firms, a disparity rooted in the structure of its crawler. Googlebot serves dual purposes: it indexes the web for traditional search while also gathering training data for Gemini and powering features like AI Overviews and AI Mode.

Google does offer a separate control called Google-Extended, a robots.txt token that lets publishers keep their content out of Gemini training without affecting traditional search indexing. But the split is incomplete. AI Overviews and AI Mode, where Google's assistant answers user queries directly, are fed by Googlebot itself, so publishers have no way to stay in search results while withholding their content from answer synthesis. The choice is binary: allow Googlebot full access or lose discoverability in both search and AI features.
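The partial split that exists today is expressed in robots.txt. The sketch below uses Python's standard `urllib.robotparser` to evaluate a hypothetical policy that admits Googlebot and refuses the Google-Extended token. The rules and the example.com URL are illustrative assumptions, not taken from Cloudflare's or Google's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow search crawling, refuse the Google-Extended token.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"
search_allowed = parser.can_fetch("Googlebot", url)
extended_allowed = parser.can_fetch("Google-Extended", url)

print(search_allowed, extended_allowed)  # True False
```

Note that the file has no separate line for answer features such as AI Mode. Whatever Googlebot is allowed to fetch is treated as one bundle, which is the leverage the next paragraph describes.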

This asymmetry gives Google leverage that smaller AI companies lack. A publisher blocking Googlebot risks vanishing from the world's dominant search engine, a cost few can afford. Cloudflare's default-blocking stance is designed to force a separation: search indexing on the one hand, AI training and agent use on the other. If mixed-use crawlers refuse to split their functions, they face exclusion from ad-supported pages across Cloudflare's network, which serves a substantial portion of the web.

The move also reflects a broader tension in the AI economy. Training data has become a scarce and contested resource, with publishers, artists, and platforms increasingly asserting control over content that was once treated as freely scrapeable. Cloudflare is positioning itself as an infrastructure layer that can enforce those assertions at scale, turning policy into code.

What Happens Next

The September deadline creates a forcing function for AI companies and search engines. Those that rely on mixed-use crawlers must either split their bots into separate agents with distinct purposes or negotiate commercial terms with publishers through Cloudflare's Pay Per Use framework. Refusal to do either will result in exclusion from a growing share of the web, at least on pages that carry advertising.

For Cloudflare, the policy is both a service to customers and a strategic bet. The company is wagering that publishers want more control and that AI firms will eventually accept usage-based compensation as the norm. If enough large platforms adopt similar defaults, the economics of AI training and deployment will shift. If not, Cloudflare risks fragmenting the web into zones of access and exclusion, with unpredictable consequences for discoverability and innovation.

The policy also raises questions about enforcement and verification. How will Cloudflare determine whether a crawler is mixed-use? How will it track whether content appears in downstream AI answers? The company has not published technical details, but the success of Pay Per Use will depend on transparent auditing and reliable attribution. Without those mechanisms, the system risks becoming another layer of friction rather than a genuine rebalancing.

At DailyTechWire, we've watched similar debates play out across media, music, and software. The pattern is familiar: new technology enables extraction at scale, incumbents resist, and platforms broker uneasy truces. Cloudflare's intervention is notable because it operates at the infrastructure level, not the application layer. The company is not a publisher or an AI firm; it is the plumbing. When the plumbing starts making policy, the stakes extend beyond any single industry.

