Cloudflare Unveils New Tool to Fight AI Bots
Cloudflare has released a new free tool that stops AI bots from scraping website data to train their models. The tool improves bot detection by analyzing how AI bots behave, and it lets users report suspected bots manually.
Publicly traded cloud service provider Cloudflare has released a new, free tool to stop bots from collecting data from websites hosted on its platform in order to train AI models.
AI companies such as Apple, Google, and OpenAI let website owners block the bots they use to scrape data and train models: owners can edit their site's robots.txt file to tell those bots which pages they may not crawl. But Cloudflare says in a post about its bot-fighting tool that not all AI scrapers honor the file.
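For example, a site opting out of AI training can add entries like these to its robots.txt. The crawler tokens (GPTBot, Google-Extended, Applebot-Extended) are the ones published by OpenAI, Google, and Apple; honoring them is voluntary on the crawler's part, which is exactly the gap Cloudflare's tool targets:

```
# Ask known AI-training crawlers to stay off the entire site.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
```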
Cloudflare writes that it worries some AI companies intent on getting around rules to access content will keep adapting to evade bot detection.
To address the issue, Cloudflare analyzed traffic from AI bots and crawlers to improve its automatic bot-detection models. Among other signals, the models consider whether an AI bot is trying to evade detection by mimicking the appearance and behavior of someone using a web browser.
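The core idea behind that kind of behavioral check can be sketched in a few lines. This is an illustrative toy heuristic only, not Cloudflare's actual model (which is machine-learned over far richer signals): a client that claims to be a browser but omits headers real browsers always send, or that fetches pages at machine speed, is flagged as inconsistent.

```python
# Toy behavioral check (NOT Cloudflare's model): flag clients whose
# claimed browser identity is inconsistent with how they behave.

# Headers that mainstream browsers send on ordinary page loads.
BROWSER_HEADERS = {"accept-language", "accept-encoding", "sec-fetch-mode"}

def looks_suspicious(headers, requests_per_minute):
    """Flag a request that claims a browser identity but behaves like a bot."""
    ua = headers.get("user-agent", "").lower()
    claims_browser = "mozilla" in ua  # browser UAs start with "Mozilla/5.0"
    present = {name.lower() for name in headers}
    missing_headers = BROWSER_HEADERS - present
    # The signal is the mismatch: a self-declared bot is not suspicious
    # here, but a "browser" missing browser headers or crawling at
    # machine speed is.
    return claims_browser and (bool(missing_headers) or requests_per_minute > 120)
```

A request with a `GPTBot` user agent is not flagged by this sketch (it declares itself honestly), while a "Mozilla/5.0" client missing `Sec-Fetch-Mode` and friends is.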
Cloudflare also plans to keep banning AI bots manually over time, and it has set up a form for hosts to report suspected AI bots and crawlers. As demand for model training data rises with the generative AI boom, the problem of AI bots has become far more visible.
Concerns and Statistics
Many websites fear that AI companies will use their content to train models without payment or notice, so they have blocked AI scrapers and bots. One study found that about 26% of the top 1,000 websites have blocked OpenAI's bot; another found that more than 600 news organizations had done the same.
But blocking isn't always reliable protection. As noted above, some vendors appear to be ignoring standard bot-exclusion rules to get ahead in the AI race.
Recently, the AI search engine Perplexity was accused of impersonating ordinary visitors in order to scrape information from websites, and OpenAI and Anthropic are also said to have ignored robots.txt rules at times.
Tools like Cloudflare's could help, but only if they can reliably detect covert AI bots. And they won't solve the harder problem of publishers losing referral traffic from AI tools like Google's AI Overviews, which exclude sites that block certain AI crawlers.