One big source of traffic, it noted, is AI crawlers, which are under increasing scrutiny as they scan the web and gobble up vast amounts of data to train large language models (LLMs). A big concern is that some take data even when they’re not supposed to, unlike “verified” good bots, which typically come from search engines and are transparent about who they are (such as Googlebot, GPTBot, Qualys, and Bingbot).
Cloudflare tracks AI bot traffic to determine which bots are the most aggressive, which have the highest volume of requests, and which perform crawls on a regular basis. Researchers found that “facebookexternalhit,” a bot notorious for creating excessive traffic, accounted for the most traffic throughout the year (27.16%), followed by Bytespider, from TikTok owner ByteDance (23.35%), Amazonbot (13.34%), Anthropic’s ClaudeBot (8.06%), and GPTBot (5.60%).
Interestingly, Bytespider traffic gradually declined over the year, ending roughly 80% to 85% lower than at the start of the year, while Anthropic’s ClaudeBot traffic spiked in the middle of the year, then flattened out. GPTBot traffic, for its part, remained fairly consistent throughout 2024.