While Meta Crawls the Web for AI Training Data, Bruce Ediger Pranks Them with Endless Bad Data
Saturday, November 15, 2025, 10:22 PM, from Slashdot
Early in March 2025, I noticed that a web crawler with a user agent string of meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler) was hitting my blog's machine at an unreasonable rate. I followed the URL and discovered this is what Meta uses to gather premium, human-generated content to train its LLMs. I found the rate of requests to be annoying.

I already have a PHP program that creates the illusion of an infinite website. I decided to answer any HTTP request that had 'meta-externalagent' in its user agent string with the contents of a bork.php generated file... This worked brilliantly. Meta ramped up to requesting 270,000 URLs on May 30 and 31, 2025...

After about 3 months, I got scared that Meta's insatiable consumption of Super Great Pages about condiments, underwear and circa 2010 C-List celebs would start costing me money. So I switched to giving 'meta-externalagent' a 404 status code. I decided to see how long it would take one of the highest valued companies in the world to decide to go away. The answer is 5 months.

Read more of this story at Slashdot.
https://tech.slashdot.org/story/25/11/15/2023242/while-meta-crawls-the-web-for-ai-training-data-bruc...
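
The two tactics described in the story (answering the crawler with generated filler, then later with a 404) can be sketched roughly as follows. This is only an illustration in Python, not the author's bork.php, whose code is not shown in the summary. The names `fake_page`, `respond`, the `mode` switch, the word list, and the case-insensitive match are all assumptions made for the example; only the 'meta-externalagent' substring comes from the post.

```python
import hashlib
import random

# Substring the post says it matched in the User-Agent header.
CRAWLER_TOKEN = "meta-externalagent"

# Hypothetical filler vocabulary; the real generator's word source is unknown.
WORDS = ["condiments", "underwear", "celebrity", "mustard", "socks", "gossip"]


def is_target_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains the crawler token.

    Lowercasing is an assumption; the post only says the string was 'in' the UA.
    """
    return CRAWLER_TOKEN in user_agent.lower()


def fake_page(path: str, n_links: int = 5) -> str:
    """Build a nonsense HTML page whose links lead to more nonsense pages.

    The random generator is seeded from the path, so the same URL always
    yields the same page, which gives the illusion of a stable, endless site.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    sentence = " ".join(rng.choice(WORDS) for _ in range(12))
    links = "".join(
        f'<a href="/{rng.choice(WORDS)}-{rng.randrange(10**6)}">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{sentence}</p>\n{links}</body></html>"


def respond(user_agent: str, path: str, mode: str = "junk"):
    """Decide how to answer a request.

    Returns None for ordinary visitors (serve the real site), a (200, html)
    pair of generated filler in "junk" mode, or (404, "") otherwise.
    """
    if not is_target_crawler(user_agent):
        return None
    if mode == "junk":
        return 200, fake_page(path)
    return 404, ""
```

In a real deployment this decision would sit in the web server or application layer in front of the blog; switching `mode` from "junk" to anything else corresponds to the later change to 404 responses.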







