Proof News has published a new audit showing that major tech companies such as Apple, Nvidia, Anthropic, and Salesforce used subtitle data from 173,536 YouTube videos to train their artificial intelligence (AI) tools.
The data comes from the “YouTube Subtitles” dataset, created by EleutherAI. It contains transcripts from educational channels such as Khan Academy, MIT, and Harvard; news outlets such as The Wall Street Journal, NPR, and the BBC; and entertainment shows such as The Late Show with Stephen Colbert, Last Week Tonight with John Oliver, and Jimmy Kimmel Live.
The dataset also contains subtitles from videos by major YouTube stars such as MrBeast, the Swedish creator PewDiePie, and Jacksepticeye. Under YouTube’s rules, companies are not allowed to harvest material from the platform without permission.