Reddit wants companies to pay for the data that has been used to train ChatGPT

Every day, 57 million people pass through Reddit's discussion forums. The platform, co-founded and led by Steve Huffman, has become a gigantic compendium of bizarre debates, but also of reflections, questions, and answers that have made it a unique alternative to Google. That data has proven to be very valuable, and now Reddit wants to capitalize on it.

Reddit training the AIs. As The New York Times revealed, for a few years now the messages available on Reddit have been used to train artificial intelligence systems such as those of Google, OpenAI, and Microsoft.

Either you pay me, or nothing. The company indicated in recent days that it is considering charging companies that want to access its API, the method through which external entities can download and process the huge volume of conversations that, among other things, can help train conversational artificial intelligence models.

A strategic move. Reddit’s current CEO explained that “Reddit’s core data is really valuable, but we don’t need to give all that value away for free to some of the biggest companies in the world.” The company seems to be preparing for a potential IPO, and putting its API under a paid model would create a new source of income on top of the advertising model that now supports it.

Google used it, ChatGPT too. The developers of Google Bard have already indicated in a study that they partially trained their model with data from Reddit. OpenAI, the developer of ChatGPT, cited Reddit as one of the data sources with which its LLM was trained.

Following in the footsteps of Twitter. Other companies have already begun to understand that the data they hold can be very valuable to these new AI models. Shutterstock reached an agreement with OpenAI for DALL-E to be trained on its image database, and in March Elon Musk announced that the Twitter API would become a paid service. That was a blow to small developers, but it will also force companies like OpenAI to pay if they want to train their models on the platform's messages.

The API will be free for developers. At least that’s Huffman’s promise. Developers who want to build apps that help people use Reddit will be able to use the API without any problem. The same applies to academic or non-commercial use. For companies, things change: “crawling Reddit, generating value and not returning any of that value to our users is something we have a problem with.”

And what about the users? Huffman’s comment is debatable: after all, companies are going to pay him and his company. It is the users who have generated all that data and all that value, and although Reddit is a fantastic platform, it is, like any other social network, an intermediary. The content has been contributed by its users, and they probably won’t get anything for it. Then again, for them, using Reddit probably isn’t a job.

