Copyleaks, a company specializing in AI-based plagiarism detection, has raised a worrying issue about the output of GPT-3.5, the model behind the free version of ChatGPT. The model, which has driven innovations like ChatGPT, is now under scrutiny following claims that a high percentage of its content lacks originality.
According to the Copyleaks report, around 60% of the texts generated by GPT-3.5 contain some form of plagiarism. Copyleaks reached this figure using a proprietary scoring method that accounts for identical text, minor changes, and paraphrasing. The finding is significant because it calls into question whether AI can produce genuinely original content.
The tests covered about a thousand texts of roughly 400 words each, spread across 26 topics, and similarity levels varied considerably between them. Computer science, physics, and psychology showed the highest match rates, suggesting that AI has particular difficulty generating novel content in highly technical or specialized fields. Theater, humanities, and English language, on the other hand, registered the lowest percentages, which could indicate that AI is better able to produce original text in less rigid or more subjective fields.
OpenAI has firmly rejected these accusations. The company says its models are designed to learn concepts and solve new problems, with measures in place to limit inadvertent memorization of content. However, this argument has not quelled concerns about the integrity and originality of AI-generated content, especially while the legality of using copyrighted works to train these models remains the subject of intense debate.
This controversy arrives as advancing technology poses new challenges to intellectual property and copyright. The New York Times' lawsuit against OpenAI, which accuses the company of copyright infringement through large-scale copying of its content, is just one example of the conflicts emerging at the intersection of AI and content creation.
I have watched artificial intelligence advance by leaps and bounds, transforming industries and ways of life. However, this Copyleaks report highlights the importance of seriously addressing the ethical and legal dilemmas that come with these innovations. Originality and authenticity are fundamental pillars of content creation, and preserving them is essential in the digital age.
I believe it is crucial for developers, policymakers, and content creators to work together on solutions that foster innovation without compromising copyright or content integrity. Striking this balance is key to ensuring that technology serves the common good while respecting the creative work of individuals and communities.
The full report is available at copyleaks.com.