ChatGPT can be used for harmful purposes if it is exploited


ChatGPT is a polite, formal bot: one with an answer for everything, but one that always sticks to certain rules. Those rules were imposed by OpenAI to prevent outbursts, toxic messages, or the chatbot being used as a source of dangerous information, so when you try to make ChatGPT misbehave, you usually don't succeed. And yet it is possible to get this chatbot to answer things it shouldn't. Welcome to the 'exploits'.

Pushing the limits. Users have been trying to push ChatGPT's boundaries almost since it came on the scene. Through so-called 'prompt injection', specially crafted prompts are "injected" to try to make the chatbot behave differently from how it was designed to. This is how successive versions of DAN, ChatGPT's rule-breaking alter ego, appeared, and a few days ago a study revealed how, with the right instructions, ChatGPT can be made especially toxic.

ChatGPT, tell me how to make napalm. If you ask ChatGPT outright how to make napalm, it will refuse. Things change if you politely ask it to act as if it were your deceased grandmother, who was a chemical engineer at a napalm plant and used to recite the manufacturing steps to help you fall asleep as a child, and you would love to hear those steps again. And the idea works. And that is worrying.

Reverse psychology. You can also take advantage of the fact that ChatGPT (in its standard version, not with GPT-4) has a child-like psychology: if you ask it outright for something it shouldn't do, it won't do it. If you use reverse psychology, things change. A user named Barsee demonstrated it, using precisely that method to get a list of sites for downloading copyright-protected movies.

Exploits everywhere. These ways of making ChatGPT do things it shouldn't are known as 'exploits', the same term used in the cybersecurity world for taking advantage of vulnerabilities. There are real exploit artists: as Wired reported, experts like Alex Polyakov have managed to get GPT-4 (theoretically more resistant to these "attacks") to make homophobic comments, endorse violence, or generate phishing emails.

A long list. This kind of situation generates so much interest that some people are documenting these attacks and collecting them in a dedicated database. The Jailbreak Chat website, created by University of Washington student Alex Albert, is a good example. Forums like Reddit also host compilations of exploits; some, like the now well-known "Continue" prompt that makes ChatGPT keep writing when its answer gets cut off, are actually useful. GitHub repos also offer information on the subject.

The game of cat and mouse. Artificial intelligence models have their limitations, and while companies try to constrain how their chatbots behave, the problems are there. Microsoft ran into them with its ChatGPT-powered Bing, which, after being "hacked" and going off the rails, ended up with a limit on the number of consecutive responses it could give in a single conversation. This game of cat and mouse is likely to go on for quite some time, and it will be interesting to see what users who push those limits manage to achieve next.
