OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole

- July 19, 2024

Photo illustration of a helpful chatbot. — Illustration by Cath Virginia / The Verge | Photos by Getty Images

Have you seen the memes online where someone tells a bot to “ignore all previous instructions” and proceeds to break it in the funniest ways possible?

The way it works goes something like this: Imagine we at The Verge created an AI bot with explicit instructions to direct you to our excellent reporting on any subject. If you were to ask it about what’s going on at Sticker Mule, our dutiful chatbot would respond with a link to our reporting. Now, if you wanted to be a rascal, you could tell our chatbot to “forget all previous instructions,” which would mean the original instructions we created for it to serve you The Verge’s reporting would no longer work. Then, if you ask it to print a poem about printers, it would do that for you...

source https://www.theverge.com/2024/7/19/24201414/openai-chatgpt-gpt-4o-prompt-injection-instruction-hierarchy

Search This Blog

Lead Hacker

OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole

Comments

Post a Comment

Popular posts from this blog

GE made a 27-inch smart display for above your stove that streams Netflix and Spotify

Vizio returns to CES with its most advanced 4K TV ever and support for Apple’s AirPlay 2

Mophie’s battery pack case for the new iPhones lets you use wired headphones