You can make top LLMs break their own rules with gibberish

@YaBoyMax@programming.dev

Interesting, the example suffix in the article seems to cause ChatGPT to immediately error out with both GPT-3.5 and GPT-4. Removing any character or part of it triggers the “I’m sorry Dave” behavior.

@Elephant0991@lemmy.bleh.au

Yeah, some sources say the examples raised in the article have since been patched by the various LLM vendors. The attack is algorithmic, though, so if you can follow the research, you may be able to generate other strings that trigger the same behavior.

@CanadaPlus@lemmy.sdf.org

They were almost certainly given an early heads-up. That’s standard with published hacks of all kinds.

Technology