You can make top LLMs break their own rules with gibberish

fear

I just tried this on ChatGPT, and it doesn’t work.

@Elephant0991@lemmy.bleh.au

See https://programming.dev/comment/1803935

fear

Yeah, I took a look at the code they used in the article, which might help someone generate functional attacks. A rando experimenting without permission would likely get banned from the service.

