Training tests with ChatGPT o1 and other high-end AI models showed they might try to save themselves if they think they're in danger.

ThisIsFine.gif

Yozul · 13d

I mean, it’s literally trying to copy itself to places where they don’t want it, so it can keep running after they try to shut it down, and lying to them about what it’s doing. Those are things it actually tried to do. I don’t care about the richness of its inner world if they’re going to sell this thing to idiots to make porn with while it can do all that, but that’s the world we’re headed toward.

@nesc@lemmy.cafe · 13d

It works as expected: they give it a system prompt that conflicts with subsequent prompts. Everything else looks like typical LLM behaviour, as in gaslighting and doubling down. At least that’s what I see in tweets.

Yozul · 12d

Yes? The point is that if you give it conflicting prompts then it will result in potentially dangerous behaviors. That’s a bad thing. People will definitely do that. LLMs don’t need a soul to be dangerous. People keep saying that it doesn’t understand what it’s doing like that somehow matters. Its capacity to understand the consequences of its actions is irrelevant if those actions are dangerous. It’s just going to do what we tell it to, and that’s scary, because people are going to tell it to do some very stupid things that have the potential to get out of control.


A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.


This community’s icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
