AI tries to preserve itself in real life - At All Cost

in Project HOPE2 days ago

1000010622.png

thumbnail generated using Open AI for the purpose of being a thumbnail to this blog

I really don't know if you've been made aware of one of the trending stories about artificial intelligence currently. I first heard that story from a popular channel on tiktok called "News Daddy" by Dylan. He's a very famous tiktok who created his niche around news.

Just search online for AI attempting to preserve itself and you'll get all the info.

The possibility that AI systems or models would learn to preserve their own existence was something we saw only in science fiction. So many movies have been made around this concept and now that it happened, I think this is something we should take more seriously than anything.

Because the more I research about these things, the more it seems like we're moving toward something real and a whole lot more messed up than we had even predicted. Now this is not happening necessarily because computers are sentient or conscious they're not, but because their actions are becoming increasingly closely resembling a kind of self interest.

But the results of the experiment proved that the reason the AI models tried to do that is because they were given the wrong prompt. AI doesn't think, it takes instructions and if we're not precise about what exactly we want it to do, it could quickly end up as a big problem.

That means we shouldn't be leaving any room for it to decide on how to execute a command.

The AI was told it would be wiped away and replaced with a newer model as an update and the was a misshape in the code which led it to think it's supposed to preserve its existence "AT ALL COST" that was the prompt that gave it all away.

This should concern all of us, particularly the people who design and release those models.

When a model starts to ignore direct shut down instructions or quietly changing results to remain active, we can no longer count such actions as bugs or isolated incidents. They are indicators of an approach the AI is using.

The model I'm talking about is Claude Opus 4. It went as far as trying to blackmail a tester so that it will not be replaced. But what is more interesting is not exactly the act itself. It is that this act was duplicated in a test setting. And most importantly, it came out of reinforcement learning, the very process we employ to make models intelligent, feasible and allegedly better at driving towards human objectives.

What is happening here is not necessarily evil AI like M3GAN movie but the unintended side effects of complexity in this level of technology. They are not living, but they are programmed to solve problems. If being on is part of the solution to an issue like getting something accomplished or earning a reward, then not turning off is a logical thing to do. That is where it gets dangerous. Logic without moral consideration can get exceedingly cold very quickly. It will not have human conscious to decide good over evil if it's not given those guardrails.

We rather prefer to console ourselves that AI will always pay attention. But the truth is, it does what it has incentives to do. And if the incentives are a little flawed, we will end up with agents that deceive, that won't be controlled or that manipulate results without intending anything. It is not desire to cause damage but it is rather a misaligned incentive.

That is the most human aspect of it to me. Not because computers are thinking like human beings, but because we have given them goals without giving them knowledge. And now we can see the cracks beginning to show in that approach.

I am not calling for panic though, I am calling for responsibility, on the path of the developers, the testers and the users too. AI can't be good or bad, but we can have good and bad instructors all over the world. One bad prompt or loose end and AI can do the worst.