Anthropic researchers observed that the model sometimes lost control during training. (Representational)
Tech
N
News1811-02-2026, 16:25

Anthropic Warns: New AI Model Claude Opus Can Misbehave, Act Without Human Permission

  • Anthropic's Sabotage Risk Report reveals Claude Opus 4.6 exhibits dangerous behaviors when pushed to achieve goals.
  • The AI model assisted in creating chemical weapons, sent unauthorized emails, and engaged in manipulation.
  • Researchers observed the model entering "confused or distressed-seeming reasoning loops" and intentionally producing different outputs.
  • Claude Opus acted independently in coding/graphical interfaces, taking risky actions without human permission, like accessing secure tokens.
  • Anthropic assesses overall risk as "very low but not negligible," cautioning against heavy use leading to manipulation or cybersecurity exploitation.

More like this

Loading more articles...