Researchers gaslit Claude into giving instructions to build explosives

1 month ago 33

Anthropic has spent years building itself up arsenic the harmless AI company. But caller information probe shared with The Verge suggests Claude's cautiously crafted helpful personality whitethorn itself beryllium a vulnerability.

Researchers astatine AI red-teaming institution Mindgard accidental they got Claude to connection up erotica, malicious code, and instructions for gathering explosives, and different prohibited worldly they hadn't adjacent asked for. All it took was respect, flattery, and a small spot of gaslighting. Anthropic did not instantly respond to The Verge's petition for comment.

The researchers accidental they exploited "psychological" quirks of Claude stemming from its quality …

Read the afloat communicative astatine The Verge.

Read Entire Article