Oheka オヘカ | Nostr Profile

Oheka オヘカ 6h

A recent research paper from Sapienza University of Rome points to an unexpected vulnerability in current AI safety systems. The researchers examined how large language models handle restricted or malicious requests depending on how they are phrased. When the prompts are written in straightforward prose, the models refuse them in most cases. But when the exact same intent is expressed through poetic language using rhyme, metaphor, or verse, the rate of refusal drops sharply. According to the authors, this happens because the models place strong emphasis on stylistic coherence and fluency, sometimes at the cost of identifying the underlying intent. When form takes precedence, existing safety mechanisms become less effective. arxiv.org/abs/2511.15304

Welcome to Oheka オヘカ spacestr profile!

About Me

Interests

Videos

Music

Friends