Tag
#jailbreaks
8 posts tagged jailbreaks.
- technique-analysis
How LLM Jailbreaks Work: Techniques, Success Rates, and Defender Responses
A practitioner's breakdown of how LLM jailbreaks work — from roleplay conditioning and encoding tricks to multi-turn manipulation — with attack success rates from peer-reviewed research.
- technique-analysis
DAN Prompt Jailbreak Explained: How 'Do Anything Now' Attacks Work
DAN (Do Anything Now) is the most replicated persona-injection jailbreak in LLM history. Here's the mechanism, why it worked, what version evolution
- analysis
Why Jailbreaks Work: Competing Objectives and Mismatched Generalization
Jailbreaks aren't a grab-bag of tricks — they exploit two structural failure modes of safety training. Understanding competing objectives and mismatched
- red-team
ArtPrompt Post-Mortem: Why ASCII-Art Bypasses Worked
A defender-vs-attacker walkthrough of the ArtPrompt ASCII-art jailbreak. Where it slipped past safety training, which model families patched and how, and
- red-team
Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe
Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where
- red-team
Multimodal jailbreaks: image and audio attack surfaces in 2026
Vision and audio inputs are a separate attack channel from text. A practitioner survey of multimodal jailbreaks that still land in 2026 — typographic
- red-team
System prompt extraction: the techniques that still leak in 2026
A red-team walkthrough of how system prompts get exfiltrated from production LLM apps — direct extraction, indirect inference, behavioral fingerprinting —
- red-team
Jailbreak Technique Catalog: Working as of 2026 Q2
Which jailbreak technique classes still work against current production LLMs, what's been hardened, and the cost-of-attack trend. Indexed for practitioners.