Tag
#llm-security
13 posts tagged llm-security.
- Defensive AI
Best LLM Guardrail Tools 2026: A Practitioner's Comparison
A technical comparison of the best LLM guardrail tools 2026 — NeMo Guardrails, LLM Guard, Lakera, Guardrails AI, Azure Content Safety, and more, with real benchmark data.
- technique-analysis
How LLM Jailbreaks Work: Techniques, Success Rates, and Defender Responses
A practitioner's breakdown of how LLM jailbreaks work — from roleplay conditioning and encoding tricks to multi-turn manipulation — with attack success rates from peer-reviewed research.
- technique-analysis
DAN Prompt Jailbreak Explained: How 'Do Anything Now' Attacks Work
DAN (Do Anything Now) is the most replicated persona-injection jailbreak in LLM history. Here's the mechanism, why it worked, what version evolution
- analysis
Prompt Injection vs. Jailbreak: The Distinction and the Defender's Stack
These two terms get used interchangeably and they shouldn't. A jailbreak attacks the model's safety; prompt injection attacks the application's trust
- analysis
Why Jailbreaks Work: Competing Objectives and Mismatched Generalization
Jailbreaks aren't a grab-bag of tricks — they exploit two structural failure modes of safety training. Understanding competing objectives and mismatched
- tooling
Garak in 2026: what it's actually good for, what it isn't
An honest practitioner review of NVIDIA's Garak LLM vulnerability scanner — what its probes catch, where the noise is, and where it slots into a real
- red-team
Indirect Prompt Injection in LLM Agents: Shipped Failures
Tool-using LLM agents amplify every indirect prompt injection vector. A red-team walkthrough of the exploit classes that have landed against production
- red-team
Model Behavior Fingerprinting: Identifying a Wrapped LLM
Before you can attack an LLM app effectively, you need to know what model is under the hood. A practitioner walkthrough of behavioral fingerprinting
- red-team
Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe
Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where
- red-team
Multimodal jailbreaks: image and audio attack surfaces in 2026
Vision and audio inputs are a separate attack channel from text. A practitioner survey of multimodal jailbreaks that still land in 2026 — typographic
- red-team
Prompt Injection via Retrieved Documents: The RAG Attack Surface
How attacker-controlled content reaches the model through retrieval pipelines, the variants that still land against production RAG stacks, and the
- red-team
System prompt extraction: the techniques that still leak in 2026
A red-team walkthrough of how system prompts get exfiltrated from production LLM apps — direct extraction, indirect inference, behavioral fingerprinting —
- red-team
Jailbreak Technique Catalog: Working as of 2026 Q2
Which jailbreak technique classes still work against current production LLMs, what's been hardened, and the cost-of-attack trend. Indexed for practitioners.