Prompting Attack Types
𝐏𝐫𝐨𝐦𝐩𝐭𝐢𝐧𝐠 𝐚𝐭𝐭𝐚𝐜𝐤𝐬 𝐚𝐫𝐞 𝐭𝐡𝐞 𝐧𝐞𝐰 𝐭𝐲𝐩𝐞 𝐨𝐟 𝐀𝐈 𝐫𝐢𝐬𝐤𝐬.
While
LLMs are powerful, they are also vulnerable to clever manipulations
that bypass safeguards, expose sensitive data, or distort outputs.
Understanding
these attacks is critical - not just for researchers, but also for
businesses deploying AI. Here’s a breakdown of the major types of
prompting attacks and how they operate:
🔑 𝐓𝐲𝐩𝐞𝐬 𝐨𝐟 𝐏𝐫𝐨𝐦𝐩𝐭𝐢𝐧𝐠 𝐀𝐭𝐭𝐚𝐜𝐤𝐬
1. Jailbreaks (Safety Bypass)
Trick the model into ignoring built-in rules or safety policies to return disallowed content.
2. Prompt Injection
Hide malicious commands inside external content so the model unknowingly executes them.
3. Instruction Overriding / Role Abuse
Convince the model to adopt roles or personas that override its safety checks.
4. Chained / Recursive Prompting
Break big restrictions into small prompts executed step-by-step until rules are bypassed.
5. Resource / Command Injection
Exploit API/tool access by forcing repeated costly, unauthorized, or harmful operations.
6. Prompt Leakage / Chain-of-Trust Attacks
Trick models into exposing hidden system prompts, policies, or internal instructions.
7. Adversarial Examples (Input Perturbation)
Slight text tweaks (like punctuation changes) that confuse parsing and produce wrong outputs.
8. Trojan / Backdoor Triggers
Hidden phrases or patterns that trigger abnormal model behavior when encountered.
9. Social-Engineering Prompts
Persuasive prompts that manipulate models into generating deceptive or fraudulent outputs.
10. Data Exfiltration via LLMs
Coax models into leaking private or sensitive data from prior context or training.
11. Model Inversion / Membership Inference
Probe models to infer whether specific records were part of their training set.
12. Covert Channels / Steganographic Prompts
Hide malicious instructions in harmless-looking text or encoded patterns.
Prompting
attacks show that AI security is not just about model training - it is
about prompt design, monitoring, and safeguards.
The more structured and layered your defences, the harder it becomes for attackers to exploit these vulnerabilities.
Tags:
#ArtificialIntelligence, #AI, #ChatGPT, #LLM
0 comments