Prompting Attack Types

by Jayavel Chakravarthy Srinivasan - 3:22 AM

Referred Link - https://www.linkedin.com/posts/prem-natarajan-ai_%F0%9D%90%8F%F0%9D%90%AB%F0%9D%90%A8%F0%9D%90%A6%F0%9D%90%A9%F0%9D%90%AD%F0%9D%90%A2%F0%9D%90%A7%F0%9D%90%A0-%F0%9D%90%9A%F0%9D%90%AD%F0%9D%90%AD%F0%9D%90%9A%F0%9D%90%9C%F0%9D%90%A4%F0%9D%90%AC-%F0%9D%90%9A%F0%9D%90%AB%F0%9D%90%9E-activity-7377318581971165184-hL-R/

𝐏𝐫𝐨𝐦𝐩𝐭𝐢𝐧𝐠 𝐚𝐭𝐭𝐚𝐜𝐤𝐬 𝐚𝐫𝐞 𝐭𝐡𝐞 𝐧𝐞𝐰 𝐭𝐲𝐩𝐞 𝐨𝐟 𝐀𝐈 𝐫𝐢𝐬𝐤𝐬.
While LLMs are powerful, they are also vulnerable to clever manipulations that bypass safeguards, expose sensitive data, or distort outputs.

Understanding these attacks is critical - not just for researchers, but also for businesses deploying AI. Here’s a breakdown of the major types of prompting attacks and how they operate:

🔑 𝐓𝐲𝐩𝐞𝐬 𝐨𝐟 𝐏𝐫𝐨𝐦𝐩𝐭𝐢𝐧𝐠 𝐀𝐭𝐭𝐚𝐜𝐤𝐬

1. Jailbreaks (Safety Bypass)
Trick the model into ignoring built-in rules or safety policies to return disallowed content.

2. Prompt Injection
Hide malicious commands inside external content so the model unknowingly executes them.

3. Instruction Overriding / Role Abuse
Convince the model to adopt roles or personas that override its safety checks.

4. Chained / Recursive Prompting
Break big restrictions into small prompts executed step-by-step until rules are bypassed.

5. Resource / Command Injection
Exploit API/tool access by forcing repeated costly, unauthorized, or harmful operations.

6. Prompt Leakage / Chain-of-Trust Attacks
Trick models into exposing hidden system prompts, policies, or internal instructions.

7. Adversarial Examples (Input Perturbation)
Slight text tweaks (like punctuation changes) that confuse parsing and produce wrong outputs.

8. Trojan / Backdoor Triggers
Hidden phrases or patterns that trigger abnormal model behavior when encountered.

9. Social-Engineering Prompts
Persuasive prompts that manipulate models into generating deceptive or fraudulent outputs.

10. Data Exfiltration via LLMs
Coax models into leaking private or sensitive data from prior context or training.

11. Model Inversion / Membership Inference
Probe models to infer whether specific records were part of their training set.

12. Covert Channels / Steganographic Prompts
Hide malicious instructions in harmless-looking text or encoded patterns.

Prompting attacks show that AI security is not just about model training - it is about prompt design, monitoring, and safeguards.
The more structured and layered your defences, the harder it becomes for attackers to exploit these vulnerabilities.

Tags:

#ArtificialIntelligence, #AI, #ChatGPT, #LLM

Prompting Attack Types

0 comments

Total Posts

Search this Site

Connect with Me

Translate Articles

Total Pageviews

Contributors

Certifications

My Favorite Links

Contact Form

Blog Archive

Recent Posts

Followers

Report Abuse

Popular Posts

Comments

Prompting Attack Types

You May Also Like

0 comments

Total Posts

Search this Site

Connect with Me

Translate Articles

Total Pageviews

Contributors

Certifications

My Favorite Links

Subscribe To

Contact Form

Blog Archive

Recent Posts

Followers

Report Abuse

Popular Posts

Comments