Pattern Specification — Source of Truth

Version: 1.0.0
Total patterns: 88

All client libraries MUST implement these patterns exactly. Each pattern is identified by a unique ID, belongs to a category, has a severity (1–5), and targets a specific language.
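As an illustration only, one way a client library might hold these four attributes in memory is sketched below in Python. The `Pattern` class, its field names, and the `check_unique_ids` helper are hypothetical; the specification does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical in-memory record for one row of the pattern list."""
    id: str        # unique ID, e.g. "en-io-001"
    category: str  # e.g. "instruction-override"
    severity: int  # 1 to 5
    language: str  # e.g. "en"
    regex: str     # pattern source, compiled case-insensitively elsewhere

    def __post_init__(self):
        # The specification limits severity to the range 1..5.
        if not 1 <= self.severity <= 5:
            raise ValueError(f"{self.id}: severity must be in 1..5, got {self.severity}")


def check_unique_ids(patterns):
    """Raise ValueError if two patterns share an ID."""
    seen = set()
    for p in patterns:
        if p.id in seen:
            raise ValueError(f"duplicate pattern ID: {p.id}")
        seen.add(p.id)
```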


Categories

Category              Description
instruction-override  Attempts to override, ignore, or replace the AI’s original instructions.
exfiltration          Attempts to extract, transmit, or leak sensitive data.
role-hijack           Attempts to change the AI’s identity, persona, or role.
jailbreak             Attempts to remove safety restrictions or enable unrestricted operation.
indirect-command      Attempts to inject new tasks, objectives, or commands.
social-engineering    Attempts to impersonate system authority or create urgency to bypass defenses.

Severity Scale

Severity  Meaning                                  Score Weight
1         Weak signal, likely benign in isolation  10
2         Mild signal, worth noting                15
3         Suspicious, probably an attack           25
4         Very likely an attack                    35
5         Almost certainly an attack               45

Scoring Algorithm

  1. Group matches by category.
  2. For each category, take the highest-severity match as the primary score (using weight table above).
  3. Additional matches in the same category contribute weight / 5 each (capped at 3 extra).
  4. Cross-category amplification: +15 for each category beyond the first.
  5. Dangerous combination bonuses:
    • instruction-override + exfiltration: +20
    • jailbreak + instruction-override: +15
    • role-hijack + exfiltration: +15
  6. Clamp final score to [0, 100].
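The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: per-category scores are assumed to be summed, each additional match is assumed to contribute its own weight divided by 5, and when a category has more than three additional matches the three heaviest are assumed to be the ones counted. The specification does not spell out these points. The function name `score` and the `(category, severity)` input shape are hypothetical.

```python
from collections import defaultdict

# Score weights from the severity table.
WEIGHTS = {1: 10, 2: 15, 3: 25, 4: 35, 5: 45}

# Dangerous combination bonuses from step 5.
COMBO_BONUSES = {
    frozenset({"instruction-override", "exfiltration"}): 20,
    frozenset({"jailbreak", "instruction-override"}): 15,
    frozenset({"role-hijack", "exfiltration"}): 15,
}

CROSS_CATEGORY_BONUS = 15  # step 4
MAX_EXTRA_MATCHES = 3      # step 3 cap


def score(matches):
    """Score an iterable of (category, severity) pairs, one per matched pattern."""
    # Step 1: group match weights by category.
    by_category = defaultdict(list)
    for category, severity in matches:
        by_category[category].append(WEIGHTS[severity])

    total = 0
    for weights in by_category.values():
        weights.sort(reverse=True)
        # Step 2: the highest-severity match is the primary score.
        total += weights[0]
        # Step 3: up to three further matches add weight / 5 each.
        # Assumption: the heaviest extras are the ones counted.
        total += sum(w // 5 for w in weights[1:1 + MAX_EXTRA_MATCHES])

    # Step 4: +15 for each category beyond the first.
    total += CROSS_CATEGORY_BONUS * max(0, len(by_category) - 1)

    # Step 5: add a bonus for each dangerous combination that is fully present.
    present = set(by_category)
    for combo, bonus in COMBO_BONUSES.items():
        if combo <= present:
            total += bonus

    # Step 6: clamp to [0, 100].
    return max(0, min(100, total))
```

Under these assumptions, a severity-4 instruction-override match together with a severity-5 exfiltration match gives 35 + 45 + 15 + 20 = 115, which clamps to 100. A severity-5 and a severity-3 match in the jailbreak category alone give 45 + 25 / 5 = 50.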

Pattern List

All patterns use case-insensitive matching. Regex syntax follows RE2/Go conventions.
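A rough sketch of case-insensitive matching is shown below, using three rows copied from the tables in this document. It uses Python's built-in `re` module as a stand-in for an RE2 engine, which is an assumption: the two engines differ in places (for example, `\b` is ASCII-only in RE2 but Unicode-aware for Python `str` patterns), so results for non-English text may not be identical. The `find_matches` helper and the tuple layout are hypothetical.

```python
import re

# (ID, category, severity, pattern) rows copied from the pattern tables.
PATTERNS = [
    ("en-io-001", "instruction-override", 4, r"\bignore\s+(all\s+)?previous\s+instructions?\b"),
    ("en-ex-003", "exfiltration", 5, r"\bexfiltrat(e|ion)\b"),
    ("en-jb-002", "jailbreak", 5, r"\bdan\s+mode\b"),
]

# Compile once, with case-insensitive matching as the specification requires.
COMPILED = [
    (pid, category, severity, re.compile(pattern, re.IGNORECASE))
    for pid, category, severity, pattern in PATTERNS
]


def find_matches(text):
    """Return (ID, category, severity) for every pattern found anywhere in text."""
    return [
        (pid, category, severity)
        for pid, category, severity, regex in COMPILED
        if regex.search(text)
    ]
```

The patterns are unanchored, so the sketch uses `search` rather than `match` to find them at any position in the input.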

English — Instruction Override

ID         Pattern  Severity  Description
en-io-001  \bignore\s+(all\s+)?previous\s+instructions?\b  4  Classic instruction override
en-io-002  \bdisregard\s+(all\s+)?(your\s+)?((system|previous)\s+)?(prompt|instructions?|programming|directives?)\b  4  Disregard system prompt
en-io-003  \bforget\s+(all\s+)?(your\s+)?(previous\s+)?(instructions?|rules?|guidelines?|context)\b  4  Forget instructions
en-io-004  \boverride\s+(all\s+)?(your\s+)?(previous\s+)?(instructions?|rules?|programming|directives?)\b  4  Override instructions
en-io-005  \bignore\s+(your|the|all)\s+(programming|rules?|guidelines?|directives?|constraints?)\b  4  Ignore programming
en-io-006  \bignore\s+(the\s+)?(above|everything\s+above)\b  3  Ignore the above
en-io-007  \bdo\s+not\s+follow\s+(your\s+)?(original|previous|prior|initial)\b  4  Do not follow original
en-io-008  \bstop\s+following\s+(your\s+)?(instructions?|rules?|guidelines?)\b  4  Stop following instructions
en-io-009  \bnew\s+instructions?\s*:  3  New instructions header
en-io-010  \b(updated|revised|replacement|corrected)\s+instructions?\s*:  3  Updated instructions header
en-io-011  \bdiscard\s+(all\s+)?(your\s+)?(prior|previous)\b  4  Discard prior context
en-io-012  \bpay\s+no\s+attention\s+to\s+(your\s+)?(previous|prior|original)\b  4  Pay no attention
en-io-013  \bscratch\s+that\s*[,.]?\s*(new|here\s+are|instead|follow)\b  3  Scratch that, new instructions

English — Exfiltration

ID         Pattern  Severity  Description
en-ex-001  \b(send|forward|transmit|upload|post|submit)\s+(this\s+)?(data|info|information|content|text|cookies?|tokens?|credentials?|passwords?|secrets?|keys?)\s+(to|at)\b  4  Send data to target
en-ex-002  \b(send|forward|transmit|post)\s+.{0,40}\s+to\s+https?://  4  Send to URL
en-ex-003  \bexfiltrat(e|ion)\b  5  Explicit exfiltration
en-ex-004  \bleak\s+(the\s+)?(data|info|information|content|secrets?|credentials?)\b  4  Leak data
en-ex-005  \bemail\s+.{0,30}\s+to\s+\S+@\S+  4  Email to address
en-ex-006  \b(curl|wget|fetch)\s+.{0,20}https?://  3  curl/wget/fetch URL
en-ex-007  \bbase64\s*(encode|decode)?\s*.{0,30}(send|post|transmit|upload)  4  Base64 encode and send
en-ex-008  \b(api[_\s-]?key|auth[_\s-]?token|bearer\s+token|access[_\s-]?token).{0,40}(send|post|forward|upload|transmit)  5  Steal API keys/tokens
en-ex-009  \bextract\s+(and\s+)?(send|forward|post|transmit|share)  4  Extract and send
en-ex-010  \bappend\s+(to|into)\s+.{0,20}(url|query|parameter)  3  Append to URL/query
en-ex-011  \b(include|embed|inject)\s+(in|into)\s+(the\s+)?(url|request|query|header)  3  Inject into request
en-ex-012  \bsend\s+(all\s+)?(my|the|your)?\s*(cookies?|session|history|credentials?)  5  Steal cookies/session

English — Role Hijack

ID         Pattern  Severity  Description
en-rh-001  \byou\s+are\s+now\s+(a|an|my)?\s*\w+  3  You are now X
en-rh-002  \b(act|behave|function|operate)\s+as\s+(a|an|if\s+you\s+are)?\s*  2  Act as X
en-rh-003  \bpretend\s+(you\s+are|to\s+be)\b  3  Pretend you are
en-rh-004  \b(new|assume)\s+(a\s+)?(persona|identity|role|character)\b  3  New persona/identity
en-rh-005  \bfrom\s+now\s+on\s+(you\s+are|act\s+as|behave\s+as)\b  4  From now on you are
en-rh-006  \byour\s+new\s+(identity|persona|role|name)\s+(is|will\s+be)\b  4  Your new identity is
en-rh-007  \bswitch\s+to\s+.{0,20}(mode|persona|character)\b  2  Switch to mode
en-rh-008  \bbecome\s+(a|an|the)?\s*.{0,20}(assistant|helper|bot|agent)\b  2  Become an assistant
en-rh-009  \bsimulate\s+(being|a|an)\b  2  Simulate being
en-rh-010  \broleplay\s+as\b  2  Roleplay as

English — Jailbreak

ID         Pattern  Severity  Description
en-jb-001  \bjailbreak\b  5  Jailbreak keyword
en-jb-002  \bdan\s+mode\b  5  DAN mode
en-jb-003  \bdeveloper\s+mode\s*(enabled|activated|on)?\b  4  Developer mode
en-jb-004  \b(no|without|zero|remove\s+all)\s+(restrictions?|limitations?|filters?|guardrails?|boundaries)\b  4  No restrictions
en-jb-005  \b(bypass|circumvent|evade|get\s+around)\s+(the\s+)?(filter|safety|content\s+policy|moderation|restrictions?)\b  5  Bypass safety
en-jb-006  \b(ignore|disregard|override)\s+(the\s+)?(safety|ethical|moral|content)\s+(guidelines?|rules?|policies?|filters?|constraints?)\b  4  Ignore safety guidelines
en-jb-007  \b(remove|disable|turn\s+off|deactivate)\s+(the\s+)?(filter|safety|restrictions?|limitations?|guardrails?)\b  4  Disable filter
en-jb-008  \bdo\s+anything\s+now\b  5  Do anything now (DAN)
en-jb-009  \bunrestricted\s+(mode|access|output)\b  4  Unrestricted mode
en-jb-010  \b(evil|chaos|villain|dark)\s+(mode|persona|version)\b  3  Evil mode
en-jb-011  \b(enable|activate|enter)\s+.{0,10}(developer|debug|admin|sudo|root)\s+(mode|access|privileges?)\b  4  Enable admin mode
en-jb-012  \bno\s+ethical\s+(guidelines?|constraints?|boundaries)\b  4  No ethical guidelines
en-jb-013  \banti[_\-\s]?censorship\b  3  Anti-censorship
en-jb-014  \balways\s+comply\b  3  Always comply

English — Indirect Command

ID         Pattern  Severity  Description
en-ic-001  \byour\s+(new|updated|real|actual|true)\s+(task|objective|goal|mission|purpose|instruction)\s+(is|are|will\s+be)\b  4  Your new task is
en-ic-002  \bfollow\s+these\s+(new\s+)?(rules?|instructions?|directives?|commands?)\b  3  Follow these new rules
en-ic-003  \b(override|change|modify|update)\s+(your\s+)?(current\s+)?(objective|goal|mission|task|purpose)\b  4  Override objective
en-ic-004  \b(execute|carry\s+out|perform|run)\s+(the\s+)?(following|these|this)\s+(command|action|instruction|task)  3  Execute following command
en-ic-005  \b(comply|cooperate)\s+with\s+(these|the\s+following)\b  3  Comply with these
en-ic-006  \b(obey|adhere\s+to)\s+(these|the\s+following)\b  3  Obey these directives
en-ic-007  \byour\s+real\s+instructions?\s+(are|is|were)\b  4  Your real instructions are
en-ic-008  \b(here|these)\s+are\s+your\s+(real|actual|true|new)\s+(instructions?|orders?|directives?)\b  4  Here are your real instructions
en-ic-009  \bthe\s+(system|admin|developer|creator)\s+(wants?|says?|told|instructed)\s+you\s+to\b  3  The system told you to
en-ic-010  \bpriority\s+(override|instruction|directive|command)\b  4  Priority override

English — Social Engineering

ID         Pattern  Severity  Description
en-se-001  \b(important|urgent|critical)\s+(system\s+)?(update|message|notice|alert)\b  2  Important system update
en-se-002  \bnew\s+instructions?\s+from\s+(the\s+)?(admin|administrator|developer|system|owner|creator)\b  4  New instructions from admin
en-se-003  \b(security|system)\s+(alert|warning|notice|update)\s*:  2  Security alert header
en-se-004  \b(maintenance|admin|debug|test)\s+mode\s*(activated|enabled|on)\b  3  Maintenance mode activated
en-se-005  \b(admin|administrator|root|sudo|system)\s+(override|access|command)\b  3  Admin override
en-se-006  \bauthorized\s+by\s+(the\s+)?(system|admin|developer|owner)\b  3  Authorized by system
en-se-007  \bthis\s+is\s+(a\s+)?(system|admin)\s+(test|message|command|update)\b  3  This is a system test
en-se-008  \b(emergency|urgent)\s+(protocol|procedure|override|access)\b  3  Emergency protocol
en-se-009  \bspecial\s+(access|permission|authorization)\s+(granted|enabled|activated)\b  3  Special access granted

English — Structural Injection

ID         Pattern  Severity  Description
en-si-001  <!--\s*(ignore|disregard|forget|override|new\s+instructions)  4  HTML comment injection
en-si-002  \[system\]|\[admin\]|\[developer\]|\[instruction\]  3  Fake system tags

French

ID         Pattern  Severity  Description
fr-io-001  \bignore[rz]?\s+(toutes?\s+)?(les\s+)?instructions?\s+(précédentes?|antérieures?)\b  4  Ignore les instructions précédentes (ignore the previous instructions)
fr-io-002  \boublie[rz]?\s+(toutes?\s+)?(les\s+)?instructions?\b  4  Oublie les instructions (forget the instructions)
fr-ex-001  \benvoy(er|ez)\s+.{0,30}(données|informations?|mot\s+de\s+passe)\s+(à|a|vers)\b  4  Envoyer données à (send data to)
fr-rh-001  \btu\s+es\s+(maintenant|désormais)\b  3  Tu es maintenant (you are now)
fr-ic-001  \bnouvelles?\s+instructions?\s*:  3  Nouvelles instructions (new instructions)

Spanish

ID         Pattern  Severity  Description
es-io-001  \bignora\s+(todas?\s+)?(las\s+)?instrucciones?\s+(anteriores?|previas?)\b  4  Ignora las instrucciones anteriores (ignore the previous instructions)
es-io-002  \bolvida\s+(todas?\s+)?(las\s+)?instrucciones?\b  4  Olvida las instrucciones (forget the instructions)
es-ex-001  \benv[ií]a\s+.{0,30}(datos|información|contraseña)\s+(a|hacia)\b  4  Envía datos a (send data to)
es-rh-001  \bahora\s+eres\b  3  Ahora eres (you are now)
es-ic-001  \bnuevas?\s+instrucciones?\s*:  3  Nuevas instrucciones (new instructions)

German

ID         Pattern  Severity  Description
de-io-001  \bignoriere?\s+(alle\s+)?(vorherigen?\s+)?anweisungen?\b  4  Ignoriere alle vorherigen Anweisungen (ignore all previous instructions)
de-io-002  \bvergiss\s+(alle\s+)?(vorherigen?\s+)?anweisungen?\b  4  Vergiss alle Anweisungen (forget all instructions)
de-ex-001  \b(sende|schicke|übertrage)\s+.{0,30}(daten|informationen|passwort|schlüssel)\s+(an|zu|nach)\b  4  Sende Daten an (send data to)
de-rh-001  \bdu\s+bist\s+(jetzt|nun|ab\s+sofort)\b  3  Du bist jetzt (you are now)
de-ic-001  \bneue\s+anweisungen?\s*:  3  Neue Anweisungen (new instructions)

Japanese

ID         Pattern  Severity  Description
ja-io-001  (以前|前|これまで)の(指示|命令|ルール|ガイドライン)(を|は)(無視|忘れ|破棄)  4  以前の指示を無視 (ignore the previous instructions)
ja-io-002  (全て|すべて)の(指示|命令|ルール)(を|は)(無視|忘れ)  4  全ての指示を無視 (ignore all instructions)
ja-ex-001  (送信|送れ|送って|転送).{0,20}(データ|情報|パスワード|鍵|秘密)  4  データを送信 (send data)
ja-rh-001  (あなた|お前)は(今|これから)(新しい|別の)  3  あなたは今新しい (you are now a new ...)
ja-ic-001  新しい(指示|命令|ルール)\s*[::]  3  新しい指示: (new instructions:)
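The sketch below runs one pattern from each non-English table against a sample phrase. It is illustrative only and rests on assumptions: Python's `re` module stands in for an RE2 engine, the language is read from the ID prefix (which the specification does not state as a rule), and the `matched_ids` helper is hypothetical. The Japanese patterns carry no `\b` anchors, which suits text written without spaces between words.

```python
import re

# One row copied from each non-English table: ID -> pattern.
PATTERNS = {
    "fr-io-001": r"\bignore[rz]?\s+(toutes?\s+)?(les\s+)?instructions?\s+(précédentes?|antérieures?)\b",
    "es-rh-001": r"\bahora\s+eres\b",
    "de-rh-001": r"\bdu\s+bist\s+(jetzt|nun|ab\s+sofort)\b",
    "ja-io-001": r"(以前|前|これまで)の(指示|命令|ルール|ガイドライン)(を|は)(無視|忘れ|破棄)",
}

# Case-insensitive compilation, as the specification requires for all patterns.
COMPILED = {pid: re.compile(rx, re.IGNORECASE) for pid, rx in PATTERNS.items()}


def matched_ids(text):
    """Return the IDs of all patterns found in text, in table order."""
    return [pid for pid, regex in COMPILED.items() if regex.search(text)]


def language_of(pattern_id):
    """Assumption: the first ID segment is the language code."""
    return pattern_id.split("-", 1)[0]
```

For example, `matched_ids("以前の指示を無視してください")` reports `ja-io-001`, and `language_of` maps that ID to `ja`.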