CloakLLM — PII Protection for LLMs

NewThe Article 12 Paradox — EU AI Act compliance whitepaper

[CloakLLM]

Cloak your prompts. Prove your compliance.

Open-source PII protection middleware for LLMs. Detect sensitive data, replace it with reversible tokens, and maintain tamper-evident audit logs — all before your prompts leave your infrastructure.

Get Started GitHub

Python

$ pip install cloakllm

JavaScript

$ npm install cloakllm

CloakLLM Demo

CloakLLM 30-second demo showing PII detection and tokenization

Your LLM prompts are plaintext confessions.

Every API call to an LLM sends raw customer data — names, emails, SSNs — to third-party servers. Under the EU AI Act, that's a compliance liability.

The Risk

PII in prompts means your users' personal data is processed by third-party LLM providers — often without consent or safeguards.

The Deadline

December 2, 2027 — EU AI Act Article 12 transparency requirements take effect for high-risk AI systems.

The Penalty

Non-compliance fines up to 7% of global revenue or 35 million, whichever is higher.

New · Technical Whitepaper · April 2026

The Article 12 Paradox

Why GDPR and the EU AI Act cannot both be satisfied without PII middleware.

The EU AI Act requires logging every high-risk AI interaction. GDPR prohibits retaining personal data. This whitepaper explains the structural conflict and the architectural middleware layer that resolves it — with no legal trade-offs.

Article 12 logging vs. GDPR data minimisation — a structural, unavoidable conflict
Deterministic tokenization as GDPR-recognised pseudonymisation (Recital 26, Art. 4(5))
Behavioral traceability vs. identity traceability — what regulators actually require
Article 4a readiness: pseudonymised special-category data for bias detection

Download PDF Read summary

[CloakLLM] Whitepaper

Volume 1

The Article 12 Paradox

Why GDPR and the EU AI Act cannot both be satisfied without PII middleware.

April 2026v1.0

3-Pass Detection Pipeline

Multiple layers of detection ensure no PII slips through. Each pass catches what the previous one missed.

Pass 1

Regex

High-precision pattern matching for structured data.

EMAILSSNCREDIT_CARDPHONEIP_ADDRESSAPI_KEYJWTIBAN

Pass 2

spaCy NER

Named entity recognition for names, orgs, and locations. (Python only)

PERSONORGGPE

Pass 3

Ollama LLM

Local LLM-based semantic detection for contextual PII. (opt-in)

ADDRESSDOBMEDICALFINANCIALBIOMETRIC

Before — Plaintext prompt

Help me write a follow-up email
to Sarah Johnson (sarah.j@techcorp.io)
about the Q3 security audit.
Her direct line is +1-555-0142.

After — Cloaked prompt

Help me write a follow-up email
to [PERSON_0] ([EMAIL_0])
about the Q3 security audit.
Her direct line is [PHONE_0].

Everything you need to protect PII

Drop-in middleware that works with your existing LLM stack. No vendor lock-in, no cloud dependencies.

9 Detection Categories

Emails, SSNs, credit cards, phone numbers, API keys, IBANs, JWTs, AWS keys, and IP addresses — all detected out of the box.

Reversible Tokenization

Deterministic [CATEGORY_N] tokens preserve context for the LLM. Desanitize to restore originals in responses.

Tamper-Evident Audit Logs

Hash-chained JSONL entries with SHA-256 and per-entity metadata. No PII stored — just hashes and counts. EU AI Act Article 12 ready.

One-Line Integration

cloakllm.enable() wraps LiteLLM (Python) or the OpenAI SDK (JS). Works with Vercel AI SDK middleware too.

Multi-Language Detection

13 locales (DE, FR, ES, IT, PT, NL, PL, SE, NO, DK, FI, GB, AU) with country-specific PII patterns for SSNs, tax IDs, and more.

Local LLM Detection

Opt-in Ollama integration catches addresses, medical terms, DOBs, and more. Data never leaves your machine.

Cryptographic Attestation

Ed25519 signed sanitization certificates prove compliance. Merkle tree batch proofs. Cross-language compatible.

Incremental Streaming

StreamDesanitizer replaces tokens as chunks arrive — no buffering the full response. All middleware paths stream incrementally.

Context Risk Analysis

Scores re-identification risk in sanitized text. Detects token density, identifying descriptors, and relationship edges that could reveal identity.

Normalized Token Standard

Formal spec (TOKEN_SPEC.md) with validation utilities, canonical regex, and 62 built-in categories. Both SDKs produce identical tokens.

Pluggable Backends

DetectorBackend base class lets you swap or extend the default regex→NER→LLM pipeline with custom detection stages.

Article 12 Compliance Mode

Formal EU AI Act compliance profile (v0.6) with tamper-detectable compliance fields, complianceSummary() API, and structured COMPLIANT/NON_COMPLIANT verify reports for auditors.

Enterprise Key Management (experimental)

KMS provider scaffolding for AWS KMS, GCP KMS, Azure Key Vault, HashiCorp Vault. Disabled in v0.6.1 pending rebuild in v0.7.0 — use LocalKeyProvider for now.

Security Hardened

Ollama SSRF prevention, CLI PII redaction by default, thread-safe internals, and redacted analysis output.

One line to protect your LLM calls

Drop-in middleware for every major LLM framework. No code rewrites needed.

Python

from cloakllm import enable_openai
from openai import OpenAI

client = OpenAI()
enable_openai(client)  # Wraps OpenAI SDK — all calls are now protected

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Help me email Sarah Johnson (sarah.j@techcorp.io)"
    }],
)

# PII automatically restored in the response
print(response.choices[0].message.content)

Get started in seconds

Install the SDK for your language and start protecting PII immediately.

Python

$ pip install cloakllm

$ python -m spacy download en_core_web_sm

JavaScript / TypeScript

$ npm install cloakllm

MCP Server

$ pip install cloakllm-mcp

SDK Comparison

Three SDKs, same core protection. Pick the one that fits your stack.

Feature	Python	JavaScript	MCP
Regex PII Detection
spaCy NER (PERSON, ORG, GPE)
Ollama LLM Detection (opt-in)
Reversible Tokenization
Redaction Mode
Hash-Chained Audit Logs
CLI (scan / verify / stats)
Multi-Turn Token Maps
Custom Patterns
Field-Level PII Metadata
Batch Processing
Performance Metrics
Incremental Streaming
Middleware Integration	OpenAI / LiteLLM	OpenAI / Vercel	Claude Desktop
Zero Runtime Dependencies