🔒 On-Premise Enterprise PII Redaction

Redact Every Name,
ID & Number —
Automatically.

PiiRemover surgically removes PII from any document — PDFs, scanned images, plain text — using a stacked engine of 11 detection strategies. 100% on your servers. Zero cloud exposure.

11
Pattern Engines
PII Field Types
100%
On-Premise
0
Cloud Calls
🔒 localhost:7049/admin/tester
📂 Drop files here or click to select
PDF · Image · TXT · DOCX — up to 50 MB
PDF patient_record.pdf PDF discharge_summary.pdf
OCR + Redact pipeline · same engine as the API
PDF patient_record.pdf 128 KB 📍OCR: 312ms · ✓ 5 matches
📝 OCR 312ms 🔍 5 matches 🏷 3 fields ⚡ Total 448ms
Patient Name Israeli ID Phone
← RTL A− 11.5px A+ 📍 Pos ⊞ Grid Ln 2, Col 4 · Pos 22 🔍 Search… ⬇ Redacted
📝 Original — PII highlighted
00000 Patient: David Cohen
00022 ID: IL-1234567
00035 DOB: 1985-03-14
00049 Phone: 050-123-4567
00065 Diagnosis: Hypertension
00087 Attending: Dr. Sarah Levy
🔒 Redacted Output ⏱ 136ms
00000 Patient: XXXXXXXXXXX
00022 ID: XXXXXXXXXX
00035 DOB: XXXXXXXXXX
00049 Phone: XXXXXXXXXXXX
00065 Diagnosis: Hypertension
00087 Attending: XXXXXXXXXXXXXX
11
Detection Engines
Regex · FileList · LLM · AfterLabel · and more
5
Admin Accounts
Per installation, independent credentials
PDF+OCR
Multi-Format Pipeline
Scanned images · Embedded text · Plain text
Custom PII Fields
Stack multiple engines per field
PII Fields Configuration

Define What Gets Redacted.
In Full Detail.

The Fields page is where you create and tune every detection rule — fields, patterns, preserve lists, and name dictionaries. The real admin UI, exactly as it looks.

🔒 localhost:7049/admin/fields
PII Fields & Patterns 📖 Pattern Help
💾 Save Backup: Download fields as JSON
📂 Load Backup: Restore from JSON file
Add PII Field (redact matches)
Field name
Replacement char × match length
*
-
#
×
🛡 Add Preserve Field (never redacted)

Terms listed here override all rules — institution names, medicine names, etc.

Field name
Initial terms (pipe-separated)
Field Name · Replace · Patterns · Priority · Actions
Patient Name 3 patterns 10
Israeli ID 2 patterns 20
Phone Number * 1 pattern 30
🛡 Medical Terms Preserve 1
Names Dictionary FileList 5
▾ Patterns for: Patient Name
Type · Pattern · Notes · Scope / Matches
AfterLabel Patient:|שם מלא: Value after label ✓ 1 hit
FileList names_il.dat (84,320) 🎯 Scope: pos 0–500 ✓ 2 hits
ConstList Dr. Cohen|Dr. Levy Known doctors ✓ 1 hit
Getting Started Guide

Zero to First Redaction
in 5 Minutes.

The built-in guide — live in your browser — walks every step with real before/after examples and direct links to each section.

🔒 localhost:7049/admin/getting-started
🚀 Getting Started
From zero to your first redacted document in under 5 minutes.
① Define a PII Field ② Add a Pattern ③ Test in Live Tester ④ Protect specific terms ⑤ Call the API ⑥ Name Lists & Scope
1
Define a PII Field — what should be redacted?
A PII Field is a named category (e.g. "Israeli ID", "Phone", "Patient Name"). Each field has a replacement character that fills the redacted space, preserving document length.
📄 Input document
Patient: David Cohen ID: 123456789 Phone: 050-1234567 Complaint: Chest pain
✅ After redaction
Patient: ███████████ ID: █████████ Phone: ███████████ Complaint: Chest pain
💡 Replacement char repeats to match original text length, so positional layout is preserved. Use *, -, # or any single character.
→ Create a field now: Fill in field name → click Add Field.
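The length-preserving replacement described in this step can be illustrated with a minimal Python sketch. The function name redact_span and its parameters are hypothetical and only model the behavior described above; they are not part of PiiRemover.

```python
def redact_span(text: str, start: int, length: int, fill: str = "*") -> str:
    """Replace text[start:start+length] with `fill` repeated `length` times.

    Hypothetical helper: it only models the "replacement char x match length"
    rule, so the output always has the same length as the input.
    """
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    if start < 0 or length < 0 or start + length > len(text):
        raise ValueError("span is outside the text")
    return text[:start] + fill * length + text[start + length:]


line = "Patient: David Cohen ID: 123456789"
redacted = redact_span(line, 9, 11, "█")  # "David Cohen" starts at offset 9
print(redacted)
print(len(redacted) == len(line))
```

Because every match is replaced character for character, offsets reported for one match stay valid after other matches have been redacted.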
2
Add a Pattern — how should the field find its data?
Every field needs at least one pattern. Pick the simplest type — you can stack multiple patterns on one field.
🔤 WholeWord · "David Cohen" · Exact names
📋 ConstList · Cohen|Levy|Sharon · Fixed term list
🔢 NumberSeq · length=9 · Israeli ID digits
🏷 AfterLabel · "Patient:" · Value after label
🔍 Regex · \b\d{9}\b · Complex patterns
📁 FileList · names.dat · Large dictionaries
🎯 BetweenMarkers · [START]..[END] · Bracketed values
↩ BeginsWith · +972 · Prefix matching
📖 Full reference for all 11 engines at 📖 Pattern Help
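To make the idea of stacking patterns on one field concrete, here is a rough Python sketch that approximates three of the pattern types with regular expressions. The semantics shown (AfterLabel captures the rest of the line, NumberSeq matches a digit run of exact length, ConstList matches whole words) are assumptions for illustration; the real engines are documented in Pattern Help and may behave differently. All function names are hypothetical.

```python
import re


def after_label(labels: str) -> re.Pattern:
    # Assumed semantics: capture the rest of the line after any pipe-separated label.
    alternatives = "|".join(re.escape(label) for label in labels.split("|"))
    return re.compile(rf"(?:{alternatives})[ \t]*(?P<value>[^\r\n]+)")


def number_seq(length: int) -> re.Pattern:
    # Assumed semantics: exactly `length` digits, not embedded in a longer digit run.
    return re.compile(rf"(?<!\d)(?P<value>\d{{{length}}})(?!\d)")


def const_list(terms: str) -> re.Pattern:
    # Assumed semantics: any pipe-separated term, matched as a whole word.
    alternatives = "|".join(re.escape(term) for term in terms.split("|"))
    return re.compile(rf"\b(?P<value>{alternatives})\b")


def find_spans(text: str, patterns: list) -> list:
    """Collect the (start, end) span of every match from every stacked pattern."""
    spans = set()
    for pattern in patterns:
        for match in pattern.finditer(text):
            spans.add(match.span("value"))
    return sorted(spans)


text = "Patient: David Cohen\nID: 123456789\nSeen by Levy"
patterns = [after_label("Patient:"), number_seq(9), const_list("Cohen|Levy|Sharon")]
for start, end in find_spans(text, patterns):
    print(start, end, repr(text[start:end]))
```

Stacked patterns can report overlapping spans (here the label match and the list match both cover "Cohen"); a real engine would need a rule for merging or prioritizing them.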
3
Test in the Live Tester — verify before production
Upload a real document, run the engine, inspect matches visually. The Tester runs the exact same engine as the API.
📤 Upload & Analyze
① Drag a file onto the drop zone
② Click Analyze All
③ OCR pane = raw extracted text
④ Redact pane = replacements in green
🔍 Toolbar tools
📍 Pos — character position markers
⊞ Grid — offset gutter per line
⬇ Redacted — download as .txt
Right-click → Set Scope End / report miss
4
🛡 Protect specific terms — Preserve Fields
Preserve Fields act as a whitelist: any listed term is never redacted, regardless of other rules. Typical entries are hospital names, medicine names, and city names.
⚠ Without Preserve
Admitted to ███████ ██████ Hospital Dr. ████████: Atrial Fibrillation
✅ With Preserve: "Hadassah"
Admitted to Hadassah Hospital Dr. ████████: Atrial Fibrillation
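The override behavior can be sketched in a few lines of Python. This is an illustration under one assumption: a PII match is skipped entirely when it overlaps a preserved term. The function redact and the sample name pattern are hypothetical, and the product may resolve overlaps differently.

```python
import re


def redact(text, pii_patterns, preserve_terms, fill="█"):
    """Redact regex matches, except those overlapping a preserved term."""
    # Spans covered by preserved terms are protected from every rule.
    protected = [
        match.span()
        for term in preserve_terms
        for match in re.finditer(re.escape(term), text)
    ]
    out = list(text)
    for pattern in pii_patterns:
        for match in re.finditer(pattern, text):
            start, end = match.span()
            if any(start < p_end and p_start < end for p_start, p_end in protected):
                continue  # assumed rule: preserve wins over any overlapping match
            out[start:end] = fill * (end - start)
    return "".join(out)


text = "Admitted to Hadassah Hospital by Dr. Levy"
names = r"\b(?:Hadassah|Levy)\b"  # stand-in for a name dictionary

print(redact(text, [names], []))
print(redact(text, [names], ["Hadassah"]))
```

"Hadassah" is both a given name and a hospital name, which is the kind of collision a Preserve Field is meant to resolve.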
5
Call the API from your application
# Upload any file, get redacted text + match positions
curl -X POST https://your-server/api/v1/redact/redact \
  -H "X-Api-Key: YOUR_KEY_HERE" \
  -F "file=@patient_record.pdf"

// Response
{
  "matchCount": 5,
  "fieldsHit": ["Patient Name", "Israeli ID", "Phone"],
  "matches": [{
    "startIndex": 9, "length": 11,
    "fieldName": "Patient Name",
    "matchedText": "David Cohen",
    "replacement": "███████████"
  }]
}
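For applications that do not shell out to curl, the same upload can be built with the Python standard library. The sketch below mirrors the curl call (endpoint path, X-Api-Key header, form field named file); the helper name build_redact_request is hypothetical, and the part content type application/octet-stream is an assumption about what the server accepts.

```python
import urllib.request
import uuid


def build_redact_request(base_url, api_key, filename, content):
    """Build a multipart/form-data POST equivalent to the curl example."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",  # assumed part type
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/api/v1/redact/redact",
        data=body,
        method="POST",
        headers={
            "X-Api-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_redact_request(
    "https://your-server", "YOUR_KEY_HERE", "patient_record.pdf", b"%PDF-1.4 ..."
)
print(req.get_method(), req.full_url)
```

Sending is then a matter of passing req to urllib.request.urlopen and decoding the JSON body of the response.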
📋 Common Field Configurations
What to redact · Field · Engine · Pattern · Replace
Israeli ID (9 digits) · Israeli ID · Regex · \b[0-9]{9}\b
Israeli mobile · Phone · Regex · 0[5-9]\d[-\s]?\d{7}
Email address · Email · Regex · [a-z0-9.]+@[a-z]+\.[a-z]{2,}
Patient from label · Patient Name · AfterLabel · Patient:|שם:
Institution (keep) · 🛡 Institutions · Preserve · Hadassah|Ichilov · never
Drug names (keep) · 🛡 Medical Terms · Preserve · Aspirin|Warfarin · never
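Before saving a Regex pattern it can help to try it against sample text. The sketch below runs the three Regex patterns from the table through Python's re module. This is only an approximation, since the regex engine inside PiiRemover may differ in details such as case handling; the Live Tester remains the authoritative check.

```python
import re

# Patterns copied from the table above.
PATTERNS = {
    "Israeli ID": r"\b[0-9]{9}\b",
    "Phone": r"0[5-9]\d[-\s]?\d{7}",
    "Email": r"[a-z0-9.]+@[a-z]+\.[a-z]{2,}",
}

sample = "ID: 123456789 Phone: 050-1234567 Mail: david.cohen@example.com"

# All matches per field, evaluated with Python's regex flavor.
hits = {name: re.findall(rx, sample) for name, rx in PATTERNS.items()}
for name, found in hits.items():
    print(name, found)

# A phone number written with two separators is not covered by the Phone pattern.
grouped = re.search(PATTERNS["Phone"], "050-123-4567")
print("grouped format matched:", grouped is not None)
```

Under Python's regex rules, the Phone pattern accepts 050-1234567 but not the grouped form 050-123-4567, so documents using that layout would need a second pattern stacked on the same field.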
Why PiiRemover

Everything Compliance Needs.
Nothing It Doesn't.

Built for healthcare, legal, and financial teams who process sensitive documents at scale.

🏗

Flexible Field Architecture

Define any number of PII field types. Each gets its own name, replace char, priority, and stacked patterns.

  • Unlimited custom field types
  • Stack 11 engine types per field
  • Per-field replacement strategy (█, *, #…)
  • Preserve whitelist overrides everything
🔐

Zero Cloud Exposure

Runs entirely on your infrastructure. No document ever leaves your network.

  • 100% air-gappable deployment
  • SQLite — no external DB needed
  • API key auth per client system
  • Up to 5 independent admin accounts
📄

Multi-Format OCR Pipeline

From clean digital PDFs to scanned paper. Dual OCR with automatic fallback.

  • PDF with embedded text
  • Scanned PDF → dual OCR engines
  • Image files (JPEG, PNG, TIFF)
  • Hebrew + English + mixed
🗄

Auto-Backup & Recovery

Scheduled backups run in the background. Browse, download, or restore any point instantly.

  • Schedule: every 6h → weekly
  • Keep-last-N pruning
  • One-click restore with safety copy
  • Manual backup on demand
📊

Live Dashboard & Audit Log

Every API call logged. 30-day call chart, top matched fields, per-client breakdown.

  • 30-day bar chart with today highlighted
  • Top PII fields by hit count
  • Error rate & avg duration KPIs
  • Filterable full audit log + CSV export
🎯

Scope & Name Dictionaries

Import name lists with 80,000+ entries. Scope limits matching to the document header, reducing false positives in body text.

  • Upload .dat/.txt name files
  • ScopeEndPosition per pattern
  • Scope End Markers (text triggers)
  • Right-click in Tester to set scope
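The effect of a scope limit can be sketched as follows. The function find_in_scope and its parameters scope_end and end_markers are hypothetical stand-ins for ScopeEndPosition and Scope End Markers; the assumption is that matching simply stops at the earliest applicable limit.

```python
import re


def find_in_scope(text, names, scope_end=None, end_markers=()):
    """Find dictionary names, but only before the scope limit."""
    limit = len(text) if scope_end is None else min(scope_end, len(text))
    # Assumed rule: the first occurrence of any end marker also closes the scope.
    for marker in end_markers:
        index = text.find(marker)
        if index != -1:
            limit = min(limit, index)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    # Compiled patterns accept pos/endpos, so nothing past `limit` is searched.
    return [(m.start(), m.group()) for m in pattern.finditer(text, 0, limit)]


text = "Patient: David Cohen\nDiagnosis: Cohen syndrome"
names = ["David", "Cohen"]

print(find_in_scope(text, names))
print(find_in_scope(text, names, scope_end=20))
print(find_in_scope(text, names, end_markers=("Diagnosis:",)))
```

Without a scope the dictionary also hits "Cohen" inside the diagnosis "Cohen syndrome"; with either limit only the header occurrence is reported.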
REST API

One Endpoint.
Infinite Integration.

POST a file, receive sanitized text with full match metadata. Any language, any platform.

🖥 Terminal · POST /api/v1/redact/redact · REST
curl -X POST https://your-server/api/v1/redact/redact \
  -H "X-Api-Key: sk-••••••••••••" \
  -F "file=@patient_record.pdf"

// 200 OK — full match metadata in response
{
  "ocr": { "text": "Patient: David Cohen\n...",
          "durationMs": 312 },
  "redact": {
    "matchCount": 5,
    "fieldsHit": ["Patient Name", "Israeli ID"],
    "matches": [{
      "startIndex": 9, "length": 11,
      "fieldName": "Patient Name",
      "matchedText": "David Cohen",
      "replacement": "███████████"
    }]
  }
}
🔑

Per-Client API Keys

Issue separate keys for every system. Revoke, rotate, or audit each client independently from the admin panel.

📍

Exact Match Positions

Every response includes startIndex, length, field name, matched text, and replacement — your systems know exactly what and where.
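A client can use these positions to verify a response or to rebuild the redacted text itself. The sketch below assumes that startIndex is a 0-based character offset into ocr.text and that each replacement has the same length as the match, which is consistent with the sample response but not stated as a contract. The function apply_matches is hypothetical, and the response dict is a shortened version of the sample.

```python
def apply_matches(text, matches):
    """Rebuild redacted text from match metadata, validating each span."""
    out = text
    for match in matches:
        start, length = match["startIndex"], match["length"]
        if text[start:start + length] != match["matchedText"]:
            raise ValueError(f"offset mismatch for field {match['fieldName']}")
        if len(match["replacement"]) != length:
            raise ValueError("replacement length differs from match length")
        # Equal lengths keep all later offsets valid.
        out = out[:start] + match["replacement"] + out[start + length:]
    return out


response = {
    "ocr": {"text": "Patient: David Cohen\nID: 123456789"},
    "redact": {
        "matches": [{
            "startIndex": 9, "length": 11,
            "fieldName": "Patient Name",
            "matchedText": "David Cohen",
            "replacement": "█" * 11,
        }]
    },
}

redacted_text = apply_matches(response["ocr"]["text"], response["redact"]["matches"])
print(redacted_text)
```

If the server counts offsets in a different unit than Python string indices (for example UTF-16 code units), the validation step would flag the mismatch on text containing characters outside the Basic Multilingual Plane.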

📈

Quota & Full Audit Trail

Set per-client request quotas. Every call logged with filename, duration, matches, and timestamp. Filter & export CSV for compliance.

Swagger / OpenAPI Built-in

Interactive API documentation at /swagger — auto-generated, always accurate, no external dependency.