100% On-Premises — Your Data Never Leaves

Enterprise PII Redaction
Built for Healthcare

Automatically detect and redact patient identifiers from scanned documents, PDFs, DICOM images, and clinical screenshots — in Hebrew and English — without sending a single byte to the cloud.

100%
On-Premises
500k+
Terms/Field
הע
Hebrew + English
🖼+📄
Image + Text
POST /api/v1/redact/redact
// Input document text
Patient: כהן שאול
ID: 205447709
DOB: 30/11/1994
Phone: 058-3245670
Referral: Sheba Medical Center
 
// Redacted response
"redactedText":
 Patient: ████████
 ID: █████████
 DOB: ██████████
 Phone: ███████████
 Referral: Sheba Medical Center ← preserved ✓
"matchCount": 4, "durationMs": 138
✨ New in This Release

Five Major Capabilities.
Built for Real Clinical Environments.

This release brings pixel-level image redaction, intelligent dark-background OCR, allow-list Preserve Fields, a half-million-term performance engine, and a built-in debug console — everything a healthcare IT team actually needs.

🖼

Pixel-Level Image Redaction

Upload a DICOM screenshot, PACS image, or scanned document. PiiRemover OCRs the image, runs your PII rules, and paints black rectangles over only the matched words — leaving all other image content pixel-perfect and intact.

🌑

Smart Dark-Background OCR

Clinical workstations and PACS viewers render white text on black. PiiRemover automatically detects dark-background images, inverts colours before OCR, and picks the best pass — dramatically improving recognition on medical imaging systems.

🛡

Preserve Fields — Intelligent Allow-List

Define terms that must never be redacted: hospital names, clinic brands, equipment identifiers. The Preserve engine runs before redaction and protects approved terms even when they match a PII pattern — zero false positives on your own brand.

500,000-Term HashSet Engine

The redesigned pattern engine replaces regex alternation with a HashSet word-tokenizer: O(text length) matching regardless of term count. A 480,000-name list runs at the same speed as 100 terms. Multi-word names and Unicode/Hebrew supported.
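
The idea behind set-based matching can be sketched in a few lines of Python. This is a minimal illustration, not PiiRemover's actual engine: the function names `build_index` and `find_terms` are hypothetical, and the sketch ignores punctuation between the words of a multi-word term. Each token costs at most `max_words` set lookups, so the work grows with text length rather than with the number of terms.

```python
import re

TOKEN = re.compile(r"\w+")  # \w is Unicode-aware for str patterns, so Hebrew words tokenize too

def build_index(terms):
    """Index every term as a tuple of lowercase words; remember the longest term."""
    index = set()
    max_words = 1
    for term in terms:
        words = tuple(term.lower().split())
        if words:
            index.add(words)
            max_words = max(max_words, len(words))
    return index, max_words

def find_terms(text, index, max_words):
    """Return (start, end) character spans, trying the longest candidate first."""
    tokens = [(m.group().lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]
    spans = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Greedy longest-match: try max_words tokens, then shrink the window.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            key = tuple(t[0] for t in tokens[i:i + n])
            if key in index:
                spans.append((tokens[i][1], tokens[i + n - 1][2]))
                matched = n
                break
        i += matched or 1
    return spans
```

Because a set lookup takes roughly constant time on average, adding more names to the index changes memory use but not the number of lookups per token.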

🔍

Live Tester & Debug Console

Browser-based testing tool built into the admin panel. Upload files or paste text directly. See OCR output side-by-side with the redacted result. Right-click any un-redacted word to instantly add it as a Redact or Preserve rule — without leaving the page.

✏️

Inline Pattern Editor

Edit keyword lists directly in the admin panel — no more CSV uploads for small changes. Add or remove individual terms with a chip editor, or bulk-paste a pipe-separated list. Changes invalidate the cache and take effect on the next request.

The Challenge

Healthcare Data Is the #1 Target

Medical records contain the most sensitive personal data. Existing cloud solutions create unacceptable compliance risk.

☁️

Cloud Exposure

Cloud-based redaction services require uploading patient records to third-party servers — a HIPAA and GDPR violation risk.

🔓

Data Breach Risk

Every external API call is a potential breach vector. Healthcare organisations cannot afford to trust third-party data processors with PHI.

🌐

Internet Dependency

Air-gapped hospital networks, secure clinical environments, and offline deployments are incompatible with cloud-only solutions.

🏥

Fully On-Premises

PiiRemover runs entirely on your infrastructure. Zero internet calls. Zero cloud dependencies. Patient data stays inside your perimeter.

Real-Time Processing

Integrated OCR engine (Windows.Media.Ocr + Tesseract) processes scanned documents and images in milliseconds — no external service round-trips.

🛡️

Compliance-First Design

Built from the ground up for HIPAA, GDPR, and Israeli privacy law (PPPA). Every architectural decision prioritises data isolation.

Capabilities

Everything You Need. Nothing More.

Purpose-built for healthcare document workflows. No bloat, no cloud hooks, no telemetry.

🖼

Image Redaction NEW

Pixel-level selective redaction. OCR identifies each word's exact bounding box. PII-matched words get painted black. Everything else stays untouched — including DICOM metadata and imaging measurements.

🌑

Dark-Background OCR NEW

Automatically detects black-background medical images. Luminance sampling across 2,000 pixel grid points. White-on-black PACS overlays are colour-inverted before OCR — the right engine pass is selected automatically.
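
A rough sketch of luminance sampling on a pixel grid follows. It assumes the image is already decoded into rows of `(r, g, b)` tuples. The threshold of 90, the Rec. 601 luminance weights, and the function names are illustrative assumptions, not values taken from PiiRemover. The product also compares OCR passes, which this sketch does not attempt.

```python
import math

def is_dark_background(pixels, samples=2000, threshold=90.0):
    """pixels: list of rows, each a list of (r, g, b) tuples in 0..255.

    Visits roughly `samples` evenly spaced grid points and compares the
    mean luminance with an assumed threshold.
    """
    height, width = len(pixels), len(pixels[0])
    step = max(1, int(math.sqrt(height * width / samples)))
    total = 0.0
    count = 0
    for y in range(0, height, step):
        for x in range(0, width, step):
            r, g, b = pixels[y][x]
            total += 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 luma weights
            count += 1
    return total / count < threshold

def invert(pixels):
    """Colour-invert the image so white-on-black text becomes black-on-white."""
    return [[(255 - r, 255 - g, 255 - b) for (r, g, b) in row] for row in pixels]
```

A caller would invert the image before OCR only when `is_dark_background` returns true.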

🛡

Preserve Fields NEW

Allow-list engine that runs before redaction. Define hospital names, clinic brands, drug names, or any approved term. The Preserve pass overrides all redaction patterns — zero false positives on protected terms.
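
One way to picture the Preserve pass: find the protected spans first, then drop any PII match that overlaps one of them. The sketch below assumes whole-word, case-insensitive matching and uses hypothetical helper names; the real engine may resolve overlaps differently.

```python
import re

def find_spans(text, terms):
    """Character spans of whole-word, case-insensitive occurrences of any term."""
    spans = []
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            spans.append((m.start(), m.end()))
    return spans

def overlaps(a, b):
    """True when two half-open (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def apply_preserve(text, pii_spans, preserve_terms):
    """Keep only the PII spans that do not touch a preserved term."""
    protected = find_spans(text, preserve_terms)
    return [s for s in pii_spans if not any(overlaps(s, p) for p in protected)]
```

With the preserve term "Hadassah", a name-list hit on "Hadassah" is discarded while a hit on a doctor's name in the same sentence is kept.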

500k-Term Engine NEW

HashSet word-tokenizer replaces regex alternation. O(text length) regardless of term count. 480,000-name lists run at the same speed as 100 terms. Greedy longest-match with Unicode and Hebrew support.

🔍

Live Tester & Debug NEW

Built-in browser console. Upload files or type text directly. OCR result and redacted output shown side-by-side. Right-click any selection to instantly add a Redact or Preserve rule without leaving the page.

🔍

Multi-Language OCR

Dual-engine OCR pipeline — Windows.Media.Ocr (built-in, zero install) with Tesseract fallback. Recognises Hebrew and English in the same document. Smart language-pass selection picks the best result.

📄

PDF & Image Support

Native PDFs (PdfPig), scanned PDFs (rasterisation + OCR), JPEG, PNG, TIFF, BMP, and WebP out of the box. MIME-type-based extractor dispatch handles mixed document queues automatically.
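
MIME-type dispatch can be sketched as a small routing function. The `pdf_text` and `ocr` callables are hypothetical stand-ins for the real extractors, and the fallback rule (treat a PDF with no text layer as scanned) is an assumption about how native and scanned PDFs might be told apart.

```python
IMAGE_TYPES = {"image/jpeg", "image/png", "image/tiff", "image/bmp", "image/webp"}

def extract_text(data, mime_type, pdf_text, ocr):
    """Route a document to an extractor based on its MIME type.

    pdf_text and ocr are caller-supplied callables taking bytes and
    returning str (hypothetical stand-ins for the real extractors).
    """
    if mime_type == "application/pdf":
        text = pdf_text(data)
        # Assumed fallback: an empty text layer means the PDF is a scan.
        return text if text.strip() else ocr(data)
    if mime_type in IMAGE_TYPES:
        return ocr(data)
    raise ValueError(f"unsupported MIME type: {mime_type}")
```

Passing the extractors in as arguments keeps the routing logic testable without a PDF library or OCR engine installed.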

🎯

Configurable PII Fields

14 pre-seeded PII types (Israeli ID, DOB, phone, email, IBAN, passport, name…). Add custom Regex, keyword-list, context-aware, or LLM patterns per client. Inline chip editor for fast updates.

🔗

REST API

Clean versioned REST endpoints for text redaction, image redaction, and OCR extraction. API-key auth. OpenAPI/Swagger docs included. Integrates with any HIS, PACS, or document management system.

⚙️

Admin Console

Web-based backoffice to manage clients, API keys, PII patterns, preserve allow-lists, view request logs, and monitor system health — all behind secure login.

🧵

Thread-Safe & Concurrent

SemaphoreSlim-gated OCR engine. Per-call Tesseract instances. ConcurrentDictionary term-index cache. Handles high-throughput parallel API calls without race conditions or memory spikes.
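
The gating idea is not specific to .NET. As a Python analogue of a semaphore-gated OCR engine, the sketch below caps how many recognitions run at once. The class name `GatedOcr`, the `engine` callable, and the default limit of 2 are all illustrative assumptions.

```python
import threading

class GatedOcr:
    """Wraps an OCR callable so at most `max_parallel` calls run concurrently."""

    def __init__(self, engine, max_parallel=2):
        self._engine = engine
        self._gate = threading.BoundedSemaphore(max_parallel)

    def recognize(self, image_bytes):
        # Callers beyond the limit block here until a slot frees up.
        with self._gate:
            return self._engine(image_bytes)
```

Bounding concurrency this way trades a little latency under load for predictable memory use, since each OCR call can hold a full decoded image.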

🔑

RSA-Signed Licensing

Tamper-proof RSA-2048 license files. Offline validation — no license server calls. Enforces organisation name, expiry, and quota limits. Deployable to air-gapped networks.

500k+
Terms per field (HashSet engine)
<200ms
Avg. text redaction time
4
API endpoints (text + image + OCR + health)
Documents/day — no cloud limits
Workflow

How PiiRemover Works

Three processing paths for every scenario — REST API, Image Redaction, and Batch Loader.

📄 Text & Document API Flow

Real-time document redaction via HTTP POST

📤

Upload

POST file via REST API with API key

🔍

Extract

OCR or text extraction by MIME type

🎯

Detect

Preserve pass then PII pattern matching

✂️

Redact

Right-to-left safe text replacement

📋

Return

JSON with redacted text + match stats

🖼 Image Redaction Flow NEW

Pixel-level selective redaction for DICOM, PACS screenshots, and scanned images

🖼

Upload Image

JPEG, PNG, TIFF, BMP, WebP

🌑

Dark Detection

Auto-invert for DICOM / PACS

📐

OCR + Bounds

Per-word pixel rectangles

🎯

Match & Map

PII positions → pixel regions

🔒

Paint & Return

JPEG with black boxes over PII only
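
The "Match & Map" step can be illustrated as follows: lay the OCR words out as one string, remember each word's character span, and select the boxes of words that overlap a PII match. The `OcrWord` type and function names are hypothetical, and joining words with single spaces is a simplification of real OCR line layout.

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    text: str
    x: int
    y: int
    width: int
    height: int

def layout_text(words):
    """Join words with single spaces and record each word's character span."""
    parts, spans, pos = [], [], 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        parts.append(w.text)
        pos += len(w.text) + 1  # +1 for the joining space
    return " ".join(parts), spans

def boxes_for_matches(words, match_spans):
    """Return (x, y, width, height) for every word overlapping a PII match span."""
    _, word_spans = layout_text(words)
    boxes = []
    for w, (ws, we) in zip(words, word_spans):
        if any(ws < me and ms < we for ms, me in match_spans):
            boxes.append((w.x, w.y, w.width, w.height))
    return boxes
```

The returned rectangles are what a graphics library would fill with black; words outside every match span are never touched.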

📦 Batch Loader Flow

Unattended bulk processing via BatchLoader CLI

📁

Input Folder

Drop files into configured folder

⚙️

BatchLoader

Runs on schedule or on demand

🔗

API Call

Sends each file to redact endpoint

📝

Output

UTF-8 .txt saved to output folder

Done

Originals moved to done/ folder
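
The batch flow above can be sketched as a short loop. This is not the BatchLoader source: `redact_file` is a hypothetical callable standing in for the API call, and the choices to place `done/` inside the input folder and to name outputs `<original name>.txt` are assumptions.

```python
import shutil
from pathlib import Path

def run_batch(input_dir, output_dir, redact_file):
    """Redact every file in input_dir, write UTF-8 text output, archive originals.

    redact_file(path) -> str is supplied by the caller, for example a
    function that posts the file to the redact endpoint.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    done_dir = input_dir / "done"  # assumed location of the done/ folder
    output_dir.mkdir(parents=True, exist_ok=True)
    done_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(p for p in input_dir.iterdir() if p.is_file()):
        redacted_text = redact_file(path)
        (output_dir / (path.name + ".txt")).write_text(redacted_text, encoding="utf-8")
        shutil.move(str(path), str(done_dir / path.name))
        processed.append(path.name)
    return processed
```

Moving the original only after the output is written means a failed run leaves the file in place for the next attempt.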

Integration

Drop-In REST API

Four endpoints. Any language. Any platform. Integrates with your HIS, PACS, or document management system in hours, not weeks.

  • POST /api/v1/redact/redact
    Upload any document → receive redacted text + match statistics
  • POST ★ /api/v1/redact/redact-image NEW
    Upload image → returns JPEG with PII words painted black. Headers: X-Match-Count, X-Fields-Hit
  • POST /api/v1/ocr/extract
    OCR extraction only — returns raw text from scanned documents and images
  • GET  /api/v1/health
    Health probe — status, uptime, DB, OCR chain, license validity
OpenAPI / Swagger · API Key Auth · Multipart Upload · image/jpeg Response
# Redact a medical document (text)
curl -X POST https://your-server/api/v1/redact/redact \
  -H "X-Api-Key: your-api-key" \
  -F "file=@patient_record.pdf"
 
# Response
{ "redactedText": "Patient: ████ ID: █████████",
  "matchCount": 5, "durationMs": 142 }
# Redact a DICOM / PACS image ★ NEW
curl -X POST https://your-server/api/v1/redact/redact-image \
  -H "X-Api-Key: your-api-key" \
  -F "file=@pacs_screenshot.png" \
  --output redacted.jpg
 
# HTTP 200 image/jpeg — redacted image file
# X-Match-Count: 3
# X-Fields-Hit: Israeli ID, Patient Name, DOB
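
On the client side, the two response headers shown above can be read with a small helper. The sketch assumes `X-Fields-Hit` is a comma-separated list, as in the example, and works with any header mapping an HTTP client returns. The function name is hypothetical.

```python
def parse_redaction_headers(headers):
    """Extract (match_count, fields_hit) from redact-image response headers.

    headers: any mapping of header name to value. Names are compared
    case-insensitively, as HTTP header names are not case-sensitive.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    match_count = int(lowered.get("x-match-count", "0"))
    raw_fields = lowered.get("x-fields-hit", "")
    fields = [f.strip() for f in raw_fields.split(",") if f.strip()]
    return match_count, fields
```

A caller might log these values next to the saved `redacted.jpg` to build its own audit record.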
PiiRemover Admin — Live Tester & Debug
[📂 File]   [📝 Text]
────────────────────────────────
Drag file or paste text here…
Mode: ● OCR + Redact   ○ OCR only
[ Analyze File ]
 
──────────── Result ─────────────
📄 mri_referral.jpg   1.3 MB
OCR       │ Redacted
כהן שאול │ ████████
ID: 052748│ ████████
Sheba Med │ Sheba Med ← preserved
 
✦ Right-click any text to add rule
Admin Console

Full Visibility & Control

A built-in web-based administration console gives your IT team complete control without needing database access.

🔍 Live Tester & Debug NEW — file + text input, side-by-side OCR/redacted panes, right-click to add rules instantly
🖼 Image Redactor NEW — upload DICOM/PACS images, see original vs. redacted side-by-side with word-level match list
🛡 Preserve Fields NEW — inline chip editor for allow-list terms, filter bar to separate Preserve from Redact fields
👥 Client Management — create API clients, issue keys, set quotas, enable/disable access
📊 Request Logs & Audit Trail — every document logged with timing, match count, fields hit, and client identity
❤️ Health Monitor — live API health check showing DB status, OCR engine chain, uptime, and license validity
Live Tester & Debug Console

See Every Redaction Decision.

The built-in Live Tester runs the exact same engine as the API. Upload real documents, inspect OCR and redacted output side-by-side, and use the ⊞ Grid mode to navigate character offsets like a professional forensics tool.

🔒 your-server/admin/tester
📂 Drop files here or click to select
PDF · Image · TXT · DOCX — up to 50 MB each
PDF patient_record.pdf PDF discharge_summary.pdf
OCR + Redact pipeline — same engine as the API
PDF patient_record.pdf 128 KB 📍OCR: 312ms · ✓ 5 matches
📝 OCR 312ms 🔍 5 matches 🏷 3 fields ⚡ Total 448ms
Patient Name Israeli ID Phone
← RTL A− 11.5px A+ 📍 Pos ⊞ Grid Ln 2, Col 4 · Pos 22 🔍 Search… ⬇ Redacted
📝 Original — PII highlighted
00000 Patient: David Cohen
00022 ID: IL-1234567
00035 DOB: 1985-03-14
00049 Phone: 050-123-4567
00065 Diagnosis: Hypertension
00087 Attending: Dr. Sarah Levy
🔒 Redacted Output ⏱ 136ms
00000 Patient: XXXXXXXXXXX
00022 ID: XXXXXXXXXX
00035 DOB: XXXXXXXXXX
00049 Phone: XXXXXXXXXXXX
00065 Diagnosis: Hypertension
00087 Attending: XXXXXXXXXXXXXX

Grid mode — character offset gutter per line for precise position debugging  ·  📍 Pos mode — hover to see exact char position  ·  Right-click → instantly add Redact or Preserve rule

PII Fields Configuration

Define What Gets Redacted. In Full Detail.

Stack multiple detection engines per field — Regex, file lists of 500k names, label-based capture, constant lists, and more. The real admin interface, exactly as it looks.

🔒 your-server/admin/fields
PII Fields & Patterns 📖 Pattern Help
💾 Save Backup: Download fields as JSON
📂 Load Backup: Restore from JSON file
Add PII Field (redact matches)
Field name
Replacement char
*
-
#
×
🛡 Add Preserve Field (never redacted)

Terms here override all rules — institution names, drug names, etc.

Field name
Terms (pipe-separated)
Field Name       │ Replace │ Patterns   │ Priority │ Actions
Patient Name     │         │ 3 patterns │ 10       │
Israeli ID       │         │ 2 patterns │ 20       │
Phone Number     │ *       │ 1 pattern  │ 30       │
🛡 Medical Terms │         │ Preserve   │ 1        │
Names Dictionary │         │ FileList   │ 5        │
▾ Patterns for: Patient Name
Type       │ Pattern               │ Notes             │ Scope / Hits
AfterLabel │ Patient:|שם מלא:      │ Value after label │ ✓ 1 hit
FileList   │ names_il.dat (84,320) │                   │ 🎯 Scope: pos 0–500 · ✓ 2 hits
ConstList  │ Dr. Cohen|Dr. Levy    │ Known doctors     │ ✓ 1 hit
Getting Started Guide

Zero to First Redaction in 5 Minutes.

The built-in interactive guide, available right in your admin panel, walks through every step with real before/after examples and direct links.

🔒 your-server/admin/getting-started
🚀 Getting Started
From zero to your first redacted document in under 5 minutes.
① Define a PII Field ② Add a Pattern ③ Test in Live Tester ④ Protect specific terms ⑤ Call the API ⑥ Name Lists & Scope
1
Define a PII Field — what should be redacted?
A PII Field is a named category ("Israeli ID", "Phone", "Patient Name"). Each field has a replacement character that fills the redacted space, preserving document length.
📄 Input
Patient: David Cohen ID: 123456789 Phone: 050-1234567
✅ Redacted
Patient: ███████████ ID: █████████ Phone: ███████████
💡Replacement char repeats to match original text length — positional layout is preserved. Use █, *, -, # or any character.
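
Length-preserving replacement is simple to express: overwrite each matched character with the fill character and leave everything else in place. The sketch below assumes matches arrive as `(start, end)` character spans; the function name is hypothetical.

```python
def redact(text, spans, fill="█"):
    """Overwrite every character inside the given (start, end) spans with `fill`.

    The output has the same length as the input, so character offsets
    reported for one remain valid for the other.
    """
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = fill
    return "".join(chars)
```

Because no characters are inserted or removed, overlapping or out-of-order spans are harmless: a position is either filled or untouched.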
2
Add a Pattern — how should the field find its data?
Every field needs at least one pattern. Pick the simplest type that works — stack multiple engines per field for maximum recall.
🔤 WholeWord      │ "David Cohen"     │ Exact names
📋 ConstList      │ Cohen|Levy|Sharon │ Fixed term list
🔢 NumberSeq      │ length=9          │ Israeli ID digits
🏷 AfterLabel     │ "Patient:"        │ Value after label
🔍 Regex          │ \b\d{9}\b         │ Complex patterns
📁 FileList       │ names.dat         │ 500k dictionaries
🎯 BetweenMarkers │ [START]..[END]    │ Bracketed values
↩ BeginsWith      │ +972              │ Prefix matching
4
🛡 Preserve Fields — protect institution names, drug names
⚠ Without Preserve
Admitted to ███████ ██████ Dr. ████████
✅ With Preserve: "Hadassah"
Admitted to Hadassah Dr. ████████
5
Call the API from your application
# Upload any file, get redacted text + match positions
curl -X POST https://your-server/api/v1/redact/redact \
  -H "X-Api-Key: YOUR_KEY_HERE" \
  -F "file=@patient_record.pdf"
 
// Response
{ "matchCount": 5,
  "fieldsHit": ["Patient Name", "Israeli ID"],
  "matches": [{ "startIndex": 9, "length": 11, "fieldName": "Patient Name", "replacement": "███████████" }] }
Common Field Configurations
What to redact        │ Field           │ Engine     │ Pattern                      │ Replace
Israeli ID (9 digits) │ Israeli ID      │ Regex      │ \b[0-9]{9}\b                 │
Israeli mobile        │ Phone           │ Regex      │ 0[5-9]\d[-\s]?\d{7}          │
Email address         │ Email           │ Regex      │ [a-z0-9.]+@[a-z]+\.[a-z]{2,} │
Patient from label    │ Patient Name    │ AfterLabel │ Patient:|שם:                 │
Institution (keep)    │ 🛡 Institutions │ Preserve   │ Hadassah|Ichilov             │ never
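
The three Regex rows above can be tried outside the product before they are entered in the admin panel. The sketch below runs them through Python's `re` module, whose syntax may differ in places from the engine PiiRemover uses. The `fields_hit` helper is hypothetical.

```python
import re

# Patterns copied from the table above; evaluated here with Python's re module.
FIELD_PATTERNS = {
    "Israeli ID": re.compile(r"\b[0-9]{9}\b"),
    "Phone": re.compile(r"0[5-9]\d[-\s]?\d{7}"),
    "Email": re.compile(r"[a-z0-9.]+@[a-z]+\.[a-z]{2,}"),
}

def fields_hit(text):
    """Names of the fields whose pattern matches somewhere in the text."""
    return [name for name, pattern in FIELD_PATTERNS.items() if pattern.search(text)]
```

One detail worth checking in a test like this: the email pattern as written is lowercase-only, so an address that starts with a capital letter is matched from its second character unless a case-insensitive option is applied.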
Security

Built for Regulated Environments

Every design decision was made with security and compliance in mind. Not bolted on — baked in.

"Patient data must never leave the hospital's network perimeter. PiiRemover is the only redaction solution we evaluated that is architecturally capable of running fully air-gapped."

— Hospital IT Director, Design Review

"The RSA-signed offline licensing model means we can deploy to isolated clinical environments without any outbound network dependency — exactly what our compliance team required."

— Healthcare Integration Architect
🏥 HIPAA Ready
🇪🇺 GDPR Compliant Architecture
🇮🇱 Israeli Privacy Law (PPPA)
🔒 RSA-2048 License Signing
🛡️ Zero Cloud Dependencies
📝 Full Audit Trail
🔑 API Key Authentication
💾 Air-Gap Deployable
Licensing

Simple, Transparent Licensing

Perpetual on-premises licenses. No subscription, no per-document fees, no cloud lock-in.

Pilot
Contact us
Evaluate in your environment with real documents
  • 90-day trial license
  • Full feature access
  • Image redaction included
  • Up to 2 API clients
  • Email support
  • 1,000 requests included
Start Pilot
Enterprise
Contact us
Hospital-wide or multi-site deployment
  • Multi-year perpetual license
  • Unlimited documents & clients
  • Custom 500k-term name lists
  • DICOM / PACS image redaction
  • Custom PII pattern development
  • HL7 / FHIR integration guidance
  • SLA-backed support
  • On-site installation
  • Source code escrow available
Talk to Us

All licenses are RSA-2048 signed, offline-validated, and deployable to air-gapped networks. Renewal is optional — the software continues working after expiry; only new licenses are blocked.

Technology

Enterprise-Grade Stack

Built on proven, supported Microsoft and open-source technologies. No exotic dependencies.

.NET 10 / ASP.NET Core

Latest LTS runtime. High performance, IIS in-process hosting, minimal footprint.

🗄️

SQLite + Dapper

Zero-config embedded database. WAL mode. No SQL Server license required.

🔍

Windows.Media.Ocr + Tesseract

Dual-engine OCR fallback chain. Dark-background auto-detection. OS-native Hebrew support.

🎨

SkiaSharp

Cross-platform 2D graphics engine. Powers pixel-level image redaction — black rect painting over OCR word bounds.

Get in Touch

Ready to Protect Your Patients' Data?

Contact us for a live demo, pilot license, or enterprise pricing.
We respond within one business day.

✉️  Info@medilis.com
Request a Demo Request Pilot License

VGS — Healthcare Technology Solutions