PiiRemover — On-Premises PII Redaction Platform

✨ New in This Release

Five Major Capabilities.
Built for Real Clinical Environments.

This release brings pixel-level image redaction, intelligent dark-background OCR, allow-list Preserve Fields, a half-million-term performance engine, and a built-in debug console — everything a healthcare IT team actually needs.

🖼

Pixel-Level Image Redaction

Upload a DICOM screenshot, PACS image, or scanned document. PiiRemover OCRs the image, runs your PII rules, and paints black rectangles over only the matched words — leaving all other image content pixel-perfect and intact.

🌑

Smart Dark-Background OCR

Clinical workstations and PACS viewers render white text on black. PiiRemover automatically detects dark-background images, inverts colours before OCR, and picks the best pass — dramatically improving recognition on medical imaging systems.

🛡

Preserve Fields — Intelligent Allow-List

Define terms that must never be redacted: hospital names, clinic brands, equipment identifiers. The Preserve engine runs before redaction and protects approved terms even when they match a PII pattern — zero false positives on your own brand.

⚡

500,000-Term HashSet Engine

The redesigned pattern engine replaces regex alternation with a HashSet word-tokenizer: O(text length) matching regardless of term count. A 480,000-name list runs at the same speed as 100 terms. Multi-word names and Unicode/Hebrew supported.

🔍

Live Tester & Debug Console

Browser-based testing tool built into the admin panel. Upload files or paste text directly. See OCR output side-by-side with the redacted result. Right-click any un-redacted word to instantly add it as a Redact or Preserve rule — without leaving the page.

✏️

Inline Pattern Editor

Edit keyword lists directly in the admin panel — no more CSV uploads for small changes. Add or remove individual terms with a chip editor, or bulk-paste a pipe-separated list. Changes invalidate the cache and take effect on the next request.

The Challenge

Healthcare Data Is the #1 Target

Medical records contain the most sensitive personal data. Existing cloud solutions create unacceptable compliance risk.

☁️

Cloud Exposure

Cloud-based redaction services require uploading patient records to third-party servers — a HIPAA and GDPR violation risk.

🔓

Data Breach Risk

Every external API call is a potential breach vector. Healthcare organisations cannot afford to trust third-party data processors with PHI.

🌐

Internet Dependency

Air-gapped hospital networks, secure clinical environments, and offline deployments are incompatible with cloud-only solutions.

🏥

Fully On-Premises

PiiRemover runs entirely on your infrastructure. Zero internet calls. Zero cloud dependencies. Patient data stays inside your perimeter.

⚡

Real-Time Processing

Integrated OCR engine (Windows.Media.Ocr + Tesseract) processes scanned documents and images in milliseconds — no external service round-trips.

🛡️

Compliance-First Design

Built from the ground up for HIPAA, GDPR, and Israeli privacy law (PPPA). Every architectural decision prioritises data isolation.

Capabilities

Everything You Need. Nothing More.

Purpose-built for healthcare document workflows. No bloat, no cloud hooks, no telemetry.

🖼

Image Redaction NEW

Pixel-level selective redaction. OCR identifies each word's exact bounding box. PII-matched words get painted black. Everything else stays untouched — including DICOM metadata and imaging measurements.

🌑

Dark-Background OCR NEW

Automatically detects black-background medical images. Luminance sampling across 2,000 pixel grid points. White-on-black PACS overlays are colour-inverted before OCR — the right engine pass is selected automatically.
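The luminance-sampling idea can be sketched roughly as follows. This is an illustration only, not PiiRemover's code: the image is assumed to be a plain grid of RGB tuples, the Rec. 601 luma weights and both thresholds are guesses, and the names `is_dark_background` and `invert` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def luminance(p: Pixel) -> float:
    """Rec. 601 luma; an assumed formula, not confirmed by the product docs."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def is_dark_background(img: List[List[Pixel]], samples: int = 2000,
                       dark_level: float = 64.0,
                       dark_fraction: float = 0.6) -> bool:
    """Sample roughly `samples` evenly spaced grid points and report whether
    most of them are dark. Both thresholds are illustrative guesses."""
    height, width = len(img), len(img[0])
    # Pick a step so that (height / step) * (width / step) is about `samples`.
    step = max(1, int((height * width / samples) ** 0.5))
    dark = total = 0
    for y in range(0, height, step):
        for x in range(0, width, step):
            total += 1
            if luminance(img[y][x]) < dark_level:
                dark += 1
    return dark / total >= dark_fraction


def invert(img: List[List[Pixel]]) -> List[List[Pixel]]:
    """Colour-invert every pixel so white-on-black becomes black-on-white."""
    return [[(255 - r, 255 - g, 255 - b) for (r, g, b) in row] for row in img]


# A mostly black frame is flagged as dark; its inverse is not.
frame = [[(0, 0, 0)] * 200 for _ in range(100)]
print(is_dark_background(frame), is_dark_background(invert(frame)))
```

Sampling a fixed number of grid points keeps the check cheap regardless of image resolution, which is presumably why a fixed sample budget is used.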

🛡

Preserve Fields NEW

Allow-list engine that runs before redaction. Define hospital names, clinic brands, drug names, or any approved term. The Preserve pass overrides all redaction patterns — zero false positives on protected terms.
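A minimal sketch of how a Preserve pass might override redaction, under stated assumptions: terms are matched as whole words, and any redaction match that overlaps a protected span is dropped entirely. The function names are hypothetical and the real engine may resolve overlaps differently.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def find_spans(text: str, terms: List[str]) -> List[Span]:
    """Locate every whole-word occurrence of each term."""
    spans = []
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            spans.append((m.start(), m.end()))
    return spans


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def redact_with_preserve(text: str, redact_terms: List[str],
                         preserve_terms: List[str], fill: str = "█") -> str:
    protected = find_spans(text, preserve_terms)  # Preserve pass runs first
    out = list(text)
    for span in find_spans(text, redact_terms):
        if any(overlaps(span, p) for p in protected):
            continue  # assumed rule: the protected term always wins
        for i in range(span[0], span[1]):
            out[i] = fill
    return "".join(out)


line = "Admitted to Hadassah by Dr. Levy"
print(redact_with_preserve(line, ["Hadassah", "Levy"], ["Hadassah"]))
# prints: Admitted to Hadassah by Dr. ████
```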

⚡

500k-Term Engine NEW

HashSet word-tokenizer replaces regex alternation. O(text length) regardless of term count. 480,000-name lists run at the same speed as 100 terms. Greedy longest-match with Unicode and Hebrew support.
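To illustrate why a hash-set tokenizer scales with text length rather than term count, here is a small sketch. It is not the product's implementation: the `TermIndex` class, the `\w+` tokenizer, and case-folding are all assumptions made for the example.

```python
import re
from typing import Iterable, List, Set, Tuple


class TermIndex:
    """Hash set of word tuples with greedy longest-match lookup."""

    def __init__(self, terms: Iterable[str]):
        self.terms: Set[Tuple[str, ...]] = set()
        self.max_words = 1
        for term in terms:
            words = tuple(w.casefold() for w in re.findall(r"\w+", term))
            if words:
                self.terms.add(words)
                self.max_words = max(self.max_words, len(words))

    def find(self, text: str) -> List[Tuple[int, int]]:
        # \w is Unicode-aware in Python 3, so Hebrew words tokenize too.
        tokens = [(m.group().casefold(), m.start(), m.end())
                  for m in re.finditer(r"\w+", text)]
        hits, i = [], 0
        while i < len(tokens):
            matched = 0
            # Try the longest candidate phrase first, then shorter ones.
            for n in range(min(self.max_words, len(tokens) - i), 0, -1):
                key = tuple(t[0] for t in tokens[i:i + n])
                if key in self.terms:
                    hits.append((tokens[i][1], tokens[i + n - 1][2]))
                    matched = n
                    break
            i += matched or 1
        return hits


index = TermIndex(["David Cohen", "Cohen", "כהן שאול"])
sample = "Patient: David Cohen, ref כהן שאול"
print([sample[s:e] for s, e in index.find(sample)])
```

Each token triggers at most `max_words` set lookups, so the cost depends on the text and the longest phrase, not on how many terms the set holds.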

🔍

Live Tester & Debug NEW

Built-in browser console. Upload files or type text directly. OCR result and redacted output shown side-by-side. Right-click any selection to instantly add a Redact or Preserve rule without leaving the page.

🔍

Multi-Language OCR

Dual-engine OCR pipeline — Windows.Media.Ocr (built-in, zero install) with Tesseract fallback. Recognises Hebrew and English in the same document. Smart language-pass selection picks the best result.

📄

PDF & Image Support

Native PDFs (PdfPig), scanned PDFs (rasterisation + OCR), JPEG, PNG, TIFF, BMP, and WebP out of the box. MIME-type-based extractor dispatch handles mixed document queues automatically.
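MIME-type-based dispatch can be pictured as a lookup table from media type to extractor. The sketch below is illustrative: the extractor functions are placeholders that return fixed strings, and `pick_extractor` is a hypothetical name.

```python
from typing import Callable, Dict

Extractor = Callable[[bytes], str]


def extract_pdf(data: bytes) -> str:
    return "pdf-text"   # placeholder for a PDF text extractor


def extract_image(data: bytes) -> str:
    return "ocr-text"   # placeholder for the OCR pipeline


def extract_plain(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")


EXTRACTORS: Dict[str, Extractor] = {
    "application/pdf": extract_pdf,
    "image/jpeg": extract_image,
    "image/png": extract_image,
    "image/tiff": extract_image,
    "image/bmp": extract_image,
    "image/webp": extract_image,
    "text/plain": extract_plain,
}


def pick_extractor(mime_type: str) -> Extractor:
    """Normalise the media type (drop parameters, lowercase) and look it up."""
    key = mime_type.split(";")[0].strip().lower()
    try:
        return EXTRACTORS[key]
    except KeyError:
        raise ValueError(f"unsupported media type: {key}") from None


print(pick_extractor("text/plain; charset=utf-8")(b"hello"))  # prints: hello
```

A scanned PDF shares the `application/pdf` type with a native one, so a real pipeline would need a second decision (for example, falling back to OCR when no text layer is found); that step is omitted here.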

🎯

Configurable PII Fields

14 pre-seeded PII types (Israeli ID, DOB, phone, email, IBAN, passport, name…). Add custom Regex, keyword-list, context-aware, or LLM patterns per client. Inline chip editor for fast updates.

🔗

REST API

Clean versioned REST endpoints for text redaction, image redaction, and OCR extraction. API-key auth. OpenAPI/Swagger docs included. Integrates with any HIS, PACS, or document management system.

⚙️

Admin Console

Web-based backoffice to manage clients, API keys, PII patterns, preserve allow-lists, view request logs, and monitor system health — all behind secure login.

🧵

Thread-Safe & Concurrent

SemaphoreSlim-gated OCR engine. Per-call Tesseract instances. ConcurrentDictionary term-index cache. Handles high-throughput parallel API calls without race conditions or memory spikes.
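The gating idea can be shown with a Python analogue of a semaphore-limited OCR call. This is only a sketch of the pattern, not the .NET implementation: `GatedOcr` is a hypothetical class, the sleep stands in for real OCR work, and the limit of two parallel calls is arbitrary.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class GatedOcr:
    """Allow at most `max_parallel` OCR calls to run at the same time."""

    def __init__(self, max_parallel: int = 2):
        self._gate = threading.Semaphore(max_parallel)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0  # highest concurrency observed, for demonstration

    def recognise(self, image_id: int) -> str:
        with self._gate:  # blocks while the limit is reached
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                time.sleep(0.01)  # stand-in for the actual OCR work
                return f"text-for-{image_id}"
            finally:
                with self._lock:
                    self.active -= 1


ocr = GatedOcr(max_parallel=2)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(ocr.recognise, range(8)))
print(len(results), ocr.peak <= 2)
```

Bounding concurrency this way trades some latency under load for a predictable memory ceiling, since each in-flight OCR call holds image buffers.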

🔑

RSA-Signed Licensing

Tamper-proof RSA-2048 license files. Offline validation — no license server calls. Enforces organisation name, expiry, and quota limits. Deployable to air-gapped networks.

Workflow

How PiiRemover Works

Three processing paths for every scenario — REST API, Image Redaction, and Batch Loader.

📄 Text & Document API Flow

Real-time document redaction via HTTP POST

📤

Upload

POST file via REST API with API key

→

🔍

Extract

OCR or text extraction by MIME type

→

🎯

Detect

Preserve pass then PII pattern matching

→

✂️

Redact

Right-to-left safe text replacement

→

📋

Return

JSON with redacted text + match stats

🖼 Image Redaction Flow NEW

Pixel-level selective redaction for DICOM, PACS screenshots, and scanned images

🖼

Upload Image

JPEG, PNG, TIFF, BMP, WebP

→

🌑

Dark Detection

Auto-invert for DICOM / PACS

→

📐

OCR + Bounds

Per-word pixel rectangles

→

🎯

Match & Map

PII positions → pixel regions

→

🔒

Paint & Return

JPEG with black boxes over PII only
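The "Match & Map" step can be sketched as a join between character spans and per-word rectangles. The example assumes OCR words are joined with single spaces to form the text that the PII rules run against; `OcrWord` and `boxes_for_matches` are hypothetical names, and painting the rectangles is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in pixels


@dataclass
class OcrWord:
    text: str
    box: Box


def layout(words: List[OcrWord]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join words with single spaces and record each word's character span."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1  # +1 for the joining space
    return " ".join(w.text for w in words), spans


def boxes_for_matches(words: List[OcrWord],
                      matches: List[Tuple[int, int]]) -> List[Box]:
    """Return the rectangle of every word that overlaps a matched span."""
    _, spans = layout(words)
    return [w.box for w, (ws, we) in zip(words, spans)
            if any(ws < me and ms < we for ms, me in matches)]


words = [
    OcrWord("Patient:", (10, 10, 80, 20)),
    OcrWord("David", (100, 10, 50, 20)),
    OcrWord("Cohen", (160, 10, 55, 20)),
    OcrWord("MRI", (10, 40, 40, 20)),
]
text, _ = layout(words)          # "Patient: David Cohen MRI"
print(boxes_for_matches(words, [(9, 20)]))
# prints: [(100, 10, 50, 20), (160, 10, 55, 20)]
```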

📦 Batch Loader Flow

Unattended bulk processing via BatchLoader CLI

📁

Input Folder

Drop files into configured folder

→

⚙️

BatchLoader

Runs on schedule or on demand

→

🔗

API Call

Sends each file to redact endpoint

→

📝

Output

UTF-8 .txt saved to output folder

→

✅

Done

Originals moved to done/ folder
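The batch flow above can be sketched as a short loop. This is not the BatchLoader CLI itself: the `redact` callable stands in for the API call, and both the `done/` location (inside the input folder) and the `<name>.txt` output naming are assumptions.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_batch(input_dir: Path, output_dir: Path,
              redact: Callable[[Path], str]) -> int:
    """Redact every file in input_dir, write UTF-8 text, move originals."""
    done_dir = input_dir / "done"            # assumed location of done/
    done_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)

    processed = 0
    # Snapshot the listing first so moving files does not disturb iteration.
    for src in sorted(p for p in input_dir.iterdir() if p.is_file()):
        redacted = redact(src)               # stand-in for the API call
        target = output_dir / (src.name + ".txt")
        target.write_text(redacted, encoding="utf-8")
        shutil.move(str(src), str(done_dir / src.name))
        processed += 1
    return processed
```

Moving the original only after the output is written means a failed run leaves the file in the input folder, ready to be retried.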

Integration

Drop-In REST API

Four endpoints. Any language. Any platform. Integrates with your HIS, PACS, or document management system in hours, not weeks.

POST /api/v1/redact/redact
Upload any document → receive redacted text + match statistics
POST ★ /api/v1/redact/redact-image NEW
Upload image → returns JPEG with PII words painted black. Headers: X-Match-Count, X-Fields-Hit
POST /api/v1/ocr/extract
OCR extraction only — returns raw text from scanned documents and images
GET /api/v1/health
Health probe — status, uptime, DB, OCR chain, license validity

OpenAPI / Swagger · API Key Auth · Multipart Upload · image/jpeg Response

# Redact a medical document (text)
curl -X POST https://your-server/api/v1/redact/redact \
  -H "X-Api-Key: your-api-key" \
  -F "file=@patient_record.pdf"

# Response
{ "redactedText": "Patient: ████ ID: █████████",
  "matchCount": 5, "durationMs": 142 }

# Redact a DICOM / PACS image ★ NEW
curl -X POST https://your-server/api/v1/redact/redact-image \
  -H "X-Api-Key: your-api-key" \
  -F "file=@pacs_screenshot.png" \
  --output redacted.jpg

# HTTP 200 image/jpeg — redacted image file
# X-Match-Count: 3
# X-Fields-Hit: Israeli ID, Patient Name, DOB

PiiRemover Admin — Live Tester & Debug

[📂 File] [📝 Text]
────────────────────────────────
Drag file or paste text here…
Mode: ● OCR + Redact ○ OCR only
[ Analyze File ]

──────────── Result ─────────────
📄 mri_referral.jpg 1.3 MB
OCR │ Redacted
כהן שאול │ ████████
ID: 052748│ ████████
Sheba Med │ Sheba Med ← preserved

✦ Right-click any text to add rule

Admin Console

Full Visibility & Control

A built-in web-based administration console gives your IT team complete control without needing database access.

🔍 Live Tester & Debug NEW — file + text input, side-by-side OCR/redacted panes, right-click to add rules instantly

🖼 Image Redactor NEW — upload DICOM/PACS images, see original vs. redacted side-by-side with word-level match list

🛡 Preserve Fields NEW — inline chip editor for allow-list terms, filter bar to separate Preserve from Redact fields

👥 Client Management — create API clients, issue keys, set quotas, enable/disable access

📊 Request Logs & Audit Trail — every document logged with timing, match count, fields hit, and client identity

❤️ Health Monitor — live API health check showing DB status, OCR engine chain, uptime, and license validity

Live Tester & Debug Console

See Every Redaction Decision.

The built-in Live Tester runs the exact same engine as the API. Upload real documents, inspect OCR and redacted output side-by-side, and use the ⊞ Grid mode to navigate character offsets like a professional forensics tool.

🔒 your-server/admin/tester

🔒 PiiRemover Dashboard 🔍 Tester & Debug Clients PII Fields Logs ⚙ Settings v3.0.11

📂 Drop files here or click to select

PDF · Image · TXT · DOCX — up to 50 MB each

PDF patient_record.pdf PDF discharge_summary.pdf

OCR + Redact pipeline — same engine as the API

PDF patient_record.pdf 128 KB 📍OCR: 312ms · ✓ 5 matches

📝 OCR 312ms 🔍 5 matches 🏷 3 fields ⚡ Total 448ms

Patient Name Israeli ID Phone

📝 Original — PII highlighted

00000 Patient: David Cohen
00022 ID: IL-1234567
00035 DOB: 1985-03-14
00049 Phone: 050-123-4567
00065 Diagnosis: Hypertension
00087 Attending: Dr. Sarah Levy

🔒 Redacted Output ⏱ 136ms

00000 Patient: XXXXXXXXXXX
00022 ID: XXXXXXXXXX
00035 DOB: XXXXXXXXXX
00049 Phone: XXXXXXXXXXXX
00065 Diagnosis: Hypertension
00087 Attending: XXXXXXXXXXXXXX

⊞ Grid mode — character offset gutter per line for precise position debugging · 📍 Pos mode — hover to see exact char position · Right-click → instantly add Redact or Preserve rule
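A character-offset gutter of the kind Grid mode describes can be produced in a few lines. The sketch assumes a five-digit, zero-padded offset counted over the raw text including line terminators; `with_offset_gutter` is a hypothetical helper, not part of the product.

```python
def with_offset_gutter(text: str, width: int = 5) -> str:
    """Prefix each line with the character offset at which it starts."""
    lines, offset = [], 0
    for line in text.splitlines(keepends=True):
        lines.append(f"{offset:0{width}d} {line.rstrip(chr(13) + chr(10))}")
        offset += len(line)  # includes the line terminator
    return "\n".join(lines)


print(with_offset_gutter("Patient: David Cohen\nID: IL-1234567\n"))
# prints:
# 00000 Patient: David Cohen
# 00021 ID: IL-1234567
```

Offsets shift by one per line when the source uses CRLF line endings, which is worth keeping in mind when comparing gutter values against API match positions.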

PII Fields Configuration

Define What Gets Redacted. In Full Detail.

Stack multiple detection engines per field — Regex, file lists of 500k names, label-based capture, constant lists, and more. The real admin interface, exactly as it looks.

🔒 your-server/admin/fields

🔒 PiiRemover Dashboard 🔍 Tester PII Fields Logs ⚙ Settings v3.0.11

PII Fields & Patterns 📖 Pattern Help

💾 Save Backup: Download fields as JSON

📂 Load Backup: Restore from JSON file

Add PII Field (redact matches)

Field name

Replacement char: █ * - # • ×

🛡 Add Preserve Field (never redacted)

Terms here override all rules — institution names, drug names, etc.

Field name

Terms (pipe-separated)

Field Name │ Replace │ Patterns │ Priority
Patient Name │ █ │ 3 patterns │ 10
Israeli ID │ █ │ 2 patterns │ 20
Phone Number │ * │ 1 pattern │ 30
🛡 Medical Terms │ — │ Preserve │ 1
Names Dictionary │ █ │ FileList │ 5

▾ Patterns for: Patient Name

Type │ Pattern │ Notes │ Scope / Hits
AfterLabel │ Patient:|שם מלא: │ Value after label │ ✓ 1 hit
FileList │ names_il.dat (84,320) │ 🎯 Scope: pos 0–500 │ ✓ 2 hits
ConstList │ Dr. Cohen|Dr. Levy │ Known doctors │ ✓ 1 hit

Getting Started Guide

Zero to First Redaction in 5 Minutes.

The built-in interactive guide, available right in your admin panel, walks through every step with real before/after examples and direct links.

🔒 your-server/admin/getting-started

🔒 PiiRemover Dashboard 🔍 Tester PII Fields Logs ⚙ Settings 🚀 Getting Started v3.0.11

🚀 Getting Started

From zero to your first redacted document in under 5 minutes.

① Define a PII Field ② Add a Pattern ③ Test in Live Tester ④ Protect specific terms ⑤ Call the API ⑥ Name Lists & Scope

1

Define a PII Field — what should be redacted?

A PII Field is a named category ("Israeli ID", "Phone", "Patient Name"). Each field has a replacement character that fills the redacted space, preserving document length.

📄 Input

Patient: David Cohen ID: 123456789 Phone: 050-1234567

→

✅ Redacted

Patient: ███████████ ID: █████████ Phone: ███████████

💡 Replacement char repeats to match original text length — positional layout is preserved. Use █, *, -, # or any character.
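Length-preserving replacement can be sketched as overwriting each matched character in place. The helper below is illustrative (the name `redact_spans` and the `(start, length)` span shape are assumptions); it reproduces the before/after example above.

```python
from typing import List, Tuple


def redact_spans(text: str, spans: List[Tuple[int, int]], fill: str = "█") -> str:
    """Overwrite each (start, length) span with `fill`, keeping text length."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    out = list(text)
    for start, length in spans:
        for i in range(start, min(start + length, len(out))):
            out[i] = fill
    return "".join(out)


source = "Patient: David Cohen ID: 123456789 Phone: 050-1234567"
result = redact_spans(source, [(9, 11), (25, 9), (42, 11)])
print(result)
print(len(result) == len(source))  # prints: True
```

Because every replacement is one-for-one, character offsets reported for one match stay valid after other matches have been redacted.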

2

Add a Pattern — how should the field find its data?

Every field needs at least one pattern. Pick the simplest type that works — stack multiple engines per field for maximum recall.

🔤 WholeWord │ "David Cohen" │ Exact names
📋 ConstList │ Cohen|Levy|Sharon │ Fixed term list
🔢 NumberSeq │ length=9 │ Israeli ID digits
🏷 AfterLabel │ "Patient:" │ Value after label
🔍 Regex │ \b\d{9}\b │ Complex patterns
📁 FileList │ names.dat │ 500k dictionaries
🎯 BetweenMarkers │ [START]..[END] │ Bracketed values
↩ BeginsWith │ +972 │ Prefix matching
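Several of the pattern types above can be approximated with ordinary regular expressions. The mapping below is a guess at their behaviour for illustration only: for instance, it assumes AfterLabel captures the rest of the line and BeginsWith extends to the next whitespace, neither of which is confirmed by the product documentation.

```python
import re


def number_seq(length: int):
    """Exactly `length` digits, not embedded in a longer digit run."""
    return re.compile(rf"(?<!\d)\d{{{length}}}(?!\d)")


def after_label(labels: str):
    """Pipe-separated labels; group 1 is the rest of the line (assumed)."""
    alternatives = "|".join(re.escape(label) for label in labels.split("|"))
    return re.compile(rf"(?:{alternatives})[ \t]*([^\r\n]+)")


def between_markers(start: str, end: str):
    """Group 1 is the shortest text between the two literal markers."""
    return re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)


def begins_with(prefix: str):
    """A whitespace-delimited token that starts with the literal prefix."""
    return re.compile(r"(?<!\S)" + re.escape(prefix) + r"\S*")


print(number_seq(9).findall("ID 123456789 ref 1234567890"))
print(after_label("Patient:|שם מלא:").search("Patient: David Cohen\nDOB: 1985").group(1))
print(between_markers("[START]", "[END]").search("x [START]secret[END] y").group(1))
print(begins_with("+972").findall("call +972-50-1234567 now"))
```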

4

🛡 Preserve Fields — protect institution names, drug names

⚠ Without Preserve

Admitted to ███████ ██████ Dr. ████████

→

✅ With Preserve: "Hadassah"

Admitted to Hadassah Dr. ████████

5

Call the API from your application

# Upload any file, get redacted text + match positions
curl -X POST https://your-server/api/v1/redact/redact \
  -H "X-Api-Key: YOUR_KEY_HERE" \
  -F "file=@patient_record.pdf"

// Response
{
  "matchCount": 5,
  "fieldsHit": ["Patient Name", "Israeli ID"],
  "matches": [{
    "startIndex": 9, "length": 11,
    "fieldName": "Patient Name", "replacement": "███████████"
  }]
}
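On the client side, the response can be reduced to per-field counts and character spans. The sketch below parses the sample body shown above (which lists only one of the five matches); `summarise` is a hypothetical helper and the field names are taken from that sample.

```python
import json
from collections import Counter

SAMPLE = """
{ "matchCount": 5,
  "fieldsHit": ["Patient Name", "Israeli ID"],
  "matches": [{ "startIndex": 9, "length": 11,
                "fieldName": "Patient Name",
                "replacement": "███████████" }] }
"""


def summarise(response_body: str) -> dict:
    """Collect the match count, hits per field, and (start, end) spans."""
    data = json.loads(response_body)
    matches = data.get("matches", [])
    per_field = Counter(m["fieldName"] for m in matches)
    spans = [(m["startIndex"], m["startIndex"] + m["length"]) for m in matches]
    return {
        "matchCount": data.get("matchCount", 0),
        "perField": dict(per_field),
        "spans": spans,
    }


print(summarise(SAMPLE))
# prints: {'matchCount': 5, 'perField': {'Patient Name': 1}, 'spans': [(9, 20)]}
```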

Common Field Configurations

What to redact │ Field │ Engine │ Pattern │ Replace
Israeli ID (9 digits) │ Israeli ID │ Regex │ \b[0-9]{9}\b │ █
Israeli mobile │ Phone │ Regex │ 0[5-9]\d[-\s]?\d{7} │ █
Email address │ Email │ Regex │ [a-z0-9.]+@[a-z]+\.[a-z]{2,} │ █
Patient from label │ Patient Name │ AfterLabel │ Patient:|שם: │ █
Institution (keep) │ 🛡 Institutions │ Preserve │ Hadassah|Ichilov │ never
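Before saving a Regex field, it can help to try the pattern against a few known inputs. The snippet below runs the three regular expressions from the table through Python's `re` module; other regex engines may differ in small details, and the sample strings are made up for the test.

```python
import re

# Patterns copied from the table above.
FIELDS = {
    "Israeli ID": r"\b[0-9]{9}\b",
    "Phone": r"0[5-9]\d[-\s]?\d{7}",
    "Email": r"[a-z0-9.]+@[a-z]+\.[a-z]{2,}",
}


def hits(field: str, text: str) -> list:
    """Return every substring of `text` matched by the field's pattern."""
    return re.findall(FIELDS[field], text)


print(hits("Israeli ID", "ID: 123456789"))       # prints: ['123456789']
print(hits("Phone", "050-1234567"))              # prints: ['050-1234567']
print(hits("Phone", "050-123-4567"))             # prints: []
print(hits("Email", "mail a.b@example.com now")) # prints: ['a.b@example.com']
```

As written, the Phone pattern accepts one optional separator after the prefix, so a number formatted as 050-123-4567 is not matched; the Email pattern is lowercase-only and expects a single-label domain before the top-level domain. Fields that must cover those variants would need a broader pattern or a second one stacked on the same field.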

Security

Built for Regulated Environments

Every design decision was made with security and compliance in mind. Not bolted on — baked in.

"Patient data must never leave the hospital's network perimeter. PiiRemover is the only redaction solution we evaluated that is architecturally capable of running fully air-gapped."

— Hospital IT Director, Design Review

"The RSA-signed offline licensing model means we can deploy to isolated clinical environments without any outbound network dependency — exactly what our compliance team required."

— Healthcare Integration Architect

🏥 HIPAA Ready

🇪🇺 GDPR Compliant Architecture

🇮🇱 Israeli Privacy Law (PPPA)

🔒 RSA-2048 License Signing

🛡️ Zero Cloud Dependencies

📝 Full Audit Trail

🔑 API Key Authentication

💾 Air-Gap Deployable

Licensing

Simple, Transparent Licensing

Perpetual on-premises licenses. No subscription, no per-document fees, no cloud lock-in.

Pilot

Contact us

Evaluate in your environment with real documents

90-day trial license
Full feature access
Image redaction included
Up to 2 API clients
Email support
1,000 requests included

Start Pilot

Enterprise-Grade Stack

Built on proven, supported Microsoft and open-source technologies. No exotic dependencies.

⚡

.NET 10 / ASP.NET Core

Latest LTS runtime. High performance, IIS in-process hosting, minimal footprint.

🗄️

SQLite + Dapper

Zero-config embedded database. WAL mode. No SQL Server license required.

🔍

Windows.Media.Ocr + Tesseract

Dual-engine OCR fallback chain. Dark-background auto-detection. OS-native Hebrew support.

🎨

SkiaSharp

Cross-platform 2D graphics engine. Powers pixel-level image redaction — black rect painting over OCR word bounds.

Get in Touch

Ready to Protect Your Patients' Data?

Contact us for a live demo, pilot license, or enterprise pricing.
We respond within one business day.

✉️ Info@medilis.com

Request a Demo Request Pilot License

VGS — Healthcare Technology Solutions

Enterprise PII Redaction Built for Healthcare
