Automatically detect and redact patient identifiers from scanned documents, PDFs, DICOM images, and clinical screenshots — in Hebrew and English — without sending a single byte to the cloud.
Medical records contain the most sensitive personal data. Existing cloud solutions create unacceptable compliance risk.
Cloud-based redaction services require uploading patient records to third-party servers — a HIPAA and GDPR violation risk.
Every external API call is a potential breach vector. Healthcare organisations cannot afford to trust third-party data processors with PHI.
Air-gapped hospital networks, secure clinical environments, and offline deployments are incompatible with cloud-only solutions.
PiiRemover runs entirely on your infrastructure. Zero internet calls. Zero cloud dependencies. Patient data stays inside your perimeter.
Integrated OCR engine (Windows.Media.Ocr + Tesseract) processes scanned documents and images in milliseconds — no external service round-trips.
Built from the ground up for HIPAA, GDPR, and Israeli privacy law (PPPA). Every architectural decision prioritises data isolation.
Purpose-built for healthcare document workflows. No bloat, no cloud hooks, no telemetry.
Pixel-level selective redaction. OCR identifies each word's exact bounding box. PII-matched words get painted black. Everything else stays untouched — including DICOM metadata and imaging measurements.
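As a rough illustration of the idea, the sketch below paints black rectangles over only those OCR words that a detector flags as PII. The `Word` type, the `is_pii` callback, and the nested-list image are assumptions made for the example; they are not PiiRemover's actual types.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box in pixels (hypothetical shape)."""
    text: str
    x: int
    y: int
    w: int
    h: int


def redact_pixels(image, words, is_pii):
    """Paint black over PII words; image is a list of rows of (r, g, b) tuples."""
    height = len(image)
    width = len(image[0]) if height else 0
    for word in words:
        if not is_pii(word.text):
            continue  # non-PII words are left untouched
        # Clamp the rectangle to the image so bad OCR boxes cannot overflow.
        for row in range(max(0, word.y), min(height, word.y + word.h)):
            for col in range(max(0, word.x), min(width, word.x + word.w)):
                image[row][col] = (0, 0, 0)
    return image
```

Because only the matched rectangles are modified, every other pixel, including measurements burned into the image, keeps its original value.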
Automatically detects black-background medical images by sampling luminance at 2,000 grid points. White-on-black PACS overlays are colour-inverted before OCR, and the right engine pass is selected automatically.
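A minimal sketch of this kind of dark-background check, assuming an RGB image held as nested lists, the common Rec. 601 luminance weights, and a cut-off of 64 on a 0-255 scale. The threshold and function names are illustrative guesses, not the product's real values.

```python
import math


def mean_luminance(image, max_samples=2000):
    """Average luminance over a regular grid of roughly max_samples pixels."""
    height, width = len(image), len(image[0])
    # Pick a grid step so that about max_samples points are visited.
    step = max(1, int(math.sqrt(height * width / max_samples)))
    total, count = 0.0, 0
    for row in range(0, height, step):
        for col in range(0, width, step):
            r, g, b = image[row][col]
            total += 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 weights
            count += 1
    return total / count


def invert(image):
    """Return a colour-inverted copy of the image."""
    return [[(255 - r, 255 - g, 255 - b) for (r, g, b) in row] for row in image]


def prepare_for_ocr(image, threshold=64.0):
    """Invert dark-background images so text is dark-on-light for OCR."""
    if mean_luminance(image) < threshold:
        return invert(image)
    return image
```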
Allow-list engine that runs before redaction. Define hospital names, clinic brands, drug names, or any approved term. The Preserve pass overrides all redaction patterns — zero false positives on protected terms.
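One way such a preserve-before-redact pass could work is sketched below: allow-listed terms are located first, and any pattern hit that overlaps a protected span is dropped. The function names, the case-insensitive matching, and the `[REDACTED]` mask are assumptions for the example.

```python
import re


def find_protected_spans(text, preserve_terms):
    """Locate every occurrence of each allow-listed term (case-insensitive)."""
    spans = []
    for term in preserve_terms:
        if not term:
            continue  # an empty term would match everywhere
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end()))
    return spans


def redact_with_preserve(text, preserve_terms, patterns, mask="[REDACTED]"):
    """Run the preserve pass first, then redact only unprotected matches."""
    protected = find_protected_spans(text, preserve_terms)
    hits = []
    for pattern in patterns:
        for m in re.finditer(pattern, text):
            if not any(m.start() < e and s < m.end() for s, e in protected):
                hits.append((m.start(), m.end()))
    # Merge overlapping hits so each region is masked exactly once.
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Replace from the end so earlier offsets stay valid.
    for start, end in reversed(merged):
        text = text[:start] + mask + text[end:]
    return text
```

In this sketch a surname pattern still redacts "Cohen" on its own, while "Cohen Medical Center" survives once it is on the allow-list.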
HashSet word-tokenizer replaces regex alternation. O(text length) regardless of term count. 480,000-name lists run at the same speed as 100 terms. Greedy longest-match with Unicode and Hebrew support.
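The approach can be sketched with a Python `set` standing in for the HashSet. Terms are stored as tuples of case-folded tokens, and the scanner tries the longest candidate first at each position, so lookup cost depends on text length and the longest term, not on how many terms are loaded. The names `build_index` and `find_terms` are hypothetical.

```python
import re

TOKEN = re.compile(r"\w+")  # \w is Unicode-aware for str, so Hebrew letters match


def build_index(terms):
    """Store each term as a tuple of case-folded tokens in a hash set."""
    index, max_len = set(), 0
    for term in terms:
        key = tuple(t.casefold() for t in TOKEN.findall(term))
        if key:
            index.add(key)
            max_len = max(max_len, len(key))
    return index, max_len


def find_terms(text, index, max_len):
    """Greedy longest-match scan; returns (start, end) character spans."""
    tokens = list(TOKEN.finditer(text))
    keys = [t.group().casefold() for t in tokens]
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest window first, shrinking until a term is found.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(keys[i:i + n]) in index:
                matches.append((tokens[i].start(), tokens[i + n - 1].end()))
                i += n
                break
        else:
            i += 1
    return matches
```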
Built-in browser console. Upload files or type text directly. OCR result and redacted output shown side-by-side. Right-click any selection to instantly add a Redact or Preserve rule without leaving the page.
Dual-engine OCR pipeline — Windows.Media.Ocr (built-in, zero install) with Tesseract fallback. Recognises Hebrew and English in the same document. Smart language-pass selection picks the best result.
Native PDFs (PdfPig), scanned PDFs (rasterisation + OCR), JPEG, PNG, TIFF, BMP, and WebP out of the box. MIME-type-based extractor dispatch handles mixed document queues automatically.
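MIME-type dispatch of this sort is often just a lookup table from content type to handler. The sketch below uses placeholder extractors that only return a label; the handler names and the table are assumptions and do not reflect the product's internal routing.

```python
def extract_pdf(data: bytes) -> str:
    """Placeholder: a real handler would read the text layer or rasterise and OCR."""
    return "pdf"


def ocr_image(data: bytes) -> str:
    """Placeholder: a real handler would run the OCR pipeline on the image."""
    return "ocr"


# Hypothetical dispatch table keyed by MIME type.
EXTRACTORS = {
    "application/pdf": extract_pdf,
    "image/jpeg": ocr_image,
    "image/png": ocr_image,
    "image/tiff": ocr_image,
    "image/bmp": ocr_image,
    "image/webp": ocr_image,
}


def extract(mime_type: str, data: bytes) -> str:
    """Route a document to the extractor registered for its MIME type."""
    key = mime_type.split(";")[0].strip().lower()  # drop parameters like charset
    handler = EXTRACTORS.get(key)
    if handler is None:
        raise ValueError(f"Unsupported MIME type: {mime_type}")
    return handler(data)
```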
14 pre-seeded PII types (Israeli ID, DOB, phone, email, IBAN, passport, name…). Add custom Regex, keyword-list, context-aware, or LLM patterns per client. Inline chip editor for fast updates.
Clean versioned REST endpoints for text redaction, image redaction, and OCR extraction. API-key auth. OpenAPI/Swagger docs included. Integrates with any HIS, PACS, or document management system.
Web-based backoffice to manage clients, API keys, PII patterns, preserve allow-lists, view request logs, and monitor system health — all behind secure login.
SemaphoreSlim-gated OCR engine. Per-call Tesseract instances. ConcurrentDictionary term-index cache. Handles high-throughput parallel API calls without race conditions or memory spikes.
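The gating pattern can be illustrated in Python, with `threading.Semaphore` standing in for .NET's SemaphoreSlim. This is an analogue written for the example, not the product's code: `GatedOcr`, `engine_factory`, and the limit of two parallel calls are all assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class GatedOcr:
    """Limit how many OCR calls run at once; create one engine per call."""

    def __init__(self, engine_factory, max_parallel=2):
        self._engine_factory = engine_factory
        self._gate = threading.Semaphore(max_parallel)

    def recognise(self, image):
        with self._gate:  # blocks while max_parallel calls are in flight
            engine = self._engine_factory()  # fresh instance, no shared state
            return engine(image)


def recognise_all(ocr, images, workers=8):
    """Submit many images in parallel; the gate caps real concurrency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ocr.recognise, images))
```

Capping concurrency bounds peak memory, and giving each call its own engine instance avoids sharing mutable engine state between threads.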
Tamper-proof RSA-2048 license files. Offline validation — no license server calls. Enforces organisation name, expiry, and quota limits. Deployable to air-gapped networks.
Three processing paths for every scenario — REST API, Image Redaction, and Batch Loader.
Real-time document redaction via HTTP POST
POST file via REST API with API key
OCR or text extraction by MIME type
Preserve pass then PII pattern matching
Right-to-left safe text replacement
JSON with redacted text + match stats
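A client call for the text case might look roughly like the sketch below. Only the `/api/v1/redact/redact` path comes from this page; the `X-Api-Key` header name, the JSON body with a `text` field, and the host name are assumptions, so check the bundled OpenAPI/Swagger docs for the real contract.

```python
import json
import urllib.request


def build_redact_request(base_url, api_key, text):
    """Build a POST request for the text redaction endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/redact/redact",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # assumed header name
        },
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be along the lines of `send(build_redact_request("http://piiremover.local", "my-key", "ID 123456789"))`, where the host is a placeholder for a server inside your own network.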
Pixel-level selective redaction for DICOM, PACS screenshots, and scanned images
JPEG, PNG, TIFF, BMP, WebP
Auto-invert for DICOM / PACS
Per-word pixel rectangles
PII positions → pixel regions
JPEG with black boxes over PII only
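The step from PII positions to pixel regions can be sketched as an overlap test between character spans and per-word character ranges. The example assumes OCR words are joined with single spaces and carry an `(x, y, w, h)` rectangle; both are simplifications made for illustration.

```python
def layout_text(words):
    """Join OCR words with spaces and record each word's character range.

    words is a list of (text, rect) pairs, rect being (x, y, w, h).
    """
    parts, ranges, pos = [], [], 0
    for text, rect in words:
        ranges.append((pos, pos + len(text), rect))
        parts.append(text)
        pos += len(text) + 1  # +1 for the joining space
    return " ".join(parts), ranges


def spans_to_rects(ranges, spans):
    """Return the rectangles of every word that overlaps a PII character span."""
    return [
        rect
        for start, end, rect in ranges
        if any(start < s_end and s_start < end for s_start, s_end in spans)
    ]
```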
Unattended bulk processing via BatchLoader CLI
Drop files into configured folder
Runs on schedule or on demand
Sends each file to redact endpoint
UTF-8 .txt saved to output folder
Originals moved to done/ folder
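The batch flow above could be approximated by a short loop like this one. The folder arguments and the injected `redact_file` callable, which stands in for the call to the redact endpoint, are hypothetical; the actual BatchLoader CLI and its configuration are not shown on this page.

```python
import shutil
from pathlib import Path


def process_folder(inbox, outbox, done, redact_file):
    """Redact every file in inbox, save UTF-8 text, then archive the original."""
    inbox, outbox, done = Path(inbox), Path(outbox), Path(done)
    outbox.mkdir(parents=True, exist_ok=True)
    done.mkdir(parents=True, exist_ok=True)
    processed = []
    # Only regular files are picked up, so a done/ subfolder is skipped.
    for source in sorted(p for p in inbox.iterdir() if p.is_file()):
        redacted = redact_file(source)  # e.g. POST the file to the redact endpoint
        (outbox / (source.stem + ".txt")).write_text(redacted, encoding="utf-8")
        shutil.move(str(source), str(done / source.name))
        processed.append(source.name)
    return processed
```

Moving each original only after its output is written means a failed run can simply be started again on whatever is left in the input folder.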
Four endpoints. Any language. Any platform. Integrates with your HIS, PACS, or document management system in hours, not weeks.
/api/v1/redact/redact
/api/v1/redact/redact-image
New: X-Match-Count, X-Fields-Hit
/api/v1/ocr/extract
/api/v1/health
A built-in web-based administration console gives your IT team complete control without needing database access.
The built-in Live Tester runs the exact same engine as the API. Upload real documents, inspect OCR and redacted output side-by-side, and use the ⊞ Grid mode to navigate character offsets like a professional forensics tool.
⊞ Grid mode — character offset gutter per line for precise position debugging · 📍 Pos mode — hover to see exact char position · Right-click → instantly add Redact or Preserve rule
Stack multiple detection engines per field — Regex, file lists of 500k names, label-based capture, constant lists, and more. The real admin interface, exactly as it looks.
Terms here override all rules — institution names, drug names, etc.
The built-in interactive guide, available right in your admin panel, walks through every step with real before/after examples and direct links.
Every design decision was made with security and compliance in mind. Not bolted on — baked in.
"Patient data must never leave the hospital's network perimeter. PiiRemover is the only redaction solution we evaluated that is architecturally capable of running fully air-gapped."
"The RSA-signed offline licensing model means we can deploy to isolated clinical environments without any outbound network dependency — exactly what our compliance team required."
Perpetual on-premises licenses. No subscription, no per-document fees, no cloud lock-in.
All licenses are RSA-2048 signed, offline-validated, and deployable to air-gapped networks. Renewal is optional — the software continues working after expiry; only new licenses are blocked.
Built on proven, supported Microsoft and open-source technologies. No exotic dependencies.
Latest LTS runtime. High performance, IIS in-process hosting, minimal footprint.
Zero-config embedded database. WAL mode. No SQL Server license required.
Dual-engine OCR fallback chain. Dark-background auto-detection. OS-native Hebrew support.
Cross-platform 2D graphics engine. Powers pixel-level image redaction — black rect painting over OCR word bounds.
Contact us for a live demo, pilot license, or enterprise pricing.
We respond within one business day.
VGS — Healthcare Technology Solutions