Single HTML · ~262 KB · open locally | Integrity check · 89 bytes | Release notes, older builds
Double-click the downloaded HTML to open in your browser.
Verify with shasum -a 256 -c document-redactor.html.sha256 first.
Offline DOCX redaction for legal work.
Open one local HTML file, review deterministic matches, and download a verified .redacted.docx
without sending the source document anywhere.
Important
This is the safety step before AI. document-redactor is intentionally not AI-powered. It is the local pre-upload filter you run before a contract, memo, pleading, or court document goes into any LLM.
One file: The shipped product is document-redactor.html. No installer, no backend, no asset tree, no auto-update channel.
Rule-based: Detection is deterministic, auditable, and regression-testable. No remote inference and no hidden model behavior.
Local-only: The app opens as a file:// page, uses strict CSP, and is built around a zero-network runtime model.
Verified output: Redaction is not trusted blindly. The output DOCX is re-parsed and checked before download.
- Input files are capped at 50 MB.
- Any single decompressed ZIP entry is capped at 20 MB.
- DOCX relationship files (*.rels) are checked during verification.
- External http:// and https:// relationship targets are stripped from output, and selected literals found in relationship targets are repaired before verification.
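The limits and the relationship handling above can be sketched in a few lines. This is a minimal illustration, not the project's code: the names `checkSizeLimits`, `stripExternalHttpRelationships`, and `EntryInfo` are hypothetical, "MB" is assumed to mean 1024 × 1024 bytes, and "stripped" is assumed to mean that the whole `<Relationship>` element is dropped. A real implementation would also have to handle any `r:id` references that point at a removed relationship.

```typescript
// Caps taken from the list above; treating 1 MB as 1024 * 1024 bytes is an assumption.
const MAX_INPUT_BYTES = 50 * 1024 * 1024;
const MAX_ENTRY_BYTES = 20 * 1024 * 1024;

// Hypothetical shape: the name and decompressed size of one ZIP entry.
interface EntryInfo {
  name: string;
  uncompressedBytes: number;
}

// Returns human-readable violations; an empty array means the file is within limits.
function checkSizeLimits(inputBytes: number, entries: EntryInfo[]): string[] {
  const problems: string[] = [];
  if (inputBytes > MAX_INPUT_BYTES) {
    problems.push(`input is ${inputBytes} bytes, cap is ${MAX_INPUT_BYTES}`);
  }
  for (const entry of entries) {
    if (entry.uncompressedBytes > MAX_ENTRY_BYTES) {
      problems.push(`${entry.name} decompresses to ${entry.uncompressedBytes} bytes, cap is ${MAX_ENTRY_BYTES}`);
    }
  }
  return problems;
}

// Drops self-closing <Relationship/> elements that are marked External
// and whose Target starts with http:// or https://. Everything else is kept.
function stripExternalHttpRelationships(relsXml: string): string {
  return relsXml.replace(/<Relationship\b[^>]*\/>/g, (element) => {
    const isExternal = /\bTargetMode="External"/.test(element);
    const isHttp = /\bTarget="https?:\/\//i.test(element);
    return isExternal && isHttp ? "" : element;
  });
}
```

Internal relationships such as styles or numbering carry no `TargetMode="External"` attribute, so this sketch leaves them untouched.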
Legal teams increasingly want to send contracts, pleadings, memos, and court documents into AI assistants for summary, issue spotting, or clause review. The blocker is obvious: those files contain company names, people, phone numbers, IDs, bank data, case references, and other strings you should not upload raw.
Manual redaction inside Word is slow, repetitive, and easy to get wrong.
document-redactor turns that pre-upload cleanup into a local workflow:
- Open one HTML file from disk.
- Drop a .docx.
- Review grouped candidates and inline highlights.
- Apply redaction.
- Download a verified .redacted.docx.
| What it is | What it is not |
|---|---|
| An offline browser tool for legal DOCX redaction | A cloud redaction service |
| One downloadable HTML artifact plus a hash sidecar | An installer, daemon, or desktop app |
| A rule-based, deterministic review-and-redact pipeline | An AI model or probabilistic black box |
| A product whose artifact and source can be audited directly | A system you must trust without inspection |
| A pre-AI safety layer | A replacement for your downstream AI assistant |
flowchart TD
A["📄 <b>Open</b><br/>document-redactor.html"] --> B["📥 <b>Drop</b><br/>your .docx file"]
B --> C["⚙️ <b>Parse</b><br/>ZIP + XML<br/>locally in-browser"]
C --> D["🔍 <b>Detect</b><br/>candidates via<br/>deterministic rules"]
D --> E["👀 <b>Review</b><br/>by section +<br/>inline preview"]
E --> F["✂️ <b>Apply</b> redaction<br/>+ scrub metadata<br/>+ strip fields"]
F --> G{"✅ <b>Verify</b><br/>round-trip scan<br/>+ rels check"}
G -->|clean| H["💾 <b>Download</b><br/>.redacted.docx<br/>+ SHA-256 sidecar"]
G -->|leak detected| I["🔴 Risk review<br/>+ jump to<br/>survived item"]
classDef default fill:#0f172a,stroke:#1e3a5f,stroke-width:2px,color:#f8fafc;
classDef action fill:#0f766e,stroke:#14b8a6,color:#ffffff;
classDef verify fill:#1d4ed8,stroke:#60a5fa,color:#ffffff;
classDef success fill:#166534,stroke:#22c55e,color:#ffffff;
classDef fail fill:#991b1b,stroke:#ef4444,color:#ffffff;
class F action;
class G verify;
class H success;
class I fail;
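The Verify branch in the flowchart can be sketched as a scan of the re-parsed output for any string that was supposed to be removed. This is an illustration under stated assumptions: `findSurvivors`, `verifyOutput`, and the `Survivor` shape are hypothetical names, and the input is assumed to be plain text already extracted from each part of the generated DOCX.

```typescript
// A redacted literal that is still present in some part of the output.
interface Survivor {
  literal: string;
  part: string;
}

// Scans every re-parsed part for every literal that should have been removed.
function findSurvivors(parts: Record<string, string>, literals: string[]): Survivor[] {
  const survivors: Survivor[] = [];
  for (const [part, text] of Object.entries(parts)) {
    for (const literal of literals) {
      if (literal.length > 0 && text.includes(literal)) {
        survivors.push({ literal, part });
      }
    }
  }
  return survivors;
}

// Maps the scan result onto the two branches of the flowchart.
function verifyOutput(parts: Record<string, string>, literals: string[]) {
  const survivors = findSurvivors(parts, literals);
  return survivors.length === 0
    ? { status: "clean" as const, survivors }
    : { status: "leak" as const, survivors };
}
```

Reporting the part name alongside the literal is what would let a UI jump back to the surviving item.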
Artifact: document-redactor.html
Current checked size: 262 KB (268,571 bytes)
Integrity sidecar: 89 bytes
Runtime network calls: 0
Automated coverage: 1,700+ tests
Current checked release artifact on April 30, 2026:
- document-redactor.html
- SHA-256: 363d7c93008038a6e56137ab0a43251771f8911c7d7aad6e21cd6771a6a8003a
- Verified locally with shasum -a 256 -c document-redactor.html.sha256
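For readers who want to see what the sidecar check compares, here is a small sketch that parses one line in the standard `sha256sum` checksum format (digest, a space, a space or `*`, then the file name). The function names are hypothetical, and the exact bytes of the project's sidecar are an assumption; the digest and file name are the ones published above.

```typescript
// One parsed line of a sha256sum-style checksum file.
interface SidecarEntry {
  digestHex: string;
  fileName: string;
}

// Parses "<64 hex chars><space><space or *><file name>"; returns null if the line does not fit.
function parseSidecarLine(line: string): SidecarEntry | null {
  const match = /^([0-9a-fA-F]{64}) [ *](.+)$/.exec(line.replace(/\r?\n$/, ""));
  if (match === null) return null;
  return { digestHex: match[1].toLowerCase(), fileName: match[2] };
}

// True when a digest computed elsewhere equals the one recorded in the sidecar.
function digestMatches(entry: SidecarEntry, computedHex: string): boolean {
  return entry.digestHex === computedHex.toLowerCase();
}

// Assumed sidecar content, built from the published digest and file name.
const publishedDigest = "363d7c93008038a6e56137ab0a43251771f8911c7d7aad6e21cd6771a6a8003a";
const sidecarText = `${publishedDigest}  document-redactor.html\n`;
const sidecarEntry = parseSidecarLine(sidecarText);
```

Under that assumption the sidecar is 64 + 2 + 22 + 1 = 89 bytes long, which agrees with the 89-byte size stated for the integrity file.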
Local DOCX traversal: Walks body, headers, footers, footnotes, endnotes, comments, and relationship references inside the DOCX package.
Structured review UX: Groups candidates by parties, aliases, identifiers, amounts, dates, entities, case/docket references, heuristics, and catch-all additions.
Verification-guided export: Re-checks the generated output, reports residual survivors clearly, and keeps warnings separate from verified-clean downloads.
Inline preview: Shows the document text with selection-aware highlights so review happens in context, not in a blind list.
OOXML leak hardening: Flattens risky field and hyperlink structures, strips comments, scrubs metadata, and normalizes redaction across split runs.
Manual recovery paths: Lets users add missed strings, reuse local policy JSON files, jump back to surviving items, and acknowledge residual risk when they still need the file.
For the public detection catalog, see docs/RULES_GUIDE.md.
- Easier to use: download once, double-click, redact.
- Easier to audit: one shipped artifact, not a service mesh.
- Easier to distribute: GitHub Releases, USB, email, Kakao, shared drives.
- Easier to trust: no backend means no server-side document path to defend.
- Sensitive documents never need model inference.
- Behavior is deterministic and explainable.
- Regression testing is straightforward.
- The artifact stays small enough to remain practical as a local HTML tool.
This choice is deliberate. The AI assistant comes after redaction, not inside it.
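To make "deterministic and explainable" concrete, here is a minimal sketch of rule-based detection. The two rules are illustrative placeholders, not entries from the project's catalog (see docs/RULES_GUIDE.md for that), and the `Rule`, `Candidate`, and `detect` names are hypothetical.

```typescript
// A detection rule: a stable id, a review category, and a global regex.
interface Rule {
  id: string;
  category: string;
  pattern: RegExp;
}

// One candidate surfaced for review, with the rule that produced it.
interface Candidate {
  ruleId: string;
  category: string;
  text: string;
  start: number;
}

// Illustrative rules only; both patterns are assumptions for this sketch.
const EXAMPLE_RULES: Rule[] = [
  { id: "email", category: "identifiers", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { id: "mobile-010", category: "identifiers", pattern: /\b010-\d{4}-\d{4}\b/g },
];

// Same text and same rules always yield the same candidates in the same order.
function detect(text: string, rules: Rule[]): Candidate[] {
  const found: Candidate[] = [];
  for (const rule of rules) {
    for (const match of text.matchAll(rule.pattern)) {
      found.push({ ruleId: rule.id, category: rule.category, text: match[0], start: match.index ?? 0 });
    }
  }
  return found.sort((a, b) => a.start - b.start || a.ruleId.localeCompare(b.ruleId));
}
```

Because every candidate carries its `ruleId`, a reviewer can trace any highlight back to the rule that produced it, and a regression test can pin the exact output for a fixture document.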
DOCX files are ZIP archives of XML parts. Using JSZip plus direct WordprocessingML traversal gives the project the control it needs to:
- detect matches across split text runs,
- scan more than just the body text,
- rewrite only the affected segments,
- verify the exact output it produces.
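Matching across split text runs is the least obvious item in the list above, so here is a hedged sketch of one way to do it. Word often splits a single visible string across several runs; the sketch joins the run texts, finds matches in the joined string, and rewrites only the runs that overlap a match. The function name, the `[REDACTED]` mask, and the plain string-array model of runs are all assumptions for illustration.

```typescript
// Replaces every occurrence of `literal`, even when it spans several runs.
// `runs` holds the text of consecutive runs in document order.
function redactAcrossRuns(runs: string[], literal: string, mask = "[REDACTED]"): string[] {
  if (literal.length === 0) return [...runs];
  const joined = runs.join("");

  // Collect non-overlapping match ranges in the joined text.
  const matches: { start: number; end: number }[] = [];
  let from = 0;
  while (true) {
    const index = joined.indexOf(literal, from);
    if (index === -1) break;
    matches.push({ start: index, end: index + literal.length });
    from = index + literal.length;
  }

  // Rebuild each run from its own slice of the joined text.
  let runStart = 0;
  return runs.map((run) => {
    const runEnd = runStart + run.length;
    let result = "";
    let pos = runStart;
    for (const m of matches) {
      if (m.end <= pos || m.start >= runEnd) continue; // no overlap with this run
      if (m.start > pos) result += joined.slice(pos, m.start); // keep text before the match
      if (m.start >= runStart) result += mask; // emit the mask only where the match begins
      pos = Math.min(m.end, runEnd);
    }
    result += joined.slice(pos, runEnd); // keep text after the last match
    runStart = runEnd;
    return result;
  });
}
```

For example, `redactAcrossRuns(["Party: Ac", "me Co", "rp signs."], "Acme Corp")` returns `["Party: [REDACTED]", "", " signs."]`. Runs that do not overlap a match come back unchanged, which mirrors the goal of rewriting only the affected segments.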
The UI needs to feel modern without blowing up the artifact. Svelte 5 and vite-plugin-singlefile give the project:
- fast local interactivity,
- a small runtime footprint,
- one-file packaging that still supports a real review workflow.
sha256sum -c document-redactor.html.sha256
# expected output:
# document-redactor.html: OK
If sha256sum is not available on your Mac:
shasum -a 256 -c document-redactor.html.sha256
Double-click document-redactor.html. It opens as a file:// page in your browser. There is no install step and no account setup.
- Drop a .docx
- Review candidates
- Click Apply and verify
- Download {original}.redacted.docx
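The `{original}.redacted.docx` naming in the last step can be expressed as a one-line transformation. The function below is a hypothetical sketch; how the project actually derives the name, including its handling of upper-case extensions, is an assumption.

```typescript
// Derives the download name from the dropped file's name.
// A trailing ".docx" (any letter case) is replaced; other names get the suffix appended.
function redactedFileName(original: string): string {
  const match = /^(.*)\.docx$/i.exec(original);
  const stem = match ? match[1] : original;
  return `${stem}.redacted.docx`;
}
```

With this sketch, `redactedFileName("Contract v2.docx")` yields `Contract v2.redacted.docx`.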
For a detailed walkthrough, see USAGE.md. For the Korean guide, see USAGE.ko.md.
| Layer | Mechanism | Why it matters |
|---|---|---|
| Source | ESLint bans fetch, XMLHttpRequest, WebSocket, EventSource, sendBeacon, and similar primitives | Network code is stopped before it casually enters the app |
| Build | Single-file ship gate rejects external JS or CSS references and writes a SHA-256 sidecar | The release stays auditable as one artifact |
| Runtime | Embedded CSP uses default-src 'none' and connect-src 'none' | The browser blocks outbound requests at execution time |
| Export | Round-trip verification re-parses the generated DOCX | The app does not silently ship a leaky output |
Note
The privacy story here is not just policy language. It is enforced in source code, build rules, runtime policy, and export verification.
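As an illustration of the Build and Runtime rows, the sketch below inspects a built HTML string for script or stylesheet references and for the two CSP directives named in the table. It is not the project's ship gate: the function names are hypothetical, the regex-based parsing is a simplification, and the sketch flags every `src` or stylesheet `href` it finds without trying to tell internal from external ones.

```typescript
interface GateReport {
  externalRefs: string[]; // script src and stylesheet href values found in the HTML
  cspOk: boolean; // true when both directives include 'none'
}

// Checks whether a CSP string gives the named directive the value 'none'.
function directiveHasNone(policy: string, name: string): boolean {
  return policy
    .split(";")
    .map((directive) => directive.trim().split(/\s+/))
    .some(([directive, ...values]) => directive === name && values.includes("'none'"));
}

function inspectSingleFileHtml(html: string): GateReport {
  const externalRefs: string[] = [];

  // Any <script src="..."> means JS is not fully inlined.
  for (const m of html.matchAll(/<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi)) {
    externalRefs.push(m[1]);
  }
  // Any <link rel="stylesheet" href="..."> means CSS is not fully inlined.
  for (const m of html.matchAll(/<link\b[^>]*>/gi)) {
    if (/\brel\s*=\s*["']?stylesheet/i.test(m[0])) {
      const href = /\bhref\s*=\s*["']([^"']+)["']/i.exec(m[0]);
      if (href) externalRefs.push(href[1]);
    }
  }

  // Read the embedded CSP from its <meta http-equiv> tag, if present.
  const meta = /<meta\b[^>]*http-equiv\s*=\s*["']Content-Security-Policy["'][^>]*>/i.exec(html);
  const content = meta ? /\bcontent\s*=\s*"([^"]*)"/i.exec(meta[0]) : null;
  const policy = content ? content[1] : "";
  const cspOk = directiveHasNone(policy, "default-src") && directiveHasNone(policy, "connect-src");

  return { externalRefs, cspOk };
}
```

A gate built on a check like this would fail the release when `externalRefs` is non-empty or `cspOk` is false.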
| Layer | Choice | Why this choice |
|---|---|---|
| Distribution | Single document-redactor.html + .sha256 | Simplest release artifact and easiest thing to verify |
| Package manager | Bun 1.x | Fast local workflow and a light toolchain |
| Build | Vite 8 | Clear plugin hooks and dependable modern bundling |
| Single-file packaging | vite-plugin-singlefile | Inlines JS and CSS into one shipped HTML file |
| UI | Svelte 5 | Fine-grained reactivity with a small runtime |
| DOCX engine | JSZip + raw OOXML traversal | Precise control over read, rewrite, and verify |
| Detection | Rule-based regex + structural classifiers | Deterministic, inspectable, lightweight |
| Verification | Round-trip scan + word-count sanity + SHA-256 | Catches leaks, flags suspicious over-redaction, verifies artifacts |
| Quality gates | Vitest + strict TypeScript + svelte-check | Strong regression safety for a trust-sensitive product |
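The "word-count sanity" entry in the Verification row is only named, not specified, so the following is a guess at its general shape rather than a description of the project's check. The whitespace tokenization, the 30% threshold, and the function names are all assumptions.

```typescript
// Counts whitespace-separated tokens; an empty or blank string has zero words.
function countWords(text: string): number {
  const tokens = text.trim().split(/\s+/);
  return tokens[0] === "" ? 0 : tokens.length;
}

// Flags output whose word count dropped by more than `maxDropRatio` (hypothetical threshold).
function looksOverRedacted(before: string, after: string, maxDropRatio = 0.3): boolean {
  const beforeCount = countWords(before);
  if (beforeCount === 0) return false;
  const afterCount = countWords(after);
  return (beforeCount - afterCount) / beforeCount > maxDropRatio;
}
```

A check like this complements the leak scan: the scan catches text that should be gone but is not, while the word count catches output that lost far more text than a normal redaction would remove.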
- Product docs: README.ko.md, USAGE.md, USAGE.ko.md, docs/RULES_GUIDE.md
- Source: src/
- Release output: document-redactor.html
- Integrity file: document-redactor.html.sha256
Internal phase briefs and planning notes are being removed from the public repository going forward.
- DOCX only. PDF requires a different pipeline.
- The preview is review-oriented, not a pixel-faithful Word layout clone.
- Standard is the only implemented redaction level today.
- No OCR for text embedded inside images.
- No traversal into embedded OLE objects.
- No macros/VBA or encrypted/password-protected DOCX packages.
- No SmartArt or WordArt extraction.
git clone https://github.com/kipeum86/document-redactor.git
cd document-redactor
bun install
bun run test
bun run typecheck
bun run lint
bun run build
open dist/document-redactor.html
Notes:
- For browser QA, test the built dist/document-redactor.html, not the dev server.
- The repository currently carries 1,700+ automated tests across detection, DOCX rewriting, verification, UI state, and ship gates.
- dist/ is ignored in git; releases should publish the built HTML and its .sha256 sidecar from CI or from a verified local build.
Built by @kipeum86.