Document Intelligence
Upload an Indonesian document — KTP, NPWP, NIB, faktur pajak, invoice, contract, akta — get back structured fields with confidence scores and bounding boxes.
Branded as Quantum AI in customer-facing surfaces. The internal model stack is intentionally not exposed.
What it does
| Capability | API |
|---|---|
| Upload + auto-classify | POST /api/documents/upload (multipart) |
| Extract by template | Auto-applied when classification has high confidence |
| Templates (custom schemas) | POST /api/templates — define your own field set |
| Re-extract with overrides | POST /api/documents/{id}/reextract |
| Analyst correction | POST /api/documents/{id}/extractions (manual fields) |
| Export | GET /api/documents/{id}/export (JSON or CSV) |
| Webhooks | Push notifications on extracted, requires_review, and failed |
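The upload endpoint accepts multipart form data. Below is a minimal standard-library sketch of encoding a single file, using the `file` field name given under Common integration shape. The helper name `build_multipart` is hypothetical, and the filename is not escaped, so keep it to plain ASCII without quotes.

```python
import uuid


def build_multipart(filename: str, content: bytes, field: str = "file",
                    content_type: str = "application/octet-stream") -> tuple[bytes, str]:
    """Encode one file as multipart/form-data.

    Returns (body, content_type_header). The random boundary is assumed not to
    occur inside `content`; a 32-hex-char UUID makes a collision very unlikely.
    """
    boundary = uuid.uuid4().hex
    # Filename is inserted verbatim (no escaping): keep it plain ASCII, no quotes.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"
```

To send it, pass `body` as `data=` to `urllib.request.Request(url, data=body, headers={"Content-Type": ctype}, method="POST")` and add your API-key header; the header name for API keys is not specified on this page.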
Supported document types
| Group | Types |
|---|---|
| Indonesian IDs | ktp · npwp · kartu_keluarga · sim · passport · nib · bpkb |
| Civil registry | akta_lahir · surat_nikah |
| Commercial | invoice · purchase_order · delivery_note · receipt · faktur_pajak · slip_gaji · contract |
| Financial | rekening_koran · financial_statement |
| Generic | generic_ocr · other |
Each type has a curated extraction template: a known field set, known formats, and known validation rules. For documents that don't fit a known template, generic_ocr extracts free-form text plus a best-effort key-value pass.
Use the right product for the right document
For KTP identity verification (anti-spoof + tamper + face enrollment), use Identity Platform → KTP capture. Document Intelligence's KTP type returns the OCR fields but does not run liveness or face enrollment. Two products, two purposes.
Core concepts
| Concept | What it is |
|---|---|
| Document | A single uploaded file. Has a type (classified or hint-provided), a status, and ≥0 extractions. |
| Extraction | A structured field set with confidence + bounding boxes. Multiple per document (model-extracted, template-extracted, analyst-corrected). |
| Template | A user-defined schema for a custom document type. Specifies fields, types, and required-ness. |
| Confidence | 0–100. Below the org's requiresReviewThreshold (default 70), the document is queued for analyst review. |
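The review rule is a single comparison against the org threshold. The sketch below mirrors it client-side, for example to highlight weak fields in your own review UI. It assumes "below" means strictly less than, and the per-field shape `{"value": ..., "confidence": ...}` is hypothetical, not the documented response schema.

```python
DEFAULT_REVIEW_THRESHOLD = 70  # documented default of the org's requiresReviewThreshold


def needs_review(confidence: float, threshold: float = DEFAULT_REVIEW_THRESHOLD) -> bool:
    """True if a 0-100 confidence is below the threshold (assumes a strict '<')."""
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence must be within 0-100, got {confidence}")
    return confidence < threshold


def low_confidence_fields(fields: dict[str, dict],
                          threshold: float = DEFAULT_REVIEW_THRESHOLD) -> list[str]:
    """Sorted names of fields below the threshold; the field dict shape is hypothetical."""
    return sorted(name for name, field in fields.items()
                  if needs_review(field["confidence"], threshold))
```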
Common integration shape
1. Upload: `POST /api/documents/upload` (multipart, with the file in a `file` field). Returns `{ document: { id, status: "queued" } }`.
2. Async processing: Classify → Extract → `status: "extracted"` or `"requires_review"`.
3. Get the result: subscribe to the `document.extracted` webhook, or poll `GET /api/documents/{id}` until the status changes.
4. Read the fields: `GET /api/documents/{id}/extractions` returns the structured fields; alternatively, use the webhook payload.
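If you poll rather than subscribe, a bounded loop avoids hammering the API or waiting forever. This is a minimal sketch under stated assumptions: `fetch` is a caller-supplied function (hypothetical) that performs `GET /api/documents/{id}` and returns parsed JSON in the same `{ document: {...} }` envelope as the upload response, and `failed` is treated as a terminal status because it appears as a webhook event. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable

# Assumption: "failed" is also a terminal document status (it is a webhook event name).
TERMINAL_STATUSES = {"extracted", "requires_review", "failed"}


def wait_for_document(fetch: Callable[[str], dict], doc_id: str, *,
                      interval: float = 2.0, timeout: float = 120.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the document reaches a terminal status, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        doc = fetch(doc_id)["document"]  # envelope shape assumed, see lead-in
        if doc["status"] in TERMINAL_STATUSES:
            return doc
        if clock() >= deadline:
            raise TimeoutError(f"document {doc_id} still {doc['status']!r} after {timeout}s")
        sleep(interval)
```

A fixed interval keeps the sketch simple; for long-running documents, the webhook is the cheaper option.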
Endpoints at a glance
| Group | Endpoints |
|---|---|
| Documents | GET/POST /api/documents · POST /api/documents/upload · POST /api/documents/upload-json · GET /api/documents/{id} |
| Document actions | GET /api/documents/{id}/download · GET /api/documents/{id}/export · POST /api/documents/{id}/erase · POST /api/documents/{id}/reextract |
| Extractions | GET/POST /api/documents/{id}/extractions · GET /api/extractions/{id} |
| Templates | GET/POST /api/templates · GET/PATCH/DELETE /api/templates/{id} |
| Search | GET /api/search?q=... — min 3 chars; case-insensitive across filename, customer name, extracted-field values |
| Webhooks | GET/POST /api/webhooks · POST /api/webhooks/{id}/test |
| API keys | GET/POST /api/api-keys |
| Health | GET /api/healthz |
| Admin / audit (dashboard) | GET /api/activity · GET /api/audit-log · GET /api/metrics · GET/POST /api/roles · GET/POST /api/members · GET/POST /api/invites · GET/POST /api/integrations |
Routes marked (dashboard) are intended for analyst UI use and require an internal-scope key.
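Search enforces a 3-character minimum, so it is worth rejecting short queries before the request goes out. A minimal sketch of building the search URL, where `base_url` is a hypothetical host and trimming whitespace before the length check is a client-side choice (the API's own trimming behaviour is not documented here). No case folding is needed because search is already case-insensitive.

```python
from urllib.parse import urlencode

MIN_QUERY_LENGTH = 3  # documented minimum for /api/search


def search_url(base_url: str, query: str) -> str:
    """Build a GET /api/search URL, rejecting queries under the documented minimum."""
    q = query.strip()  # client-side choice: ignore surrounding whitespace
    if len(q) < MIN_QUERY_LENGTH:
        raise ValueError(f"search query needs at least {MIN_QUERY_LENGTH} characters")
    # urlencode percent-encodes the value and turns spaces into '+'.
    return f"{base_url.rstrip('/')}/api/search?{urlencode({'q': q})}"
```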
Production considerations
| Concern | Answer |
|---|---|
| Data residency | Uploaded PDFs and extracted JSON live in id-jkt-1. PDFs are sealed at rest with AES-256-GCM in org-isolated storage. Never replicated cross-border. |
| Retention | Original PDFs: configurable per org (default 90 days). Extracted JSON: retained indefinitely (regulated record). Set retention via PATCH /api/organization. |
| Model / parser | Per-document-type parser pipeline. The AI fallback runs as Quantum AI; the internal model stack is not exposed. A premium tier (files >20MB, faster processing) is available on enterprise plans. |
| Rate limits | Upload: 60/min/org (sandbox), 600/min (production). Re-extraction: 10/min/document. Search: 30/min. |
| Idempotency | Upload deduplicates on SHA-256 of file content — identical bytes within 24h return the original documentId without re-charging. Re-extractions intentionally don't dedup; each call creates a new extraction row (so you can compare prompt versions). |
| Audit | Every upload, extraction, template change, erasure, and member action is audit-logged. The log is immutable and retained for 7 years. Filter it via GET /api/audit-log. |
| Erasure | POST /api/documents/{id}/erase cryptographically shreds the sealed PDF + clears extraction PII (UU PDP-compliant). |
| Webhook signing | HMAC-SHA256 over the raw body, sent in the header X-DocInt-Signature: sha256=<hex>. Replay protection: include X-DocInt-Timestamp in your signed-payload check and reject deliveries more than 5 minutes old. |
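A minimal sketch of verifying one delivery with Python's standard `hmac` module. It assumes X-DocInt-Timestamp carries integer Unix seconds and, following the table literally, that the HMAC input is the raw body alone; `verify_webhook` and the shared `secret` are hypothetical names. Always pass the bytes exactly as received, before any JSON parsing.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_SECONDS = 300  # "reject deliveries >5min old"


def verify_webhook(raw_body: bytes, signature_header: str, timestamp_header: str,
                   secret: bytes, now: Optional[float] = None) -> bool:
    """Check X-DocInt-Signature and X-DocInt-Timestamp for one delivery.

    Assumes integer Unix-second timestamps and an HMAC over the raw body only.
    """
    try:
        sent_at = int(timestamp_header)
    except (TypeError, ValueError):
        return False
    current = time.time() if now is None else now
    # abs() also rejects timestamps far in the future (clock-skew guard).
    if abs(current - sent_at) > MAX_AGE_SECONDS:
        return False

    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    provided = signature_header[len(prefix):].strip().lower()
    # Constant-time comparison; comparing bytes avoids TypeError on non-ASCII input.
    return hmac.compare_digest(expected.encode(), provided.encode())
```

If the HMAC covers only the body, the timestamp check alone cannot stop someone replaying a captured body with a fresh timestamp header. If your deliveries sign the timestamp together with the body (the table's "include X-DocInt-Timestamp in your signed-payload check" suggests this), feed that combined string to `hmac.new` instead; its exact format is not specified here.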