📘 Public beta · Endpoints are stable; OpenAPI specs and SDKs ship monthly.

Document Intelligence

Upload an Indonesian document (KTP, NPWP, NIB, faktur pajak, invoice, contract, or akta) and get back structured fields with confidence scores and bounding boxes.

Branded as Quantum AI on customer-facing surfaces. The internal model stack is intentionally not exposed.

What it does

| Capability | API |
| --- | --- |
| Upload + auto-classify | POST /api/documents/upload (multipart) |
| Extract by template | Auto-applied when classification has high confidence |
| Templates (custom schemas) | POST /api/templates (define your own field set) |
| Re-extract with overrides | POST /api/documents/{id}/reextract |
| Analyst correction | POST /api/documents/{id}/extractions (manual fields) |
| Export | GET /api/documents/{id}/export (JSON or CSV) |
| Webhooks | Push on extracted, requires_review, failed |

Supported document types

| Group | Types |
| --- | --- |
| Indonesian IDs | ktp · npwp · kartu_keluarga · sim · passport · nib · bpkb |
| Civil registry | akta_lahir · surat_nikah |
| Commercial | invoice · purchase_order · delivery_note · receipt · faktur_pajak · slip_gaji · contract |
| Financial | rekening_koran · financial_statement |
| Generic | generic_ocr · other |

Each type has a curated extraction template with a known field set, known formats, and known validation rules. For documents that don't fit a known template, generic_ocr extracts free-form text plus a best-effort key-value pass.

Use the right product for the right document

For KTP identity verification (anti-spoof, tamper, and face enrollment), use Identity Platform → KTP capture. Document Intelligence's KTP type returns the OCR fields but does not run liveness checks or face enrollment. Two products, two purposes.

Core concepts

| Concept | What it is |
| --- | --- |
| Document | A single uploaded file. Has a type (classified or hint-provided), a status, and zero or more extractions. |
| Extraction | A structured field set with confidence scores and bounding boxes. A document can have several (model-extracted, template-extracted, analyst-corrected). |
| Template | A user-defined schema for a custom document type. Specifies the fields, their types, and whether each is required. |
| Confidence | A score from 0 to 100. When it falls below the org's requiresReviewThreshold (default 70), the document is queued for analyst review. |
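
The review rule in the Confidence row can be sketched as a single comparison. This is an illustration under assumptions, not the service's implementation: the function name `needs_review` is hypothetical, and the sketch assumes one score per document, which this page does not confirm.

```python
DEFAULT_REVIEW_THRESHOLD = 70  # default requiresReviewThreshold stated on this page


def needs_review(confidence: float, threshold: float = DEFAULT_REVIEW_THRESHOLD) -> bool:
    """Return True when a 0-100 confidence score falls below the threshold."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    # "Below the threshold" is strict: a score equal to the threshold passes.
    return confidence < threshold
```

A client could use a check like this to predict which documents will land in the analyst queue, for example when tuning the threshold for an org.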

Common integration shape

  1. Upload. POST /api/documents/upload, multipart with `file`. Returns `{ document: { id, status: "queued" } }`.
  2. Async processing. Classify → Extract → `status: "extracted"` or `"requires_review"`.
  3. Get result. Subscribe to the `document.extracted` webhook, or poll `GET /api/documents/{id}` until the status changes.
  4. Read structured fields. GET /api/documents/{id}/extractions, or rely on the webhook payload.
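
For the polling option in step 3, a minimal sketch might look like the following. It assumes a caller-supplied `fetch_document` function (hypothetical) that wraps `GET /api/documents/{id}` and returns the document object as a dict with a `status` key. It also assumes `failed` is a document status, which this page only lists as a webhook event. The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable, Dict

# "extracted" and "requires_review" come from step 2; "failed" is assumed
# to be a status because it appears as a webhook event.
TERMINAL_STATUSES = {"extracted", "requires_review", "failed"}


def wait_for_document(
    fetch_document: Callable[[str], Dict],
    document_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the document reaches a terminal status, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        document = fetch_document(document_id)
        if document.get("status") in TERMINAL_STATUSES:
            return document
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"document {document_id} still {document.get('status')!r}"
            )
        sleep(interval_s)
```

Injecting `fetch_document` keeps the HTTP client, base URL, and authentication out of the loop, none of which are specified here. Keep the polling interval within the rate limits that apply to your key.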

Endpoints at a glance

| Group | Endpoints |
| --- | --- |
| Documents | GET/POST /api/documents · POST /api/documents/upload · POST /api/documents/upload-json · GET /api/documents/{id} |
| Document actions | GET /api/documents/{id}/download · GET /api/documents/{id}/export · POST /api/documents/{id}/erase · POST /api/documents/{id}/reextract |
| Extractions | GET/POST /api/documents/{id}/extractions · GET /api/extractions/{id} |
| Templates | GET/POST /api/templates · GET/PATCH/DELETE /api/templates/{id} |
| Search | GET /api/search?q=... (min 3 chars; case-insensitive across filename, customer name, extracted-field values) |
| Webhooks | GET/POST /api/webhooks · POST /api/webhooks/{id}/test |
| API keys | GET/POST /api/api-keys |
| Health | GET /api/healthz |
| Admin / audit (dashboard) | GET /api/activity · GET /api/audit-log · GET /api/metrics · GET/POST /api/roles · GET/POST /api/members · GET/POST /api/invites · GET/POST /api/integrations |

Routes marked (dashboard) are intended for analyst UI use and require an internal-scope key.
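
As a small illustration of the Search row, the sketch below builds a search URL and enforces the 3-character minimum on the client before any request is sent. The helper name `build_search_url` and the base URL in the usage note are hypothetical. How the API itself responds to shorter queries is not described here.

```python
from urllib.parse import urlencode

MIN_QUERY_LENGTH = 3  # the Search row states a 3-character minimum


def build_search_url(base_url: str, query: str) -> str:
    """Build a GET /api/search URL, rejecting queries under the documented minimum."""
    if len(query) < MIN_QUERY_LENGTH:
        raise ValueError(
            f"search query needs at least {MIN_QUERY_LENGTH} characters"
        )
    # urlencode percent-encodes the query and turns spaces into "+".
    return f"{base_url.rstrip('/')}/api/search?{urlencode({'q': query})}"
```

For example, `build_search_url("https://api.example.test", "PT Maju")` produces a URL ending in `/api/search?q=PT+Maju`.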

Production considerations

| Concern | Answer |
| --- | --- |
| Data residency | Uploaded PDFs and extracted JSON live in id-jkt-1. PDFs are sealed at rest with AES-256-GCM in org-isolated storage. Never replicated cross-border. |
| Retention | Original PDFs: configurable per org (default 90 days). Extracted JSON: retained indefinitely (regulated record). Set retention via PATCH /api/organization. |
| Model / parser | Per-document-type parser pipeline. AI fallback runs as Quantum AI; the internal model stack is not exposed. A premium tier (files over 20 MB, faster processing) is available on enterprise plans. |
| Rate limits | Upload: 60/min/org (sandbox), 600/min (production). Re-extraction: 10/min/document. Search: 30/min. |
| Idempotency | Upload deduplicates on the SHA-256 of the file content: identical bytes within 24h return the original documentId without re-charging. Re-extractions intentionally don't dedup; each call creates a new extraction row (so you can compare prompt versions). |
| Audit | Every upload, extraction, template change, erase, and member action is audit-logged. Entries are immutable and kept for 7 years. Filter via GET /api/audit-log. |
| Erasure | POST /api/documents/{id}/erase cryptographically shreds the sealed PDF and clears extraction PII (UU PDP-compliant). |
| Webhook signing | HMAC-SHA256 over the raw body, header X-DocInt-Signature: sha256=<hex>. Replay protection: include X-DocInt-Timestamp in your signed-payload check and reject deliveries more than 5 minutes old. |
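
The Webhook signing row can be sketched as a verification helper. Treat it as a starting point under stated assumptions: the function name `verify_webhook` is hypothetical, the timestamp is assumed to be Unix seconds, and the HMAC is assumed to cover the raw body only. This page does not say how the timestamp is combined into the signed payload, so the sketch checks freshness separately. Adjust the signed bytes once you have confirmed the exact format.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_SECONDS = 5 * 60  # reject deliveries more than 5 minutes old


def verify_webhook(
    secret: bytes,
    raw_body: bytes,
    signature_header: str,
    timestamp_header: str,
    now: Optional[float] = None,
) -> bool:
    """Check X-DocInt-Signature and X-DocInt-Timestamp for one delivery."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    try:
        sent_at = int(timestamp_header)  # assumption: Unix seconds
    except ValueError:
        return False
    current = time.time() if now is None else now
    if current - sent_at > MAX_AGE_SECONDS:
        return False
    # Assumption: the signature covers the raw request body only.
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Pass the request body exactly as received, before any JSON parsing, because re-serializing the payload can change the bytes and invalidate the signature.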
