Upload & process
Multipart upload (recommended)
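/api/documents/upload
Form fields
| Field | Type | Required |
|---|---|---|
| file | binary (multipart) | Required |
| type | enum | |
| externalRef | string | |
| customerName | string | |
| customerId | string | |
| pdfPassword | string | |
Response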
{
  "data": {
    "document": {
      "id": "doc_01HXY...",
      "type": "npwp",
      "status": "queued",
      "filename": "npwp-sample.pdf",
      "sizeBytes": 184392,
      "sha256": "...",
      "externalRef": "loan-app-2026-05-24-001",
      "createdAt": "..."
    },
    "deduped": false
  }
}
Deduplication
We hash files at upload. If you re-upload the same file (same SHA-256), we return the existing document and deduped: true. We do not re-extract — the original extractions are still valid.
To force re-extraction, use POST /api/documents/{id}/reextract.
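Because deduplication keys on the file's SHA-256, a client can compute the same digest locally and compare it with the sha256 field in the upload response. The following is a minimal sketch, assuming the service hashes the raw file bytes unmodified (the text does not say so explicitly). The helper names sha256_hex and was_deduped are hypothetical.

```python
import hashlib
import io


def sha256_hex(stream, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of a binary stream, reading it in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def was_deduped(response):
    """Read the deduped flag from an upload response shaped like the example above."""
    return bool(response["data"]["deduped"])


# Identical bytes give identical digests; a single changed byte does not.
first = sha256_hex(io.BytesIO(b"same file bytes"))
second = sha256_hex(io.BytesIO(b"same file bytes"))
changed = sha256_hex(io.BytesIO(b"same file bytes."))
```

In real use the stream would be a file opened in binary mode, for example open(path, "rb").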
Status progression
queued → parsing → extracted (success)
                 → requires_review (low confidence; analyst should review)
                 → failed (hard error; see errorMessage)
                 → password_required (encrypted PDF, no/wrong password)
JSON upload (small files)
For small images or base64-encoded payloads where multipart is awkward:
/api/documents/upload-json
{
  "filename": "ktp-front.jpg",
  "contentType": "image/jpeg",
  "contentBase64": "<base64 payload>",
  "type": "ktp",
  "externalRef": "..."
}
Same response shape. Max body 16 MB after base64.
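The JSON body above can be assembled and size-checked before sending. This is a sketch only: it assumes "16 MB" means 16 MiB and that the limit applies to the whole serialized body, neither of which the text confirms. The function name build_json_upload_body is hypothetical.

```python
import base64
import json

# Assumption: "16 MB" is read as 16 MiB and applies to the serialized JSON body.
MAX_JSON_BODY_BYTES = 16 * 1024 * 1024


def build_json_upload_body(filename, content_type, raw_bytes, doc_type, external_ref=None):
    """Serialize an upload-json body; raise ValueError if it exceeds the limit."""
    payload = {
        "filename": filename,
        "contentType": content_type,
        "contentBase64": base64.b64encode(raw_bytes).decode("ascii"),
        "type": doc_type,
    }
    if external_ref is not None:
        payload["externalRef"] = external_ref
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_JSON_BODY_BYTES:
        raise ValueError(f"body is {len(body)} bytes; use multipart upload instead")
    return body
```

Base64 inflates data by a factor of 4/3, so under these assumptions a raw file much above 12 MB will not fit and should go through the multipart endpoint.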
File-size limits
| File class | Max size |
|---|---|
| PDF | 20 MB |
| Image (JPEG, PNG, etc.) | 20 MB per file |
| Multi-page TIFF | 20 MB total |
| HEIC | 20 MB (auto-converted to JPEG server-side) |
Larger files: split client-side (multi-page contracts), or talk to us about an enterprise tier.
Password-protected PDFs
Some banks password-protect their KK and akta PDFs. Pass pdfPassword on upload:
curl -X POST .../api/documents/upload \
  -F "file=@./locked-akta.pdf" \
  -F "type=akta_lahir" \
  -F "pdfPassword=birth-date-or-nik"
We decrypt in memory, never store the password, and discard the decrypted bytes after extraction. If the password is wrong, the status becomes password_required and you can supply a corrected password via re-extract.
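A client typically needs to branch on the status values listed under Status progression, including the password_required outcome described here. The sketch below maps each status to a hypothetical client-side action name and builds the re-extract request for a corrected password. The action names and both function names are illustrative assumptions, not part of the API.

```python
def next_action(document):
    """Map a document's status to a hypothetical client-side action name."""
    status = document["status"]
    if status in ("queued", "parsing"):
        return "poll_again"  # still in the pipeline
    if status == "extracted":
        return "use_fields"
    if status == "requires_review":
        return "send_to_analyst"  # low confidence
    if status == "password_required":
        return "reextract_with_password"
    if status == "failed":
        return "report_error"  # hard error; see errorMessage
    raise ValueError(f"unknown status: {status!r}")


def build_reextract_request(document_id, pdf_password):
    """Return (method, path, json_body) for retrying with a corrected password."""
    path = f"/api/documents/{document_id}/reextract"
    return ("POST", path, {"pdfPassword": pdf_password})
```

Returning the request as plain data keeps the sketch independent of any particular HTTP client.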
Re-extract
/api/documents/{id}/reextract
Re-runs the pipeline. Use when:
- A new template was published since the original extraction.
- You provided a pdfPassword after a previous password_required outcome.
- Original extraction failed and a model update may have fixed the issue.
Optional body:
{
  "templateId": "tpl_01HXY...",
  "type": "invoice",
  "pdfPassword": "..."
}
Erase (PDP / GDPR)
/api/documents/{id}/erase
{
  "reason": "Customer requested PDP deletion 2026-05-24, ticket #4521"
}
Hard-deletes the original file and bounding boxes. Extracted fields can optionally be retained (set keepFields: true) for audit, but the source file is gone. The reason is required and logged.
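Since the reason is required, it can be worth validating the erase body client-side before sending it. A minimal sketch, assuming keepFields travels in the same JSON body as reason (the text implies this but does not show it). The function name build_erase_body is hypothetical.

```python
def build_erase_body(reason, keep_fields=False):
    """Build the erase request body; reason must be a non-empty string."""
    reason = reason.strip()
    if not reason:
        raise ValueError("reason is required")
    body = {"reason": reason}
    if keep_fields:
        # Assumption: keepFields sits alongside reason in the JSON body.
        body["keepFields"] = True
    return body
```

Leaving keepFields out by default mirrors the text, where retaining extracted fields is described as optional.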
Download
/api/documents/{id}/download
Returns a signed URL valid for 5 minutes. Don't proxy through your servers; the URL is single-use and stateless.
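Because the signed URL expires after 5 minutes and can be used once, a client may want to track both conditions locally rather than cache the link. The sketch below is one way to do that. The SignedDownload class is hypothetical, and the example URL is a placeholder, since the response shape of /download is not shown here.

```python
import time
from dataclasses import dataclass

SIGNED_URL_TTL_SECONDS = 5 * 60  # "valid for 5 minutes"


@dataclass
class SignedDownload:
    url: str
    issued_at: float  # monotonic clock reading taken when the URL was received
    used: bool = False

    def is_usable(self, now):
        """True while the URL is unused and younger than the TTL."""
        return not self.used and (now - self.issued_at) < SIGNED_URL_TTL_SECONDS

    def consume(self, now):
        """Hand out the URL exactly once; afterwards a fresh one must be requested."""
        if not self.is_usable(now):
            raise RuntimeError("signed URL expired or already used; call /download again")
        self.used = True
        return self.url


# Placeholder URL; a real one would come from the /download response.
link = SignedDownload(url="https://example.invalid/signed", issued_at=time.monotonic())
```

Passing now explicitly keeps the expiry logic easy to test without waiting on a real clock.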
List documents
/api/documents
Filters: type (comma-sep), status (comma-sep), q (free-text), customerName, externalRef, from, to, limit (default 50, max 200), offset.
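The filters above can be assembled into a query string as follows. This sketch assumes the filters are passed as URL query parameters and that limit must be at least 1; neither is stated in the text. The function name build_list_query is hypothetical, and only a subset of the filters is covered.

```python
from urllib.parse import urlencode

MAX_LIMIT = 200  # "max 200"


def build_list_query(types=None, statuses=None, q=None, external_ref=None, limit=50, offset=0):
    """Build a list-documents path with comma-separated type and status filters."""
    # Assumption: a limit below 1 is rejected; the text only gives the maximum.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    params = {}
    if types:
        params["type"] = ",".join(types)
    if statuses:
        params["status"] = ",".join(statuses)
    if q:
        params["q"] = q
    if external_ref:
        params["externalRef"] = external_ref
    params["limit"] = limit
    params["offset"] = offset
    # safe="," keeps the separators readable instead of encoding them as %2C.
    return "/api/documents?" + urlencode(params, safe=",")
```

Paging through results then amounts to calling the helper repeatedly with offset increased by limit each time.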
Don't store the raw file twice
We store the original file (sealed-at-rest, AES-256-GCM) for the configured retention period. You don't need to keep your own copy unless your audit policy requires it. If you do, /download is the canonical re-fetch path.