📘 Public beta · Endpoints are stable; OpenAPI specs and SDKs ship monthly.

Upload & process

Multipart upload (recommended)

POST /api/documents/upload
Auth · API key
Scope · documents:write

Form fields

file · binary (multipart) · Required
    PDF, JPEG, PNG, HEIC, WebP, or TIFF. Max 20 MB.
type · enum
    Document type hint. Skips auto-classification and goes straight to extraction. See Document types.
externalRef · string
    Your reference for the upload. It survives dedupe and is useful for cross-system tracking.
customerName · string
    Bind to a customer name; auto-creates the customer if new.
customerId · string
    Bind to an existing customer ID.
pdfPassword · string
    For password-protected PDFs. We never persist the password.

Response

{
  "data": {
    "document": {
      "id": "doc_01HXY...",
      "type": "npwp",
      "status": "queued",
      "filename": "npwp-sample.pdf",
      "sizeBytes": 184392,
      "sha256": "...",
      "externalRef": "loan-app-2026-05-24-001",
      "createdAt": "..."
    },
    "deduped": false
  }
}
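
As a rough illustration of the form fields above, the sketch below assembles the multipart request with the Python standard library. It only builds the request and does not send it. The host and the API-key header name are not given on this page, so `base_url` and `headers` are hypothetical, caller-supplied values, and `build_multipart_upload` is an illustrative helper rather than part of any SDK.

```python
import mimetypes
import urllib.request
import uuid


def build_multipart_upload(base_url, filename, file_bytes, fields, headers):
    """Build (but do not send) a multipart POST to /api/documents/upload.

    base_url and headers come from the caller: this page does not state the
    host or the API-key header name, so nothing is assumed about them here.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    # Optional text fields: type, externalRef, customerName, customerId, pdfPassword.
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The required binary part is named "file".
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/documents/upload",
        data=b"".join(parts),
        method="POST",
        headers={**headers, "Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the JSON body that comes back should have the response shape shown above.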

Deduplication

We hash files at upload. If you re-upload the same file (same SHA-256), we return the existing document and deduped: true. We do not re-extract — the original extractions are still valid.

To force re-extraction, use POST /api/documents/{id}/reextract.
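
A minimal sketch of handling the dedupe flag on the client side: it reads `deduped` from an upload response and, optionally, compares the returned `sha256` with a locally computed hash. It assumes the API reports the hash as a hex string, which this page does not confirm; `summarize_upload` is a hypothetical helper name.

```python
import hashlib


def sha256_hex(file_bytes):
    """Hex SHA-256 of the exact bytes that will be uploaded."""
    return hashlib.sha256(file_bytes).hexdigest()


def summarize_upload(response_json, file_bytes=None):
    """Pull the fields a caller usually needs out of an upload response."""
    data = response_json["data"]
    document = data["document"]
    summary = {
        "id": document["id"],
        "status": document["status"],
        # True means an existing document was returned and nothing was re-extracted.
        "deduped": data["deduped"],
    }
    if file_bytes is not None:
        # Assumption: the sha256 field is a hex digest (case is normalized here).
        summary["hash_matches"] = document["sha256"].lower() == sha256_hex(file_bytes)
    return summary
```

When `deduped` is true and fresh results are needed, the re-extract endpoint mentioned above is the route to take.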

Status progression

queued → parsing → extracted          (success)
                 → requires_review    (low confidence — analyst should review)
                 → failed             (hard error — see errorMessage)
                 → password_required  (encrypted PDF, no/wrong password)
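
The progression above can be sketched as a polling loop that stops once a document leaves `queued` or `parsing`. This page does not document how to read a single document's status, so `fetch_status` is a hypothetical callable the caller supplies (for example, one built on the list endpoint). The timeout and interval values are arbitrary illustrations.

```python
import time

IN_PROGRESS = {"queued", "parsing"}
# States where the pipeline stops and waits for the caller or an analyst.
SETTLED = {"extracted", "requires_review", "failed", "password_required"}


def wait_for_settled(fetch_status, timeout_s=120.0, interval_s=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the document reaches a settled state."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in SETTLED:
            return status
        if status not in IN_PROGRESS:
            raise ValueError(f"unexpected status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

A caller would typically branch on the result: read fields on `extracted`, route to an analyst on `requires_review`, inspect `errorMessage` on `failed`, and re-extract with a password on `password_required`.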

JSON upload (small files)

For small images or base64-encoded payloads where multipart is awkward:

POST /api/documents/upload-json
Auth · API key
Scope · documents:write

{
  "filename": "ktp-front.jpg",
  "contentType": "image/jpeg",
  "contentBase64": "<base64 payload>",
  "type": "ktp",
  "externalRef": "..."
}

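The response has the same shape as the multipart upload. The request body is capped at 16 MB after base64 encoding; because base64 inflates data by about one third, that leaves room for roughly 12 MB of raw file data.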
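
A minimal sketch of building the JSON body and checking it against the body limit before sending. It assumes "16 MB" means 16,000,000 bytes, the stricter of the two common readings; the page does not say which is meant. `build_upload_json_body` is a hypothetical helper.

```python
import base64
import json

# Assumption: decimal megabytes. Use 16 * 1024 * 1024 if the limit is in MiB.
MAX_BODY_BYTES = 16 * 1000 * 1000


def build_upload_json_body(filename, content_type, file_bytes,
                           doc_type=None, external_ref=None):
    """Return the encoded JSON body for POST /api/documents/upload-json."""
    payload = {
        "filename": filename,
        "contentType": content_type,
        "contentBase64": base64.b64encode(file_bytes).decode("ascii"),
    }
    # type and externalRef are included only when provided.
    if doc_type is not None:
        payload["type"] = doc_type
    if external_ref is not None:
        payload["externalRef"] = external_ref
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(
            f"body is {len(body)} bytes after base64; use multipart upload instead"
        )
    return body
```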

File-size limits

File class               Max size
PDF                      20 MB
Image (JPEG, PNG, etc.)  20 MB per file
Multi-page TIFF          20 MB total
HEIC                     20 MB (auto-converted to JPEG server-side)

Larger files: split client-side (multi-page contracts), or talk to us about an enterprise tier.
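
A client-side pre-flight check can catch oversized or unsupported files before any bytes are sent. The sketch below is illustrative only: the extension list is an assumption derived from the accepted formats named on this page, and "20 MB" is read as 20,000,000 bytes, the stricter interpretation.

```python
import os

# Assumption: decimal megabytes, the stricter reading of "20 MB".
MAX_FILE_BYTES = 20 * 1000 * 1000
# Assumption: common extensions for PDF, JPEG, PNG, HEIC, WebP, and TIFF.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".heic", ".webp", ".tif", ".tiff"}


def preflight_problems(filename, size_bytes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file extension: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; the limit is {MAX_FILE_BYTES}")
    return problems
```

The server remains the authority on both checks; this only saves a round trip for obvious rejections.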

Password-protected PDFs

Some banks password-protect the KK and akta PDFs they produce. Pass pdfPassword on upload:

curl -X POST .../api/documents/upload \
  -F "file=@./locked-akta.pdf" \
  -F "type=akta_lahir" \
  -F "pdfPassword=birth-date-or-nik"

We decrypt in memory, never store the password, and discard the decrypted bytes after extraction. If the password is wrong, status becomes password_required and you can supply a corrected password via re-extract.

Re-extract

POST /api/documents/{id}/reextract
Auth · API key
Scope · documents:write

Re-runs the pipeline. Use when:

  • A new template was published since the original extraction.
  • You provided a pdfPassword after a previous password_required outcome.
  • Original extraction failed and a model update may have fixed the issue.

Optional body:

{
  "templateId": "tpl_01HXY...",
  "type": "invoice",
  "pdfPassword": "..."
}
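
Since every field in the re-extract body is optional, a request builder can include only what the caller sets. This is a sketch under that reading; `build_reextract_request` is a hypothetical helper, and sending no body at all when nothing is set is an assumption about what the endpoint accepts.

```python
import json
from urllib.parse import quote


def build_reextract_request(document_id, template_id=None, doc_type=None,
                            pdf_password=None):
    """Return (method, path, body) for a re-extract call; body may be None."""
    path = f"/api/documents/{quote(document_id, safe='')}/reextract"
    body = {}
    if template_id is not None:
        body["templateId"] = template_id
    if doc_type is not None:
        body["type"] = doc_type
    if pdf_password is not None:
        # Typical follow-up to a password_required outcome.
        body["pdfPassword"] = pdf_password
    encoded = json.dumps(body).encode("utf-8") if body else None
    return "POST", path, encoded
```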

Erase (PDP / GDPR)

POST /api/documents/{id}/erase
Auth · API key
Scope · documents:write

{
  "reason": "Customer requested PDP deletion 2026-05-24, ticket #4521"
}

Hard-deletes the original file and bounding boxes. Extracted fields can optionally be retained (set keepFields: true) for audit, but the source file is gone. The reason is required and logged.
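
A minimal sketch of an erase request builder that enforces the required reason on the client before calling the API. Placing `keepFields` in the JSON body next to `reason` is an assumption; the page names the flag but not where it goes. `build_erase_request` is a hypothetical helper.

```python
import json
from urllib.parse import quote


def build_erase_request(document_id, reason, keep_fields=False):
    """Return (method, path, body) for an erase call."""
    if not reason or not reason.strip():
        raise ValueError("reason is required and is logged by the API")
    body = {"reason": reason}
    if keep_fields:
        # Assumption: keepFields is sent in the same JSON body as reason.
        body["keepFields"] = True
    path = f"/api/documents/{quote(document_id, safe='')}/erase"
    return "POST", path, json.dumps(body).encode("utf-8")
```

Because the deletion is permanent, it may be worth recording the ticket or request ID in `reason`, as the example body above does.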

Download

GET /api/documents/{id}/download
Auth · API key
Scope · documents:read

Returns a signed URL valid for 5 minutes. Don't proxy through your servers; the URL is single-use and stateless.
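
Because the signed URL is single-use and expires after 5 minutes, client code can guard against accidental reuse. The sketch below tracks both constraints locally. The shape of the download response is not shown on this page, so the URL is simply passed in, and `SignedDownloadUrl` is a hypothetical wrapper, not an SDK class.

```python
import time


class SignedDownloadUrl:
    """Hold a signed URL and hand it out at most once, within its lifetime."""

    TTL_SECONDS = 5 * 60  # validity window stated in the docs

    def __init__(self, url, clock=time.monotonic):
        self._url = url
        self._clock = clock
        self._issued_at = clock()
        self._used = False

    def take(self):
        """Return the URL for one download, or raise if it cannot be used."""
        if self._used:
            raise RuntimeError("signed URL already used; request a new one")
        if self._clock() - self._issued_at > self.TTL_SECONDS:
            raise RuntimeError("signed URL expired; request a new one")
        self._used = True
        return self._url
```

Either error is a cue to call the download endpoint again for a fresh URL rather than to retry the old one.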

List documents

GET /api/documents
Auth · API key
Scope · documents:read

Filters: type (comma-separated), status (comma-separated), q (free-text search), customerName, externalRef, from, to, limit (default 50, max 200), offset.
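
The filters above can be sketched as a query-string builder. Multi-value filters are joined with commas and `limit` is validated against the documented maximum. The formats of `from` and `to` are not specified on this page, so they are passed through unchanged; `build_list_url` and its parameter names are hypothetical.

```python
from urllib.parse import urlencode


def build_list_url(types=None, statuses=None, q=None, customer_name=None,
                   external_ref=None, date_from=None, date_to=None,
                   limit=50, offset=0):
    """Return the path and query string for GET /api/documents."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    if offset < 0:
        raise ValueError("offset must not be negative")
    params = {}
    if types:
        params["type"] = ",".join(types)
    if statuses:
        params["status"] = ",".join(statuses)
    optional = {
        "q": q,
        "customerName": customer_name,
        "externalRef": external_ref,
        "from": date_from,  # format not documented here; passed through as-is
        "to": date_to,
    }
    params.update({key: value for key, value in optional.items() if value is not None})
    params["limit"] = limit
    params["offset"] = offset
    return "/api/documents?" + urlencode(params)
```

To page through results, a caller would keep `limit` fixed and advance `offset` by `limit` on each request.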

Don't store the raw file twice

We store the original file (sealed-at-rest, AES-256-GCM) for the configured retention period. You don't need to keep your own copy unless your audit policy requires it. If you do, /download is the canonical re-fetch path.