Extractions

An extraction is a structured field set tied to a document. A single document can have multiple extractions: the model-extracted one (initial pass), template-extracted ones (if a template was applied later), and analyst-corrected ones.

Extraction object

{
  "id": "ext_01HXY...",
  "documentId": "doc_01HXY...",
  "method": "quantum_ai_vision | rule_template | manual_correction",
  "modelUsed": "quantum-ai:v1",
  "confidenceScore": 94,
  "fields": {
    "npwp_number": "12.345.678.9-012.000",
    "name": "BUDI SANTOSO",
    "address": "JL. SUDIRMAN KAV. 1, JAKARTA SELATAN",
    "registered_at": "2018-04-15"
  },
  "rawText": "(full OCR text if requested or document type is generic_ocr)",
  "fieldBoundingBoxes": {
    "npwp_number": { "page": 0, "x": 0.34, "y": 0.21, "w": 0.32, "h": 0.04 }
  },
  "flags": ["analyst_verified"],
  "templateId": "tpl_01HXY...",
  "createdAt": "..."
}

Method

Method	Source
`quantum_ai_vision`	Default. AI vision extraction.
`rule_template`	Template-based extraction (e.g. fixed-form invoices, deterministic).
`manual_correction`	Analyst overrode/corrected via dashboard or API.

Confidence

A number from 0 to 100. Its source depends on the extraction method:

Model confidence (for quantum_ai_vision)
Template-match confidence (for rule_template)
Always 100 for manual_correction (analyst sign-off is authoritative)

If the model's confidence falls below the org-configured requiresReviewThreshold (default 70), the document moves to status: requires_review and analysts are notified via webhook.

Flags

Flag	Meaning
`low_confidence`	At least one field below threshold
`photo_quality_poor`	Detected glare / blur / occlusion / low-resolution
`analyst_verified`	An analyst signed off on this extraction
`template_partial`	A template was applied but didn't cover every field

List extractions on a document

GET /api/documents/{id}/extractions

Auth · API key
Scope · documents:read

curl .../api/documents/doc_01HXY.../extractions \
  -H "Authorization: Bearer $QE_API_KEY"

Returns all extractions for the document, newest first. For most flows, use the latest manual_correction extraction if one exists; otherwise use the latest quantum_ai_vision extraction.
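The selection rule above can be sketched in Python. This is a minimal sketch, assuming the response has already been parsed into a list of extraction dicts in the newest-first order described; `pick_canonical` is a hypothetical helper name, not part of the API, and the IDs are made up.

```python
from typing import Optional


def pick_canonical(extractions: list[dict]) -> Optional[dict]:
    """Pick the extraction to trust from a newest-first list.

    Prefers the latest manual_correction, then falls back to the latest
    quantum_ai_vision extraction. Returns None if neither is present.
    """
    for method in ("manual_correction", "quantum_ai_vision"):
        for ext in extractions:  # newest first, so the first match is the latest
            if ext.get("method") == method:
                return ext
    return None


# Illustrative data only; real IDs look like "ext_01HXY..."
extractions = [
    {"id": "ext_c", "method": "rule_template"},
    {"id": "ext_b", "method": "manual_correction"},
    {"id": "ext_a", "method": "quantum_ai_vision"},
]
chosen = pick_canonical(extractions)  # the manual_correction entry, ext_b
```

Note that this sketch follows the rule as stated and never selects a rule_template extraction; adjust the preference order if your flow relies on templates.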

Read a single extraction

GET /api/extractions/{id}

Auth · API key
Scope · documents:read

Submit a manual correction

POST /api/documents/{id}/extractions

Auth · API key
Scope · documents:write

{
  "fields": {
    "npwp_number": "12.345.678.9-012.000",
    "name": "Budi Santoso",
    "address": "Jl. Sudirman Kav. 1, Jakarta Selatan",
    "registered_at": "2018-04-15"
  },
  "confidenceScore": 100,
  "flags": ["analyst_verified"]
}

Creates a new extraction. Existing extractions are never overwritten; every correction is stored as its own row for audit. The latest manual_correction extraction becomes the canonical answer.

This also triggers an extraction.corrected webhook your downstream systems can subscribe to.
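As a rough illustration, the correction request could be built from Python with the standard library. This is a sketch under assumptions: the base URL `https://api.example.com` is a placeholder (the real host is not given here), the `Content-Type` header is assumed rather than documented, and `build_correction_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_correction_request(
    base_url: str, api_key: str, document_id: str, fields: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a manual correction."""
    body = {
        "fields": fields,
        "confidenceScore": 100,        # manual corrections are always 100
        "flags": ["analyst_verified"],
    }
    return urllib.request.Request(
        url=f"{base_url}/api/documents/{document_id}/extractions",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, not documented
        },
    )


req = build_correction_request(
    "https://api.example.com",  # placeholder host
    "test-key",                 # placeholder API key
    "doc_123",                  # placeholder document ID
    {"name": "Budi Santoso"},
)
```

To send it, pass `req` to `urllib.request.urlopen` and read the response body; error handling and retries are left out of this sketch.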

Export

GET /api/documents/{id}/export

Auth · API key
Scope · documents:read

Query param format:

format=json — returns the latest extraction's fields as JSON
format=csv — returns a single-row CSV with field columns

Useful for downstream system integrations that pull periodically rather than subscribing to webhooks.
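For a periodic pull, a client might build the export URL and parse the CSV variant roughly as follows. This is only a sketch: the base URL is a placeholder, the helper names are hypothetical, and it assumes the CSV consists of a header line followed by exactly one data row, which is not spelled out above.

```python
import csv
import io
from urllib.parse import urlencode


def export_url(base_url: str, document_id: str, fmt: str = "json") -> str:
    """Build the export URL for a document; fmt must be 'json' or 'csv'."""
    if fmt not in ("json", "csv"):
        raise ValueError(f"unsupported format: {fmt}")
    query = urlencode({"format": fmt})
    return f"{base_url}/api/documents/{document_id}/export?{query}"


def parse_csv_export(text: str) -> dict:
    """Parse a CSV export into a field dict.

    Assumes a header row followed by exactly one data row.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    if len(rows) != 1:
        raise ValueError(f"expected 1 data row, got {len(rows)}")
    return dict(rows[0])


# Made-up sample in the assumed layout
sample = (
    "npwp_number,name,registered_at\n"
    "12.345.678.9-012.000,BUDI SANTOSO,2018-04-15\n"
)
record = parse_csv_export(sample)
url = export_url("https://api.example.com", "doc_123", "csv")  # placeholder host
```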

Bounding boxes

Bounding box coordinates are normalized to the range [0.0, 1.0], relative to page width (x, w) and page height (y, h). Each box has:

{ "page": 0, "x": 0.34, "y": 0.21, "w": 0.32, "h": 0.04 }

Use them to highlight fields on the original document in your UI. For multi-page PDFs, page is 0-indexed.
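To draw a highlight, the normalized box has to be scaled to the rendered page size. A minimal sketch, assuming `x` and `y` mark the top-left corner with the origin at the top-left of the page (the origin convention is not stated above); `box_to_pixels` is a hypothetical helper.

```python
def box_to_pixels(
    box: dict, page_width_px: int, page_height_px: int
) -> tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, right, bottom) in pixels."""
    left = round(box["x"] * page_width_px)
    top = round(box["y"] * page_height_px)
    right = round((box["x"] + box["w"]) * page_width_px)
    bottom = round((box["y"] + box["h"]) * page_height_px)
    return left, top, right, bottom


box = {"page": 0, "x": 0.34, "y": 0.21, "w": 0.32, "h": 0.04}
rect = box_to_pixels(box, page_width_px=1000, page_height_px=1400)
# rect == (340, 294, 660, 350)
```

Because the coordinates are relative, the same box works at any zoom level: pass the current rendered width and height of page `box["page"]`.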

Confidence threshold tuning

requiresReviewThreshold is per-org, default 70. Documents whose confidence falls below it go to review, so raising it (e.g. 80) sends more docs to analysts (higher recall, more manual work), while lowering it (e.g. 60) auto-trusts more (higher throughput, more risk of bad data leaking through).

The right number depends on what bad data costs you downstream. For loan underwriting, where bad data can cost millions of IDR, 80–85 is conservative. For document archival, 50 is fine.
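One way to choose a value is to replay historical confidence scores against candidate thresholds and see how large the review queue becomes. The sketch below uses made-up scores and a hypothetical helper; it relies only on the rule that a document goes to review when its confidence is below the threshold.

```python
def review_queue_size(confidence_scores: list[int], threshold: int) -> int:
    """Count documents that would be routed to review at this threshold."""
    return sum(1 for score in confidence_scores if score < threshold)


# Made-up document-level confidence scores for ten documents
scores = [45, 58, 63, 72, 77, 81, 88, 93, 96, 99]

sizes = {t: review_queue_size(scores, t) for t in (50, 60, 70, 80, 85)}
# sizes == {50: 1, 60: 2, 70: 3, 80: 5, 85: 6}
```

In this sample, moving from 70 to 85 doubles the analyst workload from 3 to 6 documents out of 10, which is the trade-off to weigh against the cost of an uncaught error.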

Field-level thresholds are more powerful than document-level

The default requiresReviewThreshold looks at average confidence across fields. If you have a "must be right" field (e.g. NIK, invoice total), set a per-field threshold via the template's field.requireConfidenceAbove property. A single below-threshold critical field routes the whole doc to review even if the average is high.
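The interaction between the two thresholds can be sketched as follows. This illustrates the routing rule as described, not the service's actual implementation: the per-field confidence dict is a hypothetical input (the Extraction object shown earlier does not include per-field scores), `field_thresholds` stands in for each field's requireConfidenceAbove value, and treating a missing critical field as confidence 0 is an assumption.

```python
from typing import Optional


def needs_review(
    field_confidences: dict[str, float],
    doc_threshold: float = 70,
    field_thresholds: Optional[dict[str, float]] = None,
) -> bool:
    """Return True if the document should be routed to review."""
    # A single critical field below its own threshold forces review.
    for name, minimum in (field_thresholds or {}).items():
        if field_confidences.get(name, 0) < minimum:  # missing field counts as 0
            return True
    if not field_confidences:
        return True  # nothing extracted, so nothing to trust
    # Otherwise fall back to the document-level check on the average.
    average = sum(field_confidences.values()) / len(field_confidences)
    return average < doc_threshold


# Made-up per-field scores: the average is about 84.7, but nik is weak
confidences = {"nik": 62, "name": 97, "address": 95}

by_average_only = needs_review(confidences)                        # False
with_critical_field = needs_review(confidences, 70, {"nik": 90})   # True
```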
