Extractions
An extraction is a structured field set tied to a document. A single document can have multiple extractions: the model-extracted one (initial pass), template-extracted ones (if a template was applied later), and analyst-corrected ones.
Extraction object
{
"id": "ext_01HXY...",
"documentId": "doc_01HXY...",
"method": "quantum_ai_vision | rule_template | manual_correction",
"modelUsed": "quantum-ai:v1",
"confidenceScore": 94,
"fields": {
"npwp_number": "12.345.678.9-012.000",
"name": "BUDI SANTOSO",
"address": "JL. SUDIRMAN KAV. 1, JAKARTA SELATAN",
"registered_at": "2018-04-15"
},
"rawText": "(full OCR text if requested or document type is generic_ocr)",
"fieldBoundingBoxes": {
"npwp_number": { "page": 0, "x": 0.34, "y": 0.21, "w": 0.32, "h": 0.04 }
},
"flags": ["analyst_verified"],
"templateId": "tpl_01HXY...",
"createdAt": "..."
}
Method
| Method | Source |
|---|---|
| quantum_ai_vision | Default. AI vision extraction. |
| rule_template | Template-based extraction (e.g. fixed-form invoices, deterministic). |
| manual_correction | Analyst overrode/corrected via dashboard or API. |
Confidence
A number 0–100. Sources:
- Model confidence (for quantum_ai_vision)
- Template-match confidence (for rule_template)
- Always 100 for manual_correction (analyst sign-off is authoritative)
If the model's confidence falls below the org-configured requiresReviewThreshold (default 70), the document goes to status: requires_review and pings analysts via webhook.
Flags
| Flag | Meaning |
|---|---|
| low_confidence | At least one field below threshold |
| photo_quality_poor | Detected glare / blur / occlusion / low-resolution |
| analyst_verified | An analyst signed off on this extraction |
| template_partial | A template was applied but didn't cover every field |
List extractions on a document
/api/documents/{id}/extractions
curl .../api/documents/doc_01HXY.../extractions \
-H "Authorization: Bearer $QE_API_KEY"
Returns all extractions for the document, newest first. For most flows you want the latest of method=manual_correction (if any) or the latest quantum_ai_vision otherwise.
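The selection rule above can be sketched as a small helper. This is a minimal illustration, assuming the response has already been decoded into a newest-first list of extraction objects; the function name and the sample ids are hypothetical.

```python
from typing import Optional


def pick_canonical(extractions: list[dict]) -> Optional[dict]:
    """Pick the extraction most flows want from a newest-first list.

    Prefers the latest manual_correction; otherwise falls back to the
    latest quantum_ai_vision extraction. Returns None if neither exists.
    """
    for method in ("manual_correction", "quantum_ai_vision"):
        for ext in extractions:  # newest first, so the first hit is the latest
            if ext.get("method") == method:
                return ext
    return None


# Sample data shaped like the extraction object; the ids are made up.
sample = [
    {"id": "ext_c", "method": "rule_template"},
    {"id": "ext_b", "method": "manual_correction"},
    {"id": "ext_a", "method": "quantum_ai_vision"},
]
chosen = pick_canonical(sample)
print(chosen["id"])  # ext_b
```

Note that template-extracted rows are skipped entirely here, which matches the rule as stated; adjust if your flow should trust rule_template results.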
Read a single extraction
/api/extractions/{id}
Submit a manual correction
/api/documents/{id}/extractions
{
"fields": {
"npwp_number": "12.345.678.9-012.000",
"name": "Budi Santoso",
"address": "Jl. Sudirman Kav. 1, Jakarta Selatan",
"registered_at": "2018-04-15"
},
"confidenceScore": 100,
"flags": ["analyst_verified"]
}
Creates a new extraction (we never overwrite; every correction is its own row for audit). The latest manual_correction extraction becomes the canonical answer.
This also triggers an extraction.corrected webhook your downstream systems can subscribe to.
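One way to prepare a correction body is to start from the previous extraction's fields and apply only what the analyst changed. The sketch below assumes a correction should carry the full field set, as the example body does; the helper name is hypothetical. It stops at serializing the body, because the HTTP method and base URL are not shown here.

```python
import json


def build_correction_body(previous_fields: dict, overrides: dict) -> str:
    """Serialize a manual-correction request body.

    Merges analyst overrides onto the previous field values and mirrors
    the example body: confidenceScore 100 and the analyst_verified flag.
    """
    merged = {**previous_fields, **overrides}  # overrides win on conflict
    body = {
        "fields": merged,
        "confidenceScore": 100,
        "flags": ["analyst_verified"],
    }
    return json.dumps(body, ensure_ascii=False)


previous = {"npwp_number": "12.345.678.9-012.000", "name": "BUDI SANTOSO"}
payload = build_correction_body(previous, {"name": "Budi Santoso"})
print(payload)
```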
Export
/api/documents/{id}/export
Query param format:
- format=json: returns the latest extraction's fields as JSON
- format=csv: returns a single-row CSV with field columns
Useful for downstream system integrations that pull periodically rather than subscribing to webhooks.
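A periodic puller that requests format=csv might turn the response into a field dict along these lines. This is a sketch that assumes the CSV has a header line of field names followed by exactly one data row; the exact layout is not specified here, and the sample text is made up from the example extraction.

```python
import csv
import io


def parse_export_csv(text: str) -> dict:
    """Turn a single-row CSV export into a field dict.

    Assumes the first line is a header of field names and the second
    line holds the values; raises ValueError on any other row count.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    if len(rows) != 1:
        raise ValueError(f"expected exactly one data row, got {len(rows)}")
    return dict(rows[0])


# Hypothetical export text; quoted values may contain commas.
sample_csv = (
    "npwp_number,name,address,registered_at\n"
    '12.345.678.9-012.000,BUDI SANTOSO,"JL. SUDIRMAN KAV. 1, JAKARTA SELATAN",2018-04-15\n'
)
fields = parse_export_csv(sample_csv)
print(fields["address"])  # JL. SUDIRMAN KAV. 1, JAKARTA SELATAN
```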
Bounding boxes
Bounding boxes are normalized [0.0, 1.0] relative to page width/height. Each box has:
{ "page": 0, "x": 0.34, "y": 0.21, "w": 0.32, "h": 0.04 }
Use them to highlight fields on the original document in your UI. For multi-page PDFs, page is 0-indexed.
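To draw a highlight, the normalized values need scaling to the rendered page size. A minimal sketch follows, assuming x and y mark the top-left corner of the box (this is not stated here) and that the caller knows the rendered page dimensions in pixels; the function name is hypothetical.

```python
def box_to_pixels(box: dict, page_width: int, page_height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, width, height) in pixels.

    Assumes x/y are the top-left corner; x and w scale with page width,
    y and h scale with page height.
    """
    left = round(box["x"] * page_width)
    top = round(box["y"] * page_height)
    width = round(box["w"] * page_width)
    height = round(box["h"] * page_height)
    return left, top, width, height


box = {"page": 0, "x": 0.34, "y": 0.21, "w": 0.32, "h": 0.04}
print(box_to_pixels(box, 1000, 1400))  # (340, 294, 320, 56)
```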
Confidence threshold tuning
requiresReviewThreshold is per-org, default 70. Documents scoring below it go to review, so raise it (e.g. 80) to send more docs to analysts (fewer bad extractions slip through, more manual work). Lower it (e.g. 60) to auto-trust more (higher throughput, more risk of bad data leaking through).
The right number depends on what bad data costs you downstream. For loan underwriting, where a bad field can cost millions of IDR, 80–85 is conservative. For document archival, 50 is fine.
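Before changing the threshold, it can help to estimate the review load from past confidence scores. Since documents scoring below the threshold go to review, the share sent to analysts grows as the threshold rises. The sketch below uses made-up scores and a hypothetical helper name.

```python
def review_rate(scores: list[float], threshold: float) -> float:
    """Fraction of documents whose confidence falls below the threshold."""
    if not scores:
        return 0.0
    below = sum(1 for s in scores if s < threshold)
    return below / len(scores)


# Made-up confidence scores for ten past documents.
history = [52, 61, 66, 72, 78, 83, 88, 91, 94, 97]
for threshold in (50, 70, 85):
    print(threshold, review_rate(history, threshold))
# 50 0.0
# 70 0.3
# 85 0.6
```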
Field-level thresholds are more powerful than document-level
The default requiresReviewThreshold looks at average confidence across fields. If you have a "must be right" field (e.g. NIK, invoice total), set a per-field threshold via the template's field.requireConfidenceAbove property. A single below-threshold critical field routes the whole doc to review even if the average is high.
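The combined rule can be sketched as follows. This is an illustration only: it assumes per-field confidence values are available as a plain mapping, and it models field.requireConfidenceAbove as a dict of field name to minimum score. The function name and input shapes are hypothetical, not the service's actual routing code.

```python
from statistics import mean
from typing import Optional


def needs_review(
    field_confidence: dict[str, float],
    doc_threshold: float = 70.0,
    field_thresholds: Optional[dict[str, float]] = None,
) -> bool:
    """Return True when a document should be routed to review.

    Routes on a low average across fields, or on any field that has its
    own threshold and scores below it.
    """
    if not field_confidence:
        raise ValueError("field_confidence must not be empty")
    if mean(field_confidence.values()) < doc_threshold:
        return True
    for name, minimum in (field_thresholds or {}).items():
        if name in field_confidence and field_confidence[name] < minimum:
            return True
    return False


scores = {"nik": 62, "name": 97, "address": 95, "registered_at": 98}
print(needs_review(scores))  # False: the average is 88
print(needs_review(scores, field_thresholds={"nik": 90}))  # True: nik is below 90
```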