Agents & conversations
Flows are batch. Conversations are multi-turn chat. Use conversations when an analyst (or end-user) interacts iteratively with an LLM that has access to tools.
Agents
An agent is a reusable LLM character: system prompt + tool set + model choice. Define once, instantiate into many conversations.
POST /api/agents
{
  "name": "Fraud Investigator",
  "systemPrompt": "You are a fraud-investigation assistant for an Indonesian bank. You can search customer history, look up transactions, and check device flags. Always cite specific transactions when making accusations. Be conservative: when in doubt, recommend human review.",
  "tools": [
    {
      "slug": "search_customer",
      "name": "Search customer by NIK or name",
      "description": "Fetch a customer record by NIK or partial name match.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "nik": { "type": "string" },
          "name": { "type": "string" }
        }
      }
    },
    {
      "slug": "list_recent_transactions",
      "name": "List recent transactions",
      "description": "Get the customer's most recent transactions.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "customerId": { "type": "string" },
          "limit": { "type": "integer", "default": 20 }
        },
        "required": ["customerId"]
      }
    }
  ],
  "model": "quantum-ai:default"
}
Tool definitions are abstract: defining a tool only gives the LLM permission to call it. The implementation lives elsewhere. Each tool call fires an agent.tool_called event, which your service handles by executing the tool and returning the result to the conversation.
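The handler side of this contract can be sketched as a small dispatcher keyed by tool slug. This is a minimal illustration rather than the platform's SDK: the event field names (conversationId, toolCallId, slug, input) are assumptions modeled on the examples on this page, and the handler bodies are stubs standing in for your own data access.

```typescript
// Hypothetical shape of an agent.tool_called event payload. These field
// names are assumptions; check the real webhook payload before relying on them.
interface ToolCalledEvent {
  conversationId: string;
  toolCallId: string;
  slug: string;
  input: Record<string, unknown>;
}

// Body of the role "tool" message that is posted back to the conversation.
interface ToolMessage {
  role: 'tool';
  toolCallId: string;
  content: unknown;
}

type ToolHandler = (input: Record<string, unknown>) => Promise<unknown>;

// Registry keyed by tool slug. The implementations are stubs; a real
// service would query its own customer and transaction stores here.
const toolHandlers = new Map<string, ToolHandler>([
  ['search_customer', async (input) => ({ query: input, customers: [] })],
  ['list_recent_transactions', async (input) => {
    // Mirror the schema default of 20 when the LLM omits "limit".
    const limit = typeof input.limit === 'number' ? input.limit : 20;
    return { customerId: input.customerId, limit, transactions: [] };
  }],
]);

// Runs the matching handler and wraps its result as a tool message.
// Failures are returned as content so the LLM can see what went wrong.
async function handleToolCall(event: ToolCalledEvent): Promise<ToolMessage> {
  const handler = toolHandlers.get(event.slug);
  let content: unknown;
  if (!handler) {
    content = { error: `Unknown tool: ${event.slug}` };
  } else {
    try {
      content = await handler(event.input);
    } catch (err) {
      content = { error: err instanceof Error ? err.message : String(err) };
    }
  }
  return { role: 'tool', toolCallId: event.toolCallId, content };
}
```

The returned object is what your service would then POST to /api/conversations/{id}/messages. Reporting errors as tool content, instead of dropping the call, is a design choice that keeps the conversation from stalling on a tool result that never arrives.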
Conversations
A conversation is a multi-turn session with an agent. Every user message, LLM response, and tool call is stored in the conversation history.
POST /api/conversations
{
  "agentId": "agt_01HXY...",
  "context": {
    "investigatorId": "usr_01HXY...",
    "openCases": ["cas_01HXY..."]
  }
}
context is opaque per-conversation state passed into the system prompt.
Response (201):
{
  "data": {
    "conversation": {
      "id": "cnv_01HXY...",
      "agentId": "agt_01HXY...",
      "createdAt": "..."
    }
  }
}
Send a message
POST /api/conversations/{id}/messages
{
  "role": "user",
  "content": "Customer cus_01HXY... reported an unauthorized IDR 5M debit yesterday. Investigate."
}
The response is the LLM's reply (streamed if you send the Accept: text/event-stream header, as described under Streaming):
{
  "data": {
    "message": {
      "id": "msg_01HXY...",
      "role": "assistant",
      "content": "I'll search the customer's recent transactions and check for the IDR 5M debit. Let me look this up.",
      "toolCalls": [
        {
          "slug": "list_recent_transactions",
          "input": { "customerId": "cus_01HXY...", "limit": 30 }
        }
      ],
      "createdAt": "..."
    }
  }
}
When the LLM calls a tool, your service receives the call via agent.tool_called webhook (or polling), executes it, and POSTs the result back:
POST /api/conversations/{id}/messages
{
  "role": "tool",
  "toolCallId": "tc_...",
  "content": { "transactions": [...] }
}
The conversation then continues: the LLM uses the tool output to compose the next assistant message.
Streaming
Pass Accept: text/event-stream on the message send. Server-sent events stream the assistant's tokens as they're generated:
event: token
data: {"text": "I'll "}

event: token
data: {"text": "search "}

event: tool_call
data: {"slug": "list_recent_transactions", "input": {...}}

event: done
data: {"messageId": "msg_..."}

Any HTTP client that can read a streaming response body works. The browser EventSource API only supports GET, so in the browser use fetch with a streaming reader for the POST:
const res = await fetch(`/api/conversations/${conversationId}/messages`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream',
  },
  body: JSON.stringify({ role: 'user', content: text }),
});
if (!res.ok || !res.body) throw new Error(`Message send failed: ${res.status}`);
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  // Events end with a blank line. Keep the trailing, possibly incomplete
  // piece in the buffer and parse only the complete blocks.
  const blocks = buffer.split('\n\n');
  buffer = blocks.pop() ?? '';
  for (const block of blocks) {
    // Assumes each event is one "event:" line followed by one "data:" line.
    const [eventLine, dataLine] = block.split('\n');
    if (!dataLine) continue;
    const event = eventLine.slice('event: '.length);
    const data = JSON.parse(dataLine.slice('data: '.length));
    if (event === 'token') ui.appendText(data.text);
    if (event === 'tool_call') ui.showToolBubble(data.slug, data.input);
    if (event === 'done') ui.finalize(data.messageId);
  }
}
Continuation after a tool call
The assistant message that requested a tool call is incomplete until you post the tool result. The conversation will not auto-resume — your service must POST the tool message:
curl -X POST .../api/conversations/cnv_01HXY.../messages \
  -H "Authorization: Bearer $API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "role": "tool",
    "toolCallId": "tc_01HXY...",
    "content": { "transactions": [...] }
  }'
The server then re-runs inference with the tool result in context and emits the next assistant message (streamed if you re-subscribe to the stream, or returned synchronously).
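Putting the round trip together, a service that drives the conversation synchronously might loop until the assistant stops requesting tools. The following is a sketch under several assumptions: postMessage is a hypothetical wrapper around POST /api/conversations/{id}/messages that resolves with the assistant's reply, runTool is your own tool executor, each tool call is assumed to carry an id that the result echoes as toolCallId, and the reply to the last posted tool result is assumed to be the next assistant message.

```typescript
interface ToolCall {
  id: string; // assumed field; the tool result echoes it as toolCallId
  slug: string;
  input: Record<string, unknown>;
}

interface AssistantMessage {
  id: string;
  role: 'assistant';
  content: string;
  toolCalls?: ToolCall[];
}

type OutgoingMessage =
  | { role: 'user'; content: string }
  | { role: 'tool'; toolCallId: string; content: unknown };

// Hypothetical transport: posts one message, resolves with the assistant reply.
type PostMessage = (msg: OutgoingMessage) => Promise<AssistantMessage>;
// Your own executor for a requested tool call.
type RunTool = (call: ToolCall) => Promise<unknown>;

async function askWithTools(
  text: string,
  postMessage: PostMessage,
  runTool: RunTool,
  maxRounds = 8,
): Promise<AssistantMessage> {
  let reply = await postMessage({ role: 'user', content: text });
  let rounds = 0;
  for (;;) {
    const calls = reply.toolCalls ?? [];
    if (calls.length === 0) return reply; // no tools requested: final answer
    if (rounds >= maxRounds) {
      throw new Error(`Tool loop did not finish within ${maxRounds} rounds`);
    }
    rounds++;
    // Post one tool message per requested call. The conversation does not
    // auto-resume, so every call needs a result before inference continues.
    for (const call of calls) {
      const content = await runTool(call);
      reply = await postMessage({ role: 'tool', toolCallId: call.id, content });
    }
  }
}
```

The maxRounds guard is there because tool use in a conversation is unbounded: without it, a model that keeps requesting tools would keep the loop (and the token bill) running indefinitely.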
Conversation history
GET /api/conversations/{id}
GET /api/conversations
GET /api/conversations/{id} returns the full message history. GET /api/conversations?agentId=agt_...&limit=50 lists conversations for an agent.
When to use a conversation vs a flow
| Use a conversation | Use a flow |
|---|---|
| Multi-turn (analyst iterating with LLM) | Single-shot classification / extraction |
| Free-form input from a user | Structured input from your service |
| Tool use is unbounded (LLM decides) | Tool use is graph-defined (you decide) |
| Real-time UI (chat widget) | Async backend (queue → process → callback) |
Many integrations use both: a flow handles the deterministic backend pipeline; a conversation handles the analyst-facing investigation UI.
LLM cost on conversations
Each message records its token usage. Long conversations get expensive because the full history is sent with every LLM call. To trim context, summarize older turns into a single message once the conversation exceeds roughly 30 turns.
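One way to apply this trimming, as a sketch: once the history passes a threshold, collapse everything but the most recent messages into one summary message. The summarize callback is a placeholder for whatever produces the summary (for example another LLM call), the thresholds are illustrative, each message is counted as one turn, and the role given to the summary message is an assumption. The function works on a local copy of the history; whether the API lets you replace stored history is not covered here.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant' | 'tool';
  content: string;
}

// Returns the history unchanged while it is short; otherwise replaces the
// older messages with a single summary and keeps the recent ones verbatim.
function trimHistory(
  messages: ChatMessage[],
  summarize: (older: ChatMessage[]) => string, // hypothetical summarizer
  maxMessages = 30,
  keepRecent = 10,
): ChatMessage[] {
  if (messages.length <= maxMessages) return messages;

  let cut = Math.max(0, messages.length - keepRecent);
  // Do not start the kept window on a tool result: move the cut back so the
  // assistant message that requested the tool stays next to its result.
  while (cut > 0 && messages[cut]?.role === 'tool') cut--;

  const older = messages.slice(0, cut);
  if (older.length === 0) return messages;

  const summary: ChatMessage = {
    role: 'user', // assumption: the summary is injected as a user message
    content: `Summary of earlier turns: ${summarize(older)}`,
  };
  return [summary, ...messages.slice(cut)];
}
```

Keeping the most recent messages verbatim preserves the detail the LLM is most likely to need next, while the summary bounds the cost of everything older.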