RAG queries
Query a knowledge base with natural language. Axon embeds your question, retrieves the most relevant document chunks using vector similarity search, then generates a grounded answer with source citations. The request is fully sovereign and takes a single API call.
Query a knowledge base
bash
POST /v1/axon/knowledge-bases/:id/query

| Name | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The natural language question to answer. |
| model | string | No | Generation model. Defaults to "axon-sovereign-1". |
| top_k | integer | No | Number of chunks to retrieve. Default 5, max 20. |
| include_chunks | boolean | No | Include retrieved chunks in the response. Default true. |
| system_prompt | string | No | Override the system prompt for the grounding step. |
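
The parameter rules in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official Axon SDK: `build_query_body` is a hypothetical helper name, and the lower bound of 1 for `top_k` is an assumption, since the table documents only the default and the maximum.

```python
from typing import Any, Dict, Optional

MAX_TOP_K = 20  # documented maximum for top_k


def build_query_body(
    query: str,
    top_k: int = 5,
    include_chunks: bool = True,
    model: Optional[str] = None,
    system_prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/axon/knowledge-bases/:id/query."""
    if not query.strip():
        raise ValueError("query is required and must not be empty")
    # Assumption: a lower bound of 1; only the maximum (20) is documented.
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}")

    body: Dict[str, Any] = {
        "query": query,
        "top_k": top_k,
        "include_chunks": include_chunks,
    }
    # Optional fields are left out so the server applies its own defaults.
    if model is not None:
        body["model"] = model
    if system_prompt is not None:
        body["system_prompt"] = system_prompt
    return body
```

Serialising the returned dictionary with `json.dumps` gives a payload of the same shape as the `-d` argument in the curl example.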
bash
curl -X POST https://api.hldgroup.org/v1/axon/knowledge-bases/kb_01hxyz/query \
-H "x-internal-secret: <key>" \
-H "x-tenant-id: ten_01hxyz" \
-H "x-user-id: usr_01hxyz" \
-H "x-platform-role: tenant-standard-user" \
-H "Content-Type: application/json" \
-d '{
"query": "What are the first 3 steps when ransomware is detected on an endpoint?",
"top_k": 5,
"include_chunks": true
}'
json
{
"data": {
"id": "qry_01hxyz",
"knowledge_base_id": "kb_01hxyz",
"knowledge_base_name": "Security runbooks",
"model": "axon-sovereign-1",
"data_residency": "au",
"sovereign": true,
"query": "What are the first 3 steps when ransomware is detected on an endpoint?",
"answer": "Based on your security runbooks, the first three steps are: (1) Immediately isolate the affected endpoint from the network using Sentinel device isolation. (2) Preserve evidence by capturing a memory dump via the forensics API before any remediation. (3) Notify the incident response team and open a critical-severity incident in HomeBase.",
"citations": [
{
"document_id": "doc_01hxyz",
"filename": "ransomware-response-v3.md",
"chunk_index": 2,
"relevance_score": 0.94
}
],
"chunks": [
{
"document_id": "doc_01hxyz",
"filename": "ransomware-response-v3.md",
"chunk_index": 2,
"text": "## Immediate containment\n1. Isolate the device..."
}
],
"usage": {
"prompt_tokens": 912,
"completion_tokens": 87,
"total_tokens": 999
}
}
}
Grounding prompt pattern
By default Axon uses a conservative grounding prompt that instructs the model to answer only from the retrieved context and clearly state when information is not in the knowledge base. Override with system_prompt to adjust tone or scope:
json
{
"query": "Summarise our patching policy",
"system_prompt": "You are a friendly IT helpdesk assistant. Answer from the provided documents in plain language. If the answer is not in the documents, say so clearly and suggest the user contact IT."
}
Multi-turn RAG chat
For conversational RAG (where follow-up questions reference prior context), use the completions endpoint with knowledge_base_id and pass the full message history:
bash
POST /v1/axon/completions
{
"model": "axon-sovereign-1",
"knowledge_base_id": "kb_01hxyz",
"messages": [
{ "role": "user", "content": "What are the steps for ransomware response?" },
{ "role": "assistant", "content": "The steps are: 1) Isolate..." },
{ "role": "user", "content": "How long does step 2 typically take?" }
]
}
Tip:
relevance_score in citations indicates how closely the chunk matched the query, on a scale from 0 to 1. Scores below 0.6 typically indicate that the knowledge base doesn't contain a strong answer; consider prompting the user to refine their question, or expanding the knowledge base.
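
One way to act on this threshold is to inspect the citations in a parsed response before showing the answer. The sketch below assumes the response shape from the example earlier on this page. The helper names `best_relevance` and `needs_refinement` are hypothetical, and judging the answer by its highest-scoring citation is an assumed policy, not documented Axon behaviour.

```python
from typing import Any, Dict

WEAK_MATCH_THRESHOLD = 0.6  # threshold suggested in the tip


def best_relevance(response: Dict[str, Any]) -> float:
    """Return the highest relevance_score among citations, or 0.0 if none."""
    citations = response.get("data", {}).get("citations", [])
    return max((c["relevance_score"] for c in citations), default=0.0)


def needs_refinement(
    response: Dict[str, Any], threshold: float = WEAK_MATCH_THRESHOLD
) -> bool:
    """True when no citation reaches the threshold (assumed policy)."""
    return best_relevance(response) < threshold


# Trimmed copy of the example response shown earlier.
example = {
    "data": {
        "answer": "Based on your security runbooks, ...",
        "citations": [
            {
                "document_id": "doc_01hxyz",
                "filename": "ransomware-response-v3.md",
                "chunk_index": 2,
                "relevance_score": 0.94,
            }
        ],
    }
}

if needs_refinement(example):
    print("Weak match: ask the user to rephrase the question.")
else:
    print("Strong match: show the answer with its citations.")
```

A response with an empty citations list is treated as a weak match here, which seems the safer default for a grounded-answer interface.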