RAG queries

Query a knowledge base with natural language. Axon embeds your question, retrieves the most relevant document chunks using vector similarity search, and then generates a grounded answer with source citations. The whole flow is fully sovereign and takes a single API call.

Query a knowledge base

```bash
POST /v1/axon/knowledge-bases/:id/query
```
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | The natural language question to answer. |
| model | string | No | Generation model. Defaults to "axon-sovereign-1". |
| top_k | integer | No | Number of chunks to retrieve. Default 5, max 20. |
| include_chunks | boolean | No | Include retrieved chunks in the response. Default true. |
| system_prompt | string | No | Override the system prompt for the grounding step. |
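The constraints in the table can be checked on the client before a request is sent. The following is a minimal bash sketch, not part of the Axon API: `axon_query_body` and `json_escape` are hypothetical helper names, the lower bound of 1 for top_k is an assumption (the table only gives the default and the maximum), and the escaping handles only backslashes, double quotes, newlines, carriage returns and tabs. model and system_prompt are left out for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for building a /query request body; not part of the Axon API.

# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s=$1
  # Literal search characters and their JSON escape sequences.
  local bs='\' dq='"' e_bs='\\' e_dq='\"' e_n='\n' e_r='\r' e_t='\t'
  s=${s//"$bs"/$e_bs}   # backslashes first, so later escapes are not doubled
  s=${s//"$dq"/$e_dq}
  s=${s//$'\n'/$e_n}
  s=${s//$'\r'/$e_r}
  s=${s//$'\t'/$e_t}
  printf '%s' "$s"
}

# Usage: axon_query_body QUERY [TOP_K] [INCLUDE_CHUNKS]
# Defaults mirror the documented ones: top_k=5, include_chunks=true.
axon_query_body() {
  local query=$1 top_k=${2:-5} include_chunks=${3:-true}
  # 1 (assumed minimum) to 20 (documented maximum), no leading zeros.
  local top_k_re='^([1-9]|1[0-9]|20)$'

  if [ -z "$query" ]; then
    echo "query is required" >&2
    return 1
  fi
  if ! [[ $top_k =~ $top_k_re ]]; then
    echo "top_k must be an integer from 1 to 20, got: $top_k" >&2
    return 1
  fi
  if [ "$include_chunks" != true ] && [ "$include_chunks" != false ]; then
    echo "include_chunks must be true or false, got: $include_chunks" >&2
    return 1
  fi

  printf '{"query":"%s","top_k":%s,"include_chunks":%s}\n' \
    "$(json_escape "$query")" "$top_k" "$include_chunks"
}
```

The output can be piped into the curl request shown below by replacing its inline `-d '{...}'` with `-d @-`, which makes curl read the body from standard input.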
```bash
curl -X POST https://api.hldgroup.org/v1/axon/knowledge-bases/kb_01hxyz/query \
  -H "x-internal-secret: <key>" \
  -H "x-tenant-id: ten_01hxyz" \
  -H "x-user-id: usr_01hxyz" \
  -H "x-platform-role: tenant-standard-user" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the first 3 steps when ransomware is detected on an endpoint?",
    "top_k": 5,
    "include_chunks": true
  }'
```
```json
{
  "data": {
    "id": "qry_01hxyz",
    "knowledge_base_id": "kb_01hxyz",
    "knowledge_base_name": "Security runbooks",
    "model": "axon-sovereign-1",
    "data_residency": "au",
    "sovereign": true,
    "query": "What are the first 3 steps when ransomware is detected on an endpoint?",
    "answer": "Based on your security runbooks, the first three steps are: (1) Immediately isolate the affected endpoint from the network using Sentinel device isolation. (2) Preserve evidence by capturing a memory dump via the forensics API before any remediation. (3) Notify the incident response team and open a critical-severity incident in HomeBase.",
    "citations": [
      {
        "document_id": "doc_01hxyz",
        "filename": "ransomware-response-v3.md",
        "chunk_index": 2,
        "relevance_score": 0.94
      }
    ],
    "chunks": [
      {
        "document_id": "doc_01hxyz",
        "filename": "ransomware-response-v3.md",
        "chunk_index": 2,
        "text": "## Immediate containment\n1. Isolate the device..."
      }
    ],
    "usage": {
      "prompt_tokens": 912,
      "completion_tokens": 87,
      "total_tokens": 999
    }
  }
}
```

Grounding prompt pattern

By default, Axon uses a conservative grounding prompt that instructs the model to answer only from the retrieved context and to state clearly when information is not in the knowledge base. Override it with system_prompt to adjust tone or scope:

```json
{
  "query": "Summarise our patching policy",
  "system_prompt": "You are a friendly IT helpdesk assistant. Answer from the provided documents in plain language. If the answer is not in the documents, say so clearly and suggest the user contact IT."
}
```

Multi-turn RAG chat

For conversational RAG (where follow-up questions reference prior context), use the completions endpoint with knowledge_base_id and pass the full message history:

```bash
POST /v1/axon/completions
{
  "model": "axon-sovereign-1",
  "knowledge_base_id": "kb_01hxyz",
  "messages": [
    { "role": "user", "content": "What are the steps for ransomware response?" },
    { "role": "assistant", "content": "The steps are: 1) Isolate..." },
    { "role": "user", "content": "How long does step 2 typically take?" }
  ]
}
```
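Because every request carries the full history, the client rebuilds the messages array on each turn. A minimal bash sketch, assuming the body shape shown above; `axon_chat_body` and `json_escape` are hypothetical helpers, and the escaping covers only backslashes, double quotes, newlines, carriage returns and tabs.

```bash
#!/usr/bin/env bash
# Hypothetical helper for a multi-turn /v1/axon/completions body; not part of the Axon API.

# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s=$1
  local bs='\' dq='"' e_bs='\\' e_dq='\"' e_n='\n' e_r='\r' e_t='\t'
  s=${s//"$bs"/$e_bs}   # backslashes first, so later escapes are not doubled
  s=${s//"$dq"/$e_dq}
  s=${s//$'\n'/$e_n}
  s=${s//$'\r'/$e_r}
  s=${s//$'\t'/$e_t}
  printf '%s' "$s"
}

# Usage: axon_chat_body KB_ID ROLE CONTENT [ROLE CONTENT ...]
# Pass every turn in order, oldest first, ending with the newest user question.
axon_chat_body() {
  # Require a knowledge base ID plus at least one complete ROLE CONTENT pair.
  if [ "$#" -lt 3 ] || [ $(( ($# - 1) % 2 )) -ne 0 ]; then
    echo "usage: axon_chat_body KB_ID ROLE CONTENT [ROLE CONTENT ...]" >&2
    return 1
  fi
  local kb_id=$1 messages="" sep=""
  shift
  while [ "$#" -gt 0 ]; do
    messages+="${sep}{\"role\":\"$(json_escape "$1")\",\"content\":\"$(json_escape "$2")\"}"
    sep=","
    shift 2
  done
  printf '{"model":"axon-sovereign-1","knowledge_base_id":"%s","messages":[%s]}\n' \
    "$(json_escape "$kb_id")" "$messages"
}
```

To continue a conversation, append the previous answer as an `assistant` turn and the follow-up as a new `user` turn, then send the complete body again.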
Tip: relevance_score in citations indicates how closely the chunk matched the query (0–1). Scores below 0.6 typically indicate that the knowledge base doesn't contain a strong answer; consider prompting the user to refine their question, or expanding the knowledge base.
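A client can apply this rule of thumb before presenting an answer. The sketch below takes the threshold and the relevance_score values from citations as arguments; `weak_match` is a hypothetical helper, and the 0.6 cut-off is the guideline from the tip rather than a behaviour enforced by the API.

```bash
#!/usr/bin/env bash
# Hypothetical client-side check; the API itself does not apply any threshold.

# Usage: weak_match THRESHOLD [SCORE ...]
# Succeeds (exit status 0) when no citation scores at or above THRESHOLD,
# i.e. when even the best match suggests the knowledge base lacks a strong answer.
weak_match() {
  local threshold=$1
  shift
  # No citations at all is treated as a weak match.
  [ "$#" -gt 0 ] || return 0
  printf '%s\n' "$@" | awk -v t="$threshold" '
    NR == 1 || $1 + 0 > best { best = $1 + 0 }   # track the highest score
    END { exit !(best < t + 0) }                 # exit 0 only if best < threshold
  '
}
```

If `jq` is installed, the scores can be read from a saved response, for example `weak_match 0.6 $(jq -r '.data.citations[].relevance_score' response.json)`.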