
Chat API

The Chat API is the Hono.js backend service that powers the fiskaly Workspace assistant. It streams responses over server-sent events (SSE), retrieves RAG context through Vertex AI, and applies a security layer to both the public chat and the admin dashboard.

  • Streaming responses — Server-Sent Events (SSE) deliver text chunks with low latency, along with structured metadata.
  • RAG Grounding — Context is retrieved from 5 integrated sources (Docs MDX, OpenAPI, Zendesk KB, Web, and PDFs).
  • Dual Models — Requests are routed to Gemini 2.5 Pro (complex queries) or Gemini 2.0 Flash (simple queries and greetings).
  • Persona System — Responses are tailored to developers, product managers, or retail operators, each with its own fallback behavior.

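To use the Chat API, you typically create an anonymous session and then send a POST request to the /api/chat streaming endpoint, reading the response as an SSE stream. The browser's native EventSource only issues GET requests and cannot set an Authorization header, so clients generally read the stream with fetch instead.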

POST /api/session

Returns a JWT session token required for rate limiting and continuity.

POST /api/chat
Authorization: Bearer <session_token>
Content-Type: application/json

{
  "message": "How do I create a TSS in SIGN DE?",
  "persona": "developer",
  "history": []
}

The response is an SSE stream in which each event carries a JSON payload on a data: line. The stream contains both text chunks and metadata, such as retrieved citations and the final quality score.
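The request and the stream handling described above can be sketched in TypeScript. This is a minimal sketch, not the official client: `baseUrl`, `buildChatRequest`, `SseJsonParser`, and `ChatEvent` are hypothetical names, the fields inside each payload are not documented here and are therefore left untyped, and the parser assumes every data: payload is valid JSON.

```typescript
// Request shape follows the POST /api/chat example; `baseUrl` is a hypothetical parameter.
interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  sessionToken: string,
  message: string,
  persona: string,
  history: unknown[] = [], // item shape is not documented, so it stays untyped
): ChatRequest {
  return {
    url: `${baseUrl}/api/chat`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${sessionToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ message, persona, history }),
    },
  };
}

// Assumption: payload fields are unknown, so each event is a generic JSON object.
type ChatEvent = Record<string, unknown>;

class SseJsonParser {
  private buffer = "";

  // Feed one decoded text chunk; returns the JSON payload of every event
  // that this chunk completed. Incomplete events stay buffered.
  push(chunk: string): ChatEvent[] {
    // Normalize CRLF on the whole buffer so a "\r\n" split across chunks is handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events: ChatEvent[] = [];
    let boundary = this.buffer.indexOf("\n\n"); // a blank line ends an SSE event
    while (boundary !== -1) {
      const rawEvent = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      const data = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one optional space
        .join("\n");
      if (data !== "") {
        events.push(JSON.parse(data) as ChatEvent); // throws if a payload is not JSON
      }
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

In a browser or in Node 18+, one would pass `url` and `init` to `fetch`, decode the bytes of `response.body` with a `TextDecoder`, and feed each decoded chunk to `push`. Buffering across chunks matters because network chunk boundaries do not align with event boundaries.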

The Chat API includes strict guardrails for production use:

  • Rate limiting — 5 messages per minute, 30 per hour per session.
  • Input filtering — Jailbreak detection and length validation (max 3000 chars per message).
  • Output filtering — PII scanning and groundedness verification.
  • Budget guard — A configurable daily spend limit across the entire tenant, preventing unexpected LLM costs.
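To illustrate how the rate limit and the length check from the list above could be enforced, here is a minimal in-memory sketch. The numeric limits come from the list; the sliding-window approach, the class and function names, and the in-memory `Map` are assumptions for illustration, and the actual service may work differently (for example with a shared store).

```typescript
// Limits taken from the guardrail list; everything else here is illustrative.
const LIMITS = [
  { windowMs: 60 * 1000, max: 5 }, // 5 messages per minute
  { windowMs: 60 * 60 * 1000, max: 30 }, // 30 messages per hour
];
const MAX_MESSAGE_CHARS = 3000;

class SessionRateLimiter {
  // sessionId -> timestamps (ms) of accepted messages, oldest first
  private readonly accepted = new Map<string, number[]>();

  // Returns true and records the message if the session is under every limit.
  allow(sessionId: string, nowMs: number): boolean {
    const longestWindow = Math.max(...LIMITS.map((limit) => limit.windowMs));
    // Forget timestamps that can no longer affect any window.
    const recent = (this.accepted.get(sessionId) ?? []).filter(
      (t) => nowMs - t < longestWindow,
    );
    const underAllLimits = LIMITS.every(
      (limit) => recent.filter((t) => nowMs - t < limit.windowMs).length < limit.max,
    );
    if (underAllLimits) {
      recent.push(nowMs);
    }
    this.accepted.set(sessionId, recent);
    return underAllLimits;
  }
}

// Note: String.length counts UTF-16 code units, which may differ from
// how the service counts "chars".
function isWithinLengthLimit(message: string): boolean {
  return message.length <= MAX_MESSAGE_CHARS;
}
```

Rejected messages are not recorded in this sketch, so a client that keeps retrying while limited does not extend its own lockout.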

The RAG knowledge base is automatically re-indexed daily at 3:00 AM UTC via a Kubernetes CronJob. This ensures that new or updated documentation, Zendesk articles, and API specs are reflected in chat responses within 24 hours.
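The "within 24 hours" bound follows directly from the daily schedule. As a small worked sketch, the helper below computes the next 03:00 UTC run after a given instant, which is when a content change made at that instant would be picked up. The function names are hypothetical, and it assumes the CronJob fires exactly once per day at 03:00 UTC (a cron expression such as `0 3 * * *` evaluated in UTC), ignoring the indexing run time itself.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

// Next 03:00 UTC strictly after `now`.
function nextReindex(now: Date): Date {
  const next = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), 3, 0, 0, 0),
  );
  if (next.getTime() <= now.getTime()) {
    // Today's run has already started, so the change waits for tomorrow's run.
    next.setUTCDate(next.getUTCDate() + 1);
  }
  return next;
}

// Hours a change made at `now` waits before the next scheduled re-index.
function hoursUntilReindex(now: Date): number {
  return (nextReindex(now).getTime() - now.getTime()) / MS_PER_HOUR;
}
```

A change published just after 03:00 UTC waits almost a full day, while one published at 02:00 UTC waits one hour.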

The chat-api service also hosts an internal React SPA at /admin/*, secured by Google OAuth. The dashboard provides:

  • Conversation review and quality tagging.
  • Content improvement Action Items (Todos).
  • LLM prompt overrides based on keyword triggers.
  • Usage, cost, and budget analytics.
