Chat API

The Chat API is the backend Hono.js service that powers the fiskaly Workspace assistant. It provides response streaming over server-sent events (SSE), retrieval-augmented generation (RAG) with context retrieval through Vertex AI, and a security layer that protects both the public chat and the admin dashboard.

Core Capabilities

Streaming responses: Server-Sent Events (SSE) deliver response text incrementally with low latency, alongside structured metadata.
RAG grounding: Answers are grounded in context retrieved from five integrated sources (Docs MDX, OpenAPI specs, the Zendesk knowledge base, web pages, and PDFs).
Dual models: Each request is routed either to Gemini 2.5 Pro (complex queries) or to Gemini 2.0 Flash (simple queries and greetings).
Persona system: Responses are tailored to developers, product managers, or retail operators, and each persona has its own fallback behavior.
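The page does not say how a query is classified as simple or complex. The sketch below is a purely hypothetical routing heuristic: short greetings and thank-yous go to Flash, and everything else goes to Pro. The function name `chooseModel`, the greeting pattern, and the word threshold are all illustrative assumptions. The model identifiers are the public Vertex AI model names, and the service may refer to them differently.

```typescript
// HYPOTHETICAL routing heuristic; the Chat API's real classification logic is not documented.
type ModelId = "gemini-2.5-pro" | "gemini-2.0-flash";

// Messages that start with a common greeting or thank-you phrase.
const GREETING_START = /^(hi|hello|hey|thanks|thank you|good (morning|afternoon|evening))\b/i;
// Greetings longer than this are assumed to contain a real question.
const MAX_GREETING_WORDS = 5;

function chooseModel(message: string): ModelId {
  const text = message.trim();
  const wordCount = text.split(/\s+/).filter((word) => word.length > 0).length;
  // Short greetings go to the faster, cheaper model.
  if (GREETING_START.test(text) && wordCount <= MAX_GREETING_WORDS) {
    return "gemini-2.0-flash";
  }
  // Everything else is treated as a potentially complex product question.
  return "gemini-2.5-pro";
}
```

With this heuristic, "Hi there!" is routed to Flash, while "How do I create a TSS in SIGN DE?" is routed to Pro. A production router would more likely use a classifier than a regular expression. The heuristic is only meant to show where the routing decision sits.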

API Integration

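To use the Chat API, first create an anonymous session, then send a POST request to the /api/chat streaming endpoint and read the SSE stream from the response body. The browser EventSource API is not suitable here: it only issues GET requests and cannot set an Authorization header.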

1. Create a Session

POST /api/session

Returns a JWT session token. Send it as a Bearer token on subsequent requests. The API uses it for per-session rate limiting and conversation continuity.
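A minimal TypeScript sketch of this step follows. It assumes the response body is JSON with the token in a `token` field. That field name is an assumption, so check it against the actual /api/session response. The helper name `createSession` and the injectable `fetchImpl` parameter are hypothetical. The parameter lets the call be exercised without a network.

```typescript
// Minimal subset of the fetch Response that this sketch relies on.
interface JsonResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (url: string, init: { method: string }) => Promise<JsonResponse>;

// Requests an anonymous session and returns its JWT.
// ASSUMPTION: the response body is JSON of the form { "token": "<jwt>" }.
async function createSession(baseUrl: string, fetchImpl: FetchLike = fetch): Promise<string> {
  const res = await fetchImpl(`${baseUrl}/api/session`, { method: "POST" });
  if (!res.ok) {
    throw new Error(`POST /api/session failed with HTTP ${res.status}`);
  }
  const body = (await res.json()) as { token?: unknown };
  if (typeof body.token !== "string" || body.token.length === 0) {
    throw new Error("Session response did not include a token");
  }
  return body.token;
}
```

In a browser or in Node 18+, call it as `const token = await createSession(baseUrl)`, where `baseUrl` is the Chat API host. Then pass `token` in the `Authorization: Bearer` header of the chat request.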

2. Stream a Conversation

POST /api/chat
Authorization: Bearer <session_token>
Content-Type: application/json

{
  "message": "How do I create a TSS in SIGN DE?",
  "persona": "developer",
  "history": []
}

The response is an SSE stream in which each event carries a JSON payload on a line prefixed with data:. The stream contains both text chunks and metadata, such as retrieved citations or the final quality score.
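The following is a sketch of a framework-free client for this endpoint. The request needs a POST, a JSON body, and an Authorization header, which EventSource cannot send, so the sketch reads the response body with `fetch` and a stream reader. It then splits the text into SSE events at blank lines. `SseParser` and `streamChat` are hypothetical names. This page does not document the payload schema, so each payload is passed to the callback as parsed JSON of type `unknown`. The sketch also assumes that every `data:` payload is valid JSON.

```typescript
// Splits an SSE text stream into the `data:` payloads of complete events.
// Assumes `\n` or `\r\n` line endings; an event ends at a blank line.
class SseParser {
  private buffer = "";

  push(chunk: string): string[] {
    // Normalizing the whole buffer also handles a `\r\n` split across two chunks.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const payloads: string[] = [];
    let end = this.buffer.indexOf("\n\n");
    while (end !== -1) {
      const rawEvent = this.buffer.slice(0, end);
      this.buffer = this.buffer.slice(end + 2);
      // Keep only data: lines; several data: lines in one event are joined with newlines.
      const dataLines = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice("data:".length).replace(/^ /, ""));
      if (dataLines.length > 0) {
        payloads.push(dataLines.join("\n"));
      }
      end = this.buffer.indexOf("\n\n");
    }
    return payloads;
  }
}

// Sends one chat message and calls onPayload for every parsed JSON event.
async function streamChat(
  baseUrl: string,
  sessionToken: string,
  message: string,
  onPayload: (payload: unknown) => void,
): Promise<void> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${sessionToken}`,
      "Content-Type": "application/json",
      Accept: "text/event-stream",
    },
    body: JSON.stringify({ message, persona: "developer", history: [] }),
  });
  if (!res.ok || res.body === null) {
    throw new Error(`POST /api/chat failed with HTTP ${res.status}`);
  }

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const parser = new SseParser();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte UTF-8 characters intact across chunk boundaries.
    for (const data of parser.push(decoder.decode(value, { stream: true }))) {
      onPayload(JSON.parse(data));
    }
  }
  // Flush any bytes still buffered in the decoder.
  for (const data of parser.push(decoder.decode())) {
    onPayload(JSON.parse(data));
  }
}
```

Keeping the parser separate from the network code makes it easy to test with hand-written chunks. It also ensures that an event split across two network reads is only delivered once it is complete.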

💡Using React?

If you are building a React application, we provide a complete, drop-in UI library. See the Chat UI Components documentation instead of building the SSE client from scratch.

Security and Limits

The Chat API includes strict guardrails for production use:

Rate limiting: 5 messages per minute and 30 messages per hour, per session.
Input filtering: Jailbreak detection and length validation (maximum 3,000 characters per message).
Output filtering: PII scanning and groundedness verification.
Budget guard: A configurable daily spending limit, applied across the entire tenant, prevents unexpected LLM costs.
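To illustrate how the two per-session limits interact with the length check, here is a hypothetical in-memory sliding-window sketch. The page states the limits (5 per minute, 30 per hour, 3,000 characters) but not the algorithm or storage the service uses, so `SessionLimiter` and its behavior are assumptions. For example, a real deployment with several replicas would need a shared store.

```typescript
// HYPOTHETICAL sketch: in-memory sliding windows per session. The real service's
// algorithm and storage are not documented.
const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;
const WINDOWS = [
  { windowMs: MINUTE_MS, maxMessages: 5 },
  { windowMs: HOUR_MS, maxMessages: 30 },
];
// String.length counts UTF-16 code units, which can differ from user-perceived characters.
const MAX_MESSAGE_LENGTH = 3000;

class SessionLimiter {
  // Timestamps (ms) of accepted messages per session id, oldest first.
  private readonly accepted = new Map<string, number[]>();

  // Returns true and records the message if it passes the length check and both windows.
  tryAccept(sessionId: string, message: string, now: number = Date.now()): boolean {
    // Over-long messages are rejected without counting against the rate limit.
    if (message.length > MAX_MESSAGE_LENGTH) return false;

    // Drop timestamps older than the longest window.
    const recent = (this.accepted.get(sessionId) ?? []).filter((t) => now - t < HOUR_MS);
    this.accepted.set(sessionId, recent);

    for (const { windowMs, maxMessages } of WINDOWS) {
      const inWindow = recent.filter((t) => now - t < windowMs).length;
      if (inWindow >= maxMessages) return false;
    }
    recent.push(now);
    return true;
  }
}
```

The hourly window matters when a session spaces messages out. For example, one message per minute never trips the per-minute limit, but the 31st message within an hour is still rejected.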

Content Re-indexing

The RAG knowledge base is automatically re-indexed daily at 3:00 AM UTC via a Kubernetes CronJob. This ensures that new or updated documentation, Zendesk articles, and API specs are reflected in chat responses within 24 hours.
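To make the "within 24 hours" bound concrete, the helper below computes the first scheduled run after a content change. It assumes the CronJob schedule is equivalent to `0 3 * * *` evaluated in UTC. It also assumes that a run picks up only changes made before it starts, and it ignores how long the indexing job itself takes. The function names are hypothetical.

```typescript
// ASSUMPTION: the CronJob runs once a day at 03:00 UTC (cron schedule `0 3 * * *` in UTC).
const REINDEX_HOUR_UTC = 3;

// Returns the first scheduled re-index strictly after `changedAt`.
function nextReindexRun(changedAt: Date): Date {
  const run = new Date(
    Date.UTC(changedAt.getUTCFullYear(), changedAt.getUTCMonth(), changedAt.getUTCDate(), REINDEX_HOUR_UTC),
  );
  if (run.getTime() <= changedAt.getTime()) {
    // Today's run has already started, so the change waits for tomorrow's run.
    run.setUTCDate(run.getUTCDate() + 1);
  }
  return run;
}

// Hours until the change is indexed. This is at most 24, reached when a change
// lands exactly as a run starts.
function hoursUntilReindex(changedAt: Date): number {
  return (nextReindexRun(changedAt).getTime() - changedAt.getTime()) / 3_600_000;
}
```

For example, an article updated at 15:00 UTC is picked up 12 hours later, while one updated at 02:59 UTC is picked up a minute later.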

Admin Dashboard

The chat-api service also hosts an internal React SPA at /admin/*, secured by Google OAuth. The dashboard provides:

Conversation review and quality tagging.
Action items (to-dos) for improving content.
LLM prompt overrides triggered by keywords.
Usage, cost, and budget analytics.
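Keyword-triggered prompt overrides could work along the lines of the following sketch. It is hypothetical: the dashboard's override data model is not documented, and `PromptOverride` and `applyPromptOverrides` are illustrative names. The sketch appends each matching override's instruction to the base prompt. Whether the real service appends to the prompt or replaces part of it is not stated on this page.

```typescript
// HYPOTHETICAL data model; the admin dashboard's actual override format is not documented.
interface PromptOverride {
  keywords: string[];
  instruction: string;
}

// Appends the instruction of every override whose keyword occurs in the message.
// Matching is a case-insensitive substring check, so "tss" would also match "tssx";
// a real implementation might match whole words or phrases instead.
function applyPromptOverrides(
  basePrompt: string,
  userMessage: string,
  overrides: PromptOverride[],
): string {
  const text = userMessage.toLowerCase();
  const matched = overrides.filter((override) =>
    override.keywords.some((keyword) => text.includes(keyword.toLowerCase())),
  );
  // With no matches, this returns basePrompt unchanged.
  return [basePrompt, ...matched.map((override) => override.instruction)].join("\n\n");
}
```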

What’s Next

Chat UI Components

Drop-in React components for the fiskaly Chat Widget and Full-Page experience.

Backend Source Code

View the underlying API source code, including Hono routes and the RAG pipeline.
