Chat API
The Chat API is the Hono.js backend service that powers the fiskaly Workspace assistant. It provides Server-Sent Events (SSE) streaming, retrieval-augmented generation (RAG) context retrieval using Vertex AI, and a security layer for both the public chat and the admin dashboard.
Core Capabilities
- Streaming responses: Server-Sent Events (SSE) deliver response text incrementally with low latency, along with structured metadata.
- RAG grounding: Context is retrieved from five integrated sources (Docs MDX, OpenAPI, Zendesk KB, Web, and PDFs).
- Dual models: Requests are routed to Gemini 2.5 Pro for complex queries and to Gemini 2.0 Flash for simple queries and greetings.
- Persona system: Responses are tailored for developers, product managers, or retail operators, each with its own fallback behavior.
API Integration
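To use the Chat API, you typically create an anonymous session and then open an SSE connection to the /api/chat streaming endpoint. Because that endpoint expects a POST request with an Authorization header, which the browser's native EventSource cannot send, use a fetch-based SSE client.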
1. Create a Session
    POST /api/session

Returns a JWT session token, which is required for rate limiting and conversation continuity.
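The session step might be sketched as follows. This is a minimal example under stated assumptions: the page only says that a JWT is returned, so the `token` field name in the response body is hypothetical, as is the `baseUrl` parameter that stands in for your deployment's origin.

```typescript
// Hypothetical response shape: the docs only state that a JWT is returned,
// so the `token` field name is an assumption.
interface SessionResponse {
  token: string;
}

// Validate the parsed JSON body and pull out the session token.
function extractToken(body: unknown): string {
  if (
    typeof body === "object" &&
    body !== null &&
    typeof (body as Partial<SessionResponse>).token === "string"
  ) {
    return (body as SessionResponse).token;
  }
  throw new Error("Session response did not contain a token");
}

// Create an anonymous session. `baseUrl` is a placeholder for your deployment.
async function createSession(baseUrl: string): Promise<string> {
  const res = await fetch(`${baseUrl}/api/session`, { method: "POST" });
  if (!res.ok) {
    throw new Error(`Session request failed with status ${res.status}`);
  }
  return extractToken(await res.json());
}
```

Keeping the validation in a separate function makes it easy to adjust once you know the actual response shape.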
2. Stream a Conversation
    POST /api/chat
    Authorization: Bearer <session_token>
    Content-Type: application/json

    {
      "message": "How do I create a TSS in SIGN DE?",
      "persona": "developer",
      "history": []
    }

The response is an SSE stream that emits JSON payloads, each prefixed with data:. The stream contains both text chunks and metadata, such as retrieved citations or the final quality score.
If you are building a React application, we provide a complete, drop-in UI library. See the Chat UI Components documentation instead of building the SSE client from scratch.
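If you do write your own client, a fetch-based reader is one option. The sketch below assumes standard SSE framing (events separated by a blank line, payload on `data:` lines) and treats each payload as opaque JSON, because the payload schema is not described on this page. The function names and the `baseUrl` parameter are hypothetical.

```typescript
// Split buffered SSE text into parsed payloads and an incomplete remainder.
// Events are separated by a blank line; `data:` lines carry the payload.
function parseSseBuffer(buffer: string): { payloads: unknown[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // the last block may still be incomplete
  const payloads: unknown[] = [];
  for (const block of blocks) {
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data !== "") payloads.push(JSON.parse(data));
  }
  return { payloads, rest };
}

// POST a message and hand each decoded payload to `onPayload` as it arrives.
async function streamChat(
  baseUrl: string,
  token: string,
  message: string,
  onPayload: (payload: unknown) => void,
): Promise<void> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ message, persona: "developer", history: [] }),
  });
  if (!res.ok || res.body === null) {
    throw new Error(`Chat request failed with status ${res.status}`);
  }
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseBuffer(buffer);
    buffer = parsed.rest;
    parsed.payloads.forEach(onPayload);
  }
}
```

The buffer is needed because a network chunk can end in the middle of an event; only complete events are parsed, and the remainder is carried over to the next read.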
Security and Limits
The Chat API includes strict guardrails for production use:
- Rate limiting: 5 messages per minute and 30 messages per hour, per session.
- Input filtering: Jailbreak detection and length validation (maximum 3,000 characters per message).
- Output filtering: PII scanning and groundedness verification.
- Budget guard: A configurable daily spend limit across the entire tenant, which prevents unexpected LLM costs.
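To make the rate and length limits concrete, here is a minimal in-memory sketch. It is not the service's implementation: the page does not say whether the limits use fixed or sliding windows, so a sliding window is assumed, and the class and function names are hypothetical.

```typescript
// Limits taken from the list above.
const MAX_MESSAGE_CHARS = 3000;
const LIMITS = [
  { windowMs: 60 * 1000, max: 5 }, // 5 messages per minute
  { windowMs: 60 * 60 * 1000, max: 30 }, // 30 messages per hour
];

// Sliding-window limiter keyed by session ID (assumption: single process,
// in-memory state; a real deployment would likely need shared storage).
class SessionRateLimiter {
  private readonly timestamps = new Map<string, number[]>();

  // Returns true and records the message if the session is within all limits.
  allow(sessionId: string, now: number = Date.now()): boolean {
    const longest = Math.max(...LIMITS.map((limit) => limit.windowMs));
    // Drop timestamps that are older than the longest window.
    const recent = (this.timestamps.get(sessionId) ?? []).filter(
      (t) => now - t < longest,
    );
    const withinLimits = LIMITS.every(
      ({ windowMs, max }) =>
        recent.filter((t) => now - t < windowMs).length < max,
    );
    if (withinLimits) recent.push(now);
    this.timestamps.set(sessionId, recent);
    return withinLimits;
  }
}

// Length check; `length` counts UTF-16 code units, which may differ from
// how the service counts characters.
function isValidLength(message: string): boolean {
  return message.length <= MAX_MESSAGE_CHARS;
}
```

A client can use the same checks to reject oversized messages or throttle sends before the request ever reaches the API.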
Content Re-indexing
The RAG knowledge base is automatically re-indexed daily at 3:00 AM UTC via a Kubernetes CronJob. This ensures that new or updated documentation, Zendesk articles, and API specs are reflected in chat responses within 24 hours.
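A daily 03:00 run corresponds to a cron schedule such as `0 3 * * *`, assuming the CronJob is evaluated in UTC. If you want to estimate when a content change will become visible, a small helper like the hypothetical `nextReindexAt` below computes the next scheduled run. It only models the schedule and says nothing about how long indexing takes.

```typescript
// Return the next 03:00 UTC instant strictly after `now`.
function nextReindexAt(now: Date): Date {
  const next = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), 3, 0, 0, 0),
  );
  // If today's 03:00 UTC has already passed (or is right now), use tomorrow's.
  if (next.getTime() <= now.getTime()) {
    next.setUTCDate(next.getUTCDate() + 1);
  }
  return next;
}
```

For example, content published at 12:00 UTC would be picked up by the run at 03:00 UTC the following day, 15 hours later.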
Admin Dashboard
The chat-api service also hosts an internal React SPA at /admin/*, secured by Google OAuth. The dashboard provides:
- Conversation review and quality tagging.
- Content improvement Action Items (Todos).
- LLM prompt overrides based on keyword triggers.
- Usage, cost, and budget analytics.