Technical Architecture

How kevi works

The site is a hiring-focused AI-native portfolio: content stays the source of truth, RAG retrieves evidence, and DeepSeek SSE streams the final answer with citations.

RAGBGE 512pgvectorRedisDeepSeek SSE

RAG Flow

Retrieval happens before generation. Missing site evidence is not invented; general questions are marked before being answered.

Question

The visitor asks a question from /chat, the AI drawer, or a contextual Ask action on an article or project.

Guarded API

The Nitro API validates input, checks Redis rate limits, and embeds the question before any LLM call.

BGE 512

Transformers.js runs Xenova/bge-small-zh-v1.5 locally and verifies the BGE vector is 512 dimensions.

pgvector

PostgreSQL pgvector searches indexed article, project, profile, and custom Q&A chunks for the RAG context.

DeepSeek SSE

DeepSeek SSE streams deltas back through the site API while structured citations come from retrieved chunks.

Engineering Decisions

The MVP chooses fewer moving parts when they make the product easier to verify and explain.

Vector store tradeoff

PostgreSQL already stores content and metadata, so pgvector keeps the MVP deployable as one database instead of adding Pinecone, Milvus, or Weaviate operational overhead.

docs-first workflow

Docs-first keeps scope decisions explicit: route names, vector dimensions, SSE events, and excluded modules are written down before implementation.

Rate limiting

Redis rate limiting runs before embedding and LLM calls, protecting cost and latency under repeated public chat requests.

Prompt boundary

prompt injection text is treated as user content. Server-owned system rules and retrieved context remain separate messages.

Safety and failure behavior

Cold embedding model loads can delay the first RAG request, so the service lazy-loads once and reuses the pipeline.
No-hit site facts return an explicit insufficient-evidence note instead of guessed personal facts; general questions receive a scope marker.
Citation cards are generated from source metadata, not from model-written free text.
The MVP avoids comments, message walls, voice chat, and technology radar to keep the AI loop verifiable.

Runtime boundary

Client: Renders messages, local history, citations, and loading or error states.
Nitro server: Owns validation, prompt construction, Redis limits, RAG retrieval, and provider calls.
Database: Stores structured content plus vector(512) chunks for retrieval.