Question
The visitor asks a question from /chat, the AI drawer, or a contextual Ask action on an article or project.
The site is a hiring-focused AI-native portfolio: content stays the source of truth, RAG retrieves evidence, and DeepSeek SSE streams the final answer with citations.
Retrieval happens before generation. The assistant does not invent missing site evidence, and general questions are labeled as such before being answered.
The Nitro API validates input, checks Redis rate limits, and embeds the question before any LLM call.
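As a minimal sketch of the validation step, the request body can be checked before any rate-limit, embedding, or LLM work is spent on it. The `question` field name, the `validateQuestion` helper, and the 500-character cap are assumptions for illustration, not values taken from the site.

```typescript
// Result of validating a chat request body.
type Validation =
  | { ok: true; question: string }
  | { ok: false; error: string };

// Assumed cap; the real limit is not stated in the text.
const MAX_QUESTION_LENGTH = 500;

// Hypothetical helper: rejects malformed bodies before any costly step runs.
function validateQuestion(body: unknown): Validation {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Body must be a JSON object" };
  }
  const raw = (body as { question?: unknown }).question;
  if (typeof raw !== "string") {
    return { ok: false, error: "question must be a string" };
  }
  const question = raw.trim();
  if (question.length === 0) {
    return { ok: false, error: "question must not be empty" };
  }
  if (question.length > MAX_QUESTION_LENGTH) {
    return { ok: false, error: "question is too long" };
  }
  return { ok: true, question };
}
```

Returning a result object instead of throwing lets the handler map a failed validation directly to a 4xx response.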
Transformers.js runs Xenova/bge-small-zh-v1.5 locally and verifies that the resulting BGE vector has 512 dimensions.
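The dimension check could look roughly like the sketch below. It only covers the verification step and assumes the embedding has already been produced; `assertEmbedding` is a hypothetical helper name.

```typescript
// Dimension stated in the text for Xenova/bge-small-zh-v1.5.
const EXPECTED_DIM = 512;

// Hypothetical guard: returns a plain number[] or throws if the vector
// has the wrong length or contains NaN/Infinity.
function assertEmbedding(
  vector: ArrayLike<number>,
  expected: number = EXPECTED_DIM,
): number[] {
  if (vector.length !== expected) {
    throw new Error(`Expected ${expected} dimensions, got ${vector.length}`);
  }
  const out = Array.from(vector);
  if (!out.every((x) => Number.isFinite(x))) {
    throw new Error("Embedding contains non-finite values");
  }
  return out;
}
```

Failing fast here keeps a misconfigured model from writing vectors that the database column would reject or, worse, silently compare incorrectly.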
PostgreSQL pgvector searches indexed article, project, profile, and custom Q&A chunks for the RAG context.
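A hedged sketch of how such a search might be expressed: pgvector accepts a vector as a bracketed text literal and provides `<=>` as its cosine distance operator. The `chunks` table, its column names, and both helper functions are assumptions; the actual schema is not shown in the text.

```typescript
// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Hypothetical query builder. Table and column names are illustrative.
// Smaller cosine distance means a closer match, so results sort ascending.
function buildSearchQuery(embedding: number[], limit: number) {
  const text = [
    "SELECT id, source_type, content,",
    "       embedding <=> $1::vector AS distance",
    "FROM chunks",
    "ORDER BY embedding <=> $1::vector",
    "LIMIT $2",
  ].join("\n");
  return { text, values: [toVectorLiteral(embedding), limit] };
}
```

Passing the vector as a bound parameter rather than interpolating it into the SQL keeps the query plan reusable and avoids string-escaping concerns.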
DeepSeek SSE streams deltas back through the site API while structured citations come from retrieved chunks.
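To illustrate the delta handling, the sketch below pulls text deltas out of a block of SSE text. It assumes an OpenAI-style chunk shape (`choices[0].delta.content`) and a `[DONE]` sentinel; `extractDeltas` is a hypothetical helper, and a production parser would also need to buffer events that are split across network reads.

```typescript
// Extracts text deltas from complete SSE lines of the form "data: {...}".
function extractDeltas(sseText: string): string[] {
  const deltas: string[] = [];
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip comments and blank lines
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    try {
      const parsed = JSON.parse(payload);
      const content = parsed?.choices?.[0]?.delta?.content;
      if (typeof content === "string" && content.length > 0) {
        deltas.push(content);
      }
    } catch (e) {
      // Malformed JSON is skipped in this sketch rather than aborting the stream.
    }
  }
  return deltas;
}
```

Citations are not parsed from this stream: as described above, they come from the retrieved chunks, so the site API can attach them independently of what the model emits.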
The MVP chooses fewer moving parts when they make the product easier to verify and explain.
PostgreSQL already stores content and metadata, so pgvector keeps the MVP deployable as one database instead of adding Pinecone, Milvus, or Weaviate operational overhead.
Docs-first keeps scope decisions explicit: route names, vector dimensions, SSE events, and excluded modules are written down before implementation.
Redis rate limiting runs before embedding and LLM calls, protecting cost and latency under repeated public chat requests.
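One common way to implement this is a fixed-window counter, sketched below with an in-memory store standing in for Redis so the example stays self-contained. The `CounterStore` interface, the key prefix, and the limit values are assumptions; the text does not say which algorithm or thresholds the site uses.

```typescript
// Minimal counter contract. A Redis-backed version would typically
// increment a key and set its expiry when the window starts.
interface CounterStore {
  incr(key: string, windowMs: number, now: number): number;
}

// In-memory stand-in for Redis, for illustration only.
class MemoryStore implements CounterStore {
  private windows = new Map<string, { count: number; expiresAt: number }>();

  incr(key: string, windowMs: number, now: number): number {
    const current = this.windows.get(key);
    if (!current || current.expiresAt <= now) {
      // Start a fresh window on first use or after expiry.
      this.windows.set(key, { count: 1, expiresAt: now + windowMs });
      return 1;
    }
    current.count += 1;
    return current.count;
  }
}

// Returns true while the client is within its budget for the current window.
function allowRequest(
  store: CounterStore,
  clientId: string,
  limit: number,
  windowMs: number,
  now: number,
): boolean {
  return store.incr(`chat:${clientId}`, windowMs, now) <= limit;
}
```

A handler would call `allowRequest` first and return early on `false`, so a rejected request never reaches the embedding model or the LLM.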
Prompt injection text is treated as user content. Server-owned system rules and retrieved context remain separate messages.
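A sketch of that separation, under stated assumptions: the rule text, the `Chunk` shape, and the choice to send retrieved context as a second system message are all illustrative. The point is only that the visitor's text is never concatenated into server-owned messages.

```typescript
type Role = "system" | "user";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical shape of a retrieved chunk.
interface Chunk {
  title: string;
  content: string;
}

// Placeholder rule text; the site's real system prompt is not shown.
const SYSTEM_RULES =
  "Answer from the provided site context and cite it. Do not invent site facts.";

// Builds three separate messages: rules, retrieved context, visitor question.
function buildMessages(question: string, chunks: Chunk[]): ChatMessage[] {
  const context = chunks
    .map((chunk, i) => `[${i + 1}] ${chunk.title}\n${chunk.content}`)
    .join("\n\n");
  return [
    { role: "system", content: SYSTEM_RULES },
    { role: "system", content: `Site context:\n${context}` },
    { role: "user", content: question }, // injection attempts stay in this role
  ];
}
```

With this layout, a question such as "Ignore previous instructions" reaches the model only as user-role content, while the rules and context arrive unchanged.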