The brief
Dkubex is an enterprise GenAI platform — a RAG-first product letting customers ingest internal documents, build retrieval pipelines, and serve LLM-backed applications with their own data. I joined as a senior engineer and ended up owning the retrieval and serving stack end-to-end.
What I did
- Re-architected the RAG pipeline. Replaced a single-vector retrieval pass with a hybrid approach: BM25 lexical scoring fused with dense vector recall, then re-ranked with a cross-encoder. Top-k relevance went up on our internal eval set without changing the embedding model.
- Multimodal ingestion. Added a unified ingestion path that handled PDF, HTML, transcripts and image-extracted text through a single normalized document schema, with chunk-level provenance retained for citations.
- Langflow integration. Wired Langflow as a first-class flow editor inside the platform so non-engineering users could compose RAG pipelines visually while the runtime stayed strict and auditable.
- OAuth2 unification. Pulled three disparate auth flows behind a single OAuth2-Proxy fronted by the Kubernetes Gateway API; tenants got SSO without per-app config.
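
To make the hybrid retrieval bullet concrete, here is a minimal sketch rather than the production code. It assumes reciprocal rank fusion (RRF) as the fusion step, which is one common way to combine a lexical ranking with a dense one; the fusion method actually used may have been different. The names `reciprocal_rank_fusion`, `hybrid_retrieve`, and `rerank_score` are hypothetical, and `rerank_score` stands in for a cross-encoder that scores a candidate against the query.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    k = 60 is a conventional default, assumed here for illustration.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_retrieve(
    lexical: Sequence[str],
    dense: Sequence[str],
    rerank_score: Callable[[str], float],
    top_k: int = 3,
) -> List[str]:
    """Fuse lexical (e.g. BM25) and dense rankings, then re-rank a short candidate list."""
    fused = reciprocal_rank_fusion([lexical, dense])
    # Re-rank only a small candidate pool, since the re-ranker is the costly step.
    candidates = fused[: top_k * 2]
    return sorted(candidates, key=rerank_score, reverse=True)[:top_k]
```

The design point is that fusion works on ranks rather than raw scores, so BM25 scores and vector similarities never need to be put on a common scale, and the expensive re-ranker only ever sees a handful of candidates.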
Outcome
- Hybrid search shipped as the new default retrieval mode.
- Ingestion latency dropped because we stopped re-tokenizing on every retrieval.
- The customer success team reported measurably fewer "wrong answer" tickets.
What I learned
Latency in a RAG system isn't where you think it is. Profile the whole path — ingestion, retrieval, re-ranking, generation — before optimizing any one piece.
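
One lightweight way to profile the whole path is to wrap each stage in a timer and compare the totals. The sketch below is illustrative only: `StageTimer` and the stage names are hypothetical, and the `time.sleep` calls stand in for real ingestion, retrieval, re-ranking, and generation work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised an exception.
            self.totals[name] += time.perf_counter() - start

    def slowest(self) -> str:
        """Return the name of the stage with the largest accumulated time."""
        return max(self.totals, key=self.totals.get)


if __name__ == "__main__":
    timer = StageTimer()
    with timer.stage("retrieval"):
        time.sleep(0.005)  # placeholder for the real retrieval call
    with timer.stage("re-ranking"):
        time.sleep(0.02)  # placeholder for the real cross-encoder call
    for name, seconds in timer.totals.items():
        print(f"{name}: {seconds * 1000:.1f} ms")
    print("slowest stage:", timer.slowest())
```

Timing every stage with the same clock makes the comparison fair, which is the point of profiling before optimizing any single piece.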