Anubhav Anand
Full Stack AI/ML Engineer — production RAG, multi-agent systems & evals at scale
I build user-facing AI features and the infrastructure under them: RAG pipelines, multi-agent systems, and the evals and observability that keep them honest. I also built and open-sourced Grove, a native macOS app that runs many Claude Code agents in parallel, each in its own isolated git worktree. I work in the open, with 25+ merged PRs across the AI tooling ecosystem (promptfoo, Haystack, WordPress AI, LibreChat, Supabase, Arize Phoenix), and I keep a field journal of 50+ posts on what actually works when shipping production AI.
Skills
| Area | Tools |
|---|---|
| Agents & LLM | LangGraph · LangChain · LlamaIndex · Pydantic AI · MCP / FastMCP · OpenAI Agents SDK · AWS Bedrock · LLM-as-judge |
| RAG & Retrieval | Milvus · Pinecone · Qdrant · FAISS · OpenSearch · Weaviate · Chroma · pgvector · BM25 · hybrid/RRF · MMR · reranking · HyDE · multi-query |
| Evals & Observability | custom eval harnesses · promptfoo · RAGAS · Langfuse · OpenTelemetry |
| ML | PyTorch · TensorFlow · Keras · scikit-learn · NMF · YOLOv4 |
| Languages | Python · TypeScript · JavaScript · PHP · Swift |
| Backend / Infra | FastAPI · Django · Node/Express · MongoDB · Docker · Kubernetes (EKS) · GitLab CI/CD · Keycloak · Vault · AWS · Azure · Next.js · React |
Open Source (25+ merged PRs)
promptfoo (16 PRs)
The LLM-eval framework used by OpenAI & Anthropic: BLEU/ROUGE scoring fixes (#9717, #9740, #9718, #9739); RAGAS context-relevance segmentation (#9734); SQL/XML validators (#9785, #9784, #9782); inverse (not-) assertion handling (#9738, #9737, #9725, #9722); similarity / threshold scoring (#9721, #9736); provider token/cost accounting for watsonx and xAI (#9780, #9783).
deepset-ai / Haystack (5 PRs)
RAG correctness: zero-vector cosine NaN guard (#11628), split_overlap validation (#11625), joiner zero-weight error (#11629), nested metadata filters (#11649), from_dict crash fix (#11626).
WordPress / ai (3 PRs)
AI Request Log "Last 30 Days" drifting-window fix (#753); wp-dataviews i18n (#723); disabled-state icon (#720).
LibreChat · Supabase · Arize Phoenix
LibreChat (1): NO_PROXY for OpenID auth (#13716). Supabase (2): list() sortBy + edge-fn Content-Type (#2454, #2455). Arize Phoenix (1): phoenix-mcp User-Agent (#13743).
Experience
Publicis Sapient — Senior Associate, Data Science · Dec 2023 – Present · India
Built production AI systems from the ground up across SustainAI (AIOps/ITSM resolution and RCA agents for Nissan) and AskBodhi (an AWS Marketplace GenAI platform).
SustainAI — Multi-Agent ITSM Resolution & Ops Copilot
Platform impact: 35–40% lower operating cost · 50–82% faster incident resolution · higher uptime.
- Shipped for Nissan's enterprise service desk: a LangGraph orchestrator with a Pydantic AI intent router fans out to 4 retrieval agents in parallel (similar incidents, KB articles, ticket history, solution drafting) and returns a cited resolution in seconds, cutting L1 incident resolution time by ~35%. Pre-built, industry-customizable IT agents auto-resolve common requests through automated workflows, and a self-maintaining KB auto-drafts articles from resolved tickets (human-approved, indexed in real time).
- A conversational ops copilot orchestrates three specialized agents: a knowledge agent (RAG over runbooks, postmortems, Confluence and ServiceNow KB), a ticket-analytics agent (text-to-MQL over any client's ticket data), and a service-desk automation agent (runbook and workflow execution via the MCP layer, HITL-gated). Role-based access control (RBAC) scopes answers and actions to each user's role.
- Production-ready evals & observability: every release gated by an LLM-as-judge + promptfoo harness scoring citation accuracy, answer quality, and tool trajectory; end-to-end Langfuse tracing (token / latency / cost per step) with user-feedback-driven A/B prompt improvement.
Stack: LangGraph · LangChain · Pydantic AI · Langfuse · promptfoo · AWS Bedrock · ServiceNow · Jira · MCP
SustainAI — Autonomous SRE / RCA
- An autonomous Site-Reliability agent that resolves production incidents end to end: turns raw Grafana / Datadog alerts into confidence-scored root causes via a LangGraph ReAct loop over 6 evidence tools (Kubernetes logs/events/deploy status, Datadog metrics/logs, GitLab pipelines), producing a structured diagnosis (root cause, causal chain, remediation, confidence); a stagnation guard and noise gate keep runs bounded.
- A graph-based event-correlation agent collapses alert storms into one "situation" by service-dependency graph (topology), time window, and semantic similarity, surfacing the probable root trigger instead of dozens of raw alerts.
- Findings are published to a Microsoft Teams card with one-click approve/reject and pre-filled records in Jira, ServiceNow, and GitLab; on approval, an execution policy blocks destructive commands (sudo, kill, dd) before any infra remediation runs (Kubernetes rollout restart/undo/scale).
Stack: LangGraph · Pydantic AI · Anthropic Claude · Grafana · Datadog · Prometheus · Kubernetes · GitLab · Teams · Jira · ServiceNow
SustainAI — Dynamic REST-to-Tool MCP Server
- A registry-based MCP server (FastMCP, JSON-RPC 2.0) that turns any registered REST API into an LLM-callable tool at runtime — onboarding 10+ internal REST APIs as agent tools.
- Auto-extracts OpenAPI specs and generates tool docstrings, with pluggable Keycloak / API-key auth and HashiCorp Vault for secrets.
Stack: FastMCP · FastAPI · Pydantic · MongoDB · Keycloak · HashiCorp Vault
AskBodhi — Enterprise GenAI Platform
- Built the retrieval layer — a multi-vector-DB abstraction over LangChain unifying 8 vector stores (Milvus, Pinecone, Qdrant, FAISS, OpenSearch, Weaviate, Chroma, pgvector).
- Built a chunking and retrieval engine (recursive/semantic/hierarchical chunking; BM25, dense ANN, RRF hybrid, MMR, cross-encoder reranking, self-query, multi-query, HyDE) on LangChain retrievers and LlamaIndex node parsers, deployed as stateless, Dockerized FastAPI services on AWS EKS via GitLab CI/CD so they scale horizontally.
Stack: LangChain · LlamaIndex · rank_bm25 · cross-encoder rerankers · AWS Bedrock + SageMaker
Gesund.ai — Machine Learning Engineer · Dec 2022 – Nov 2023 · Remote
Privacy-first MLOps platform for clinical-grade ML in healthcare and life sciences.
- Built computer-vision models for medical imaging (lung/liver-tumor segmentation, abnormality detection) in PyTorch/TensorFlow, shipped as FastAPI services on Docker + Kubernetes across on-prem and cloud.
- Built platform APIs/tooling for a centralized system coordinating distributed workers; in-house pipelines for fast experimentation and federated-learning integration; Pytest/Selenium automation around the model APIs.
Spritle — ML Engineer · Jun 2022 – Nov 2022 · India
- Built a real-time computer-vision system for product-assembly verification (YOLOv4 on Darknet, fine-tuned on domain data) that improved worker safety.
Selected Projects (Freelance)
AskStanton — AI Engineer
LangGraph multi-agent legal AI with human-in-the-loop review, memory stores, and FastAPI streaming; Docker Compose microservices, CI/CD, multi-LLM routing, RAG over Supabase, Langfuse tracing, and a Next.js chat UI with citations.
ZodiaQ — Full Stack Developer
Next.js/TypeScript frontend and Node/Express/MongoDB backend with JWT-OTP auth, Razorpay payments/webhooks, and a CRM; Azure OpenAI + LangChain RAG and multi-step agents; tested with Jest/Cypress, Dockerized, and deployed via GitHub Actions to Azure and Vercel.
Education
Lovely Professional University — B.Tech, Computer Science
Specialization: Data Science & AI