Anubhav Anand

Full Stack AI/ML Engineer — production RAG, multi-agent systems & evals at scale

I build user-facing AI features and the infrastructure under them: RAG pipelines, multi-agent systems, and the evals and observability that keep them honest. I also built and open-sourced Grove, a native macOS app that runs many Claude Code agents in parallel, each in its own isolated git worktree. I work in the open, with 25+ merged PRs across the AI tooling ecosystem (promptfoo, Haystack, WordPress AI, Supabase, Arize Phoenix), and I write a field journal of 50+ posts on what actually works when shipping production AI.

Skills

Agents & LLM: LangGraph · LangChain · LlamaIndex · Pydantic AI · MCP / FastMCP · OpenAI Agents SDK · AWS Bedrock · LLM-as-judge
RAG & Retrieval: Milvus · Pinecone · Qdrant · FAISS · OpenSearch · Weaviate · Chroma · pgvector · BM25 · hybrid/RRF · MMR · reranking · HyDE · multi-query
Evals & Observability: custom eval harnesses · promptfoo · RAGAS · Langfuse · OpenTelemetry
ML: PyTorch · TensorFlow · Keras · scikit-learn · NMF · YOLOv4
Languages: Python · TypeScript · JavaScript · PHP · Swift
Backend / Infra: FastAPI · Django · Node/Express · MongoDB · Docker · Kubernetes (EKS) · GitLab CI/CD · Keycloak · Vault · AWS · Azure · Next.js · React

Open Source · 25+ merged PRs

promptfoo (16 PRs)

The LLM eval framework used by OpenAI & Anthropic: BLEU/ROUGE scoring fixes (#9717, #9740, #9718, #9739); RAGAS context-relevance segmentation (#9734); SQL/XML validators (#9785, #9784, #9782); inverse-assertion (not-) handling (#9738, #9737, #9725, #9722); similarity/threshold scoring (#9721, #9736); provider token/cost accounting for watsonx and xAI (#9780, #9783).

deepset-ai / Haystack (5 PRs)

RAG correctness: zero-vector cosine NaN guard (#11628), split_overlap validation (#11625), joiner zero-weight error (#11629), nested metadata filters (#11649), from_dict crash fix (#11626).

WordPress / ai (3 PRs)

AI Request Log "Last 30 Days" drifting-window fix (#753); wp-dataviews i18n (#723); disabled-state icon (#720).

LibreChat · Supabase · Arize Phoenix

LibreChat (1): NO_PROXY support for OpenID auth (#13716). Supabase (2): list() sortBy and Edge Function Content-Type (#2454, #2455). Arize Phoenix (1): phoenix-mcp User-Agent (#13743).

Experience

Publicis Sapient — Senior Associate, Data Science · Dec 2023 – Present · India

Built production AI systems from the ground up across SustainAI (AIOps/ITSM resolution and RCA agents for Nissan) and AskBodhi (a GenAI platform on AWS Marketplace).

SustainAI — Multi-Agent ITSM Resolution & Ops Copilot

Platform impact: 35–40% lower operating cost · 50–82% faster incident resolution · higher uptime.

Stack: LangGraph · LangChain · Pydantic AI · Langfuse · promptfoo · AWS Bedrock · ServiceNow · Jira · MCP

SustainAI — Autonomous SRE / RCA

Stack: LangGraph · Pydantic AI · Anthropic Claude · Grafana · Datadog · Prometheus · Kubernetes · GitLab · Teams · Jira · ServiceNow

SustainAI — Dynamic REST-to-Tool MCP Server

Stack: FastMCP · FastAPI · Pydantic · MongoDB · Keycloak · HashiCorp Vault

AskBodhi — Enterprise GenAI Platform

Stack: LangChain · LlamaIndex · rank_bm25 · cross-encoder rerankers · AWS Bedrock + SageMaker

Gesund.ai — Machine Learning Engineer · Dec 2022 – Nov 2023 · Remote

Privacy-first MLOps platform for clinical-grade ML in healthcare and life sciences.

Spritle — ML Engineer · Jun 2022 – Nov 2022 · India

Selected Projects (Freelance)

AskStanton — AI Engineer

LangGraph multi-agent legal AI with human-in-the-loop, memory stores, and FastAPI streaming; Docker Compose microservices, CI/CD, multi-LLM routing, RAG over Supabase, Langfuse observability, and a Next.js chat UI with citations.

ZodiaQ — Full Stack Developer

Next.js/TypeScript frontend and Node/Express/MongoDB backend with JWT + OTP auth, Razorpay payments and webhooks, and a CRM; Azure OpenAI + LangChain RAG and multi-step agents; Jest/Cypress tests, Docker, and GitHub Actions deploys to Azure and Vercel.

Education

Lovely Professional University — B.Tech, Computer Science
Specialization: Data Science & AI