I spent the last year building a coding agent as a Product Manager, and there were tonnes of learnings! LLMs have made the future of software incredibly exciting! While the internet democratised knowledge, AI is democratising “expertise, knowledge and skills”! It’s incredibly empowering to brainstorm with an intellectually capable partner (an LLM) and bring ideas to life just through conversation!
Starting with where this journey began! Throwback to the ReactNexus 2024 conference, where I presented early learnings from building a coding agent, an advanced RAG pipeline, an AI-based code migrator, and other AI capabilities such as mockups-to-code and idea-to-app. Over the year we put a lot of emphasis on evals and trustworthy workflows to improve the coding agent’s reliability and UX. It showed in developer feedback and in the numbers!
https://www.youtube.com/watch?v=gWzDYetc7Mw&t=61s
Developer Love ❤️❤️ and the ensuing impact
- Developers praising productivity gains in development and the ability to migrate apps to new platform versions seamlessly with AI assistance
- “Positive feedback rate” almost doubled from 30% to 54% even while feedback volume jumped 87% - indicating both higher satisfaction and broader adoption
- “Response acceptance rate” improved continually - a critical signal that developers trust and actively adopt AI-generated suggestions
- Developer engagement accelerated 32.8% QoQ - reflecting sustained and growing usage patterns


Key Learnings from building a “reliable AI product”
I wanted to share the learnings and un-learnings of building an AI-powered product as a Product Manager! LLMs are all the rage right now and incredibly powerful, but the "rigour" required to build a reliable AI product was an important learning for me!
- Building "magical AI prototypes" has gotten 10X easier, while building reliable products for customers has gotten "10X harder"! There is “rigour” required in building AI products (ask your ML counterparts!). What looks like a working demo often breaks when exposed to real users! You need robust evals, continuous evaluation of LLM traces for error analysis, and trustworthy workflows!
- Trust in AI products is really low thanks to overhyped claims! Bake trust into your AI workflows from Day 1. As a PM, ask yourself: what can I add to my workflows (such as transparency and user control) that makes customers trust the AI system (i.e., your product)?
- Robust evals for accuracy, safety, and reliability! AI is non-deterministic, but your product and the customers using it need “reliability” backed by some scientific evidence! Evals are a critical part of AI product development, and Product Managers should be the internal champions of rigorous evals! You need systematic ways to measure and communicate quality to your teams and your customers.
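To make the evals point concrete, here is a minimal sketch of what an offline eval harness looks like: test cases paired with programmatic graders, run against the model, producing a pass rate you can track over time. All names and test cases here are hypothetical illustrations, not the actual system described in this post, and the model call is a stub so the sketch runs end to end.

```python
# Minimal offline eval harness sketch (hypothetical, for illustration).
# Each case pairs a prompt with a grading function; the harness reports
# a pass rate across the suite.

def grade_contains(expected: str):
    """Grader: check that the model output contains an expected substring."""
    return lambda output: expected.lower() in output.lower()

TEST_CASES = [
    {"prompt": "Write a Python function to reverse a string.",
     "grader": grade_contains("def ")},
    {"prompt": "What HTTP status code means 'Not Found'?",
     "grader": grade_contains("404")},
]

def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call, so the sketch is self-contained.
    canned = {
        "Write a Python function to reverse a string.":
            "def reverse(s):\n    return s[::-1]",
        "What HTTP status code means 'Not Found'?":
            "That would be the 404 status code.",
    }
    return canned.get(prompt, "")

def run_evals(model, cases):
    """Run every case through the model and return the pass rate."""
    results = [case["grader"](model(case["prompt"])) for case in cases]
    return sum(results) / len(results)

pass_rate = run_evals(fake_model, TEST_CASES)
print(f"pass rate: {pass_rate:.0%}")
```

In practice the stub model is replaced by a real LLM call and the suite grows to hundreds of cases, but the shape stays the same: a fixed test bed, automatic graders, and a single number you can report to teams and watch across releases.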
In subsequent posts I will be talking about two things:
- How we built an "AI evals" system (offline evals on test bed, evals on production data)
- How we built "trust" into all the AI feature workflows!!