The Data Engineering Show
Llama 2 & 3 Safety: Soumya Batra on Agentic AI Training
April 8, 2026
What if the expertise that built foundation models could reshape how you think about AI's future? In this episode, Benjamin sits down with Soumya Batra, founder and CEO of WisePort AI and former safety lead on Llama 2 and Llama 3 at Meta, to explore how foundation models evolved from traditional NLP, why post-training holds the highest leverage for safety and controllability, and what natively agentic AI means for the next frontier of AI development. Whether you're curious about the model training lifecycle or wondering what comes after large language models, this conversation unpacks the technical strategies and vision shaping tomorrow's AI systems.
In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Soumya Batra, founder and CEO of WisePort AI and former tech lead at Meta, where she led safety efforts for Llama 2 and Llama 3. They explore the evolution of NLP, the complete lifecycle of foundation model training, and why the next AI frontier lies in natively agentic systems rather than simply scaling larger transformers.


What You'll Learn:

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts.


About the Guest(s)

Soumya Batra is the Founder and CEO of WisePort AI, a foundational AI company specializing in agentic AI systems. With over twelve years of expertise in NLP and machine learning, she previously served as a Tech Lead and Applied Research Scientist at Meta, where she led safety and controllability efforts for both Llama 2 and Llama 3. Her career spans foundational work at Carnegie Mellon University, Microsoft, and Meta, establishing her as a pioneering voice in conversational AI and foundation model development. In this episode, Soumya demystifies the journey from traditional NLP to large language models, revealing how safety and controllability are embedded across the entire model lifecycle—from pretraining through reinforcement learning. Her insights on the future of agentic AI and the limitations of current scaling-only approaches provide essential perspective for data engineers and ML practitioners navigating the rapidly evolving AI landscape.


Quotes

"I did not know then that this would become my career for the next decade." - Soumya

"Whatever work that I've done in the past becomes irrelevant all of a sudden." - Soumya

"There is always a notion of, yes, this is the big thing, and then no, it's not anymore." - Soumya

"I really think that we are going to be proven wrong once again about scaling transformers being the only way to achieve general intelligence." - Soumya

"Safety was an issue even back then, even though we were training in such controlled settings." - Soumya

"If you don't put some toxic content there, then it will lose the ability to classify it and it'll be much easier to break the safety later on." - Soumya

"In the post training phase, we are giving it that ability to be able to answer users' questions." - Soumya

"The next unlock will now come from foundational agent models that are natively agentic, which will unlock use cases that look unimaginable to us right now." - Soumya

"Natively agentic means the foundational model itself needs to dynamically explore the action space, rather than scaffolding around existing LLMs." - Soumya

"The real unlock comes from creating your own use cases, creating your own synthetic data, and going deep into a few workflows." - Soumya


Resources

Connect on LinkedIn:

Websites:

Articles & Research Papers:

Educational Institutions:


The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so

Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of Fundamentals of Data Engineering, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.

Check out our three most downloaded episodes: