AI for Data and Data for AI: The Dual Frontier of Modern Data Engineering with Pranav Motarwar

In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Pranav Motarwar, a data engineer who has worked across major tech companies at the intersection of AI and data infrastructure, to explore how artificial intelligence is reshaping the data engineering landscape, not by eliminating roles, but by splitting the field into two distinct, equally critical domains.

What You'll Learn:

- Why the "data engineering is dying" narrative is clickbait: Data engineers remain essential because a projected 60% of use cases by 2027 will involve providing data to AI agents, while demand for human-facing analytics keeps growing. The result is more work, not less.

- How to future-proof your career by mastering "AI for Data" AND "Data for AI": Modern AI Data Engineer roles now require both using AI agents to accelerate traditional ETL/DBT workflows AND building entirely new data pipelines (chunking, embedding, vector storage) designed specifically for agent consumption.

- How data pipelines for humans differ from pipelines for agents: Human-facing pipelines traditionally handled structured data, while agent pipelines must also handle unstructured, multimodal inputs (video, audio, images), which demands a different architectural approach.

- Why individual contributors now own end-to-end pipelines that previously required 7-8 engineers: AI-assisted coding and low-code tooling such as Snowflake Cortex and Databricks' GenAI features cut traditional pipeline development from about one month to roughly 30% of that time, freeing engineers to focus on product strategy, governance, and business impact.

- How the next-gen data stack will evolve: Traditional tools (DBT, BI platforms) stay relevant while new specialized systems emerge. Companies like Vespa handle multimodal retrieval serving, and emerging startups are building data warehouses purpose-built for video and other complex unstructured data. Consolidation is expected once larger players (Databricks, Snowflake) evolve their offerings.

- The exponential data explosion argument that guarantees ongoing demand: The volume of data humanity generated up to 2008 is now created every day. Even a single engineer replacing a five-person team will find more work arriving as use cases expand across AI agents, real-time recommendations, robotics, and physical AI systems.
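
For readers new to the "Data for AI" side, the chunking, embedding, and vector storage steps mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not something shown in the episode: the names chunk_text, embed, and InMemoryVectorStore are hypothetical, and the hashed bag-of-words embedding is a toy stand-in for a real embedding model, used only so the example runs with the standard library.

```python
import hashlib
import math


def chunk_text(text, max_words=8, overlap=2):
    """Split text into overlapping windows of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text, dim=64):
    """Toy embedding (assumption): hashed bag of words, L2-normalized.

    A real pipeline would call an embedding model here instead.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        token = word.strip(".,;:!?\"'()")
        if not token:
            continue
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: brute-force search."""

    def __init__(self):
        self._rows = []  # list of (chunk, vector) pairs

    def add(self, chunk):
        self._rows.append((chunk, embed(chunk)))

    def search(self, query, k=1):
        q = embed(query)
        # Vectors are unit length, so the dot product is cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), chunk)
            for chunk, vec in self._rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]


if __name__ == "__main__":
    doc = (
        "Agent pipelines split documents into chunks, embed each chunk, "
        "and store the vectors so an agent can retrieve relevant context."
    )
    store = InMemoryVectorStore()
    for piece in chunk_text(doc, max_words=8, overlap=2):
        store.add(piece)
    print(store.search("how do agents retrieve context", k=2))
```

In production the toy embed function would be replaced by a model call and the in-memory list by a dedicated retrieval system, but the three stages stay the same.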

About the Guest

Pranav Motarwar is a data engineer with extensive experience across leading tech companies, where he has worked in risk, product, privacy, and core data engineering roles. With a background spanning from traditional data engineering to cloud infrastructure and AI-driven systems, Pranav brings a unique perspective on the industry's rapid evolution. In this episode, he explores how AI is fundamentally transforming data engineering workflows, discussing the emergence of dual pipeline architectures for both human and AI consumption, and the critical skills data engineers need to remain relevant in 2025 and beyond. His insights on the shift from structured data pipelines to multimodal, AI-optimized infrastructure provide actionable guidance for engineers navigating the next generation of data stack technology.

Quotes

"I've worked across different product-based companies in different domains like risk and product, as well as privacy, and the core data engineering teams as well." - Pranav Motarwar

"Data engineering is completely segmented into two different categories: one where the end consumer is human or product, and another where you are building data engineering flow, pipelines, and design for agents to consume." - Pranav Motarwar

"What used to take one month to create an entire flow with DBT has now been reduced to almost 30% of the time we usually spent three to four years ago." - Pranav Motarwar

"Data engineers need to be aware of the process of chunking, embedding, and how you are planning the vector store and optimizing the entire process." - Pranav Motarwar

"The data which was generated by humans from humanity till the year 2008 is currently generated in a day—that's how the volume is exploding." - Pranav Motarwar

"There are two main aspects to data engineering right now: AI for data and data for AI, and both things are essential for an engineer to plan their future." - Pranav Motarwar

"You can't say that you should focus on AI for data rather than data for AI because both are going to be very much important for the next couple of years." - Pranav Motarwar

"Companies like Apple and Tyro are raising relevant job applications in the market known as AI data engineer, with requirements around creating data pipelines for agents and using AI agents in your data engineering flow." - Pranav Motarwar

"Traditionally, we were consuming and processing data in a very structured format, but now that is getting transformed for agents, where it will be unstructured files, audios, videos—it can be pretty much anything." - Pranav Motarwar

"If you want to cope with market dynamics, you need to understand the requirements in the market and gauge your skills according to the market dynamics." - Pranav Motarwar

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts. Instructions on how to do this are here: https://www.fame.so/follow-rate-review

Resources

LinkedIn Profiles:

Pranav Motarwar's LinkedIn: https://www.linkedin.com/in/pranav-motarwar-648a55169
Benjamin's LinkedIn: https://www.linkedin.com/in/wagjamin

Company Websites:

Firebolt: firebolt.io

Tools & Platforms:

DBT – Data transformation and modeling tool for building analytics engineering workflows
Fivetran – Data integration platform for automating data pipeline ingestion
Snowflake – Cloud-based data warehouse for structured and unstructured data processing
Databricks – Unified data analytics platform supporting ETL, data science, and AI workloads
BigQuery – Google Cloud's data warehouse for analytics and machine learning
Looker – Business intelligence and visualization platform
Cortex – Snowflake's suite of AI and LLM features for working with data inside Snowflake
LangChain – Framework for building applications with language models and data processing layers
Vespa – Retrieval engine for fast vector search and multimodal data serving
AdaptDB – Analytical database system for building software products

Articles & Research Papers:

"MIT Technology Review Report on Data Engineering and AI" – Co-published with Snowflake (2023-2025 projections on AI use cases in data engineering)