The Data Engineering Show
Why 99% of Data Teams Give Up on Real-Time And How Artie Changes That
February 3, 2026
What happens when a team of seven engineers spends a year trying to build a production-ready CDC connector and fails? For Artie CTO and co-founder Robin Tang, it was the spark needed to build a platform that makes data streaming accessible. In this episode, Robin joins Benjamin to discuss the "DFS" (Depth-First Search) approach to data sources, the engineering hurdles of real-time Postgres-to-Snowflake pipelines, and why "theoretically correct" architectures often fail in practice.
In this episode of The Data Engineering Show, Benjamin sits down with Artie CTO and co-founder Robin Tang to explore the complexities of high-performance data movement. Robin shares his journey from building Maxwell at Zendesk to scaling data systems at Opendoor, highlighting the gap between business-oriented SaaS connectors and the rigorous demands of production database replication.

Robin dives deep into Artie’s architecture, explaining how they leverage a split-plane model (Control Plane and Data Plane) to provide a "Bring Your Own Cloud" (BYOC) experience that engineering teams actually trust. You’ll hear about the technical nuances of CDC, from handling Postgres TOAST columns to the "economy of scale" challenges of processing billions of rows for Substack, Artie’s first customer. Whether you're struggling with real-time ingestion costs or curious about the future of platform-agnostic partitioning, this conversation provides a masterclass in modern data movement.


What You'll Learn:


If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts. Instructions on how to do this are here.


About the Guest(s)


Robin is the CTO and co-founder of Artie, a data movement platform built for high-volume, low-latency production database replication. With over a decade of experience building large-scale data systems, including early work on Maxwell (an open-source CDC framework at Zendesk) and database architecture at venture-backed startups, Robin identified a critical gap: existing tools optimize for SaaS integrations, not production databases at scale. In this episode, Robin shares hard-won lessons from building mission-critical infrastructure, including architectural innovations that prevent data loss and failure modes that only surface under real-world production load. His work at Artie has powered reliable data replication for companies like Substack, making this conversation essential for engineering teams building or evaluating real-time data movement solutions.


Quotes


"Artie helps companies make data streaming accessible." - Robin


"I didn't want to make any sort of compromises and it just turned out to be a really hard problem, so then we started a company around this." - Robin


"The complexity is not just at the destination level, the complexity is also at the source level." - Robin


"Every pipeline that we touch is mission critical for customers, or else they would just use either their existing pipeline or a managed vendor that's out there." - Robin


"We handle the whole thing, whereas other vendors more or less provide a component and expect engineers to either build or attach additional pieces." - Robin


"I think the biggest bottleneck for real time right now is accessibility. When people think about real time, they immediately think it's not worth it because they implicitly have a cost associated with it." - Robin


"We use Kafka transactions, so we do not commit offsets until the destination tells us the data has actually been flushed." - Robin


"There's so much nuance with every single data source that it becomes a whack-a-mole problem." - Robin


"When there's sufficient pain on the other side and they buy into your vision, it's easier to overcome obstacles during technical implementation." - Robin


"We're spending more time developing platform-agnostic solutions so customers don't have to understand platform nuances." - Robin



Resources
 

Connect on LinkedIn:

Websites:

Tools & Platforms:


The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so

Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.

Check out our three most downloaded episodes: