In this episode of The Data Engineering Show, the bros sit with Yingjun Wu, founder and CEO of Rising Wave, to explore the innovative world of stream processing systems. Yingjun shares his journey from academic research to creating a Postgres-compatible streaming system that drastically reduces resource usage. They discuss how Rising Wave's S3-based architecture and Postgres compatibility provide advantages over traditional systems like Flink, and explore the increasing role of Apache Iceberg in data pipelines.
In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with Yingjun Wu, founder and CEO of Rising Wave, to explore the evolution of stream processing systems and the innovations his company is bringing to the space.
What you’ll learn:
- Yingjun's journey from academic research in stream processing to founding Rising Wave, and the challenges of building trust in a new database system.
- How Rising Wave's architecture, which uses S3 as primary storage, lets the system scale within seconds, while other systems can take hours to scale.
- The competitive landscape of stream processing, with Rising Wave's Postgres compatibility providing a significant advantage in ease of use.
- How one major company reduced its CPU requirements from 20,000 to just 600 by switching from a traditional stream processing system to Rising Wave.
- The rising importance of Apache Iceberg as a destination for stream processing output, helping companies avoid vendor lock-in.
- How streaming systems fit into modern data stacks, especially as companies seek to avoid being locked into proprietary systems.
Yingjun Wu is the founder and CEO of Rising Wave, a stream processing system built in Rust and designed with a cloud-native architecture. With a PhD focused on stream processing and database systems, Yingjun previously worked at Redshift and IBM Research before founding Rising Wave. His company has developed a system that achieves significant performance and resource efficiency advantages over traditional stream processing solutions, while maintaining Postgres compatibility for ease of use.
Episode Highlights:
The Origins of Rising Wave (00:30)
Yingjun shares his background in stream processing from his PhD days and explains how his experience at Redshift revealed the need for better stream processing solutions, especially since many data warehouse workloads involve data ingested from streaming sources like Kinesis or Kafka.
Building a System from Scratch (04:10)
Yingjun describes the challenging first 2-3 years of developing Rising Wave without customers, highlighting how trust is a major barrier for new database systems. After 2.5 years, they secured their first customers, including a startup and several larger companies, which helped establish Rising Wave's credibility.
The Current Stream Processing Landscape (07:47)
Benjamin asks about the current stream processing space, with Yingjun positioning Rising Wave as a leader, particularly for SQL-based workloads. He highlights several key advantages of Rising Wave, including its Rust-based implementation and S3-based storage architecture.
S3 as Primary Storage (10:27)
Yingjun explains their decision to use S3 as primary storage from day one, despite its slowness and expense. He discusses how they've optimized for these challenges and would still make the same architectural choice today due to benefits like simplified state management and superior elastic scaling.
The Business Model (13:52)
Rising Wave offers open-source, cloud, and on-premise versions of its product. Yingjun notes that many highly regulated industries require on-premise deployment, including customers in the banking and aerospace sectors.
Typical Users and Competitive Advantages (15:01)
When asked about Rising Wave's typical users, Yingjun explains that the system competes directly with Flink but has an advantage in ease of use thanks to its Postgres compatibility. Users are either new to stream processing or are migrating from systems like Spark Streaming or Flink because of performance issues or development complexity.
Apache Iceberg Integration (19:25)
Yingjun discusses how Apache Iceberg is emerging as an important destination for Rising Wave output, as companies seek to avoid vendor lock-in with proprietary data warehouses. He explains how Rising Wave typically performs ETL functions before data is sent to Iceberg tables.
The Future of Data Management (32:06)
The conversation concludes with a discussion about Iceberg becoming a "single source of truth" for data, with multiple specialized query engines potentially accessing the same data. Yingjun and Eldad share perspectives on how this shift away from proprietary data lock-in is changing the data ecosystem.
Episode Resources: