A ClickHouse Review from a Practitioner’s Point of View
Boaz: Hello, everybody. Welcome to another episode of the Data Engineering Show. Today with me is Sudeep Kumar.
Hi, Sudeep. How are you?
Sudeep: Hey, I'm doing great. How are you?
Boaz: Very good. Very good. You're in Austin, correct?
Sudeep: Yes, and it's very hot here right now.
Boaz: I'm in Tel Aviv. It's also very hot here indeed. We're at around 30 Celsius, which should be, let's see, 86 Fahrenheit.
Sudeep: Oh, we're at 103.
Boaz: Ah, you win! But we have higher humidity, which sucks.
Sudeep: Really? Okay. Here it's like 40% right now, maybe a bit higher. I just checked it here in the studio.
Boaz: Yeah, here it's higher. You go out, you start sweating.
Sudeep: Oh, okay. That probably makes it feel hotter, right?
Boaz: Yeah. Unfortunately, it's not a dry desert heat, but more of an annoying city heat.
Boaz: Okay, so Sudeep, thank you for joining us. Sudeep is a Principal Engineer at Salesforce and has been there for almost a year. Before that, he had a long career, including many years at eBay, working on a variety of engineering and data-related projects and roles. We will spend some time asking you a lot of questions about that. So, are you ready?
Sudeep: Yeah! Yes, let's do it.
Boaz: Okay, let's start. Tell us a little bit about your background. How did your career evolve over the years, and what kind of data-related work did it end up letting you do?
Sudeep: Sure. I started my career in the telecom domain, back in my home country, India. Telecom was really booming around 2006; that was the most happening place to be. I spent a couple of years there, and then I saw a shift towards mobile. Everybody started working on mobile and mobile applications and wanted to be part of that ecosystem.
So, I joined the largest player in the mobile realm in India at that time, which was Samsung. That's probably where I got my first real exposure to large data volumes, specifically because we were building social networking applications, very similar to the Facebook we have today. This was around 2008. I was there for around four and a half years, and then I saw the market's interest shift towards e-commerce. I wanted to be in that area, so I made a conscious switch from mobile to e-commerce, and that's how I ended up at eBay. I was at eBay for 9 years, primarily on the platform engineering team, where we handled huge volumes of structured and unstructured data, specifically telemetry data: events, logs, and metrics at scale. I moved to the US around 6 years back. I had an opportunity to do that and joined a similar team here, working on things like structured events and building a platform around OLAP. Then I created a metrics platform that can handle around 20-30 million data points per second. After that, I looked at distributed tracing there and got a similar opportunity at Salesforce, so I made the switch at that point. It had been a long stint anyway, so it was a good time to switch. During all this, I moved to Austin. I was in the Bay Area initially but moved to Austin a couple of years back when the pandemic started, before Elon Musk, for sure.
Boaz: So, at 9 years you were afraid of hitting the 10-year mark at eBay and decided to try something new. You got scared of staying 10 years in the same spot. I had that feeling earlier in my career as well.
Sudeep: Yeah. It's like, everything is so comfortable, and then you just want to switch things around.
Boaz: Yeah. So, let's talk about those days at eBay. It sounds super interesting. Starting from the early e-commerce days, eBay was probably one of the first companies that really tackled a lot of data, right at the center of the big data game. Tell us a little bit more about that data platform engineering team you were part of. How big was it? What kind of teams did it have? What kind of data-related projects were you in charge of?
Sudeep: Right. The larger umbrella was the data platform team. Within it, we were specifically looking at monitoring and telemetry platforms, which also handle large volumes of structured and unstructured data. Within that charter of monitoring, or observability in general, we had different pillars: a pillar for metrics, another for logs and events, and then alerting, dashboarding, everything. They were not really mutually exclusive; there was a lot of overlap, and people moved around within this charter. My primary focus was completely on the backend side. I had exposure to logs, metrics, and events in general, and towards the end I also looked into distributed tracing. The team was divided between China and the US, fairly evenly distributed. I think we were around 30 folks overall. I don't recall the exact number, but somewhere around there.
Boaz: During those years, what did the data stack look like, and which parts did you have a chance to work on?
Sudeep: Yeah. Initially, I started with C++ there. We had something called a CAL publisher, which accepted all your log data. That was what we were using, and behind it we had Hadoop, where we stored the data and so forth. Slowly, we moved on to Java, specifically Flink: we wrote a lot of streaming jobs on Flink to emit useful metric signals from this log data and to do aggregations on different kinds of metrics. Then, when we started working with Prometheus for metrics, I think the entire team shifted to Golang, and we've been big-time Golang advocates for the past few years now. If you ask any of our engineers at eBay, specifically on the data platform team, to write a solution, they will just go to Golang and write something up.
Boaz: Interesting! What other major shifts happened throughout the years? Were there any big projects that replaced legacy systems or modernized the stack? What were you part of?
Sudeep: Yeah. We did have a legacy logging system, which was not very real-time; it worked more in a batch mode. You run a job and find your matching logs. From there, we moved to more real-time logs, and I contributed to that. But more recently, the one I can think of is that we completely shifted our OLAP use case from Druid to ClickHouse, which happened a couple of years back. That was probably one of the bigger moves we made, from what I remember. For metrics, we also moved from HBase to Prometheus. We created our own distributed architecture on the Prometheus core, and that platform is still doing pretty well. So, those are some of the changes I've seen over that period.
Boaz: So, the OLAP workloads also fell under the same group, the platform engineering group. What kind of OLAP use cases were there? Who were the end users?
Sudeep: SREs were the primary end users for most of these OLAP use cases, but some people also had alerting on top of this data. There were no business-critical alerts or data there, specifically. This OLAP data was generated from logs. Our logs were pretty structured, so there was a component that would look at all these logs and emit them as events, a much more stripped-down version of them. Then we would write them to an OLAP store, which was Druid in our case. We had Kafka in between, and from there the data finally reached Druid, and that's kind of our...
Boaz: What data volumes were pushed to Druid and ClickHouse?
Sudeep: Right. I had written a blog post about this, and we gave a few talks around it too. From what I remember, when we were on Druid, the volume at that time (and it has definitely increased since then) was around 250 million events per minute. More recently, on the newer platform, we were looking at upwards of 2 billion events per minute.
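For a sense of scale, the per-minute figures quoted above convert to per-second rates as follows. This is only arithmetic on the numbers in the conversation; because the newer figure is "upwards of" 2 billion, the roughly 8x growth is a lower bound.

```latex
\frac{250 \times 10^{6}\ \text{events/min}}{60\ \text{s/min}} \approx 4.2 \times 10^{6}\ \text{events/s},
\qquad
\frac{2 \times 10^{9}\ \text{events/min}}{60\ \text{s/min}} \approx 3.3 \times 10^{7}\ \text{events/s},
\qquad
\frac{2 \times 10^{9}}{250 \times 10^{6}} = 8
```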
Boaz: How big was the time window that was accessible?
Sudeep: You can go back up to a year, but the data gets rolled up; we just keep rolling it up. The granularity is lost, but you can essentially access data up to a year old.
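The idea of trading granularity for age can be sketched as a tier lookup. This is a minimal illustration, assuming three hypothetical tiers; the interview only says that data is progressively rolled up and kept for about a year, not which bucket sizes eBay actually used.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical retention tiers: (maximum age of the data, bucket size it is kept at).
# These values are illustrative assumptions, not eBay's real configuration.
TIERS = [
    (timedelta(days=7), timedelta(minutes=1)),
    (timedelta(days=30), timedelta(hours=1)),
    (timedelta(days=365), timedelta(days=1)),
]


def granularity_for_age(age: timedelta) -> Optional[timedelta]:
    """Return the bucket size at which data of this age is still available,
    or None if it is older than the one-year retention window."""
    for max_age, bucket in TIERS:
        if age <= max_age:
            return bucket
    return None
```

A query planner built on this idea would route a "last hour" query to the finest table and a "last six months" query to the daily rollup.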
Boaz: What about the orchestration? How was the ETL done, the aggregation? What did you have around Druid and ClickHouse?
Sudeep: Yeah. When these raw events were coming in, we didn't do a lot; that was the entry point for us. But we did some level of transformation, I think removing some PCI data and things around that. Other than that, we also structured the data as part of the transformation and initially wrote it into raw tables. On top of these raw tables, we had more aggregations: other tables that would aggregate this data and roll it up further. That's the model we followed. Those rollups happened for every application and every region, and this was similar.
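The flow described above (scrub sensitive fields, land raw events, then roll them up per application and region) can be sketched in a few lines. This is a sketch under stated assumptions: the card-number regex is a hypothetical, deliberately simple stand-in for real PCI scrubbing, and the `app`, `region`, and `ts` field names are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical, simplified pattern for card-like numbers: 13-16 digits with
# optional spaces or dashes. Real PCI scrubbing needs far more than this.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def scrub(event: dict) -> dict:
    """Return a copy of the event with card-like numbers masked in string fields."""
    return {k: CARD_RE.sub("[REDACTED]", v) if isinstance(v, str) else v
            for k, v in event.items()}


def rollup(events, bucket_seconds: int = 60) -> Counter:
    """Count raw events per (application, region, time bucket)."""
    counts = Counter()
    for e in events:
        bucket = e["ts"] - e["ts"] % bucket_seconds
        counts[(e["app"], e["region"], bucket)] += 1
    return counts


def rollup_further(counts: Counter, bucket_seconds: int) -> Counter:
    """Roll an existing rollup up to a coarser bucket by summing its counts."""
    coarser = Counter()
    for (app, region, bucket), n in counts.items():
        coarser[(app, region, bucket - bucket % bucket_seconds)] += n
    return coarser
```

Because `rollup_further` consumes the output of `rollup`, each coarser table can be built from the previous one instead of rescanning raw events, which mirrors the "tables on top of tables" model Sudeep describes.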
Boaz: Are we talking on-premises, or are we talking cloud SaaS?
Sudeep: This was on-premises. We had a managed Kubernetes platform hosted on-prem, and all our workloads ran there.
Boaz: What about the data lake? How was the raw data stored? What was being used?
Sudeep: For the raw data store? It's the same; we used ClickHouse for the raw data store, specifically for OLAP, right?
Boaz: No, outside the OLAP.
Sudeep: Oh, the raw data, that's Hadoop. The same log data would go onto Hadoop. I think we also had a little bit of data written to Elastic for some search use cases.
Boaz: Over the years, was there any attempt or push to go to AWS or something, like everyone else?
Sudeep: Well, eBay and Amazon are probably competitors, right?
So, I think AWS may not be a good fit, but there was some talk about going hybrid, and many companies are trying to go that route. But more or less, the encouragement from our management has been to run on-prem. I'm not sure what the motivations are. There's probably more to it than I have visibility into. But I do recall that we explored other clouds, like GCP.
Boaz: Today, at Salesforce, are you guys doing on-premises work or cloud work?
Sudeep: At Salesforce, I think it's publicly known that we use AWS, so that's where most things are. But similar to many other companies, we have our own infrastructure as well. These days, though, we are mostly moving everything to the cloud.
Boaz: What do you miss most about on-premises?
Sudeep: One thing I loved about on-premises is that you have much more control over things. You can SSH in and so on without running into a lot of security rules or policies; that was convenient. But I'm guessing those things are very important when things get too broad, right? So, there are pros and cons there, but definitely, for a developer like me, being able to develop things, roll them out, and test them was much easier on a machine where I had complete control.
Boaz: That's an interesting trade-off. More and more people don't even get the chance to try that out in their careers anymore.
Sudeep: That's true.
Boaz: I appreciate the differences. Going back to the transition from Druid to ClickHouse: you guys were probably on Druid for a long time. What was the tipping point, or the reasoning, to try to replace it or try another platform?
Sudeep: We had been running Druid for quite some time, and we ran it very successfully for, I think, over 3 years or so. But then the volumes grew. What we saw was that the growth kept increasing year over year, and the amount of resources we had to add to support it on Druid was also substantial, and it was showing up in the cost.
Another issue we saw was that Druid's availability on our infrastructure was not very reliable. It was very flaky. That had a lot to do with how our infrastructure is hosted, so it's not really specific to Druid; there are things we have done that make it a little harder for systems like Druid to work nicely on our infra. Our DevOps folks started getting a lot of pages and availability issues, so it was becoming a huge problem. We wanted to look at alternatives, but we were not really looking seriously. We had basically tried to add more infrastructure and scale horizontally. But at some point in 2019, I guess, we heard of this data store called ClickHouse. It was not as popular then as it is today. I still remember it showing up at around 180 in the DB rankings. It was not popular by any standard.
Boaz: It was a hidden gem.
Sudeep: It was a hidden gem. Yes, that's probably the right way to put it. Another architect of ours and I just tried it out. We like trying out whatever new comes along, so we just ran it. Since all this load was also coming through Kafka, we had separate consumers running there for this same data pipeline. We wrote something very basic for ClickHouse and started writing to it. We didn't expect much; we just thought, okay, let's see what happens. And we did not see any incidents. It was just running without any issues. Meanwhile, our existing platform started having a lot more issues, and that forced us to look deeper into ClickHouse, because it was eating up any volume we threw at it, which was very impressive to us, especially on the ingestion side. So, we investigated more and saw that it could potentially replace our OLAP pipeline, and then we put engineering effort into it. That was very fruitful, and we saved a substantial amount of infrastructure cost, around 90% from what I recall.
Sudeep: Yeah, those were pretty substantial savings.
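The interview doesn't say how the "very basic" Kafka-to-ClickHouse consumer was written. A common pattern, since ClickHouse generally handles fewer, larger inserts much better than many tiny ones, is to buffer rows and flush them in batches. The sketch below assumes that pattern; `insert_batch` is a hypothetical callable standing in for a real ClickHouse client call, not any specific library's API.

```python
from typing import Callable, List, Sequence


class BatchingWriter:
    """Buffer rows and hand them to `insert_batch` in large chunks.

    `insert_batch` is a hypothetical stand-in for a real ClickHouse insert.
    """

    def __init__(self, insert_batch: Callable[[Sequence[dict]], None],
                 max_rows: int = 10_000):
        self.insert_batch = insert_batch
        self.max_rows = max_rows
        self.buffer: List[dict] = []

    def write(self, row: dict) -> None:
        """Add one decoded Kafka message; flush once the batch is full."""
        self.buffer.append(row)
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered (call this on shutdown and on a timer too)."""
        if self.buffer:
            self.insert_batch(self.buffer)
            self.buffer = []  # start a fresh list so the sent batch is not mutated
```

In a real consumer, Kafka offsets would typically be committed only after `flush` succeeds, so a crash replays unflushed messages instead of losing them.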
Boaz: Because ClickHouse proved to be more hardware-efficient.
Sudeep: Yeah, it was more hardware-efficient, especially for ingestion. And for search and things around that, we had to put a lot more effort into Druid to make sure it scaled. With ClickHouse, we really didn't have to do that much, and that is pretty impressive about ClickHouse.
Boaz: Tell me more about the search use case. What kind of queries were you trying to run there?
Sudeep: These are queries we ran specifically for SREs, on application health data that we get from the logs. How many errors did you see? What are your URL counts? What has the application heartbeat been in general, like how many heartbeats have you seen? How much volume have you handled overall? What is the distribution of the different HTTP statuses for your application? Things like that were rendered visually from this data, and SREs were big-time power users of it.
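The health questions listed above boil down to group-by aggregations. Here is an in-memory sketch of two of them, assuming hypothetical event fields `app` and `status`, and assuming "errors" means 5xx responses; in practice these would run as SQL aggregations inside the OLAP store rather than in Python.

```python
from collections import Counter, defaultdict


def status_distribution(events):
    """Per application, return the fraction of requests for each HTTP status."""
    per_app = defaultdict(Counter)
    for e in events:
        per_app[e["app"]][e["status"]] += 1
    result = {}
    for app, counts in per_app.items():
        total = sum(counts.values())
        result[app] = {status: n / total for status, n in counts.items()}
    return result


def error_counts(events):
    """Count 5xx responses per application (treating 5xx as 'errors' is an assumption)."""
    return Counter(e["app"] for e in events if 500 <= e["status"] <= 599)
```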
Boaz: Were you guys also doing joins in the queries, or was it all sort of one big fact table?
Sudeep: Yeah, there were joins. I don't really recall what kind of joins we were doing, but there...
Boaz: Because Druid and joins were never good friends. That became easier in ClickHouse.
Sudeep: Yeah, you're right. I do remember we had some joins across different tables, but I just don't remember off the top of my head what the use case was.
Boaz: Did you use that opportunity to do a proper evaluation and try other tools head-to-head, with ClickHouse coming out the winner? Or did ClickHouse sort of suddenly come in from the corner, surprise everybody for the better, and that was that?
Sudeep: Yeah, I think we were trying a few options, but nothing really serious. Even ClickHouse was not really serious for us. But ClickHouse probably made the cut because, with minimal effort, we could see a lot of benefits in our pipeline. That's probably the only reason we looked more deeply into it. I do remember we looked at whether we could write this data offline onto Hadoop itself and have some jobs run there, but that's not really real-time.
Boaz: I guess, being on-premises, you guys self-managed and ran the open-source versions of both Druid and ClickHouse. At eBay, did your teams also contribute to those projects along the way, or go into the code and add functionality?
Sudeep: Our team made some contributions to the open-source ClickHouse operator. And we were quite active in the community, working closely with folks who were very involved with ClickHouse as well. As part of that, I'm guessing some of the requirements we raised translated into features in the product. That community was not that big at the time; it is probably much bigger now. Back then, people generally knew each other by name; it was a very, very small community. The way I used to meet them was to just hop on a call; they were much more accessible at that point.
Boaz: Yeah. Full disclosure: we don't typically talk about Firebolt on this podcast, but Firebolt is partly based on it. Our core execution engine was a hard fork of ClickHouse back then. ClickHouse is amazing when it comes to speed and performance at scale. Part of the journey for us at Firebolt was figuring out how to adapt it to the modern data stack and to a SaaS experience and so forth, but the performance and scale are absolutely top-notch.
Sudeep: Pretty impressive, right?
Boaz: Yeah, and tell me about the metrics platform that you mentioned.
Sudeep: We saw the advantage of using Prometheus, which integrated very nicely with Kubernetes. We had our own HBase-based solution for metrics, which was fine, but again, it had availability challenges and wasn't as feature-rich as we would have liked, and things like data locality were already very well addressed in the Kubernetes world. So, we looked at whether we could use the Prometheus core and build a distributed architecture on top of it to scale. Essentially, we took the Prometheus core and created an abstraction layer on top of it, which we called a shard. You can have multiple shards within a cluster, and replication was taken care of by dual writes. When a query comes in, we look at the health of each shard based on the data points for that particular tenant, meaning how healthy that tenant's data is on that shard, and we serve the query based on that.
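The design described above (shards on a Prometheus core, replication through dual writes, and reads routed by per-tenant shard health) can be modeled as a toy in-memory store. This is only a sketch of the idea: the tenant-to-replica-pair hashing, the `down` parameter used to simulate an outage, and the choice of "number of points held for the tenant" as the health signal are all assumptions, not eBay's implementation.

```python
import hashlib
from typing import Dict, List, Tuple


class ShardedStore:
    """Toy model: each tenant maps to two replica shards; every write goes to
    both (dual writes), and a read is served by the replica that holds more of
    that tenant's data points (used here as a simple per-tenant health signal)."""

    def __init__(self, num_shards: int):
        if num_shards < 2:
            raise ValueError("need at least two shards for dual writes")
        # Each shard maps tenant -> list of (timestamp, value) points.
        self.shards: List[Dict[str, List[Tuple[int, float]]]] = [
            {} for _ in range(num_shards)]

    def replicas(self, tenant: str) -> Tuple[int, int]:
        """Deterministically pick two adjacent shards for a tenant."""
        h = int(hashlib.sha256(tenant.encode()).hexdigest(), 16)
        first = h % len(self.shards)
        return first, (first + 1) % len(self.shards)

    def write(self, tenant: str, ts: int, value: float,
              down: frozenset = frozenset()) -> None:
        """Dual-write a point; shards listed in `down` simulate an outage."""
        for i in self.replicas(tenant):
            if i not in down:
                self.shards[i].setdefault(tenant, []).append((ts, value))

    def query(self, tenant: str) -> List[Tuple[int, float]]:
        """Serve the tenant's points from its healthiest replica."""
        best = max(self.replicas(tenant),
                   key=lambda i: len(self.shards[i].get(tenant, [])))
        return self.shards[best].get(tenant, [])
```

If one replica misses writes during an outage, reads fall back to the replica that kept receiving them, which is the practical payoff of combining dual writes with health-aware query routing.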
Boaz: Awesome! Which projects do you remember enjoying the most over the last decade, out of all the things you've worked on?
Sudeep: The one I probably enjoyed the most was a problem we called data discovery within large data sets. That was pretty interesting, because we had the challenge of uniquely identifying different metrics and, within each metric, identifying its different keys and values.
Boaz: We're talking about the eBay days, right?
Sudeep: Yeah, the eBay days, to give more context. We had a problem around topology discovery for metrics at large scale, and these were huge volumes of data, right? Those lookups used to take a lot of time when you ran them on the backend: give me all the metrics, and for each metric give me all its keys and values, and specifically when the...
Boaz: Sorry to interrupt, but can you elaborate a little on the topology for metrics? What's the business scenario there? What are we trying to look into?
Sudeep: Imagine a user comes onto your metrics platform and starts with a metric name. He starts typing a metric name, say, mytenant.cpu, something like that, and then he should see all the metrics matching that expression, and then he selects one. It should be a very interactive experience.
Boaz: And that's a homegrown eBay metric store?
Sudeep: Yes, right. This system was created in parallel to the metric store. Your metric store has all the raw time-series data, but the topology for each of these metrics, like the metric name and the key-value pairs, used to reside in a different data store. We used Elastic for that, so all the metadata discovery (give me the metric names, give me the keys and values) used to come from Elastic. Initially, we tried to serve all of that from the raw store, but that was not very performant, so we looked at how to make it much more efficient, because those lookups need to be super fast. Users cannot wait for those interactions to complete. We wanted to give a delightful experience there, since searching is the first interaction users have with our platform. And since we were the system receiving all this metric data in a single sink, it became even more important for us to do it fast.
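One way to picture the topology store described above is a small metadata index kept beside the raw time series: metric name to label key to set of values, plus a sorted list of names for prefix autocomplete. This sketch uses a sorted list with `bisect` as an assumed, simple technique; the interview doesn't describe the actual data structures, and `TopologyIndex` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import Dict, List


class TopologyIndex:
    """Hypothetical metadata index: metric name -> label key -> set of values,
    plus a sorted list of metric names for prefix (type-ahead) lookups."""

    def __init__(self) -> None:
        self.labels = defaultdict(lambda: defaultdict(set))
        self.names: List[str] = []  # kept sorted for bisect-based prefix search

    def observe(self, metric: str, tags: Dict[str, str]) -> None:
        """Record the metric name and label pairs seen on an incoming data point."""
        if metric not in self.labels:
            bisect.insort(self.names, metric)
        label_map = self.labels[metric]  # creates the entry the first time
        for key, value in tags.items():
            label_map[key].add(value)

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` metric names starting with `prefix`, in sorted order."""
        out = []
        i = bisect.bisect_left(self.names, prefix)
        while i < len(self.names) and len(out) < limit and self.names[i].startswith(prefix):
            out.append(self.names[i])
            i += 1
        return out

    def keys_and_values(self, metric: str) -> Dict[str, List[str]]:
        """Return every label key for a metric with its sorted values."""
        return {k: sorted(vs) for k, vs in self.labels.get(metric, {}).items()}
```

Because the index is updated on ingest, the type-ahead path never touches the raw time-series data, which is the separation Sudeep describes.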
Boaz: Got it! Awesome! We also sometimes like to ask about bad memories. So, give us a story you regret, something you learned from a mistake, that we can teach others to avoid. Tell us a horror story.
Sudeep: One of the horror stories I can remember is around distributed tracing. With tracing, you can create a platform and everything around it, but when it comes to adoption, you need end-to-end visibility. That means every hop of your request, say from service one to service two to service three, needs instrumentation in all three services. That was a challenge we probably underestimated. We created the platform, we created everything, and it worked very nicely, but then if one of the service owners says, "I'm not ready to instrument," the entire chain breaks. And then the experience really isn't that great. You'll see different traces that don't even match up with each other. So, one thing I realized from that is that it's a good idea to always check how feasible it is for people to adopt it, and how quickly, right? Getting people to adopt it was definitely a bit of a pain point for me.
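The "broken chain" effect can be shown with a tiny simulation. This sketch makes a simplifying assumption: an uninstrumented service neither records a span nor forwards the incoming trace context, so the next instrumented service starts a brand-new trace. `Span` and `simulate_call_chain` are hypothetical names for illustration, not part of any tracing library.

```python
import itertools
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]
    service: str


def simulate_call_chain(services: Iterable[str], instrumented: Set[str]) -> List[Span]:
    """Simulate one request hopping through `services` in order.

    Instrumented services record a span and forward the trace context.
    Uninstrumented services (by assumption) record nothing and drop the context,
    so the next instrumented hop starts a new, disconnected trace."""
    span_ids = itertools.count(1)
    trace_ids = itertools.count(100)
    spans: List[Span] = []
    ctx: Optional[Tuple[int, int]] = None  # (trace_id, parent span_id) in request headers
    for svc in services:
        if svc not in instrumented:
            ctx = None  # trace context is lost at this hop
            continue
        span_id = next(span_ids)
        trace_id, parent = ctx if ctx else (next(trace_ids), None)
        spans.append(Span(trace_id, span_id, parent, svc))
        ctx = (trace_id, span_id)
    return spans
```

With all three services instrumented, the request yields one connected trace; leave out service two and the same request shows up as two unrelated traces, which is exactly the confusing experience described above.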
Boaz: So, the problem was discovered way too late.
Sudeep: Yeah, it's like you were so happy with this system you built, thinking it was going to change the world. But then you find that, okay, not everybody is as excited about it as you are.
Boaz: Which is interesting. It's something we hear about in classic product management: how can we keep the end users close to what we're building? Oftentimes, in deep engineering projects, that can get lost.
Sudeep: Yeah, exactly.
Boaz: But for data projects... I myself am a product guy. I've dealt with data my entire career, but as a product guy. You see more and more data positions called data product manager, because there's so much data activity and so many data projects going on in the software world that there's now a specialty in being a product manager for data-related projects only. We need to bring the end users into the planning ahead of time, even if they are internal users, analysts, engineers, or whoever.
Sudeep: Yeah, that's a fair point. Yeah, I agree.
Boaz: How different is the data world at Salesforce compared to eBay?
Sudeep: The problems are similar. There are a lot of similarities in the concepts and everything, but each company has its own way of going about things in terms of execution, so there are pros and cons in both places. For me, to be honest, it was more like a lift and shift, because the terms, constructs, and even the technology stack were all familiar. So, it didn't really take long, and I don't feel a lot of difference between the two.
Boaz: What stack are you dealing with now at Salesforce?
Sudeep: It's Java, a little bit of Golang, Elasticsearch, Hadoop, and a lot of OLAP. ClickHouse is not there yet, not yet. So, let's see what we can do.
Boaz: So, what's being used for OLAP?
Sudeep: I think for OLAP we have something like Druid. I'm actually not so sure, because I'm still new to this particular charter, but I've heard that some of the events are built around Druid. I'm not sure, though.
Boaz: It's probably a little bit of everything.
Sudeep: It's a little bit of everything. This team is also pretty big, and being remote and not able to interact closely with the team, there are some things I don't have direct visibility into. But right now, my team and I are focused on distributed tracing.
Boaz: And for yourself, have you gotten used to the desert heat in Austin, or are you planning to move back to the Bay Area now that the pandemic is behind us?
Sudeep: I miss the weather. I really do miss the weather in the Bay Area, and the food in the Bay Area is also good. People say the food in Texas is great too, but it's probably not to my liking. So, I do miss those two things about the Bay Area, but the Bay Area is super expensive. That's probably one of the reasons I moved in the first place, especially if you want to raise a family and need more space and everything. Texas is pretty accommodating.
Boaz: Yeah, well, Sudeep, it's been super, super interesting.
Sudeep: Same here.
Boaz: Thanks for sharing your journey, your experience with these technologies, and the data challenges you've tackled. Yeah, that's it, super interesting!
Sudeep: Thank you. Thank you for your time. All right.
Boaz: Thank you so much.