Cloud Crunch

Episode · 1 year ago

S2E04: Data Migration to Snowflake

ABOUT THIS EPISODE

There's a new(ish) major player in cloud data management! 2nd Watch data and analytics experts, Sam Tawfik and Spencer Dorway, talk about practical data migration to Snowflake.

...Involve, solve, evolve. Welcome to Cloud Crunch, the podcast for any large enterprise planning on moving to the cloud or in the midst of moving to the cloud, hosted by the cloud computing experts from 2nd Watch: Ian Willoughby, Chief Architect of Cloud Solutions, and Skip Berry, Executive Director of Cloud Enablement. And now, here are your hosts of Cloud Crunch.

Welcome back to Cloud Crunch. Today we're going to have an interesting episode, a little different than normal: we're going to dig into a very specific cloud product. I'd like to welcome our special guests today, who are colleagues of mine, Sam Tawfik and Spencer Dorway. Sam and Spencer, welcome to the episode.

Great to be here.

Yeah, absolutely. I think we'll have a great conversation. We're going to dig into Snowflake, but first I want to give a little summary of what we're going to talk about. Data engineering and analytics experts Sam and Spencer are joining us today to discuss data migration to Snowflake and how to get it done. As many of you may know, Snowflake went public this year. They had an IPO in the second half of 2020 and raised a "small" amount of money: about $3.4 billion. It will be interesting to see where they take it from there. Now, a quick introduction. Sam, you're the senior product manager for Data Engineering and Analytics here at 2nd Watch. Obviously we've worked together, and I've looked at your background: you have deep experience with the modernization of data and applications, so I think this is going to be really relevant. And Spencer, you have been doing data engineering for quite a while as well. It looks like you guys have seen it all, so let's get into it. Maybe, Sam, you could take this first question: why are businesses migrating to the cloud at this point?

That's a great question. Really, businesses are choosing the cloud for a variety of reasons.
But before they choose the cloud, take a step back and ask: why are they making a change at all? It turns out that companies basically make a change because they need to migrate off a legacy platform, maybe because it's reaching end of life, or because they need to modernize. That's the second reason: they feel the old platform isn't supporting their business needs, and they want to modernize that environment. The next decision is where to go, because the cloud is not always the first answer. It is the most common answer now, and for good reasons, but businesses really do have a choice. They may choose to stay on premises, but a lot of businesses are choosing the cloud because it gives them a lot of benefits. I'll keep it brief and high level.

Let's start with agility. The cloud is really easy to get to and really easy to get started with. I can have a cloud platform spun up and ready to go within minutes. If I want to, I can have a solution deployed within minutes or hours. You don't have the capital expense, and you don't have an IT team implementing or deploying your servers. That's the first reason. The second reason, which I alluded to already, is that it's pay as you go. You don't start with a huge capital expense. You can start small, test it out, and only pay for what you're using, which is really attractive for businesses right now, because an idea may or may not work. We don't expect everything to work on the first try, but it's always nice to know that we won't have to spend a lot of money to try it out. That gives us the freedom to be innovative and to try new things. And from my perspective, the third reason is to ride the technology wave. By that, I mean the cloud is being improved daily, if not hourly. Companies are releasing new software all the time. You don't have to wait for long release cycles, so you can take advantage of new technologies as soon as they're released, or when you're ready for them, rather than waiting like we used to in the old days.

Yeah, I think that's a key point. Obviously, what we're discussing today is modern data, and it goes right along with application modernization as well. So let me ask you guys this. When it comes to migrating, obviously a lot of it is lift and shift; there are the five, six, or seven Rs, depending on who you ask. Let's talk about the data. What are some of the challenges of getting data migrated to the cloud?

One thing you have to think about when you're migrating data is where you're starting from. If you already have an on-premises solution, some sort of data warehouse or consolidated place where all your data lives, then lift and shift is really attractive, because you can just migrate that over. If instead you have a bunch of applications and you're just starting your journey toward consolidating the data, you need to decide what paradigm to adopt going forward so that it stays flexible, and you can make some of those decisions as you start using the cloud. Every cloud provider has a couple of different options for pulling your data out and moving it into its large-scale storage. For example, AWS has S3, Google Cloud has Cloud Storage, and Azure has Blob Storage. These are all pretty similar products: you bring your data there and stage it, to get it into some sort of data warehousing solution. So that's the first thing to think about.
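The "pull the data out and stage it" step for a collection of application databases can be sketched as an incremental, scheduled extract: each run reads only rows changed since the last run and writes them to a file that could then be uploaded to S3, Cloud Storage, or Blob Storage. This is a minimal sketch under stated assumptions, not the guests' pipeline: it uses Python's built-in `sqlite3` as a stand-in for the application database (in practice you would often read from a replica so the application isn't disturbed), the `orders` table and its `updated_at` watermark column are hypothetical, and the upload itself is left out because it depends on each cloud's SDK.

```python
import csv
import io
import sqlite3


def extract_increment(conn: sqlite3.Connection, last_watermark: str) -> tuple[str, str]:
    """Return (csv_text, new_watermark) for rows changed after last_watermark.

    Assumes a hypothetical `orders` table with an ISO-8601 text `updated_at`
    column, so string comparison orders timestamps correctly.
    """
    rows = conn.execute(
        "SELECT id, customer, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "customer", "amount", "updated_at"])  # header row
    writer.writerows(rows)

    # Advance the watermark only when new rows arrived; otherwise keep it.
    new_watermark = rows[-1][3] if rows else last_watermark
    return buf.getvalue(), new_watermark


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "acme", 10.0, "2020-10-01T09:00:00"),
            (2, "globex", 25.5, "2020-10-02T12:30:00"),
        ],
    )
    text, watermark = extract_increment(conn, "2020-10-01T12:00:00")
    print(text)
    print(watermark)
```

The scheduler would persist the returned watermark between runs. A strict `>` comparison is simple but can miss rows committed later with exactly the boundary timestamp, which is one reason change-data-capture tools are often used instead.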
Yeah, obviously they all have similar solutions, and that's great. But how do you help get these companies started? Can you explain some examples of how you got them started, how they made those decisions, and the drivers around that?

Yeah, for sure. You look at how you want your data to be used in the end, and where your data is starting from. If you have a collection of application databases, which is a common situation, you want to make sure you're not interfering with the way those applications are running, while still being able to pull the data out and get it to a place where it can be used. Certain solutions for that include pulling the data out on some sort of schedule, storing it in that larger blob storage, and then loading it into a data warehousing solution. That could be BigQuery, Redshift, or Snowflake.

I'd like to add to what Spencer just talked about from a broader perspective. I think organizations need to think strategically. You always want to start with a business case; we don't want to start with the technology and then try to make a business case fit it. So we need a business case, and we need sponsors and champions to help us with the proof of concept for moving to the cloud. That really helps the IT organization and the business focus on what they want to accomplish. The next thing is to define the success criteria. How do we know that this is successful? How do we know that this is something we want to roll out to the organization? It's important to understand and approve those success criteria before moving in. Next, you really want to start small and have a reasonable timeline. We cannot boil the ocean overnight, so I like to take the conservative approach of crawl, walk, run. Let's have a proof of concept, get it finalized, and get it rolled out. Once everybody agrees that it's successful, roll it out and then celebrate that success, because with the celebration, the word gets around. If this is successful, then the next project will come along, and the one after that, and the one after that. So you really want to build momentum. You want to demonstrate success, and at the end of the day, it has to help the business in one way or another.

That's great. You guys touched on the business case; we often bring up on this podcast that everything usually needs to revolve around the business case and getting the proper stakeholder buy-in. I think you'd agree that when that happens, things go smoother.

Absolutely. The business users are your champions. They're the ones who are going to say, yes, this has helped my business, this has increased my ROI, I have happy customers. That's really what it should be all about. You can implement an endless number of solutions in the cloud, but you could be wasting a lot of time. You want to optimize your resources and definitely meet the business demands.

Absolutely. So obviously, we've touched on the three major cloud providers here. But who are some of the other major players in cloud data management?

Snowflake is definitely another major player when it comes to data management. Snowflake focuses on being a born-in-the-cloud data warehouse, and they really deliver on that principle. It's easy to get data into Snowflake. It's designed for the cloud, so it has almost unlimited scalability. It's easy to deploy and supports a lot of business demands, and I know users like it for its simplicity. Spencer, you might be able to add to that, because I know you've had experience with Snowflake as well.

Yeah, for sure. Snowflake is built entirely for cloud usage.
It makes it a lot easier for people to interact with it right out of the box, because it has a web interface you can go to in order to connect. You can create users from there, so other people can connect without having to download applications, and that whole process is streamlined. Since it's more of a data platform than just a data warehouse, data ingestion is cloud native as well and uses the existing cloud tools to make it easier to pull in the data.

So you've obviously worked on some projects where this has been implemented; this is what you guys do. Can you walk through the evaluation and decision process, where you ended up, and what some of the business drivers and technical considerations were? Obviously, it connects to all types of data; we understand that part. But really dig into the pros and cons a little bit from the business perspective as well, if you don't mind.

Certainly. Whenever you're talking about migrating from one system to another, there are a lot of questions you have to answer, and a big one is: is this worth it? Managing that change as a business depends on how much you have connected to the system and how much you're using it, so there has to be a motivation for transitioning. When you're deciding whether it's worth changing cloud environments or data warehousing solutions, there has to be some reason for it. Maybe you're about to scale up to the next size, which is going to be very expensive. Or there's some sort of business split or other business requirement driving the migration, which is something we've run into with companies that either have a particular implementation that needs to change or have needs that aren't being met by their current environment.
Let's use a particular example with Snowflake: a company that started as a startup and was acquired, and then used a data warehouse shared among all the companies its parent owned. It was then being sold off and needed to come up with its own implementation and the strategy behind that. Given that business-level decision, you have the opportunity to evaluate newer tools like Snowflake, see whether they fit your requirements, and understand why they would be better and why you would want to move. One of the reasons we decided to use Snowflake was that it implements a similar strategy for getting data into the system and requires minimal setup to do so. It also supported certain needs we had. We had many users on the database, so migrating them over was relatively easy with Snowflake, and it gave them a pretty painless way to access the data through the web interface. It also provided better support for semi-structured data, such as JSON or other nested data, which we had been having problems with before.

Great. Now let's talk about the implementation process. Is it all or nothing, or can you do it in more of an agile or phased approach? How can businesses see how this is going to work for their business?

Yes, it's important to think about each piece of your data lifecycle and how you can migrate it as painlessly as possible. Snowflake made that pretty easy because it can ingest data from the same sources we were already using in our existing platform. With data stored in S3, Google Cloud Storage, or Azure, you can create Snowflake environments in any of those clouds and ingest the data the same way you would into the data warehousing solutions those cloud providers offer, almost entirely through configuration, and then load it using SQL commands. Another advantage we saw was that it can scale either horizontally or vertically to pull in whatever amount of data you're dealing with. And it's cloud native.
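To make "configuration, then SQL commands" concrete, the Python below only builds Snowflake SQL text; it does not connect to anything. It is a sketch under assumptions, not the setup used in the project described: the warehouse, stage, table, and storage-integration names are hypothetical, and the storage integration is assumed to exist already with read access to the bucket or container. The statements use Snowflake's `CREATE WAREHOUSE` (with `AUTO_SUSPEND`/`AUTO_RESUME`, which is what lets compute shut off when idle), `CREATE STAGE` over an external `s3://`, `gcs://`, or `azure://` URL, `COPY INTO`, and `ALTER WAREHOUSE` for the vertical (size) and horizontal (cluster count) scaling mentioned above.

```python
# Hypothetical helpers: they only build Snowflake SQL text for review and do not
# connect to Snowflake. Run the output with whatever Snowflake client you use.

STAGE_URL_TEMPLATES = {
    "aws": "s3://{bucket}/{path}",
    "gcp": "gcs://{bucket}/{path}",
    # On Azure, "bucket" is the Blob Storage container inside a storage account.
    "azure": "azure://{account}.blob.core.windows.net/{bucket}/{path}",
}


def stage_url(cloud: str, bucket: str, path: str, account: str = "") -> str:
    """External-stage URL for files already staged in cloud storage."""
    if cloud == "azure" and not account:
        raise ValueError("Azure stages need a storage account name")
    return STAGE_URL_TEMPLATES[cloud].format(
        bucket=bucket, path=path.strip("/") + "/", account=account
    )


def load_statements(cloud: str, bucket: str, path: str, table: str,
                    integration: str, account: str = "",
                    warehouse: str = "load_wh", stage: str = "raw_stage") -> list[str]:
    url = stage_url(cloud, bucket, path, account)
    return [
        # Small warehouse that suspends after 60 idle seconds and resumes on the
        # next query, so compute is billed only while something is running.
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse} "
        "WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;",
        f"USE WAREHOUSE {warehouse};",
        # External stage over the staged files; the storage integration is assumed
        # to exist already and to grant Snowflake read access to the location.
        f"CREATE STAGE IF NOT EXISTS {stage} URL = '{url}' "
        f"STORAGE_INTEGRATION = {integration} "
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);",
        # Bulk-load files from the stage that have not been loaded before.
        f"COPY INTO {table} FROM @{stage};",
    ]


def resize_warehouse(warehouse: str, size: str = "", max_clusters: int = 0) -> str:
    """Vertical scaling changes the size; horizontal scaling raises the cluster
    count (multi-cluster warehouses require Enterprise edition or higher)."""
    settings = []
    if size:
        settings.append(f"WAREHOUSE_SIZE = '{size}'")
    if max_clusters:
        settings.append(f"MAX_CLUSTER_COUNT = {max_clusters}")
    if not settings:
        raise ValueError("nothing to change")
    return f"ALTER WAREHOUSE {warehouse} SET " + " ".join(settings) + ";"


if __name__ == "__main__":
    for stmt in load_statements("aws", "my-bucket", "exports/orders", "orders", "s3_int"):
        print(stmt)
    print(resize_warehouse("load_wh", size="LARGE"))
```

Keeping statement generation separate from execution makes it easy to review the SQL, or to point the same pipeline at a second environment while old and new systems run in parallel.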
As Sam mentioned, there's a movement toward pay as you use, and that's Snowflake's entire pricing structure: you pay for whatever amount of time your data warehouse is running, and it shuts off when you're done, so you're only paying for query and ingestion time. That makes it pretty easy to experiment and iterate, and you don't have to have something running 24/7, just what you're using. So you can take pieces and migrate them over one by one, and even have the old and new systems running in parallel so you can compare and test your new implementation.

One thing I want to touch on there: you mentioned that it uses standard SQL to access the data. Is that correct?

Yes. It's pretty interesting: they have their own SQL dialect, which is kind of an accumulation of a lot of different ones. That was a big decision factor in the migration: can this do everything we're currently doing, and can it do more, to fit needs we maybe didn't think about? We made sure that we could run all the same queries with minor tweaks and get the same output, which was part of our evaluation of Snowflake as a possible solution. It fit all those needs and could do the same and more.

So it sounds like it wouldn't require much training for a user who's already familiar with querying data.

Exactly. If you've worked in SQL in any manner, whether on an application database or in data warehousing, it should translate pretty well.

That's great. Now, looking at some of the projects you've done, what is the typical timeline to get this complete?

Usually you want to give yourself plenty of time to evaluate the whole process with all the stakeholders. In the same example, where we were migrating from that shared data warehouse to our own, the plan was to execute the migration in one month, then do data validation over the next several months with both systems running in parallel, and then switch things over during the following month. So it was roughly a 2.5-month timeline that we were looking at. A lot of the front-end data engineering work happened in that first month: creating the pipelines and matching what was currently happening in the other database. That was relatively simple, because you're using the same data lifecycle structure and just switching what you point to from one data warehouse to another.

So you get into production; then what happens next? Is everybody using it and running along happily?

Totally. Obviously, every change you incorporate as a business requires buy-in from all departments and all the applications using it, and not all tools work the same way, so that can be frustrating for people. That being said, we had a great deal of success with our migration. Not only were people able to use it without issue, but they also saw a performance improvement, which always makes people happy when they get their data faster than before.

That's great. And then obviously you're looking at governance, data privacy, protection, all those types of things as well. Does it cover all those aspects? Sometimes people want granular access control to certain sets of data.

Yes. Snowflake has multiple layers of security. One of those is at the user level, where you manage users much as in other databases: you grant access to certain groups.
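In Snowflake, "groups" are modeled as roles: privileges are granted to a role, and the role is granted to users. As a minimal sketch (assuming a hypothetical `analyst` role, `sales` database, `reporting` schema, and users), the Python below builds the SQL for read-only access to one schema. It uses `GRANT ... ON ALL TABLES IN SCHEMA` for tables that already exist and `GRANT ... ON FUTURE TABLES IN SCHEMA` so tables created later are covered automatically; these statements would typically be run by an administrator under a role allowed to manage grants.

```python
def read_only_grants(role: str, database: str, schema: str, users: list[str]) -> list[str]:
    """Build Snowflake statements giving `role` read-only access to one schema.

    All object and user names are hypothetical placeholders.
    """
    fq_schema = f"{database}.{schema}"
    stmts = [
        f"CREATE ROLE IF NOT EXISTS {role};",
        # A role needs USAGE on the database and schema before it can see tables.
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO ROLE {role};",
        # Tables that already exist in the schema...
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {fq_schema} TO ROLE {role};",
        # ...and tables created in the schema from now on.
        f"GRANT SELECT ON FUTURE TABLES IN SCHEMA {fq_schema} TO ROLE {role};",
    ]
    # Group membership: each user receives the role.
    stmts += [f"GRANT ROLE {role} TO USER {user};" for user in users]
    return stmts


if __name__ == "__main__":
    for stmt in read_only_grants("analyst", "sales", "reporting", ["alice", "bob"]):
        print(stmt)
```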
You organize users that way, and you can even integrate with Active Directory and have access follow those permissions. Then you have some sort of administrator who grants access to certain places, to all tables, or even to future tables in a certain area, which is pretty interesting. There's also the account level: when you start using Snowflake, you decide which security tier you're buying from Snowflake, such as Enterprise, or even tiers with HIPAA compliance and the maximum data security level, which cost more, still billed by the second.

That's great, because obviously we know security and privacy are very important. Now, Sam, let me ask you this: how does a business know that it's done with data migration? Is there a milestone, or is this a constant evolution? How do you view that?

They're not done, and hopefully they won't be done for quite a while, because hopefully their business will continue to grow, which means bringing in new data sources and maybe acquiring other companies. So this is an ongoing strategy. Yes, you will have milestones, and I think it's important to have milestones that achieve specific business needs, and again to celebrate those successes often so that you get more projects. But you're not done. You always want to keep an eye on how you can enhance your analytics environment. If I bring in social data, does that give me a better perspective on my customers? If I bring in IoT data, does that give me a better perspective on my supply chain? And so on. So you always want to improve in that respect. The technology also changes, and the cloud is great for innovation; new tools come along all the time. Take a technology that's still evolving and maturing, like machine learning: tools are constantly being innovated in that space, and you want to be able to take advantage of that, because for something you might have been doing manually, or with a longer process, a year ago, there might now be a tool that automates it for you. It's all about continuous innovation, but without being distracted by the temptation of trying something just for the sake of trying it.

So, back to the business case side: you develop the business case to launch this platform. When you do these iterations or future projects, do you find that doing another business case is helpful, or do the business owners already get it and ask for it? Where do those needs come from?

The needs will start bubbling up on their own as you roll out a particular solution. Other users will come and say, "Hey, I want that too. How come I don't get this type of dashboard? How come I don't get these types of alerts?" That's great: it means they're interested and there's a need there. So that's definitely going to be a major contributor. At the same time, as you roll out your solution and inventory the capabilities you have and the data that's coming in, that will spur other ideas, because sometimes users don't know what they want. If you go back to them and say, "Did you know that you can build this perspective on your customer or your partner, because we already have the data and can show it this way?", they might say, "That's interesting. I didn't know that." So you also want to bring those ideas to the table.

Data evangelism, I'm guessing. Good. Sam, obviously you've been out there in the trenches and helped a lot of clients. Any last words of wisdom before we wrap this up: when clients are considering a data migration, what should they be looking for?
Yeah, I'd say treat it like almost anything else you're doing. If you're embarking on something new, it's great to try to do it yourself, and some things can be done on your own. At the same time, be aware of your current limitations, whether as a business or as an individual, and seek help if you can get the project started quickly by working with a trusted partner, somebody who can get you on the right track and get you started quickly. I think that's a great way to get started while you build your own skill set or your organization's skill set. It's really important to consider whether there are experts out there who can get you started. The cloud offers unlimited solutions, unlimited tools, unlimited everything, so how do you navigate that? It helps a lot to have somebody who has done it before walk you through the process and get you started.

Yeah, we see that often: it's analysis paralysis, because there are just so many decisions you could make. Now, Spencer, you're out there in the trenches. What knowledge can you share before we end this today?

Yeah, I'll echo a little of what Sam was saying. In a lot of these cases, with these newer tools, you don't know what you don't know, either implementation-wise or in terms of the potential that's there to suit business needs you haven't even imagined yet. For example, one client I worked with built applications used by a bunch of other companies, and a lot of those companies also wanted access to their data. Snowflake was a great solution for this because you can set up something like sub-accounts and grant access through all of that.
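One way to implement the per-client "sub-account" pattern described here is Snowflake's secure data sharing: a secure view filters a shared table down to one client's rows, the view is added to a share, and the share is granted to that client's Snowflake account (or to a reader account the provider creates with `CREATE MANAGED ACCOUNT ... TYPE = READER` if the client has no account of its own). This is a sketch under those assumptions, not necessarily how that client's implementation worked; the `client_id` column and every object and account name are hypothetical, and the code only builds SQL text.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# Consumer accounts may be written as "org.account", so allow one dot there.
_ACCOUNT = re.compile(r"^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)?$")


def client_share_statements(database: str, schema: str, table: str,
                            client_id: str, consumer_account: str) -> list[str]:
    """Build Snowflake SQL that shares only one client's rows with that client."""
    # Names are interpolated into SQL, so accept only plain identifiers.
    for name in (database, schema, table, client_id):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not _ACCOUNT.match(consumer_account):
        raise ValueError(f"unsafe account identifier: {consumer_account!r}")

    view = f"{database}.{schema}.{table}_{client_id}"
    share = f"{client_id}_share"
    return [
        # Secure view: consumers can't see its definition or other clients' rows.
        f"CREATE OR REPLACE SECURE VIEW {view} AS "
        f"SELECT * FROM {database}.{schema}.{table} WHERE client_id = '{client_id}';",
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
        f"GRANT SELECT ON VIEW {view} TO SHARE {share};",
        # Make the share visible to the client's account (own or reader account).
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account};",
    ]


if __name__ == "__main__":
    for stmt in client_share_statements("sales", "reporting", "orders",
                                        "acme", "myorg.acme_acct"):
        print(stmt)
```

Because the provider keeps a single copy of the data and each client only sees its own filtered view, this avoids building and growing a separate warehouse per client.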
So we pulled the data in and gave each client access individually, in a really federated and organized fashion that meets the needs of those smaller businesses, without having a huge data warehouse solution that you have to grow and grant access to, since the scale is so small or so large for each one. So really, it's about working with people who understand the technology, know how to implement it, and can even suggest potential use cases as well.

That's great. I appreciate you guys sharing stories from the real world and what it's like to use a product like this. That's fantastic. So thanks again for your time, and thanks, everybody, for listening this week. We want to hear from you: please email us at cloudcrunch@2ndwatch.com with comments, questions, and ideas. We'll talk to you next time.

You've been listening to Cloud Crunch with Ian Willoughby and Skip Berry. For more information, check out the blog at 2ndwatch.com/company/blog, or reach out to 2nd Watch on Twitter.
