Tech Talk: “Infrastructure as Code” with Martin Sereinig of Usersnap GmbH

Hello, my name is Martin. Thanks for coming to my talk. Today we'll be talking about infrastructure as code with CloudFormation on AWS. So just quickly, to get this out of the way: who am I? I'm Martin, I'm CTO at Usersnap, and I mostly do back-end and DevOps work. Usersnap itself is a self-service SaaS product running in the AWS cloud. And if you're wondering what we are doing: we build customer feedback software for other SaaS companies so they can quickly get customer validation and make confident product decisions. If you like what you hear from me today, you can follow me on Twitter or connect on LinkedIn. We're always hiring, so just contact me.

Yeah. So when I give a talk I usually like to talk about the history of things, so let's start with the history. In the beginning, when we started with web applications and talked about the infrastructure for web applications, we usually only talked about single servers. You may have put your database on its own server and your email server on its own server, but usually everything about the application was managed on the server level and not on a bigger level. It was like this for a very long time, and in some contexts it still makes sense: if you work at a university institute and all you need is a WordPress for the website, then it's probably fine to just have a single server. But of course there are some challenges that come with that. One of the challenges is what I like to call snowflake machines. You have a server, it's doing something, and then “let me add this one thing”, and then “can you add this one other thing”, and then you add another thing, and you do this five times or ten times. Two years later nobody really knows what is running on the machine anymore, and that makes it very hard to scale horizontally. So if you suddenly need three of those machines, it's getting more complicated. Another challenge is that you need lots of implicit knowledge. You need to know that you are using Apache as your web server and not nginx, because otherwise you don't really know where to look for the config.

So those are solved problems. You can just write a shell script that sets up your server, or use Ansible or any other tool. So those are all solved problems, but they are challenges that we needed to tackle.

But then something interesting happened. There were two revolutions, or evolutions, that happened in the last decade or decade and a half. The first one was cloud computing. So what did cloud computing really do for us when talking about infrastructure? Cloud computing turned complicated infrastructure into off-the-shelf commodities. In 2009, when I wanted to build a load balancer, I would need to provision a server, then install nginx on the server, then set up the reverse proxies, and then set up the firewalls, and all that stuff was manual and tedious work. Nowadays, when I want a load balancer, I just go to the AWS interface: I choose, I click and I pay. That's it. Nothing else needs to be done. And that allows us to do things that we were never able to do before. I mean, how could you build a CDN just on your own? That is something that cloud computing allowed us to do.

The other revolution when talking about infrastructure is containerization. What did containerization do for us in this regard? Well, it turned our application servers into dumb Docker machines. When I wanted to run a Ruby on Rails app in 2009 that creates PNG files, I needed to provision a server, I needed to install Ruby in the correct version, I needed to install the PNG library, and then I probably needed to compile it with the Ruby bindings again so it works, blah blah blah.

I don't need to do that anymore. I just need to run Docker, and everything regarding the infrastructure needs of our application is in the Dockerfile. I no longer need to care about that. The developers care about that, but I don't if I run the service. So if you look back at what we had before: we no longer have snowflake machines. We're just running Docker and nothing else. That makes it very easy to scale, and there's no more implicit knowledge. Everything is explicitly described in the Dockerfile, and that's all you need to know about the infrastructure for your application.

But what do we have now? We have snowflake infrastructure, because three years ago I clicked somewhere in AWS to add a load balancer, and if I didn't document it, then somebody has to find it somewhere. And while the individual machines are very easy to scale, it's very hard to scale the entire infrastructure. So we set up all this crazy infrastructure with CDNs and Route 53 and all those things, and now we want to do the same for the staging environment. You're going to have a hard time doing that. And again we have lots of implicit knowledge, and sometimes it even gets worse, because in AWS for example you have different regions, and if you go to the EC2 dashboard and you're in the wrong region, you don't see your servers. Where are my servers? So we just swapped out the problems. We moved the same problems to a different level.

So how are we going to manage that? Well, I have some requirements that I think are necessary for modern infrastructure to live. First, all of our infrastructure tooling must be compatible with the cloud provider APIs. So all the bash scripts that I wrote five years ago to set up my servers - I can throw them all away, they are no longer valid.

Everything should be declarative. I don't want to describe how I want something to be done. I just want to declare what I want to have. That's also a contrast to the bash script. A bash script or a Dockerfile gives directions: it explicitly shows what to do in what order. Here I just want to say I want to have a CDN. That's it.

It should be idempotent: whenever I apply my infrastructure multiple times, it should always have the same result. It should be customizable: if I want to have a staging environment and a different domain, for example, I should just be able to configure that. The result should not be special. It should not output a black box that I cannot look into. It should put out something that's normal in the environment of my cloud provider. I should be able to detect and deal with drift. Drift is when the current state of the infrastructure diverges from the state it should be in, and you need to deal with that. If something in your infrastructure is different from what you want it to be and you don't notice, it's probably not good.
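To make the ideas of declarative descriptions, repeatable application and drift a bit more concrete, here is a minimal sketch in Python. It is a toy model, not how any cloud provider implements this: the `State` type, the resource names and the `detect_drift` and `apply` functions are all hypothetical.

```python
from typing import Dict

# Hypothetical, simplified model: a resource name mapped to its settings.
State = Dict[str, Dict[str, str]]


def detect_drift(desired: State, actual: State) -> Dict[str, str]:
    """Describe, per resource, how the actual state differs from the desired one."""
    drift = {}
    for name in sorted(set(desired) | set(actual)):
        if name not in actual:
            drift[name] = "missing"
        elif name not in desired:
            drift[name] = "unmanaged"
        elif desired[name] != actual[name]:
            drift[name] = "modified"
    return drift


def apply(desired: State, actual: State) -> State:
    """Make the actual state match the desired one.

    The caller only declares the end result, never the individual steps.
    """
    return {name: dict(settings) for name, settings in desired.items()}


desired = {
    "cdn": {"domain": "staging.example.com"},
    "load_balancer": {"port": "443"},
}
# Someone changed the port by hand, and the CDN was never created.
actual = {"load_balancer": {"port": "8443"}}

print(detect_drift(desired, actual))
once = apply(desired, actual)
twice = apply(desired, once)
print(once == twice)                 # applying again changes nothing
print(detect_drift(desired, twice))  # no drift left after applying
```

Applying the same description twice gives the same state as applying it once, and the drift report is empty afterwards. That is the behavior the requirements above ask of real tooling.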

And the last requirement is that it should be versionable. That's what everybody loved so much about Docker when it came out, right? All my infrastructure is now in the Dockerfile, and if I add a line, there's probably a git commit message that explains why it was added.

So this brings us to infrastructure as code.

What options do we have? Well, there are tools that can talk to different providers: Terraform, Ansible, Chef. This is probably a good call if you are a consultancy or agency and you need to work with many different providers and don't want to relearn all that again and again.

Then there's CloudFormation, which is what we are using. It's basically just a gigantic YAML file that contains all the information about the entire infrastructure.

And there's the AWS Cloud Development Kit, which is pretty much a domain-specific language (DSL) in TypeScript that allows you to code your infrastructure and then synthesize AWS CloudFormation templates from it. So it's basically just a frontend to CloudFormation, if you want to see it like this.

Why did we choose CloudFormation? Well, we're a product company, and that means we can just choose to go all in on AWS. We don't really have any need, now or ever, to move to a different provider. So we just stick with what the platform offers us. And why didn't we choose the CDK? The current platform that we work on sadly predates the CDK, so it was not available. If we did it now, I would definitely choose it.

So let's have a look at a minimum viable CloudFormation template. What's interesting here? There are basically three bigger sections. The first one is the description, which is nothing really, it's just a “hello world”. The first real part is the parameters. Parameters are things you define that then show up in the AWS console, where you can input values. Remember when I said customization is important? That's how you would implement customization of the templates: you just add parameters that you can then change. And then usually the bigger part is the resources. In this case, we define a VPC, that's a virtual private cloud. It's like the most basic thing that you usually have in an AWS environment. What's interesting here is that we don't show the how, we just show what we want. We want a VPC with these properties, and that's it. It's declarative. What's also interesting is the Ref that we use here. It refers to the parameter that we defined at the top. So here you can use all the parameters, and you can also use stuff that comes directly from AWS, like for example values that are inherent to the stack that you're using. So that's the most basic template possible.
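The slide itself is not part of this transcript, so the following is only a sketch of what such a minimal template could look like. It is built as a Python dictionary and printed as JSON, which CloudFormation accepts alongside YAML. The names `VpcCidr` and `MainVpc`, the description text and the default CIDR range are illustrative assumptions. `AWS::EC2::VPC`, `Ref` and the pseudo parameter `AWS::StackName` are standard CloudFormation.

```python
import json

# Parameter and resource names are illustrative, not taken from the talk's slide.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Hello world: a single VPC",
    # Parameters show up as input fields when the template is applied.
    "Parameters": {
        "VpcCidr": {
            "Type": "String",
            "Default": "10.0.0.0/16",
            "Description": "IP range of the VPC",
        }
    },
    # Resources declare what should exist, not how to create it.
    "Resources": {
        "MainVpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {
                # Ref to the parameter defined above.
                "CidrBlock": {"Ref": "VpcCidr"},
                "EnableDnsSupport": True,
                # Ref to a value that AWS provides for every stack.
                "Tags": [{"Key": "Name", "Value": {"Ref": "AWS::StackName"}}],
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

To get a staging variant, you would apply the same template with a different value for `VpcCidr`. The template itself stays unchanged.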

How does it look in real life? Well, it's a bit more complicated. Usually you have multiple stacks. We have three stacks, I think, which means three different environments and templates. We have 69 parameters. We have 85 resources. And overall, it's about two thousand lines of code. So that's pretty hard to maintain at times.

How does it look in reality? Let's take a look at what we actually do with it. Once you have a template, you can just apply it to a stack in the AWS console. If you already have a stack, you can choose to use the current template if you just want to update the parameters, or you can replace the current template. That's usually done when you add new resources. This is taken from our production interface and our production stack. You can adapt various parameters, and when you're done with that, CloudFormation will show you a preview of all the changes that it will make. What's also pretty cool about CloudFormation is that it auto-detects how the individual services depend on each other. In this case we update our Elasticsearch domain, and this in turn updates the Beanstalk environment, and this in turn applies updates to all the Route 53 records. And that's all done by CloudFormation. You don't need to do that yourself. CloudFormation knows what the relationships are. That's pretty cool.
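How such relationships can be derived from a template is sketched below. This is a simplification and not CloudFormation's actual implementation: it only follows `Ref` entries, while real templates also create dependencies through things like `Fn::GetAtt` and `DependsOn`. The resource and property names are placeholders that mirror the chain described above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def find_refs(node):
    """Yield every logical name referenced via {"Ref": ...} inside node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                yield value
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def update_order(resources):
    """Order resources so that each one comes after the resources it references."""
    graph = {
        name: {ref for ref in find_refs(body) if ref in resources}
        for name, body in resources.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical resources; the property names are placeholders.
resources = {
    "DnsRecord": {"Properties": {"Target": {"Ref": "AppEnvironment"}}},
    "AppEnvironment": {"Properties": {"SearchEndpoint": {"Ref": "SearchDomain"}}},
    "SearchDomain": {"Properties": {}},
}

print(update_order(resources))
# -> ['SearchDomain', 'AppEnvironment', 'DnsRecord']
```

Because the references already encode who depends on whom, nobody has to write down the update order by hand.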

And then you just press the big scary button, and then stuff happens to your infrastructure, and you hope it works out, and it usually does. Yeah.

So that was a very quick intro. So in summary, what do we like about CloudFormation?

We like that we got rid of all our snowflake machines and snowflake applications. Everything you need to know about the infrastructure can be found in the CloudFormation template, in the Dockerfile or in the parameters. That also means it's very easy to replicate, so it's very easy to just make a new staging environment, for example. And of course, the best thing: it's “free”. You still have to pay for all the resources that you consume, but other than that, it's free.

Let me just close with a few lessons we learned over the years while working with CloudFormation. One thing is that you really should use tooling to create a template. AWS CDK is great. We use a tool called troposphere because the CDK did not exist yet. It's really cumbersome, and it's really easy to introduce bugs, if you're just manually editing a two-thousand-line YAML file. So use some tooling for that. Also, it's very important that you have a staging environment. If you're doing stuff to infrastructure, you really want to be able to try it out before you apply it to the environment that all your customers are on. Another good thing that we found out over the years is that sometimes it makes sense to do stuff manually. If you want to be in complete control of what you're doing, you can still do it manually. And then the drift detection, as I mentioned earlier, kicks in. So if I update the database version manually in production and then later on move the template to the next version, CloudFormation will just detect that it's already correct and not try to do anything stupid.
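To illustrate the point about tooling, here is a toy generator in Python. It is not the API of troposphere or the CDK. The `TemplateBuilder` class and all names in it are made up for this sketch. The idea is that references become ordinary Python values, so a typo in a name fails in the script before a broken template is ever produced.

```python
import json


class TemplateBuilder:
    """Toy template generator; not the API of troposphere or the CDK."""

    def __init__(self):
        self.parameters = {}
        self.resources = {}

    def _claim(self, name):
        # Keep logical names unique so a Ref is never ambiguous.
        if name in self.parameters or name in self.resources:
            raise ValueError(f"duplicate logical name: {name}")

    def parameter(self, name, default):
        self._claim(name)
        self.parameters[name] = {"Type": "String", "Default": default}
        return {"Ref": name}

    def resource(self, name, resource_type, **properties):
        self._claim(name)
        self.resources[name] = {"Type": resource_type, "Properties": properties}
        return {"Ref": name}

    def to_json(self):
        return json.dumps(
            {"Parameters": self.parameters, "Resources": self.resources},
            indent=2,
        )


builder = TemplateBuilder()
vpc_cidr = builder.parameter("VpcCidr", default="10.0.0.0/16")
main_vpc = builder.resource("MainVpc", "AWS::EC2::VPC", CidrBlock=vpc_cidr)
# Misspelling main_vpc here would raise a NameError in Python
# instead of silently producing a template with a dangling reference.
builder.resource(
    "AppSubnet", "AWS::EC2::Subnet", VpcId=main_vpc, CidrBlock="10.0.1.0/24"
)
print(builder.to_json())
```

The generated JSON is still a plain template, so nothing about the result is special. Only the way it is written changes.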

And the last thing is, I guess, true for every powerful tool in the world, starting with CloudFormation and ending with a circular saw: some things look innocent, but are very, very dangerous. Remember the thing that we had before, where we updated the Elasticsearch domain? Well, here the last column, Replacement, says Conditional. Conditional means it might be replaced. So what does replacing mean? Replacing means it throws away the old one, including all the data, and creates a new and empty one. And because CloudFormation is really tidy and a good citizen, it will also delete all the automated backups that it made, because you are not going to need those anymore. So if you don't really pay attention to this thing, you can wake up to a really, really bad day. So that's a word of warning. And lastly, there is a way to actually take care of that so that this doesn't happen to you. It's called stack policies. With those you can protect important resources from deletion and replacement, which I would very much recommend. And that's the end of my talk. I hope you liked it. I hope it was informative, and see you around. Thanks.
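As a sketch of the stack policies mentioned above: a stack policy is a JSON document of statements, and the Python below generates one that denies replacing or deleting selected resources while still allowing all other updates. The helper name `protect_from_replacement` and the logical ID `SearchDomain` are assumptions for illustration. The actual policy used in the talk is not shown in the transcript.

```python
import json


def protect_from_replacement(logical_ids):
    """Build a stack policy that denies replacing or deleting the given resources."""
    statements = [
        {
            "Effect": "Deny",
            "Action": ["Update:Replace", "Update:Delete"],
            "Principal": "*",
            "Resource": f"LogicalResourceId/{logical_id}",
        }
        for logical_id in logical_ids
    ]
    # Every other update stays allowed; an explicit Deny wins over this Allow.
    statements.append(
        {
            "Effect": "Allow",
            "Action": "Update:*",
            "Principal": "*",
            "Resource": "*",
        }
    )
    return {"Statement": statements}


# "SearchDomain" is a hypothetical logical ID standing in for the Elasticsearch domain.
policy = protect_from_replacement(["SearchDomain"])
print(json.dumps(policy, indent=2))
```

With a policy like this attached to the stack (for example via the `aws cloudformation set-stack-policy` command), an update that would replace the protected resource is rejected and the data stays in place.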