Tech Talk: "North Star Direction" with Gerald Vrana from Sportradar

Hey, my name is Gery and I'm from Sportradar, from the Ads unit. We are building a self-management platform in the sports-betting industry. Today I'm going to show you the direction towards a North Star, or in other words, how to deliver the best value for your customer.

Let me start with some foundation so that you know what I'm talking about. The world is far from a perfect place. We all live in a VUCA world. VUCA is a term that was introduced by the US Army back in the 1980s. It stands for:

  • volatile – change is rapid and unpredictable in its nature;
  • uncertain – the present is unclear and the future is uncertain;
  • complex – many different, interrelated factors come into play, with the potential to cause chaos and confusion;
  • ambiguous – there is a lack of clarity or awareness about situations.

So what does this mean? It basically describes that when you work in the tech industry, nothing is clear – there can be a requirement change every second, there can be a change in your services – nothing is set in stone. There can be and always will be change in the system, so we need to prepare for that.

So what is the North Star Direction? It will not show you the exact way, but it will guide you. I think everyone has been in the situation where they stand in front of something and don't know whether to go left or right. The North Star helps you find the direction. Every person makes around 2,000 decisions per hour, which is roughly one decision every two seconds. The North Star will not show you the path to take, but it will show you the direction you have to go.

When I planned my trip to Linz, I was not sure whether to go by car, by train or by motorcycle. I just knew that it is around two hours to the west – that was the direction I had to go. That is how the North Star works.

Ten guidelines to find the right direction:

First one – customer obsessed. Work backwards from the customer. Non-functional requirements are a top priority. So what does that mean? You should deliver the right value at the right time. There is no point in delivering a feature months from now that the customer needs today – by then they won't need it anymore. Availability and scalability are also important. A product that becomes unusable under high load has no value for the customer. So think about availability and scalability, be able to handle high loads of user access, and build your application so that it is stable.
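
To make the availability point slightly more concrete, here is a minimal sketch of a health endpoint that a load balancer or orchestrator can probe so unhealthy instances are replaced before customers notice; Flask and the /health route are illustrative choices, not a description of the actual platform:

```python
# Minimal sketch: a health endpoint that a load balancer or orchestrator can
# probe, so unhealthy instances are replaced before customers notice anything.
# Flask and the /health route are illustrative choices.
from flask import Flask, jsonify

app = Flask(__name__)

def database_reachable() -> bool:
    # Placeholder: a real service would ping its database or connection pool here.
    return True

@app.route("/health")
def health():
    if not database_reachable():
        return jsonify(status="unhealthy"), 503  # take the instance out of rotation
    return jsonify(status="ok"), 200

if __name__ == "__main__":
    app.run(port=8080)
```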

Cross-functional. We should have cross-functional, self-sufficient teams. This means keeping end-to-end capabilities in the team – from designer to architect to developer to QA – and building the architecture so the team can be self-sufficient. From my earlier years in the tech industry I remember how frustrating the alternative is: you have a problem on your production system, you have to file a ticket with – I don't know – the system administrator who holds the database access, and then you wait one to two weeks to get a database dump. That is really frustrating and it doesn't help at all. That's why we try to keep the knowledge inside the team and stay cross-functional from end to end.

End-to-end ownership. Take responsibility for your work. You build it, you ship it, you support it. It only counts when the customer uses it. 'Done' is not always done. From my experience, features sometimes get treated as done, but they are only really done when they are on production and the user can use them.

Keeping the hostage. Probably one of the hardest principles, in my opinion, but also the most important one. Just imagine your product service gets attacked and goes down, or you are held hostage – which happened to another big company a few years ago. They had a ransomware attack and, I think, had to pay roughly 10 million euros to get their services back, because they were being held hostage. What we do instead is: we go to GitLab, change our AWS accounts, deploy the service again, create a new production environment, import our database dumps, and we are live again. We are not really ignoring what happened – we just put it aside, spin up a new service first, and investigate afterwards what happened.

From that follow four rules, more or less. Test this process regularly – it's nice to have it in place, but if you don't know how to run it or whether it really works, it doesn't count. Keep your backups in a separate account – what happens if your AWS account gets attacked? Your database dumps get attacked as well, and if you lose your backups, you have a big problem. So have a backup strategy and plan, and keep at least one or two separate accounts where you store your backups. Never hardcode AWS resources – when you need to change something, hardcoded dependencies to every service make it really hard. You need to stay flexible: be able to switch AWS accounts, spin up the instances again and rewire everything on demand. The only thing you should need to do is go to GitLab, change your account data and spin up your services again. Of course, there will always be some manual work, like importing database dumps and testing whether the application works fine, but you should not be offline for weeks – it should be a matter of hours or days.
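
To make the separate-backup-account rule a bit more concrete, here is a minimal Python sketch using boto3 that shares the latest production database snapshot with a dedicated backup account. The account ID, database identifier and region are placeholders, not Sportradar's actual setup, and encrypted snapshots would additionally need the KMS key shared.

```python
# Minimal sketch: share the latest production database snapshot with a
# separate backup account, so a compromised primary account cannot take the
# backups hostage as well. All identifiers below are placeholders.
import boto3

BACKUP_ACCOUNT_ID = "111122223333"    # hypothetical separate backup account
DB_INSTANCE_ID = "ads-production-db"  # hypothetical database identifier

rds = boto3.client("rds", region_name="eu-central-1")

# Find the most recent completed automated snapshot of the production database.
snapshots = rds.describe_db_snapshots(
    DBInstanceIdentifier=DB_INSTANCE_ID, SnapshotType="automated"
)["DBSnapshots"]
latest = max(
    (s for s in snapshots if s["Status"] == "available"),
    key=lambda s: s["SnapshotCreateTime"],
)

# Automated snapshots cannot be shared directly, so copy to a manual snapshot first.
manual_id = latest["DBSnapshotIdentifier"].replace("rds:", "backup-")
rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier=latest["DBSnapshotIdentifier"],
    TargetDBSnapshotIdentifier=manual_id,
)
rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=manual_id)

# Allow the backup account to copy/restore this snapshot on its side.
rds.modify_db_snapshot_attribute(
    DBSnapshotIdentifier=manual_id,
    AttributeName="restore",
    ValuesToAdd=[BACKUP_ACCOUNT_ID],
)
```

The idea is simply that a copy of the data lives outside the account an attacker could take hostage, so the rebuild-from-GitLab procedure always has something to restore.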

Elimination of reds. Take care of your tech debt and your quality, and track and resolve warnings frequently. This is a picture of SonarQube, a static code analysis tool. It scans your code and tells you: hey, your code is fine, or you have problems here – it helps you find issues and tech debt in your codebase. Think of a house of cards: a bad foundation is what you get with a bad codebase, and if you flip one card, the whole house falls apart. That is what we try to prevent, so we keep track of the warnings and resolve tech debt frequently.
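
As an illustration of keeping the reds visible, here is a small sketch that asks a SonarQube server for a project's quality gate status and fails when it is not green; the server URL, token handling and project key are placeholders:

```python
# Minimal sketch: fail a pipeline step when the SonarQube quality gate is red.
# Server URL, token handling and project key are placeholders.
import sys

import requests

SONAR_URL = "https://sonarqube.example.com"   # hypothetical server
PROJECT_KEY = "ads-self-management-platform"  # hypothetical project key
TOKEN = "..."                                 # read this from a CI secret in real life

resp = requests.get(
    f"{SONAR_URL}/api/qualitygates/project_status",
    params={"projectKey": PROJECT_KEY},
    auth=(TOKEN, ""),  # SonarQube tokens go in as the basic-auth username
    timeout=10,
)
resp.raise_for_status()
status = resp.json()["projectStatus"]["status"]

if status != "OK":
    print(f"Quality gate is {status} - resolve the reds before merging.")
    sys.exit(1)
print("Quality gate is green.")
```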

Small, frequent releases. We should follow INVEST and do frequent pushes to production. This basically follows the Scrum principle of writing small user stories that describe the outcome for the user. What happens with that? We reduce the time from to-do to done. This ties back to the first guideline: a feature has no value when the customer can't use it right now. It also lets you deploy as fast and as often as possible. Just imagine you have 20 features going live in one push and you have a bug – it's really hard to find out which feature is causing the issue. When you deploy each feature on its own, it's way easier to track down bugs and problems with the rollout, and you always deliver value for the customer at the end of the day.
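
One common way to keep individual features traceable in small, frequent releases is to ship them behind feature flags. Here is a minimal, made-up sketch of that idea; the flag names and the environment-variable source are illustrative only, and many teams use a dedicated flag service instead:

```python
# Minimal sketch: ship features dark behind flags so each one can be enabled,
# verified and rolled back on its own. Flag names and the environment-variable
# source are illustrative only.
import os

def is_enabled(flag: str) -> bool:
    """A flag is on when FEATURE_<NAME>=1 is set for the running service."""
    return os.environ.get(f"FEATURE_{flag.upper()}", "0") == "1"

def build_campaign_report(rows: list[dict]) -> dict:
    report = {"rows": len(rows)}
    # The new aggregation ships in the same release but stays dark until the
    # flag is flipped; if it misbehaves, switch off this one flag instead of
    # rolling back twenty unrelated features.
    if is_enabled("margin_breakdown"):
        report["margin"] = sum(row.get("margin", 0.0) for row in rows)
    return report
```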

Transparency dispels myths – this is basically about cost tracking and KPI monitoring. You should track how many transactions you have in your system, what a transaction costs, and other relevant KPIs. That helps you drive down your costs and bring the best value to the customer. It also helps with scaling: when your product gets cheaper per transaction, you can scale better because you save money, and at the end of the day the customer is happier because you can lower the cost for them as well. Money matters, and I think every customer out there would be happy to get the same value at a lower cost.
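
As a rough sketch of the kind of KPI this is about, here is a tiny cost-per-transaction calculation; the numbers are invented and would normally come from your billing export and request metrics:

```python
# Minimal sketch: a cost-per-transaction KPI. The numbers are invented and
# would normally come from the billing export and the request metrics.
from dataclasses import dataclass

@dataclass
class MonthlyKpis:
    infra_cost_eur: float  # e.g. taken from the AWS cost and usage report
    transactions: int      # e.g. taken from request/transaction metrics

    @property
    def cost_per_transaction(self) -> float:
        return self.infra_cost_eur / self.transactions

before = MonthlyKpis(infra_cost_eur=12_000, transactions=4_000_000)
after = MonthlyKpis(infra_cost_eur=12_000, transactions=6_000_000)

# Same bill, more transactions: the product got cheaper per unit of value,
# which is exactly what this kind of tracking makes visible.
print(f"before: {before.cost_per_transaction * 100:.2f} cent per transaction")
print(f"after:  {after.cost_per_transaction * 100:.2f} cent per transaction")
```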

Operational excellence. Don't let customers find bugs – we need to notice them earlier. You should always monitor data quality and latency, keep track of invocations and errors in your application, and get notified about errors and delays in your system, because they could be an indicator that something bad is going on. What's also really important is to react to detected anomalies. An anomaly is, for example: imagine you have a thousand requests per minute on average and suddenly it goes up to 10k or 20k. Something is going on there. Are you being DDoS'd, or is something else happening? You should always be aware of what's going on in your system, because you need to find these issues before your customers notice them.
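
To illustrate the anomaly example with the request rate, here is a minimal sketch of a detector that compares the current rate against a rolling baseline; the window size and threshold factor are arbitrary illustrative choices:

```python
# Minimal sketch: flag request-rate anomalies against a rolling baseline.
# The threshold factor and window are arbitrary choices for illustration.
from collections import deque

class RateAnomalyDetector:
    def __init__(self, window_minutes: int = 60, factor: float = 5.0):
        self.history: deque[int] = deque(maxlen=window_minutes)
        self.factor = factor

    def observe(self, requests_per_minute: int) -> bool:
        """Return True when the current minute looks anomalous."""
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            if requests_per_minute > self.factor * baseline:
                self.history.append(requests_per_minute)
                return True  # e.g. page the on-call engineer here
        self.history.append(requests_per_minute)
        return False

detector = RateAnomalyDetector()
for minute, rate in enumerate([1_000] * 60 + [20_000]):
    if detector.observe(rate):
        print(f"minute {minute}: {rate} req/min looks anomalous - investigate")
```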

Continuous improvement – next month should be better than last month. This is a promise we make to ourselves. Quality is our responsibility, so when you see a piece of code that is maybe not the best: boy-scout it! There is a guideline in coding called the Boy Scout rule: imagine you're camping on a field as a Boy Scout – in the end you should leave it cleaner than you found it. In the same way, when you make a small change in a codebase, you can always do some minor refactoring and leave it cleaner than it was before.
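
A tiny, made-up example of boy-scouting: while touching a function anyway, clarify the names and remove duplication instead of leaving the code as you found it.

```python
# Tiny made-up example of the Boy Scout rule: while fixing something nearby,
# clarify names and remove a repeated lookup instead of leaving the code as found.

# Before: cryptic names and the same dictionary access three times.
def calc(d):
    if d.get("o") is not None and d.get("o") > 0:
        return d.get("o") * d.get("s", 1)
    return 0

# After: same behaviour, but the intent is readable for the next person.
def calculate_stake_return(bet: dict) -> float:
    odds = bet.get("o")        # key names kept, only the local naming is clarified
    stake = bet.get("s", 1)
    if odds is None or odds <= 0:
        return 0.0
    return odds * stake
```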

You should write testable, high-quality code. This might sound like a unicorn right now – I know it's not always feasible because it's really hard and there is pressure all the time. But at Sportradar we reserve 30% of each sprint for improving the quality of our system. You can, for example, refactor tech debt or learn something new – there is always room for improvement, and that 30% per sprint gets you towards good, testable code. We should also eliminate toil: basically, every process that is manual and repetitive can be automated. For example, you do reporting, it takes one hour on average and you do it five times a week, so you spend five hours on this process. If you invest time in automating it – which would take, I don't know, roughly 20 hours of development – you reach the break-even point after four weeks, and from then on the automation saves you time and money. Basically, eliminate repetitive processes. That is our continuous improvement.
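
The break-even arithmetic from the reporting example can be written down directly; the numbers below are the ones from the talk, not real figures:

```python
# Minimal sketch of the toil break-even arithmetic from the reporting example.
def break_even_weeks(automation_hours: float, manual_hours_per_week: float) -> float:
    """Weeks until the one-off automation effort has paid for itself."""
    return automation_hours / manual_hours_per_week

manual_hours_per_week = 5 * 1.0  # five one-hour reports per week
weeks = break_even_weeks(automation_hours=20, manual_hours_per_week=manual_hours_per_week)

print(f"break-even after {weeks:.0f} weeks")            # -> 4 weeks
print(f"hours saved in the first year: {52 * 5 - 20}")  # every week after that is pure savings
```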

And my favorite rule is: if it's not tested, it's broken. Don't assume that it works – verify it! You should have around 80% test coverage; there is no hard rule, but you should strive for it. You should run smoke tests and regression tests on every push, and infrastructure tests, integration tests and system tests should all be in place, because if you don't test it, you can't verify it, and then you might as well assume it's broken – you can never be sure it's perfect.
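
As a small sketch of the smoke-test idea, here is what a post-deployment check with pytest and requests could look like; the base URL and endpoints are placeholders:

```python
# Minimal sketch of a post-deployment smoke test with pytest and requests.
# The base URL and endpoints are placeholders; the point is to verify the
# deployed service instead of assuming it works.
import os

import requests

BASE_URL = os.environ.get("SMOKE_BASE_URL", "https://ads.example.com")

def test_service_is_up():
    resp = requests.get(f"{BASE_URL}/health", timeout=5)
    assert resp.status_code == 200

def test_core_endpoint_returns_data():
    resp = requests.get(f"{BASE_URL}/api/campaigns", timeout=5)
    assert resp.status_code == 200
    assert isinstance(resp.json(), list)  # the contract downstream code relies on
```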

Yeah. So here on the last slide I've summed it all up again – basically ten rules:

  1. customer obsessed,
  2. cross-functional,
  3. end-to-end ownership,
  4. keep the hostage,
  5. elimination of reds,
  6. small frequent releases,
  7. transparency dispels myth,
  8. operational excellence,
  9. continuous improvements,
  10. if it’s not tested, it’s broken.

Thank you.