Photo by Windows on Unsplash

This is a continuation of my SRE Engagement Series articles. Check out Month 1, Month 2, and Month 3.

After your first month with the team, you learned how to work with a new system and came to understand the team’s bottlenecks and areas of concern. In your second month, you implemented observability tooling and took the initiative to solve some common issues exposed by reviewing that observability data. In the next month, using tracking systems and alerts to remediate failures helped you repair the system. …


Photo by Matthew Henry on Unsplash

You’re staring at an email from your boss. It says:

Hello engineering team, our customers have seen another outage of the service today. I’ve been informed that if we have another outage, our biggest customer will be moving to another service. We can’t afford to lose this account, so I’m asking you to find and solve the issues you are seeing with the system.

If you’ve ever seen an email like this, you know one thing: you are exhausted and the whole world feels like it’s falling on top of you. …


Photo by Clément H on Unsplash

Source Code Management Options

After creating your first application, you will want to store it somewhere secure where, if you are working with someone else, they can review the code and work with it as well. This is where a source code management server comes in handy. Some of these servers run a tool called “git”, which tracks the history of your source code.

There are a few hosted options I’ve used:
https://github.com/ Very common, has a number of nice features

https://gitlab.com/ Similar to the GitHub SCM solution, has a lot of newer features compared…


Photo by Tim Gouw on Unsplash

This is a continuation of my SRE Engagement Series articles. Check out Month 1 and Month 2.

After your first month with the team, you learned how to work with a new system and came to understand the team’s bottlenecks and areas of concern. You’ve implemented observability tooling and taken the initiative to solve some common issues exposed by reviewing that observability data. Now you need to start making sure things are reliably up and running as intended for consumers of the service, through alerts and on-call rotations.

Alerts

One of the first things you’ll start to…


Photo by Luke Chesser on Unsplash

This is a continuation of my SRE Engagement Series articles.

Given that you’ve been on a team for at least a month, you should have a basic idea about the team, their skill levels, and what their main issues are. Many teams, especially those in need of an SRE team, have a major issue with instability and outages. In our last month, we helped identify tasks to help the team; now we’re going to validate where those issues are.

Observability

The Queen of Observability and head of https://www.honeycomb.io/

There are two ways to deal with identification of issues within the system…


Our story opens to the sound of a small town in the northwestern half of the country of Merasoft: the small town of Perse, on the edge of the coast to the south of the larger town of CeeXpee, where a group of adventurers have spent their lives.

The Town of Perse

Ben the Brave — the Wizard of Perse

John the Jolly — The Bard of Perse

Jessy the Jamin — The Warlock of Perse

Bill the Bold — The Mighty Druid of Perse

Mauro the Slayer— The Great half Orc Warrior of Perse

Martin the Masterful — The Merry Monk of…


Photo by Barn Images on Unsplash

I’ve met a lot of engineers who have a list of things they’d like to work on. They start working on them, and within a few days they don’t seem to finish anything. They end up running into roadblocks or getting discouraged by the number of tasks these things will take. Well, this just means I want to help them get over these issues and enjoy the work they’re setting out to make. So of course, the first thing all of these engineers need is a true software build-validation-deployment setup.

Of course you can do all of this locally…


Photo by Austin Distel on Unsplash

It was a few years ago that I was working with another engineer who told me something that made me feel pretty terrible. I was talking about a conference that I was excited about and was looking into attending. It was about a certain tool a lot of folks are using in the container space. When I told them about it, they said:

“That sounds like a glorified plumbers convention, a whole bunch of people talking about putting pipes together instead of software.” — Ex Coworker

It got me thinking about how on earth they thought that was…


So, you’ve got your first server from an awesome cloud provider. You’re looking at installing software on the server, and you’re interested in how best to do that.

Well, that’s where “frying” and “baking” come in handy. They are two different methods for installing software on the server you have just gotten access to.

Photo by freestocks on Unsplash

Frying

We’ll start with the first method, “Frying”, which is based on how you would cook a meal for yourself. You start to heat the pan and as you have the heat turned on you start throwing things in the pan…


Photo by Taylor Vick on Unsplash

I work as a Site Reliability Engineer, which means we have different “Engagements” (working closely with a team for a period of time) throughout our career and need to ramp up quickly with different teams. My process for learning and connecting with teams is something I’ve grown over time, and I believe it’s worth sharing.

Investigation:

At the start of an engagement you will find yourself walking through the door of your first meeting to a completely new system you have no background in. Even if you understand the language and tools that the software is written in, it will…

John Stupka

Coffee Drinker, Chief Engineer of Dancing at Desk, Foodie, HyperText Transfer Operator, Overly energetic and active! 💻 ☕
