Most talks about wellness focus on fortifying the people, but that's only part of the story: you need to improve your sources of toil, or else you're only paying attention to the symptoms. This talk breaks down the science behind stress and burnout, and how you can apply that to create an on-call process that supports your team's wellness.
Jaime began his career as a molecular biologist before following his passion for communications, working at DigitalOcean, Riot Games, and Shopify, where he launched the engineering communications function. He co-founded Incident Labs: check out Ovvy Insights, which helps provide teams the right data to improve incident response and return hours for planned work. He has spent two years learning about mental health and mindfulness. He is also an avid lover of dumplings.
Head of Reliability
Holly Allen is the head of reliability at Slack, with SRE, Monitoring, and Resilience Engineering in her portfolio. She is tireless in her efforts to make Slack the software reliable and scalable, and Slack the company a delightful place to work. Prior to Slack Holly worked at startups, DreamWorks Animation, and was Director of Engineering at 18F, a civic tech startup in the US government.
Site Reliability Engineer
Lex Neva is interested in all things related to running large, massively multiuser online services. He has years of Systems Engineering, tinkering, and troubleshooting experience and perhaps loves incident response more than he ought to. He’s previously worked for Linden Lab, DeviantArt, and Heroku and currently works as an SRE at Fastly helping to make sure the Internet keeps running.
VP of Technology Operations
Tony is a 25-year Internet industry veteran who has served in various Network Engineering and Operation leadership roles, including Google and DoubleClick. Tony spearheads the management and operations of all Catchpoint monitoring data centers, supporting Catchpoint’s expanding corporate strategy, delivering stable, secure, and reliable operations.
Site Reliability Engineer
Maira is an Application Engineer at Autodesk, based in Novi Michigan. She is obsessed with learning, but especially with the learning process that accompanies on-boarding monitoring concepts for better site/service Performance and Availability. She has dedicated her past years to site reliability, working with different Synthetic and RUM monitoring tools.
Good morning, everyone. Good afternoon, good evening. Thank you so much for joining. I love these Zoom calls because there's such realness to them. When we're at a conference, it can be so easy to get a little bit trapped in the “this is professional, this is being adult time”. And then there's something really nice just about some of this, feeling, "You know what? I'm not actually in Arizona with the sunset. I'm at home. I'm working off my laptop right now."
I'm so excited to kick off this amazing lineup, especially talking about something that I love so much, which is discussing wellness. And so in this talk, I hope to give you some practical information, some research that I've been doing over the last little while that really has transformed the way, not just how I approach work, but also how I approach my life.
So, let's get started with some story time. A little while ago, I became a manager for the very first time. And what that meant in terms of official capacity, is that I had manager in my name. I obviously worked with people before and led some projects but having manager in my actual title just changed things for me. I felt low key stressed out because I felt responsible to these people who would then report into me, and I felt a responsibility to do a good job in terms of making sure that they did well.
So, I took my journalism background and I did what I knew best. I just asked everyone that I knew in my life who was a manager, "Well, what did you do?" I follow the idea that you should always surround yourself with people who are smarter than you. It's a fantastic way to live because then you're always constantly learning and you're constantly growing. I'm so lucky that I was surrounded by people who figured this stuff out because I did not know half the stuff that I should have known. So I asked them, "How can I be a successful manager?" I got advice around listening and how to break down projects and how to manage stakeholders, which was all very important.
And there's a piece of advice that really just stuck with me so vividly, that when people ask me this question, it's the piece of advice that I share, there's nothing else to share. And it's as a manager, your role is to reduce ambiguity and remove roadblocks. This was huge for me because it didn't really seem like it was my idea of what a manager was supposed to do. I mean, if you think about the TV or the movies that we've seen, a manager barks orders and rallies the troops and tells people what to do. This seemed more of like it was a supportive role. Okay, help people understand what's going on.
This became really interesting to me because when I broke it down in my brain, is how people understand the direction and then remove whatever might slow them down, it was like, "Oh right, this is what people should be doing if you're going to be leading a group of people." Help them know what's going on, but it is really more of a coaching function. Right? A coach can tell people what to do, what the team should be, where they're going and pinpointing things that could help, but once they're on the field, you can't really do much. You can't go on there and actually play the game for people. This was really transformative for me to understand what a successful manager does.
So you might be asking, okay, wait, what does this have to do with wellness? Why are we talking about this when the talk is about team wellness? Well, I think that we end up transferring that idea of what can people individually do, when we should be thinking about structural factors. The same way that I thought a good manager was just someone who just micromanaged, maybe just barked orders. Well, that's not helpful at all. Instead you have to think of the conditions that people are working under, what are their incentives, what is helping them achieve their goals, what's blocking them. That's how we have to think about wellness, and that's really what my talk is all about.
Now, I know I started this talk, obviously a lot of manager talk, but who's this talk for? So yes, if you're leading a team, you may be curious about how to implement more wellness strategies, this talk's totally for you. If you're on a team, this matters. If you're an individual contributor though, I'm going to give you some science that is going to be super helpful. You're going to find that this ... For me, it unlocked so much of my own understanding around how I deal with stress, my own ideas of burnout, how I started to change my life to mitigate some of this. So this is for everyone, honestly.
What do I mean by wellness? Well, I mean, how do we think about work-related mood, satisfaction, stress and burnout? And why do I have an asterisk there? Well, it's because we're living in a very special time right now. The Verge writer, Casey Newton actually tweeted recently something that he had read, which is that, "We're not working from home. We're actually living at work." And that distinction, even though small, changes everything, right? We are in the same space that we're familiar with, and yet the context in which we're living in it is wildly different, and we're trying to make sense of that in this new time.
So who am I? A little bit more about myself. So I've been researching this space now for three years. I've been so interested in this. As Peter mentioned, I studied biomedical engineering as my major. I was a molecular biologist for a little while. And why I bring that up is mainly because I love digging into academic journals. I know, right? This is such a great dinner party anecdote, "Hey, guess what academic journal I just read recently?" But I really do love it. So with this information, I really wanted to make sure that it wasn't pop science, that it wasn't just this one maybe trend or something, and that it had been rigorously tested. Because even though pop science can be inspirational, it doesn't always give us as impactful a strategy as we'd like.
I'm also an advocate for mindfulness and mental health. Those things are just really important to me. I think that, especially during this time, we have to understand how we perceive the world and how we can create healthy strategies for ourselves. And lastly, I'm not a doctor. Okay? So even though this talk, there's some science in there, there's some studies in here, I'm not a doctor. I want to very much make that clear and have that caveat going into our talk.
All right. So, stress? In an SRE's life. No. Right? You wouldn't be awake either at 9:00 in the morning or tuning in at 7:00 PM to a topic on team wellness if stress wasn't part of everyone's life. So interestingly, there's a Catchpoint 2018 SRE report and it looked at incident responders and level of stress.
And it's really neat, because that was actually inspired by a talk I gave at SREcon before, where I did something on a smaller scale. So it was very cool to be able to see this with a larger number of participants. And what we found was that incidents affect responders. So eight out of 10 survey respondents felt stressed after incidents. I think that that is something that might come off as a bit intuitive. However, seven out of 10 of them felt moderate or high stress. That is pretty substantial. That is maybe surprising, actually, for some of you, that this is more than you might expect.
What does this mean? When we are looking at the impact of stress, we want to kind of look at how it affects behaviors. And over half of people said it affected their mood. Just under half said that it affected their concentration. Again, if you think about this really wild, major incident we're all living in right now, in the beginning, we're like, "Oh, I'm going to learn a new language. I'm going to finish all those projects that I've been having at home." We had all these ideas of what we could do during this time. And if you've done any of them, kudos to you. I have not. [inaudible 00:09:19] even learned three words of a new language. But our concentration has been deeply affected after an incident.
Four out of 10 people said it affected their sleep. They could not sleep as well. If you think about it, then, this is starting to create a bit of a vicious cycle. If you can't sleep well, that's going to affect you on the next day and the next day. And four out of 10 people said it affected their ability to be social. And that really matters, because social connectivity is a huge part of how we kind of maintain a sense of self. It's how we maintain a sense of health. It is really important.
And why does this matter? Well, personal wellbeing significantly predicted not only contemporaneous employee performance, so current employee performance, but also subsequent supervisory performance ratings several years in the future. So this matters because how people are doing, how teams are doing, doesn't just have an immediate effect. That effect extends into the long term. And that's something that we have to think about, and why this topic is so important.
All right. So now here's the fun part. I don't know if it's fun. I find this fun. So during the research, there's actually a recipe for stress. And that was really neat to me, because we use stress so colloquially. "Oh, I'm feeling stressed out." But we don't always break it down into why we're feeling that stress. And so researchers at the CSHS were actually able to break it down to four ingredients. Now, these ingredients may not hit you all the same, all right? So my ingredients, the triggers for me may be different from, let's say, Peter's triggers. And how they affect us can be different. And when we're feeling stressed out, it can be actually one or more of these in combination.
So I'm going ... So novelty, something new you have not experienced before. So this happens to us even in a relatively safe space. Let's say you're given a meal consisting of something you've never eaten before. That newness can kind of be a little bit stressful, depending on the context of what it is. We're constantly trying to process what experience means to us, and something new means all of a sudden there's a lot of uncertainty around that.
Unpredictability. So something you had no way of knowing would occur before. So that, again, it's like, it could be not necessarily new, but if you didn't expect it to happen, that's very stressful for us. We have a mental model of how the world works.
Threat to the ego. So this happens, for example, when you are presenting. Not only if you're new to presenting is that stressful, or maybe something unpredictable happens, but you might wonder, "Okay, how are people thinking about my talk right now?" Or social media. Social media is huge for this, where give any opinion, and someone will happily not only share their own opinion, but also 72 different ways that your opinion is not right. And you're like, "All I really wanted to do was just talk about this flavor of ice cream. You need to settle down." But that can make you feel like something about yourself, your competence is being questioned.
Lastly, sense of control. Human beings really need a sense of control. And especially during this pandemic, when we don't have the sense of control that we're used to, it is very stressful for us. And we usually, as coping mechanisms, try to find ways to create a sense of control.
So next time that you kind of have a stressful moment, or you can think about your last stressful moment, you can actually break down that moment into these four ingredients and say, "Which of these actually are kind of affecting me right now?" And it may not always be the same ones each time, but it's really neat to kind of have an idea of, "Oh, right, this is really new to me. That's why I'm stressed out."
Now, when we normally talk about de-stressing, these are the kinds of things we talk about. "Okay, well, I'm going to go on vacation." This is why I've got Arizona behind me. I love Tucson and the climbing there. It's very relaxing for me. Mindfulness and meditation. We talk about exercise and controlled breathing. In fact, everyone just take a deep breath right now. Just breathe in, have that fresh air, breathe out. Breathing really does tell our bodies how we're doing. When we have shallow breaths, it kind of signals to our body that we're in a stress mode. Sleeping better. Sleep is so important. Therapy.
Now, all these are very ... They have science behind them. There are studies behind them. They are really important techniques to use. However ... Oh, actually, I forgot, because it's not on my slide deck, but also listening to Beyonce. So I find that there's less studies, unfortunately, on the efficacy of Beyonce on stress levels. However, I still think that anecdotally, it's very helpful. Pets. This one is actually a real one. People do say pets are very helpful, and this is my dog Taco, and I'm happy to always, anytime I can, share a photo of Taco. Because just looking at his goofy face makes me smile so much.
Now, when we think of these, though, the only thing about it is we tend to think of them in isolation. We think about the individual factors. And then we get, unfortunately, the wrong message, which is, "Just be stronger." I hear this a lot. Whenever I talk about wellness, someone will always ask, "Well, if you can't handle the stress of the job, should you do this job? Maybe you're not meant to be an SRE." And that to me is kind of being like, "Well, you know what? If you want to be a firefighter, don't have a suit. Just go and see how hot, how long you can stand by the fire. And whoever can stand the longest, you should be a firefighter. Everyone else, not for you."
Now, obviously that's a bit facetious, but it's this idea that of course we build a better suit. We give them understanding. We train firefighters. We help them understand the conditions they're going into and then support them through it so they can do their roles. Well, that would be obviously the same in SRE. We can't just say, "Well, if people aren't strong enough, then they should go." That's not a great answer, especially because research has shown that many ... Unfortunately, it's a thing that people get very stuck to. Many workplaces have opted for attempting to enhance their workers' resilience rather than modifying risk factors.
I can't see any of you all but I imagine some of you right now are going, "Yeah, that's right." I've been told, "Here are some free yoga classes, de-stress." And you're like, do you know what would be really great? Not being assigned 90 hours worth of work in a 50 to 60 hour work week. The yoga classes are fantastic. I'll still do a downward dog and enjoy that namaste, but it'd be really nice if I just wasn't overworked.
What that also means is that, unfortunately, research shows that situational and organizational factors play more of a role in the workplace than individual ones. When we're thinking about how to help people, though, we tend to think of the individual, because of ideas of individual causality and responsibility, in addition to the assumption that it's easier and cheaper to change people rather than organizations.
And what's really powerful about this is understanding that actually the environmental, the structural factors, are the ones that matter. We have not only research that's saying this. These are from the researchers who really popularized the idea of burnout. And these are their findings. But also this helps to kind of unpack some of the thinking that we have. Why do we kind of say, okay, individually, you need to get stronger?
We really do believe in this idea of individual causality and responsibility. That if you're so stressed out, maybe there's something about you that made you so stressed out, and maybe it's cheaper and easier to change you than, well, the organization, which is so big. The research says differently. If we actually change the situational or organizational factors, that will do a lot more for people than asking them to get stronger.
What are some of these organizational factors? This is a very, there's a lot of information on this slide. I can upload it later to Slack. You can take a screenshot of it. I'm not going to go through all of it, but these are some of the organizational and structural things that you can look at.
And what's interesting, when you go back to this later, is think about how these relate to your ingredients to stress. Okay. So if you're working atypical working hours, maybe that's unexpected. You don't really know how to expect what hours you have. You don't have a sense of control then. And who works typical working hours? Well, if you're fighting incidents, your work hours are all over the place.
Things like job demands, imbalanced job design is really about not being able to understand or be able to see what kind of role you have. Occupational uncertainty is this idea of unpredictability. And not feeling secure in it, no sense of control. If we're looking at those ingredients of stress. Lack of value and respect in the workplace is the threat to the ego. So what's really cool is we can take these ideas and map them together to have a better understanding of how this impacts people.
Now, what does burnout look like? Now, many of you probably will recognize this already. And for those of you who haven't, that's fantastic. Because it is difficult to go through burnout. There's exhaustion, there's cynicism, there's chronic negative responses. There's ineffectiveness. And what's really insidious about burnout is that it is a vicious cycle. If I have chronic negative responses to things and I'm ineffective in my role, I'm going to feel even less capable to continue doing what I'm doing, and that can exacerbate the situation. This is something that we have to think about really carefully, because it begets itself.
What are some common factors for burnout? And again, you're going to see when talking about these, they're going to overlap with your ingredients for stress. Work overload, your job demands exceed human limits. Lack of control, the inability to influence decisions that affect your job. Insufficient rewards. So insufficient financial, institutional, or social rewards.
Breakdown of community. Okay. If you have an unsupportive workplace environment, that's very difficult. Absence of fairness. We are very social creatures and for a social contract to work, we need a certain assumption of fairness. Lack of fairness in decision-making processes really affects us and stresses us out. And value conflicts, a mismatch between organizational values and the individual's values.
Understanding those factors was good, but recently my friend, Denise Yu, was writing an essay. She's contributing to a collection of essays that me and my co-founder are co-curating on SRE. And in her essay, she wrote this line that gave me an aha moment. Burnout is in part caused by unresolved feedback loops.
And this was based on the book, Burnout. And there's a great XOXO talk, which I will share again in the Slack afterwards. But this kind of blew me away. It was really revelatory for me because if I think about the times that I felt burnout, and then emotional exhaustion, it's usually because the signal in the feedback loop either is too delayed. So if you try to work on something where the payoff is in two to three years, for example, and you can't see immediately where that's going, it's very unnerving, and exhausting. Because you can't tell if you're going in the right direction.
Or, the feedback loop is too weak. If that signal is too weak, then again, we're very unsure about what's going on. And see, if you go back to some of these common factors for burnout, you can even think of these as feedback loops. And we can think about how good a feedback loop is, and whether or not then that contributes to burnout.
And interestingly, I always think of feedback loops as a huge part of SRE. With our SLOs and SLIs and monitoring, we're constantly trying to have these feedback loops. Another thought that came to mind were the words that, it was the advice I got on how to be a successful manager. Reduce ambiguity and remove roadblocks.
If we think of burnout as unresolved feedback loops, if we reduce ambiguity, then we're helping to tighten that feedback loop. When we remove roadblocks, it means that people can then put in effort and see the return on it. Again, strengthening feedback loops. This is why I had this at the start of my presentation: the understanding that this, as a team, whether you're a manager, an IC, or even if you're not directly with the team, I think should be our goal. We should be reducing ambiguity for each other and removing those roadblocks because we get a chance then to tighten feedback loops, to help reduce things like stress and burnout.
Now as SREs, what can we, how can we think of this in an SRE approach? First of all, monitoring. Okay? I know all of you monitor your systems relentlessly. You monitor them to the point where you've got way too many alerts. You have monitors on everything. You're looking at everything, you've got your golden signals. How often are you thinking about your people though? How often are you monitoring people? Not in the sense of surveying them, but just asking for feedback. Once a year, imagine if you only got data on your system once a year, that would be inconceivable. Even once a quarter is not enough.
I'm not saying you need daily updates from people, but we need to think of tightening those feedback loops and doing a better job, especially when we're putting people in stressful situations. If we think back to just how stressful an incident is, we need to understand our feedback loops and do monitoring based on that cadence of incidents.
Let's think about toil. Toil, if you're unfamiliar, is a repetitive, predictable, constant stream of tasks related to maintaining a service. That's from the SRE Workbook, and in the most pragmatic sense, which I'm sure all of us understand right now, it's doing the dishes, right? We're all at home and if you want to make more courses, then you're going to get more dishes. If you want to have more people over, if you're able to expand your bubble right now, you're going to have more dishes.
Unfortunately, you can't really find a way to be more effective with your dishes. I mean, you can. You can all decide to share off of one plate if you'd like, if you really don't like doing dishes. But generally that's going to scale with the number of courses that you make and the number of people that you have over. We found ways to automate it, you can definitely put a dishwasher in. But it's work that all it does is it brings the system back to normal. You need to do the dishes so that you can have a meal again next time so you can serve on something. That's toil.
What's interesting is Kurt Andersen from LinkedIn did a toil survey at LinkedIn and was able to try to understand what level of toil people are in. From the Google book, we've heard that toil is bad because it leads to stress and burnout. It's work that people don't necessarily want to do. People want to work on product work. They want to work on new things. Maintenance work is not for everyone. It can be very stressful. So why not put out a survey? It doesn't have to be very, very intricate. Just have people estimate it, it starts a conversation. It has people thinking intentionally about where their work is going, which I think is very important.
This is from a LISA 2018 talk, so the link is at the bottom. You can look it up, and then you can hear how it worked at LinkedIn.
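As a rough illustration of how lightweight such a survey can be, here is a minimal Python sketch that summarizes self-reported toil estimates. It is not the LinkedIn survey: the function name `summarize_toil`, the respondent keys, and the default threshold of 50 percent are all assumptions made for this example, and the threshold should be adjusted to whatever your team agrees on.

```python
from statistics import median


def summarize_toil(estimates, threshold=50.0):
    """Summarize self-reported toil estimates.

    estimates: dict mapping an (ideally anonymized) respondent key to the
    percentage of their time they estimate goes to toil (0 to 100).
    threshold: hypothetical cutoff above which a response is flagged.
    """
    if not estimates:
        raise ValueError("no survey responses")
    for who, pct in estimates.items():
        if not 0 <= pct <= 100:
            raise ValueError(f"estimate for {who!r} must be between 0 and 100")

    # Respondents whose estimate exceeds the cutoff, in a stable order.
    over = sorted(who for who, pct in estimates.items() if pct > threshold)
    return {
        "responses": len(estimates),
        "median_pct": median(estimates.values()),
        "over_threshold": over,
    }
```

The numbers are only estimates, so the summary is best treated as a conversation starter rather than a score.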
Let's go back to the recipe for stress. Remember the four parts are novelty, unpredictability, threat to ego and sense of control. What you can actually do is ask for feedback. Remember how at the beginning of the presentation I said any time... Well, the next time you have a stressful situation, try to break it down to the four things and say, "Which ingredients is this?"
Now from there, once you know which ingredients are involved, you can start trying to think of action items to fix that. For example, if an incident feels too novel, if everything feels too new in the process, maybe you should implement game days so that incidents don't feel as new, right?
Maybe people feel a lack of control, or they're worried about how they're going to be perceived during an incident. Let's say it's 3:00 AM. You're tackling an incident and you don't know who to page or whether or not it's okay at 3:00 AM to wake someone up. That's very stressful because you worry "If I get this wrong, I'm going to be perceived lesser because I made a mistake. I woke someone up who didn't have to be woken up. And I don't know if I'm able to do this." Well, if you can see that as a stress, you then go, "Okay, we need to put in, in our runbooks, we need to do a better job of letting people know who can be paged when." Don't make it as much of a gut feeling. Be able to find ways to reduce those ingredients for stress.
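To make that loop concrete, here is a minimal Python sketch, under stated assumptions, of tallying which ingredients responders name after incidents and turning the most common ones into candidate action items. The names `tally_ingredients`, `suggest_actions`, and the `ACTIONS` mapping are hypothetical. The game day and runbook entries echo the examples above, while the other two action items are placeholders invented for illustration.

```python
from collections import Counter

# The four stress ingredients from the recipe for stress.
INGREDIENTS = ("novelty", "unpredictability", "threat_to_ego", "sense_of_control")

# Hypothetical mapping from ingredient to a candidate action item.
# "novelty" and "sense_of_control" follow the talk's examples;
# the other two entries are assumptions for illustration only.
ACTIONS = {
    "novelty": "Run game days so incidents feel less new",
    "unpredictability": "Review alerts and schedules for avoidable surprises",
    "threat_to_ego": "State clearly that paging someone at 3:00 AM is okay",
    "sense_of_control": "Document in runbooks who can be paged and when",
}


def tally_ingredients(responses):
    """Count how often each ingredient was named across responders' feedback.

    responses: iterable of lists, one list of ingredient names per responder.
    """
    counts = Counter()
    for named in responses:
        for ingredient in named:
            if ingredient not in INGREDIENTS:
                raise ValueError(f"unknown ingredient: {ingredient}")
            counts[ingredient] += 1
    return counts


def suggest_actions(responses, top=2):
    """Return action items for the most frequently named ingredients."""
    counts = tally_ingredients(responses)
    return [ACTIONS[name] for name, _ in counts.most_common(top)]
```

Collecting this after each incident, rather than once a year, keeps the feedback on the same cadence as the stress it is meant to measure.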
I'll hold this up for another second in case you want to screenshot this as well. This is really interesting.
Then let's not forget that this goes both ways, all right? We should also be looking at happiness, positive wellbeing and job satisfaction. What are the structural factors contributing there, and how can we then make more of that happen? This is 25 minutes. I love being able to have 25 minutes to talk about it. I could talk about this for hours. There's so much more to talk about. There's so much stuff around more strategies, more ideas around wellness, but I wanted to give a taste of all this.
The answer is, though, yes, you can improve your team's wellbeing. Your wellbeing and mental health matter. We can figure out how to intentionally create positive effects, and let's build systems that include humans. We do so much work around monitoring our systems and caring about the health of the systems, and we tend to think of the people as just there to service the systems and that people are either... They don't get affected by it, or they can be interchangeable around this. We work with sociotechnical systems, which means we have to think about how the people are involved. That's what this is. This is actually part of our responsibility.
As I mentioned, I love talking about this stuff. If you are an SRE manager, my cofounder and I are so curious about how you approach your work. We have been talking to a lot of managers and kind of understanding how they break down the work and how they see it, and we're always interested to hear how people approach how they do things. So please find me on Slack if you are interested in nerding out about this stuff or email me at email@example.com. Would love to be able to chat more about this stuff.
Thank you. You can say hello to me at firstname.lastname@example.org. For more Beyonce and Taco, you can find me on Twitter at Jaime Woo and... Oh. And then lastly, I was just going to share, there's a zine that my cofounder and I do, called the Post-Incident Review. You can find it at zine.instantlabs.io, and Denise Yu does lovely illustrations there. All right. Thank you so much.