404 Tech Support

Where IT Help is Found

Handling incidents, not blame

2015-04-30 by Jason

There’s a great article from 2012 titled “Blameless PostMortems and a Just Culture.” It was written by John Allspaw, an SVP of Technical Operations at Etsy. The article tackles the topic of handling errors and incidents after they have happened. I recommend that everybody in IT read it because it has the power to change your team culture for the better.

The premise of the article is that failure happens, and will keep happening, in complex systems. Humans will also make mistakes, and you can’t just replace the mistake-making humans with the latest model of non-mistake-making humans, as the traditional “Bad Apple Theory” would have it.

Instead of focusing on who’s to blame for an outage, the focus should be on learning from the mistake and preventing it in the future. To accomplish this, Allspaw uses blameless postmortems. They help establish a Just Culture, which balances the safety of the systems with accountability. If an engineer fears punishment or retribution for trying to do their job, they are less likely to provide all of the details needed to learn what truly happened.

With a blameless postmortem, the management team is trying to ascertain a timeline of events, the actions taken, the results of those actions, the engineer’s expectations of those actions, and any assumptions made in reaching these decisions. Getting the full picture of the incident helps reveal whether there was a fault in the logic, bad information, or a system that reacted differently than documented.
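To make that list of details concrete, here is a minimal sketch of how a postmortem record could be structured. It is an illustration only: the class names, field names, and the sample incident are all hypothetical, and this is not the schema used by Etsy or by any particular tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class TimelineEntry:
    """One step in the incident, recorded without assigning fault."""
    at: datetime        # when the action was taken
    action: str         # what the engineer did
    expectation: str    # what they expected to happen
    result: str         # what actually happened
    assumptions: List[str] = field(default_factory=list)

    def surprised(self) -> bool:
        # A gap between expectation and result is where the learning starts.
        return self.expectation.strip().lower() != self.result.strip().lower()


@dataclass
class Postmortem:
    """Hypothetical container for everything gathered about one incident."""
    title: str
    entries: List[TimelineEntry] = field(default_factory=list)

    def timeline(self) -> List[TimelineEntry]:
        # Entries are often collected out of order, so sort by time.
        return sorted(self.entries, key=lambda e: e.at)

    def surprises(self) -> List[TimelineEntry]:
        # Steps where the system did not behave as the engineer expected.
        return [e for e in self.timeline() if e.surprised()]


# Made-up example incident, for illustration only.
pm = Postmortem("Hypothetical cache outage")
pm.entries.append(TimelineEntry(
    datetime(2015, 4, 30, 10, 5),
    action="Restarted cache node",
    expectation="Traffic recovers",
    result="Traffic recovers",
))
pm.entries.append(TimelineEntry(
    datetime(2015, 4, 30, 10, 0),
    action="Deployed config change",
    expectation="No user impact",
    result="Cache hit rate dropped",
    assumptions=["Staging matched production"],
))

for entry in pm.surprises():
    print(entry.at.isoformat(), entry.action, entry.assumptions)
```

Note that the record has no field for who is at fault. The structure asks what was expected, what happened, and what was assumed, which is the information the postmortem is meant to surface.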

Firing a person who has made a mistake and learned from it does the organization a disservice: they are replaced with an engineer who has neither made the mistake nor learned from the event. Only by understanding the individual, technical, or organizational reasoning behind the decisions that lead to problems can the organization expect to fix the true cause of these outages.

A culture of “name-and-blame” along with “cover your ass” does not build teamwork and leaves management unable to manage. The person who has learned from an outage is unlikely to repeat it and should be trumpeting the solution, promoting a better method, or correcting others’ logic to prevent the same situation from arising.

One option is to assume the single cause is incompetence and scream at engineers to make them “pay attention!” or “be more careful!”
Another option is to take a hard look at how the accident actually happened, treat the engineers involved with respect, and learn from the event.

Along with the Just Culture, blameless postmortems, and Allspaw’s article, Etsy also created Morgue, available on GitHub, as a software component to hold incident postmortem details.

Filed Under: Management
