IT Operations Retrospectives

I was reading ‘Quality Matters: The Benefits of QA-Focused Retros’, which got me thinking about this style of activity in the context of IT operations.

It feels like I’ve been seeing a lack of ‘retrospective’ activities in this context, in organisations I have worked at (both as consultant and employee), for a very long time indeed.

Before I became aware of the term ‘retrospectives’, I would probably have characterised this activity as a ticket review: a look back through service tickets, incident reports etc. for patterns of failure. That might mean recurring failures of the same type, or different but similar failures, i.e. issues with a shared or overlapping root cause. The idea being, let’s learn from what went wrong. Not exactly rocket science, you might say.

Without diving too much into why retrospectives are MIA, it’s relatively easy to see how small teams, who are frequently firefighting, fail to see the value in looking back. A review can be very time-consuming if you’ve fallen into the firefighting cycle, with its often accompanying lack of detailed reporting on what you fixed and why it broke.

A different take on the Agile retrospective

It’s widely held that the idea of ‘retrospectives’ comes from the Agile world. In Operations, whilst we can and do work closely alongside Agile development, we need to consider how an Ops-focused retrospective may have a different set of outcomes.

It can be hard to see how we can adopt an Agile sprint retrospective for operations use when we think about the ‘stop-start-continue’ nature of the Agile sprint, i.e. where each team member involved considers what the team should stop, start or continue doing, because our needs in operations are somewhat different.

There is a pretty straightforward answer to that point, which is to do something different. The outcome of an operations-focused retrospective should instead be aimed at steering how we operate our systems in the future, how we plan and execute preventative work, and how we create feedback from our experiences running platforms.

So what is the benefit?

Operations retrospectives have considerable value as a way to improve the feedback cycle to development and other stakeholders. Because Operations tends to be at the sharp end of system failures (both exposed to the customer-visible impact and responsible for the break-fix activity), operations teams are in an ideal position to do some analysis and thinking about the nature of the problems which are being fixed.

Examples of what could be incorporated into this feedback are things such as less human interaction with deployments, use of version checks, etc.

Ops teams should push for prioritisation of testing and have good practices around log management.

Of course, a lot of this is predicated on operations teams having enabling information and subsystems in place, like version control (git, svn, mercurial et al.), centralised (aggregated) logging, etc.

But even if we simply have a ticket management system, a commonplace tool for operations, we should still be looking at our records and checking for patterns of failures. Making the most of what we have will enable us to make better arguments for enhancing the tooling and techniques which power retrospectives.
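As a rough illustration of what checking for patterns could look like with nothing more than a ticket export, here is a minimal Python sketch. Everything in it is an assumption made for the example: the ticket fields (id, service, cause), the sample records and the recurring_failures helper are hypothetical and do not come from any particular ticketing system. It simply counts how often the same service and cause appear together and keeps the combinations that recur.

```python
from collections import Counter

# Hypothetical ticket export: field names and values are illustrative only,
# not the schema of any real ticket management system.
tickets = [
    {"id": 101, "service": "web", "cause": "disk full"},
    {"id": 102, "service": "web", "cause": "disk full"},
    {"id": 103, "service": "db", "cause": "expired certificate"},
    {"id": 104, "service": "web", "cause": "manual deploy error"},
    {"id": 105, "service": "web", "cause": "disk full"},
    {"id": 106, "service": "db", "cause": "expired certificate"},
]


def recurring_failures(tickets, min_count=2):
    """Return ((service, cause), count) pairs seen at least min_count times,
    most frequent first."""
    counts = Counter((t["service"], t["cause"]) for t in tickets)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Print each recurring failure as a candidate topic for the retrospective.
    for (service, cause), n in recurring_failures(tickets):
        print(f"{service}: {cause} (seen {n} times)")
```

In practice the cause field is the hard part: it only exists if whoever closed the ticket recorded why things broke, which is exactly the reporting that tends to go missing during firefighting.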





