Thoughts from Pipelineconf 2017

Whilst attending Pipeline Conference I noted down the things which gave me food for thought. To me this is the acid test of a talk – does it create a spark of thinking? Automation and Autonomy: Dan North (@tastapod) gave the opening keynote “Ops and Operability“. Dan talked about both Ops and Dev in…

via Pipeline (the Continuous Delivery) Conf 2017: Accountability, Socrates, Multicasting and Serverless Security — Blog – Skelton Thatcher Consulting

Work In Progress Limits

WIP Limits AKA [the] Context Switch overhead [problem]

The idea of suggesting that we do less is often a tricky one to get our heads around. From a business perspective, when we are under pressure to get things done, saying “no, this item has to go in the backlog” is not always easy, and this is especially true when we are firefighting.

So, why do I say “AKA context switch overhead”?

Digging in. Before I knew about ‘WIP limits’ as a thing, I used to explain the idea that increasing the number of things we do at once leads to a loss of delivery effectiveness. My audience was largely Unix systems admins and network engineers, so the explanation was a little technical; it went something along these lines…

“Think of a time slice scheduler in operating system terms. In our OS it has one job: to divide fixed CPU resources amongst the processes demanding CPU time on your machine. Every time the scheduler switches the CPU to work on a different process, there is a small, fixed amount of time required to change ‘context’ to the new process (think registers, memory, paging etc.)…”

Often the penny would drop at this point in the conversation: people would begin to see that overall throughput would be lower, because the ‘end to end’ or ‘cycle’ processing time for any given process would be longer, with more time spent context switching and less on execution.

The effect of the number of processes on the cycle time

We can show the time scheduling of our imaginary computer mathematically (I hope), using the information below to produce a number representing this ‘execution time window for processing’ or ‘cycle time’:

Execution time window : e
Available time: T
Number of processes: n
Delay for context switch: d

Using the formula ‘e = (T / n) – d‘ we can see the amount of time a process gets on the CPU for execution.

Let’s plug some numbers into this equation for a simple example.

e = (T / n) – d

e = (10 / 10) – 0.1 = 0.9 seconds
e = (10 / 40) – 0.1 = 0.15 seconds
e = (10 / 80) – 0.1 = 0.025 seconds
…
e = (10 / 100) – 0.1 = 0 seconds!

As we can see, our window of execution gets smaller and smaller until it becomes zero. The way around this in our OS would be to introduce a priority scheme, ranking processes by importance of execution (think of the Unix ‘nice’ command) so that the core system always keeps functioning. The side effect is that some processes *may* never get any CPU time. This happens to be a method we can readily apply to our work planning, i.e. task and project prioritisation.
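The cycle time numbers above can be reproduced in a few lines of Python. This is a throwaway sketch of the toy formula, not real scheduler code, and `execution_window` is just a made-up name for it:

```python
def execution_window(T, n, d):
    """Time each process gets to execute: e = (T / n) - d.

    T -- total available time (seconds)
    n -- number of processes sharing the CPU
    d -- fixed context-switch delay per process (seconds)
    """
    return (T / n) - d

# Reproduce the worked example: T = 10 seconds, d = 0.1 seconds.
for n in (10, 40, 80, 100):
    print(f"n = {n:3d} -> e = {execution_window(10, n, 0.1):.3f} seconds")
```

At n = 100 the window collapses to zero, matching the last line of the table.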

The effect of the number of processes on throughput

Another part of the exercise is to look at the effect of increasing the number of processes, n, on the total available ‘processing time’, i.e. the time in which processes are actually executing on the CPU.

Using the formula ‘processing time = T – (d * n)‘ we can see the impact on the total CPU time available for execution:

10 – ( 0.1 * 10 ) = 9 seconds
10 – ( 0.1 * 40 ) = 6 seconds
10 – ( 0.1 * 80 ) = 2 seconds
…
10 – ( 0.1 * 100 ) = 0 seconds!

As we see from the above, at 100 processes we are bogged down in context switching alone; there is no time left for processes to execute on the CPU, meaning effectively zero throughput.
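The same sanity check works for total processing time. Again this is a sketch of the toy formula, with the result clamped at zero because a process cannot execute for negative time:

```python
def processing_time(T, n, d):
    """Total time spent executing rather than switching: T - (d * n),
    clamped at zero once context switching consumes the whole budget."""
    return max(T - (d * n), 0.0)

# Reproduce the worked example: T = 10 seconds, d = 0.1 seconds.
for n in (10, 40, 80, 100):
    print(f"n = {n:3d} -> processing time = {processing_time(10, n, 0.1):.1f} seconds")
```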

Again, we avoid this in both OS terms and work planning by introducing prioritisation.
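A prioritisation scheme can be sketched in the same toy model. The process names and ‘nice’ values below are invented for illustration; the point is that once the time budget runs out, the lowest-priority items simply never execute:

```python
def run_by_priority(T, d, s, processes):
    """Serve processes in 'nice' order (lower nice runs first); each one
    costs a context switch (d) plus a time slice (s). Anything still
    queued when the time budget T runs out never gets the CPU."""
    ran, remaining = [], T
    for name, nice in sorted(processes, key=lambda p: p[1]):
        if remaining < d + s:
            break  # not enough budget left for another switch + slice
        remaining -= d + s
        ran.append(name)
    return ran

# Hypothetical workload: only the highest-priority processes fit.
procs = [("report", 19), ("kernel", -5), ("backup", 10), ("sshd", 0)]
print(run_by_priority(T=1.0, d=0.1, s=0.3, processes=procs))
# 'backup' and 'report' never run within this budget
```

This mirrors the planning analogy: low-priority tasks sit in the backlog indefinitely unless the budget grows or priorities change.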

Back to WIP limits

To understand why limiting WIP is necessary, we need to think in terms of flow and the time to complete a task or activity. It is reasonable to assert that, as humans, switching our focus from one thing to another comes with a time penalty: a period in which we start thinking about the new or next topic before we are actually doing much execution. This could be expressed as ‘it takes a short while to get going on a different piece of work’.

Recognising that facet of our own mental capabilities is key to grasping why we should limit work in progress.

To illustrate WIP limits affecting flow and throughput there is a great video by David Lowe (@bigpinots) here.

Atlassian also talk about ‘flow’ here. Having enough understanding to explain why we want to limit concurrent work in progress, and why it’s necessary, can be crucial to convincing others.

Engineering Human Comms

This short post came about as I was musing on common themes I am seeing through my own experiences across a variety of companies. I am attempting to crystallise some of the ways in which I think about what communication means, and how we begin to think about changing it.

It may come as something of a surprise to learn that a lot of people in businesses complain about poor communication.

Even more surprising to me is that it is often the very people with exposure to, control of, and responsibility for the mechanisms providing access to communications, i.e. people in technology. I’m thinking about telephones, email, wikis, Twitter and social networks.

I suppose this may lead one to think people in tech are not very good communicators; now where have I heard that before..?

It is interesting to grapple with this problem. I like to characterise it in the following way:

  • What – what is the message, fact, opinion, i.e. content?
  • Who – who do we want to notice what we’re saying, and are they the right people?
  • How – does your intended audience have a mechanic for hearing what you’ve got to say, i.e. what mechanism provides them a convenient way to consume your communications?

If we do a paper exercise and jot down the team’s answers to the above three points, we often find some interesting discrepancies, ranging from the technical detail to the actual interest of the audience.

Atlassian recently published a piece on using blogging via Confluence to beneficial effect; their article is well worth a read, especially if you are already a Confluence user. The article is primarily focussed on the benefits of internal blogging, but I would argue that making time in the schedule for public-facing content is also highly valuable. It might even help you attract better engineers, as it publicly demonstrates the communication capability of the tech group.

There is nothing new or revolutionary about this approach: we can still see the Sun Microsystems site, arguably a pioneer of sharing engineering team information publicly, via their now defunct playground.sun.com site (the earliest Internet Archive snapshot, all the way back from 1996, is here).

Other examples are thetrainline.com, who have been publishing systems and software engineering articles at engineering.trainline.com since late 2012, and of course there is the requisite Netflix reference: a variety of their articles can be seen at their techblog.netflix.com site.