WIP Limits AKA [the] Context Switch overhead [problem]
The idea of suggesting that we do less is often a tricky one to get our heads around. From a business perspective, when we are under pressure to get things done, saying “no, this item has to go in the backlog” is not always easy. This is especially true when we are firefighting.
So, why do I say “AKA context switch overhead”?
Digging in: before I knew about ‘WIP limits’ as a thing, I used to explain the idea that increasing the number of things we were doing at once would make us less effective at delivery. My audience was largely Unix systems admins and network engineers, so it was a bit technical. It went something along these lines:
“Think of a time-slice scheduler in operating system terms. In our OS it has one job: to divide fixed CPU resources amongst the processes demanding CPU time on your machine. Every time the scheduler switches the CPU to work on a different process there is a small, fixed amount of time required to change ‘context’ to the new process (think registers, memory, paging etc.).”
Often the penny would drop at this point in the conversation. People would begin to see that the overall throughput would be lower, because the ‘end to end’ or ‘cycle’ processing time for any given process would be longer due to more time spent context switching and less on execution.
The effect of the number of processes on the cycle time
We can show the time scheduling of our imaginary computer mathematically (I hope) using the information below to produce a number which represents this ‘execution time window for processing’ or ‘cycle time’:
Execution time window : e
Available time: T
Number of processes: n
Delay for context switch: d
Using the formula ‘e = (T / n) – d‘ we can see the amount of time a process gets on the CPU for execution.
Let's plug some numbers into this equation to provide a simple example, with T = 10 seconds and d = 0.1 seconds, for n = 10, 40, 80 and 100 processes.
e = (T / n) – d
e = (10 / 10) – 0.1 = 0.9 seconds
e = (10 / 40) – 0.1 = 0.15 seconds
e = (10 / 80) – 0.1 = 0.025 seconds
e = (10 / 100) – 0.1 = 0 seconds !
As we can see, our window of execution gets smaller and smaller until it becomes zero. The way around this in our OS would be to introduce a priority scheme which ranks processes in importance of execution (think of the Unix ‘nice’ command), such that the core system will always keep functioning. The side effect is that we will end up with some processes which *may* never get any CPU time. This happens to be a method we can readily integrate into our work planning, i.e. task and project prioritisation.
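The execution window figures can be reproduced with a few lines of Python. This is only a sketch of the toy model in this post, not of a real scheduler: the function name execution_window is made up for illustration, and clamping the result at zero (rather than letting it go negative past 100 processes) is my assumption, not part of the formula.

```python
def execution_window(total_time: float, processes: int, switch_delay: float) -> float:
    """Toy model: e = (T / n) - d, the time one process gets to execute.

    Clamped at zero (an assumption): a negative window has no meaning here.
    """
    if processes <= 0:
        raise ValueError("processes must be a positive integer")
    return max(0.0, total_time / processes - switch_delay)


# Same inputs as the worked example: T = 10 seconds, d = 0.1 seconds.
for n in (10, 40, 80, 100):
    e = execution_window(10.0, n, 0.1)
    print(f"n = {n:>3}: e = {e:.3f} seconds")
```

Changing the values of n, or the delay d, is a quick way to see how fast the window collapses as more processes compete for the same fixed amount of time.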
The effect of number of processes on throughput
Another part of the exercise is to look at the effect of increasing n, the number of processes we are running, on the total available ‘processing time’, i.e. the time in which processes are executing on the CPU.
Using the formula ‘processing time = T – (d * n)‘ we can see the impact on total available CPU time for execution, again with T = 10 seconds and d = 0.1 seconds.
10 – ( 0.1 * 10 ) = 9 seconds
10 – ( 0.1 * 40 ) = 6 seconds
10 – ( 0.1 * 80 ) = 2 seconds
10 – ( 0.1 * 100 ) = 0 seconds !
As we see from the above, at 100 processes we are effectively bogged down in context switching alone. There is no time left for processes to execute on the CPU, meaning effectively zero throughput.
Again, we avoid this in both OS terms and work planning by introducing prioritisation.
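The throughput numbers can be checked with a short Python sketch. It assumes the same toy model as the formula (one context switch of cost d per process in the window T); the function name total_processing_time is hypothetical, and the clamp at zero is my addition so that more than 100 processes does not report negative time.

```python
def total_processing_time(total_time: float, processes: int, switch_delay: float) -> float:
    """Toy model: processing time = T - (d * n).

    Clamped at zero (an assumption): we cannot spend more than T on switching.
    """
    if processes < 0:
        raise ValueError("processes must not be negative")
    return max(0.0, total_time - switch_delay * processes)


# Same inputs as the worked example: T = 10 seconds, d = 0.1 seconds.
for n in (10, 40, 80, 100):
    useful = total_processing_time(10.0, n, 0.1)
    overhead = 10.0 - useful
    print(f"n = {n:>3}: {useful:.1f}s executing, {overhead:.1f}s context switching")
```

Printing the overhead next to the useful time makes the trade-off explicit: every extra process takes a fixed bite out of the time available for real work.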
Back to WIP limits
To understand why limiting WIP is necessary, we need to think in terms of flow and the time it takes to complete a task or activity. It is reasonable to assert that, as humans, switching our focus from one thing to another comes with a time penalty: a period in which we start thinking about the new or next topic before we are actually doing much execution. This could be expressed as ‘it takes a short while to get going on a different piece of work’.
Recognising that facet of our own mental capabilities is key to grasping why we should limit work in progress.
To illustrate WIP limits affecting flow and throughput there is a great video by David Lowe (@bigpinots) here.
Atlassian also talk about ‘flow’ here. Having enough understanding to be able to talk about why we want to limit concurrent work in progress, and to explain why it’s necessary, can be crucial to convincing others.