As I stood in line at the grocery store last week, I was reminded of a really difficult math class I took back in the dark ages: Queuing Theory.
The first requirement on day one was to learn to properly spell ‘queue’.
Much of the rest of the semester was a foggy blur of slogging through some complex mathematical calculations. These were the days when mathematicians were working out how to handle incoming calls on the newfangled telephone systems, speed up long-distance telecom transmissions, and speed up operating systems. What was the best way to handle call volume, increase throughput, and increase the speed of the packets or calls?
Turns out, the one thing I remember best from the class is the MOST useful tidbit for organizing the waiting line at the bank and the grocery store. Throughput is maximized with one queue (one waiting line) and multiple servers, assuming that all the servers can handle all of the requests, and most of the requests are the same size and generally ‘small’. Bigger requests (people with 2 grocery carts of stuff) really bog down even the one server that they visit. And specialized servers (checkout lines) create complexities in handling special requests. Consistency is the key, with one queue and multiple servers.
So why was I thinking about this? First, as much as I hate to stand in any line of 20+ people, I’m amazed by how fast the line flows at the grocery store when we all stand together and just wait for the next available checkout line. I can never land in the ‘wrong’ line, as I’m always handled by the next available person. And the line flows really fast when all of the items (mine and those of the person in front of me) have their bar codes ready for scanning.
And… I was thinking about a related conversation with a colleague who recently told me about her software team, which had 13 ‘products’, 13 Jira projects (boards), lots of small teams, and only about 40 team members in total. She had already told me what some of those ‘products’ were, and they seemed like features of one main product to me. Her Jira boards were all independent, and she had no over-arching view of progress or impediments. No dashboard. And the teams weren’t getting much done. I got to wondering if the learnings from Queuing Theory might be part of the answer.
I went digging to see if I was crazy and found a handful of articles, some old and some very mathematical. Berteig, in 2006, and the LeSS team (year unknown) are just two that I read, but there are more.
The reason one queue with multiple servers works is that we are optimizing the whole system for total throughput, and not treating any of the servers as ‘specialists’. When any one of the servers is available, they handle the next person in line. If any one of the servers has to go ‘offline’ for a short break, the system just adjusts to one less server with no change in the queue itself. If the queue becomes exceptionally long (Black Friday sale day), management can easily add a few more servers to handle the extra load.
The servers (the checkers handling the checkout process) can handle nearly all of the people that come through the line. And, to further speed up the checkout process, the items in the basket meet the ‘definition of ready’, i.e., all items are pre-tagged with their price. They can be quickly scanned and pushed through the line. The few items that don’t scan can be handled quickly with a direct lookup by the checker.
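The one-queue advantage is easy to sanity-check with a toy simulation. Here is a minimal sketch (the parameters are my own assumptions, not from the articles above: Poisson arrivals, exponential service times, four servers) comparing one shared line against customers who commit to a lane when they arrive and can’t switch:

```python
import heapq
import random

def simulate(arrivals, services, num_servers, single_queue=True):
    """Return the average wait for a list of (arrival, service) pairs."""
    if single_queue:
        # Shared line: the next customer goes to whichever server
        # frees up first. A heap tracks when each server is free.
        free_at = [0.0] * num_servers
        heapq.heapify(free_at)
        total_wait = 0.0
        for arrive, service in zip(arrivals, services):
            start = max(arrive, heapq.heappop(free_at))
            total_wait += start - arrive
            heapq.heappush(free_at, start + service)
        return total_wait / len(arrivals)
    else:
        # Separate lanes: customers commit round-robin at arrival and
        # cannot jump lanes, even when another lane empties out.
        free_at = [0.0] * num_servers
        total_wait = 0.0
        for i, (arrive, service) in enumerate(zip(arrivals, services)):
            lane = i % num_servers
            start = max(arrive, free_at[lane])
            total_wait += start - arrive
            free_at[lane] = start + service
        return total_wait / len(arrivals)

random.seed(42)
n, servers = 10_000, 4
arrivals, t = [], 0.0
for _ in range(n):
    t += random.expovariate(3.5)       # about 3.5 arrivals per time unit
    arrivals.append(t)
services = [random.expovariate(1.0) for _ in range(n)]  # mean service time 1.0

one_line = simulate(arrivals, services, servers, single_queue=True)
many_lines = simulate(arrivals, services, servers, single_queue=False)
print(f"one shared line: avg wait {one_line:.2f}")
print(f"separate lanes:  avg wait {many_lines:.2f}")
```

With the servers busy most of the time, the shared line’s average wait comes out noticeably lower than the committed-lanes version, because nobody is ever stuck behind a slow customer while another server sits idle.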
So how do we apply all of this to my colleague’s business problem? We met to discuss some options, some experiments, and some metrics to check.
Experiment 1 – Criticality: Put the backlogs back together and require that the items be stack-ranked, in one prioritized list. Consider the prior list of ‘products’ to be epics/features of one product – the one thing that customers pay for. This ensures that the business agrees on the criticality of the items, and all of the teams pull off the same prioritized list. We ensure everyone is working on the true business-critical items. Specialists go where and when they are needed. We focus on whole-system output, not resource optimization.
Experiment 2 – Readiness: Get the backlog items better prepared before being eligible for the teams to take into the sprint. Scrum 101. Definition of Ready. If the Product Owners won’t do it, move someone into that front-end role to assist the PO(s) to prepare the items. Get the ‘barcodes’ ready to ensure the teams (the servers) can clearly process the item when they get it.
Experiment 3 – Focus & Stability: Once the items are taken by the team into the next sprint, the team is allowed to focus and finish. Management only gets to reprioritize the items on the backlog. No stealing and swapping the items in the grocery carts once the checker has begun checking. Can you imagine the chaos at the grocery store? Why do we do this to our software teams when we know better?
Experiment 4 – Measuring: Rather than measuring resource utilization and trying to maximize how many things the team is working on at once, start measuring the done stories (and production bugs) each week and each sprint. If you do story pointing, measure the points completed. If not, just measure the number of stories. Then slice and dice the data: if you can categorize your stories into types (by epic, by size, by requester, by feature, etc.), find out which ones go quickly, and figure out why. Which ones go slowly, and why? Deeply analyze the ‘factory flow’, figure out where the slow spots are and which types of items are slowest, and then target the most troublesome items.
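The measuring experiment needs nothing fancier than a list of completed stories and a couple of tallies. A minimal sketch, using entirely made-up story records (the dates, types, and points below are hypothetical, not my colleague’s data):

```python
from collections import defaultdict
from datetime import date

# Hypothetical done-story records: (completion date, story type, points).
done_stories = [
    (date(2024, 3, 4), "checkout-epic", 3),
    (date(2024, 3, 5), "search-epic", 8),
    (date(2024, 3, 6), "checkout-epic", 2),
    (date(2024, 3, 12), "prod-bug", 1),
    (date(2024, 3, 13), "search-epic", 5),
    (date(2024, 3, 14), "checkout-epic", 3),
]

# Throughput per ISO week: how many stories finished, and how many points.
per_week = defaultdict(lambda: {"stories": 0, "points": 0})
for done, kind, points in done_stories:
    week = done.isocalendar()[1]
    per_week[week]["stories"] += 1
    per_week[week]["points"] += points

# Slice by type to spot which categories flow fast or slow.
per_type = defaultdict(lambda: {"stories": 0, "points": 0})
for done, kind, points in done_stories:
    per_type[kind]["stories"] += 1
    per_type[kind]["points"] += points

for week, stats in sorted(per_week.items()):
    print(f"week {week}: {stats['stories']} stories, {stats['points']} points")
for kind, stats in sorted(per_type.items()):
    print(f"{kind}: {stats['stories']} stories, {stats['points']} points")
```

The same tallies extend naturally to any category you can attach to a story (size, requester, feature), which is exactly the slicing and dicing the experiment calls for.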
I’ll be checking back in with my colleague to see which experiments she runs, and what her analysis shows.
Do any of these conditions sound familiar? I’d love to compare stories and see if you found other root causes and other solutions.
Need some help analyzing your own team’s throughput (or lack thereof) and increasing the total value delivered? Contact me for assistance. I actually did pretty well in that class.