A workflow is a repeatable pattern of activities that systematically organizes actors to transform information, materials and energy into products. The phases of the Rational Unified Process are an example of such a pattern, and are depicted in Figure 1. Each of these steps requires time and effort to complete, and creates the intermediate work products necessary for shippable features to be developed.
Whenever prior steps produce inputs faster than the work cells assigned to a step can process them, a queue of work waiting to be processed begins to accumulate upstream of that step. Unless it is absorbed, this queue becomes a bottleneck for downstream steps, results in delays to customers, and will likely require additional resources just to report on the status of the waiting work. Batches of features are often injected into such processes at a granularity imposed by business authorization cycles, rather than at the pace of customer demand. This lengthens the time customers must wait for their needs to be addressed.
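The accumulation described above can be illustrated with a minimal sketch. It assumes a single step with a fixed arrival rate and a fixed capacity per period; the function name and the numbers are hypothetical and serve only to show how the backlog grows when arrivals outpace capacity.

```python
def backlog_over_time(arrivals_per_period, capacity_per_period, periods):
    """Return the queue length at the end of each period for a single step."""
    backlog = 0
    history = []
    for _ in range(periods):
        backlog += arrivals_per_period                  # work handed over by prior steps
        processed = min(backlog, capacity_per_period)   # the step cannot exceed its capacity
        backlog -= processed
        history.append(backlog)
    return history


# Upstream delivers 12 items per period; the step can process only 10.
print(backlog_over_time(12, 10, 5))   # [2, 4, 6, 8, 10]
# Upstream delivers 8 items per period; the step keeps up and no queue forms.
print(backlog_over_time(8, 10, 5))    # [0, 0, 0, 0, 0]
```

Even a small, steady excess of arrivals over capacity produces a queue that grows without bound until capacity is added or intake is limited.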
Growth in the size of such queues signals an imbalance between the capacity of the work cell and the throughput the overall production system needs in order to produce a batch of value within a deadline; as such queues grow, the end-to-end performance of cooperating agents is likely to deteriorate. This often results in external pressure from customers or sponsors, and may force a reallocation of resources (after inevitable delays) that strains the patience of customers and frustrates the best efforts of those who must help triage the symptoms everyone has to deal with.
Customers often balk at limiting the work they can submit for processing. If cycle times are short enough, customers can be convinced that flexibility and predictability are far more important than being on a list that never gets enough attention. Queue limits require work to be prioritized upstream before it is put into the input queue, and focus resources on work that is suitable for processing. Resistance to prioritization often stems from a concern that such priorities will not actually accelerate flow; after all, customers have had priorities in the past. But once customers come to understand the implication of applying priorities to actionable work alone, and recognize that work in that form will flow more quickly through the production system, they will be more willing to make shifts in resource allocation that enable higher-valued work to be delivered on time.
Congestion is further complicated as problems are encountered in a project, as they inevitably will be. Triaging these problems provides feedback to earlier stages in the production flow, though less efficiently, since symptoms rather than underlying issues are what is being detected. This adds to the backlog of work that must be done, and can lead to a non-linear degradation of performance. While adding resources may address these queues, there will be an initial dip in performance once new resources are added, until the work cell can absorb the new people and operate effectively together.
This is similar to what happens in rush hour traffic when a highway with four lanes is reduced to three. This is only a 25% reduction in the channel capacity provided by four lanes, but the choke point it creates results in a 2-3 fold increase in the time it takes for vehicles to travel a given distance. Such a choke point causes congestion, resulting in longer transit times for all involved. As this congestion increases, so does the likelihood of accidents and waste. We are pressured to leave earlier when it is recognized that it will take longer to get somewhere. As these time pressures increase, so does the likelihood that inputs or outputs will not meet specifications, since there is a rush to 'get in line' whether the work is actionable or not. When work isn't actionable, these latent defects inject further drag on the capacity available in the system, since each processing step is burdened by the need to sort out problems that were injected upstream. The time required to discover these defects depends upon the effectiveness of defect containment by upstream methods and the diligence of the team in discovering and resolving them. Since queues are involved in these situations, as capacity saturates, the interval between when a defect is injected and when it is discovered grows longer and longer. This increases the amount of rework required, and makes tracing symptoms back to root causes more difficult.
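To see how a 25% capacity cut like the lane closure described above can multiply transit time, the following sketch uses the textbook M/M/1 queueing formula, where mean time in the system is 1 / (service rate - arrival rate). Treating a highway or a work cell as an M/M/1 queue is a simplifying assumption made here for illustration, not a model taken from the text, and the rates are hypothetical.

```python
def mean_time_in_system(arrival_rate, service_rate):
    """Mean time in an M/M/1 queue: W = 1 / (service_rate - arrival_rate)."""
    if arrival_rate >= service_rate:
        return float("inf")   # demand meets or exceeds capacity: the queue never drains
    return 1.0 / (service_rate - arrival_rate)


def slowdown_after_capacity_cut(arrival_rate, service_rate, fraction_kept):
    """Ratio of mean time in system after a capacity cut to the time before it."""
    before = mean_time_in_system(arrival_rate, service_rate)
    after = mean_time_in_system(arrival_rate, service_rate * fraction_kept)
    return after / before


# Hypothetical rates: capacity of 100 units per hour, cut to 75% (four lanes to three).
for demand in (50, 60, 70):
    print(demand, round(slowdown_after_capacity_cut(demand, 100, 0.75), 2))
```

Under these assumptions, a demand of 60 units per hour yields roughly a 2.7-fold slowdown and a demand of 70 yields a 6-fold slowdown, which shows why the penalty is so sensitive to how heavily the channel was loaded before the cut.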
Manufacturing, whether done as build to order or build to stock, strives to replicate a design within acceptable tolerances over and over. In contrast, in an engineering value stream, each problem is new, and the information needed to satisfy requirements may involve experimentation and evaluation of alternatives in order to arrive at an acceptable balance within the target configuration's design space. In manufacturing, as queues double, the result is visible as inventory and can be immediately acted on to protect the bottom line. In engineering, the equivalent raw material is information, and the work products may be intellectual rather than physical. Such inventory is intrinsically less visible, so its doubling isn't noticed as quickly, and its consequences can be greatly underestimated.
Ultimately, in both engineering and manufacturing, each operational step must protect itself against variations in the performance of upstream tasks. This variation is the primary constraint on achieving end-to-end throughput of features in engineering projects, and is often the consequence of some upstream event that was expected to produce information but has not happened yet. Traditional project plans often expect deterministic performance from each step. However, the duration of each step is inherently uncertain, due to risks of underestimation, change, delays in work being actionable, and variation in individual performance. Such variation is a result of both random and assignable causes; if you can identify it or classify it, it is an assignable cause. Assignable causes arise from failures in execution, and must be eliminated through improved design and assurance methods. Random causes instead underlie differences in complexity and duration for different features, and reflect interactions across actors and the resulting system dynamics. As a result, to reduce random (chance) causes, the system itself must be changed. Task durations should thus be expected to exhibit significant variation when attempting to do new things. Organizations can choose to understand and embrace this uncertainty (so that it may be shaped) or ignore it (and accept the need to react to it once it is discovered).
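The gap between a deterministic plan and uncertain step durations can be sketched with a small Monte Carlo simulation. The triangular distribution, the four steps, and their optimistic, most likely, and pessimistic durations are all assumptions chosen for illustration; none of them come from the text.

```python
import random


def simulate_flow_times(steps, trials=10_000, seed=1):
    """Sample end-to-end durations for sequential steps.

    Each step is a (low, mode, high) tuple of durations in days.
    Returns the sampled totals sorted in ascending order.
    """
    rng = random.Random(seed)   # seeded so that runs are repeatable
    totals = []
    for _ in range(trials):
        totals.append(sum(rng.triangular(low, high, mode) for low, mode, high in steps))
    totals.sort()
    return totals


def percentile(sorted_values, fraction):
    """Return the value at the given fraction (0 to 1) of a sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


# Hypothetical plan: four steps, each most likely 5 days, at best 4, at worst 10.
steps = [(4, 5, 10)] * 4
totals = simulate_flow_times(steps)
print("plan built from most likely durations:", sum(mode for _, mode, _ in steps))
print("simulated median:", round(percentile(totals, 0.5), 1))
print("simulated 90th percentile:", round(percentile(totals, 0.9), 1))
```

Because each step in this sketch can overrun by far more than it can finish early, the simulated median lands well above the sum of the most likely durations, which is one way a deterministic plan understates flow time.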
Not every type of work moves through production systems at the same rate or in the same pattern. Multi-skilled team members can work across processing steps to respond to growing queues as they arise, and help the team maintain a throughput cadence. Visual indications are effective in such situations to distinguish between jobs that are time-critical to customers and those that are driven by internal objectives and may be more flexible. An example of the latter is a longer-term improvement that is desirable but not required within a particular timeframe, such as effort expended on retiring technical debt.
Michael Cusumano suggests that fundamentally different process models are needed for different kinds of products. For high-price, uniquely designed systems, the primary process strategy should involve full satisfaction of user requirements and application of domain expertise using carefully customized tools. For middle-priced systems, a software factory process that balances user requirements with production costs, operating under a standard development process, is more appealing. For low-end, mass-replicated systems, a process that maximizes functionality for the average user is more appropriate.
Regardless of the way that value is assessed, the importance of a particular job can vary widely. For example, one customer might choose to prioritize quality as the most important objective, cost as the second most important objective, and delivering on schedule as the least important objective. A different customer or business scenario might instead call for prioritizing schedule as the most important objective, quality as the second most important objective, and cost as the least important factor to take account of. In fact, the same customer often prioritizes these criteria differently at different times over a product's lifecycle.
This makes prioritization the most difficult of these steps, but unfortunately, the need is often met by claiming that all work has equal priority. The consequence of such a decision is that things neither important nor urgent can stand in the way of things that are both. Rothchild suspects prioritization challenges may be due to the changing perspectives that different stakeholders will have, depending upon the environment in which they are operating. Consider how the relative value changes for the same product under different customer scenarios during a product's lifecycle:
| Customer scenario | Primary objective | Quality | Cost | Flow time |
|---|---|---|---|---|
| Develop & launch new product | Begin realizing return on investments | Medium | Medium | High |
| Address critical in-service problem | Reduce down time | Medium | Low | High |
| Incorporate new features | Avoid impacting existing users | High | Medium | Medium |
| Implement maintenance updates | Minimize operational support costs | High | High | Low |
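The table above can also be captured as data, so that the ordering of objectives for a given scenario is derived rather than remembered. This is a minimal sketch: it assumes each table entry expresses the relative importance of that objective, and the numeric mapping of Low, Medium, and High and the function name are hypothetical.

```python
# Numeric ranks for the ratings used in the table (an assumed mapping).
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

# Ratings copied from the table, keyed by customer scenario.
SCENARIOS = {
    "Develop & launch new product": {"Quality": "Medium", "Cost": "Medium", "Flow time": "High"},
    "Address critical in-service problem": {"Quality": "Medium", "Cost": "Low", "Flow time": "High"},
    "Incorporate new features": {"Quality": "High", "Cost": "Medium", "Flow time": "Medium"},
    "Implement maintenance updates": {"Quality": "High", "Cost": "High", "Flow time": "Low"},
}


def objectives_by_importance(scenario):
    """Return objective names from most to least important for a scenario.

    Ties keep the column order of the table, because Python's sort is stable.
    """
    profile = SCENARIOS[scenario]
    return sorted(profile, key=lambda name: LEVELS[profile[name]], reverse=True)


print(objectives_by_importance("Address critical in-service problem"))
# ['Flow time', 'Quality', 'Cost']
```

Keeping the profiles in one place makes it easier to see when the same product has moved from one scenario to another and its priorities should change with it.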
Once a team has begun to follow an effective approach to managing throughput with a mix of jobs of different priorities, it should be possible to develop a sense of the cycles, effort, and flow time it takes to meet each of these objectives. Only after a group's behavioral patterns have stabilized should stakeholders commit to satisfying a mix of performance objectives. At that point, only work that is ready for processing should be accepted, and there should be a limit on how much of this work is allowed into processing at any one time. When the batch size of work is limited to a single feature, the time it takes will depend upon the productivity of each work cell in producing the necessary outcomes. Planning for this throughput must initially assume that all resources, material, and information are available exactly when needed, that resources can be utilized at full capacity, and that each activity produces defect-free work every time. Since these conditions are almost never achieved, commitments should consider the risks inherent in meeting them, and ensure those risks are made visible and actively mitigated.
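The intake rule described above (accept only ready work, and cap how much is in process) can be sketched as follows. The item fields, the function names, and the use of Little's Law (average flow time = average work in process / average throughput) to estimate flow time are assumptions introduced for illustration; the text does not prescribe them.

```python
def admit_work(candidates, in_progress, wip_limit):
    """Admit ready items in priority order until the WIP limit is reached.

    candidates: list of dicts with 'name', 'ready' (bool), and 'priority'
                (lower number = more important). All fields are hypothetical.
    in_progress: number of items already being worked on.
    """
    free_slots = max(0, wip_limit - in_progress)
    ready = [item for item in candidates if item["ready"]]   # unready work is never admitted
    ready.sort(key=lambda item: item["priority"])
    return [item["name"] for item in ready[:free_slots]]


def expected_flow_time(wip, throughput_per_week):
    """Little's Law estimate of average flow time, in weeks."""
    return wip / throughput_per_week


candidates = [
    {"name": "A", "ready": True, "priority": 2},
    {"name": "B", "ready": False, "priority": 1},   # important, but not yet actionable
    {"name": "C", "ready": True, "priority": 1},
    {"name": "D", "ready": True, "priority": 3},
]
print(admit_work(candidates, in_progress=1, wip_limit=3))   # ['C', 'A']
print(expected_flow_time(wip=6, throughput_per_week=3))     # 2.0
```

Item B is skipped despite its priority because it is not ready, and item D waits because the limit is reached. Under Little's Law, lowering the WIP limit at a steady throughput shortens the average flow time proportionally.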