Software Evolution

In this post I want to talk about how individual software programs ‘evolve’ over time and what determines their longevity, particularly with regard to the integration of commercial off-the-shelf (COTS) software.  The seminal work in this field was conducted by Meir Lehman at IBM in the 1970s.  Studies have shown the average software program lifespan over the last 20 years to be around 6-8 years.  Longevity increases somewhat for larger programs, so that for extremely large, complex programs (i.e., over a million lines of code – LOC) the average climbs as high as 12-14 years.  This increased longevity for large programs is directly related to the huge cost and inconvenience to an organization of replacing them.  Nonetheless, 12-14 years is not very long when one considers the risk and the investment of time and money required to develop and maintain a 1M+ LOC program.  Over their lifetimes, such programs can easily cost tens or hundreds of millions of dollars.
For very small programs the average programmer can be expected to produce as much as 6,000 LOC/year; for very large ones the figure drops off drastically.  During the maintenance phase (i.e., after the program is delivered), productivity on large programs has been shown to be a tiny fraction of that achievable during development (as illustrated in the Function Point graph to the right).  Mitopia® has built-in code analysis tools, used regularly since 1993 to measure and analyze the code base.  Historical results show average programmer productivity over the past 5 years to be over 30,000 LOC/yr. – off the chart even for a simple 10K LOC program according to COCOMO, and 3 times the maximum for initial development of 1M LOC programs, never mind maintenance.  Just to be clear, the LOC measure has little to do with real functionality; indeed, most Mitopia® customization now occurs through scripts and dynamically interpreted languages which have an expressivity orders of magnitude higher than any compiled code we might discuss herein, and which MitoSystems does not track in our historical LOC data.
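The size-dependent drop-off is easy to see in the basic COCOMO effort model, where effort in person-months is a·KLOC^b.  Here is a minimal sketch using the standard textbook coefficients; the choice of ‘organic’ mode for the small project and ‘embedded’ mode for the large one is my own illustrative assumption:

```python
# Basic COCOMO: effort (person-months) = a * KLOC^b.
# Standard textbook coefficients:
#   organic  (small, in-house, familiar domain): a = 2.4, b = 1.05
#   embedded (large, tightly constrained):       a = 3.6, b = 1.20

def effort_person_months(kloc, a, b):
    return a * kloc ** b

def loc_per_person_year(kloc, a, b):
    """Implied average productivity for a project of the given size."""
    person_years = effort_person_months(kloc, a, b) / 12.0
    return (kloc * 1000.0) / person_years

# A 10 KLOC 'organic' project vs. a 1M LOC 'embedded' one:
small = loc_per_person_year(10, 2.4, 1.05)     # ~4,500 LOC/yr
large = loc_per_person_year(1000, 3.6, 1.20)   # ~840 LOC/yr
```

Because b > 1, effort grows faster than size, so implied per-programmer productivity falls as programs get bigger – which is what makes a sustained 30K LOC/yr. figure on a 1M+ LOC code base so anomalous.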
Mitopia® is a 1M+ LOC program, and looked at from this perspective, at 22 years old now, it is into geological time scales in the software world.  Although most of the inner workings have been replaced (sometimes more than once), throughout that time the same essential coding paradigm and initial functionality (expanded and optimized with time) has remained.  Few programs survive for so long while remaining relevant and on the cutting edge of technology throughout their entire life span.  Change is the only real constant in our technological world, and change is the corrosive force that limits software longevity (see here).  Even today, Mitopia® is in many ways a technology without peers or competitors, so it may be instructive to look at the reasons why it has been able to avoid the standard software fate and remain relevant, particularly in an application space (data integration, analysis, & intelligence) that is characterized more than any other by change.
First let us eliminate from our considerations the ‘trivial’.  The current life expectancy of 6-8 years (perhaps half that for mobile apps) is made up almost exclusively of ‘small’ programs (less than 100K LOC) that build upon existing functional libraries to deliver a product that by its very nature has a short viability.  The predominance of ‘tiny’ mobile apps in today’s statistics is driving longevity figures down, and we cannot consider such simple systems or their market dynamics as relevant to our discussion of software survival and evolution into old age.  Small programs are easy to replace or plagiarize; life in this realm is short and brutal – breed fast, die young.  There is little need for a serious ‘maintenance’ phase in such software; the life-cycle is too short to bother, and it is easier to replace the functionality entirely.

By contrast, in larger programs it has been shown that over 90% of the cost and effort is involved in the maintenance phase, not in development as most people think.  Big complex programs, rather than being built from the ground up (an intimidating and usually impractical strategy), tend to be assembled by ‘integrating’ a variety of building-block technologies (commercial off-the-shelf software – COTS).  This means that the developer is free to focus on the unique ‘glue’ that is the specific application program being developed, while ignoring the complexity of building complex underpinnings like a relational database, GUI frameworks, functional libraries, and the many other familiar subsystems that make up any large program.  This COTS approach is favored by all the large systems integrators.  Unfortunately, even though the US government (and others) have favored this approach over proprietary development for decades, people now realize that it comes with a serious downside:

Wikipedia: Motivations for using COTS components include hopes for reduction of overall system-development and costs (as components can be bought or licensed instead of being developed from scratch) and reduced long-term maintenance costs. In software development, many regarded COTS as a silver bullet (to reduce cost/time) during the 1990s, but COTS development came with many not-so-obvious tradeoffs—initial cost and development time can definitely be reduced, but often at the expense of an increase in software component-integration work and a dependency on third-party component vendors. In addition, since COTS software specifications are written externally, government agencies sometimes fear incompatibilities may result from future changes.
Far from being a silver bullet, COTS solutions raise some issues. A major issue with any COTS solution is the stability of the vendor. Vendors can go out of business, can be purchased by other companies, or can completely drop support for a product. This could be devastating for a customer that has purchased a COTS solution. With a COTS solution you also receive unknown quality. The purchased COTS solution may not perform properly in a customer’s business environment. Another unknown with COTS is its ability to integrate with other systems. The COTS solution may integrate with a certain system, but that integration could be hindered by the external system’s current version, for example.
All these effects (and others) combine to limit the practical lifetime of any large program that is substantially dependent on COTS.  In this context COTS includes dependence on multiple programming languages (often ‘fad’ languages) and linked third-party libraries, since these too have limited lifetimes.  I have posted on this subject before.  Because of the web, such short-lived technologies are being integrated more and more commonly today, and this may have driven reductions in software lifetime and increases in program risk (i.e., the chances of failure during the development phase – see discussions here).  Reliance on the underlying operating system itself, over geological time frames, can also be a destructive source of change.
Recent research, particularly the COTS-LIMO model (see diagram above), has provided a formalized understanding of exactly why all these effects occur during large program maintenance.  The model shows that as the number of COTS incorporated into a system increases, the volatility and exposure to change caused by the need to constantly adapt the COTS ‘glue’ within a program can easily overwhelm the programming staff and indeed consume/exceed all of their available time.  The drop-off in programmer productivity is also explained by the fact that this constant exposure to change through COTS glue can wipe out any benefit gained through long-term programmer experience with the program structure, metaphors, and architecture.  This constant treading water effect also drives programmers to leave the project seeking new challenges, which in turn contributes to software death through entropy introduced by later less experienced maintainers.
This then is the major effect that causes the untimely death of large software programs.  I call it the ‘COTS cobbling’ problem, and have spent much time over the years trying to warn government agencies of the dangers – a futile effort of course (see the first post on this site for an example).  The dangers are not by any means limited to premature project death; they also include a functionality decline to the ‘lowest common denominator’, driven by COTS platform and compatibility constraints rather than by actual needs.  The ‘Bermuda Triangle‘ problem is perhaps the most common and obvious example of the broader COTS cobbling issue.
While the exact number of divergent COTS technologies needed to kill or derail a given program through change and software entropy varies somewhat with the domain and the COTS chosen, the break-even point beyond which maintenance no longer makes sense past a decade, say, may well be around 4, as illustrated in Figure 1.  For a two-decade lifespan, the number may be 2 or less.  One of these is often the underlying ‘fad’ programming language(s) – (see long term language ratings – here).  Another is always the underlying OS; even operating systems change fundamentally over software geologic time scales.  Virtually no large programs developed in the last two decades depend on so few COTS components, and so few have or will have lifetimes much beyond a decade.  Mitopia® too made the COTS cobbling mistake during early development; however, for various reasons we survived – complete with code analytics to document the entire process.  This data starkly illustrates the issues, and is summarized in the chart below:
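The shape of this effect can be sketched with a toy model.  To be clear, these are not the actual COTS-LIMO equations, and every number below is an illustrative assumption:

```python
# Toy model of COTS 'glue' maintenance load -- NOT the real COTS-LIMO model.
# Assumption: each COTS component ships a few releases a year, and each
# release forces some months of glue-code rework on the integrating team.

def glue_fraction(n_cots,
                  releases_per_year=2.0,        # assumed vendor cadence
                  rework_months=1.5,            # assumed glue rework/release
                  staff_months_per_year=24.0):  # e.g. a 2-person team
    """Fraction of total staff time consumed by COTS glue adaptation."""
    load = n_cots * releases_per_year * rework_months
    return load / staff_months_per_year

# With these (illustrative) numbers, 4 COTS components already eat half
# the team's time, and 8 consume it entirely:
half = glue_fraction(4)      # 0.5
all_gone = glue_fraction(8)  # 1.0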
Figure 2 – Evolution of the Mitopia® code base 1991-2013 (click to enlarge)

The graph shows the total size of the code base over the 22 year period, broken down by the various subsystems that make up the entire Mitopia® environment.  Various critical ‘change’ events are annotated on the diagram.  Overall, a key thing to note is that, excluding the early code mountain ranges, long since eroded, the rate of new code creation has remained roughly constant over the entire life-cycle (Lehman’s 4th law).  This is despite the fact that during initial development there were approximately 20 times as many full-time developers working on the code as there are today.  The fraction of the code base implemented as fundamental application-independent abstractions within core Mitopia® code (the purple area of the graph) has risen from an initial value of around 5%, to just under 50% when MitoSystems itself took over in 1998; it is around 85% today.  In that same time frame, application-specific code (i.e., not abstracted and generalized) has dropped from 95% at the outset to just 0.3% today.  This is a key indicator of a mature and adaptive ‘architecture’ rather than a domain-specific application.  This universal use of, and familiarity with, a few very powerful abstractions to do virtually everything has been responsible for the constant increase in programmer productivity (up to 30K LOC/yr. today from just 7.5K LOC/yr. prior to ATP) – Lehman’s 5th law.  This change has allowed software team size and costs to drop by such a large factor (20x) while total productivity has remained unchanged or even increased.

Lehman’s “Second Law” of software evolution (compare with 2nd law of thermodynamics) states:

“The entropy of a system increases with time unless specific work is executed to maintain or reduce it.”

The gradual consumption of other subsystem code by the core abstractions within Mitopia® depicted in Figure 2 shows that MitoSystems’ approach to software evolution has always been driven largely by the desire to reduce software entropy.  It is this fact more than any other that explains program longevity.  Mitopia® is what Lehman defines as an E-program, and as such is governed by all 8 of Lehman’s laws.  Many of the other posts on this site address the core Mitopia® abstractions and how their expressive power allows broad applicability and removal of domain-specific code edifices.

At this point it must be said that the design documents produced by the end of the two year design phase that started in 1991 already specified many of the most basic abstractions in great detail, so one would expect that there would be little or no wasted or duplicative code developed in the other subsystems.  It is the ultimate elimination of this waste and duplication that partially accounts for the jagged peaks that characterize the first half of the graph (i.e., the first decade of software life – by which time most programs have already died).  The truth of the matter is that initial development of the low-level Mitopia® abstractions themselves took nearly four years to complete, and in the meantime other subsystem developers were forced to create alternate piecemeal solutions themselves, sometimes by incorporating COTS, in order to move their code bases forward.  The pressure to deliver concrete functional capabilities to the customer, despite prior warnings of how long such a system development might take, had much to do with this early mountain formation.  I myself got pulled into implementing the core abstractions despite my intent to stay out of the code entirely.  This in turn caused a lack of competent technical supervision elsewhere during this period.  As we all know, programmers are infinitely harder to herd than cats, so the ultimate result was a COTS cobbling approach early on.  It was never the plan, but it happened anyway.  Mea somewhat culpa.

As the chart shows, by around the year 2000, the code base had swollen so much it was larger even than it is today.  Worse than that, because of the large diversity of approaches in the code, and particularly the large number of COTS and external libraries involved in the non-core subsystems, programmer time was almost entirely consumed by maintenance activity for the reasons already discussed.  As a result, little or no actual forward progress in terms of non-core functionality was occurring.  Meanwhile the core Mitopia® code which was (and remains) wholly independent of any other technology (except the underlying OS) continued to expand in capabilities.  This dichotomy prompted a detailed examination of the underlying causes, and resulted in identifying ‘COTS cobbling’ as the fundamental issue and enemy of productivity over the long term.

In 2000 we decided we had to tackle this issue, since we could no longer tolerate the entropy and maintenance workload caused by so many diverse technologies.  First to go was the custom database library (InsideOut) and the wrappers used internally by the audio/video subsystem (pink area labelled MitoMovie™) to track external equipment and allocation for video capture and streaming.  This was integrated into the Oracle database used elsewhere at the time.  The continued development and maintenance of our own video CODEC, ATM network driver, and streaming technology (known as DVS) also no longer made sense, so we re-engineered to use QuickTime.  Moreover, for a couple of years (starting in ’99) we had been forced to run in 68K emulation on Apple’s new PPC machines (there were heterogeneous installations at the time) because we used a purchased ‘stemming’ library implemented in 68K code; a PPC replacement was not on the cards.  We were thus forced to develop our own replacement stemming technology.  The first precipitous drop in code size caused by these changes can be seen in the chart during the year 2000.

In 2002, our ability to continue running under Mac OS-9 came to an end, as we knew it ultimately would.  Apple had moved entirely to the OS-X architecture, and while the ‘Carbon’ APIs from OS-9 were still supported, three of the largest and most heavily used external COTS chunks were dead in the water with the change in OS.  The Oracle database, while supported (somewhat) under OS-9, was at the time unsupported on OS-X with no plans to change this in the foreseeable future.  The same was true for the Personal Librarian Software (PLS) we had incorporated into our database federation to handle inverted file text searching across languages.  Equally disastrous was the fact that our internal MitoWorks™ GUI framework no longer made sense in an OS-X environment, and a replacement would be needed at the same time as all the other changes.  2002 constituted the single most disastrous illustration of the danger of COTS to life expectancy.  If these components had only been our own code, all we would have had to do was re-compile them.  It was a harsh but useful lesson.

For most projects, such a combination would have been instantly fatal; however, due to the unique nature of our primary customer, we were able to embark on a 1.5 year re-implementation of all three obsolete components, which was responsible for the single largest drop in non-core code (see chart from 2002-2003).  In this one transition we shed around a third of the entire code base, while coming out of it at the end with enhanced capabilities, vastly improved performance, and a code structure now almost entirely implemented as core abstractions.  We were now dependent on just 3 external COTS components (down from closer to 10 at the outset).  The remaining dependencies were the underlying OS (now OS-X), the replacement GUI framework (known as Genesis), and a GIS library (which itself had replaced an earlier choice when the original vendor went out of business).  Best of all, after the transition, programmer productivity increased dramatically and the time spent treading water in maintenance dropped proportionately, allowing a subsequent up-tick in the rate of introducing new functionality into the architecture.

Throwing out the relational database and inverted text engine turned out to be one of the best things we ever did, since it led to the MitoPlex™ and MitoQuest™ core technologies, both built directly on the core Mitopia® abstractions.  The resulting code simplifications were outweighed only by the massive increases in performance and scaling we were able to realize at the same time.  The same was true of switching to the new GUI platform, which ultimately delivered improved visualizers and enabled rapid development of the data-driven GUI approach to previously unthinkable levels.  The combination made the architecture ‘adaptable’ to new domains in time frames that were fractions of those required earlier.

The whole experience, while traumatic at the time, made it clear that COTS cobbling is the enemy not only of productivity, but also of adaptability, functionality, performance, generality, cost flexibility, and a whole host of other metrics that we had previously thought to be immutable facts of software life.  Not so: poor scores on these dimensions over time turn out to be largely driven by the compromises that COTS dictate.  It became clear that we should not stop there, but should re-examine the three remaining COTS items to see if even they could be eliminated to achieve greater performance.  COTS programs are S or P programs (in Lehman’s language), not E-programs.  Depending on S or P programs will usually be the undoing of an E-program.  I would propose a 9th law to add to Lehman’s, which is as follows:

Proposed 9th Law:   “Each significant external software component that an E-program relies on for internal functionality, will incrementally reduce that E-program’s life expectancy.”

In other words, it’s not just about how you structure and maintain your own code; longevity is just as much about the number and type of the external technologies you rely on.  External reliances introduce as much entropy as, if not more than, sloppy maintenance changes ever could.

From 2003 onwards, Figure 2 shows a smooth process of increasing programmer productivity, growing core abstraction functionality, and gradual independence from the remaining COTS components.  Such independence remains our top technical goal.  It took 20+ years of experience on a single code base to realize how critically important this is.  Alas, few programmers ever work on one large code base for so long, mainly because few projects survive this long – which means developers rarely get to see how damaging external COTS can be over a multi-decade software lifespan.  It is a vicious cycle that few can even perceive, and which by its very scale prohibits serious academic investigation.  Trust me though, it is real.

Over the years following 2003, we implemented our own ontology-based GIS subsystem, entirely built upon core Mitopia® abstractions.  One more COTS component bites the dust.  Other minor dependencies have also been systematically eliminated.  Starting in 2008, a new core-abstraction based universal renderer approach has incrementally replaced the use of the 3rd party Genesis GUI framework, through a kind of background coding task when time permits.  At present, both renderers remain simultaneously active within a single instance of Mitopia® (through the new GUI interpreter abstraction).  Different widgets can pick different preferred renderers (there are now others), so that at this point in time Genesis is used only in a few places (mostly visualizers).  Soon it too will be optional, and the last COTS dependency, other than the underlying OS, will be gone.
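The idea of several renderers coexisting behind one interpreter, with each widget picking its preferred one, can be sketched as follows.  All class and method names here are hypothetical illustrations, not Mitopia®’s actual API:

```python
# Hypothetical sketch of per-widget renderer selection; the names are
# illustrative, not Mitopia's actual GUI interpreter interfaces.

class Renderer:
    def draw(self, widget):
        raise NotImplementedError

class GenesisRenderer(Renderer):      # stand-in for the legacy framework
    def draw(self, widget):
        return f"Genesis:{widget}"

class UniversalRenderer(Renderer):    # stand-in for the new core renderer
    def draw(self, widget):
        return f"Universal:{widget}"

class GuiInterpreter:
    """Routes each widget to its preferred renderer, with a default."""
    def __init__(self, default):
        self.default = default
        self.preferred = {}

    def prefer(self, widget, renderer):
        self.preferred[widget] = renderer

    def draw(self, widget):
        renderer = self.preferred.get(widget, self.default)
        return renderer.draw(widget)

# Both renderers live in one running instance; a few widgets stay legacy:
gui = GuiInterpreter(UniversalRenderer())
gui.prefer("map_visualizer", GenesisRenderer())
```

The design point is that the interpreter owns the routing decision, so retiring the legacy renderer later means deleting entries from one table rather than touching every widget.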

The OS will undoubtedly be next, as Apple’s tendency to deprecate C APIs in favor of Objective-C replacements becomes sufficiently irritating that it will ultimately force a full scale elimination of OS dependencies.  MitoSystems sees Objective-C as just another ‘fad’ language (i.e., COTS) to be avoided, even if platform mandated.

The moral of the story?  If you want your program to live to a ripe old age, don’t trust COTS…or is that cats, I can never be sure.
