Built-in Module Tests

In the previous post we discussed coding standards; in this one we will examine the equally important subject of module/unit tests, and in particular the built-in module testing framework that Mitopia® provides, and which MitoSystems’ coding standards mandate.  A distinction is generally made between ‘unit tests’ and ‘module tests’: the former is normally a development phase activity performed by the developer on a per-function basis, while the latter is normally thought of as something that happens after initial development and may be repeated regularly to ensure the module still performs as expected during maintenance.  Module tests tend to test operation of an integrated set of functions more than unit tests do.  In either case, these tests tend to involve running the target code within some kind of specialized testing harness which is quite distinct from the actual program the code is designed to be part of.

In Mitopia® we take the position that unit tests and module tests are one and the same, and that rather than being distinct from the target application, they are always part of it, and can in fact be invoked at any time from within the Mitopia® application itself.  This is a fairly radical departure from normal approaches, and so perhaps we should start out by describing the reasoning behind this choice of approach.

The following are some key advantages of having module tests available in-place in the target application:

  • If the tests run within the actual application environment, they better represent what will happen in the real thing.  Isolated test harnesses tend to simplify things such that test results may not reflect real operation in place.
  • By running within Mitopia®, test programs have access to all the API calls and all the debugging facilities that are built in.  This also obviates the need to build custom harnesses which might otherwise discourage module test development.  Indeed by having the module test in the same source file, it becomes perhaps the most expressive form of documentation as to how particular functions are invoked and should operate.  This is critical to someone trying to debug in an unfamiliar module since they can simply execute the module test and step through operation of the functions they are interested in without having to figure out how to create such a single stepping opportunity another way within the real system.  The result is that there is less temptation to put a ‘temporary’ change into the system in order to investigate something, and then forget to take it out.  In a large system under active development, entropy from forgotten test statements (or temporary commenting out) can be one of the largest sources of new bugs.  The built-in module test reduces this effect.  As an aside, Mitopia® coding standards also mandate that any temporary code change made for testing purposes MUST be surrounded by an #ifdef TEMPORARY conditional compile block.  The value TEMPORARY is always defined (for development builds), however the rigor of always adding the #ifdef TEMPORARY makes such things trivial to find later if distracted, and has no doubt saved man months over the project lifetime.
  • Mitopia® coding standards mandate that the module test code be in the same source file as the module itself, which means module tests are far more likely to be maintained and used than would otherwise be the case.  They also change as the code/API changes and thus help to document the consequences of the change.
  • The ability to run a module test at any time within the actual target environment being debugged as a whole is an invaluable comprehension and fault isolation aid, and it removes all question of whether the same code involved in the module test is running in the real system.
  • As with all things, the Mitopia® philosophy is that everything one might ever need be part of the application itself, even if not normally executed.  This applies to module tests just as it does to help tools, code analysis, utilities, and all other things that need to ‘follow the application or development environment’.  Build them in so they don’t get lost over time.
  • Being in the same source file and standardized (see below), the unit/module test becomes the quickest way for the developer of a new package/abstraction to test any new functionality they have written even before the abstraction is complete.  This then becomes how they initially test their code, and as the abstraction fills out, they add additional test steps at higher and higher levels of functionality and integration.  The end result is that the first few test steps in a module test sequence tend to be closer to what one might expect from a unit test, while later and later steps transition to an integrated module test.  The final module test is thus a combo of a unit test and a module test, and is so useful for developer testing that all new features are automatically added to the test suite, as that is simply the easiest thing to do to get them working.  In the end, the module test is a very effective detector for anything that might be broken by a maintenance change.  Even many years down the line, it is simpler to add some new edge case found in the field to the module test and find/fix the bug that way, than it is to recreate the necessary conditions in the real system.  Moreover, being in the same source file gives access to static routines also and so avoids having to publish things that shouldn’t be simply to support testability.
  • The whole trick is to make the module test and its maintenance easier to use for the developer than any other way they might try to step through their code.  By doing that, one guarantees good module tests and that in turn guarantees robust and reliable code.  Developers are always in a hurry, so if they think something else will be quicker than adding an additional module test step to test new functionality, they’ll usually do that instead.  So we have to make this incredibly easy to code, and use, more so than any other approach they might try.

 Mitopia’s Module Test Suite

The basic concept behind Mitopia’s module test approach is that every test step should convert the results of code execution to some form of human readable string, and that by defining an ‘expected’ value for this string and comparing it to the actual ‘achieved’ value, all test steps can be reduced to a standardized form and the entire test step registration and execution process can be formalized and controlled within Mitopia® so that all test steps remain available at all times.  If the two strings match, the step passes; if they don’t match, the framework can print out the expected and achieved strings, and the tester can easily see where and why they don’t match.  Now the first time someone unfamiliar with this approach hears this strategy, they invariably say well … not everything can be converted to a string like that, what if it’s a comms. thingy, multi-threaded, causes errors, or perhaps is user interface stuff, how do you convert that to a string?  My answer is that all of these kinds of things are module tested within Mitopia®, and we have found ways to convert all of them to meaningful strings.  It is really very simple.  So, given that premise, let’s go ahead and describe the basic structure of a module test and the APIs one uses to register and execute it:
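To make the expected/achieved idea concrete, here is a minimal sketch in plain C.  It is not Mitopia® code: the names DemoStep, demoAppendInts() and demoStepPassed() are hypothetical and exist only for this illustration.  The point is simply that a non-string result (here a small array of integers) is rendered as text, after which pass/fail reduces to a string comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_STRING 256             /* hypothetical buffer size for this sketch */

/* Hypothetical test step state: just the two strings being compared. */
typedef struct
{
    char expString[DEMO_MAX_STRING];    /* what the step should produce */
    char achString[DEMO_MAX_STRING];    /* what the code actually produced */
} DemoStep;

/* Render a non-string result (an array of integers) as readable text,
   appending it to the achieved string as a comma separated list. */
static void demoAppendInts(DemoStep *step, const int *values, int count)
{
    size_t used;
    int    i;

    assert(step != NULL && values != NULL);
    for (i = 0; i < count; i++)
    {
        used = strlen(step->achString);
        snprintf(step->achString + used, sizeof(step->achString) - used,
                 "%s%d", i ? "," : "", values[i]);
    }
}

/* A step passes only if the two strings are identical. */
static int demoStepPassed(const DemoStep *step)
{
    assert(step != NULL);
    return strcmp(step->expString, step->achString) == 0;
}
```

A step would fill in expString with a constant, run the code under test, append whatever it produced to achString, and leave the comparison to the framework.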

The screen shot to the left shows the final portion of the function popup within the development environment for the Lex.c package.  Mitopia® coding standards dictate that every module have an initialization and termination function which must be named XX_Initialize() and XX_Terminate() where XX_ is the package prefix; they also mandate that these functions be at the end of the source file.  Part of the reasoning for this is that it allows the initialization function to ‘see’ and register the entirely ‘static’ module test – which the coding standards dictate should appear immediately above the Initialization section as shown in the screen shot.

We can see from the screen shot that for the Lex package, the module test comprises 19 distinct test steps.

The module test itself is registered with the suite by calling DG_RegisterModule() as illustrated above.  The function LX_CALLtest() provides the interface to the standard module test suite, and the second string parameter is the module name.  This registration allows the module test to be run at any time from the built in Administration window simply by choosing “Invoke a Module Test” and then picking the name of the module to be tested as illustrated in the screen shot below:


The screen shot to the right illustrates the code necessary to register the various test steps with the test suite so that it can call them.  Note that LX_testInitialize() defines a global using Mitopia’s OC_MakeGlobal() function to reference the test context handle ‘cHdl’.  This makes the test context available to other code and threads throughout the system.  In particular, should any test step cause an error report (in this thread or any other), the error logging facility will automatically insert the error details into the current ‘achieved’ string as a result of the module test suite registering a custom error handler for this purpose.  This will thus make a test step that is not expecting the error fail.
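The error-capture behavior can be sketched roughly as follows.  This is an assumption-laden illustration, not the Mitopia® mechanism: the global gDemoCurrentTest stands in for the registered test context, demoReportError() stands in for the error logging facility, and the sketch ignores the thread safety that a real implementation would need.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical context holding the two strings for the running step. */
typedef struct
{
    char expString[256];
    char achString[256];
} DemoContext;

/* Hypothetical "current test" reference; NULL when no test is running. */
static DemoContext *gDemoCurrentTest = NULL;

/* Hypothetical error-report hook.  While a test is active, the error text is
   appended to the achieved string, so an error the step did not anticipate
   in its expected string makes the comparison fail. */
static void demoReportError(int code, const char *message)
{
    size_t used;

    assert(message != NULL);
    if (gDemoCurrentTest != NULL)
    {
        used = strlen(gDemoCurrentTest->achString);
        snprintf(gDemoCurrentTest->achString + used,
                 sizeof(gDemoCurrentTest->achString) - used,
                 "[ERROR %d: %s]", code, message);
    }
    else
    {
        fprintf(stderr, "ERROR %d: %s\n", code, message);
    }
}
```

A step that deliberately provokes an error simply includes the error text in its expected string, which is one way error handling itself becomes testable under this scheme.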

Every package defines its own unique module test context type (in this case LX_testContext) and these always start with the defined Mitopia® type ET_TestContext which is used by the module test suite to perform all its functions (see definitions below).  In particular this structure is used to track the registered test steps and hold the expected and achieved strings for each step.  Specific modules may add additional fields and structures to this generalized test context as required by their own module test code.  In the case of the Lex test, it adds a lexical analyzer DB and a large buffer used internally by certain steps (see above).
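The ‘generic context first, package additions after’ arrangement is a standard C idiom, sketched below.  The field lists are assumptions: the actual contents of ET_TestContext and LX_testContext are not reproduced in this post, so DemoBaseContext and DemoLexContext are hypothetical stand-ins (the member name ‘c’ simply mirrors the ‘c.expString’ usage mentioned in this post).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the generic test context used by the suite itself. */
typedef struct
{
    int  stepCount;                     /* number of registered steps */
    char expString[128];                /* expected string for the current step */
    char achString[128];                /* achieved string for the current step */
} DemoBaseContext;

/* Package-specific context: the generic part first, private additions after. */
typedef struct
{
    DemoBaseContext c;                  /* must be the first member */
    char            buffer[512];        /* hypothetical scratch buffer for some steps */
} DemoLexContext;

/* Generic suite code only ever sees the base part of any package context. */
static int demoCompare(DemoBaseContext *base)
{
    assert(base != NULL);
    return strcmp(base->expString, base->achString) == 0;
}
```

Because the generic part sits at offset zero, a pointer to the package context can be handed to generic suite code as a pointer to the base type, while the package's own steps still reach their private fields.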


Given all these definitions, we can now look at some simple early test steps for the Lex package (remember, early steps tend to be like function unit tests):


As you can see, each step sets up the expected string by copying a constant into the ‘c.expString’ field of the context; it then performs a sequence of actions using package functions in order to generate a matching sequence by testing logical operation of the package.  When the test is run, the framework invokes each step in sequence and compares expected and achieved strings when complete, outputting a single ‘.’ to the console for a successful test, or the mismatching test step and expected/achieved for a failure.  It also monitors for memory leaks caused by the test.  A successful run of the Lex module test therefore looks as follows:
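The run loop just described can be approximated in a few lines of C.  This is a sketch under stated assumptions rather than the Mitopia® suite: DemoContext, DemoStepFn, demoRunSteps() and the three example steps are all hypothetical, and the leak monitoring mentioned above is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    char expString[128];
    char achString[128];
} DemoContext;

typedef void (*DemoStepFn)(DemoContext *ctx);

/* Example step: expected value is a constant, achieved value is computed. */
static void demoStepLength(DemoContext *ctx)
{
    strcpy(ctx->expString, "5");
    snprintf(ctx->achString, sizeof(ctx->achString), "%d", (int)strlen("hello"));
}

static void demoStepConcat(DemoContext *ctx)
{
    strcpy(ctx->expString, "foo-bar");
    strcpy(ctx->achString, "foo");
    strcat(ctx->achString, "-bar");
}

/* Deliberately wrong expectation, to show what a failure report looks like. */
static void demoStepDeliberateFail(DemoContext *ctx)
{
    strcpy(ctx->expString, "4");
    snprintf(ctx->achString, sizeof(ctx->achString), "%d", 2 + 3);
}

/* Run every step, printing '.' on success or the mismatch on failure.
   Returns the number of failed steps. */
static int demoRunSteps(const DemoStepFn *steps, int count, FILE *out)
{
    DemoContext ctx;
    int         i, failures = 0;

    assert(steps != NULL && out != NULL);
    for (i = 0; i < count; i++)
    {
        memset(&ctx, 0, sizeof(ctx));
        steps[i](&ctx);
        if (strcmp(ctx.expString, ctx.achString) == 0)
            fputc('.', out);
        else
        {
            failures++;
            fprintf(out, "\nStep %d FAILED\n  expected: %s\n  achieved: %s\n",
                    i + 1, ctx.expString, ctx.achString);
        }
    }
    fputc('\n', out);
    return failures;
}
```

Passing an array containing only the first two steps prints two dots; adding the third step prints the expected and achieved strings for that step and returns a failure count of one.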


Obviously, the test steps above are relatively simple.  To illustrate more complex module level tests, the screen shot below shows the first step in the Parse package test which is in fact testing creation and operation of a full C expression parser by comparing its results to those obtained from the equivalent C program.  This single test step tests a broad range of Parse.c functionality as well as correct operation of the underlying Lex.c package and many other features.  In fact, module tests if correctly targeted at the uppermost functions (which in turn require lower level functions to operate correctly) can consist of a relatively small number of steps  in order to provide broad coverage.  Remember also, as the Parse example illustrates, that because Mitopia® code is so heavily layered on lower level abstractions, module tests for dependent abstractions (in this case Parse) further confirm correct operation of lower level abstractions (i.e., Lex) within the actual target environment.  The full suite of registered module tests if run (and passed) thus provides a fairly high degree of confidence that any given change made during maintenance did not break something else.  Indeed such tests in Mitopia® are also fundamentally a major part of integration testing since they all occur within the real application including of course access to persistent storage, GUI, and everything else.  Some Mitopia® module tests further up the abstraction pyramid wipe the LOCAL server content (after a suitable warning to the user) and then mine and persist significant data sets, which they then use to test both client and server side operation.

As can be seen, for a relatively small one-time effort in creating and registering the test step, the developer (and maintainer) using the Mitopia® module test suite gains a permanent way of testing that all functionality in the module is working as intended, an easy way of stepping through code to understand it, and documentation of how package functions can be combined and invoked in order to accomplish complex things.  Best of all, none of this is ever lost as developers leave or are changed.  Mitopia’s built-in module test suite thus represents a key pillar making the code base resistant to ‘entropy’ and bugs caused by ill-conceived maintenance actions on the code.  As we all know, most bugs are introduced during maintenance, and most of the cost of a piece of software is actually incurred hunting these bugs during maintenance (some estimates go as high as 90%).

Built-in module tests should therefore be a required feature, enforced by coding standards, and integrated into the actual product, in any software designed for longevity.


Coding Standards


Allen Holub’s excellent book on C programming standards

Much has been written on the subject of coding standards and conventions over the years, and they are often the subject of vigorous debate.  In this post I want to give a brief overview of the MitoSystems (C language) coding standards and the philosophy behind them.  First let us be clear: the purpose of coding standards is to improve the readability, reliability, and maintainability of the code base.

In any large project such as Mitopia® wherein multiple people other than the original author must be able to rapidly understand and trace through code during a debugging session, the most important thing one can do to facilitate this is to maximize the degree to which all the code looks basically the same, and uses the same commenting, indenting, file organization, spacing, and underlying libraries and techniques.  In a large code base uniformity is critical to efficiency, reliability, and comprehension.  Uppermost of all these is comprehension by both the author or more likely other code maintainers.  It is during maintenance that most ‘entropy’ and bugs are introduced into code through poor comprehension of the side effects of what appears locally to be a safe change.

For this reason, arguments commonly put forward that coding standards inhibit creativity and need not be followed by star programmers (these arguments are most often put forward by these individuals themselves), are merely showing a lack of professionalism on the part of the proponent, and a failure to see things in the larger ‘picture’.  I have posted before on the relative irrelevance of the programming language in solving these goals, as I have on the relative irrelevance of the language metaphor, and the dangers of ‘COTS cobbling’ in this regard.  There are no magic solutions to maintainability and robustness; in the end it is all up to the programmers and the degree of design and coding rigor they apply – from the bottom up, no shortcuts.

The truth is that for large long lived projects, maintainability (and thus project longevity – through resistance to entropy) is driven primarily by five things:  (1) Good requirements and initial design, (2) Development of well designed and layered generalized abstraction libraries organized as packages, (3) Good coding standards, rigorously enforced, (4) Pervasive (and maintained) module tests for all packages, and (5) Other adaptivity techniques (see other posts on this site e.g., here).

We have discussed (1) and (5) in previous posts and have illustrated (2) throughout this blog, so in this post I want to look at the important subject of coding standards (3) in more depth.  We will look at Mitopia’s module test approach (4) in a future post.


The Basics

We will use the simple function above to illustrate a number of MitoSystems’ basic coding standards:

  • Every practical compiler warning should be turned on and treated as an error requiring correction before the code can execute.  This practice alone forces the code to be more explicit and maintainable and avoids time wasting in debugging.  Static code analysis should be run regularly.
  • Code relating to the same abstraction all goes into the same source file forming a module or package, and every function (internal or external) starts with the identical 2 or 3 letter prefix (in this case LX_ for the package Lex – Lexical analysis).  This convention rapidly clues the reader into what kind of thing a particular call might be doing (for example SS_ is the prefix for the searching & sorting package).  All internal package functions (i.e., those not exposed outside the package – the bulk usually) are declared ‘static’ and where possible appear before they are used so that no separate function prototype needs to be declared (and maintained).  The average number of source lines in all files (code and header) for Mitopia is around 3,000; some complex packages can be up to 15-20 thousand lines.  Encapsulation (and hiding) into logical packages is the goal, regardless of the resulting file size.
  • Every function is preceded by a standard function header comment giving the function name, description, notes, keywords (optional), and related functions (see also).  Before the function is even written, this function header and the description of what it does and how should first be completed.  Having to explain something in English goes a long way to clarifying exactly what it is supposed to be for.  Simple private functions may scrimp on this description.  This description should be updated whenever changes to the code require it.  For the investment of a few minutes prior to writing the code, one saves the author and those that follow hours of wasted time.  The function header comment contains key alignment marks (the ‘-‘ chars as opposed to the ‘+’ chars) to facilitate the creation of aligned prototypes and comments.
  • Every function prototype is declared in a column aligned form (as illustrated above) where the ‘//’ comment marker occurs at column 80.  The prototype declares the function parameters one per line with a standard line comment following that uses the convention ‘I:’ or ‘O:’ (or ‘IO:’) to define if the parameter is an input, an output, or both and then gives a parameter description.  The standardization of these function prototypes and the function (and package) headers is leveraged by code built into Mitopia® itself to auto-generate the API documentation from the actual source code (much like the Javadoc and Doxygen tools).  This Mitopia® tool generates all public code documentation (as a web site) whenever needed, and this approach ensures that the documentation always matches the code (which is rarely the case otherwise).  Note the philosophical point that the tools you use should be actually within the program you are working on where possible, that way they stay up to date and don’t complicate your build scripts etc.  For an example see below – note the auto-hyperlinking within the documentation by recognizing function names, the automatic arrangement of documentation by packages and project/subsystems, and the integration with other forms of documentation – all done automatically.  Other Mitopia® internal tools use these standardized headers to perform code analysis and generate metrics (e.g., historical code metrics).  The entire focus is to drive everything from the code, which is in reality about the only thing one can force developers to update properly.


    Example auto-generated documentation from standardized headers/prototypes

  • Non-static function prototypes are accumulated (organized by package) into a SINGLE private function headers file for each subsystem (e.g., MitopiaFuncs.h).  Non-static constants are similarly organized into a single file per subsystem (e.g., MitopiaConsts.h), as are types (e.g., MitopiaTypes.h), and globals (e.g., MitopiaGlobs.h).  We found that the standard approach of one header file per source file led to conflicting definitions, proliferation of trivial files, and an inability to quickly find things.  This was why we required header file unification on a per subsystem (e.g., Mitopia, MitoImage, MitoMovie, MitoScript, MitoSphere, etc.) – it vastly improves comprehension and reduces inconsistencies (since the compiler will forbid multiple conflicting declarations in the headers), it also means that only a single header per subsystem needs to be included in source code.
  • Using these unified header files, a custom MitoSystems tool automatically creates a single public header file during build containing only those portions of all declarations in the individual files that are not within an ‘#ifdef INCLUDE_PRIVATE’ conditional compilation block.  Similarly a unified private file is created containing all content.  The private header is used within the subsystem, the public header is used elsewhere.  This automated approach thus ensures that maintenance changes can never create inconsistencies in headers (or documentation), which is otherwise one of the primary causes of trouble downstream.  All this from a few simple rules regarding header files, function prototypes, and function/package commenting standards.
  • Every function starts with an ENTER() macro and ends with a RETURN() macro.  These macros are leveraged to provide a host of benefits.  The ENTER() macro for example (which must be the first statement of the function) internally declares necessary variables and implements logic that provides the full stack crawl capability associated with all Mitopia® errors.  This same capability is utilized in Mitopia’s built-in leak checking technology to tag allocations with a crawl where allocated.  This makes leak hunting and resolution almost trivial within Mitopia®.  Setting various debugging options also allows this macro to provide pervasive thread debugging, full execution profiling of function calls and execution times, stack depth monitoring, and many other capabilities critical to rapid fault isolation.  The matching RETURN() macro is critical to much of this functionality and also effectively enforces the requirement that there be only a single return statement (at the end of the function – labelled either ImmediateExit or Good/BadExit by convention) for all functions.

    details of the RETURN macro

    It does this by redefining the C reserved word ‘return’ so that it cannot be used in the code directly, and so that if used twice in the same function, it will generate a compiler error/warning.  Experience has shown that having multiple return statements dotted around a function is the #1 cause of leaks and other unintended side effects introduced by subsequent maintainers that have not fully understood all possible execution paths.  By enforcing a single exit, all these debugging features are possible and it also makes all functions look the same and perform their cleanup in a standard place.  Moreover, notice the three definitions RETURN_ret, RETURN_void, and RETURN(res).  In effect these enforce another standard which is that the return value for ALL functions must be either ‘void’, or ‘ret’ (since RETURN(x) where ‘x’ is anything else is undefined).  Figuring out what a function is returning is a key time waster when stepping through code.  Through these macros, this problem goes away: the function return value is always called ‘ret’ (you can’t even return constants like true or false!).  Again this standard is focussed on the goal of rapid code understanding, and the ENTER()/RETURN() formalism is a huge part of enforcing this standardization in a way that programmers cannot simply work around.


    Common pattern for function exit handling

  • Code indentation is by one tab character per block, ‘{‘ and ‘}‘ always aligned vertically with each other and with the outer code indent (i.e., no ‘leftie’ (per K&R), ‘uppie’, or ‘innie’ forms – MitoSystems code is always ‘outie’).  This requirement is made to facilitate rapid scanning of complex blocks of code for block structure without having to waste time looking for the matching ‘{‘ or ‘}‘ mixed up with any other tokens.  They are on their own line and always vertically aligned.  Once again understanding the block structure of the code is a key time waster during debugging of unfamiliar code and this convention minimizes this.
  • Variable names and parameters use ‘camel case’ with the first character always lower case; constants (#define) generally start with ‘k’ and use camel case, or if complex or fundamental in some way, they are all upper case.  Types generally start with an upper case character and also use camel case; they may include the package prefix.  Low level built in types (e.g., int32, anonPtr, etc.) are an exception.  This allows easy and rapid recognition of what something is.  Variable names need not be over long since given other coding standards, the programmer can rapidly identify their purpose, and making longer names simply ‘spreads out’ the code making it harder to see the meta-structure.
  • No space between function names and their parameters, however except for unary operators, there should be a space between operators and their operands.
  • Block comments (i.e., /* and */) are discouraged in favor of line comments throughout.  This allows block comments to be trivially used on a temporary basis to comment out any chunk of code.  Line comments (i.e., // ) except for function headers and very occasionally large block discussions always start at column 80 and describe in English the purpose of the statement only if necessary to clarify.
  • Blank lines between lines of code should be kept to an absolute minimum.  Combined with the commenting standards, this ensures that the maximum amount of code can be seen at once (a large screen is required for the comments) and ensures that comments don’t break up the obvious structure of the function since they are pushed off to the right thus allowing pure compact and standardized indented code on the left side.  Note that some lines of code may extend well past the 80 column mark if this makes the overall structure of the code easier to understand at a glance (primarily related to block indenting).  The effect of this, when combined with code ‘coloring’ provided by the development tools (in the example above types and functions are blue, strings red, macros/defines brown, and comments green), is to greatly increase the speed with which it is possible to understand ‘at a glance’ what a function is trying to do, and this is of course critical to operating effectively within a vast and largely unfamiliar code base.  Comments embedded within code degrade this ability and should either be moved to the right (if brief), or to the function header “Notes:” (if more extensive); this allows them to be looked at only if/when the reader wants to and so increases speed of understanding.
  • Where a function may take parameters specifying various optional behaviors, the last parameter to the function should be an integer ‘options‘ parameter and the various options mask bits are defined (along with the function prototype in the ProjectFuncs.h header) by constant bit masks (e.g., kDoSomethingSpecial) which can be added together in order to specify multiple options.  This makes it simple for the unfamiliar reader to find such options, and also avoids having to change the external API whenever a new option is added.
  • No static globals are allowed, all globals for a subsystem should be referenced via a single global pointer which references a globals structure.  This allows all such globals to be easily found and examined in a debugger and ensures they are all cleaned up as required.
  • Declarations within inner blocks of a function should not be used, only the initial declarations list for the function should be used.  Once again this ties into the unified cleanup solution imposed by the ENTER() and RETURN() macros.  The only allowable exception is variables associated with conditionally compiled testing code within the function.  Do not initialize local variables to anything other than a constant in the declarations section.
  • Labels always start at column zero.
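The single-exit convention from the list above can be illustrated with a heavily simplified sketch.  The DEMO_ENTER() and DEMO_RETURN() macros below are hypothetical stand-ins: they only track call depth, whereas the real ENTER()/RETURN() macros (whose definitions are not reproduced here) are described as doing far more, including redefining ‘return’.  What the sketch does show is the shape of a function with one exit label, one cleanup site, and a return value always named ‘ret’.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, greatly simplified stand-ins for ENTER()/RETURN(). */
static int gDemoDepth = 0;              /* call nesting, maintained by the macros */
#define DEMO_ENTER()      int demoEntered_ = ++gDemoDepth
#define DEMO_RETURN(r)    do { (void)demoEntered_; --gDemoDepth; return r; } while (0)

/* Returns a newly allocated upper-case copy of src, or NULL on failure. */
static char *demoDuplicateUpper(const char *src)
{
    DEMO_ENTER();
    char   *ret = NULL;                 /* the one and only return value */
    char   *tmp = NULL;                 /* working buffer, freed at exit if still owned */
    size_t  i, len;

    if (src == NULL)
        goto ImmediateExit;
    len = strlen(src);
    tmp = malloc(len + 1);
    if (tmp == NULL)
        goto ImmediateExit;
    for (i = 0; i < len; i++)
        tmp[i] = (char)toupper((unsigned char)src[i]);
    tmp[len] = '\0';
    ret = tmp;                          /* ownership passes to the caller */
    tmp = NULL;
ImmediateExit:
    free(tmp);                          /* all cleanup happens in one place */
    DEMO_RETURN(ret);
}
```

Every early-out path jumps to the same label, so a maintainer adding a new failure case cannot skip the cleanup, and the depth counter is always decremented exactly once per call.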

Bottom line for all these basic standards and conventions is they make the code easier to understand and test, and they automate completely the task of making documentation match code, as well as various other maintenance tasks that might otherwise be avoided by the lazy.
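As a small illustration of the ‘options’ parameter convention from the list above, the following sketch uses hypothetical names throughout (demoEqual(), kDemoIgnoreCase, kDemoTrimSpaces); it is not taken from the Mitopia® API.  Each option is a distinct bit, so callers can add masks together and new options can be introduced without changing the function signature.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define kDemoIgnoreCase     0x00000001  /* compare without regard to case */
#define kDemoTrimSpaces     0x00000002  /* ignore leading/trailing spaces */

/* Returns 1 if the two strings are equal under the given options, else 0. */
static int demoEqual(const char *a, const char *b, int options)
{
    size_t lenA, lenB, i;
    int    ca, cb;
    int    ret = 1;

    assert(a != NULL && b != NULL);
    if (options & kDemoTrimSpaces)
    {
        while (*a == ' ') a++;
        while (*b == ' ') b++;
    }
    lenA = strlen(a);
    lenB = strlen(b);
    if (options & kDemoTrimSpaces)
    {
        while (lenA > 0 && a[lenA - 1] == ' ') lenA--;
        while (lenB > 0 && b[lenB - 1] == ' ') lenB--;
    }
    if (lenA != lenB)
        ret = 0;
    for (i = 0; ret && i < lenA; i++)
    {
        ca = (unsigned char)a[i];
        cb = (unsigned char)b[i];
        if (options & kDemoIgnoreCase)
        {
            ca = tolower(ca);
            cb = tolower(cb);
        }
        if (ca != cb)
            ret = 0;
    }
    return ret;
}
```

A call such as demoEqual(" Abc ", "abc", kDemoIgnoreCase + kDemoTrimSpaces) combines both behaviors, while passing 0 requests the plain exact comparison.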

Packaging, abstraction, and run-time standards

In addition to the basic MitoSystems coding standards described above, Mitopia® itself imposes a number of additional rules to further enhance reliability and programmer productivity within a large code base.  The paragraphs below discuss some of these standards.

All code should utilize the standard abstractions provided by Mitopia® to manipulate data structure aggregations.  These standard metaphors include the flat memory model, types and the ontology, string lists, lexical analyzers, and the database/persistent memory abstractions.  In particular, the creation of any memory resident structure that contains pointer links is strongly discouraged.  The underlying abstractions are powerful enough to represent anything one might need and are highly optimized.  By using them for all things, we make the operation of most areas of the code immediately familiar by analogy with other known code based on the same abstractions.  Of course they also minimize the code size and avoid introducing low level bugs.

The code should be designed throughout to be as platform and architecture independent as possible (e.g., endian issues).  This means overt declaration of the size of variable implementation intended (as in the sequence int16, int32, int64).  This runs somewhat contrary to C norms, however experience has shown that programmers always have in their mind what size ‘int’, ‘long’, or ‘short’ might be, even if not stated, with the result that code breaks badly over time through range and structure alignment changes mandated by compilers and the underlying processors.  The only constant in this world is change, and change breaks all hidden assumptions.  Better to be explicit in everything – including explicit structure padding if needed to guarantee universal alignment.  The only exception to this rule is the C assumption that ‘long’ is the same size as a pointer (though that may be either 32 or 64 bits depending).  If in doubt, use 64-bit values for integers and double for real – with modern computers there is really no reason not to.
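A brief sketch of what ‘explicit in everything’ can look like in standard C follows.  It uses the &lt;stdint.h&gt; types as stand-ins for the int16/int32/int64 types named above, and DemoRecord and the two byte-order helpers are hypothetical names chosen for the illustration.  The compile-time checks document the intended layout, so a compiler that disagreed would refuse to build rather than silently change the structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Every field has a stated width, and padding is spelled out rather than
   left to the compiler. */
typedef struct
{
    int64_t id;                         /* offset 0  */
    int32_t count;                      /* offset 8  */
    int16_t flags;                      /* offset 12 */
    int16_t pad1;                       /* offset 14: explicit padding */
    double  weight;                     /* offset 16 */
} DemoRecord;

_Static_assert(offsetof(DemoRecord, weight) == 16, "unexpected field offset");
_Static_assert(sizeof(DemoRecord) == 24, "unexpected structure size");

/* Store a 32-bit value in big-endian byte order regardless of host order. */
static void demoPutInt32BE(uint8_t *dst, int32_t value)
{
    uint32_t u = (uint32_t)value;

    assert(dst != NULL);
    dst[0] = (uint8_t)(u >> 24);
    dst[1] = (uint8_t)(u >> 16);
    dst[2] = (uint8_t)(u >> 8);
    dst[3] = (uint8_t)u;
}

/* Read back a 32-bit big-endian value. */
static int32_t demoGetInt32BE(const uint8_t *src)
{
    uint32_t u;

    assert(src != NULL);
    u = ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
        ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
    return (int32_t)u;
}
```

Shifting bytes explicitly, instead of copying the in-memory representation, is what keeps the stored form identical on big-endian and little-endian hosts.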

Further to the platform independent goal, all calls to the underlying OS toolbox (and C library for that matter) must be wrapped (and preceded by the prefix XC_ e.g. XC_NewPtr).  These wrappers are all declared in the single header file ToolBoxMap.h, and the implementations (which simply call the toolbox routine) are all gathered into ToolBoxMap.c.  This structure ensures that all toolbox/external calls emanate from ToolBoxMap.c, and the macros XC_fnName() declared in ToolBoxMap.h and used to invoke the wrapper functions can then be organized to go through a mapping table.  This in turn allows any and all toolbox calls to be dynamically patched for debugging purposes, or if one switches to an underlying OS that does not provide it.  This allows calling code to be platform agnostic to the highest degree possible.
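The mapping-table idea can be sketched as follows.  None of this is the content of ToolBoxMap.h or ToolBoxMap.c, which are not reproduced in this post: the DEMO_ prefix, DemoToolboxMap, and the counting replacements are hypothetical, and malloc()/free() stand in for the underlying toolbox calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mapping table: one function pointer per wrapped call. */
typedef struct
{
    void *(*newPtr)(size_t size);
    void  (*disposePtr)(void *ptr);
} DemoToolboxMap;

/* Default wrappers simply forward to the underlying library. */
static void *demoDefaultNewPtr(size_t size)   { return malloc(size); }
static void  demoDefaultDisposePtr(void *ptr) { free(ptr); }

static DemoToolboxMap gDemoMap = { demoDefaultNewPtr, demoDefaultDisposePtr };

/* Client code only ever uses these macros, never malloc()/free() directly. */
#define DEMO_NewPtr(size)       (gDemoMap.newPtr(size))
#define DEMO_DisposePtr(ptr)    (gDemoMap.disposePtr(ptr))

/* Debugging replacements that count live blocks. */
static long gDemoLiveBlocks = 0;

static void *demoCountingNewPtr(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        gDemoLiveBlocks++;
    return p;
}

static void demoCountingDisposePtr(void *ptr)
{
    assert(ptr == NULL || gDemoLiveBlocks > 0);
    if (ptr != NULL)
        gDemoLiveBlocks--;
    free(ptr);
}

/* Patch the table at run time; no calling code needs to be recompiled. */
static void demoPatchForDebugging(void)
{
    gDemoMap.newPtr     = demoCountingNewPtr;
    gDemoMap.disposePtr = demoCountingDisposePtr;
}
```

Because every call site goes through the table, swapping an entry changes behavior everywhere at once, which is the property that makes both debugging patches and OS substitutions practical.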

One key example of use of this technique is to wrap all memory allocators and de-allocators and then substitute alternates that take advantage of the ENTER() and RETURN() formalism in order to implement leak checking and a variety of other critical debugging capabilities.  In this way, all allocations can be fully analyzed by Mitopia® itself without requiring external debugging capabilities.  This is essential in a heavily multi-threaded environment such as Mitopia® where tracing memory ownership becomes a real challenge any other way.  The end result is that you cannot find direct toolbox calls in any Mitopia® code other than the wrappers.
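A toy version of allocation tagging is sketched below.  It is only an approximation of the idea: the real mechanism is described as recording a full stack crawl via ENTER(), whereas this sketch records just the calling function's name via __func__, uses a small fixed-size table (blocks beyond its capacity go untracked), and is not thread safe.  All names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_MAX_TRACKED 1024           /* hypothetical table capacity */

typedef struct
{
    void       *ptr;                    /* live block, or NULL if the slot is free */
    const char *where;                  /* function name standing in for a stack crawl */
} DemoAllocTag;

static DemoAllocTag gDemoTags[DEMO_MAX_TRACKED];

/* Allocate a block and remember where the request came from. */
static void *demoTrackedAlloc(size_t size, const char *where)
{
    void *p = malloc(size);
    int   i;

    assert(where != NULL);
    for (i = 0; p != NULL && i < DEMO_MAX_TRACKED; i++)
    {
        if (gDemoTags[i].ptr == NULL)
        {
            gDemoTags[i].ptr   = p;
            gDemoTags[i].where = where;
            break;
        }
    }
    return p;
}

/* Free a block and clear its tag. */
static void demoTrackedFree(void *p)
{
    int i;

    for (i = 0; p != NULL && i < DEMO_MAX_TRACKED; i++)
    {
        if (gDemoTags[i].ptr == p)
        {
            gDemoTags[i].ptr = NULL;
            break;
        }
    }
    free(p);
}

/* List every block still outstanding; returns how many there are. */
static int demoReportLeaks(FILE *out)
{
    int i, leaks = 0;

    assert(out != NULL);
    for (i = 0; i < DEMO_MAX_TRACKED; i++)
    {
        if (gDemoTags[i].ptr != NULL)
        {
            leaks++;
            fprintf(out, "leak: block %p allocated in %s\n",
                    gDemoTags[i].ptr, gDemoTags[i].where);
        }
    }
    return leaks;
}

#define DEMO_ALLOC(size)    demoTrackedAlloc((size), __func__)
#define DEMO_FREE(ptr)      demoTrackedFree(ptr)
```

A test framework could call the report function before and after a test run and treat any increase in outstanding blocks as a failure, which is roughly the role the leak monitoring plays in the module test suite described earlier.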

Packages should provide a complete suite of functions to manipulate the abstraction to which they relate, all those functions (public and private) appearing within a single source file.  Externally defined structure types should be avoided if at all possible.  Instead provide additional accessor functions to get at fields hidden within structures referenced via an abstract reference.  Publishing structures is a sin and inevitably leads to problems down the line with client code directly accessing the structure which may of course change later.  Better to hide the data behind the package accessor functions.  The entire Mitopia® code base publishes less than 100 structure types publicly (that is substantially less than one per Mitopia abstraction package) despite having many times that defined and used internally.  On the other hand, there are literally thousands of public API calls grouped into packages.   If data structure is to be published, use the ontology.
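The ‘abstract reference plus accessor functions’ approach is the familiar opaque-type idiom in C, sketched here with a hypothetical DC_ package (none of these names come from Mitopia®).  In a real project the first part would live in the public header and the structure definition would be visible only inside the package source file; they are shown together so the example compiles as one unit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ---- What the public header would expose: an abstract reference only ---- */
typedef struct DemoCounter *DemoCounterRef;

DemoCounterRef DC_NewCounter(const char *name);
void           DC_DisposeCounter(DemoCounterRef ref);
void           DC_Increment(DemoCounterRef ref);
long           DC_GetCount(DemoCounterRef ref);
const char    *DC_GetName(DemoCounterRef ref);

/* ---- Private to the package source file: the actual layout ---- */
struct DemoCounter
{
    char name[32];
    long count;
};

DemoCounterRef DC_NewCounter(const char *name)
{
    DemoCounterRef ref;

    assert(name != NULL);
    ref = calloc(1, sizeof(*ref));
    if (ref != NULL)
        snprintf(ref->name, sizeof(ref->name), "%s", name);
    return ref;
}

void DC_DisposeCounter(DemoCounterRef ref)
{
    free(ref);
}

void DC_Increment(DemoCounterRef ref)
{
    assert(ref != NULL);
    ref->count++;
}

long DC_GetCount(DemoCounterRef ref)
{
    assert(ref != NULL);
    return ref->count;
}

const char *DC_GetName(DemoCounterRef ref)
{
    assert(ref != NULL);
    return ref->name;
}
```

Client code holding a DemoCounterRef cannot touch the fields directly, so the structure can later gain, lose, or reorder members without breaking any caller.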

Mitopia® code tends to be very dense in terms of the number of functions called compared to the kind of code one might find say in an open source project.  The snippet below is typical:



As can be seen the code utilizes abstraction functions from a variety of packages to do virtually everything, there is generally very little actual complex manipulation done locally.  The function above is part of the implementation of the MitoQuest database server and is running within the servers, however, because even Mitopia’s database abstraction is built on the same fundamental underpinnings, the code looks identical to what a client side function might be doing and calls all the same kinds of packages.  When combined with the other coding standards above, this tends to make all Mitopia® code look similar and instantly recognizable as to its purpose.  This in turn makes all code easy to comprehend and maintain in a way that I would argue is far more effective than the kinds of ‘placebo’ organizing metaphors offered by standard programming languages.