Hidden tagged union concepts underlie OOP, so do we really need OOP?

In an earlier post (here) lamenting the blind, unthinking use of Object Oriented Programming and espousing OOP alternatives, I made the following statement:

“What in my opinion is the greatest feature of OOP? It is the ‘tagged union’ construct used by a ‘class’ to handle inheritance and polymorphism. Regrettably however, this trick is hidden within the ‘implementation’ and wholly unknown and unappreciated by most OOP programmers who think it some magical feature conferred by the class concept. The construct should instead be exposed for overt and creative use within the ‘box’. Were this to be done, many might see OOP languages for the thin facade that they really are. I will deal with this concept in later posts.”

With an election coming up tomorrow, and by way of illustrating that there are OOP alternatives that yield the same or even greater benefits, I thought now might be a good time to make good on that earlier promise and detail what it means to have explicit knowledge of the ‘state of the union’. So in this post I will examine the tagged union construct (see here) in more detail and show how exposing it directly within an ontology-driven environment like Mitopia® leads to a number of tangible benefits.

Interestingly, a Dr. Dobb's article entitled “Discriminated Unions” appeared on October 12 this year. The article mentions the use of tagged (they call them ‘discriminated’) unions as an alternative to buying wholesale into OOP languages. It is encouraging that others are starting to peer behind the OOP curtain and recognize the independent value of tagged unions. However, when one combines tagged unions with a system like Mitopia®, where the GUI and persistent storage are driven directly from a contiguous ontology (see here), the results are far more dramatic than others may yet have perceived.

A tagged union is basically a normal union construct with an extra field (the tag) that records which union member is currently valid, thus allowing code to determine which form of the union is in effect at any given time. Without a tag, the union construct is generally relegated to situations where it must be used to save memory, and safe access is the responsibility of client code. In conventional languages, the advantage of a tagged union over an untagged one is that it becomes possible to make all accesses ‘safe’.
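To make the definition concrete, here is a minimal sketch in plain C (not Carmot) of a hand-built tagged union. The type and function names (`Value`, `ValueTag`, `value_get_int`, and so on) are hypothetical, chosen only for illustration: writers set the member and the tag together, and the checked reader refuses any access that does not match the current tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical example type: which member of the union currently holds data. */
typedef enum { VALUE_NONE = 0, VALUE_INT = 1, VALUE_REAL = 2 } ValueTag;

typedef struct {
    ValueTag tag;              /* the extra field that makes the union "tagged" */
    union {
        int32_t i;
        double  r;
    } u;
} Value;

/* Writers set the member and the tag together. */
static void value_set_int(Value *v, int32_t i) { v->u.i = i; v->tag = VALUE_INT; }
static void value_set_real(Value *v, double r) { v->u.r = r; v->tag = VALUE_REAL; }

/* Checked reader: refuses the access unless the int member is the active one. */
static bool value_get_int(const Value *v, int32_t *out)
{
    if (v->tag != VALUE_INT)
        return false;
    *out = v->u.i;
    return true;
}
```

With an untagged union, reading `u.i` after storing a `double` would silently reinterpret bytes; here the same mistake is caught because the accessor consults the tag first.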

First, a brief history of tagged unions and the languages that explicitly support them:

Algol 68 supported a tagged union construct known as ‘united modes’. During the 1970s and 1980s, tagged unions occupied a central role, primarily in functional languages such as ML and, later, Haskell. This usage is clearly the antecedent of OOP’s use of hidden tagged unions to implement the virtual method lookup that underpins class hierarchies. An object’s constructor sets the hidden union ‘tag’, which thereafter remains constant throughout the object’s lifetime. This is the full extent to which OOP languages use tagged unions; regrettably, they do not formally expose the construct.
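C++ implementations typically realize virtual dispatch through a hidden per-object pointer to a per-class function table (the language standard does not mandate this mechanism, but it is the common one). The following plain C sketch, using hypothetical names such as `Shape` and `ShapeVTable`, mimics that arrangement to show how the table pointer behaves as a hidden tag: only the "constructors" write it, and every "virtual call" simply dispatches on it.

```c
#include <assert.h>

typedef struct Shape Shape;

/* One table per "class"; which table an object points at acts as its hidden tag. */
typedef struct {
    const char *name;
    double (*area)(const Shape *self);
} ShapeVTable;

struct Shape {
    const ShapeVTable *vptr;            /* hidden tag, written only by a constructor */
    union {
        struct { double w, h; } rect;
        struct { double r; } circle;
    } u;                                /* per-"subclass" storage */
};

static double rect_area(const Shape *s)   { return s->u.rect.w * s->u.rect.h; }
static double circle_area(const Shape *s) { return 3.141592653589793 * s->u.circle.r * s->u.circle.r; }

static const ShapeVTable RECT_VT   = { "rect",   rect_area };
static const ShapeVTable CIRCLE_VT = { "circle", circle_area };

/* "Constructors": the only places that set the tag; it never changes afterwards. */
static Shape make_rect(double w, double h)
{
    Shape s;
    s.vptr = &RECT_VT;
    s.u.rect.w = w;
    s.u.rect.h = h;
    return s;
}

static Shape make_circle(double r)
{
    Shape s;
    s.vptr = &CIRCLE_VT;
    s.u.circle.r = r;
    return s;
}

/* A "virtual call" is just dispatch on the hidden tag. */
static double shape_area(const Shape *s) { return s->vptr->area(s); }
```

Real C++ objects usually store subclass data by extending the object layout rather than in an explicit union, so this is an analogy for the tag's role, not a description of any particular compiler.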

In Pascal, Ada, and Modula-2, tagged unions are called variant records and are a formal part of the language, allowing an error to be raised whenever an attempt is made to access a member of the union that does not match the discriminator value.

In C and C++, though the languages do not explicitly support tagged unions, it is a simple matter to construct one from an untagged union by adding an extra discriminant field to the surrounding structure, and then ensuring that all code that uses the union enforces correct access by explicitly switching on the discriminant value in effect at the time. Of course, this requires a very high degree of rigor in the code, and in practice the construct is rarely used. One advanced dialect of C, Cyclone (released in 2006), actually has explicit support for tagged unions. Finally, the Opa language (first presented in 2010) has support for polymorphic tagged unions.
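The discipline described above for C can be sketched as follows. The `Message` type and the `MSG_*` values are hypothetical; the point is that every read goes through a `switch` on the discriminant, and the `default` branch catches any tag that matches no member. Nothing in the language forces other code to follow this pattern, which is exactly the rigor problem mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MSG_TEXT = 1, MSG_POINT = 2 };   /* hypothetical discriminant values */

typedef struct {
    int kind;                           /* discriminant kept beside the union */
    union {
        char text[16];
        struct { int x, y; } point;
    } body;
} Message;

static Message message_text(const char *s)
{
    Message m;
    m.kind = MSG_TEXT;
    strncpy(m.body.text, s, sizeof m.body.text - 1);
    m.body.text[sizeof m.body.text - 1] = '\0';
    return m;
}

static Message message_point(int x, int y)
{
    Message m;
    m.kind = MSG_POINT;
    m.body.point.x = x;
    m.body.point.y = y;
    return m;
}

/* Every read goes through a switch on the discriminant; the default branch
   rejects tags that match no member. Returns snprintf's result, or -1. */
static int message_format(const Message *m, char *buf, size_t len)
{
    switch (m->kind) {
    case MSG_TEXT:
        return snprintf(buf, len, "text:%s", m->body.text);
    case MSG_POINT:
        return snprintf(buf, len, "point:%d,%d", m->body.point.x, m->body.point.y);
    default:
        return -1;                      /* unset or invalid discriminant */
    }
}
```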

Other than in the languages described above, however, tagged unions are a rare (or hidden) construct unavailable to programmers.

Mitopia’s Carmot language (an extension of C) has full support for tagged unions, which allows binary data structures containing unions to be transmitted safely; this is otherwise not possible, and is a serious flaw in current programming languages. The reason binary structures containing unions cannot be transmitted directly in current languages is that the computer nodes in a network may have different byte orderings (e.g., big-endian/little-endian). Without explicit knowledge of the active variant, the receiver cannot know how the bytes within a union should be re-ordered, so the union's data becomes garbage following binary transmission, making unions useless within a data-flow environment.
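A small hand-written decoder illustrates why the tag matters for byte ordering. This is a plain C sketch under an assumed wire format (one tag byte followed by four big-endian payload bytes), not Mitopia's mechanism: the same four payload bytes must be grouped as one 32-bit word for one variant and as two 16-bit halves for the other, and only the tag says which.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record: the tag says how the 4 payload bytes are grouped. */
typedef struct {
    uint8_t tag;                        /* 1 = one 32-bit value, 2 = two 16-bit values */
    union {
        uint32_t u32;
        uint16_t u16[2];
    } v;
} Sample;

/* Decode a 5-byte big-endian record (tag byte + 4 payload bytes) into host order.
   Without the tag, the payload could be swapped as one 4-byte word or as two
   2-byte halves, and the wrong choice silently scrambles the data. */
static int sample_decode_be(const uint8_t buf[5], Sample *out)
{
    const uint8_t *p = buf + 1;
    out->tag = buf[0];
    switch (out->tag) {
    case 1:
        out->v.u32 = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
                   | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
        return 0;
    case 2:
        out->v.u16[0] = (uint16_t)((p[0] << 8) | p[1]);
        out->v.u16[1] = (uint16_t)((p[2] << 8) | p[3]);
        return 0;
    default:
        return -1;                      /* unknown variant: payload cannot be decoded */
    }
}
```

Building the values with shifts rather than copying raw bytes makes the decoder correct on both big-endian and little-endian hosts.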

In Carmot, tagged unions also play a key role in leveraging the ontology definition to specify and control UI and database functionality. The Carmot type ‘Control’ below illustrates within it two common uses of tagged unions:

[Figure: the Carmot declaration of the type ‘Control’, explicitly exposing the tagged union in a data-driven OOP alternative]

Here we see the definition of two distinct tagged unions within the Control type, both of which use the same ‘controlType’ field as the discriminant. In the case of the type Control, this allows us to create many different types of control, each with custom storage for control-specific fields, all within a single type ‘Control’. In other words, what we have done is much like the OOP approach of declaring a class ‘Control’ and making each different control type a sub-class of ‘Control’ with additional methods and data values; in this regard, we have a viable OOP alternative. The basic differences between the Carmot approach and OOP’s class system are as follows:

Carmot exposes the tagged union directly, allowing it to be leveraged and manipulated explicitly in all code that accesses the type. OOP hides the tagged union and requires client code to become a sub-class, thereby forcing program structure onto client code through the class metaphor rather than allowing it to use any programming structure desired. In OOP, big brother defines how your program is organized. In Carmot, you do.
Carmot limits inheritance to the data structure, not to the code classes. This is critical in a data-driven system, as any data-specific knowledge that is frozen into the code (as in OOP) cannot be sent over data-flow links, and of necessity makes the program more vulnerable to change and less flexible. At the same time, Carmot provides the entire substrate necessary to describe, discover, and manipulate that data, so that circumventing it through pointers (or even taking the address of data) is no longer possible. In this way we get all the benefits of OOP (and more) without letting it drive the programming metaphor. Overt support for tagged unions is an essential step toward keeping data-specific knowledge from leaking into the code. OOP takes the opposite tack and forces such leakage into code through the class methods.
The Carmot substrate internally tracks the ‘emptiness’ state of all fields within union variants. Unlike in OOP, the discriminant value can be changed dynamically while the program runs; it is not frozen by the object’s constructor. Any attempt to access invalid (or empty) members is flagged as an error.
Carmot’s overt declaration of the tagged union concept within the ontology allows it to be discovered at run time and used to auto-generate GUI and database behaviors entirely within the substrate, something that would not be possible if sub-class code were allowed to be involved.
Since Carmot data is discovered at run time and its underlying byte ordering can always be determined, binary data structures can be sent across networks with heterogeneous byte ordering and automatically (and transparently) corrected to the local byte ordering at the receiving end. The same is true for binary data written to and read from files. Without the overt tagged union construct this would not be possible for any data involving unions. For the same reason, untagged union data cannot be displayed in auto-generated GUIs, dragging the programmer back down into the Bermuda Triangle effect.
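The emptiness tracking and run-time discriminant changes described in the list above happen inside the Mitopia® substrate; the following is only a plain C sketch of one way such behavior could work, with hypothetical names (`Rec`, `KIND_SLIDER`, `rec_set_kind`). A bit mask records which fields of the active variant have been filled, changing the discriminant resets every field to empty, and reads fail if the variant is wrong or the field is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { KIND_BUTTON = 1, KIND_SLIDER = 2 };     /* hypothetical variants */
enum { SLIDER_MIN = 0, SLIDER_MAX = 1 };       /* field indices in the slider variant */

typedef struct {
    int      kind;      /* discriminant; may be changed at run time */
    uint32_t filled;    /* bit i set => field i of the active variant is non-empty */
    union {
        struct { int32_t style; } button;
        struct { int32_t min, max; } slider;
    } u;
} Rec;

/* Selecting a new variant: every field in it starts out empty. */
static void rec_set_kind(Rec *r, int kind)
{
    r->kind = kind;
    r->filled = 0;
}

static bool rec_set_slider(Rec *r, int field, int32_t v)
{
    if (r->kind != KIND_SLIDER || (field != SLIDER_MIN && field != SLIDER_MAX))
        return false;                          /* wrong variant or bad field */
    if (field == SLIDER_MIN) r->u.slider.min = v;
    else                     r->u.slider.max = v;
    r->filled |= 1u << field;                  /* field is no longer empty */
    return true;
}

static bool rec_get_slider(const Rec *r, int field, int32_t *out)
{
    if (r->kind != KIND_SLIDER || (field != SLIDER_MIN && field != SLIDER_MAX))
        return false;                          /* wrong variant or bad field */
    if (!(r->filled & (1u << field)))
        return false;                          /* field is still empty */
    *out = (field == SLIDER_MIN) ? r->u.slider.min : r->u.slider.max;
    return true;
}
```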

As can be seen from the declaration of ‘Control’ above, in Carmot a tagged union is declared by repurposing C’s ‘switch’ keyword to name (as a string) the integer or enumerated discriminator/switch field in the surrounding structure whose value determines the current union variant. The first member of a tagged union is active when the switch field has the value 1, the second member when it has the value 2, and so on. If the specified switch field is undefined (i.e., ‘empty’) or has a value outside the range given by the number of tagged union variants (including, of course, the value 0), then the active variant is undefined. The situation is then similar to that of an untagged union, which means that the union itself cannot be displayed in a GUI.
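The selection rule just described maps naturally onto a tiny helper. This C sketch uses a hypothetical function name, `active_variant`, and is not a Carmot API; it simply restates the rule: a switch value of k (from 1 up to the number of variants) selects member k, while an empty switch field, 0, or any out-of-range value leaves the variant undefined.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the 0-based index of the active union member, or -1 if undefined.
   switchValue k (1..nVariants) selects member k; an empty switch field,
   the value 0, or any out-of-range value leaves the union undefined. */
static int active_variant(bool switchIsEmpty, long switchValue, int nVariants)
{
    if (switchIsEmpty || switchValue < 1 || switchValue > nVariants)
        return -1;
    return (int)(switchValue - 1);
}
```

Reserving 0 for "undefined" means a freshly zeroed record, such as the newly added control with no ‘controlType’ set, automatically has no active variant.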

Any attempt through any Carmot-related API to access an element within a tagged union that is not within the currently active member will fail and report an error. This means that Carmot provides completely safe access to data within tagged unions, with no possibility of accidental program access to an invalid member. Within an untagged union, the situation in Carmot is identical to that in any other language; that is, any access to the content, regardless of the member involved, is allowed, even if it may be erroneous. However, data within such untagged unions is never displayed in the GUI and cannot in general be reliably transmitted across a heterogeneous network. For this reason, untagged unions are always avoided within ontological types, although they do exist in the underlying OS C include files (which Carmot must also compile and can reference) and thus must be supported by the Carmot language.

If the field string in the ‘switch’ clause starts with a ‘.’ character, it is taken to be a field path all the way from the outermost containing type in which the tagged union field occurs (the leading ‘.’ is stripped before attempting to access the switch field). If the field path string does not start with a period, it is taken to be a field name/path that starts at the same level within the type definition as the tagged union field itself.
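The two path forms can be illustrated with a small resolver. This is a plain C sketch under stated assumptions: paths are represented as dot-separated strings, and `unionParent` (a hypothetical parameter) is the path, from the outermost type, of the structure that directly contains the tagged union field. The function name `resolve_switch_path` is likewise hypothetical and not part of any Mitopia® API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Resolve a 'switch' field string to a path from the outermost type.
   unionParent: path of the structure directly containing the tagged union
   ("" means the union sits at the outermost level).
   Returns 0 on success, -1 if the result does not fit in out. */
static int resolve_switch_path(const char *switchField, const char *unionParent,
                               char *out, size_t outLen)
{
    int n;
    if (switchField[0] == '.')                  /* absolute: strip the leading '.' */
        n = snprintf(out, outLen, "%s", switchField + 1);
    else if (unionParent[0] == '\0')            /* relative, union at the top level */
        n = snprintf(out, outLen, "%s", switchField);
    else                                        /* relative to the union's own level */
        n = snprintf(out, outLen, "%s.%s", unionParent, switchField);
    return (n < 0 || (size_t)n >= outLen) ? -1 : 0;
}
```

The leading-period form is what lets a union nested deep inside a structure be keyed off a discriminator declared at the top of the outermost type.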

Let us now consider how a tagged union might be represented in automatically generated GUIs driven off the Carmot ontology definition of the type involved. As it turns out, tagged unions have the power to automate many of the GUI maintenance tasks that tend to cause the Bermuda Triangle effect by implicitly tying GUI code to compiled data declarations (thus sowing the seeds of future maintenance nightmares).

The screen shot above shows a portion of the auto-generated GUI (in this case being used to examine the content of a GUI window itself, which is also described by the ontology). In the screen shot above, the user has just added a new control called ‘just testing’ and has not yet set the value of the ‘controlType’ field. As a result, both the ‘modes’ and the ‘_more’ union variants are currently undefined, so the auto-generated GUI does not show any UI controls associated with either of them. This is exactly as one would wish. Choosing ‘Button’ from the ‘controlType’ popup menu results in the following display:

In other words, now that the substrate knows the active tagged union variant, it is able to display all the additional (as yet uninitialized) GUI controls associated with that variant. This kind of adaptive GUI display, changing with the current setting of some field or value, accounts for a major fraction of the work that GUI programmers must do. In this case, however, the substrate automates all of it through run-time data discovery using the ontology.
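The adaptive display described here can be reduced to a table-driven filter. In this plain C sketch the `FieldDesc` table stands in for the metadata the ontology would supply; the struct, the function `visible_fields`, and the field names in the usage are hypothetical. Ordinary fields are always shown, fields belonging to union member k are shown only when the discriminator equals k, and when the discriminator is undefined (0) no variant fields appear at all, matching the ‘just testing’ example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field descriptor, standing in for what the ontology supplies. */
typedef struct {
    const char *name;
    int variant;        /* 0 = ordinary field; k >= 1 = belongs to union member k */
} FieldDesc;

/* Collect the names of fields to show for the current discriminator value.
   Fields of inactive variants are skipped; with an undefined discriminator (0)
   only ordinary fields are listed. Returns the number of names written. */
static size_t visible_fields(const FieldDesc *fields, size_t n, int discriminator,
                             const char **out, size_t outCap)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < outCap; i++) {
        if (fields[i].variant == 0 || fields[i].variant == discriminator)
            out[count++] = fields[i].name;
    }
    return count;
}
```

No per-variant GUI code is involved: adding a new variant means adding rows to the table, not writing new display logic.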

Above we see the GUI displayed when the ‘More’ tab (caused by the ‘_more’ union) is selected. In the case of a button, the content is very simple and there are no sub-structures within the variant, so the two simple fields within it are displayed directly. Note that the GUI does not display the name of the union variant (in this case ‘button’), since this is implicit in the discriminator setting.

In the screen shot above, the user has chosen to inspect an existing ‘static text’ control (i.e., controlType = staticText) and has then clicked on the ‘modes’ popup menu to inspect its settings. Without going into all the details of how various field types are displayed in the auto-generated GUI, we can see that because the ‘modes’ field is declared as an enumerated type, it is displayed as a popup. Further, because “$MultiSelect” is associated with the type ‘ControlModes’, the GUI allows multiple values to be selected simultaneously, as shown in the screen shot. This is appropriate since ‘modes’ is actually a set of option bits that can be applied to the control. As can be seen, the content of the modes popup is derived from the Carmot definition of ‘ControlModes’, but the additional ‘show border’ option associated only with the ‘staticText’ tagged union variant is also displayed. This illustrates a very useful application of the tagged union concept: the ‘modes’ union within the type ‘Control’ defines each variant using a different enumerated type (in this case ‘StaticTextModes’) that is descended from the ancestral type ‘ControlModes’, so the popup displays all enumerated values of ‘ControlModes’ as well as any additional values defined in ‘StaticTextModes’ (or in any intervening ancestral types). The result is that the GUI automatically reconfigures itself (with no application code involved) to show just those options that are appropriate for the current setting of the ‘controlType’ field.
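The way the popup gathers options from an enumerated type and all of its ancestors can be sketched in plain C. The `EnumDesc` descriptor and `collect_options` function are hypothetical stand-ins for the ontology's type information, and the base option labels used in the usage (‘disabled’, ‘hidden’) are invented; only ‘ControlModes’, ‘StaticTextModes’, and ‘show border’ come from the text. The walk visits the root ancestor first, then each descendant in turn, so intervening ancestral types are included automatically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for an enumerated type that may extend a parent type. */
typedef struct EnumDesc {
    const char *name;
    const struct EnumDesc *parent;   /* NULL for a root type */
    const char *const *values;       /* option labels added at this level */
    size_t nValues;
} EnumDesc;

/* Gather every option a popup should offer: ancestors' values first, then the
   values added by each descendant down to t. Returns the number collected. */
static size_t collect_options(const EnumDesc *t, const char **out, size_t cap)
{
    size_t n = 0;
    if (t->parent != NULL)
        n = collect_options(t->parent, out, cap);
    for (size_t i = 0; i < t->nValues && n < cap; i++)
        out[n++] = t->values[i];
    return n;
}
```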

Normal programming metaphors that do not permit auto-generation of GUIs by run-time type discovery force the application code to handle such customization of sub-panes within user interfaces. In the OOP case, each sub-class would contain distinct code to handle its own unique GUI aspects. This kind of intimate link between data structures and the code that drives their display in the GUI is pervasive in conventional code that displays data. This unavoidable behavior is responsible for tying both the GUI and the database together through code, and is, I would argue, one of the main reasons for seeking OOP alternatives. The result is the Bermuda Triangle effect we have discussed previously. As can be seen, no such tie-in occurs within a system such as Mitopia® that is driven by a contiguous ontology like Carmot.

To clarify this, the screen shot above shows the same situation in the case where the user selects a list control. As can be seen, the options available when one clicks on the ‘modes’ popup are completely different and are driven by the type inheritance of the enumerated type ‘ListViewModes’. Note that the existence of the ‘modes’ union is completely invisible in the GUI; all that is displayed is the inner content of the selected union member, as determined by the current setting of the ‘controlType’ discriminator field. Again, this is because none of the members contain any sub-structures, so the ‘modes’ union can be hidden completely and replaced by a single popup menu titled ‘modes’. Once again, this is a valuable GUI trick handled directly by the substrate and driven by the ontology.

In the screen shot above, the user has clicked on the ‘More’ tab of the static text control. Because this is a tagged union, the GUI automatically displays a custom UI driven by the content of the ‘staticText’ member of the tagged union. Because there is sub-structure within it (in this case a color specification structure), the auto-generated GUI shows two sub-tabs: one for the direct fields within ‘_more’ and one for the color sub-structure. In general, one sub-tab is added for each sub-structure, and the process recurses indefinitely for deeply nested structures and unions within unions.

Finally, in the screen shot above we see the same situation, this time for the much more complex content of the ‘_more’ union variant associated with a list control. Note that the name ‘_more’ starts with an underscore because, by Carmot convention, fields beginning with ‘_’ are hidden in the GUI when possible. In the case of a tagged union, this means that if certain control types do not have any visible content within the ‘_more’ variant, then the ‘More’ tab itself is not even shown. Once again, this helps to de-clutter the UI as much as possible. Note also that the human-readable control labels shown in the GUI are derived automatically from the actual field names by a simple set of operations on the CamelCase field names themselves.
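The text does not spell out the exact label-derivation rules, so the following C sketch shows just one plausible rule, clearly an assumption rather than Mitopia's actual algorithm: insert a space before each interior capital letter and lower-case it, so a hypothetical field named `showBorder` would be labeled ‘show border’. The `field_hidden` check encodes the underscore convention described above. Both function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Convention from the text: names beginning with '_' are hidden when possible. */
static bool field_hidden(const char *name) { return name[0] == '_'; }

/* Assumed CamelCase-to-label rule: put a space before each interior capital
   and lower-case it ("showBorder" -> "show border"). Output is truncated to
   fit outLen and is always NUL-terminated when outLen > 0. */
static void camel_to_label(const char *name, char *out, size_t outLen)
{
    size_t j = 0;
    if (outLen == 0)
        return;
    for (size_t i = 0; name[i] != '\0' && j + 2 < outLen; i++) {
        unsigned char c = (unsigned char)name[i];
        if (isupper(c) && i > 0)
            out[j++] = ' ';
        out[j++] = (char)tolower(c);
    }
    out[j] = '\0';
}
```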

In the examples above we have examined the use of tagged unions to drive complex GUI behaviors directly from the ontology definition of the types involved. In fact, there is a great deal more that can be accomplished in the way of automatic GUI behaviors using tagged unions; however, for the sake of brevity we will leave it there.

Clearly, since all access to data goes through the Carmot ontology and is implemented by the Mitopia® substrate, it is a trivial matter to ensure that any attempt to access invalid union variants results in an error report. This eliminates the major criticism of tagged unions as implemented in an ad-hoc manner in non-supportive languages such as C or C++. The GUI screen shots above also illustrate that the substrate (and of course any client code) has full access to the information necessary to determine union state explicitly, in particular the separate tracking of ‘emptiness’ for fields and structures within union members.

This power of course extends to the behavior of Mitopia’s database abstraction when querying fields within tagged unions. When queries are issued on tagged union member fields, only those persistent records whose discriminator field is set appropriately are searched; this is enforced during query construction by automatically inserting an additional query condition on the discriminator field value. The result is that union fields can be safely queried, a feature that is not available in other, more limited data models.
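The automatic query rewrite can be pictured as simple string assembly. In this C sketch the query syntax, the function `build_union_query`, and the field path in the usage are all invented for illustration and do not reflect Mitopia's actual query language: when the queried field lives in union member k, the builder ANDs in a condition requiring the discriminator to equal k, so records whose active variant differs are never searched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical query builder with an invented syntax: a condition on a field in
   union member `variant` is guarded by a test on the discriminator.
   Returns 0 on success, -1 on a bad variant or if the query does not fit. */
static int build_union_query(const char *fieldPath, const char *condition,
                             const char *discriminator, int variant,
                             char *out, size_t outLen)
{
    int n;
    if (variant < 1)
        return -1;                       /* members are numbered from 1 */
    n = snprintf(out, outLen, "(%s == %d) AND (%s %s)",
                 discriminator, variant, fieldPath, condition);
    return (n < 0 || (size_t)n >= outLen) ? -1 : 0;
}
```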

The discussion above hopefully illustrates the power of the tagged union construct to provide advanced features automatically through a contiguous ontology. The benefit of this approach over the OOP encapsulated view of tagged unions should, I hope, be clear. In future posts we will discuss in more detail other aspects of GUI auto-generation that we have glossed over in this post. The ultimate goal in all of this is to make the system rapidly adaptive to change by driving all behaviors off the ontology definition, thus eliminating the drag caused by the type-dependent code typical of other past and current system approaches.