Ontology software approaches are 'add ons', distinct from actual data

Mitopia’s ontology description language (ODL) is called ‘Carmot’ (for more details on Carmot syntax see here). For a long time alchemists believed that a key component of the legendary “philosopher’s stone” was the mythical element carmot. The philosopher’s stone, it was believed, had the power to transform base metals to gold. In an information sense, Mitopia’s Carmot ontology software and ODL is the key component that allows Mitopia® to accomplish the unique things it does in transforming information into knowledge (see here). Carmot declarations are normally distinct from the Carmot code that uses them, and for this reason the Carmot language actually has two variants: Carmot-D (for declarations; this is the ODL variant discussed in this post) and Carmot-E (for execution; the language for Carmot-based interpreters within Mitopia®). Carmot-E will be discussed in later posts. The ‘D’ and ‘E’ suffixes are usually dropped in use and both forms are referred to simply as Carmot. See later posts for details of the Carmot syntax.

This post is part 2 of a sequence of 3. To get to the start click here.

Preceding posts have described the motivations behind both the semantic approach to ontologies (see here) and that used by Mitopia® and its Carmot ODL (see here and here). In effect we have to go back to our fundamental definition of what ontology software is, and recognize that what makes an ontology different from a taxonomy is simply the focus on supporting and handling relationships and connections between data, in addition to field content. The representation, discovery, and manipulation of relationships must be entirely based on the ontology, not on knowledge latent in the application code.

Ontologies in a software engineering sense need not have anything to do with linguistics or sentence understanding; they are simply a higher level of organization for data on the knowledge pyramid (see here). A taxonomic or information-level system uses a language and programming model(s) that focus on field content, which essentially means that any relationship knowledge must be embedded in the application code, and is hence rigid and hidden from examination. Both approaches, semantic ontologies (e.g., OWL) and Carmot, meet this definition of an ontological system, and yet they are fundamentally different. We will therefore have to invent some new terminology for the two types of ontology description language:

A contiguous-model ODL (of which Carmot is the only example) is one for which the ontological aspects of the language are integrated directly with the normal programming data model, that is, they can occur directly within the programming language used to access data held in that ontology, and furthermore where accessing code uses the same programming data model for all data be it in memory, on disk, or in a database. The ontological aspects are thus ‘contiguous’ with all other aspects of accessing data including binary compatibility with type declarations from the underlying platform headers.

A disjoint-model ODL (all other ODLs, of which OWL is just one) is one in which the ontological aspects are functionally and syntactically separated from the details of program access to and manipulation of data, be it in memory, on disk, or in a database. All semantic ontologies are disjoint and make no attempt to unify programming models; the ontological aspects are ‘disjoint’ from normal program data access.
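To make the ‘contiguous’ idea a little more concrete, here is a minimal sketch in Python (not Carmot, and not any Mitopia® API). It assumes a hypothetical type description, `PERSON_TYPE`, that serves two purposes at once: code can walk it at run-time to discover the fields, and the same description fixes the binary layout used to store and reload a record. The field names, format codes, and helper functions are all invented for illustration.

```python
import struct

# Hypothetical type description: (field name, struct format code).
# Illustrative only; these are not Carmot declarations.
PERSON_TYPE = [("id", "I"), ("age", "H"), ("name", "16s")]


def layout(type_desc):
    """Build a packed, little-endian struct format string from the description."""
    return "<" + "".join(fmt for _, fmt in type_desc)


def pack_record(type_desc, values):
    """Serialize a dict of field values to bytes, driven only by the description."""
    ordered = [values[name] for name, _ in type_desc]
    return struct.pack(layout(type_desc), *ordered)


def unpack_record(type_desc, blob):
    """Rebuild a dict of field values from bytes using the same description."""
    raw = struct.unpack(layout(type_desc), blob)
    return {name: value for (name, _), value in zip(type_desc, raw)}


if __name__ == "__main__":
    blob = pack_record(PERSON_TYPE, {"id": 7, "age": 36, "name": b"Ada"})
    print(len(blob), unpack_record(PERSON_TYPE, blob))
```

In a disjoint model, by contrast, the description of ‘Person’ would live in a separate textual document, and the code that packs and unpacks the bytes would have to be written and kept in step with it by hand.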

There are some fundamental differences in features and benefits between contiguous ontologies (i.e., Carmot), and disjoint ontologies (e.g., OWL), as summarized by the Table below:

Feature	Carmot/Mitopia®	Semantic (OWL)
Run-time discovery of types, fields and links.	YES	YES
Unifies in-memory and in DB data access and programming models	YES. All operations are unified through Carmot APIs (TypeMgr, TypeCollections, etc.).	NO. Does not address either in memory (binary) form, or how the data is stored and accessed in a DB. Simply a text and semantics interchange format.
Unifies in-memory and in GUI programming models	YES. All operations are unified through Carmot APIs (TypeMgr, TypeCollections, etc.).	NO. Does not formally address the mapping to GUI layout although third party tools exist for this purpose.
Supports Multilingual Text Understanding.	PARTIAL. Mitopia® uses the names and aliases of persistent data to automatically identify known items in multilingual text, but it does not directly contain linguistic code to parse sentence structure using the ODL.	PARTIAL. All semantic ontologies are directed at the problem of text understanding, although since they are linguistic, they only work in one language (English).
Binary data and structures supported.	YES. Functional superset of C.	NO. Does not address storage. All operations are textual. Ontology is distinct from programming model.
Direct language support for logic and reasoning.	PARTIAL. Using Carmot, data-flow based widgets can navigate through links and examine types to perform reasoning, but this ability is not formalized into the syntax of the ODL.	YES. Although an external technology is required to map to first-order logic and implement the actual reasoning. Not many convincing examples of actual complex reasoning using OWL.
Designed for performance and scalability (distribution).	YES	NO. Text-based and thus slow to handle and share.
Syntax compatible with standard programming language.	YES. Based on extensions to C.	NO. The ontology and tools to use it represent yet another incompatible programming model/language with a massively different syntax.
Associate scripts and behaviors with ontological types.	YES	YES
Support for Database auto-generation and query.	YES	NO
Support for GUI generation and handling.	YES	PARTIAL. Using 3rd party tools.
Built-in & unified programming model for manipulating collections of related data.	YES. Based on the TypeCollections area of the Carmot API.	NO. Implementation detail how references are resolved and unified.
Web Standards Based, Open Source.	NO	YES
Automatic data migration when ontology changes.	YES. Mitopia’s Types server handles this through Carmot automatically when old data is accessed, regardless of source (disk, communications or DB).	NO. Still a subject of research.
Integrated support for federation and multimedia data and containers.	YES. The MitoPlex™ framework provides this with MitoQuest™ handling most non-multimedia types and fields.	NO. Only textual data is covered by the ODL.

As mentioned above, there are in fact two distinct parts to the Carmot language. The Carmot-D variant discussed in this post is a language of data and type declaration and subsequent dynamic run-time discovery. The Carmot-E variant is the run-time executable language and environment, which is utilized in many aspects of Mitopia® (e.g., MitoMine™) to access and compute using data described by the ontology. Thus we see that unlike all conventional languages, which combine the declaration and execution aspects of the language, Carmot takes a unique ‘split’ approach to language definition, and it is perhaps important to point out why this is so.

The first and foremost reason for the split is that, as stated earlier, Carmot is designed to allow code to discover all the required data and data types at run-time as opposed to compile-time, in support of a data-flow and data-driven system. The fact that conventional languages allow, and indeed encourage, intermingling of type and data declarations with the executable code that manipulates them is to a very real extent encouraging programmers to create software that contains fragile embedded assumptions that cross the data-code membrane and thus contribute to the Bermuda Triangle effect. For this reason, the Carmot-D language was implemented as a pure declaration language and all the Carmot APIs and abstractions were built upon it independent of the need for a run-time executable language.
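As a rough illustration of what run-time discovery buys, consider the following Python sketch. It is not Carmot and does not use the Carmot APIs; the `REGISTRY` structure, the type names, and both helper functions are assumptions made up for this example. The point is only that the traversal code contains no knowledge of ‘Person’ or ‘Organization’: it learns the fields and links by asking the registry.

```python
# Hypothetical run-time type registry. In a real system this would be loaded
# from declarations at start-up rather than written into the code.
REGISTRY = {
    "Person": {"name": ("value", "str"), "employer": ("link", "Organization")},
    "Organization": {"name": ("value", "str"), "headquarters": ("link", "Place")},
    "Place": {"name": ("value", "str")},
}


def links_of(registry, type_name):
    """Return {field name: target type} for every link field of a type."""
    fields = registry[type_name]
    return {f: target for f, (kind, target) in fields.items() if kind == "link"}


def reachable_types(registry, start):
    """Follow link fields breadth-first and list every type reached from `start`."""
    seen, queue = [start], [start]
    while queue:
        current = queue.pop(0)
        for target in links_of(registry, current).values():
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    print(reachable_types(REGISTRY, "Person"))
```

Adding a new type or link to the registry changes what the traversal finds without touching the traversal code, which is the kind of decoupling between declarations and executable code that the split is meant to encourage.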

The second requirement, which drove the development of the Carmot-E variant, was the need to match ‘impedance’ (see post here) between the Carmot-D ontology of the system and the taxonomies/formats of the large numbers of sources that must be combined into an ontological representation in order to perform useful analysis. This interface between source formats and the ontology must be accomplished by allowing source data to drive the conversion state without explicit hard-coded knowledge in the conversion layer of the input or indeed the output formats. This realization in turn led to the concept of entangled parsers (i.e., nested parsers where each can influence the other) and the patented mechanism to accomplish this feat. MitoMine™ is the premier example of this approach. The key aspect of such entangled parsers is that the order of execution of statements in the ‘inner’ parser is determined not by the order they occur in the script, but rather by the source data itself, as reflected in the evolution of the ‘outer’ parser state. We refer to such unique and unusual languages as ‘heteromorphic’ languages (we will talk more about this concept in future posts). There is a massive cognitive difference between the programming model of a conventional language and that of a heteromorphic language. Learning to embrace this approach can be difficult for those trained in conventional programming languages.
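The idea that source data, rather than script order, decides which statements run can be hinted at with a toy Python sketch. This is emphatically not the entangled-parser mechanism used by MitoMine™, and it shows influence in one direction only (outer to inner). The `run` function, the `SNIPPETS` table, and the ‘key: value’ input format are all assumptions invented for illustration.

```python
def run(source, snippets):
    """Toy 'outer parser': split each line into key and value, then fire the
    snippet registered for that key. Returns the record built and the firing order."""
    record, trace = {}, []
    for line in source.splitlines():
        if ":" not in line:
            continue  # ignore lines this toy grammar does not recognize
        key, value = (part.strip() for part in line.split(":", 1))
        snippet = snippets.get(key)
        if snippet is not None:
            snippet(record, value)
            trace.append(key)
    return record, trace


# Small isolated snippets. Their order in this table is irrelevant;
# the input decides which ones fire and when.
SNIPPETS = {
    "Name": lambda rec, v: rec.update(name=v),
    "Born": lambda rec, v: rec.update(birth_year=int(v)),
    "City": lambda rec, v: rec.update(city=v),
}


if __name__ == "__main__":
    print(run("City: London\nName: Ada\nBorn: 1815\n", SNIPPETS))
    print(run("Born: 1815\nName: Ada\n", SNIPPETS))
```

Feeding the same snippet table two differently ordered sources produces two different execution traces, even though the ‘script’ never changed. That inversion of control is a small taste of why the programming model feels unfamiliar to those used to conventional languages.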

Since in all cases within Mitopia®, either the source, or more often the target, of any conversion is data described by the Carmot-D ontology, it was clear that the Carmot-E language should be defined as a reusable component, just like Carmot-D. The fact that Carmot-E program state is driven by source data format and content, and not the programmer’s ‘algorithm’, and that this occurs as a result of many small isolated snippets of Carmot-E code rippling to the top of a parser stack, means that the Carmot-E language has little use for type declarations of its own, discovering all needed types dynamically from the Carmot-D type information.

In conclusion then, it is clear that the splitting of the underlying Carmot language into two distinct aspects, one handling type declaration, and the other type access and manipulation, was a necessary step in the creation of a truly data-driven environment such as Mitopia®, and in encouraging the development of truly adaptive code.

This post is part 2 in a sequence of 3. To get to the next post click here. The first post is here.