Carmot ODL (2) - '#' - Mitopia Technologies

This post is number 2 in a sequence of 7.

The ‘#’ symbol – Persistent ref.

The persistent reference symbol ‘#’ is used to denote a one-to-one reference from a field within an ontological type to another record of the same or different type that ultimately is stored within Mitopia® servers and can be referenced by a globally unique ID. The actual implementation of a persistent reference record is hidden by the abstraction layer from Carmot code, but is essentially as follows:

The ‘stringH’ field of ET_PersistentRef is an internal implementation detail used exclusively by MitoMine™ and is beyond the scope of the current discussion. Looking at the definition of the type ET_UniqueID, note that it contains two fields: a 32-bit system identifier held as a 4-character constant ‘system’, and an unsigned 64-bit ‘id’ field which holds the unique ID of the item within the current Mitopia® system. It is critical that the size and alignment of these two fields exactly match the two fields that begin the declaration of Datum, the root type for all persistent data (see the Datum declaration in the earlier post on the ‘@’ reference). This allows Mitopia® to obtain a unique ID for any data by treating the start of the data as if it were an ET_UniqueID, a feature which becomes critical when data is represented by proxies, ET_Hit records (see below), or other abbreviated forms down to and including just the unique ID itself.
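The prefix rule can be illustrated with a minimal sketch. It assumes a packed, big-endian 12-byte layout (the 4-character ‘system’ followed by the unsigned 64-bit ‘id’); the byte order, the packing and all helper names are illustrative assumptions, not Mitopia® definitions.

```python
import struct

# Assumed layout: 4-character system code, then unsigned 64-bit id,
# packed big-endian with no padding (12 bytes = 96 bits).
UNIQUE_ID = struct.Struct(">4sQ")


def pack_unique_id(system: bytes, item_id: int) -> bytes:
    """Serialize a (system, id) pair into the assumed 12-byte layout."""
    return UNIQUE_ID.pack(system, item_id)


def unique_id_of(record: bytes) -> tuple[bytes, int]:
    """Read the unique ID from the first 12 bytes of any record.

    This works for a full record, an abbreviated form, or a bare ID,
    because all of them are assumed to begin with the same two fields.
    """
    return UNIQUE_ID.unpack_from(record, 0)


# A hypothetical full record: the ID prefix followed by other field data.
full_record = pack_unique_id(b"\0\0\0\0", 42) + b"other field data"
# The same ID on its own, with nothing following it.
bare_id = pack_unique_id(b"\0\0\0\0", 42)
```

Because both byte strings start with the same prefix, `unique_id_of` returns the same pair for the full record and for the bare ID.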

The purpose of this two-part 96-bit ID is twofold. Firstly, the fact that a complete ID is 96 bits long ensures that there is no possibility of running out of unique identifiers regardless of how large a system or system of systems might become. Secondly, the ‘system’ field allows each unique Mitopia® installation and all the data within it to be distinguished from any other system, thereby allowing data to be exchanged between systems as they are later integrated into meta-systems or systems of systems.

Within any given system, the content of the ‘system’ field normally defaults to zero when new identifiers are created. Zero is interpreted throughout Mitopia® as meaning “the current system”, whatever the local system identifier might actually be. This is why the ‘hostID’ field of Datum will almost always be displayed as zero when you examine any persistent record within a Mitopia® user interface. Whenever data is passed out of one system and into another, all zero ‘system’ fields within the data are automatically set to the identifier of the originating (local) system, so that when the data is dereferenced in any other system, the system identifier can be used to automatically route the request back to the correct source system. System identifiers must be assigned by or registered with MitoSystems to ensure that there can never be a conflict should originally distinct systems later begin to talk to each other.

The 64-bit ‘id’ field is large enough to hold any conceivable number of data items that might be contained within a single system. Mitopia® attaches special significance to any ‘id’ value whose most significant bit is set (i.e., which is negative if considered as a signed value); all such ‘id’ values are reserved to signify temporary IDs, not permanent IDs. More on this in later posts. The ‘id’ field of the type ET_PersistentRef holds the unique ID of the referenced item as an ET_UniqueID.
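As a rough sketch of the two conventions just described (zero meaning “the current system”, and the top bit of ‘id’ marking a temporary ID), consider the following. The local identifier `DEMO` and the function names are hypothetical and chosen only for illustration.

```python
LOCAL_SYSTEM = b"DEMO"          # hypothetical identifier for this installation
CURRENT_SYSTEM = b"\0\0\0\0"    # zero is read as "the current system"
TEMP_ID_BIT = 1 << 63           # most significant bit of the 64-bit id


def is_temporary(item_id: int) -> bool:
    """An id whose most significant bit is set is reserved as temporary."""
    return bool(item_id & TEMP_ID_BIT)


def export_id(system: bytes, item_id: int) -> tuple[bytes, int]:
    """Stamp the local system identifier onto an ID leaving this system.

    IDs that already name another system are passed through unchanged.
    """
    if system == CURRENT_SYSTEM:
        system = LOCAL_SYSTEM
    return system, item_id
```

A receiving system can then use the non-zero ‘system’ value to decide where a later dereference should be routed.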

When held in persistent storage, all persistent reference fields in any data record have zero in the ‘stringH’ and ‘elementRef’ fields, and these fields are therefore still zero when the record containing the persistent reference is first fetched into an in-memory collection as part of client activity (perhaps as a result of a query). Whenever client Carmot code (or APIs) attempts to access the item referenced by the persistent reference, the abstraction layer examines the ‘elementRef’ field of the ET_PersistentRef. If it is non-zero (which will not be the case initially), the referenced value has already been fetched from storage into the local collection, so the abstraction layer simply follows the ‘elementRef’ value to the referenced structure. If, on the other hand, ‘elementRef’ is zero, the value must be transparently fetched by the abstraction layer into the local collection, and the ‘elementRef’ field is then updated to reference the newly fetched data item. To fetch items from storage, Mitopia® requires them to be requested using a standard structure known as an ET_Hit, which is also used to return hit lists in response to a query. The ET_Hit structure is as follows:

Note that this structure also begins with the fields of an ET_UniqueID, which means that it is a complete substitute for the actual data value. The ‘_relevance’ and ‘_reference’ fields are used as part of server querying and are not significant for the current discussion. It is a convention in Mitopia® auto-generated GUIs that any field whose name starts with an underscore ‘_’ character is not displayed in the UI. The underscore-prefixed fields of the type ET_Hit are thus hidden, and you will also find padding and filler fields in the base ontology that start with an underscore and likewise do not appear in the UI.

To fetch the data item referenced by an ET_PersistentRef, Mitopia® fills out the first two fields of the ET_Hit with the unique ID value from the persistent reference. The ‘_type’ field of the ET_Hit is used by the Mitopia® architecture to route the query request to the appropriate server that holds the data since, as described later, server topology and data content are tied directly to the system ontology types.
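The fetch-on-first-access behavior can be sketched as follows. This is a simplified model, not the actual abstraction layer: the class names, the list-based collection and the `fetch` callback are assumptions, and only the fields discussed in the text are represented.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PersistentRef:
    system: int            # 0 means "the current system"
    id: int
    element_ref: int = 0   # stands in for 'elementRef'; 0 means "not yet fetched"


@dataclass
class Hit:
    system: int
    id: int
    type_id: int           # stands in for '_type', used for routing


class Collection:
    """Hypothetical in-memory collection; slot 0 is reserved so 0 can mean 'unset'."""

    def __init__(self, fetch: Callable[[Hit], dict]):
        self._records: list[Optional[dict]] = [None]
        self._fetch = fetch

    def resolve(self, ref: PersistentRef, type_id: int) -> dict:
        if ref.element_ref != 0:
            # Already fetched: simply follow the in-memory reference.
            return self._records[ref.element_ref]
        # Not yet fetched: fill the request with the reference's unique ID.
        hit = Hit(ref.system, ref.id, type_id)
        self._records.append(self._fetch(hit))
        # Remember where the record now lives so later accesses skip the fetch.
        ref.element_ref = len(self._records) - 1
        return self._records[ref.element_ref]
```

In this model a second call to `resolve` with the same reference returns the cached record without invoking `fetch` again.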

The ‘aTypeID’ field of the persistent reference is constrained by the architecture to be equal to, or descended (directly or indirectly) from, the type that the persistent reference field is declared to reference in the Carmot declaration. This means that a persistent reference can be to an item of the declared field type or of any descendant of that type; it cannot be set to reference data of any other type. When resolving the reference, Mitopia® first examines the ‘aTypeID’ field of the persistent reference and, if it is non-zero, transfers that value to the ‘_type’ field of the ET_Hit. If the ‘aTypeID’ field of the persistent reference is empty, Mitopia® sets the ‘_type’ field of the ET_Hit to the referenced type ID in the original Carmot field declaration. The ET_Hit is then used to fetch the actual item value via the MitoPlex™ layer, the resultant value is added to the local in-memory collection, and the ‘elementRef’ field of the persistent reference is updated to contain the appropriate offset to the record.
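A minimal sketch of the type constraint and of the ‘_type’ selection rule follows. The tiny type table and its names (`Root`, `Declared`, `Child`, `Unrelated`) are invented for illustration, and `None` stands in for an empty ‘aTypeID’.

```python
# Hypothetical type table: type name -> parent type name (None for the root).
PARENT = {
    "Root": None,
    "Declared": "Root",
    "Child": "Declared",
    "Unrelated": "Root",
}


def is_same_or_descendant(type_name, ancestor):
    """Walk up the inheritance chain looking for the ancestor."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = PARENT[type_name]
    return False


def hit_type(a_type_id, declared_type):
    """Choose the value for the hit's '_type' field.

    Use 'aTypeID' when it is set, otherwise fall back to the type named
    in the field declaration. A set 'aTypeID' must be the declared type
    or one of its descendants.
    """
    if a_type_id is None:
        return declared_type
    if not is_same_or_descendant(a_type_id, declared_type):
        raise ValueError(f"{a_type_id} is not a {declared_type}")
    return a_type_id
```

Here `hit_type("Child", "Declared")` yields the more specific type, while `hit_type("Unrelated", "Declared")` is rejected.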

The logic applied to set up the ‘_type’ field when fetching the value from persistent storage is particularly important, and is related to the Carmot concept of type inheritance.

Looking again at some of the fields of the declaration of the root ontological type Datum, we see two persistent reference fields in the code snippet above, namely ‘#source’ and ‘#language’. In the base ontology, the type ‘Language’ describes a spoken/written language. In the case of the ‘#language’ field, this is the language of the text in the fields of the record concerned (e.g., English, Arabic etc.).

The type ‘Language’ in the base ontology has an associated key data type ‘LANG’, and is itself descended ultimately from the type ‘Datum’ via ‘Observation’. There are no additional types descended from ‘Language’ in the base ontology. This means that regardless of whether the ‘aTypeID’ field of the ‘#language’ reference is set or not, the logic described above will set ‘_type’ to the type ID of the type ‘Language’ when fetching the actual value from persistent storage. If there is a server associated with the key type ‘LANG’, the data request when resolving the ‘#language’ field will be sent to the ‘LANG’ server; otherwise Mitopia® will scan up through the inheritance chain for the type ‘Language’ until it finds a key type with an associated server and route the request to that server. In most installations, there are no intermediate servers set up between the key type ‘LANG’ and the key type ‘DTUM’ (i.e., Datum), which is always required to have an associated “catch all” server. It is generally the case, then, that ‘Language’ records will be resolved in, and fetched from, the ‘DTUM’ server.
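The upward scan for a server can be sketched as below. The inheritance chain is simplified to the three types named in the text (any intermediate types are omitted), ‘Observation’ is assumed to have no key type of its own, and the function name `route` is hypothetical.

```python
# Simplified inheritance chain using only the types named in the text.
PARENT = {"Language": "Observation", "Observation": "Datum", "Datum": None}

# Key data types; 'Observation' is assumed here to have none.
KEY_TYPE = {"Language": "LANG", "Datum": "DTUM"}


def route(type_name, servers):
    """Return the key type of the nearest server at or above type_name.

    'servers' is the set of key types that have an associated server.
    """
    current = type_name
    while current is not None:
        key = KEY_TYPE.get(current)
        if key in servers:
            return key
        current = PARENT[current]
    # 'DTUM' is required to have a catch-all server, so this should not occur.
    raise LookupError("no catch-all server configured")
```

With only the catch-all configured, `route("Language", {"DTUM"})` selects the ‘DTUM’ server; adding a ‘LANG’ server makes the same call select ‘LANG’ instead.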

In contrast to the situation for ‘Language’, the type ‘Source’ has many descendants in the base ontology as shown to the right, and some of these types themselves have associated key data types, for example:

Source – ‘SRCE’
Query – ‘QERY’
ContentQuery – ‘CQRY’
ConnectionQuery – ‘XQRY’

As will be described in detail in later posts, each of these descendant key types might potentially have an associated server that is quite distinct from the server containing ‘Source’ data. To make data fetches efficient, Mitopia® needs to know the descendant type to which the persistent reference refers in order to correctly set up the ‘_type’ field when fetching referenced data from servers. This is the essential purpose of the ‘aTypeID’ field in an ET_PersistentRef structure: it identifies the descendant type (if applicable) of the declared field type that is actually being referenced, in order to make reference resolution efficient.

The effect of this need to specify the descendant type can be clearly seen during manual data entry when, for example, clicking on the type selector popup associated with entering the ‘#source’ field results in the popup menu tree shown to the left. This popup is driven directly and automatically off the types hierarchy, which forces the user to pick the exact type for the intended reference. During MitoMine™ and other automatic data and reference creation processes, the ‘aTypeID’ field is automatically set up to match the actual type referenced as the data is created.

If the ‘aTypeID’ field of the persistent reference is not set up for some reason, Mitopia® is forced to issue the data resolution request to all servers for types descended from the type of the referencing field in order to fetch the result. This wastes time in servers that do not hold the item and can slow down the resolution process.
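The cost of an unset ‘aTypeID’ can be sketched by comparing the set of key types that would need to be contacted in each case. The key types come from the list above, but the parent/child arrangement among the four types is an assumption made for illustration, every key type is assumed to have its own server, and the function names are hypothetical.

```python
# Assumed arrangement of the four types; only the key types are from the text.
CHILDREN = {
    "Source": ["Query"],
    "Query": ["ContentQuery", "ConnectionQuery"],
}
KEY_TYPE = {
    "Source": "SRCE",
    "Query": "QERY",
    "ContentQuery": "CQRY",
    "ConnectionQuery": "XQRY",
}


def with_descendants(type_name):
    """Return the type itself followed by all of its descendants."""
    result = [type_name]
    for child in CHILDREN.get(type_name, []):
        result.extend(with_descendants(child))
    return result


def candidate_key_types(declared_type, a_type_id=None):
    """List the key types whose servers must be asked for the item.

    With 'aTypeID' set, one server suffices; without it, every server
    at or below the declared type has to be queried.
    """
    targets = [a_type_id] if a_type_id else with_descendants(declared_type)
    return sorted(KEY_TYPE[t] for t in targets if t in KEY_TYPE)
```

In this model, a ‘#source’ reference with ‘aTypeID’ set to `ContentQuery` touches one server, while the same reference with no ‘aTypeID’ touches all four.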

The final field of the ET_PersistentRef structure that we have not yet discussed is the ‘char @name’ field, which, as the name suggests, holds a relative reference to the text string giving the actual name of the item being referenced (i.e., the string that matches the content of the ‘@name’ field of the referenced persistent record). In certain cases where the domain is quite limited (an example might be the type Language discussed above), and names are thus not ambiguous, it is sufficient to reference an item simply by type and name. In most cases, however, names are not sufficiently unique, and the name field is carried with the persistent reference for the simple reason that this allows the name of the referenced item to be displayed in the referencing record (as can be seen with the text string “English” being displayed in the ‘#language’ field in the screenshot above). If the referenced item name were not carried around with the referencing field, the system would be forced to resolve all referenced items prior to displaying the referencing record in order to determine their names so that they could be displayed to the user. This would impose an unacceptable performance penalty on the user interface.

When the item is actually resolved using the unique ID and type, it is sometimes the case that the actual item name and the name used by the reference are different (perhaps the reference is to a person using only their initials, whereas the actual record contains the full name spelled out). If there is a difference between the name stored in the reference and that fetched when the item is actually resolved (perhaps by clicking on the associated hyperlink button), the displayed name in the referencing record is automatically updated to match the true name from persistent storage.
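The name refresh on resolution amounts to a simple comparison, sketched here with plain dictionaries standing in for the reference and the resolved record. The function name and the sample names are hypothetical.

```python
def refresh_display_name(ref, resolved_record):
    """Update the name cached in the reference if the stored record differs.

    Returns True when the cached name was changed.
    """
    true_name = resolved_record["name"]
    if ref["name"] != true_name:
        ref["name"] = true_name
        return True
    return False


# A reference that was created using initials only.
ref = {"id": 7, "name": "J. S."}
# The record as fetched from persistent storage.
record = {"id": 7, "name": "John Smith"}
```

Calling `refresh_display_name(ref, record)` once updates the cached name; calling it again leaves the reference untouched.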

If Mitopia® is forced to resolve a persistent reference containing only a name, it has to issue a query to the appropriate server(s) requesting all items where the name field matches the referencing name. If more than one such item exists, the system simply picks the first one. Clearly then, reference by name is highly unsafe, and indeed once a unique ID is available, the name is never passed to the server(s) as part of the resolution process and is therefore truly only present for display purposes. In fact, the name in the “char @name” field is a relative reference to a structure type of ‘kStringRecord’ and not ‘kSimplexRecord’ as implied by the ‘@’ symbol shown in the declaration of ET_PersistentRef above.
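The difference between resolution by ID and resolution by name can be sketched as follows, again with dictionaries as stand-ins and with a hypothetical `resolve` function. An ‘id’ of zero is used here to represent a reference that carries only a name.

```python
def resolve(ref, records):
    """Resolve a reference against a list of candidate records.

    When a unique ID is present, the name is ignored entirely.
    Otherwise the first record with a matching name is returned,
    which is ambiguous whenever names are not unique.
    """
    if ref.get("id"):
        return next((r for r in records if r["id"] == ref["id"]), None)
    return next((r for r in records if r["name"] == ref["name"]), None)


# Two distinct items that happen to share a name.
records = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": "John Smith"},
]
```

A name-only reference to “John Smith” lands on the first record regardless of which one was intended, whereas a reference carrying ID 2 finds the second record even if its cached name differs.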

In database terminology, the ‘#’ persistent reference field is equivalent to a one-to-one reference and this is how Mitopia’s ontology implements such references within the servers. In the next post we will examine the one-to-many reference, implemented in Carmot using the ‘##’ collection reference syntax.
