Data Integration – the old way

Formally speaking, data integration involves combining data residing in different sources and providing users with a unified view of these data.
 
This process becomes significant in a variety of situations, both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example).  Data integration appears with increasing frequency as the volume of data and the need to share existing data explode.  It has become the focus of extensive theoretical work, and numerous open problems remain unsolved.  In management circles, people frequently refer to data integration as “Enterprise Information Integration”.
 

Background

 
Data Integration is normally distinct from Data Mining, which formally is the process of scanning the ‘integrated’ data looking for new patterns and presenting/visualizing them for human interpretation.
 
The historically distinct concepts of data integration and data mining were defined in terms of systems operating at the information level only (i.e., built upon relational databases).  In the Mitopia® context, however, the data being ingested is simultaneously being transformed from the various data/information-level source formats and taxonomies into the underlying knowledge-level system ontology, moving it to a whole new tier of the knowledge pyramid.  We therefore tend to use the term data mining interchangeably with data integration, to signify the extraction of ‘useful’ nuggets of knowledge from the low-grade data/information-level substrate (or ore) in which they reside.  This refinement/extraction process is more akin to what one would term ‘mining’ than to ‘integration’, even though both processes occur simultaneously.  Transformation from diverse taxonomies into a single unifying and integrating ontology in and of itself reveals new patterns and relationships, without any additional steps being required beyond display.
 
Issues with combining heterogeneous data sources under a single query interface have existed for some time. The rapid adoption of databases after the 1960s naturally led to the need to share or to merge existing repositories. This merging can take place at several levels in the database architecture. One popular solution is based on data warehousing: the warehouse system extracts, transforms, and loads data from heterogeneous sources into a single common queryable schema so that data from different sources become compatible with each other. This approach offers a tightly coupled architecture; because the data is already physically reconciled in a single repository at query time, it usually takes little time to resolve queries. However, problems arise with the “freshness” of data, meaning that information in the warehouse is not always up to date. When an original data source gets updated, the warehouse still retains outdated data and the ETL process needs re-execution for synchronization. Difficulties also arise in constructing data warehouses when one has only a query interface to summary data sources and no access to the full data. This problem frequently emerges when integrating several commercial query services like travel or classified advertisement web applications.
As of 2009 the trend in data integration has favored loosening the coupling between data and providing a unified query interface to access real-time data over a mediated schema, which means information can be retrieved directly from the original databases. This approach may require specifying mappings between the mediated schema and the schemas of the original sources, and transforming a query into specialized queries to match the schemas of the original databases. This middleware architecture is therefore also termed “view-based query answering” because each data source is represented as a view over the (nonexistent) mediated schema. Formally, computer scientists term such an approach “Local As View” (LAV) — where “Local” refers to the local sources/databases. An alternate model of integration has the mediated schema functioning as a view over the sources. This approach, called “Global As View” (GAV) — where “Global” refers to the global (mediated) schema — has attractions owing to the simplicity of answering queries by means of the mediated schema. However, it is necessary to reconstitute the view for the mediated schema whenever a new source is integrated and/or an already integrated source modifies its schema.
 
As of 2010 some of the work in data integration research concerns the semantic integration problem. This problem addresses not the structuring of the integration architecture, but how to resolve semantic conflicts between heterogeneous data sources. For example, if two companies merge their databases, certain concepts and definitions in their respective schemas, like “earnings”, inevitably have different meanings. In one database it may mean profits in dollars (a floating-point number), while in the other it might represent the number of sales (an integer). A common strategy for the resolution of such problems involves the use of ontologies which explicitly define schema terms and thus help to resolve semantic conflicts. This approach represents conventional ontology-based data integration.
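
To make the “earnings” conflict concrete, here is a minimal C sketch (the record layouts, field names, and company values are invented for illustration and not taken from any particular integration tool) showing the two incompatible meanings and one way to map both into an unambiguous unified schema.

/* Hypothetical illustration of the 'earnings' conflict described above: the
   same label means different things in two merged schemas, so an explicit
   mapping into unambiguous target fields is needed before integration. */
#include <stdio.h>

/* Source A: 'earnings' means profit in dollars (floating point). */
typedef struct { char company[32]; double earnings; } RecordA;

/* Source B: 'earnings' means number of sales (an integer count). */
typedef struct { char company[32]; long earnings; } RecordB;

/* Unified schema: keep both meanings under unambiguous names. */
typedef struct { char company[32]; double profit_usd; long sales_count; } Unified;

static Unified fromA(const RecordA *a) {
    Unified u = {0};
    snprintf(u.company, sizeof u.company, "%s", a->company);
    u.profit_usd  = a->earnings;     /* meaning is explicit in the target */
    u.sales_count = -1;              /* unknown for this source           */
    return u;
}

static Unified fromB(const RecordB *b) {
    Unified u = {0};
    snprintf(u.company, sizeof u.company, "%s", b->company);
    u.profit_usd  = 0.0;             /* unknown for this source           */
    u.sales_count = b->earnings;
    return u;
}

int main(void) {
    RecordA a = {"Acme", 1250000.0};
    RecordB b = {"Acme", 34210};
    Unified ua = fromA(&a), ub = fromB(&b);
    printf("%s: profit = $%.2f\n", ua.company, ua.profit_usd);
    printf("%s: sales  = %ld\n",   ub.company, ub.sales_count);
    return 0;
}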

Although it represents the current trend in classical data integration, we can ignore the mediated schema approach henceforth because it is predicated on the fact that the information remains in the original sources/repositories, which in turn implies that it remains limited in its expressiveness to the ‘information level’.  The creation of connections across distinct sources, which is of course the fundamental operation of a knowledge level system, is difficult or impossible in this architecture and even if achieved, resultant performance tends to be very disappointing.  Such systems are also very difficult to adapt in the face of change which is of course pervasive for any non-trivial system integrating external ‘unconstrained’ sources.
 
Extract Transform Load (ETL)
 
 
The term ETL, which stands for extract, transform, and load, refers to a three-stage process in database usage and data warehousing. It enables integration and analysis of data stored in different databases and heterogeneous formats. After it is collected from multiple sources (extraction), the data is reformatted and cleansed for operational needs (transformation). Finally, it is loaded into a target database, data warehouse or data mart to be analyzed. Most extraction and transformation tools also enable loading of the data into the end target. Beyond data warehousing and business intelligence, ETL tools can also be used to move data from one operational system to another.
 
Extract – The purpose of the extraction process is to reach into the source systems and collect the data needed for the data warehouse.  Usually data is consolidated from different source systems that may use a different data organization or format, so the extraction must convert the data into a format suitable for transformation processing.  The complexity of the extraction process varies with the type of source data.  The extraction process also includes selection of the data, as the source usually contains redundant data or data of little interest.  For the ETL extraction to be successful, it requires an understanding of the data layout.  A good ETL tool additionally enables storage of an intermediate version of the data being extracted.  This ‘staging area’ makes it possible to reload the raw data if a later loading problem occurs, without re-extraction.  The raw data should also be backed up and archived.
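
The staging-area idea can be shown in a few lines of C: raw extracted records are appended verbatim to an intermediate file so that a later transform or load failure can be recovered from the staged copy rather than by re-extracting from the source systems.  The file name and record layout below are assumptions made purely for illustration.

/* Minimal sketch of a 'staging area': raw extracted records are written
   verbatim to an intermediate file so that later transform/load problems
   can be corrected by re-reading the staged copy instead of re-extracting
   from the source systems.  File name and record layout are illustrative. */
#include <stdio.h>

static int stage_record(FILE *staging, const char *raw_line) {
    /* Keep the record exactly as extracted; no cleansing happens here. */
    return (fprintf(staging, "%s\n", raw_line) < 0) ? -1 : 0;
}

int main(void) {
    FILE *staging = fopen("staging_area.dat", "a");
    if (!staging) return 1;
    stage_record(staging, "ACME|2009-11-30|1250000.00");
    stage_record(staging, "GLOBEX|2009-11-30|987654.25");
    fclose(staging);
    return 0;
}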
 
Sources may be structured (e.g., relational databases) or unstructured (e.g., documents).  Typical unstructured data sources include web pages, emails, documents, PDFs, scanned text, mainframe reports, spool files etc. Extracting data from these unstructured sources has grown into a considerable technical challenge.  Whereas historically data extraction has had to deal with changes in physical hardware formats, the majority of current data extraction deals with extracting data from these unstructured data sources, and from different software formats. This growing process of data extraction from the web is referred to as Web scraping.
The act of adding structure to unstructured data takes a number of forms:
  • Using text pattern matching, such as regular expressions, to identify small- or large-scale structure, e.g., records in a report and their associated data from headers and footers (a minimal sketch follows this list);
  • Using a table-based approach to identify common sections within a limited domain, e.g., in emailed resumes, identifying skills, previous work experience, qualifications, etc., using a standard set of commonly used headings (these would differ from language to language), e.g., Education might be found under Education/Qualification/Courses;
  • Using text analytics to attempt to understand the text and link it to other information.
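
As a minimal sketch of the first (pattern matching) approach, the self-contained C example below uses a POSIX extended regular expression to recover a name and a date from a report-style line; the pattern and input text are invented for the example.

/* Sketch of using a regular expression to pull record-like structure out of
   an unstructured report line.  Pattern and input are invented. */
#include <regex.h>
#include <stdio.h>

int main(void) {
    /* Match lines of the form "NAME: <text>, DATE: <dd/mm/yyyy>" */
    const char *pattern = "^NAME: ([A-Za-z ]+), DATE: ([0-9]{2}/[0-9]{2}/[0-9]{4})";
    const char *line = "NAME: John Smith, DATE: 14/03/2009  ...free text follows...";
    regex_t re;
    regmatch_t m[3];

    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return 1;
    if (regexec(&re, line, 3, m, 0) == 0) {
        printf("name = %.*s\n", (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
        printf("date = %.*s\n", (int)(m[2].rm_eo - m[2].rm_so), line + m[2].rm_so);
    }
    regfree(&re);
    return 0;
}
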
Transform – The transform stage of an ETL process involves applying a series of rules or functions to the extracted data.  It includes validation of records, rejection of those that are not acceptable, and an integration step.  The amount of manipulation needed depends on the data: good data sources require little transformation, whereas others may require one or more transformation techniques to meet the business and technical requirements of the target database or data warehouse.  The most common transformation operations are conversion, de-duplication, standardization, filtering, sorting, translation, and looking up or verifying data when the sources are inconsistent.  A good ETL tool must allow complex processes to be built up and its function library to be extended so that custom user functions can be added.
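
The following sketch illustrates the transform stage in miniature: a record passes through a small chain of rule functions (standardize a country code, validate an amount, reject on failure).  The rules and field layout are invented and are not modeled on any specific ETL tool.

/* Hedged sketch of the Transform stage: a record passes through a chain of
   rule functions.  Rules and field layout are invented for illustration. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef struct { char country[8]; char amount[16]; int rejected; } StageRec;

static void rule_standardize_country(StageRec *r) {     /* "usa " -> "USA" */
    size_t n = strlen(r->country);
    while (n && isspace((unsigned char)r->country[n - 1])) r->country[--n] = '\0';
    for (size_t i = 0; i < n; i++) r->country[i] = (char)toupper((unsigned char)r->country[i]);
}

static void rule_validate_amount(StageRec *r) {          /* reject non-numeric */
    for (const char *p = r->amount; *p; p++)
        if (!isdigit((unsigned char)*p) && *p != '.') { r->rejected = 1; return; }
}

int main(void) {
    StageRec rec = {"usa ", "1250000.00", 0};
    rule_standardize_country(&rec);
    rule_validate_amount(&rec);
    printf("%s %s %s\n", rec.country, rec.amount, rec.rejected ? "(rejected)" : "(accepted)");
    return 0;
}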
 
Load – Loading is the last stage of the ETL process; it loads the extracted and transformed data into a target repository.  ETL tools load the data in various ways.  Some physically insert each record as a new row into the table of the target warehouse using a built-in SQL INSERT statement, whereas others link the extraction, transformation, and loading processes for each record from the source.  The loading step is usually the bottleneck of the whole process.  To increase efficiency with larger volumes of data it may be necessary to bypass SQL and data recovery, or to apply an external high-performance sort, which further improves performance.
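
The sketch below shows the batching idea most loaders rely on to avoid issuing one INSERT per record: rows are accumulated into a multi-row INSERT and flushed in batches.  The SQL text is only assembled and printed, and the table name and values are hypothetical.

/* Illustrative sketch of batched loading: accumulate rows into a multi-row
   INSERT and flush every BATCH_SIZE rows.  Table and values are hypothetical;
   the statement is only built and printed here. */
#include <stdio.h>
#include <string.h>

#define BATCH_SIZE 3

int main(void) {
    const char *rows[] = {"('ACME',1250000.00)", "('GLOBEX',987654.25)",
                          "('INITECH',42000.00)", "('HOOLI',77000.00)"};
    char batch[512];
    int pending = 0;
    batch[0] = '\0';

    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        if (pending == 0)
            strcpy(batch, "INSERT INTO warehouse_earnings VALUES ");
        else
            strcat(batch, ", ");
        strcat(batch, rows[i]);
        if (++pending == BATCH_SIZE) {           /* flush a full batch */
            printf("%s;\n", batch);
            pending = 0;
        }
    }
    if (pending) printf("%s;\n", batch);         /* flush the remainder */
    return 0;
}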

Any flaws in this complicated process translate into massive delays, which is why pre-extraction data profiling is vital.  Growth of the database can be exponential, so a powerful central processing unit, vast disk space and high memory bandwidth must be ensured.
 
Semantic Integration (from Wikipedia)
 
Semantic integration is the process of interrelating information from diverse sources, for example calendars and to-do lists; email archives; physical, psychological, and social presence information; documents of all sorts; contacts (including social graphs); search results; and advertising and marketing relevance derived from them. In this regard, semantics focuses on the organization of and action upon information by acting as an intermediary between heterogeneous data sources, which may conflict not only in structure but also in context or value.
 
In Enterprise Application Integration, semantic integration can facilitate or potentially automate the communication between computer systems using metadata publishing. Metadata publishing potentially offers the ability to automatically link ontologies. One approach to (semi-)automated ontology mapping requires the definition of a semantic distance or its inverse, semantic similarity, together with appropriate matching rules. Other approaches include so-called lexical methods, as well as methodologies that rely on exploiting the structures of the ontologies. For explicitly stating similarity/equality, most ontology languages provide special properties or relationships; OWL, for example, has “sameIndividualAs” or “sameClassAs”.
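
As a toy example of the ‘lexical methods’ mentioned above, the C sketch below computes a normalized edit-distance similarity between two term labels.  Real ontology matchers combine many such signals (structural, instance-based, and so on), so this is only one small ingredient, shown for illustration.

/* One simple ingredient of lexical ontology matching: a normalized
   edit-distance similarity between two term labels.  Only a sketch. */
#include <stdio.h>
#include <string.h>

static int min3(int a, int b, int c) { int m = a < b ? a : b; return m < c ? m : c; }

/* Classic Levenshtein distance, O(len1*len2) dynamic programming.
   Assumes labels shorter than 64 characters. */
static int edit_distance(const char *s, const char *t) {
    int n = (int)strlen(s), m = (int)strlen(t);
    int d[64][64];
    for (int i = 0; i <= n; i++) d[i][0] = i;
    for (int j = 0; j <= m; j++) d[0][j] = j;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            d[i][j] = min3(d[i - 1][j] + 1, d[i][j - 1] + 1,
                           d[i - 1][j - 1] + (s[i - 1] != t[j - 1]));
    return d[n][m];
}

static double similarity(const char *s, const char *t) {
    int n = (int)strlen(s), m = (int)strlen(t);
    int longest = n > m ? n : m;
    return longest ? 1.0 - (double)edit_distance(s, t) / longest : 1.0;
}

int main(void) {
    printf("sim(\"earnings\",\"earning\") = %.2f\n", similarity("earnings", "earning"));
    printf("sim(\"earnings\",\"revenue\") = %.2f\n", similarity("earnings", "revenue"));
    return 0;
}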
 
Ontology-based data integration (a branch of semantic integration) involves the use of ontologies to effectively combine data or information from multiple heterogeneous sources. It is one of several data integration approaches and may be classified as Local-As-View (LAV). The effectiveness of ontology-based data integration is closely tied to the consistency and expressivity of the ontology used in the integration process.
 
Data from multiple sources are characterized by multiple types of heterogeneity.
Ontologies, as formal models of representation with explicitly defined concepts and named relationships linking them, are used to address the issue of semantic heterogeneity in data sources. In domains like bioinformatics and biomedicine, the rapid development, adoption and public availability of ontologies has made it possible for the data integration community to leverage them for semantic integration of data and information.
Semantic integration, and the fact that it uses semantic ontologies, should not be confused with the process of transforming data and its representation from the information level to the knowledge level (i.e., actually transferring the data into an ontology).  In the semantic approach, the ontology is merely being used to try to ‘understand’ the labels on the data so that one taxonomy can be mapped more accurately into another; the output of the process is still a good old-fashioned ‘relational’ information-level database.  This ‘semantic’ use of ontologies often causes confusion with Mitopia’s use of ontology, which is far more fundamental and actually transforms the various unstructured or taxonomic sources into a unifying ontology; a fundamentally different, and far more powerful, process in which the output is stored, related, and accessed ontologically in a way that relational databases cannot support, no matter how many layers of wrapping are applied.
 
Data warehouses are optimized for speed of data analysis. Frequently data in data warehouses are de-normalized via a dimension-based model. Also, to speed data retrieval, data warehouse data are often stored multiple times—in their most granular form and in summarized forms called aggregates. Data warehouse data are gathered from the operational systems and held in the data warehouse even after the data has been purged from the operational systems.
 
[Figure: Gartner “Magic Quadrant” for data integration tools, November 2010]
The diagram to the left shows the November 2010 Gartner ‘magic quadrant’ for the data integration tools market.  Gartner estimates a market of approximately $2.1 billion by 2014.  Leaders in the field include Oracle, SAP, IBM and Informatica.  This entire market is based on relational (or object-relational) databases, that is, information-level systems; there is as yet no sign of tools for the much harder task of transforming and integrating taxonomic and/or unstructured data directly into an ontology (as is the case for MitoMine™).  This is unsurprising since, other than Mitopia®, there exists as yet no database technology that is directly built on ontologies, despite the existence of the misleading term “ontology-based databases” (which actually means systems where conventional databases contain both the ontology definitions and the data).
 
Before we look at MitoMine™ itself, we will first examine the earlier ETL technology it replaced within the Mitopia® architecture which was known as the ‘Populator’.  The Populator was a classical approach to the ETL problem and was in most regards very similar (though simpler) to the commercial ETL tools available today.  Like them, the Populator was based on the premise that a conventional database (at the time Oracle RDBMS) is the ultimate repository for persistent data.  By looking at the Populator capabilities we can simultaneously illustrate the approach of conventional ETL tools and the problems with these approaches, as well as show how radically the current Carmot/MitoMine™ heteromorphic language approach differs from conventional ETL.
 
The Classical ETL Solution – The Populator
 
Data Integration – Solving the problem the old way
The Populator program was part of the Mitopia® Database Subsystem architecture prior to the wholesale replacement of the entire database subsystem by the MitoPlex™/MitoQuest™ layers.  In those days, the architecture of the database subsystem looked roughly as shown below.
[Figure: Architecture of the earlier Mitopia® database subsystem, showing the Populator]
The client code accessed and manipulated data through the ontology/types layer just as it does today; however, because the ontology had to be translated into the needs of existing database technologies, immediately below the types layer the database subsystem code was responsible for routing all text fields to/from the Callable Personal Librarian (CPL) text server, all multimedia fields to/from the appropriate Mitopia® multimedia server(s), and all other fields to/from the Oracle RDBMS.  In other words, this arrangement was a classic systems-integration approach to providing a ‘federated’ database so that the limitations of the RDBMS could be overcome by handling those areas through custom servers.  This type of arrangement has many problems with scaling, not the least being the ‘join’ problem caused by combining the hits from the various container types.  In this arrangement, the Populator (like today’s classical ETL approaches) sat directly on the database layer; that is, it circumvented the ontology layer in order to directly access knowledge of database tables, etc.
 
The Populator was split into two phases, both used solely by the system Database Administrator (DBA).  The first phase, called the Data Reader (DR), read from external sources and saved its output into intermediate storage (a temporary relational database).  This is much the same as the ‘Extract’ phase of today’s ETL tools.  The second phase was called the Mapper/Converter (M/C) and functioned in a manner exactly analogous to the Transform and Load phases of a classical ETL tool, resulting in the final data passing through the database layer and into the final system servers where it could be accessed and searched by client code.
 
According to the old Populator manual:
 
Data reading from source data files may take up to three hours; population by the Mapper Converter may take up to two days.  Population time is a function of the size and format of the raw data and of the complexity of the mapping/conversions and data destination structures, all of which can be very different for each population process.
 
The Data Reader
 
[Screen shot: The Data Reader opening window]
The data reader was capable of interpreting data in just three basic format classes:
  • Field/Record Delimited Data
  • Jane’s Source Data 
  • Text/Report
Right here we see a problem!  Firstly, there are only three basic supported format groups (despite over 5 years of continuous Populator development at the time), and one of them is “Jane’s Source Data” which is an incredibly narrow source format (actually at the time the Jane’s data was delivered formatted using an SGML markup language).  What this basically means is that under the hood in the DR, there was a lot of custom source-specific code to interpret the various complexities of the Jane’s data.  This is a serious shortcoming as it means that when the source changes, the Populator code must change to match.  Indeed at the time, new Jane’s data would come out each year and the Populator development team would spend much of their time each year extending the code base so that it could correctly handle the new format.  This is clearly a bad approach for any system that hopes to be rapidly adaptive in the face of change.  This design failing was the main reason for the demise of the original Populator approach.  The other two selectable source formats were more generalized and capable of handling a fairly wide range of simple data formats.
 
The screen shot above shows the opening DR window which allows selection of the source type followed by setup of the basic criteria depending on the source type.  In the case shown, the “Text/Report” type allows specification of a set of ‘delimiter’ patterns that can isolate chunks of the text within a document that are associated with the pattern.
 
Other detailed tools were provided to handle field/record delimited data, as illustrated to the left.  The result of all the selections made within the DR dialogs was to initiate an extraction run that resulted in the creation of a temporary database where the table names were driven from the extracted raw data ‘chunks’.  This temporary database formed the input source for the Mapper/Converter (M/C) phase.
 
The Mapper/Converter
 
[Screen shot: A Mapper/Converter dialog]
The M/C phase essentially allowed the user to map the contents of the various fields in the extracted raw temporary database to the final database fields while choosing one or more operations to perform on the text/data as the mapping took place.  This direct approach circumvented the ontology/types layer at the time.  As one can imagine, the process of setting up a complete M/C specification was time intensive for any non-trivial source, and in operation the M/C was extremely slow since each atomic operation performed on any ‘chunk’ involved communications with two distinct databases.  Hence the multi-day run time quoted for any (at the time) complex source such as the World Fact Book.  By contrast, the end-to-end MitoMine™ ingestion and persisting of the latest WFB now takes a matter of a few minutes, while providing infinitely more fine-grained control, being easily adaptive to change, and without violating the ontology abstraction layer.
 
Despite having an elaborate GUI with large numbers of controls and processing options (the Populator screen shots are from the classic MacOS version of Mitopia®; they pre-date OS X), the truth of the matter was that the Populator was difficult to use and learn, slow and error prone, and required constant maintenance and upgrades in order to handle the continuous stream of issues that came up as any new source was added or as old sources changed format.  Eventually the program fell into disuse and was for a while replaced by a set of custom strategies, each targeted at converting a particular source to some simple form that could be ingested directly.  These custom strategies themselves proved un-maintainable and just as fragile as things changed.  In the end, it was the huge and on-going effort required to ingest data into the Mitopia® system that led to the decision to replace the entire ingestion path with a new approach tied intimately to the ontology and independent of the existence of a relational database.

We will begin to examine that new approach in future posts.

The GUI interpreter layer

As described in the previous post, the Mitopia-to-renderer communications path is implemented almost exclusively through the function UI_Command(), which consists of a parser for Carmot-E based scripts using custom procedures and $functions registered by the renderer in order to implement the GUI connection.   Because such explicit GUI control is normally relatively ‘low bandwidth’, the benefits to be gained by passing these behaviors through an interpreted GUI language outweigh any possible timing overhead this might imply (which is minimal since GUI commands can be sent in bulk as part of a large script).  In particular, the intermediate text form can easily be sent across the network or to another thread, thus allowing remote, non-local, and potentially asynchronous interactions with the GUI.  Additionally this arrangement allows dynamic on-the-fly UI translation (i.e., software localization) to be supported.

Another advantage of this approach is that it minimizes the size and complexity of the registration layer with any registered renderers, since the overwhelming bulk of GUI behaviors can be accessed through the single registration of the necessary interpreter function invoker by the renderer.  Also, the intermediate text form allows logging of GUI activities and can greatly assist in debugging GUI problems.  Any new universal renderer plugin that wishes to provide its services to Mitopia® should attempt to interpret as much of the simplified plugin language as possible within the procedure and/or $function invoker.  The fact that the language is Carmot-E, which is also used within MitoMine™ and other areas of Mitopia® to tie things to the underlying ontological data held in collections, means that those familiar with one use of the language are able to work in the others, and indeed you can call most MitoMine™ $functions from within GUI scripts.  There are however some minor differences in the Carmot-E runtime for GUI scripts, as follows:

  • In the GUI language, there is no equivalent for the $ and # symbols of MitoMine™, though other registers are the same, and the form @$aa is allowed.
  • In the GUI language, the procedure “SetNode(collection,offset)” is used to specify the current data node tied to the GUI (rather than the <@1:4> plugin).  Once designated in this manner, fields of the designated node can be referenced in an identical manner to that provided by MitoMine™.  You may designate a new node at any time.  If you wish to obtain result values from the GUI, you can do so only by specifying a suitable typed collection node using “SetNode“, and then assigning the desired results to this collection element in the GUI script from where they can be examined later.  You may also use this technique to pass complex data sets to the script where once again they can be referenced by field path given the designated node.  You may need to define a suitable type and create a collection containing an element of that type in cases where your code is not using existing types defined in the ontology.  The functions UI_CreateGUIdataPath() and UI_DisposeGUIdataPath() are provided to simplify this process.  The function UI_CommandReply() provides a simple shortcut method for returning basic built-in type results.
  • In the GUI language, there is currently no “if” and “ifelse” statement since these operate by nesting parsers and the GUI parsers are not truly nested in this context (yet!).
  • Unlike MitoMine™ $functions, most GUI language procedures do not return a value.  Procedures can be distinguished from $functions by the lack of a leading ‘$‘ character.  Note that you can utilize many MitoMine™ $functions within GUI language scripts although any $functions that are dependent on a mining context will not work correctly.  Those $functions in MitoMine™ that cannot be called from non-MitoMine™ contexts are marked with an asterisk in the MitoMine™ documentation.  Required GUI language $functions must also be registered by the renderers, and these can be used within scripts in an identical manner to MitoMine™ $functions.  Individual renderers can also extend the required $function set to add renderer-specific $functions or override renderer-independent GUI $functions provided by the plug-in layer.
  • MitoMine™ scripts can also intermix GUI commands thus allowing them to be used in addition as GUI scripting tools.

For a complete description of the GUI interpreter and the various $functions and procedures it provides as well as details of how to register other renderers with the interpreter, see detailed Mitopia® documentation, or consult the API documentation for UI_Command().  The Listing below gives a number of examples of UI_Command() and related calls taken from actual code:

UI_Command("ProcessControlList(\"Disable\",%llu,%ld,%ld,%ld)",winp,kStopSrv,kBackup,kRestore);
UI_Command("LinkStringList(%llu,%ld,%d,\"%s\",%d)",contP->altWndRef,kFixupToolsList,contP->fToolsSL,kStdDelimiter,YES);
UI_Command("Draw(%llu,%ld)",contP->altWndRef,kCacheDisplayList);

xh = US_StrToHdl("");
UI_CreateGUIdataPath(TM_GetTypeID(NULL,"ET_ListColumnData"),&dpC,&dpO);
US_SprintfCatHdl(xh,"$a = %llu; $b = %ld; $c = %ld; $d = %ld; SetNode($c,$d); "
"dataType = %ld; AddListColumn($a,$b,\"Input\",0);"
"AddListColumn($a,$b,\"Output\",1);",
contP->altWndRef,kCacheDisplayList,dpC,dpO,
kAttribute);
for ( max = US_CountColumns(sL,1,0,"t") - 2, i = 0 ; i < max ; i++ )
{
    US_strcpy(file,scP->algorithms[i].fnName);
    US_SprintfCatHdl(xh,"AddListColumn($a,$b,\"%s\",%ld); ",file,2+i);
}
LOCK_HDL(xh);
UI_Command(*xh);
KILL_HDL(xh);
UI_DisposeGUIdataPath(&dpC);

UI_CommandReply(kInt32Type,&m,"fInt32 = $GetValue(%ld,%ld,0)",contP->altWndRef,kNoRawMapping);

In particular, note the middle example involving the use of a C code loop within Mitopia® to build up an entire interpreted program consisting of many statements which together define the entire content of a list control.  This ability to send long sequences of GUI commands as a single block is critical to performance, particularly when the GUI renderer is actually operating remotely.  Of course the programmer could have accomplished the same thing with just a single ‘link’ command specifying a list held within Mitopia’s ontology.

The Data Model Interface

Unlike the Mitopia-to-renderer (M2R) path, the renderer-to-Mitopia (R2M) communications path is not passed through an interpreter in order to create the necessary abstract interface.  This is because during the process of data discovery by the GUI renderer, it may need to make hundreds or thousands of calls very rapidly in order to explore and display all nodes of a large linked collection for example.  In general it is the case that a vastly larger number of ‘inquiries’ pass from the GUI renderer through Mitopia’s data model abstraction to the underlying data, than do ‘inquiries’ or ‘commands’ passing from Mitopia® to the renderer.  This means that the timing overhead of an interpreted connection can less easily be tolerated in the R2M path.  For this reason Mitopia® provides a very high performance abstract interface to data held in the flat memory-model through the ‘DataModels’ package and API.  Note that the flat memory model abstraction includes all Mitopia® structures, not just those described by the Carmot ontology.  For example, data held in Mitopia’s ‘string list’ abstraction can also be discovered and displayed by renderers using the DataModels layer.  In this regard, the DataModels layer is a high-level generalized abstraction for discovering data types and content on the fly.

One of the forms of data that can be discovered through the DataModels layer is data described by Mitopia’s type manager and held in collections.  One part of the discoverable data described by the type manager is that associated with the Carmot ontology for the system (i.e., I.DEF and others).  From the perspective of the GUI renderer doing the data discovery, the interface to all data accessible through the DataModels layer is identical and it cannot tell directly which lower-level abstraction is actually describing the data.  This extra level of abstraction allows the GUI renderer to be easily connected to a much wider range of data residing within Mitopia® although the bulk of all data accessed and displayed through the DataModels layer is ontological data.

The DataModels package provides a series of API functions starting with “DM_r” which can be called by the GUI renderer to implement every possible data discovery operation that a renderer might require.  This includes all operations for exploring hierarchical collections, for discovering types and type fields, and for discovering field content, layout restrictions, selections, and anything else the renderer might need to know to display and interact with the underlying data.  Many other specialized API functions are provided by the DataModels abstraction to assist in renderer operations.  Since most GUI renderers tend to be implemented in OOP languages, often C++, it is probable that to connect the renderer to Mitopia®, the renderer methods must be sub-classed within an adapter layer which in turn uses the DataModels abstraction to make the connection, thereby avoiding the need to fully understand the lower-level Mitopia® API layers.
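
Purely as an illustration of the discovery pattern, and not of the real API, the sketch below walks a toy in-memory tree and prints each node's type name, field names, and field text; this is the basic recursive walk a renderer adapter performs through the DataModels layer.  The Node structure and all values are invented; the actual “DM_r…” calls and their signatures live in the Mitopia® API documentation and are not reproduced here.

/* Hypothetical illustration only: the toy tree and accessors below model
   what a discovery layer provides to a renderer adapter; the real "DM_r..."
   calls and their signatures are not reproduced here. */
#include <stdio.h>

typedef struct Node {
    const char   *typeName;                      /* discovered type name   */
    const char   *fieldNames[4];                 /* discovered field names */
    const char   *fieldText[4];                  /* displayable field text */
    int           nFields;
    struct Node  *children[4];
    int           nChildren;
} Node;

/* Recursively discover and 'render' (here: just print) a node's type,
   fields, and children: the basic walk a renderer adapter performs. */
static void renderNode(const Node *node, int depth) {
    printf("%*s[%s]\n", depth * 2, "", node->typeName);
    for (int i = 0; i < node->nFields; i++)
        printf("%*s  %s = %s\n", depth * 2, "",
               node->fieldNames[i], node->fieldText[i]);
    for (int i = 0; i < node->nChildren; i++)
        renderNode(node->children[i], depth + 1);
}

int main(void) {
    Node city    = {"City",    {"name"},            {"Bratislava"},             1, {0},     0};
    Node country = {"Country", {"name", "capital"}, {"Slovakia", "see child"},  2, {&city}, 1};
    renderNode(&country, 0);
    return 0;
}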

Mitopia® in turn expects the renderer to provide certain ‘behaviors’ in support of the data model abstraction and these are registered by the adapter layer using the Mitopia® API functions RP_DefineDMRendererPlugin(), RP_DefineCMRendererPlugin(), and RP_DefineWMRendererPlugin().  The complete set of defined plugin points (in either direction) can be found in the ‘RendererPlugin.h‘ header file.  Using the definitions in this file it is possible to create and register new renderers with the Mitopia® architecture so that those renderers are able to discover, access, and display data held within Mitopia®.  The full details of how to do this are beyond the scope of this post and are covered by other Mitopia® documentation.  A correctly implemented renderer using this abstraction can allow the user to discover, scroll, and navigate Mitopia® data collections containing millions of nodes in real time.

While the details of the data model interface are beyond the scope of this post, it is important to understand that it is only by the correct connection of a renderer to the DataModels layer, that all of the UI behaviors and auto-generation capabilities described in earlier posts are possible.

On-the-Fly GUI Translation

The Mitopia® architecture provides extensive support for translating text from one language to another.  One of the most important and pervasive aspects of this is the ability to translate the entire GUI for the application from English to any other supported language on-the-fly.  Because text can appear in so many different places in a renderer, and may be generated by hard code internal to the renderer as well as any number of other sources, it is impractical to define a single fixed plugin point that is likely to exist in every GUI framework to support this.  Consequently, it is a GUI renderer’s responsibility to ensure that any text that is about to be displayed anywhere in the UI as part of control labeling or contents (if applicable) is passed through some kind of filter function within the renderer, which we shall refer to as “PrepareText”.  Furthermore, a means must be provided to allow the renderer adapter layer to utilize Mitopia® translation capabilities, particularly the function XL_GetString(), to implement the required “PrepareText” functionality based on the current UI context.
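
A minimal, self-contained sketch of the renderer-side “PrepareText” filter is shown below.  The LookupTranslation() helper is a toy stand-in for the real dictionary lookup, which in Mitopia® would go through XL_GetString() (whose actual prototype is in the API documentation); the tiny in-memory dictionary is invented for the example.

/* Toy sketch of a renderer-side "PrepareText" filter.  LookupTranslation()
   stands in for the real dictionary lookup (XL_GetString() in Mitopia®);
   the dictionary contents are invented for illustration. */
#include <stdio.h>
#include <string.h>

typedef struct { const char *english; const char *translated; } DictEntry;

/* Toy in-memory dictionary; real dictionaries are Mitopia® collections. */
static const DictEntry kDict[] = { {"Country", "Pays"}, {"Population", "Population"} };

static const char *LookupTranslation(const char *text) {
    for (size_t i = 0; i < sizeof kDict / sizeof kDict[0]; i++)
        if (strcmp(kDict[i].english, text) == 0) return kDict[i].translated;
    return NULL;                                  /* no translation found */
}

/* Every label or content string is passed through here just before display. */
static const char *PrepareText(const char *uiText) {
    const char *t = LookupTranslation(uiText);
    return t ? t : uiText;                        /* fall back to original */
}

int main(void) {
    printf("%s\n", PrepareText("Country"));       /* -> "Pays"             */
    printf("%s\n", PrepareText("Capital City"));  /* -> unchanged fallback */
    return 0;
}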

Dynamic software localization - English appearance

The screen shot above illustrates the user in the process of switching the current UI language from English to Arabic while viewing a Country record (in this case Slovakia) in the standard portal.  The screen shot below shows the effect of this action on the GUI.  As you can see, most of the control labels in the GUI have been translated by Mitopia® to Arabic; those for which a translation could not be found remain in the original English.  Note also that the renderer itself has flipped the entire UI within the window to reflect the right-to-left nature of the Arabic script.  This flipping even extends to moving scroll bars to the left of list controls.  For most languages, which use left-to-right scripts, this flipping does not occur.  You will note that the translation of text strings through the renderer’s “PrepareText” extends to translating the language popup menu, which is in fact driven by dynamic discovery off a Mitopia® collection containing a list of the supported languages.  The GUI labels in the rest of the UI are derived automatically from the Carmot type’s field names, and thus the ontology for the system is also tied to the translation capability through this behavior.

Dynamic software localization - Arabic appearance

Missing translations are entered automatically into Mitopia® translation dictionaries, which can be edited through Mitopia’s built-in ‘Dictionaries’ window as illustrated in the screen shot to the right.  Once translated, they immediately impact the GUI display in the language concerned.

The details on how renderers should implement the “PrepareText” functionality are beyond the current scope, however, the main Mitopia® API functions involved are XL_GetString() and XL_FindCollectionDictionary().