Self modifying code in interpreted languages

Back in my flight control software days, code was written in assembler, memory was scarce (a few tens of KB), and performance, reliability, redundancy, and self-test requirements were at the extreme end of what was possible. Naturally, one was always trying to solve ‘hard’ requirements within a memory or CPU budget that, on the face of it, did not allow a solution. It was in these kinds of situations that I was first driven to self modifying code as one of the only practical solutions.

Now the guardians of good programming form will tell you that self modifying code is evil, dangerous, and perhaps even deviant behavior. You should never do such things, they will tell you. In the comfort of a university campus, without a deadline and impossible requirements looming over them, such statements may well constitute sage advice. I’m here to tell the other side of the story: how I’ve come to embrace self-modification as a legitimate technique in many situations, right up to implementing and leveraging it directly in many of the heteromorphic languages Mitopia® uses to get complicated things done relatively simply.

First let’s look at a classic example from my flight control days. Modern fighter aircraft are inherently unstable in order to gain the necessary maneuverability, rather like throwing a paper dart backwards. This means that you need a computer to keep them flying, because human reactions simply cannot control them. The pilot essentially requests flight trajectories, and the Digital Flight Control Computer (DFCC) figures out how to do it without ripping off the wings and ‘splidging’ the pilot. This means the DFCC code is ‘flight critical’ (i.e., any failure means loss of aircraft and crew), which in turn puts some crazy requirements on redundancy, fault-tolerance, and most importantly on self test. The software must work its way through all flight-related equipment, testing it in an order that lets previously tested subsystems be used to validate those tested later. The first step in this process is for the software to perform a complete instruction test on the CPU on which it is running, to prove that every CPU instruction works as it is supposed to and that every register and other essential aspect of the CPU behaves correctly. After all, if you can’t prove your own CPU is working, how can you be trusted to test whether anything else is?

Given that, one generally encounters a Built-In Test (BIT) software requirement of the following form: “The software shall perform a processor instruction set test through which correct operation of all essential instructions is confirmed in combination with all processor registers and addressing modes”. With only a few tens of KB of ROM in which to write an instruction test covering all instructions with all register and addressing mode combinations (literally millions of combinations), we are faced with what appears to be an impossibility within the resource budget. The solution we found was self modifying code. What we did (simplifying a bit) was write a set of CPU tests, each using register R0 to go through a series of operations on known inputs and confirm the results against expected values. Then, instead of running the code from ROM, we copied each test into RAM and ran it there. Once each test completes, it returns to the calling code in ROM, which then goes through and edits the instruction opcodes in RAM to change register usage (this requires knowledge of the various instruction encodings). The ROM code then re-executes the code in RAM using the new register combinations. This process is repeated in a cycle through the various test code sequences until a major proportion of all instructions have been tested with each and every register and addressing mode.

If all goes well, we move on to the memory tests. If anything goes wrong, we rely on the fact that we’re in a quad-redundant system, so three other CPUs are going through the same test sequence. If our CPU fails to check in with the others at any point, the redundancy management logic will result in its being voted out and disconnected from any actuators so it can do no harm (the aircraft can of course still fly with three CPUs; in fact it can fly with just one, and if that fails there are two very simple analog backups that should provide enough control to land).
The ROM footprint for such an extensive test (just one of many) is actually quite small, thanks entirely to the wonders of self modifying code.
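
The copy, patch, and re-run cycle described above can be sketched in a few lines. This is only a toy model: the three-field instruction format, the opcode names, and the helper functions are invented for illustration and do not correspond to any real CPU encoding or to the original flight code.

```python
# Toy model only: a made-up (opcode, register, operand) instruction format.
# The register field is the part that the "ROM" code patches between runs.

ROM_TEST = (                # test template written against register 0 only
    ("LOAD", 0, 21),        # R[reg] = 21
    ("ADD", 0, 21),         # R[reg] += 21
    ("CHECK", 0, 42),       # fail unless R[reg] == 42
)

def run(program, num_regs):
    """Execute a toy program; return False as soon as a CHECK fails."""
    regs = [0] * num_regs
    for op, reg, value in program:
        if op == "LOAD":
            regs[reg] = value
        elif op == "ADD":
            regs[reg] += value
        elif op == "CHECK":
            if regs[reg] != value:
                return False
        else:
            raise ValueError(f"unknown opcode: {op}")
    return True

def patch_register(ram, new_reg):
    """Rewrite the register field of every instruction held in 'RAM'."""
    for i, (op, _old_reg, value) in enumerate(ram):
        ram[i] = (op, new_reg, value)

def instruction_test(num_regs=8):
    """Run the single template once per register by patching the RAM copy."""
    ram = list(ROM_TEST)    # copy the template out of "ROM" into writable "RAM"
    tested = []
    for reg in range(num_regs):
        patch_register(ram, reg)
        if not run(ram, num_regs):
            return tested, False
        tested.append(reg)
    return tested, True
```

One template stored once covers every register, which is where the ROM saving comes from: the cost of adding a register is one more pass through the loop rather than one more copy of the test.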

I could give a number of other compelling examples of the self-modifying technique over the years, but let’s fast forward now to look at some of the many ways that Mitopia® leverages self modifying code in order to get things done.

Perhaps the most obvious and pervasive form of self modification comes from Mitopia’s use of dynamically generated language parsers to make most significant internal connections, rather than direct code calls. We have discussed some of these languages in earlier posts. One such language is that used by Mitopia’s GUI adapter layer so that all UI activity is mediated by nested GUI interpreters.

The code snippet above illustrates the use of this interpreted layer to dynamically construct the GUI code necessary to perform a particular operation (in this case adding columns to a list and setting titles and widths appropriately). Once the entire program has been constructed dynamically, it is then executed by the call UI_Command(*xh). This is an example of self modification at the C code level prior to invoking an interpreted language.
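
The general pattern of assembling a program as text and then handing it to an interpreter in a single call can be sketched as follows. Everything here is hypothetical: the `AddColumn` command syntax, `build_column_program`, and `ui_command` are invented stand-ins, loosely inspired by the idea of `UI_Command`, and say nothing about Mitopia’s actual GUI language.

```python
import shlex

def build_column_program(list_name, columns):
    """Assemble a small command program as text, one command per line."""
    lines = [f'AddColumn {list_name} "{title}" {width}' for title, width in columns]
    return "\n".join(lines)

def ui_command(program, lists):
    """Toy interpreter: execute each AddColumn command against 'lists'."""
    for line in program.splitlines():
        verb, list_name, title, width = shlex.split(line)
        if verb != "AddColumn":
            raise ValueError(f"unknown command: {verb}")
        lists.setdefault(list_name, []).append((title, int(width)))
    return lists

# Build the whole program first, then execute it in one call.
program = build_column_program("results", [("Name", 200), ("Date Of Birth", 120)])
lists = ui_command(program, {})
```

The calling code never manipulates the list directly; it only writes a program, so changing what the GUI does means changing text rather than changing compiled calls. This sketch does not escape quotes inside titles, which a real generator would need to handle.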

Within the MitoMine™ heteromorphic language suite, Carmot-E endomorphic programs are invoked through the <@1:5> meta-symbol. Other than the grammar productions themselves, these <@1:5> meta-symbols form the bulk of any MitoMine™ script. Carmot-E is Mitopia’s preferred endomorphic language for accessing data held according to the system ontology within the collection abstraction. In the case of MitoMine™, the designated node in the collection to be read/written is set by a preceding call to the <@1:4> plugin, which creates a new record of a specified type into which data extracted from the source text can be written, and from which it can be referenced. Completion of the designated node is indicated in the MitoMine™ script by a <@1:3> meta-symbol. The Carmot-E language and run-time environment is fully described in the post entitled “Dropping the other Shoe”. Carmot-E is the endomorphic language for many Mitopia® heteromorphic suites, not just MitoMine™.

Carmot-E provides the <@1:5:$registerName> form, in which the string content of the specified register is macro expanded to obtain the actual Carmot-E program statements to be executed. This capability essentially allows self modifying code, that is, code that can construct a program in a register and then execute it. This is a very powerful technique and can solve a number of complex problems in script writing. Self modification is generally frowned upon as a programming technique, but in the context of a heteromorphic language it can be a perfectly valid and effective strategy. The macro expansion behavior occurs only if a <@1:5> meta-symbol contains nothing but a valid register name (including ‘$’), as in <@1:5:$aa> or <@1:5:$>. If the register specified contains a string value, then that string value is taken to be the meta-symbol ‘hint’, that is, the Carmot-E program to execute. If the register specified does not contain a string, a syntax error is reported. Examples of possible uses include loading nested MitoMine™ scripts from file and executing them, executing Carmot-E instructions in source text being mined, dynamically constructing and executing code, etc.
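
The expansion rule can be mimicked in an ordinary interpreted language. The sketch below uses Python source as a stand-in for Carmot-E statements and a plain dictionary named `reg` as a stand-in for the register set; `run_plugin` and the register naming pattern are assumptions made for illustration, not Carmot-E’s actual implementation.

```python
import re

# Assumed register syntax for this sketch: '$' alone, or '$' followed by a name.
REGISTER_NAME = re.compile(r"\$[A-Za-z0-9_]*")

def run_plugin(hint, registers):
    """Execute a plugin hint, macro expanding it first if it is only a register name."""
    text = hint.strip()
    if REGISTER_NAME.fullmatch(text):
        program = registers.get(text)
        if not isinstance(program, str):
            raise SyntaxError(f"register {text} does not contain a string")
    else:
        program = text
    # Python's exec stands in for the Carmot-E interpreter; 'reg' is the register set.
    exec(program, {"reg": registers})

registers = {"$n": 1}

# Step 1: ordinary code builds a new program, as text, in register $aa.
run_plugin('''reg["$aa"] = "reg['$n'] = reg['$n'] + " + "41"''', registers)

# Step 2: a hint containing only the register name executes that program.
run_plugin("$aa", registers)
```

After the second call, `$n` holds 42: the statement that changed it never appeared in the script itself, only in the register. Executing constructed text this way is exactly as trustworthy as whatever built the text, so the same caution applies as with any use of `exec`.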

Mitopia’s ‘TypeMapping’ ontology mapping language BNF definition shown above is a simple illustration of the use of Carmot-E’s self-modifying register form to accomplish exotic effects. In this case you will note that virtually all the <@1:n:…> plugin operators (which ultimately invoke Carmot-E) contain simply ‘$’ as the plugin hint; in other words, execute whatever instructions you just encountered in the source file. If you look at the ‘=V=’ and ‘=R=’ forms, you will see that the <@1:5:$> is preceded by a <15:MitoMine plugin 5> token which, as we can see from the corresponding .LXS lexical specification, will recognize any standard MitoMine™ plugin of the form <@1:5:any Carmot-E program>. This means that the <@1:5:$> plugin will take a Carmot-E program from the source file (which specifies the necessary ontology mapping instructions) and execute it. In this case, however, it enforces the additional behavior that all Carmot-E operations/assignments appearing on the left side of an assignment operate according to the new ontology definition and on the data being created by the mapping, while all Carmot-E operations appearing to the right of an assignment operate according to the old ontology definition and on the data held in that old ontology that is to be auto-converted. This language thus uses the self modification trick to handle ontology mapping relatively simply.
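
The left-side/right-side rule can be illustrated with a deliberately tiny model. The field names, the two field sets, and the one-assignment-per-line mapping syntax below are all invented for this sketch; the real TypeMapping language and Carmot-E are far richer.

```python
# Hypothetical old and new "ontologies", reduced to sets of field names.
OLD_FIELDS = {"surname", "dob"}
NEW_FIELDS = {"lastName", "dateOfBirth"}

def apply_mapping(statement, old_record, new_record):
    """Execute one 'target = source' statement read from the mapping text.

    The left side is resolved against the new ontology and writes to the
    record being created; the right side is resolved against the old
    ontology and reads from the record being converted.
    """
    target, source = (side.strip() for side in statement.split("=", 1))
    if target not in NEW_FIELDS:
        raise KeyError(f"{target} is not a field of the new ontology")
    if source not in OLD_FIELDS:
        raise KeyError(f"{source} is not a field of the old ontology")
    new_record[target] = old_record[source]

def convert(mapping_text, old_record):
    """Run every statement found in the mapping text against one old record."""
    new_record = {}
    for line in mapping_text.splitlines():
        if line.strip():
            apply_mapping(line, old_record, new_record)
    return new_record

mapping = "lastName = surname\ndateOfBirth = dob\n"
converted = convert(mapping, {"surname": "Doe", "dob": "1970-01-01"})
```

The converter contains no knowledge of any particular mapping: the instructions arrive as data and are executed as found, with only the scoping rule for each side of the assignment fixed in advance.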

Finally let’s look at an example of a MitoMine™ script that directly uses self modification to accomplish useful behaviors quite simply.

The script snippet above is related to extracting a ‘Background Check’ report on an individual. The script must deal with the problem that, while processing a complex PDF report, it encounters documents and news articles about the person concerned that are referenced by the report. The script needs to have those news articles persisted separately as if received from their original sources, and referenced by the report, rather than enclosed within it. To do this the script needs to load the referenced news report content from file (see the $InputTextFile call in the snippet above), wrap it as necessary, and then invoke the generic “NewsStory” script to process it (see the highlighted call to $MitoMine in the snippet above, which is mining the accumulated content of register $xx). The tricky part is that the script also wants the generic “NewsStory” script to do some extra housekeeping and create the ‘GroupRelation’ between the persisted news story and the person concerned on its behalf. You can see that the script constructs the necessary Carmot-E statements to do this in the statements:

<@1:5:$yy = “<CodeInsert>\n_BEGIN_notes.relatedToGroup = \”” + $fn + ” – MediaReports;;GroupRelation;;source = ” + $SourceName() + “;;relationType = Referenced From;;regarding = [1[” + $fn + “;;Person]1]\””>

<@1:5:$yy = $yy + “\n_END_</CodeInsert>\n”>

The statements are wrapped within the <CodeInsert>…</CodeInsert> delimiters and prepended to the text to be parsed by the $MitoMine($xx,"NewsStory") call. Now if we look at the “NewsStory” script, we see the following snippet:

As you can see, the script (in BNF production insert_orNot) looks to see if there is a <CodeInsert>…</CodeInsert> block present in the document XML (which normally there isn’t), and if so, extracts it into register $cc, and executes it on the highlighted line. This has the effect of executing the MitoMine™ instructions built up by the invoking script, thereby creating the link between the mined news story and the person that is the subject of the background check. Self modifying code to the rescue once again.
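
The handshake between the two scripts can be sketched outside MitoMine™. In this toy version the caller prepends a `<CodeInsert>` block to the document, and the generic miner extracts and executes it if present. The function names, the `field = "value"` statement form, and the use of a dictionary as the mined record are assumptions for illustration only; only the `<CodeInsert>` delimiters come from the scripts described above.

```python
import re

# An optional block at the very start of the document, as built by the caller.
CODE_INSERT = re.compile(r"<CodeInsert>\n(.*?)\n</CodeInsert>\n", re.DOTALL)

def wrap_with_insert(document, statements):
    """Caller side: prepend statements for the generic script to execute."""
    return f"<CodeInsert>\n{statements}\n</CodeInsert>\n" + document

def run_insert(code, record):
    """Toy executor: each line of the form field = "value" sets one field."""
    for line in code.splitlines():
        if "=" in line:
            field, value = (part.strip() for part in line.split("=", 1))
            record[field] = value.strip('"')

def mine(document, record):
    """Generic side: execute a leading code insert if present, then mine the rest."""
    match = CODE_INSERT.match(document)
    if match:
        run_insert(match.group(1), record)
        document = document[match.end():]
    record["body"] = document
    return record

story = "<story>Headline</story>"
linked = mine(wrap_with_insert(story, 'relatedTo = "Jane Doe"'), {})
plain = mine(story, {})
```

The generic miner stays generic: it knows only that a block of instructions may be present, not what those instructions do, so each caller can request its own housekeeping without the shared script changing.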

There are many other examples of self modification dotted around the Mitopia® code base. As a technique it is, I believe, an essential part of creating an adaptive data-driven environment. No doubt I will return to this subject in later posts.