Package TRACER
(Implementation notes)


This section gives additionnal details about the implementation technique of package Tracer. We recommend that you first have a look at the implementation of Tracer before reading this section. YOU DO NOT NEED TO UNDERSTAND (not even read) THIS SECTION IN ORDER TO USE THE PACKAGE.

Some important parts of Tracer rely on package Protection. We suggest that you read the documentation of package Protection before this one.

Tracer uses a lot of nice features of Ada, including task attributes, code in package body, protected types, task-IDs, finalizable types, etc (even goto!). Have a look at the source, it can be quite instructive. Actually, it is a tribute to the power of Ada 95 to have such a package written in a 100% portable way.

The initial constraints for this package were quite challenging:

Admitedly, such a level of protection was not really necessary for a simple debugging aid. However, we took this opportunity to demonstrate how the features of Ada could be used to provide safe operations, even in the face of aborts. We do believe that Tracer will behave as specified under any circumstances; if you think you have discovered a race condition, please tell us by sending a note to rosen@adalog.fr, we'll do our best to correct it!

Protecting against race conditions

The issue of mutual exclusion is very important in any multi-tasking system. The implementation of Tracer uses two independent locking mechanisms:

  1. A general one, based on the use of package Protection, to provide mutual exclusion between high level services, such as Pause, Flush_Trace, etc.
  2. A local one, that uses protected types to grant exclusive access to ressources such as the message queue or the task information.

The most difficult part was the definition of the first one. A first attempt used a kind of semaphore provided by a protected type, but we could not avoid the case where a task was aborted while holding a semaphore. Avoiding dead-locks under this circumstance turned out to be extremely difficult. Putting all the print operations in a protected type was not an option, since IO's are not allowed from within a protected operation (they are, by definition, "potentially blocking"). Therefore, the first version of Tracer chose to use a task to serialize calls. Unfortunately, this solution still lead to a subtle race condition when the main program was aborted.

Eventually we decided to return to the semaphore-type protection, but we devised a mechanism to ensure that a task could not be aborted while holding the semaphore (package Protection). It turned out that this component could be useful in other similar cases, so we made a separate software component out of it. Please check the documentation provided with this package for more details.

After trying various solutions, we decided that Trace would be implemented as a Keep_Trace followed by a Flush_Trace; therefore, only Flush_Trace (and Pause) really print something and need to be protected. This seemed to be the simplest solution to ensure that traces originating from Keep_Trace were printed in the right order if some other task was performing regular Trace calls at the same time.

Kept messages are stored as a simple linked chain. Of course, chaining and dechaining also had to be protected.. We use a protected type for this, taking advantage that protected subprograms (i.e. not entries) are not potentially blocking. After trying various solutions, it appeared safer to make all dynamic allocation (as well as dealing with Storage_Error) inside the protected operation. Fortunately, new is not a potentially blocking operation !

In the first version of this package, only the indentation level was kept as a task attribute, and no protection was required since Get_Value and Set_Value are guaranteed atomic. However, the introduction of the marking of tasks required to add a new field, and a protected object had to be used to set or query the fields of task attributes. Note that race conditions could happen only if a task was setting its indentation level while another task was changing its registration status. It could seem that the use of a protected object could have been avoided by taking advantage of the Reference function, that would allow to modify directly the attribute, rather than using a reading-and-writing scheme which requires exclusive access. However, a task could terminate after the pointer to its attribute had been obtained, and accessing the attribute in this case would be erroneous, without any way to prevent it. With the read-then-write paradigm, the target task can terminate between Get_Value and Set_Value, but this only raises Tasking_Error and does not cause any erroneous condition.

Shared variables

Tracer has a number of global variables. Since we are in a multi-tasking program, it is important to avoid race conditions when accessing theses variables. All the global variables are grouped near the top of the package, and each is accompanied by a comment that describes where the variable is set and read, and how absence of race conditions is guaranted. The two most common paradigms are:

  1. Protected from concurrent access. The procedure where the variable is set and the one where it is used cannot execute concurrently because of the use of one of the previously described exclusion mechanisms.
  2. Pragma Atomic is sufficient. The procedure is set in a procedure and can be read from another procedure, but no procedure reads the variable to later reset it. In this case, atomicity of physical access, as granted by the pragma atomic, is sufficient to avoid race conditions. This is used only with variables of very elementary types, and it is highly unlikely that pragma Atomic would not be applicable to such variables.

Printing kept messages

With our design, only the Flush_Trace procedure actually prints something, and this is completely protected from other calls by the use of package Protection. It is possible that some tasks add messages while a Flush_Trace is going on; but since addition and removal of messages are protected by a protected objects, this will not cause any problem.

We discovered a subtle bottleneck effect when a task was calling Trace while another task was holding the semaphore (something that can be quite long if the other task is executing a pause): since a Trace is merely a Keep_Trace followed by a Flush, the Flush would block, therefore adding an unnecessary synchronization. More precisely, the call to Flush is not necessary if we can be sure that some other task is currently flushing. We added therefore a function to the protected object to tell if the queue of messages was currently being flushed. Note that there is still a small, very unlikely, race condition if the message is flushed between the Keep_Trace and the Flush; this can result in Flush being called unnecessarily, but never in not calling Flush when necessary; we considered therefore the race condition as acceptable.

Other procedures use the same semaphore as the one used for Flush_Trace, and are therefore mutually exclusive: Pause, since it is its purpose to block the printing of messages; Set_Watch, since we don't want the Watch procedure to be changed by some background task while a Pause is going on; Mark and Unmark, to avoid race conditions between a task in the background and user command during a Pause.

To ensure that all outstanding messages are printed even in the case of a general abort, the body of Tracer declares a finalizable object whose Finalize calls Flush_Trace. Note that since there is an Elaborate_Body pragma in the specification, this ensures that the body of Tracer is elaborated before any use of it, and therefore that the finalizable object is initialized before any unit that refers to Tracer is elaborated. This in turn ensures that this object is finalized after any such unit, and therefore that no call to Keep_Trace can happen after the object has been finalized.

Management of lost messages

When a Keep_Trace raises Storage_Error, we note the Task_ID of the task into a field of the protected object Chain, whose value is otherwise Null_Task_ID. This serves as a flag to indicate that some messages have been lost. However, if space is released, a subsequent Keep_Trace may well succeed, even if no Flush_Trace has been performed in between. Therefore, there is an extra flag in the chain structure in which the "lost messages task ID" is copied, indicating that the "Lost messages" message should be printed *before* the current message (if not Null_Task_ID). Since this processing occurs within the protected object Chain, no race condition can occur. If Flush_Trace is called and the "lost message task ID" is not Null_Task_ID, this means that the "Lost messages" message should be printed *after* the last message from the chain.

The timer

The timer is an abstract state machine implemented as a task which accepts requests as calls to its entries. There is one select statement for each state; in the case of the "active" state, the select has a "or delay" branch to trigger the call if the timer expires. The only subtility is that we didn't want any restriction on what could be done by the user procedure, including resetting the timer (something that can be quite useful actually). Since a timer reset ends up as an entry call to one of the timer task entries, it would dead-lock if the timer task called the user procedure directly. This was solved by subcontracting the call to the user procedure to a subtask (Proxy).

For cosmetic reasons, we didn't want these internal tasks (Timer and Proxy) to be apparent to the user. Normally, each message printed by a task is preceded by the task's ID. In the case of messages generated by Timer and Proxy (including user messages printed from the user timer procedure), the task-IDs of these tasks would become apparent, potentially puzzling the user who would see tasks not belonging to his/her program. Therefore, we keep the task ID's of our internal tasks in global variables, and they are "censored" off any message. These variables are set by the Timer task itself (the only one which has visibility of Proxy); but of course, we had to ensure that the variables were properly initialized before starting Tracer for good. That's why the body of the package starts with a call to Timer.Cancel_Delay; this call is ignored by Timer, but ensures that the task has started, and therefore that the variables are initialized.

Note that Timer and Proxy always have terminate alternatives in their select to make sure that they don't prevent the user program from terminating normally.

Auto-debug procedures

One of the most frustrating things in the design and debugging of this package was that it was the only package on the surface of the earth (actually we could even include satellites) that cannot use Tracer.Trace to trace what's happenning inside...

To overcome this, we added a very rudimentary ITrace (Internal Trace) procedure that simply prints its argument, with a leading "+++" to show its origin. It is currently never used. Moreover, the most important procedures have a catch-all exception handler that calls the procedure Tracer_Error in the case of an unexpected exception. Tracer_Error does its best to print all available information at that point. Of course, these procedures are internal to the body of Tracer and not exported.

We left Tracer_Error and ITrace in the working version of Tracer, because they can be of help to those who want to modify Tracer. And since you are reading this, it may be you!

Tracer.Timing

There isn't much to say about this package, it is really simple. The only issue is that it makes no attempt to detect recursive or reentrant calls. We felt that the primary concern was for the timing procedures themselves to be as short and fast as possible. Protecting from reentrancy and/or recursion would involve things like accessing task attributes, the Current_Task function, etc. which are likely to be quite costly.

If there is demand for it, we can provide more sophisticated versions. Please keep us informed if you have such a request.

Technical Q & A:

Q: How does the initial message magically appear ?
A: Asking for the desired kind of trace is done in the statement part of package Tracer. Since there is a pragma Elaborate_Body, this ensures that the body of Tracer is elaborated (triggering the printing of the message) before any unit that withes Tracer. No special action is required from the user.

Q: Why not use a child library to provide the basic services ?
A: This would seem to provide a better organization; to tell the truth, we even tried it at some point, but we stepped back. Here are the reasons:

  1. Proper operation of the package heavily relies on package Tracer having pragma Elaborate_Body. If some functionnalities are put in a child package, then this package's specification must be elaborated between the parent specification and the parent body, therefore preventing the use of Elaborate_Body.
  2. With the use of clever renamings and a few other tricks, we were able to work around the previous problem, but it complicated the structure.
  3. In practice, having several packages rather than one big package did not bring much; actually, searching something in a single file is simpler than having to deal with multiple windows.
  4. Eventually we realized that having only one big package (except for language customization) was not really a burden for the maintainer, and would make things simpler for the regular user (the one who uses the package and does not want to look into it).

Q: Shouldn't it be possible to have a simpler version of Tracer, without tasking support for example ?
A: This was our original intent, however we also wanted to avoid duplication of code, especially to ease extending the package with new trace procedures. It turned out that providing support for various versions with the same level of security would complicate all versions to the point where the non-tasking version would not get much simplification.

Q: Why do the Trace procedures call Keep_Trace ?
A: Semantically, Trace is equivalent to Keep_Trace followed by Flush_Trace. It may seem expansive to actually code it this way, since it involves seemingly gratuitous memory allocation. However, we don't care about efficiency here, and this solution has two major advantages:

  1. It avoids code duplication between the corresponding Trace and Keep_Trace procedures;
  2. (and foremost) the problem of interferences between Trace and Keep_Trace, as well as making sure that no message is lost when tasks are aborted during Trace is quite difficult. Having only one place where messages are actually printed (in Flush_Trace) makes Tracer much more reliable.

Q: Why use this strange Tracer_String_Access type for messages passed as discriminants to Auto_Tracer and R_Detector, rather than plain String and access to String?
A: An access type was required, since it is not possible to pass a string as a discriminant. We wanted the types to be as transparent as possible, therefore they are controlled types and free the message from their Finalize when they disappear. Had we used plain strings, the user could have had the discriminants pointing to user strings, or be tempted to reuse the pointer for other purposes, creating a risk of dangling pointers when the controlled objects are finalized. The way Tracer_String_Access is declared, it is impossible to do anything with it (not even declare variables). This protects the user from incorrect usage. Note also that the definition of the "+" operator could have been in conflict with a user defined one. By having our own type, no confusion can happen.

Q: Shocking! There are goto's in this package
A: Yes, and we seriously argue that it is appropriate. It is used in an abstract state machine, where each state is modelled as an accept statement that does its required operations, then jumps to the next state. A goto perfectly describes this behaviour, and replacing it with a state variable and a case statement would be an abstraction inversion. To be honnest, this is the only case we ever encountered where a goto could be considered appropriate in Ada...