XI Jornadas de Tiempo Real


Palma de Mallorca, 7 and 8 February 2008

Editors: Albert Llemosí, Julián Proenza

Organizing Committee: Albert Llemosí, Julián Proenza
Departament de Ciències Matemàtiques i Informàtica
Universitat de les Illes Balears
Cra de Valldemossa km 7, Palma de Mallorca

Sponsors: Conselleria d'Economia, Hisenda i Innovació del Govern Balear (*), Fundació La Caixa, Ministerio de Educación y Ciencia, Universitat de les Illes Balears

(*) The contribution of the Conselleria d'Economia, Hisenda i Innovació del Govern Balear was co-financed with FEDER funds.

List of Contributions

1. Modeling and Formal Methods

Adaptive Petri Nets implementation. The Execution Time Controller.
   R. Piedrafita Moreno and J. L. Villarroel Salcedo
Modeling and Verification of Master/Slave Clock Synchronization Using Hybrid Automata and Model-Checking.
   G. Rodríguez-Navas, J. Proenza and H. Hansson
Software Modeling of Quality-Adaptable Systems.
   J. F. Briones, M. A. de Miguel, A. Alonso and J. P. Silva

2. Timing Analysis

Considerations on the LEON cache effects on the timing analysis of on-board applications.
   G. Bernat, A. Colin, J. Esteves, G. Garcia, C. Moreno, N. Holsti, T. Vardanega and M. Hernek
A Stochastic Analysis Method for Obtaining the Distribution of Task Response Times.
   J. Vila-Carbó and E. Hernández-Orallo
D-P domain feasibility region in dynamic priority systems.
   P. Balbastre, I. Ripoll and A. Crespo
Providing Memory QoS Guarantees for Real-Time Applications.
   A. Marchand, P. Balbastre, I. Ripoll and A. Crespo

3. Operating Systems and Middleware

Operating System Support for Execution Time Budgets for Thread Groups.
   M. Aldea Rivas and M. González Harbour
Una Máquina Virtual para Sistemas de Tiempo Real Críticos.
   J. A. Pulido, S. Urueña, J. Zamorano and J. A. de la Puente
Middleware based on XML technologies for achieving true interoperability between PLC programming tools.
   E. Estevez, M. Marcos, F. Perez and D. Orive
Real-Time Distribution Middleware from the Ada Perspective.
   H. Pérez, J. J. Gutiérrez, Daniel Sangorrín and M. González Harbour

4. Distributed Systems

Integración de RT-CORBA en robots Eyebot.
   M. Díaz, D. Garrido, L. Llopis and R. Luque
An Ada 2005 Technology for Distributed and Real-Time Component-based Applications.
   P. López Martínez, J. M. Drake, P. Pacheco and J. L. Medina
An Architecture to Support Dynamic Service Composition in Distributed Real-Time Systems.
   I. Estévez-Ayres, L. Almeida, M. García-Valls and P. Basanta-Val

5. Development Tools

Plataforma Java para el Desarrollo de Aplicaciones Empotradas con Restricciones Temporales.
   J. Viúdez and J. A. Holgado

6. Control Systems

A Taxonomy on Prior Work on Sampling Period Selection for Resource-Constrained Real-Time Control Systems.
   C. Lozoya, M. Velasco, P. Martí, J. Yépez, F. Pérez, J. Guàrdia, J. Ayza, R. Villà and J. M. Fuertes
Distributed Control of parallel robots using passive sensor data.
   A. Zubizarreta, I. Cabanes, M. Marcos, D. Orive and C. Pinto

1. Modeling and Formal Methods

Adaptive Petri Nets implementation. The Execution Time Controller.

Ramón Piedrafita Moreno, José Luis Villarroel Salcedo
Aragon Institute for Engineering Research (I3A), University of Zaragoza, Maria de Luna, Zaragoza, Spain (e-mail: {piedrafi, jlvilla}@unizar.es)

Abstract: In this work we develop a technique that chooses, in real time, the most suitable algorithm to execute a Petri Net for control applications, according to the net structure and the sequence of control events. To this end we designed a controller, which we call the Execution Time Controller (ETC). The aim of the ETC is to determine in real time which algorithm executes the Petri Net fastest and to change the execution algorithm when necessary. The two best-performing Petri Net implementation algorithms are considered: the Enabled Transitions and the Static Representing Places techniques. In system control, this minimizes the controller reaction time and also the power consumed by the controller. One possible application of the technique is minimizing the execution time of Programmable Logic Controller programs developed in Grafcet. The correct behavior of the ETC has been established by a set of tests using a parametric Petri Net library and the Real-Time Specification for Java.

1. INTRODUCTION

Petri Nets (PN) are a formalism well suited to modeling concurrent discrete event systems; they have been applied successfully in fields such as communication networks, computer systems and discrete part manufacturing systems. Net models are often regarded as self-documented specifications, because their graphical nature facilitates communication among designers and users. Petri Nets have a strong mathematical basis, which allows the validation and verification of a wide range of correctness and liveness properties.
Moreover, these models are executable and can be used to animate and simulate the behavior of the system, and also for monitoring purposes once the system is in operation. The final system can be derived from a Petri Net model by means of hardware and software (code generation) implementation techniques. In other words, Petri Nets can be used throughout the life cycle of a system. In this paper we assume that the reader is familiar with the basic concepts of Petri Nets (Murata, 1989).

The implementation of Petri Nets has received considerable attention from researchers over the last twenty-five years. Implementation involves translating a system model expressed as a Petri Net into an actual system with the same behavior as the model. It has been studied in numerous application fields, such as digital hardware, simulation, robotic systems and other concurrent software applications. However, the most widespread application is GRAFCET, a standardized language for programmable logic controllers.

A Petri Net implementation may be either hardware or software. We are interested in the latter. A software implementation is a program which fires the net transitions while observing the marking evolution rules; i.e., it plays the token game. An implementation is composed of a control part and an operational part. The control part corresponds to the structure, marking and evolution rules of the Petri Net. The operational part is the set of actions and/or application code associated with the net elements. In a centralized implementation, which is the most widespread approach, the full control part is executed by a single task, commonly referred to as the token player or coordinator.

Executing a Petri Net without a suitable algorithm can increase the response time in control applications.
Moreover, reducing the CPU overload introduced by the implementation algorithm can allow the execution of other tasks (tests, communications, statistics, ...) and the use of more complex control models. The objective of the technique developed in this paper is to determine in real time which algorithm executes a Petri Net fastest, and thus to minimize the execution computation time. Furthermore, this must be achieved with minimum computation time overload.

An analysis of the features of Petri Net centralized implementation algorithms was carried out in (Piedrafita and Villarroel, 2007). The Brute Force, Enabled Transitions, Static Representing Places and Dynamic Representing Places algorithms were analyzed. The following conclusions were reached:

The Enabled Transitions, Static Representing Places and Dynamic Representing Places algorithms bring about a drastic reduction in the execution computation time compared to the Brute Force algorithm.

If the Static Representing Places algorithm chooses suitable Representing Places, its performance is similar to or better than that of Dynamic Representing Places.

The choice of the most suitable type of algorithm to execute a Petri Net depends on the Petri Net behavior (effective concurrency vs. effective conflicts).

Whether or not an algorithm is suitable to execute a Petri Net depends on its structure, but also on the sequence of events that fire the transitions. Even if the net structure is analyzed beforehand, the evolution of the marking, which is driven by the events that arise, may change which algorithm executes the net fastest for that sequence of events.

In this work we have developed a technique which allows the choice in real time of the most suitable algorithm to execute a Petri Net in accordance with its structure and the sequence of events. With this aim in mind, we designed a controller, which we have called the Execution Time Controller (ETC). The aim of the ETC is to determine in real time which algorithm executes the Petri Net fastest and to change the execution algorithm when necessary. In the case of system control, this minimizes the controller reaction time and also the power consumed by the controller. One possible application of the technique is minimizing the execution time of Programmable Logic Controller programs developed in Grafcet.

The paper is organized as follows. Section 2 reviews the centralized implementation techniques for PN. Section 3 introduces the ETC. Section 4 describes the techniques developed for estimating the algorithm computation times. Section 5 describes the tests run to evaluate the estimation techniques and the operation of the ETC in real time. Finally, Section 6 presents the main conclusions and suggests future lines of research.

2. CENTRALIZED PETRI NETS IMPLEMENTATION

Petri Net implementation is highly dependent on the interpretation of the net model; namely, how inputs, actions and code are associated with the net elements. In this work we consider binary Petri Nets with an interpretation that associates application code with places, and predicates and priorities with transitions.

Centralized implementation techniques (Colom et al., 1986; Villarroel, 1990) are coded in a task called the Coordinator, which plays the so-called token game, i.e., it makes the net evolve over time. The Coordinator establishes when transitions are enabled and must fire. Apart from the simple exhaustive test of all transitions (brute force approach), various solutions are available for reducing the cost of the enabling test and, consequently, the overload introduced by the coordinator. Depending on the solution chosen, centralized implementation techniques can be classified into the following classes (Briz, 1995):

Place-driven approaches. Only the output transitions of some representative marked places are tested (Colom et al., 1986). Each transition is represented by one of its input places, the Representing Place. The remaining input places are called synchronization places. Only transitions whose Representing Place is marked are considered as candidates for firing.

Transition-driven approaches. A characterization of the enabling of transitions, other than marking, is supplied, and only fully enabled transitions are considered. This kind of technique is studied in works such as (Briz, 1995; Silva and Velilla, 1982).

In the present work we have implemented two algorithms that use different techniques to search for enabled transitions: Enabled Transitions (ET) and Static Representing Places (SRP). Other implementation algorithms, such as Brute Force or Dynamic Representing Places, are not considered, following the conclusions of (Piedrafita and Villarroel, 2007).
2.1 Data Structures

In a centralized interpreted implementation we need a static data structure which encodes the Petri Net structure and a dynamic one which represents the net state or marking. All the algorithms share the same basic data structure, in which the Petri Net is encoded, with different access possibilities adapted to each technique. Likewise, they have two kinds of lists to make the net evolve: treatment lists, to be processed in the present treatment cycle, and formation lists, to be processed in the next cycle. With the use of these two lists, the net is executed by steps, avoiding the appearance of runs.

The fundamental difference between the implementation techniques lies in the way in which the formation lists are built, and hence in the transitions which are considered in each treatment cycle.

In the Enabled Transitions technique the following data structures are available:

Enabled Transitions List (ETL). Treatment list made up of the transitions with all input places marked.

Almost Enabled Transitions List (AETL). Formation list built with the output transitions of the places marked by the firing of the transitions, that is, the transitions that can become enabled in the next cycle.

In the Static Representing Places technique, the following data structures are available:

Marked Representing Places list (MRPL) and Marked Synchronization Places list (MSPL). Treatment lists with the marked Representing Places and Synchronization Places.

Marked Representing Places list next cycle (MRPLnext) and Marked Synchronization Places list next cycle (MSPLnext). Formation lists with the Representing Places and Synchronization Places that will be marked in the next cycle by the firing of the transitions.

2.2 Algorithm Execution Cycle

Program 1 presents the basic treatment cycle of the Coordinator for the ET technique and Program 2 for the SRP technique. The code for launching the actions associated with the net places has been omitted from these programs.

   loop forever
      while elements in ETL do
         T = next_element (ETL) ;
         // enabled transition analysis
         if enabled (T) and predicate (T) then
            // transition firing
            // update ETL
            Demark_input_places (T, ETL) ;
            // update AETL
            Mark_output_places (T) ;
         end if ;
      end while ;
      // update ETL with AETL
      ETL.update (AETL) ;
      Clear (AETL) ;
   end loop ;

Program 1. ET Coordinator Treatment Loop

In the ET technique, ETL contains all enabled transitions at the beginning of the cycle. During the cycle, the fired transitions and the transitions that become disabled (effective conflicts) must be removed from this list. AETL is built with the output transitions of the places marked by the firing of the transitions. When ETL is updated for the next cycle, the transitions in AETL are checked for enabling.

   loop forever
      while elements in MRPL do
         RPlace = next_element (MRPL) ;
         Transitionsrepr = RPlace.transitionsrep ;
         while elements in Transitionsrepr do
            T = next_element (Transitionsrepr) ;
            // enabled transition analysis
            if enabled (T) and predicate (T) then
               // transition firing
               // update MRPL and MSPL
               Demark_input_places (T, MRPL, MSPL) ;
               // update MRPLnext and MSPLnext
               Mark_output_places (T, MRPLnext, MSPLnext) ;
               Break () ;
            end if ;
         end while ;
      end while ;
      // update MRPL with MRPLnext
      // update MSPL with MSPLnext
      MRPL.update (MRPLnext) ;
      MSPL.update (MSPLnext) ;
      Clear (MRPLnext) ;
      Clear (MSPLnext) ;
   end loop ;

Program 2. SRP Coordinator Treatment Loop

In the SRP technique, MRPL contains the marked representing places and MSPL the marked synchronization places. The output transitions of a marked representing place are checked for enabling. If a represented transition fires, the verification process ends, because the remaining represented transitions become disabled (effective conflict).
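To make the ET treatment cycle of Program 1 more concrete, the following is a minimal Python sketch, not the authors' Java implementation. The class name, the dictionary encoding of the net and the three-transition example net are hypothetical. Two simplifying assumptions are made: transition priorities are ignored (ETL is processed in list order), and places marked during a cycle only become visible at the end of that cycle, which is one way of obtaining the step-by-step execution described in Section 2.1.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical net encoding: transition name -> (input places, output places).
Net = Dict[str, Tuple[List[str], List[str]]]


class EnabledTransitionsCoordinator:
    """Token player for a binary Petri net using the ET lists (ETL, AETL)."""

    def __init__(self, net: Net, marking: Set[str],
                 predicate: Callable[[str], bool] = lambda t: True):
        self.net = net
        self.marking = set(marking)          # binary net: set of marked places
        self.predicate = predicate           # assumed guard, true by default
        # Static structure: place -> transitions that have it as an input.
        self.output_transitions: Dict[str, List[str]] = {}
        for t, (inputs, _) in net.items():
            for p in inputs:
                self.output_transitions.setdefault(p, []).append(t)
        self.etl: List[str] = [t for t in net if self.enabled(t)]  # treatment
        self.aetl: List[str] = []                                  # formation

    def enabled(self, t: str) -> bool:
        return all(p in self.marking for p in self.net[t][0])

    def cycle(self) -> List[str]:
        """Run one treatment cycle and return the transitions fired."""
        fired: List[str] = []
        produced: Set[str] = set()
        for t in self.etl:
            # Re-test enabling: an earlier firing may have taken a shared token.
            if self.enabled(t) and self.predicate(t):
                inputs, outputs = self.net[t]
                self.marking.difference_update(inputs)   # demark input places
                produced.update(outputs)                 # marked at end of cycle
                for p in outputs:                        # build AETL
                    self.aetl.extend(self.output_transitions.get(p, []))
                fired.append(t)
        self.marking |= produced
        # Next ETL: old and almost-enabled transitions that are enabled now.
        candidates = dict.fromkeys(self.etl + self.aetl)
        self.etl = [t for t in candidates if self.enabled(t)]
        self.aetl.clear()
        return fired


# Example net: t1 and t2 are in conflict for p0; t3 returns the token to p0.
net: Net = {
    "t1": (["p0"], ["p1"]),
    "t2": (["p0"], ["p2"]),
    "t3": (["p1"], ["p0"]),
}
player = EnabledTransitionsCoordinator(net, {"p0"})
first = player.cycle()
second = player.cycle()
```

In the example, t1 and t2 both start in ETL; once t1 fires, t2 fails the enabling test in the same cycle, which is the effective conflict mentioned above.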
MRPLnext and MSPLnext are built with the places that become marked in a treatment cycle. Finally, MRPL and MSPL are updated with MRPLnext and MSPLnext respectively.

3. EXECUTION TIME CONTROLLER

The programs in the previous section show that the coordinator cycle computation time depends on the size of the treatment and formation lists. For both ET and SRP, the size of the treatment lists depends on the current net marking, which determines the number of enabled transitions and the number of marked representing places. The size of the formation lists depends on the number of transitions that fire in the cycle (and also on the net structure). If no transitions fire, the formation list size will be zero. Thus, the computation time depends on the evolution of the net marking and on the active part of the net, that is, on the net structure and the sequence of events. As the ET and SRP algorithms use different lists, their computation times will be different.

[Fig. 1. Execution Time Controller. Block diagram: the ETC receives the list sizes ETLsize, AETLsize, MRPLsize, MRPLNsize and MSPLNsize from the ET and SRP algorithms, which exchange actions and events with the controlled system. It computes TET = f(ETLsize, AETLsize) and TSRP = f(MRPLsize, MRPLNsize), accumulates Error = TET - TSRP into an integral, and changes the algorithm if the integral exceeds integralmax.]

With a view to minimizing Petri Net computation time, we propose an adaptive implementation that chooses the best algorithm to execute the net at a given time. We refer to this solution as the Execution Time Controller (ETC). The main function of the ETC is to determine in real time which algorithm executes a Petri Net fastest. The ETC executes the chosen algorithm and estimates the computation time of the non-executed algorithm, making the necessary comparisons and choosing the best algorithm for the controlled system. If necessary, the ETC changes the algorithm.
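The decision rule outlined in Fig. 1 is formalized later in this section by equation (1) and Program 3: the difference between the running and the alternative algorithm's times is accumulated, the accumulator is clamped at zero, and a change is made when it exceeds half the running algorithm's time. A minimal Python sketch of that rule follows; the function name `simulate_etc` and the per-cycle time values are hypothetical and only illustrate the hysteresis effect.

```python
from typing import List, Sequence


def simulate_etc(t_et: Sequence[float], t_srp: Sequence[float],
                 start: str = "ET") -> List[str]:
    """Return, for each cycle, the algorithm selected after that cycle.

    t_et[k] and t_srp[k] are the (calculated or estimated) cycle times of
    the ET and SRP algorithms in cycle k, in any common time unit.
    """
    running = start
    integral = 0.0
    selected: List[str] = []
    for et, srp in zip(t_et, t_srp):
        t_run, t_alt = (et, srp) if running == "ET" else (srp, et)
        error = t_run - t_alt                      # epsilon(k)
        integral = max(0.0, integral + error)      # I(k), clamped at zero
        if integral > t_run / 2:                   # change threshold
            running = "SRP" if running == "ET" else "ET"
            integral = 0.0                         # I(k-1) = 0 after a change
        selected.append(running)
    return selected


# SRP is consistently 10 units faster than ET: the change is delayed until
# the accumulated difference exceeds half of the ET cycle time (50 units).
steady = simulate_etc([100] * 8, [90] * 8, start="ET")

# The faster algorithm alternates every cycle: the integral never builds up.
alternating = simulate_etc([100, 100] * 4, [90, 110] * 4, start="ET")
```

The second call mirrors the situation in which the best algorithm alternates from cycle to cycle: the clamped integral keeps returning to zero, so no change is triggered.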
The Execution Time Controller is implemented as a high-priority thread that is executed with the same period as the thread that executes the Petri Net. In an initial phase, the ETC loads the net and measures several static execution times related to ET and SRP (see the next section). This measurement is made without carrying out any actions. The initial choice of the best algorithm is made with the results of these time measurements. The Petri Net then returns to its initial marking and the control loop is activated. The control system inputs are read, the Petri Net is executed and the outputs are written cyclically.

The computation times of the algorithms are then estimated. Although the computation time of the executed algorithm can be measured by reading the system clock, to avoid the overhead of the control actions the execution time of the executed algorithm (running_alg) is calculated with equation (2) or (3), depending on the algorithm being executed. The execution time of the alternative algorithm (alternative_alg) must be estimated with equation (2) or (3). The comparison is made with the calculated time of the executed algorithm.

The decision whether or not to change the algorithm is based on these computation times. To avoid the overload of continuous algorithm changes, an integral cost function is used:

\[
\varepsilon(k) = ExT_{calculated}(running\_alg) - ExT_{estimated}(alternative\_alg), \qquad
I(k) =
\begin{cases}
I(k-1) + \varepsilon(k) & \text{if } I(k-1) + \varepsilon(k) > 0 \\
0 & \text{if } I(k-1) + \varepsilon(k) \le 0
\end{cases}
\tag{1}
\]

The change is made when I(k) is greater than half the computation time of the executed algorithm. When a change occurs, I(k-1) is reset to 0.

   // Offline control
   Load Petri Net
   Measure times
   First choice of the best algorithm
   Return to initial marking
   // Online control
   loop forever
      Read inputs
      Execute Petri Net with the best algorithm
      Write outputs
      Compute execution time of running_alg
      Estimate execution time of alternative_alg
      Compute I(k)
      if I(k) > (ExTcalculated (running_alg) / 2) then
         Change algorithm
         Initialize data structures
         I(k-1) = 0
      end if
      Wait for next period () ;
   end loop

Program 3. Execution Time Controller

4. ESTIMATION OF ALGORITHMS EXECUTION TIME

In the implementation algorithms, the computation time is observed to depend on the size of the treatment and formation lists. First, we study the SRP algorithm.
The SRP algorithm cycle time expression is:

\[
ET(\text{SRP}) = T_1 \cdot \text{size}(\text{MRPL}) + T_2 \cdot \text{FTnumber} + T_3 \cdot \text{size}(\text{MRPLnext}) + T_4 \cdot \text{size}(\text{MSPLnext})
\tag{2}
\]

Where:
FTnumber is the number of fired transitions
T1 is the mean time to consult the output transitions of a marked representing place
T2 is the mean firing time of an enabled transition
T3 is the mean update time of MRPL with a place of MRPLnext
T4 is the mean update time of MSPL with a place of MSPLnext

The ET algorithm is also analyzed. Its cycle time expression is:

\[
ET(\text{ET}) = T_5 \cdot \text{size}(\text{ETL}) + T_6 \cdot \text{FTnumber} + T_7 \cdot \text{size}(\text{AETL})
\tag{3}
\]

Where:
T5 is the mean time of the enabling test of a transition
T6 is the mean firing time of an enabled transition
T7 is the mean update time of ETL with a transition of AETL

The ETC must determine the execution time of the alternative algorithm. Thus, it needs to know the values of times T1 to T7 and the number of fired transitions, and it must estimate the sizes that the treatment and formation lists of the alternative algorithm would have if it were being executed. Times T1 to T7 are measured in an off-line execution test. For this purpose, the required measurement instrumentation is incorporated into the program. This instrumentation comprises the instructions required for reading the system clock and the necessary time difference calculations.

If the net is executed with the SRP algorithm, the computation time of executing the net with the ET algorithm must be estimated. The number of fired transitions (FTnumber) is known, because the different algorithms make the net evolve in the same way and, therefore, the number of fired transitions is the same in all the algorithms. size(ETL) is estimated when the enabling of the transitions represented by the marked representing places is tested. The solution adopted is an approximation: when an enabled transition that can be fired is found, half of the represented transitions still to be tested are counted as enabled.
size(AETL) is estimated at transition firing as the size of the set of output transitions of the output places of the fired transitions. The results are shown in Section 5.

When the ET algorithm is executed, the sizes of the lists of the SRP algorithm must be estimated. The mean number of marked representing places is more or less constant in most nets; therefore, size(MRPL) is taken to be the mean value estimated in the off-line time measurement test. Consequently, it can be stated that, on average, the firing of a transition involves the unmarking of its representing place and the marking of a new one. size(MRPLnext) can therefore be approximated by the number of transitions fired:

\[
\text{size}(\text{MRPLnext}) \approx \text{FTnumber}
\tag{4}
\]

size(MSPLnext) can be approximated by the expression:

\[
\text{size}(\text{MSPLnext}) \approx \text{FTnumber} \cdot (f_p - 1)
\tag{5}
\]

Where f_p is the mean parallelism factor of the net (the number of output places of a transition).
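As an illustration of how equations (2) to (5) fit together, the following Python sketch evaluates the two cycle-time models and estimates the SRP time while ET is running. The class and function names are hypothetical, and the numeric values of T1 to T7 are made up for the example; in the paper they are measured in the off-line test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasuredTimes:
    """Mean times T1..T7 obtained off line (units are arbitrary here)."""
    t1: float  # consult output transitions of a marked representing place
    t2: float  # fire an enabled transition (SRP)
    t3: float  # update MRPL with a place of MRPLnext
    t4: float  # update MSPL with a place of MSPLnext
    t5: float  # enabling test of a transition
    t6: float  # fire an enabled transition (ET)
    t7: float  # update ETL with a transition of AETL


def srp_cycle_time(c: MeasuredTimes, mrpl: float, fired: float,
                   mrpl_next: float, mspl_next: float) -> float:
    """Equation (2): cycle time of the SRP algorithm."""
    return c.t1 * mrpl + c.t2 * fired + c.t3 * mrpl_next + c.t4 * mspl_next


def et_cycle_time(c: MeasuredTimes, etl: float, fired: float,
                  aetl: float) -> float:
    """Equation (3): cycle time of the ET algorithm."""
    return c.t5 * etl + c.t6 * fired + c.t7 * aetl


def estimate_srp_while_running_et(c: MeasuredTimes, fired: float,
                                  mean_mrpl: float, fp: float) -> float:
    """Estimate the SRP cycle time from data available when ET is running."""
    mrpl_next = fired                # equation (4)
    mspl_next = fired * (fp - 1)     # equation (5)
    return srp_cycle_time(c, mean_mrpl, fired, mrpl_next, mspl_next)


# Made-up coefficients and list sizes, for illustration only.
times = MeasuredTimes(t1=2, t2=5, t3=1, t4=1, t5=3, t6=5, t7=2)
et_time = et_cycle_time(times, etl=3, fired=2, aetl=4)
srp_estimate = estimate_srp_while_running_et(times, fired=2, mean_mrpl=4, fp=2)
error = et_time - srp_estimate   # epsilon(k) of equation (1) when ET is running
```

With these invented numbers the ET model gives 27 units and the SRP estimate 22 units, so the error fed to the cost function of equation (1) would be positive, favoring a change to SRP if it persists.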

5. TECHNIQUE EVALUATION

5.1 Test Platform

A library of Petri Nets has been developed for carrying out the tests. The library is based on eight base models which can be scaled using a parameter. Some of these models are well-known Petri Nets frequently used in the literature. The library comprises the following nets:

SEQ. Petri Nets with one sequential process with ne (1..100) states (Fig. 2.a).
PAR. Petri Nets with p (1..100) sequential processes with 2 places (Fig. 2.b).
PR1. Petri Nets with p (1..40) sequential processes with 2 states and a common resource (Fig. 2.c). These belong to the S3PR net class (Ezpeleta et al., 1995).
DB. Petri Nets of b (5..11) databases (Fig. 2.d) (Jensen, 1987).
P1R. Petri Nets with 1 sequential process and r (1..40) resources (Fig. 2.e). These belong to the S3PR net class (Ezpeleta et al., 1995).
PH. Petri Nets of the philosophers' problem (Dijkstra, 1971) (Fig. 2.f) with f (5..40) philosophers.
SQUARE. Petri Nets with r (1..40) sequential processes of r+1 states and r common resources (Fig. 2.g, defined by the authors).
PR5. Petri Nets of p (5..62) sequential processes of 6 states and 5 common resources (not shown in the figure). These belong to the S3PR net class (Ezpeleta et al., 1995).

We have implemented the centralized PN implementation techniques in the Java language using the Real-Time Specification for Java (RTJS, 2000) and following some ideas presented in (Piedrafita and Villarroel, 2006a, 2006b). In our implementations, we used the real-time Java virtual machine JamaicaVM v2.7 (Aicas, 2007). The target hardware was a personal computer with a Pentium IV processor at 1.7 GHz, running Red Hat Linux.

5.2 Computation Time Estimation Tests

The results of the tests carried out to establish the precision of the estimations are shown in Fig. 3. Execution times are given in nanoseconds for 2000 transition firings.
Figs. 3.a to 3.e show the estimated execution time compared with the measured execution time for the two algorithms and for several kinds of nets (SEQ, PAR, PR5, SQUARE, PH) and sizes. Fig. 3.a shows the results for 100 SEQ nets, varying the number of places from 1 to 100. Fig. 3.b shows the results for PAR nets of 1 to 100 processes.

[Fig. 2. Petri Nets Library. Panels: a) SEQ, b) PAR, c) PR1, d) DB, e) P1R, f) PH, g) SQUARE.]

[Fig. 3. Petri Net tests with ET and SRP. Vertical axis in all panels: time in ns for 2000 firings. Panels a to e, real execution and estimated computation time: a) SEQ nets (parameter ne), b) PAR nets (parameter p), c) PR5 nets (parameter p), d) SQUARE nets (parameter p), e) PH nets (parameter f). Panels f and g, ET overload for SRP estimation: f) PAR nets (parameter p), g) SQUARE nets (parameter p). Panels h and i, SRP overload for ET estimation: h) SEQ nets (parameter ne), i) SQUARE nets (parameter p). SRPreal and ETreal are the actual execution times of the algorithms, and SRPest and ETest the estimated ones. In panels f and g, the plot ET is the execution time of the algorithm without performing the estimation of SRP, and ETestSRP the execution time with the estimation. In panels h and i, the plot SRP is the execution time of the algorithm without performing the estimation of ET, and SRPestET the execution time with the estimation.]

Fig. 3.c shows the results for 35 PR5 nets, varying the number of processes from 5 to 39. Fig. 3.d shows the results for 10 SQUARE nets, varying the number of processes from 5 to 15. Fig. 3.e shows the results for 36 PH nets, varying the number of philosophers from 5 to 40. In all the tests performed, the estimation was sufficiently precise.

On the other hand, Fig. 3.f to Fig. 3.i show four examples of the overload introduced by computing the estimation of the alternative algorithm execution time. In all the tests performed, the overload is minimal.

5.3 Real Time Execution of ETC

The experiments shown were carried out on a net comprising a SQUARE of 8 resources and a PAR of 20 processes. In this net, if events reach the SQUARE and PAR parts in similar amounts, the SRP and ET algorithms display the same computation time. If more events reach the SQUARE part of the net, the algorithm with the best behavior is SRP (see Fig. 3.d). If more events reach the PAR part, the algorithm with the best behavior is ET (see Fig. 3.b).

The ETC is implemented in a high-priority periodic real-time thread with a period of 20 ms. Fig. 4 shows a test in which events only reach the SQUARE part for 20 cycles (the best algorithm will be SRP) and, in the following 60 cycles, events only reach the PAR part of the net (the best algorithm will be ET). Fig. 4.a shows the execution of the ETC, the estimation for the algorithm that is being executed at the time (ESTsame) and for the other algorithm (ESTother). The cost function integral I(k) (Integral+ in the figure) is calculated from the difference between the two estimations.

[Fig. 4. Execution of ETC, SQUARE8PAR20 net. Panels: a) real-time execution and estimation (cycle time in nanoseconds; plots ETC, ESTsame, ESTother, Integral+); b) real-time execution of ETC, SRP and ET (cycle time in nanoseconds; algorithm changes ET to SRP and SRP to ET are marked); c) time computing (time in nanoseconds; plots integsrp, integet).]

As has been stated, the change takes place when I(k) is greater than half the current cycle time of the executed algorithm. Fig. 4.b shows the execution of the ETC, compared with the actual execution of SRP and ET when faced with the same sequence of events.

[Fig. 5. Execution of ETC, SQUARE8PAR20 net. Panels: a) real-time execution and estimation (plots ETC, ESTsame, ESTother, Integral; algorithm changes SRP to ET and ET to SRP are marked); b) real-time execution of ETC, SRP and ET; c) time computing (plots integsrp, integet).]

Fig. 5 shows a test in which more events reach the PAR part for the first 1.8 seconds (the best algorithm will be ET); the events that reach the SQUARE part are gradually increased, and between 1.8 and 2 seconds the two algorithms behave practically identically. From second 2 onwards, more events reach the SQUARE part (the best algorithm will be SRP).

Fig. 6 shows a test in which events only reach the PAR part of the net during odd cycles (best algorithm ET) and only reach the SQUARE part during even cycles (best algorithm SRP). As can be observed, the ETC does not make any algorithm change, because I(k) (Integral+ in the figure) never reaches the change value; the ETC keeps executing the SRP algorithm, which is the one with the shortest mean computation time.

[Fig. 6. Execution of ETC, SQUARE8PAR20 net. Panels: a) real-time execution and estimation (plots ETC, ESTsame, ESTother, Integral); b) real-time execution of ETC, SRP and ET; c) time computing (plots intesrp, inteet).]

Fig. 4.c, Fig. 5.c and Fig. 6.c show the real error integral between SRP and ETC (intesrp), and the real error integral between ET and ETC (inteet). When these graphs are positive, they show the computation time gained with respect to the two algorithms; when they are negative, they show the slight overload that the ETC introduces with respect to the best algorithm.

6. CONCLUSIONS

In this work we have developed a technique which allows the choice in real time of the most suitable algorithm to execute a Petri Net in accordance with its structure and the sequence of events. The execution of a Petri Net without a suitable algorithm can lead to significant increases in computation time, together with a slower and less satisfactory response in control applications.

The ETC has been tested with all the nets in the PN library, with a high success rate in choosing the best algorithm. The execution of the ETC can lead to large savings in computation time: in a SQUARE net with 15 resources, the saving would amount to 43% (see Fig. 3.d); in a PH net with 40 philosophers, the saving would amount to 77% (see Fig. 3.e). The success rate is lower in nets in which the SRP and ET computation times are very similar, such as PAR nets with 5 to 15 processes.

The ETC enables the algorithm that executes a Petri Net fastest to be chosen in real time, thus leading to faster reaction of Petri Net-based control systems. One application of the technique could be minimization of the execution time of programs written in Grafcet, by minimizing the Programmable Logic Controller cycle time.

Future work will examine the following:
- Incorporation of new Petri Net execution algorithms into the ETC.
- Improvement of the real-time estimation of algorithm computation time.

We also hope to systematize the use of the reachability graph so as to choose the most suitable algorithm to implement a Petri Net.

ACKNOWLEDGMENTS

This work was funded by the NERO project DPI of the Spanish Ministry of Science and Technology.

Modeling and Verification of Master/Slave Clock Synchronization Using Hybrid Automata and Model-Checking

Guillermo Rodríguez-Navas, Julián Proenza
SRV, Dept. de Matemàtiques i Informàtica, Universitat de les Illes Balears, Spain
guillermo.rodriguez-navas@uib.es, julian.proenza@uib.es

Hans Hansson
Mälardalen Real-Time Research Centre, Dept. of Computer Science and Electronics, Mälardalen University, Sweden
hans.hansson@mdh.se

Abstract

An accurate and reliable clock synchronization mechanism is a basic requirement for the correctness of many safety-critical systems. Establishing the correctness of such mechanisms is thus imperative. This paper addresses the modeling and formal verification of a specific fault-tolerant master/slave clock synchronization system for the Controller Area Network. It is shown that this system may be modeled with hybrid automata in a very natural way. However, the verification of the resulting hybrid automata is intractable, since the modeling requires variables that are dependent. This particularity forced us to develop some modeling techniques by which we translate the hybrid automata into single-rate timed automata verifiable with the model-checker UPPAAL. These techniques are described and illustrated by means of a simple example.

I. INTRODUCTION

This paper addresses the formal verification of a specific solution for fault-tolerant clock synchronization over the Controller Area Network (CAN) fieldbus [1]. This solution is called OCS-CAN, which stands for Orthogonal Clock Subsystem for the Controller Area Network [2], [3]. The aim of this formal verification is to use model checking in order to determine whether the designed fault tolerance mechanisms guarantee the desired precision in the presence of potential channel and node faults.

OCS-CAN can be naturally described with the formalism of hybrid automata [4] by assuming that clocks are continuous variables.
Unfortunately, the resulting automata cannot be directly verified with model checking. The main difficulties are caused by two specific characteristics of the adopted clock synchronization algorithm: the existence of clocks of various rates, and the fact that neither the rates nor the values of the clocks are independent. Without the second characteristic, the first one would not be a real problem. It is known that a system with clocks of different rates, also known as a multirate clock system, can be translated into verifiable single-rate timed automata as long as the rates of the clocks are independent [5], [6]. But the second characteristic (the lack of independence) poses a real challenge to model checking, as it relates to a more general issue in the field of hybrid systems: the undecidability of the reachability problem in hybrid automata where variables are not decoupled [4], also called non-rectangular hybrid automata.

Despite this limitation, we were able to translate our non-rectangular hybrid automata into a network of timed automata verifiable with UPPAAL [7], and thus model check the precision guaranteed by OCS-CAN, as shown in [3], [8]. The essence of this translation is twofold: 1) the behavior of the system is expressed over a single timeline, and 2) the lack of precision (the offset) between the clocks is converted into the corresponding delays over that timeline. The techniques developed to perform these tasks, which are closely related to the notion of perturbed timed automata [6], are discussed in this paper.

The contribution of this paper is relevant in several respects. First, it concerns the application of model checking to a realistic, and relatively complex, system.
Second, it addresses a very important topic in the context of dependable embedded systems, the formal verification of clock synchronization, and proposes a novel approach since, to the best of the authors' knowledge, model checking has not previously been applied to master/slave clock synchronization. Third, it shows that despite the theoretical limitation of verifying non-rectangular hybrid automata, the model of OCS-CAN can be translated into timed automata to allow model checking of certain properties. The discussed translation techniques may inspire other researchers willing to model check hybrid systems with dependent variables.

The rest of the paper is organized as follows. Sect. II introduces the notion of perturbed timed automaton and relates it to the problem of clock synchronization. In Sect. III, the main characteristics of OCS-CAN are discussed, paying special attention to the properties of its clock synchronization algorithm. In Sect. IV, the basic notation of OCS-CAN is

defined, and the aim of the formal verification is stated in terms of this notation. Sect. V describes the modeling of OCS-CAN as a network of non-rectangular hybrid automata. In Sect. VI, the translation of such hybrid automata into a network of timed automata verifiable with UPPAAL is addressed. Some verification results are presented in Sect. VII, whereas Sect. VIII summarizes the paper.

Fig. 1. An example of two perturbed timed automata

Fig. 2. An external observer to check precision between clock x1 and clock x2

II. PERTURBED TIMED AUTOMATA

Timed automata are, in principle, a very useful formalism to model systems with clocks. However, timed automata exhibit an important limitation: although they allow definition of multiple clocks, all clocks must evolve at the same pace [9]. This represents a limitation because real systems often work with drifting clocks, i.e. clocks that evolve at slightly different rates, and therefore such systems cannot be directly modeled as timed automata.

This limitation may, however, be overcome by adopting certain modeling techniques. One such technique, known as perturbed timed automata [6], proposes to move the uncertainty caused by the drifting clocks into the guards and invariants of the automata. A similar technique is also used in [5].

The usefulness of perturbed timed automata is illustrated by the example in Fig. 1. This example shows two automata which exhibit the same behavior: they both use a clock (x1 and x2, respectively) to trigger a periodical action (signaled through channel a1 and a2, respectively), with period R. Both clocks are assumed to start simultaneously and to have the same maximum drift (ρ) with respect to real time. Due to this drift, they do not trigger the periodical action at an exact point in time; they may trigger it anywhere within the time interval [R − ρR, R + ρR], as defined by the guard and invariant expressions.
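The effect of moving the drift into guards and invariants can be illustrated with a small calculation. The following Python sketch is not taken from the paper: it assumes that each automaton resets its clock after every action, so that the k-th action of either automaton occurs within [k(R − ρR), k(R + ρR)], and the function names are hypothetical.

```python
def trigger_window(k, R, rho):
    """Earliest and latest real time of the k-th periodic action (k >= 1).

    Assumption: every period lasts between R - rho*R and R + rho*R and
    the local clock is reset after each action.
    """
    return k * (R - rho * R), k * (R + rho * R)


def worst_case_gap(k, R, rho):
    """Largest possible distance between the k-th actions of two such automata.

    One clock runs as fast as allowed, the other as slow as allowed.
    """
    earliest, latest = trigger_window(k, R, rho)
    return latest - earliest


if __name__ == "__main__":
    R, rho = 1000.0, 0.001  # illustrative values only
    for k in (1, 10, 100):
        print(k, trigger_window(k, R, rho), worst_case_gap(k, R, rho))
```

Under these assumptions the worst-case gap equals 2kρR, so it grows with every period as long as the clocks are never corrected.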
When using such a model, the lack of synchronism between the clocks can be easily checked by an external observer, which just measures the time elapsed between the signaling over channel a1 and the signaling over channel a2. This observer is depicted in Fig. 2. Notice that location Failure can only be reached when one of the automata has performed the periodical signaling at least Π time units later than the other one. Assuming that exceeding such a threshold is undesirable for some reason, the following safety property should be defined for the system: A[] not Observer.Failure, stating that location Failure should never be reached.

Note that according to the automata of Fig. 1, the location Failure is reachable, regardless of the value of Π, because the clocks are never resynchronized. Therefore, behaviors in which they continuously diverge are possible. This perfectly matches the behavior of a real system with unsynchronized clocks.

Nevertheless, the aim of this work is to model check the clock error of a system (OCS-CAN) where clock resynchronization is periodically performed, and where the effect of resynchronization is to dynamically change the values and drifts of the clocks. For instance, we wish to specify actions such as x2 := x1, which means that clock x2 takes the value of clock x1 (i.e. x2 synchronizes to x1). This requires more complex modeling than just perturbed automata. The techniques we have developed for this modeling are described in Sect. VI.

Fig. 3. Architecture of an OCS-CAN system

III. SYSTEM UNDER VERIFICATION

OCS-CAN is designed to be incorporated into a CAN-based distributed embedded system. The role of OCS-CAN within such a system is to provide a common time view, which the processors of the nodes can rely on in order to perform coordinated actions [2], [3].

A. Architecture of OCS-CAN

OCS-CAN is made up of a set of specifically designed hardware components, named clock units, which are interconnected through a CAN bus.
When OCS-CAN is used, a clock unit is attached to every node of the system, as depicted in Fig. 3, along with the processor and the fieldbus controller (FC). Notice that the clock unit has its own connection to the CAN bus. The clock unit is provided with a discrete counter, the so-called virtual clock, which is intended to measure real time. The clock units execute a master/slave clock synchronization algorithm, which aims at keeping all virtual clocks within a given interval of tolerance, which is called precision. In

principle, only one of the clock units (the master) is allowed to spread its time view, and the rest of the clock units (the slaves) synchronize to this time view.

In order to spread its time view, the master periodically broadcasts a specific message, which is called the Time Message (TM). Fig. 4 shows the transmission pattern of the TM when the resynchronization period is R time units. The function of the TM is twofold: it signals the resynchronization event, which coincides with the first bit (the Start of Frame bit) of the TM, and it also contains a timestamp that indicates the occurrence time of that event. This is depicted in Fig. 5. Thanks to this timestamp mechanism, after receiving the TM, every slave can adjust the value and the rate of its virtual clock to take the value and the rate of the master's virtual clock [2].

Fig. 4. Transmission pattern of the TM in the absence of faults

Fig. 5. The Time Message contains a timestamp of the Start of Frame bit (frame fields shown: SOF, data field carrying t_reference)

Fig. 6. Order of events within a synchronization round

B. Fault Tolerance Issues

Concerning the fault model, it is important to remark that the failure semantics of the clock unit is restricted to crash failure semantics by means of internal duplication with comparison.

With respect to channel faults, OCS-CAN assumes that the CAN bus provides timely service but neither reliability nor data consistency. This means that a TM broadcast by a master clock unit at time t is expected to be delivered to some clock unit within the interval (t, t + wcrt] or not delivered at all, where wcrt is the worst-case response time of the message [10]. Both inconsistent duplicates and inconsistent omissions of the TM, as defined in [11], [12], may occur. Permanent failures of the bus, such as bus partition or stuck-at-dominant failures, are not addressed by OCS-CAN.
In order to provide tolerance to faults of the master, OCS-CAN defines a number of backup masters, one of which should take over upon failure of the active master. The mechanism for master replacement assumes that masters are organized hierarchically. The priority of a master is defined with two parameters. The first parameter is the identifier of the TM broadcast by the master; following the common convention in CAN, a lower identifier implies higher priority. The second parameter is the release time of the TM, which for every round indicates to every master when it is allowed to broadcast its corresponding TM. The release time of master m in resynchronization round k is calculated as follows:

$T_{rls_m} = k \cdot R + \Delta_m$

where R is the resynchronization period (the same for all masters) and $\Delta_m$ (the release delay) is a small delay in the order of a few ms whose length is inversely proportional to the priority of the master.

The release time, combined with the assignment of identifiers discussed above, must guarantee that in a round, a master may broadcast its TM before a master of higher priority only if the latter is faulty. This is depicted in Fig. 6 for the case of three masters. In the absence of faults, the second and third TMs are usually not broadcast, and if any of them is broadcast (for instance because one backup master could not abort a just-requested TM broadcast in time) then it is ignored by the slaves. The spare TMs are only taken into account if master 0 fails and is not able to broadcast its TM. Thanks to the master redundancy, in such a situation the system will recover after a very short delay.

Nevertheless, in a CAN network it may happen that a message is not consistently received by all the nodes, as discussed in [11], [12]. In such cases, the clock units might not receive a TM to synchronize with or, even worse, different clock units may synchronize in the same round to TMs broadcast by different masters.
These scenarios, although rather unlikely, may jeopardize clock synchronization and should be carefully studied.

A fundamental property of the CAN protocol states that, regardless of being consistent or not, a CAN broadcast always finishes within a bounded time interval, so the worst-case response time of any broadcast can be calculated, as discussed in [10]. In OCS-CAN this property implies that whenever a master clock unit requests a TM broadcast, this request either causes a reception of the TM in some other clock units within wcrt time units, or it does not cause any reception at all.

This property also means that, for every resynchronization round, receptions of the TM may only happen within a bounded temporal interval. This is shown in Fig. 6 by means of a shaded window, which is called TMdelay. In an OCS-CAN system, the length of TMdelay is equal to $\Delta_l + wcrt_l$, where l is the master of lowest priority in the system. Since clock synchronization may only happen after reception of a TM, this implies that the maximum distance between two consecutive synchronizations of a clock unit is Rmax = R + TMdelay. Although it is not properly represented in Fig. 6, R is always much greater than TMdelay.
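The timing quantities of this subsection can be put together in a few lines. The following Python sketch only restates the formulas above; the numeric values (a period of 1000 time units, release delays of 0, 2 and 4, and a worst-case response time of 1) and the function names are illustrative assumptions, not parameters of OCS-CAN.

```python
def release_time(k, R, delta_m):
    """Release time of a master in round k: k*R plus its release delay."""
    return k * R + delta_m


def tm_delay(delta_lowest, wcrt_lowest):
    """Length of the TMdelay window, set by the lowest-priority master l."""
    return delta_lowest + wcrt_lowest


def r_max(R, delta_lowest, wcrt_lowest):
    """Maximum distance between two consecutive synchronizations."""
    return R + tm_delay(delta_lowest, wcrt_lowest)


if __name__ == "__main__":
    R = 1000                        # resynchronization period (assumed)
    release_delays = [0, 2, 4]      # masters 0, 1, 2; master 0 has highest priority
    wcrt = 1                        # worst-case response time (assumed equal for all)
    k = 3
    print([release_time(k, R, d) for d in release_delays])
    print(tm_delay(release_delays[-1], wcrt))
    print(r_max(R, release_delays[-1], wcrt))
```

With these values the masters are released in priority order inside round 3, and Rmax exceeds R only by the short TMdelay window, which is consistent with R being much greater than TMdelay.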

C. Master and Slave Finite State Machines

This section describes the algorithms executed by the clock units, as they are fundamental to understanding the model used for formal verification. Every clock unit may behave either as a master or as a slave. A non-faulty master clock unit executes the finite state machine in Fig. 7, whereas a non-faulty slave clock unit executes the finite state machine in Fig. 8. Both algorithms are built upon five primitives: TM.Request, TM.Indication, TM.Confirm, TM.Abort and Sync.

TM.Request. A master executes TM.Request to broadcast its TM as soon as it reaches the corresponding release time. This primitive is denoted TM.Req(n), where n is the identifier of the TM broadcast. Further information about the low-level actions triggered by TM.Req, such as timestamping, is available in [3].

TM.Indication. This primitive is executed when a TM is received. It is denoted TM.Ind(n), where n indicates the identifier of the received TM. Every master compares the value of n with its own identifier (m) to determine whether this TM comes from a higher priority master (case n < m) or not. Masters may only synchronize to masters of higher priority.

TM.Confirm. This primitive indicates to the transmitting master that a previously requested TM broadcast has been successful. It is denoted TM.Conf(n), where n indicates the identifier of the successfully broadcast TM.

TM.Abort. A master uses this primitive to abort the broadcast of a TM whose transmission was previously requested. It is denoted TM.Abort(n), where n is the identifier of the TM to be aborted. This action is caused by the reception of a higher priority TM, and it has some associated latency, so the TM broadcast may not be aborted in time.

Sync. This primitive is executed by any clock unit (either master or slave) that receives a valid TM and wants to adjust its own virtual clock to the value conveyed by the TM.
For the slaves, a valid TM is the first TM received in any resynchronization round (first TM.Ind(n)). For the masters, a valid TM is the first TM of higher priority received in any resynchronization round (the first TM.Ind(n) with n < m), provided that the master did not successfully broadcast its own TM in that round. This primitive is denoted Sync(n, a), where a indicates the clock unit that is adjusting its virtual clock, and n is the identifier of the TM which clock unit a is using as a reference.

Concerning the Sync primitive, it is important to remark that the clock adjustment can never be exact. Even with the very accurate timestamping mechanism of OCS-CAN [3], a certain imprecision remains, for instance due to small system latencies or to fixed-point arithmetic.

Note that a clock unit can only synchronize once per round. This is ensured by entering a waiting state after execution of the Sync primitive, in which further receptions of the TM are ignored. Given that $R/2 > TMdelay$ (as already indicated in Sect. III-B), we ensure that TM duplicates and non-aborted TMs cannot cause duplicated resynchronizations.

Fig. 8. Behavior of a non-faulty slave s

IV. AIM OF OUR FORMAL VERIFICATION

In this section, the basic notions of OCS-CAN, such as clock unit or virtual clock, are formally defined. These definitions are especially useful for describing the aim of our formal verification, which is to model check the precision guaranteed by OCS-CAN under diverse fault assumptions.

A. Basic Definitions

The synchronization algorithm is characterized by the resynchronization period R and two parameters, $\varepsilon_0$ and $\gamma_0$, which indicate the "quality" of the mechanism for clock adjustment. The failure assumptions are defined with two values, OD (the omission degree) and CD (the crash degree), which indicate the maximum number of consecutive rounds affected by inconsistent message omissions and the maximum number of faulty masters, respectively.

Definition 1. An OCS-CAN system is a set $OCSS = \{A, R, \varepsilon_0, \gamma_0, OD, CD\}$ such that:
- $A$ is a set of clock units.
- $R \in \mathbb{R}^+$ is the resynchronization period of the clock synchronization algorithm.
- $\varepsilon_0 \in \mathbb{R}^+$ is the maximum offset error after synchronization.
- $\gamma_0 \in \mathbb{R}^+$, $\gamma_0 \ll 1$, is the maximum drift error after synchronization.
- $OD \in \mathbb{N}$ is the omission degree.
- $CD \in \mathbb{N}$ is the crash degree.

In an OCS-CAN system, the state of a clock unit is defined at any instant by the three following variables: the value of its virtual clock $vc(t)$, the rate of its virtual clock $\dot{vc}(t)$, and its operational state $f(t)$. Furthermore, every clock unit is characterized by the following three additional parameters, which indicate how the clock unit executes the clock synchronization algorithm: the relative priority ($p$) of the TM that the clock unit broadcasts, the release delay ($\Delta$) of the TM that the clock unit broadcasts, and the worst-case response time ($wcrt$) of the TM that the clock unit broadcasts.

Definition 2. A clock unit $a \in A$ is a 6-tuple $a = (vc_a, \dot{vc}_a, f_a, p_a, \Delta_a, wcrt_a)$ such that:

- $vc_a(t) \in \mathbb{R}^+$ is the value of the virtual clock of a at time t, $t \in \mathbb{R}^+$.
- $\dot{vc}_a(t) \in \mathbb{R}^+$ is the instantaneous rate (or speed) of the virtual clock of a at time t, $t \in \mathbb{R}^+$.
- $f_a : \mathbb{R}^+ \to \{0, 1\}$ is the operational state of clock unit a. $f_a(t) = 1$ when a is faulty at time t, otherwise $f_a(t) = 0$.
- $p_a \in \mathbb{N}$ is the relative priority of the TM that clock unit a broadcasts, where $p_a = 0$ means that the clock unit never broadcasts the TM.
- $\Delta_a \in \mathbb{R}^+$ is the release delay of the TM that clock unit a broadcasts.
- $wcrt_a \in \mathbb{R}^+$ is the worst-case response time of the TM that clock unit a broadcasts.

Fig. 7. Behavior of a non-faulty master m

Note that although the virtual clock of a clock unit is actually implemented as a discrete counter, and therefore may only take values over $\mathbb{N}$, we define it over $\mathbb{R}^+$ for compatibility with the definition of time in timed automata. Also note that the values of $\Delta_a$ and $wcrt_a$ are irrelevant for slaves.

B. Offset and Precision

In OCS-CAN, each clock unit supplies its corresponding processor with a local view of real time. Therefore, the consistency in the perception of time depends on the difference (or offset) exhibited by the virtual clocks.

Definition 3. Let A be a set of clock units. The maximum offset of set A at time t is $\Phi_A(t) = \max_{a,b \in A} \{\Phi_{ab}(t)\}$, where $\Phi_{ab}(t) = |vc_a(t) - vc_b(t)|$ is the offset between clock units $a, b \in A$ at time t.

When the maximum offset between the clock units is always bounded, the OCS-CAN system is said to be synchronized.

Definition 4. An OCS-CAN system is Π-synchronized when there exists a constant $\Pi \in \mathbb{R}^+$, which is called the precision, such that $\Phi_A(t) \le \Pi$, $\forall t \in \mathbb{R}^+$.

The extent to which the system is synchronized depends on the value of Π. The lower the value of Π, the higher the achieved precision.

Last, we define the concept of consonance between two clock units, as this concept turns out to be very important when modeling drifting clocks.

Definition 5. Let $a, b \in A$ be two clock units.
The consonance between them at time t is: $\gamma_{ab}(t) = |\dot{vc}_a(t) - \dot{vc}_b(t)|$

V. MODELING OCS-CAN AS A NETWORK OF HYBRID AUTOMATA

The first step of model checking is to specify a formal model of the system under verification. Whenever a system combines both continuous components, which evolve over time as expressed by a differential equation, and discrete components, expressed by finite state machines, hybrid automata are very suitable for the modeling [13]. This is the case of OCS-CAN, since the virtual clocks can be easily modeled as continuous variables that are modified by the (discrete) synchronization actions performed by the clock units.

In this section, we discuss how the behavior of OCS-CAN can be specified by means of hybrid automata. It is shown that the resulting model includes variables that are not independent. Although this characteristic makes, in principle, the verification of our model by model checking unfeasible, in Sect. VI we show that the model can still be translated into timed automata. Thanks to this, some safety properties, such as the guaranteed precision, can be verified.

A. Channel Abstraction

The communication channel is abstracted by means of an additional process channel_control, together with a global variable msg_id and a broadcast channel [7] called

tx_msg. This abstraction is shown in Fig. 9. The function of the automaton channel_control is to enforce the worst-case response time of the TM broadcasts. A full description of channel_control is available in [3].

Fig. 9. Abstraction of the communication channel in OCS-CAN

The variable msg_id represents the identifier of the TM being broadcast. TM.Request is modeled as a write operation over msg_id, and CAN arbitration [1] is modeled by allowing the masters to overwrite the value of msg_id only when they have higher priority than the TM being transmitted. Therefore, TM.Req(m) is modeled with the following assignment: msg_id := min{m, msg_id}. However, for compatibility with the UPPAAL model checker, we hereafter use the equivalent C-like assignment msg_id := m <? msg_id.

The broadcast channel tx_msg is used by channel_control to signal the instant at which the TM is delivered. Therefore, the TM.Confirm and TM.Indication primitives are both signaled through tx_msg. For a master, a signaling through tx_msg is a TM.Confirm if the value of msg_id is equal to the identifier written by the master. Otherwise, it is a TM.Indication.

B. Abstraction of Clock Correction

In [2], we provide some details about the way virtual clocks are corrected (or adjusted) in OCS-CAN, and we highlight that clock correction is never performed immediately, but is carried out gradually. This is called clock amortization.

Nevertheless, for the purpose of modeling and formal verification, we assume instantaneous clock correction instead of clock amortization. We make this abstraction because including clock amortization would add unnecessary complexity to the model. We are interested in assessing the maximum error (the achievable precision) between virtual clocks, and to do this we only have to examine the value of the virtual clocks a long time after the last resynchronization action. At these time instants, and provided that clock amortization is properly implemented [14], there is no difference between considering instantaneous clock correction and considering clock amortization.

When instantaneous clock correction is assumed, executing the Sync(n,a) primitive is equivalent to assigning the value and the rate of the virtual clock of master n to the virtual clock of the synchronizing clock unit a. Since this assignment is never exact, the value and the rate assigned are always within an error interval. The width of this interval is determined by the maximum offset error $\varepsilon_0$, in the case of clock value assignments, and by the maximum drift error $\gamma_0$, in the case of clock rate assignments.

Since the Sync(n,a) primitive may cause discontinuities of the virtual clock values as well as discontinuities of the virtual clock rates, it makes sense to define the following notation:

$f(t_0^+) = \lim_{t \to t_0^+} f(t)$

$f(t_0^-) = \lim_{t \to t_0^-} f(t)$

After that, the points of discontinuity can be characterized.

Definition 6. Let $a \in A$ be a clock unit and $m \in M$ be a master. Then both $vc_a(t)$ and $\dot{vc}_a(t)$ are piecewise linear functions such that:
- $\dot{vc}_a(t^+) \in B(\dot{vc}_m(t^-), \gamma_0)$ when clock unit a executes Sync(m, a) at time t.
- $vc_a(t^+) \in B(vc_m(t^-), \varepsilon_0)$ when clock unit a executes Sync(m, a) at time t.
where $B(x, \varepsilon) = [x - \varepsilon, x + \varepsilon]$.

Remark 1. Let $m \in M$ be a master and $a \in A$ be a clock unit. If clock unit a executes Sync(m,a) at time t then $\Phi_{ma}(t^+) \le \varepsilon_0$ and $\gamma_{ma}(t^+) \le \gamma_0$.

C. Master and Slave Hybrid Automata

When using the discussed abstractions for the communication channel and for clock correction, the hybrid automaton of a master corresponds to the one in Fig. 10. Notice that in the transitions where the Sync(n, a) primitive should be executed, which were described in Sect. III-C, this model includes assignments to the virtual clock's value and to the virtual clock's rate, as specified in Sect. V-B (Definition 6).
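The assignments of Definition 6, together with the offset and consonance of Definitions 3 and 5, can be sketched in Python as follows. This is only an illustration under the instantaneous-correction abstraction: the class and function names are hypothetical, and drawing the assignment error uniformly at random is an assumption made here to pick some point of the interval B, not something the model prescribes.

```python
import random
from dataclasses import dataclass


@dataclass
class VirtualClock:
    value: float  # vc(t): current value of the virtual clock
    rate: float   # rate of the virtual clock at time t


def offset(a, b):
    """Offset between two virtual clocks (Definition 3)."""
    return abs(a.value - b.value)


def consonance(a, b):
    """Consonance between two virtual clocks (Definition 5)."""
    return abs(a.rate - b.rate)


def sync(unit, master, eps0, gamma0, rng=random):
    """Instantaneous correction: unit takes the master's value and rate,
    each within the error interval B(x, eps) = [x - eps, x + eps]."""
    unit.value = master.value + rng.uniform(-eps0, eps0)
    unit.rate = master.rate + rng.uniform(-gamma0, gamma0)


if __name__ == "__main__":
    master = VirtualClock(value=1000.0, rate=1.0)
    slave = VirtualClock(value=990.0, rate=1.0002)
    sync(slave, master, eps0=0.5, gamma0=1e-5)
    print(offset(master, slave), consonance(master, slave))
```

Right after the call, the printed offset and consonance stay within ε0 and γ0 respectively (up to floating-point rounding), which is the content of Remark 1.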
Particularly, these assignments occur in the transitions from location 1 to location 4 and from location 2 to the committed location right before locations 3 and 4.

Furthermore, this automaton models three additional characteristics of OCS-CAN masters: the inconsistent reception of the TM, the possible non-abortion of the TM, and the possibility of master crash. A full description of these characteristics can be found in [3].

Inconsistent receptions of the TM are modeled at the receiver's side, by ignoring TM.Indications. For this reason, in locations 1 and 2, it is possible that a transition fired by a valid TM (tx_msg? with msg_id < m) does not cause any modification of the virtual clock.

When describing the management of the TM in Sect. III-C, it was mentioned that a TM broadcast may not be aborted in time. This is modeled with a committed location, between locations 2 and 3, which is reached when the master has performed a TM.Request but receives a TM.Indication of a higher priority master. From this location, the master may either overwrite the variable msg_id again or not. The first behavior represents a non-aborted message.

The master hybrid automaton includes a location that represents the crash failure (location 5). Notice that a master may nondeterministically step into this location as long as there is another non-faulty master in the system (condition nalive > 1).
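The msg_id arbitration that the master automaton relies on (Sect. V-A) can be sketched as follows. This Python fragment is an illustration rather than a transcription of the UPPAAL model: the constant IDLE, used as the value of msg_id when no TM is pending, and the function names are assumptions introduced here.

```python
IDLE = 10**9  # hypothetical "no TM pending" value, larger than any identifier


def tm_request(msg_id, m):
    """TM.Req(m): write m only if it has higher priority (lower identifier).

    Mirrors the UPPAAL assignment msg_id := m <? msg_id.
    """
    return min(m, msg_id)


def classify_delivery(msg_id, m, requested):
    """How master m interprets a signaling through tx_msg."""
    if requested and msg_id == m:
        return "TM.Confirm"
    return "TM.Indication"


if __name__ == "__main__":
    msg_id = IDLE
    for m in (2, 0, 1):  # masters request in an arbitrary order
        msg_id = tm_request(msg_id, m)
    for m in (0, 1, 2):
        print(m, classify_delivery(msg_id, m, requested=True))
```

In this run the identifier of master 0 wins the arbitration regardless of the request order, so only master 0 sees a TM.Confirm and the backup masters see a TM.Indication of a higher priority TM.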

Fig. 10. Hybrid automaton of master m

Fig. 11. Hybrid automaton of slave s

The hybrid automaton of a slave is depicted in Fig. 11. This automaton also models the synchronization as an assignment to the virtual clock's value and to the virtual clock's rate, according to Definition 6. The possibility of inconsistent receptions of the TM is modeled by transitions that are fired by TM.Indications but do not cause any clock correction. Crash failures are not modeled for slaves, as such failures do not have any consequence for the rest of the system.

VI. TRANSLATING THE MODEL INTO TIMED AUTOMATA

As discussed in Sect. IV, the aim of our formal verification is to determine whether or not an OCS-CAN system is Π-synchronized under certain fault hypotheses. This formal verification is addressed by translating our hybrid automata into a network of timed automata verifiable with UPPAAL. The main challenge of such a translation is that, as a consequence of the Sync actions, clock and rate assignments exist. Although these assignments cannot be directly specified in timed automata notation, we circumvent this limitation in the following way: 1) the behavior of the system is expressed over a single timeline, and 2) the offset between the virtual clocks is converted into delays over that timeline.

Therefore, the first step is to decide what this single timeline represents. In our model, time corresponds to the clock of the highest priority master, which is called the reference clock hereafter. For the rest of the clock units, we use the consonance ($\gamma_i$) with respect to this clock in order to calculate the delays over the reference timeline.

Furthermore, four additional aspects need special consideration:
- The instant when the offset is to be checked has to be properly defined. This instant is called the observance instant.
- Updates of the value of a virtual clock, as defined in the equations of Sect. V-B, must be modeled.
- Updates of the rate of a virtual clock need to be modeled as well. In particular, it is important to model how a rate change may affect the consonance with respect to the reference clock.
- The model must include changes of the reference clock when the master of highest priority crashes.

In the following, these aspects are described in detail. A model of a simplified OCS-CAN system, which is made up of two masters and an arbitrary number of slaves, is used to illustrate the main points. The failure assumptions of OCS-CAN are not modeled here, in order to reduce the complexity of the model and aid the reader's understanding. The complete UPPAAL model can be found in [8].

A. Definition of the Observance Instant

In order to adopt the verification technique described in Sect. II (the precision observer), the observance instant must be known a priori, and it must be signaled by all of the clock units. Since we are interested in knowing the precision of OCS-CAN, we should check it at the instant with the maximum offset. This instant must be located before a Sync(n,a) primitive because the involved clocks converge immediately after this primitive is executed. Although it is not possible to know the exact instant of execution of any Sync(n,a) primitive, it is possible to determine the maximum distance between the synchronization instants of two consecutive rounds. In Sect. III-C it was shown that the maximum distance is given by $R_{max} = R + TM_{delay}$.
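As a rough illustration of how the maximum round length bounds the offset accumulated in one round, the sketch below computes R + TMdelay and multiplies it by a relative drift rate. The constant-drift assumption, the function names, and the values chosen for TMdelay and the drift are hypothetical and are not taken from the OCS-CAN model; only R = 1 s matches the verification parameters of Sect. VII.

```python
def max_round_length(r_s: float, tm_delay_s: float) -> float:
    """Maximum distance between two consecutive synchronization instants:
    R_max = R + TM_delay (both in seconds)."""
    return r_s + tm_delay_s


def accumulated_offset(relative_drift: float, r_s: float, tm_delay_s: float) -> float:
    """Offset (in seconds) accumulated over one round of maximum length.

    Assumption (not from the paper): the two clocks drift apart at a
    constant relative rate during the whole round.
    """
    return abs(relative_drift) * max_round_length(r_s, tm_delay_s)


R = 1.0            # s, resynchronization period used in Sect. VII
TM_DELAY = 0.002   # s, hypothetical value
DRIFT = 1e-6       # hypothetical relative drift (1 ppm)

print(accumulated_offset(DRIFT, R, TM_DELAY))
```

Under these assumptions, lengthening the round by TMdelay increases the accumulated offset proportionally, which is why the observance instant is tied to the maximum round length rather than to R alone.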

Fig. 12. Virtual clock of clock unit i

Fig. 13. Precision observer

Fig. 14. Auxiliary automata

The value $R_{max}$ can be used to upper bound the offset accumulated during one synchronization round, as described next.

Fig. 12 depicts the virtual clock automaton, which models the behavior of virtual clock i. Although one virtual clock is included for each clock unit in the system, this automaton does not describe the behavior of a master or a slave; what it actually models is the passage of time as measured by clock unit i, and it represents it with the clock vc[i]. According to the value of vc[i], certain events are signaled, so that the clock units (either master or slave) can execute the clock synchronization algorithm discussed in Sect. III-C. This means that in every round, every master and slave automaton chooses which virtual clock it uses, which is equivalent to having clock assignments.

As shown in Fig. 12, the virtual clock automaton signals three events: the instant to broadcast the TM (through channel begin[i]); the observance instant; and the instant for resetting the virtual clocks (through channel end[i]). Notice that the first two events are signaled within time intervals whose lengths depend on the consonance ($\gamma_i$) with respect to the reference clock.

In the second event, the integer variable nsync is incremented. This variable is monitored by the observer depicted in Fig. 13, in order to detect the first virtual clock to reach this point. The observer resets clock watch after that event. If watch exceeds a given value Π before all nodes have incremented nsync, then location Failure is reached, expressing that the system is not Π-synchronized. Note that the observer makes use of the synchronization channel a, which is activated by the dummy automaton shown in Fig. 14.

B. Modeling of Virtual Clock Value Assignments

Once the observance instant and the precision observer have been defined, the model must ensure that, for each virtual clock, the delay in reaching this instant really corresponds to the offset between the virtual clock and the reference clock.

According to the hybrid automata of Sect. V-C, the Sync(n,a) primitive causes an update of the value of the synchronizing virtual clock. This kind of clock assignment may be indirectly modeled with the simultaneous restart of the clocks involved in the synchronization action. However, in our model virtual clocks cannot be restarted immediately after the Sync(n,a) primitive because this would interfere with the role of the observer. Instead, virtual clocks have to continue until they reach the observance instant and signal it. This forces us to delay the simultaneous restart.

In this manner, the Sync(n,a) primitive causes neither a clock assignment (which is not possible in a timed automaton) nor an immediate restart (which would make the measurement of the precision impossible). Instead, Sync(n,a) causes an assignment to a clock pointer, the variable ref_id, which is used later on to detect when to restart vc[i]. This is shown in Fig. 15 for master 2, and in Fig. 16 for slave j. In these automata, the channel abstraction of Sect. V-C, based on the variable msg_id and the channel tx_msg, is further simplified to reduce the complexity of the automata and improve legibility. In fact, Sync(1,j) is signaled through channel s1 whereas Sync(2,j) is signaled through channel s2.

In both automata it can be observed that vc[i] is restarted when the corresponding virtual clock automaton signals through channel end[ref_id] that the pointed clock has reached a certain value $R_1 = R/2$ (third event in Fig. 12). This modeling technique guarantees that all the clocks that have synchronized to the same master are restarted simultaneously, thus fulfilling Remark 1 in Sect. V-B. In contrast, whenever two clocks do not synchronize to the same master, the offset that these two clocks have accumulated in the round is kept for the next round.

Channel all_end, which appears in the automaton of Fig. 12, is used in order to avoid violating time invariants. The left auxiliary automaton of Fig. 14 uses this channel to make every virtual clock automaton wait until all masters and slaves have reset vc[i].

C. Modeling of Virtual Clock Rate Assignments

Clock rate assignments can be easily modeled with a variable $\gamma_i$ that keeps the consonance with respect to the reference clock. This variable is used by the virtual clock automata in order to define the interval of occurrence of any relevant event.
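The behavior of the precision observer of Fig. 13 can be illustrated outside UPPAAL with a small replay function. The sketch below is only an approximation under stated assumptions: the function name, the use of integer microseconds, and the treatment of a clock unit that never signals are hypothetical choices, not part of the published model.

```python
from typing import Iterable


def observe_round(signal_times_us: Iterable[int], n_units: int, precision_us: int) -> str:
    """Replay one synchronization round as seen by the precision observer.

    signal_times_us: instants (on the reference timeline, in microseconds)
    at which each virtual clock signals the observance instant.
    precision_us: the precision bound (the value called Pi in the text).
    """
    nsync = 0            # number of clock units that have signalled so far
    watch_start = None   # instant at which the observer resets clock `watch`
    for t in sorted(signal_times_us):
        if watch_start is None:
            watch_start = t              # first unit to signal: reset `watch`
        elif t - watch_start > precision_us:
            return "Failure"             # `watch` exceeded the bound too early
        nsync += 1
    # Assumption: a unit that never signals lets `watch` grow beyond the bound.
    return "OK" if nsync == n_units else "Failure"


print(observe_round([0, 1, 2], n_units=3, precision_us=2))  # OK
print(observe_round([0, 3], n_units=2, precision_us=2))     # Failure
```

A model checker explores every possible set of signalling instants, whereas this function checks a single given round; it is meant only to make the role of nsync and watch concrete.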

TABLE I. Fault assumptions and precision guaranteed (in µs) with R = 1 s. Rows: number of channel faults (no faults and four OD values); columns: number of faulty masters. The cell values are not recoverable from the source.

Fig. 15. Automaton of master 2

Fig. 16. Automaton of slave j

In Figs. 15 and 16 it can be seen that the value of $\gamma_i$ is updated in every synchronization action. Whenever a clock unit does not synchronize to any master within a synchronization round, the value of $\gamma_i$ remains unchanged.

It is important to remark that whenever a clock unit synchronizes to a master that is not the current reference clock, the clock unit "inherits" the drift error of that master. In this case, the consonance after synchronization may be worse than before synchronization. This can be observed in one of the transitions fired by s2 in the slave automaton of Fig. 16.

D. Change of the Reference Clock due to Master Crash

Whenever the reference clock crashes, the timeline of the model needs to be redefined. Although it is not shown in the automata, this recalculation is implicitly performed if in every round the value of $\gamma_i$ is assigned as follows:
- If master i is the current reference clock: $\gamma_i := 0$.
- If clock unit i synchronizes to the current reference clock: $\gamma_i := \gamma_0$.
- If clock unit i synchronizes to a master n that is not the current reference clock: $\gamma_i := \gamma_0 + \gamma_n + \gamma_{ref}$, where $\gamma_{ref}$ is the consonance between the current reference clock and the reference clock of the previous synchronization round.

VII. SOME VERIFICATION RESULTS

By applying the transformations described above, an OCS-CAN system can be modeled as a network of timed automata and the guaranteed precision can be model checked. In a previous paper we provided some results that were achieved with the complete UPPAAL model of OCS-CAN [8]. These verification results were obtained in the following situations:
- Fault-free scenario.
- Only master faults (no channel faults).
- Only channel faults (no master faults), with and without assuming data consistency.
- Master faults and channel faults, with and without assuming data consistency.

Concerning the precision guaranteed by the clock synchronization service, Table I shows the precision that was verified under diverse fault assumptions. These results were obtained with the following parameters: N = 4 masters, R = 1 s, 0 = 0, 1 = 1 ms, 2 = 2 ms, 3 = 3 ms. Regarding the network load, it was assumed that no other messages were sent on the bus, so wcrt = 1.04 ms was used in the scenarios without channel faults whereas wcrt = 6 ms was used in the scenarios with channel faults.

The first cell in Table I shows the precision guaranteed in the fault-free scenario. This precision equals 2 µs. The first row of Table I corresponds to the scenarios in which only master faults were assumed. Note that the number of faulty masters does not significantly affect the precision guaranteed. This is due to the fact that master replacement takes place in a very short time, which is negligible compared to R.

The first column of Table I corresponds to the scenarios in which only channel faults were assumed. OD = 0 indicates that no inconsistent omissions can occur, which is a common assumption in other clock synchronization protocols for CAN.

The remaining cells in Table I correspond to the scenarios where a combination of node and channel faults is assumed. In particular, the bottom-right cell corresponds to the most severe fault scenario.

VIII. CONCLUSIONS

In this paper, the formal verification of OCS-CAN has been discussed. OCS-CAN is a solution for clock synchronization over CAN that adopts a fault-tolerant master/slave clock synchronization scheme. It has been shown that this system can be naturally described with hybrid automata, by modeling the virtual clocks as variables that evolve over time with certain rates.
An important particularity of these hybrid automata is that they are not rectangular, because of the inevitable dependencies that the clock synchronization actions cause among the clocks. This lack of independence makes, in principle, the verification of these hybrid automata intractable by model checking. However, we have shown that it is possible to

translate the hybrid automata into a network of timed automata verifiable with the UPPAAL model checker. Thanks to this, the precision guaranteed by OCS-CAN has been successfully model checked under diverse fault assumptions.

The techniques developed in order to carry out this translation have been presented and illustrated with a simple example. These techniques to some extent extend the notion of perturbed timed automata, by allowing drifting clocks whose rates may change dynamically as a consequence of discrete actions. Our modeling may be useful for other researchers who aim at model checking hybrid systems in which variables are dependent.

Acknowledgments: This work is partially supported by DPI C03-02 and FEDER funding. The authors would like to thank Mercè Llabrés and Antonio E. Teruel for their useful remarks on the mathematical notation.

REFERENCES

[1] ISO: ISO Road vehicles - Interchange of digital information - Controller area network (CAN) for high-speed communication (1993)
[2] Rodríguez-Navas, G., Bosch, J., Proenza, J.: Hardware Design of a High-precision and Fault-tolerant Clock Subsystem for CAN Networks. Proceedings of the 5th IFAC International Conference on Fieldbus Systems and their Applications (FeT 2003), Aveiro, Portugal (2003)
[3] Rodríguez-Navas, G., Proenza, J., Hansson, H.: Using UPPAAL to Model and Verify a Clock Synchronization Protocol for the Controller Area Network. Proc. of the 10th IEEE International Conference on Emerging Technologies and Factory Automation, Catania, Italy (2005)
[4] Henzinger, T.A., Kopke, P.W., Puri, A., Varaiya, P.: What's decidable about hybrid automata? Journal of Computer and System Sciences 57(1) (1998)
[5] Daws, C., Yovine, S.: Two examples of verification of multirate timed automata with KRONOS. In: Proceedings of the 16th IEEE Real-Time Systems Symposium (RTSS '95), Pisa, Italy (1995)
[6] Alur, R., Torre, S.L., Madhusudan, P.: Perturbed Timed Automata. In Morari, M., Thiele, L., eds.: 8th International Workshop, Hybrid Systems: Computation and Control, HSCC. Number 3414 in LNCS, Springer Verlag (2005)
[7] Behrmann, G., David, A., Larsen, K.G.: A tutorial on UPPAAL. In Bernardo, M., Corradini, F., eds.: Formal Methods for the Design of Real-Time Systems: 4th International School on Formal Methods for the Design of Computer, Communication, and Software Systems, SFM-RT. Number 3185 in LNCS, Springer Verlag (2004)
[8] Rodríguez-Navas, G., Proenza, J., Hansson, H.: An UPPAAL Model for Formal Verification of Master/Slave Clock Synchronization over the Controller Area Network. In: Proc. of the 6th IEEE International Workshop on Factory Communication Systems, Torino, Italy (2006)
[9] Alur, R., Madhusudan, P.: Decision problems for timed automata: A survey. In Bernardo, M., Corradini, F., eds.: Formal Methods for the Design of Real-Time Systems: 4th International School on Formal Methods for the Design of Computer, Communication, and Software Systems, SFM-RT. Number 3185 in LNCS, Springer Verlag (2004)
[10] Tindell, K., Burns, A., Wellings, A.J.: Calculating Controller Area Network (CAN) Message Response Time. Control Engineering Practice 3(8) (1995)
[11] Rufino, J., Veríssimo, P., Arroz, G., Almeida, C., Rodrigues, L.: Fault-tolerant broadcasts in CAN. Digest of papers, The 28th IEEE International Symposium on Fault-Tolerant Computing, Munich, Germany (1998)
[12] Proenza, J., Miro-Julia, J.: MajorCAN: A modification to the Controller Area Network to achieve Atomic Broadcast. IEEE Int. Workshop on Group Communication and Computations, Taipei, Taiwan (2000)
[13] Henzinger, T.A., Ho, P.-H., Wong-Toi, H.: Algorithmic analysis of nonlinear hybrid systems. IEEE Transactions on Automatic Control 43(4) (1998)
[14] Schmuck, F., Cristian, F.: Continuous clock amortization need not affect the precision of a clock synchronization algorithm. In: PODC '90: Proceedings of the ninth annual ACM symposium on Principles of distributed computing, New York, NY, USA, ACM Press (1990)

Software Modeling of Quality-Adaptable Systems

Javier F. Briones, Miguel Ángel de Miguel, Alejandro Alonso, Juan Pedro Silva
Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid

Abstract: Enclosing quality properties with software designs is typically used to improve system understanding. Nevertheless, these properties can also be used to compose systems whose quality can be adapted and to predict their behavior. Existing software modeling languages lack sufficient mechanisms to describe software elements that may offer or require several quality levels. This paper presents the concepts that such a language needs to include in order to describe quality-adaptable systems.

Index Terms: Software Modeling Language, Software Architecture, Quality of Service

I. INTRODUCTION

Dependable systems are those on which reliance can justifiably be placed on the functionality they deliver. Reliance is usually evaluated by means of non-functional parameters and encompasses measures such as reliability, availability, safety and security. As the dependence on computing systems increases, so does the likelihood of hazards and malfunctions caused by those systems. The consequences range from mere inconveniences (e.g. poor quality in a video application) to the loss of human lives.

Engineering systems to the highest reasonably practicable standards of dependability is a great challenge. To better deal with this challenge, engineers could study early sketches of the systems in an attempt to estimate their probable dependability measures; no less importantly, they could include the quality constraints identified in analysis and design documents. During the design of the system architecture these constraints can be allocated to the different subsystems, and responsibilities can thus be broken down.
Only when the functional and the non-functional characteristics of a subsystem are completely described, and a rigorous development process is put into practice, can enough confidence be placed in the future subsystems.

Sometimes it is appealing to create subsystems that adapt to their environment, e.g. when an air traffic management vendor builds a subsystem and expects to profit from subsequent sales, or when a multimedia subsystem needs to deliver different quality levels according to user wishes or environmental conditions. Adaptable subsystems can vary the set of non-functional properties they provide. This adjustment can occur at design time, as in the first example, or at run-time, as in the second one. Nevertheless, the use of non-functional properties from design documents is useful in both cases. In the first case, it helps to compose the architecture by selecting the subsystems (potentially from different vendors) that together deliver the required dependability properties. Of course, it may turn out that such properties cannot be delivered. In the second case, it allows simulating how the composition and adaptation would work, and thus detecting quality levels that cannot be achieved, or subsystems that are overused because under most circumstances they are delivering very high quality.

In this work we undertake:
- the representation of non-functional properties in design documents;
- the description of software architectures adaptable in quality;
- the automatic composition of contracts based on the non-functional properties exposed by design documents;
- the prediction of the quality behavior that the system being constructed will exhibit;
- the mechanisms required to allow simulations of adaptable architectures.

II. RELATED WORK

The notion of building dependable systems by enforcing contracts between subsystems has been widely exploited in the literature. [1] stresses the importance of determining beforehand whether we can use a given component within mission-critical applications.
This information takes the form of a specification against which the component can be verified and validated, thus providing a kind of contract between the component and its clients. The authors differentiate four classes of contracts in the software component world: basic or syntactic, behavioral, synchronization, and quality of service contracts. They explain the particularities of each one and examine the different technologies used to deal with each kind. They distinguish four phases a contract passes through: definition, subscription, application, and termination/deletion. During the contract application, contracts should be monitored to check whether they are violated, and violations can be handled in different ways: ignore, reject, wait, and renegotiate. The authors also identify different types of data constraints:

precondition, postcondition and invariants, as they do with control constraints. It is a wide-ranging paper that covers the lifecycle of the four different contracts, as opposed to our work, which focuses only on quality of service contracts. Currently, we concentrate more on the design phase of the development, so the definition and subscription (what we call composition) of contracts is better covered. We also focus on just a specific group of technologies covered by the term model-driven software development. Concerning constraints, up to now we only work with invariants.

Regarding the specification of QoS-aware architectures, we want to highlight [3]. This paper recognizes three main techniques used to specify QoS systems: extensions of Interface Description Languages (IDL), mathematical models, and UML extensions and meta-models. The author chooses this last technique, as it is part of the initial submission of the OMG standard [11] UML Profile for Quality of Service and Fault Tolerance Characteristics and Mechanisms. We agree with this choice, and our work could be considered an extension to this standard, although the work is reusable if other UML extensions or similar meta-models are used. [3] also points out that QoS specifications can be used for different purposes: i) specification of QoS-aware architectures, ii) management of QoS information in QoS reflective infrastructures (e.g. QoS adaptable systems), and iii) generation of code for the management of QoS concepts (e.g. negotiation, access to resource managers). This is our main intention: to demonstrate that QoS specifications can be used to compose, analyze, simulate and develop QoS-adaptable systems. Nevertheless, the cited standard centers on the annotation of design models, providing support for QoS characteristics, QoS constraints, QoS execution modes, and QoS adaptation and monitoring.
[3] and this standard are the references for our work on the annotation of QoS-aware architectures.

In the position paper [9], P. Collet investigates what properties need to be provided by the languages supporting the four levels defined above in the context of software components. He considers that a contract must provide: a specification formalism, a rule of conformance (to allow substitution), and a runtime monitoring technique. For the first requirement he does not have an ideal candidate, although he thinks QML [8] (QoS Modeling Language) looks like the most advanced QoS specification language. QML does not provide any means to express a QoS contract according to parameters that would come from the component interface. In QML, a contract is an instance of a contract type (QoS aspects, such as performance or reliability), and a profile associates a QML contract with an interface. Regarding the second requirement, he remarks on the importance of considering partial conformance. Concerning the last requirement, after assuming that it is not possible to fully verify statically that contracts are never violated, he shows the need for a contract monitoring framework to verify the functional and non-functional properties included in the contracts.

Besides QML (proposed e.g. by [6] and [9]), other specification formalisms have been used. We will use an extended version of the OMG standard [11] UML Profile for Modeling QoS and Fault Tolerance Characteristics and Mechanisms (used e.g. by [3] and [5]), complemented with OCL2 [12] to represent the constraints. Another option is CCL-J, inspired by OCL. The work in [7] enhances this formalism to adapt it to a specific component model. CCL-J provides five kinds of specifications: inv, post, pre, guarantee (the guarantor is the implementation of the method), and rely (the guarantor is the method caller). Another choice is QoSCL [10] (QoS Contract Language).
This is a QoS-aware component meta-model which includes some QML concepts and allows designers to define quality-level contracts attached to provided and required interfaces.

Another work [2] describes the architecture of a distributed middleware that provides end-to-end QoS management services in an application-independent framework. The services provided include: characterization and specification, negotiation and establishment, and adaptation. As we do, it asserts the importance of providing, for each component, information on how the QoS of its output depends on the QoS of its input and on the amount of resources it uses. The authors use the concepts of feasible region and reward function; similar notions can be found in our work, with the minor difference that we treat resources as another QoS (a QoS provided by a model element). That paper nevertheless focuses on negotiation at run-time, whereas we mainly focus on contract binding at design time.

There are works that study how to incorporate contracts into a model-driven development life cycle. The authors of [4] consider contracts, including non-functional ones, as a modeling technology to foster the assembly of component-based applications, mainly by allowing CASE tools to check when the contracts are fulfilled. To support the modeling of contract-aware components in UML they present an extension to the UML meta-model. Even though they intended to keep changes as small as possible, this diverges from our intention of not modifying UML at all. For the authors, abstract contracts identified during the analysis phase are transformed into more concrete contracts to accommodate the design, implementation, deployment and runtime phases; new contracts can also emerge as the development process advances. They use the concept of phase transition to deal with such transformations. [5] takes this idea further, promoting QoS-aware model transformations to gradually resolve QoS requirements in order to deliver efficient code.
Our work does not consider contract/requirement transformations and operates only within a single modeling phase. Besides creating a QoS-aware specification framework and proposing a way of handling and resolving QoS specifications in model transformations, [5] also describes a QoS-aware execution platform for resolving requirements at run-time. In contrast, we study how our work could be reused at run-time or be mapped to a QoS-aware run-time platform.

For the implementation of transformations, [6] proposes a novel method: i) specify non-functional properties as QoS contracts

with a small set of stereotypes, ii) specify how they can be implemented with aspects, representing them somewhat like design pattern occurrences, that is, using parameterized collaborations equipped with transformation rules expressed in meta-level OCL2. A design-level aspect weaver takes the form of a meta-level OCL2 interpreter that outputs UML models which can serve as a basis for code generation.

[7] presents the contracting system ConFract for Fractal, an open and hierarchical component model. Fractal is a component model with several features: composite components, shared components, reflective capabilities and openness. A Fractal component is composed of a content and a membrane, where interceptors and controllers (contract, lifecycle, binding, content) are located. There are three types of contracts (run-time specifications, for the authors): the interface contract between each pair of client and server interfaces; the external composition contract, which refers only to external interfaces of the component to express usage; and the internal composition contract, which expresses assembly and internal behavior rules of the implementation. In ConFract, the system dynamically builds contracts from specifications at assembly time and updates them according to the dynamic reconfigurations of components. Contract violations are handled by activating an atomic negotiation. Our work shares with this one the automatic building of contracts, even though ours is used mainly for simulation. They achieve it because, in their negotiation model, components have clearly identified roles and so they can interact automatically. They distinguish three roles that we adopt: guarantor, beneficiary and contributor. On the other hand, we do not confine ourselves to hierarchical components, as we use general UML2 [13].

To conclude, numerous QoS-aware architectures have been proposed, some of them component-based.
Most of them cover many of the aspects involved in this kind of system: admission test, resource reservation, contract negotiation, composition, quality monitoring, adaptation, and maintenance. Since we concentrate on design models and on simulating the composition to analyze some properties of the architecture, there is no need to discuss them here. If the architecture is to be realized, they will have to be studied in depth.

III. META-MODEL IN THE UML PROFILE FOR QOS

We consider it important to have a clear and comprehensive model for QoS concepts. This model can be used to explain the concepts used, as a source to build a UML profile, or as the model on which to create modeling and analysis tools. We take as the foundation of our model the meta-model included in the standard UML Profile for Modeling QoS and Fault Tolerance Characteristics and Mechanisms. In our opinion, this meta-model lacks some mechanisms to enable the composition of adaptable architectures in the way outlined in the introduction.

The following figure shows part of the current meta-model, with the relationships among some of the concepts of the standard. We first describe the main meta-classes of the current meta-model and afterwards highlight the main enhancements we propose.

Fig. 1. Current meta-model used to build the UML Profile for QoS.

QoSCharacteristic. It is used to represent non-functional characteristics of the system elements (e.g. services) included in the design models. QoSCharacteristic is the constructor for the description of non-functional aspects such as latency, throughput or capacity. It allows these characteristics to be specified independently of the elements they qualify.

QoSDimension. It is a dimension for the quantification of QoSCharacteristics. We can quantify a QoSCharacteristic in different ways, for instance: absolute value, maximum and minimum values, or statistical values.

QoSConstraint.
It is an abstract meta-class to limit the allowed values of one or more QoSCharacteristics. Two approaches for the description of allowed values are: an enumeration of QoSValues for each involved QoSCharacteristic, or an expression that must be fulfilled by the QoSCharacteristics. These expressions define, for instance, the maximum and minimum values and the dependencies of QoSCharacteristics.

QoSContext. The QoSContext establishes the vocabulary of the constraint. This meta-class is required since a QoSConstraint often combines functional and non-functional elements, and can have more than one QoSCharacteristic associated.

QoSRequired. When a client (of a software element or a resource) defines its required QoSConstraint, the provider that supports it must provide some quality levels to meet its client's requirements. When the provider defines its required QoSConstraint, the client must meet some quality requirements to get the quality offered.

QoSOffered. When the provider defines a QoSOffered constraint, it is the provider who must satisfy the constraint. When a client defines a QoSOffered constraint, the client must satisfy the constraint. Often a QoS offered depends on the QoS provided by the resources and the providers that the software element uses.

QoSContract. The quality provider specifies the quality

values it can support (provider-QoSOffered) and the requirements that its clients must meet (provider-QoSRequired); and the client specifies the quality it requires (client-QoSRequired) and the quality it ensures (client-QoSOffered). Finally, in an assembly process, we must establish an agreement between all constraints. In general, the allowed values that client-QoSRequired specifies must be a subset of the values supported in provider-QoSOffered, and the allowed values that provider-QoSRequired specifies must be a subset of the values supported in client-QoSOffered. Sometimes we cannot compute the QoSContract statically, because it depends on the resources available or on quality attributes fixed dynamically.

QoSLevel. It represents the different modes of QoS that a subsystem can support. Depending on the algorithms or the configurations of the systems, the component can support different working modes, and these working modes provide different qualities for the same services. For each working mode, we specify a QoSLevel. QoSLevels represent states of the system from a quality point of view.

QoSTransition. It is used to model possible transitions among QoSLevels.

IV. NEW REQUIREMENTS

We enlarged the meta-model to allow:
- The specification of adaptable architectures whose elements can change their quality characteristics to react to environment changes and/or user wishes. The meta-class QoSOffer allows grouping all the quality levels that can be offered in a mutually exclusive way: only one can be ensured at a time during the execution. An equivalent concept, QoSRequirement, groups the required levels, of which only one can be relied on at a time. Implementing a quality level implies fulfilling a set of constraints (the "allowed space"). If the quality level is located on the offer side, the constraints are of the kind QoSOffered. If it is on the requirement side, they are QoSRequired.
The composition of quality-aware elements (QoSAwareEntities) to bring up architectures that fulfill the expected requirements. This composition is studied only from a non-functional point of view. QoSContracts need to be bound in order to ensure the fulfillment of every QoSRequired constraint based on the available QoSOffered constraints. This process can be automated if enough information is given within the constraints. At the design phase some QoSContracts can be given by the modeler, either because a contract is the preferred one or because it is the only one allowed in a contract binding. A contract binding comprises the negotiation of the constraints to be held by QoSAwareInstances. A negotiation may involve several QoSOffers and QoSRequirements, according to the QoSContexts of the constraints they involve. The negotiation ends when the level for every QoSOffer and QoSRequirement is established. To compute all the QoS contracts, the resources available and the quality attributes need to be fixed statically, but they can also be simulated or estimated.

Fig. 2. QoS Negotiation.

The definition of responsibilities in the monitoring and adaptation. Three responsibilities have been observed: guarantor, beneficiary and contributor.
- A QoSGuarantor has at least one offer whose quality can be adapted; this means it must ensure a QoSLevel.
- A QoSBeneficiary has at least one adaptable requirement; this also means a QoSLevel it can rely on.
- A QoSContributor represents any other element which wants to be aware of a QoSConstraint. An example is an element that is responsible for monitoring a constraint involved in a contract at run-time, this element being neither the guarantor nor the beneficiary.

V. CONCEPTS ADDED TO THE METAMODEL

The following figure shows some of the concepts that have been added and their relationships. It includes some of the described elements:

QoSAwareEntity represents elements of a model which play a role in the definition of quality-aware architectures.
Not every QoSAwareEntity can be adapted. The number of instances can be specified. We distinguish three different roles:
- QoSGuarantor: the one that ensures a negotiated quality in a quality contract
- QoSBeneficiary: the one that relies on a negotiated quality
- QoSContributor: other elements aware of a specific contract

QoSAwareInstance represents instances of QoSAwareEntities. The specializations identified are QoSGuarantorInstance, QoSBeneficiaryInstance, and QoSContributorInstance, the last one to model other elements aware of a specific contract.

QoSContract now includes two new attributes: given, to indicate a contract defined by the modeler, and renegotiable, to indicate that a contract can be renegotiated in quality adaptations occurring in the system.
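The agreement rule stated for QoSContract (each side's required values must be a subset of the values offered by its counterpart) and the two new contract attributes can be illustrated with a small Python sketch. The names `Side`, `Contract`, `agrees` and `bind`, and the string-valued quality values, are hypothetical and chosen only for illustration; they are not part of the meta-model or of any tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Side:
    """One party of a contract (hypothetical helper, not a meta-model class)."""
    offered: FrozenSet[str]   # values this party can ensure (QoSOffered)
    required: FrozenSet[str]  # values it demands from its counterpart (QoSRequired)


@dataclass(frozen=True)
class Contract:
    client: Side
    provider: Side
    given: bool = False        # True when the modeler fixed the contract
    renegotiable: bool = True  # True when later adaptations may renegotiate it


def agrees(client: Side, provider: Side) -> bool:
    """Subset rule: each side's required values must be offered by the other side."""
    return (client.required <= provider.offered
            and provider.required <= client.offered)


def bind(client: Side, provider: Side,
         given: bool = False, renegotiable: bool = True) -> Optional[Contract]:
    """Return a bound contract, or None when the constraints do not agree."""
    if not agrees(client, provider):
        return None
    return Contract(client, provider, given, renegotiable)


# Illustrative values only.
client = Side(offered=frozenset({"rate<=10/s"}),
              required=frozenset({"latency<=20ms"}))
provider = Side(offered=frozenset({"latency<=20ms", "latency<=50ms"}),
                required=frozenset({"rate<=10/s"}))
contract = bind(client, provider, given=True)
```

Representing allowed values as plain sets keeps the subset check explicit; constraints given as expressions rather than enumerations would need a different comparison.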

Fig. 3. New QoS concepts added to the meta-model: QoSAwareEntity and QoSAwareEntityInstance.

Fig. 4. New QoS concepts included in the meta-model: QoSOffer and QoSRequirement.

QoSOffer is used to group the quality levels at the offer side, from which only one can be ensured at a time. Several QoS guarantors that have the same QoSOffer need to implement all the quality levels included in the offer.

QoSRequirement is used to group the quality levels at the requirement side, from which only one has to be selected at a given instant: the one that can be relied on in a contract. Several QoS beneficiaries that have the same QoSRequirement need to implement all the quality levels included in the requirement.

QoSOffer and QoSRequirement inherit the abstract meta-class QoSExternalBehavior. A QoSAwareEntity presents zero or more QoSExternalBehaviors. Adaptable entities are those which can change the quality level exhibited for at least one behavior. Notice that during execution time, an entity instance exhibits at most one quality level for each behavior its entity type declares.

A meta-class QoSBehaviorRealization links the behavior, the quality level exhibited on that behavior and the instance exhibiting that quality level.

To offer a QoSLevel included in a QoSOffer, a guarantor needs to meet all the QoSOffered constraints of the allowedspace. If any of the constraints is not satisfied, the QoSLevel is not respected. When a beneficiary is requesting a QoSLevel included in a QoSRequirement, it is claiming that the counterpart in the contract meets all the QoSRequired constraints of the allowedspace.

QoSContributorBehavior is somehow an artifice denoting that a QoSContributor is a special QoSAwareEntity. OpLevel makes reference to the constraints a contributor is related to. A contributor may exhibit a NoOpLevel, indicating that the task it performs is not required for the whole system to work.
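The rule that a QoSLevel is respected only when every constraint of its allowed space holds, and that an instance exhibits at most one level per behavior, can be sketched as follows. This is a minimal illustration in Python; the class `QoSLevel` mirrors the meta-class name, but its fields, the function `current_level`, the best-first ordering of the offer and the measured quantities are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A constraint is modeled here as a predicate over measured quality values.
Constraint = Callable[[Dict[str, float]], bool]


@dataclass
class QoSLevel:
    name: str
    allowed_space: List[Constraint]

    def respected(self, measured: Dict[str, float]) -> bool:
        # The level is respected only if every constraint is satisfied.
        return all(constraint(measured) for constraint in self.allowed_space)


def current_level(offer: List[QoSLevel],
                  measured: Dict[str, float]) -> Optional[QoSLevel]:
    """Pick the single level exhibited for a behavior.

    Assumption: the offer lists its levels from best to worst, and the
    first respected level is the one exhibited.
    """
    for level in offer:
        if level.respected(measured):
            return level
    return None


# Hypothetical offer with two mutually exclusive levels.
high = QoSLevel("high", [lambda m: m["latency_ms"] <= 10,
                         lambda m: m["frame_rate"] >= 30])
low = QoSLevel("low", [lambda m: m["latency_ms"] <= 50])
offer = [high, low]
```

With these hypothetical levels, a measured latency of 20 ms fails one constraint of the "high" level, so only the "low" level can be exhibited.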
At this moment, the concept QoSNegotiation of figure 2 can be better explained. It is used to group all the offers and requirements involved in a negotiation process because they are compatible. Two QoSExternalBehaviors are compatible when the QoSConstraints they involve constrain the same model elements: QoSCharacteristics and QoSAwareEntities. After the negotiation process, one or more contracts are bound. The contracts given by the user can alter the collection of contracts bound. A bound contract implies that several instances will need to fulfill some QoSBehaviorRealizations.

Another issue tackled is the dependency between the quality offered by an element and the quality required by that element. This is a common situation where the quality a software component offers depends on the quality other components supply to it. A situation where this does not occur is the quality provided by a resource, such as the bandwidth of a network; in this case there is no input quality. We had to further develop the idea of dependency among quality constraints because we care about the composition and simulation of quality-aware architectures. We need to describe how an element is going to behave internally from a quality point of view (QoSInternalBehavior):
- A table could be used to match the input QoS required level and the output QoS provided level. This table should be included in the design models.
- A mathematical or statistical function to include in the design models.
- A constraint expressing output and input quality levels.

VI. CONCLUSION

Once responsibilities, constraints, and quality levels are incorporated within the design models, it is possible to analyze the architecture to find out some features of its behavior. To enumerate some examples:

Is it possible to fulfill all the QoS requirements even at their lowest level of quality required? Will the system meet all the non-functional requirements?
Which quality-aware entities in a repository make it possible to meet all the QoS requirements in the system? Which vendor's component can I use to meet the requirements?

Is the architecture able to operate at every level of quality offered for an entity of the system? Will the system be able to offer the environment all the quality levels exposed?

Does any of the quality-aware entities have to operate at the highest level of quality offered (with the quality requirements established for the system)? Is any component of the system overused?

How many quality reconfigurations need to be made when a quality requirement changes its quality level? How many internal changes of quality level are triggered when the user demands a new quality level?

To achieve the composition of quality-aware architectures at design time, all the quality characteristics need to be modeled. This includes the quality behavior of entities, that is, how an entity behaves when the quality it relies on and/or the resources available change. There is a big difference between functional and QoS composition. For the functional composition an external view of the entities is enough, whereas for the QoS composition some properties of the internals (behavior) need to be known. A simulation of the system behavior allows us one or more degrees of freedom, by estimating some values in some elements, but the rest of the architecture needs to be fully described.

We used a UML profile as the concrete syntax of the proposed metamodel. We use it to include QoS elements in the design models. Since we take into consideration more concepts than the ones included in the standard UML Profile for Modeling QoS and FT Characteristics and Mechanisms, we needed to augment the profile accordingly. The created profile enables us to enhance models with features not included in the standard.

We are working on a tool to enable the composition of quality-aware architectures.
Initially, this tool will validate the quality constraints included in the model, check the contracts declared, bind requirements and offers to build quality contracts, find quality levels at the requirement side that cannot be met because there is no offer in their context, and discover required constraints that cannot be held because the architecture does not allow them to be satisfied. This tool will be extended to fully compose and simulate quality-aware architectures.

REFERENCES

[1] Antoine Beugnard, Jean-Marc Jézéquel, Noël Plouzeau, Damien Watkins, Making Components Contract Aware, Computer, vol. 32, no. 7, pp , July,
[2] Mallikarjun Shankar, Miguel de Miguel, Jane W.S. Liu, An end-to-end QoS management architecture, RTAS99, In Real-Time Applications Symposium, IEEE Computer Society
[3] M. de Miguel, QoS Modeling Language for High Quality Systems, In 8th IEEE International Workshop on Object-oriented Real-time Dependable Systems, IEEE Computer Society, January
[4] T. Weis, C. Becker, K. Geihs and N. Plouzeau, A UML meta-model for contract-aware components, In Proc. of UML 2001 conference, LNCS Springer Verlag,
[5] A. Solberg, J. Oldevik, J. Aagedal, A Framework for QoS-aware Model Transformation, using a pattern-based approach, In International Symposium on Distributed Objects and Applications, LNCS Springer Verlag,
[6] J.M. Jezequel, N. Plouzeau, T. Weis, K. Geihs, From Contracts to Aspects in UML Designs, AOSD Workshop on "AOP in UML",
[7] P. Collet, R. Rousseau, T. Coupaye, N. Rivierre, A Contracting System for Hierarchical Components, In Component-Based Software Engineering, LNCS Springer Verlag,
[8] S. Frolund and J. Koistinen, Quality of Service Specification in Distributed Object Systems Design, Distributed System Engineering, In Proceedings of the 4th conference on USENIX Conference on Object-Oriented Technologies and Systems,
[9] P. Collet, Functional and Non-Functional Contracts Support for Component-Oriented Programming, First OOPSLA Workshop on Language Mechanisms for Programming Software Components,
[10] O. Defour, J.M. Jézéquel, N. Plouzeau, Extra-functional Contract Support in Components, In Proceedings of the 7th International Symposium on Component-Based Software Engineering, LNCS Springer Verlag,
[11] UML Profile for Modeling QoS and Fault Tolerance Characteristics and Mechanisms, OMG.
[12] Object Constraint Language, OMG.
[13] Unified Modeling Language, OMG.

2. Timing Analysis

Considerations on the LEON cache effects on the timing analysis of on-board applications

G. Bernat, A. Colin
Rapita Systems, IT Centre, York Science Park, York YO10 5DG, United Kingdom

J. Esteves, G. Garcia, C. Moreno
Thales Alenia Space, 100, bd. du Midi, BP99, F CANNES LA BOCCA, France

N. Holsti
Tidorum, Tiirasaarentie 32, FI Helsinki, Finland

T. Vardanega
University of Padua, via Trieste 63, I Padova, Italy

M. Hernek
ESA/ESTEC TEC-SWE, Keplerlaan 1, Postbus 299, 2200 AG Noordwijk, The Netherlands

1. Introduction

This paper provides a short account of the findings of the project Prototype Execution-time Analyser for LEON (PEAL), funded by ESA/ESTEC and executed in the course of [...]. The PEAL project was a collaboration between academic researchers, small and medium-sized tool vendors, and a large, established space company. Our goal was to study if and how the presence of cache memory in an on-board computer system complicates the reliable verification of its real-time performance. We adapted some timing analysis tools to the LEON processor in a prototype fashion, performed experiments, and drew the conclusions reported here.

This paper is organized as follows. Section 2 sets the scene by explaining the trend towards caches, the verification problems that caches create, and some suggested solutions to these problems. Section 3 summarises the objectives of the PEAL study and section 4 sets out the assumptions of the study. Section 5 describes the experiments and results and section 6 presents our conclusions.

2. Study Context

2.1 The processor-memory gap

European space projects are moving away from 16-bit on-board processors such as the MIL-STD series, through rather simple 32-bit processors such as the ERC32 and special-purpose processors such as the ADSP-21020, to more complex and powerful processors such as the LEON family.
In the long-term view the evolution of on-board processors is expected to provide for:
- more computing power through higher processor speeds;
- simpler programming thanks to large, flat address spaces, avoiding complications such as memory overlays;
- the ability to run several applications on one and the same computer, using memory management units and other hardware features to prevent space and time interference between applications; and
- better software reusability and portability through relaxed constraints on speed and memory and through the adoption of general, rather than space-specific, processor architectures and software-development tools.

As processor speed increases, however, the speed gap between the processor and the memory becomes a critical bottleneck, as it has already long been for ground-based computing. The LEON processor family has therefore introduced a memory hierarchy with a fast, on-chip cache memory between the processor and the slower, external (off-chip) main memory. The LEON design also includes other accelerator features that aim to isolate the processor from slow memory accesses: instructions can be fetched in burst mode, and writes to memory are buffered and completed asynchronously.

2.2 The problem of variability and worst-case execution time

The cache memories in the LEON processor are expected to provide for higher average computation speed, but at the cost of more variable execution time, as a cache miss takes much longer than a cache hit. The number of hits and misses incurred by an execution depends on the cache architecture and on what the program does and how and when it does that. Most on-board software is subject to real-time deadlines, so that the important figure is not (only) the average execution time, but the worst-case execution time (WCET) or, in practice, a trustworthy upper bound or estimate of the WCET.
The PEAL project addressed the question of whether and how the presence of a cache helps or hinders the goals listed above, when we consider the WCET and not just the average execution speed. For the purposes of the project this basic question presented four main avenues of investigation:
- Is the verifiably usable computing power always really higher with a cache? Perhaps some (rare) programs or situations may exist in which the miss rate is so high that the cache is a brake rather than an accelerator.
- Is programming simpler with a large memory, even if the memory is slow, and fast access is limited to a relatively small cache? Perhaps the software designers must pay a great deal of attention to making good use of the memory hierarchy so as not to risk poor performance.
- Can we isolate applications running on the same computer from one another? If one and the same cache is shared among all applications that run on one and the same processor, the applications may compete for cache space and thus slow each other down. Some cache-handling errors in one application could even propagate to corrupt other applications.
- Can we achieve these goals without sacrificing portability, maintainability and reusability of the on-board software? If the design and coding of the programs has to match the details of the cache architecture, such as the associativity or total size, the program becomes unportable to other cache architectures, or at least may perform worse when ported.

2.3 How caches affect execution time

Compared to a cache-less processor, four new factors emerge which influence the execution time of a program on a cache-equipped processor by changing the pattern of cache hits and misses:
1. the total size of the program code and data in relation to the cache size,
2. the location (i.e. the memory addresses) of the program code and data,
3. the history of code and data addresses accessed by the program in the past, and
4. the interrupts and preemptions that occurred during program execution.
In actual fact, the LEON processor can also be configured with a Memory Management Unit (MMU) which uses page-mapping tables to translate virtual memory addresses to physical memory addresses. To speed this translation up (in the average case) the MMU contains a small cache-like memory, called the Translation Look-aside Buffer (TLB). References to the TLB can also hit or miss and thereby reduce or increase the execution time.

These effects of the cache and the MMU cause problems for the verification of the real-time performance of an application because they tend to reduce execution time in average cases but make it more difficult to set bounds on the worst-case execution time. In particular, these effects reduce the predictive value of execution-time measurements obtained by test, because the total program size, the memory layout, the execution paths and the pattern of interrupts or preemptions may all change from testing to flight, and the tests themselves are unlikely to hit on the worst combination of all these factors together. Cache effects also reduce the precision of static analysis for WCET because the analysis can only approximate the state of the caches at various points in the program. Moreover, schedulability analysis becomes harder because, firstly, the overhead of an interrupt or context switch is no longer constant but depends on the cache state at the time, and secondly, the WCET of a task is no longer a property of the task alone but depends on the interrupts and preemptions imposed on the task, unless the kernel takes costly measures to preserve the state of the caches across an interrupt or preemption.

The crudest reaction to this problem is to avoid it by simply disabling the caches, at least for the time-critical parts of the application. The drawback of that solution of course is that the performance gain is totally lost.
This strategy could be defended if it could be shown that the cache performance is so unpredictable that the verifiable (worst-case) gain is close to nil.

Another crude way is to ignore the problem altogether by using caches fully and freely in the same way as in non-critical systems such as desktop computers and number crunchers. This way one may enjoy increased (average) performance, but at the cost of risking occasional, unpredicted and puzzling performance problems. One could attempt to reduce the risk by adopting larger performance margins, even if this cuts into the available performance gain, or by more performance testing, even if the quantitative risk and risk reduction are unknown. To defend this solution one can point out that no in-flight mission failure or problem has yet been traced to cache-related performance problems, at least as far as we know.

Where the crude solutions are not satisfactory, the only approach left is to use some systematic method to get a safe upper bound on the WCET, or a sufficiently reliable and precise estimate of the WCET. If we discount the traditional end-to-end execution-time measurement tests as too unlikely to find the worst case, we are left with two methods: (i) static cache-aware WCET analysis [1, 5]; and (ii) measurement-based WCET analysis [2, 4] with some systematic way to specify and measure the test coverage to ensure a sufficiently reliable result.

2.4 Methods for WCET analysis

We know of two methods that can give better (safer) bounds or estimates of the WCET of a task than end-to-end measurements can.

The static WCET analysis method makes a model of the task that includes all its possible executions and gives an upper bound on the duration of each execution. This model combines the control-flow graph with the call graph and includes all instructions that the task can execute. An execution of the task corresponds to a path through this graph. A model of the processor gives an upper bound on the execution time of each basic block in the graph. When this time depends on the processor state (e.g. cache contents or register values) a static analysis of the state changes is necessary to find a safe approximation (upper bound) on the state and the time; this step typically uses abstract interpretation of the instructions. For complex processors it can be quite difficult to devise a model-analysis combination that is precise enough but not detailed to the point of failing the analysis through combinatorial explosion. Finally an upper bound on the WCET of the whole task is found by summing up the execution-time bounds of the blocks along the worst-case execution path. This step typically uses Integer Linear Programming to find the maximum sum without explicitly trying all execution paths.

The second method, called measurement-based WCET analysis, also computes a WCET estimate from the execution times of the blocks of the task and the possible execution paths through these blocks. However, here the execution times of the blocks are measured, i.e. observed in test runs, and not computed from a static analysis. Some processors have special tracing hardware that can measure the execution time of each block transparently, without interfering with the execution at all; the NEXUS interface is one example. For other processors one must instrument the program with additional instructions that read the clock at suitable points in the code (e.g. on entry to a block) and somehow record the resulting time stamps (together with an identification of the instrumentation point) in an execution trace. This gives a set of samples of the execution times of each block. For complex processors the execution time of a block depends on the initial processor state when the block is entered. Even if the test set executes each block many times there is no guarantee that the worst case for this block is measured.
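Both analysis methods ultimately combine per-block execution times over the possible paths of the task. The following Python sketch illustrates only that combination step, under a simplifying assumption: the control-flow graph is acyclic (loops already unrolled or bounded), so a plain longest-path computation stands in for the Integer Linear Programming step mentioned above. The function `wcet_bound`, the block names and the times are hypothetical and do not come from Bound-T or RapiTime.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def wcet_bound(block_time, successors, entry):
    """Longest-path bound over an acyclic control-flow graph.

    block_time: dict block -> upper bound (static) or maximum observed
                time (measurement-based) for that block.
    successors: dict block -> list of blocks that may execute next.
    """
    # TopologicalSorter expects a mapping from node to its predecessors.
    predecessors = {block: set() for block in block_time}
    for block, nexts in successors.items():
        for nxt in nexts:
            predecessors[nxt].add(block)

    longest = {}  # block -> worst accumulated time of any path ending there
    for block in TopologicalSorter(predecessors).static_order():
        reachable = [longest[p] for p in predecessors[block] if p in longest]
        if block == entry:
            longest[block] = block_time[block]
        elif reachable:
            longest[block] = max(reachable) + block_time[block]
        # Blocks not reachable from the entry are ignored.
    return max(longest.values())


# Hypothetical if-then-else task: entry -> (then | else) -> exit.
successors = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"]}

# Static flavour: per-block upper bounds from a processor model.
static_bounds = {"entry": 5, "then": 20, "else": 8, "exit": 3}

# Measurement-based flavour: take the maximum of the observed samples.
samples = {"entry": [4, 5], "then": [18, 20], "else": [8, 7], "exit": [3, 3]}
measured = {block: max(times) for block, times in samples.items()}
```

In this toy graph both flavours give a bound of 28 time units along the path entry, then, exit. Taking the maximum of the observed samples is only one possible way to summarise measurements, and it is safe only if the worst case of each block was actually observed.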
Some implementations of measurement-based analysis try to set the processor into a worst-case initial state before each block so that the measured time is the WCET for the block. However, for complex processors it can be very difficult to find the worst-case initial state and to set the processor to that state. The measured block execution times are combined into a WCET estimate for the whole task by considering all possible paths in the task, as in the static analysis method, but perhaps using sophisticated statistics to compute execution-time distributions.

Comparing the static and measurement-based methods we may note the following:
- Complex hardware, such as caches, is a problem for both methods: making a processor model is difficult for the static analysis; and measuring the worst case is difficult for the measurement-based analysis.
- Both methods have another common problem: the analysis may include paths through the control-flow and call graph that are impossible (i.e. infeasible) according to the overall logic of the task. This defect leads to a WCET over-estimation factor that depends on the design of the task and can be important.
- Both methods must usually be guided manually. The engineer running the analysis must usually specify iteration bounds on complex loops and mark some execution paths as infeasible so as to reduce over-estimation.

2.5 Cache locking, freezing and partitioning

Caches are good and easy to use for reducing the average execution time because they are self-adjusting containers of the dynamically changing working set of code and data. But this dynamic behaviour causes unpredictability. Several researchers suggest cache locking for increasing predictability, e.g. [6]. This technique requires that the application deliberately loads some frequently accessed code or data into the cache and then locks this content in the cache so that it stays in the cache and is not evicted until it is unlocked.
The locked part of the cache is similar to a scratchpad memory, e.g. [7, 8]. The general LEON architecture supports cache locking per cache line, but the fault-tolerant (FT) LEON variant used in space only supports cache freezing, which is equivalent to locking the whole cache at once.

Another suggestion is to partition the cache so that each task (or each software function) is given a piece of the cache for its own use. This removes the dynamic aspect of the cache-mediated interference between tasks or functions but reduces the usable cache size for each task or function, which may increase the unpredictability within each task or function. The current LEON architecture does not support cache partitioning.

3. Study Objectives

Overall the PEAL project was given the following four objectives:
- To evaluate the cache-sensitivity of typical on-board software (OBSW).
- To consider how the LEON cache should be configured and how the application software and kernels should use the cache (including possible changes to the compilers and kernels) to improve predictability, testability or analysability.
- To procure a prototype static WCET analyser for the LEON, based on the ERC32 version of the Bound-T tool [3] from Tidorum Ltd (however, excluding static cache analysis at this stage).
- To procure and evaluate the measurement-based WCET analysis by experiments with the RapiTime tool [4] from Rapita Systems Ltd.

The project achieved most of these objectives and also identified subjects for future work, whether technical investigation or tool development. The project reports are available from ESA [9].

4. Study Assumptions

In the PEAL project we considered primarily the instruction cache (I-cache) because it can be analyzed with much greater accuracy than the data cache (D-cache). Arguably, the former depends, in ways that can be accurately analyzed, on the fixed and unmodifiable contents and the finitely variable evolution of the program flow. The latter instead depends on application-specific behaviour that is difficult to categorize, as well as on programming and coding techniques (e.g. modes and numbers of subprogram parameters; use of pointers) which vary greatly across software suppliers. Thus the D-cache was disabled in all our experiments except for experiments expressly aimed at the D-cache only.

In particular the study focused on the cache configuration as provided in the LEON AT697E chip from Atmel, which specifies an I-cache equipped with 32 KB of memory, 4-way associativity and least-recently-used (LRU) replacement policy. The D-cache is similar but has 16 KB of memory and 2-way associativity.

In the study we intentionally excluded reliance on any ad-hoc support from the compilation system. However, we surveyed and evaluated research on compiler support for memory hierarchies, for example compiler-supported use of scratchpads as an alternative to caches.

Our fundamental intent was to gauge the magnitude of the impact of I-cache effects on the verification and validation process of industrial-quality OBSW. We did so empirically, by way of experiments, yet striving to relate our findings to sound engineering principles and industrial best practices. On this account, the ultimate products of the PEAL study are guidelines and recommendations, backed by experimental results and observations, for anticipating, assessing and taming the effects that the LEON cache may have on typical OBSW.

5. Experiments

5.1 Cache-risk patterns

To guide our experiments we constructed a set of cache-risk design patterns: reasonable examples of code for which a bad memory layout could cause up to 100% I-cache misses, persistently or sporadically, making execution up to four times slower even for the fastest external memory. These patterns helped us select the OBSW parts for our experiments and set the experimental conditions, such as good or bad memory layouts, but they are certainly not an exhaustive list of designs that can cause cache problems.

A common feature of the patterns is that they overload some cache sets by using more memory blocks that map to the same set than there are cache lines in the set. If the design accesses these memory blocks cyclically in the same sequence, the LRU replacement policy always evicts the blocks before they are needed again, so we have 100% cache misses for the overloaded sets.

For example, on the AT697E consider a loop in which the loop body consists of 32 KB of code with consecutive addresses and no inner loops. The first iteration of the loop loads all this code into the I-cache, which means that later iterations run with no cache misses (assuming that the cache is not changed by preemptions or interrupts). However, if the loop body grows larger than 32 KB then some cache sets are assigned more than 4 memory blocks; these sets are overloaded and cause cache misses. If the loop body is 40 KB or larger all I-cache sets are overloaded and all code fetches are cache misses.

For code that is not laid out consecutively, similar problems may happen with much smaller amounts of code. For example, on the AT697E consider a loop with a body that just calls five procedures from five separate modules. If we are unlucky the linker may place these modules in memory so that these five procedures are mapped to the same I-cache sets (that is, the procedures have the same address mod 8 KB).
In the worst case the procedures have the same size and overlap exactly; then all cache sets for the code in these procedures are overloaded, resulting in 100% misses. Figure 1 illustrates the good and bad layouts. This is an example pattern for the effect of memory layout only.

The rest of the cache-risk patterns are variants of this loop-calls-five-modules pattern. For example, to illustrate the effect of execution history we make one of the five calls conditional. As long as the condition is false, the loop loads its I-cache sets with only four memory blocks per set, so there are no I-cache misses. However, if the condition becomes true the load increases to five blocks per set and all instruction fetches cause cache misses in the worst-case memory layout.

The patterns that illustrate the impact of preemptions and interrupts put the loop and some of the five calls in one task and the rest of the calls in another task or an interrupt handler. Thus the cache sets used in the loop are overloaded only when a preemption or interrupt happens while the loop is executing.
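The overload behaviour of these patterns can be reproduced with a small simulation of a set-associative LRU cache. This is a minimal Python sketch, not a model of the real AT697E: the 32 KB size, 4-way associativity and LRU policy come from the text, but the 32-byte line size, the one-fetch-per-line simplification, the 1 KB procedure size and all function names are assumptions made for the example.

```python
from collections import OrderedDict


def steady_state_miss_ratio(addresses, cache_bytes=32 * 1024, ways=4,
                            line_bytes=32, warmup_iterations=2):
    """Miss ratio of one loop iteration after warm-up, LRU replacement.

    addresses: byte addresses fetched in one iteration, one per cache line.
    line_bytes is an assumed value, not taken from the paper.
    """
    num_sets = cache_bytes // (ways * line_bytes)
    sets = [OrderedDict() for _ in range(num_sets)]

    def run_iteration():
        misses = 0
        for address in addresses:
            block = address // line_bytes
            lines = sets[block % num_sets]
            if block in lines:
                lines.move_to_end(block)       # mark as most recently used
            else:
                misses += 1
                lines[block] = True
                if len(lines) > ways:
                    lines.popitem(last=False)  # evict least recently used
        return misses

    for _ in range(warmup_iterations):
        run_iteration()
    return run_iteration() / len(addresses)


def contiguous_loop(size_bytes, line_bytes=32):
    """Loop body with consecutive addresses and no inner loops."""
    return list(range(0, size_bytes, line_bytes))


def procedure_calls(count, stride_bytes, proc_bytes=1024, line_bytes=32):
    """Loop body that calls `count` procedures placed `stride_bytes` apart."""
    return [proc * stride_bytes + offset
            for proc in range(count)
            for offset in range(0, proc_bytes, line_bytes)]
```

Under these assumptions, `contiguous_loop(32 * 1024)` runs without misses after warm-up, while `contiguous_loop(40 * 1024)` misses on every line. Five procedures placed 8 KB apart (`procedure_calls(5, 8 * 1024)`) also miss on every line, whereas four such procedures, or five procedures placed consecutively, do not miss at all.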

[Figure 1: Good and bad layouts of "loop A; B; C; D; E; end loop". Each panel shows all memory (A div 8 KB against A mod 8 KB) for the procedures A to E and the resulting I-cache load in blocks per set. Panel (a), good layout: load of at most 4, all hits. Panel (b), bad layout: load greater than 4, 100% misses in these sets.]

5.2 The example OBSW

We made several experiments on one example of on-board software (OBSW) for the LEON from Thales Alenia Space (then known as Alcatel Alenia Space). This OBSW is a baseline intended for reuse across new-generation on-board systems. Its architecture is highly representative of real industrial OBSW with a real-time kernel (OSTRALES), concurrent and preemptive tasks and interrupt handlers. The OBSW is easily customisable and can be run in a fully simulated environment. It is not a full, ready-to-fly OBSW; it implements only a selection of services and functions, but these include both heavy algorithmic parts such as an Attitude and Orbit Control System (AOCS) and heavy data-handling parts such as an on-board Mission Time-Line (MTL) of time-tagged telecommands. The AOCS is autocoded in C from Matlab Simulink. The MTL implements most of the corresponding PUS service, is written in Ada and uses several balanced trees to hold the telecommands.

The OSTRALES kernel was extended to be cache-aware. Firstly, the cache can be frozen during interrupt handling. Secondly, the cache can be frozen during the execution of some tasks and unfrozen for other tasks, as controlled by a settable task attribute. Thirdly, the kernel provides an API that lets the OBSW flush, freeze or unfreeze the cache at any time. Flushing the cache means discarding all cached code or data; after a flush the first access to any address is always a miss. For our experiments we also extended the OBSW with direct commands that control the cache via this API.

5.3 The OBSW I-cache experiments

Our OBSW experiments focused on the I-cache usage in the AOCS and the MTL.
All the experiments were run in the same input/output scenario. The AOCS scenario was a gyro-stellar control mode with open-loop tabulated equipment inputs. The MTL scenario was a sequence of insertions of telecommands in the time-line. We ran over 20 experiments with different ways of using the cache and/or different memory layouts. Table 1 shows in summary how we varied the experimental conditions to investigate various questions. In all OBSW experiments the D-cache was disabled and Instruction Burst Fetch was enabled. Our OBSW experiments were run on the TSIM simulator from Gaisler Research (professional version 1.3.9). It should be noted that execution times on TSIM may differ by some 20% from the execution time on a real AT697E. We chose to generate the execution traces by instrumenting the OBSW because the non-intrusive alternative was too slow on TSIM. We compared the experimental results from the two WCET tools, RapiTime and Bound-T, both for the overall execution time (end to end) and at the detailed levels of single subprograms and basic blocks of code, and for different experimental conditions. Figure 2 shows the process for instrumenting, compiling, executing and analysing the OBSW experiments, starting from the OBSW source-code on the left and ending with a comparison of the WCETs found by Bound-T and RapiTime on the right. (The arrow carrying the Control Flow Graph (CFG) from Bound-T to RapiTime is special to the PEAL project and was necessary only because RapiTime was not yet fully adapted to Ada programs on the LEON.)

Table 1: Questions and conditions for OBSW experiments

Question | Condition 1 | Condition 2
Importance of the I-cache overall | I-cache enabled and unfrozen for the whole OBSW (including interrupts) | I-cache disabled
Importance of I-cache retained content from one task activation to the next (task is AOCS or MTL) | I-cache enabled and never flushed; code loaded by a task activation may be reused in next activation | Flush the I-cache at the start of each task activation
Effect of a preemption on a task (task is AOCS or MTL) | Flush the I-cache at start of task | Flush the I-cache also at some chosen point inside the task
Effect of MTL task on AOCS task | Let both the AOCS and the MTL use and update the I-cache (unfrozen) | Freeze the I-cache during MTL execution; unfreeze during AOCS
Effect of AOCS task on MTL task | As above | Freeze the I-cache during AOCS execution; unfreeze during MTL
Effect of good memory layout on MTL | I-cache unfrozen for MTL only; uncontrolled memory layout as chosen by the linker | I-cache unfrozen for MTL only; memory layout of MTL modules chosen to reduce the number of cache conflicts
Effect of bad memory layout on MTL | As above | As above, but MTL module layout chosen to increase the number of cache conflicts

[Figure 2: Compilation, execution and analysis of OBSW experiments. Elements of the diagram: OBSW sources; RapiTime instrumenter; first compilation; compilation; LEON binary for test case; Bound-T; WCET bounds w/o cache; layout tool; layout (ld script); test execution environment (TSIM + equipment sim.); extended CFG with i-points (XML); execution trace with time stamps; RapiTime; ET statistics; WCET estimates; comparison.]

5.4 The D-cache experiments

We made some experiments with the D-cache using small routines for CRC-16 computations. The CRC procedure takes two inputs: the data packet (an array of 512 octets) and a look-up table (a 256-element array of 16-bit constants).
To increase the number of data references we made two variants of this procedure: the first variant computes the CRC and also copies the data packet into an output packet, octet by octet; the second variant computes the CRC and also compares the data packet to a second input packet, octet by octet. The experimental variables were the procedure variant, the data in the packet, the layout (addresses) of the data, D-cache disabled or enabled, and a possible D-cache flush in the middle of the CRC computation to simulate a preemption. These experiments were run on a LEON FPGA board, model GR-XC3S-1500, from Pender Electronic Design and Gaisler Research. We ran the experiments once with SRAM memory and once with SDRAM. We used Bound-T to get static WCET bounds for comparison. We did not use the RapiTime tool for these very simple programs.

5.5 Results of the experiments

All numerical results in cache-performance experiments are very specific to the application, the experimental conditions and the processor, cache and memory architectures. The numbers that we present below are only samples or examples. They should not be taken as the typical impact of the LEON cache and certainly not as the maximum impact. Moreover, the impact is proportional to the cache-miss penalty, which is quite low on the AT697E, while larger cache-miss penalties are expected in future space systems.
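The table-driven CRC routine used in the D-cache experiments of section 5.4, and its two variants, can be sketched as follows. This is an illustration only: the text does not say which CRC-16 was used, so the polynomial 0x1021 and the initial value 0xFFFF are assumptions, and all function names are hypothetical.

```python
POLY = 0x1021  # assumed CRC-16 polynomial; the text does not name one
INIT = 0xFFFF  # assumed initial value


def make_table():
    """Build the 256-element look-up table of 16-bit constants."""
    table = []
    for byte in range(256):
        crc = byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ POLY) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
        table.append(crc)
    return table


TABLE = make_table()


def step(crc, octet):
    """One CRC step; the table index depends on the data octet."""
    return ((crc << 8) & 0xFFFF) ^ TABLE[((crc >> 8) ^ octet) & 0xFF]


def crc16(packet):
    """Pure CRC: reads two data objects, the packet and the table."""
    crc = INIT
    for octet in packet:
        crc = step(crc, octet)
    return crc


def crc16_and_copy(packet):
    """First variant: also copies the packet into an output packet (writes only)."""
    out = bytearray(len(packet))
    crc = INIT
    for i, octet in enumerate(packet):
        crc = step(crc, octet)
        out[i] = octet
    return crc, out


def crc16_and_compare(packet, other):
    """Second variant: also compares the packet to a second input packet (three reads)."""
    crc = INIT
    equal = len(packet) == len(other)
    for octet, expected in zip(packet, other):
        crc = step(crc, octet)
        equal = equal and octet == expected
    return crc, equal


packet = bytes(range(256)) * 2  # a 512-octet data packet
crc, copy = crc16_and_copy(packet)
print(hex(crc16(packet)), copy == packet, crc16_and_compare(packet, packet)[1])
```

The sketch makes the three access patterns visible: the pure CRC reads the packet and the table, the copy variant adds a write-only object, and the compare variant adds a third object that is read.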

Bound-T versus RapiTime

We found that the static WCET bounds from Bound-T were quite comparable to the measurement-based WCET estimates from RapiTime with the caches disabled. This is not surprising because when caches are disabled the execution time of a LEON code block is almost (but not quite) independent of context and both tools use the same method to compute the total WCET from the WCETs of the blocks. A similar comparison with the caches enabled is not possible because Bound-T does not yet have a static cache analysis.

Over-estimation?

The WCETs computed by Bound-T and RapiTime were clearly larger than the largest measured end-to-end execution times, by 22% for the AOCS and by 115% for the MTL. We do not know how much of this difference is due to overestimation, for example from infeasible paths, because we do not know the real WCET. The larger percentage for the MTL is probably explained by the larger logical complexity of the MTL code, which makes it less likely that our experiments measured the real worst case (end-to-end) and more likely that the analyses included infeasible paths.

Cache gain

Enabling the I-cache for the OBSW decreased execution times by a factor of 2.32 to A factor of 4 is typically quoted as the LEON/AT697E gain from the cache; our smaller factor shows only the I-cache effect since the D-cache was always disabled in our OBSW experiments. Enabling the D-cache for the CRC experiments (with the I-cache already enabled) decreased execution time by a factor of 1.4 to 1.9. The smaller factor is for SRAM and the larger for SDRAM, showing the influence of the larger cache-miss penalty in SDRAM.

Suspension impact

A typical OBSW task does not run continuously but is activated (triggered) periodically or by some event, performs some computation, and then suspends itself to wait for its next activation.
While the task is suspended other intervening tasks may use the cache and change its contents; when the suspended task is activated again it will usually incur several cache misses as it reloads its own code and data into the cache. We call this the suspension impact. We measured suspension impact by comparing the execution time of a given task under three conditions: 1) no suspension impact because only this task is allowed to change the cache (the cache is frozen in all other tasks); 2) natural suspension impact because all tasks are allowed to change the cache; 3) the worst-case suspension impact for this task because we flush the cache at the start of the task. A very clear example of I-cache suspension impact was observed for the MTL task where execution time was 22% longer for condition (3) than for condition (1). The D-cache suspension impact for the CRC computation was 4% to 16%. Interestingly we also saw one case of beneficial suspension impact: a task ran faster under condition (2) than under condition (1) because an intervening task brought into the I-cache the code for a subprogram that the suspended task calls early in its (next) activation, but which is then evicted by other code in this task. Under condition (2) the cache misses for reloading this subprogram happen in the intervening task. Preemption impact If a task is preempted and the preempting task is allowed to change the cache, the preempted task usually incurs more cache misses after the preemption as it reloads its own code and data into the cache. We call this the preemption impact. It is difficult to measure because it depends on the point of preemption, not only on the cache conflicts between the two tasks. Our experiments simulated preemptions by flushing the cache at some chosen point in the preempted task. If the preemption point is placed very close to the start of a task it is similar to the suspension impact. 
Our OBSW experiments did not find very large I-cache preemption impacts, probably because we did not find the bad preemption points in the rather complex OBSW code. The D-cache preemption impact for the CRC computation increased the execution time by 2% to 11% for a single simulated preemption. Layout impact We also experimented with the layout impact, or how the execution time depends on the memory layout (memory addresses) of the relevant code and data, as illustrated in Figure 1. We chose the MTL function in the OBSW for this experiment because it contains loops that call several procedures, in a manner that resembles the pattern described in section 5.1. We created a good layout of the MTL procedures by placing them consecutively in memory; they occupy only 22 KB so the consecutive layout does not overload any I-cache set. We also created a bad layout by placing the procedures to start at the same address mod 8 KB, thus mapping all starting points to the same cache set. However, we later found a still worse hybrid layout in which the mid-points of the MTL procedures are placed at the same address mod 8 KB. We compared these layouts under two conditions: 1) the I-cache is flushed at the start of an MTL task activation; 2) the I-cache is not flushed at that point. In both cases the I-cache was frozen for all other tasks, so under condition (2) the I-cache has the same content at the start of an MTL activation as at the end of the preceding activation. Under condition (1) we found essentially no execution-time difference between the layouts. A closer look into the MTL code showed that no loop within one MTL activation calls more than 4 of these procedures, which means that even the bad layouts do not overload the 4-way I-cache. In contrast, under condition (2) the repeated activation of the MTL task forms an outer loop and this loop does call more than 4 procedures. 
Thus the 4-way I-cache can be overloaded (although not in all cache sets) and execution time increases by 11% for the hybrid layout over the good layout. Note that these layouts are probably not the really worst and best layouts for this code.

For the D-cache experiments we controlled the addresses of the data packet, the look-up table and the comparison packet or copy packet. The data-access pattern of the CRC computation is so simple that it was rather easy to create a very good layout and a very bad layout. Note, however, that the addresses (indices) used in the look-up table depend on the data in the input packet. We used a specific input packet to maximize the conflicts. As expected we found no D-cache layout impact in the pure CRC computation because it accesses only two data objects (the data packet and the look-up table) and so does not overload the 2-way D-cache. Nor was any layout impact observed in the variant that computes a CRC and makes a copy packet, because the third data object (the copy packet) is only written, and data writes in the LEON never evict any cached data (for data writes the LEON uses write-through with no allocation of cache lines in case of cache miss). However, the variant that computes a CRC and compares the data packet to a comparison packet has three data inputs and suffers a massive layout impact: the bad layout runs 35% to 63% slower than the good layout. In fact, if we completely disable the D-cache with the bad layout in SRAM we only add 8% to the execution time, which shows that the layout impact almost cancels the whole D-cache gain in this experiment.

Reserving the cache for one task

We did not experiment with selective cache locking as that feature is not supported in the LEON FT. As a rough alternative we used cache freezing to select which tasks update the cache. For example, we found that the MTL runs 17% slower if other tasks also update the cache.

Summary of results

To summarise, the size of the impacts we found could be significant with regard to typical OBSW deadlines and margins.
The impacts could arise or stay dormant in rather complex ways. The impact is likely to increase with larger cache-miss penalties in future processor architectures.

6. Conclusions

In this section we recapitulate the lessons we drew from the study and formulate some considerations on ways (whether technical or methodological) in which space industry could master the presence of I- and D-caches in new-generation processors and mitigate the risk effects that may arise from them on the qualification of the execution performance of on-board software.

Based on this work, we consider caches to be a significant problem for the reliable verification of real-time performance in OBSW. At least, when real-time performance is critical, the risk is high enough to be worth considering and countering in some way. A small modification (in layout only, for example) to an already verified system can result in a system with a different behavior with little warning to the user.

The basic problem is that the performance of a cache (miss ratio) depends on so many factors that test-sets of practical size are unlikely to cover the worst case, especially if evolution of the software is considered (e.g. possible changes to the memory layout and to the code itself). Cache designers try to make the bad cases very unlikely. There is a critical area where the likelihood of a bad case is so small that it is missed in testing, but is too large to be risked in flight.

This risk is inherent in the idea of a cache as a self-adjusting container for the current working set. The contents of the working set are dynamic, history-sensitive, and thus hard to predict, especially when there are concurrent activities. Furthermore, practical cache designs cannot contain the true working set, but are limited (by capacity and address conflicts) to an approximation of the working set. In extreme cases (i.e. with many cache conflicts) the approximation is quite poor, which may cause many misses.
The risk of bad cases can be reduced somewhat by suitable software designs and coding styles. Caches are designed to make use of temporal and spatial locality of reference and thus will work better when the design and code create more local references (to code and data) than non-local ones. One recommendation is to favour local or low-level loops that do not call many subprograms from other modules, and to avoid high-level loops that do call many subprograms from other modules when the called subprograms do not contain local loops.

There is much research into more predictable alternatives to cache memories, such as scratchpads, or more predictable ways of using caches, such as locking parts of the code and data in the cache. We feel that such ideas could be useful and practical for small, simple applications where the critical code or data are easy to identify. Actual OBSW is too complex for this, especially if several applications share one and the same computer. Industry does not want to return to such detailed memory management, which is also unportable. Still, some gains could be expected by locking the instructions and data of some time-critical tasks in cache across successive task activations, of course on a scale limited by the tiny size of the caches. The current LEON FT and AT697E do not provide locking for parts of the cache but a similar effect can be achieved by designing the memory layout so that some cache sets (some ranges of address mod 8 KB) are reserved for the critical code and data, placing at most 4 code blocks in each I-cache set and at most 2 data blocks in each D-cache set, and placing all non-critical code and data in other sets (other values of address mod 8 KB).

In theory, a static analysis of the program, based on a detailed model of the processor, the caches and other relevant parts of the system (e.g. the MMU) can give a safe upper bound on the WCET of each task.
Such WCET tools are currently used in the aerospace and automotive industries [5] but are not yet available for the LEON. However, the detailed processor model is hard to define and implement, and the WCET bound is often too large (pessimistic), especially for the D-cache. Our experiments illustrate how the pessimism depends on the software design and how the WCET bound can be improved by design changes or manual guidance of the analysis to exclude infeasible execution paths. However, including interrupts and preemptions in the WCET analysis or the schedulability analysis is still an issue for research, as is a good D-cache analysis for pointer-heavy programs.

Measurement-based WCET tools such as RapiTime do not need a processor model, but only a way to measure the execution time of small program fragments, by either hardware probes or software instrumentation. Though far better than simple end-to-end measurements, the WCET estimates obtained from these tools are not guaranteed safe upper bounds and can also become pessimistic through undetected infeasible execution paths.

Current static and measurement-based WCET tools have an important common problem: they apply to a single memory layout, the current one. They cannot consider all possible layouts, except by trying them all. There is room for tools that visualise the layout and its possible cache conflicts and perhaps improve or optimize the layout to reduce conflicts. One such tool, the RapiCache cache-conflict analyzer, was initiated in this project.

In conclusion, we find that, for the prospective use of LEON uniprocessors, caches currently represent the most significant problem for reliable verification of real-time performance. The MMU, and especially the TLB, can cause similar problems, but we have not studied this issue experimentally. For multiprocessors and systems-on-chip, analysing the timing and performance of traffic on the shared buses, like the AMBA bus for LEON systems, will also be a hard or perhaps even harder problem.

References

[1] Fast and efficient cache behaviour prediction for real-time systems, by C. Ferdinand and R. Wilhelm. Real-Time Systems 17.
[2] WCET analysis of probabilistic hard real-time systems, by G. Bernat, A. Colin and S.M. Petters. In Proceedings of the 23rd Real-Time Systems Symposium RTSS 2002, Austin, Texas, USA.
[3] Bound-T Execution Time Analyzer. Tidorum Ltd.
[4] RapiTime. Rapita Systems Ltd.
[5] aiT Worst-Case Execution Time Analyzers. AbsInt Angewandte Informatik GmbH.
[6] Cache contents selection for statically-locked instruction caches: an algorithm comparison, by A. Martí Campoy, I. Puaut, A. Perles Ivars and J. V. Busquets Mataix. In Proceedings of the 17th Euromicro Conference on Real-Time Systems (ECRTS 2005), 6-8 July 2005.
[7] Influence of memory hierarchies on predictability for time constrained embedded software, by L. Wehmeyer and P. Marwedel. In Design, Automation and Test in Europe (DATE 2005), 7-11 March 2005, Vol. 1.
[8] WCET centric data allocation to scratchpad memory, by V. Suhendra, T. Mithra, A. Roychoudhury and T. Cheng. In Proceedings of the 26th IEEE International Real-Time Systems Symposium (RTSS '05).
[9] PEAL Final Report, TR-PEAL-FR-001; WP13 - Study of Cache Usage Effects in Typical OBSW, TR-PEAL-TN-WP13; "Caches and the LEON", TR-PEAL-TN-003. Send requests for copies to Maria Hernek, maria.hernek@esa.int. PEAL is ESTEC/Contract 19535/05/NL/JD/jk.

A Stochastic Analysis Method for Obtaining the Distribution of Task Response Times

Joan Vila-Carbó and Enrique Hernández-Orallo
Departamento de Informática de Sistemas y Computadores. Universidad Politécnica de Valencia. Spain

Abstract

Real-time analysis methods are usually based on worst-case execution times (WCET). This leads to pessimistic results and resource underutilisation when applied to tasks with highly variable execution times. This paper uses a discrete statistical description of tasks, known as histograms, that enables a powerful analysis method which provides a statistical distribution of task response times. The analysis makes it possible to study workloads with utilisations higher than 1 during some overloads. System behaviour is shown to be a stochastic process that converges to a steady-state probability distribution as long as the average utilisation is not higher than 1. The paper shows that workload isolation is a desirable property of scheduling algorithms that greatly eases the analysis and makes it algorithm independent. An algorithm-dependent analysis method, referred to as the interference method, is also introduced and illustrated for the case of GPS algorithms.

I. INTRODUCTION

The histogram method is based on the idea of improving the results of worst-case analysis in the areas of real-time systems and network calculus. The basic hypothesis is that most workloads are inherently variable and the worst case is, in general, far from the average case. This is a well-known fact in network traffic and multimedia systems, because most media encoding patterns produce bursty workloads. But it is also true for real-time systems, not only because multimedia processing is more and more common in these systems, but also because there are important sources of uncertainty, like cache memories or complex and unpredictable algorithms.
In this scenario the worst-case analysis produces very pessimistic results and leads to resource underutilisation.

Two main approaches have been proposed in the literature to deal with real-time requirements: deterministic and statistical techniques. Temporal requirements in deterministic techniques are mainly based on the deadline mechanism. They allow no deadline violations in data processing or packet losses in data transmission. A lot of work has been done in this area during the last decades (see [1] for example). Providing statistical guarantees has also been analysed in the literature, although to a lesser extent than the deterministic model. Statistical techniques are usually regarded as QoS (Quality of Service) techniques. They usually guarantee a processor (or transmission) bandwidth and allow some percentage of deadline violations or data losses in order to improve resource utilisation. For instance, the work of [2] is representative of a set of servers for jointly scheduling hard and soft real-time tasks, using variable execution times in soft tasks and probabilistic deadlines. The statistical model is much more common in network calculus, where it was first introduced in [3] for a channel-establishment test with statistical guarantees using the Earliest-Due-Date (EDD) packet scheduling policy. A statistical analysis of the GPS (Generalised Processor Sharing) flow-based theory was introduced in [4].

(This work was developed under grant of the Spanish Government CICYT under grant TIC C03-03.)

One of the main problems with highly variable execution times is finding a simple task model described by a reasonable number of parameters that enables powerful analysis methods and provides accurate results. Some works based on the statistical model concentrate on matching statistical parameters of continuous distributions. This is the case of some traffic modelling of video sources [5], [6].
However, there is no consensus on a model [7]. The proposals usually degenerate to a readily analysable Gaussian model on very large networks. An interesting approach to traffic characterisation is given by histogram-based models, which describe tasks or network traffic as a discrete probability mass function.

In the area of real-time systems, one of the first works to model execution times using statistical variables and to calculate task interferences using the concept of statistical convolution is by Tia et al. [8]. The work in [9] uses the same approach but a different method. In [10] Lehoczky makes a stochastic analysis under intense traffic conditions based on Brownian motion. The work by Díaz, García et al. [11], [12] makes a complete analysis of the RM (Rate Monotonic) algorithm with task execution times characterised by histograms. The analysis includes a method for calculating task interference and the resulting stochastic process.

In the area of network calculus the histogram model was used by Skelly et al. [13] to predict buffer occupancy distributions and cell loss rates for multiplexed streams. It was also used by Kweon and Shin [14] to propose an implementation of statistical real-time channels in ATM using a modified version of the Traffic Controlled Rate-Monotonic Priority Scheduling (TCRM). These works used an analysis method based on an M/D/1/N queueing system where traffic sources are approximated by a Poisson distribution with a rate λ which is modelled as a discrete random variable.

This paper generalises the stochastic analysis for real-time tasks with highly variable task execution times modelled as discrete probability mass functions (histograms). The method introduces the hypothesis of workload isolation and shows that the stochastic analysis of the system can be done in an algorithm-independent way when this hypothesis holds. Workload isolation is only approximately met by some algorithms. When it does not hold, the paper shows how to extend the method using a given scheduling algorithm. This is known as the interference method and is solved for the case of GPS algorithms.

[Fig. 1: Grouped probability distribution. Table with columns: interval number i, class interval interval(i), midpoint c_i, probability x_i, cumulative probability x_i^+. The class intervals run from [0, 20[ to [100, 120[; the numerical values did not survive in this copy.]

[Fig. 2: Probability mass function and cumulative distribution.]

II. HISTOGRAM BASICS

The basic idea behind this work is to improve the results of the deterministic analysis by providing a task characterisation that describes the workload variability more accurately than deterministic or simplistic descriptions and that supports a powerful analysis method. The proposed model characterises tasks with variable execution times using histograms. A histogram is a bar-graph representation of a grouped probability distribution (fig. 1), which is a table that lists the values of a variable against their corresponding probabilities (or frequencies). The range of values of the variable is, in general, continuous and divided into intervals (also referred to as classes). The probabilities of values in an interval are grouped together. All the intervals have the same width and are characterised by their lower limit, upper limit, midpoint and interval number. Figure 1 shows the grouped probability distribution of a sample workload and figure 2 shows the corresponding histogram. For convenience, the X-axis of the histogram will show the interval number or its midpoint rather than the interval limits. Histogram processing will be done using their probability mass functions.
Given a variable X representing a task execution time characterised by a grouped probability distribution divided into n intervals, the probability mass function (pmf) of X, denoted as $\mathcal{X}$ (in calligraphic letters), is the sequence of values $x_i$ that represent the (grouped) probability that variable X takes a value in interval number i:

$\mathcal{X} = [x_i,\ i = 0 \ldots n], \quad x_i = P(\mathrm{lower}(i) < X \le \mathrm{upper}(i))$

The values upper(i) and lower(i) represent the upper and lower limits of interval i of variable X. Figure 2 represents the pmf (in bars) and the cumulative distribution function (cdf) (in dashed lines) of the previous grouped probability distribution. Some important values in a histogram are the mean value (or expectation) $E[\mathcal{X}] = \sum_i i \cdot x_i$, and the maximum value $M[\mathcal{X}] = \max(i : x_i > 0)$.

The goal of this pmf definition is to abstract away some attributes of the variable, particularly the range of values of X. Two random variables X and Y with different ranges of values, denoted as $\mathrm{range}(X) = [0, X_{max}]$ and $\mathrm{range}(Y) = [0, Y_{max}]$, will be said to be equivalent if $\mathcal{X} = \mathcal{Y}$. This means that the ranges of variables X and Y are divided into the same number of intervals and the probabilities of all intervals are the same. This occurs frequently when scaling variables. For example, if some variable X represents the amount of data arrived over a time period, the computation time Y of a task is usually proportional to X, so histograms $\mathcal{X}$ and $\mathcal{Y}$ will be equivalent as long as they are expressed with the same number of intervals. Similarly, if some variable X represents the computation time of a task referred to a processor with rate (speed) $r_1$, then the execution time referred to a processor with rate $r_2$ will be $Y = X \cdot r_2 / r_1$ but X and Y will be equivalent.

A. Histogram number of classes and precision

One of the most critical issues of the histogram method is the precision that can be obtained by making a continuous variable discrete.
The key issue is to determine the number of classes of a histogram. This is in general a tradeoff between economy of representation and precision. With too many intervals, the representation will be cumbersome and histogram processing expensive, since the complexity of the algorithms mostly depends on the number of classes. On the other hand, too few intervals may lose information about the distribution and mask trends in the data. Another important problem is that histogram processing with a low number of classes results in significant precision errors. It is paradoxical that these errors occur even if that low number of classes is enough to properly describe a given workload without losing much information. The reason for those inaccuracies seems to be the effect of discontinuities in histogram processing through iterative algorithms. The solution proposed in this paper consists in reducing the discontinuities of histograms by artificially increasing the number of classes through a transformation called overclassing. This transformation is a function $f : \mathcal{X} \to \mathcal{Y}$ consisting in uniformly distributing each probability $x_i$ in an interval $[\mathrm{lower}(i), \mathrm{upper}(i)]$ of length f centered around i such that: $y_j = P(Y = j) = P(X = i)/f,\ j \in [\mathrm{lower}_X(i), \mathrm{upper}_X(i)]$. This transformation meets $E[\mathcal{X}] = f\,E[\mathcal{Y}]$.

III. SYSTEM MODEL

The model of real-time system assumed in this work can be defined as soft real-time and was originally devised in the scope of multimedia transmission and processing, although the proposed analysis can also be applied to some other areas, especially those using tasks with highly variable processing times.

[Fig. 3: End-to-end processing. Chain of elements: media source, buffer, processor, buffer, processor, playback buffer, media player.]

In multimedia processing, computations are usually data-driven. Data are produced continuously by media sources and at highly variable rates. Computations are structured according to the pipeline paradigm (see figure 3): a processor reads data, processes them, and sends the processed data to the next processor. The chain of processors forms an end-to-end system. Processors are interconnected through buffers: a processor output is enqueued into the next processor's input buffer. This model is mostly assumed by streaming applications. When the last stage in a data stream is a media player, its input buffer is usually known as the playback buffer. One of the main problems of streaming is to avoid situations where the playback buffer is empty, because it causes disruptions in media playing. An empty playback buffer is mainly due to long and unpredictable delays in the buffers of previous stages. This occurs often during congestion, when some buffer lengths can become too long and may even overflow, causing data losses. At other times, data losses are due to data dropping in routers to alleviate congestion.

In the proposed system model, every processor executes a set of tasks $X = \{X_i : i = 1 \ldots m\}$ to process the corresponding data streams. Tasks are periodic and are characterised by their computation time and their period. One of the main differences between the proposed model and the classical real-time system model is that execution times are not characterised by a constant (WCET), but by discrete statistical variables, whose pmfs will be denoted as $\mathcal{X} = \{\mathcal{X}_i : i = 1 \ldots m\}$. Multimedia processing tasks, like most real-time tasks, are periodic. Task computation times will always be referred to the hyperperiod.
This is because the proposed method analyses the system's period or hyperperiod, defined as the least common multiple (l.c.m.) of all task periods. These periods are usually related to application parameters like the frame frequency or the audio sampling period. We assume that there is a certain flexibility in adjusting a task period (to some multiple or submultiple of the natural period). The goal is to make the hyperperiod as small as possible.

Unlike hard real-time systems, in systems with highly variable execution times temporal requirements do not involve strict deadlines. This requirement is replaced by the concept of system stability. A task set executing on a processor is said to be stable if the pending execution time of every task is bounded. The pending execution time of a task is defined as the amount of its computation time that cannot be executed by the end of a hyperperiod and hence has to be enqueued to the next hyperperiod. This time corresponds to the processing time of the data stored in the buffer. Stability will be the soft real-time requirement in the proposed system.

A. Problem statement

The problem addressed in this work is an end-to-end system as shown in figure 3. The analysis first considers a single node of the system. This node processes a set of data streams as inputs (see figure 4). Data are supplied through buffers of finite or infinite capacity. The goals of the analysis are to study system stability, task response times and some other QoS parameters such as data loss rates. The amount of data generated by a data source is, in general, time dependent. Accordingly, the execution times of the task set executing on a processor will also be a set of time dependent functions, referred to as $A(t) = \{A_i(t),\ i = 1 \ldots m\}$. We will start our discussion by expressing these functions as continuous time dependent functions. They will later be transformed into discrete statistical variables.
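As a small illustration of why adjusting periods matters, the hyperperiod can be computed as the least common multiple of the task periods. The sketch below is only an example under stated assumptions: the function name `hyperperiod` and the period values (in milliseconds) are hypothetical and do not come from the paper.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of a list of positive integer task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


# Hypothetical periods in milliseconds for three streams.
natural = [40, 20, 30]
# Same streams after moving the third period to a submultiple (30 -> 10).
adjusted = [40, 20, 10]

print(hyperperiod(natural))   # 120
print(hyperperiod(adjusted))  # 40
```

In this made-up case, replacing the 30 ms period by one of its submultiples divides the analysis window by three, which is the effect the period adjustment described above is after.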
The service time for a particular data source, denoted as $S_i(t)$, is defined as the processor bandwidth allocated to data stream $i$. In the general case (figure 4), the service time $S_i(t)$ is a function of the set of all traffic sources and the processor scheduling algorithm : S_i(t) = (A(t)), i = 1...n. However, this problem can be simplified by requiring strong workload isolation: every data source is assigned a constant bandwidth that is time and workload independent: $S_i(t) = r_i,\ \forall t$. This hypothesis allows the problem to be decomposed into n independent problems like the one illustrated in figure 5. This assumption is ideal because it only holds in the fluid model of GPS used in network calculus [15]. Some other algorithms exhibit a weaker form of this property, referred to simply as workload isolation: the average of the service time over a time interval (for example the hyperperiod) is constant. This is true for some algorithms based on the idea of constant bandwidth ([16], [17]). The general case, where the service time cannot be decoupled from the rest of the data sources, will be analysed using the so-called interference method. This kind of analysis has been used in [12] for a statistical study of the RM algorithm and in this paper for the case of the GPS algorithm.

IV. ANALYSIS METHOD BASED ON WORKLOAD ISOLATION

This section analyses system stability under the hypothesis of workload isolation in an algorithm-independent fashion. Later sections develop an interference-based method for the case of the GPS algorithm used in media processing and transmission. System stability requires computing the pending computation time for a particular data stream in a given processor at time t. This time corresponds to the required time to process

the data stored in the buffer up to that time.

[Fig. 4: General scenario (arrivals $A_0, A_1, \ldots, A_m$; buffers $Q_0, Q_1, \ldots, Q_m$; service rates $S_0, S_1, \ldots, S_m$; Processor)]
[Fig. 5: Workload isolation scenario (arrival A; buffer Q; Processor)]

Expressing the data arrival rate A(t) as a time dependent function and assuming that the buffer has a finite capacity l, the buffer length Q(t) of a data stream at a given time t can be expressed as:

$Q(t) = \int_0^t \phi_0^l\big(A(t) - S(t)\big)\,dt$ (1)

where S(t) is the processor service rate and operator $\phi$ limits buffer lengths so that they can neither be negative nor exceed the value l. This operator is defined as follows:

$\phi_a^b(x) = 0$ for $x < a$; $\phi_a^b(x) = x - a$ for $a \le x < b + a$; $\phi_a^b(x) = b$ for $x \ge b + a$

This expression can be rewritten as a recurrence equation (known as Lindley's equation [18]) assuming a discrete time space $T = \{t_0, t_1, t_2, \ldots\}$ where $t_k = k\tau$ for some integer value k, with $\tau$ being the duration of the hyperperiod. This way, functions A(t), S(t) and Q(t) can be replaced by discrete functions expressed in terms of the discrete time k:

$Q(k) = \phi_0^l\big(Q(k-1) + A(k) - S(k)\big)$

In this expression A(k) is the cumulative number of bits that the data source puts into the buffer during the k-th hyperperiod. Analogously, the service rate S(k) is the cumulative number of bits that the processor removes from the buffer during the same hyperperiod. Assuming workload isolation, the service time for each data source can be expressed as a constant r:

$Q(k) = \phi_0^l\big(Q(k-1) + A(k) - r\big) = \phi_r^l\big(Q(k-1) + A(k)\big)$

The foundation of the histogram method consists of eliminating the time dependence of A(k) in the previous expression by expressing it as a discrete random variable with pmf $A = [a_k,\ k = 0 \ldots n]$.
In this way, the previous equation (2) can be transformed into a statistical equation:

$Q(k) = \Phi_r^l\big(Q(k-1) \otimes A\big)$ (3)

where operator $\otimes$ stands for the standard statistical convolution and the bound operator $\Phi_a^b()$ is defined as the statistical generalisation of the previously defined $\phi_a^b()$ operator:

$\Phi_a^b(X) = \Phi_a^b([x_0, x_1, \ldots, x_n]) = \Big[\sum_{i=0}^{a} x_i,\ x_{a+1},\ x_{a+2},\ \ldots,\ x_{b+a-1},\ \sum_{i=b+a}^{n} x_i\Big]$ (4)

Notation $\Phi_a(X)$ will be used as an equivalent for $\Phi_a^{\infty}(X)$. Note that Q(k) has now become a discrete time stochastic process. The evolution in time of this stochastic process depends on the mean value of A. When $M[A] < r$, the buffer length is always zero because it is easy to prove, from its definition, that $\Phi_r(A)$ is zero in this case. The case when $M[A] \ge r$ is the most interesting one because, unlike worst-case analyses, statistical analyses allow arrival rates to occasionally exceed the processor capacity during transitory overloads and still have a stable system, depending on E[A]. Two cases must be considered: infinite and finite buffer. In the infinite buffer case, the system converges to a steady-state pmf iff $E[A] < r$. With a finite buffer, the process always converges because it is always bounded by operator $\Phi_a^b()$. Figure 6 shows an iterative algorithm to calculate the steady state solution of the buffer length problem of equation 3. It will be referred to as the HBSP (Histogram Buffer Stochastic Process) algorithm.

    Algorithm HBSP(A, r, b)
      A: arrival rate
      r: service rate
      b: buffer length
      Q(0) = [1]
      k = 0
      do
        k = k + 1
        I(k) = Q(k-1) ⊗ A
        Q(k) = Φ_r^b(I(k))
      while E[Q(k)] - E[Q(k-1)] > ε
      return Q(k)

    Fig. 6: HBSP algorithm

V. ANALYSIS METHOD BASED ON INTERFERENCES

The analysis method of the previous sections assumes workload isolation, i.e., every data source has a constant service rate.
The interference method assumes that, in general, the service rate for each workload depends on the processor bandwidth r, the task set execution times X, and the scheduling algorithm : S_i(t) = (A(t)). This section develops a method for the case in which the workload isolation hypothesis does not hold, using the GPS family of algorithms. A similar study for the RM algorithm was developed in [11]. The interference of task set X on task A_i using scheduling algorithm, referred to as (X, A_i), is defined as the pmf of the response time of task A_i when the processor executes task set X using scheduling algorithm. In other words, it is the time that the processor devotes to processing A_i when it is also executing X.
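Before moving on to interferences, the workload isolation iteration of figure 6 can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: pmfs are assumed to be plain lists indexed by value, the helper names (`convolve`, `bound`, `mean`, `hbsp`) and the example workload are hypothetical, and the stopping test uses the absolute difference of the means, which is a slight variation on the condition printed in figure 6.

```python
def convolve(p, q):
    """Pmf of the sum of two independent discrete variables (index = value)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def bound(x, r, b):
    """Bound operator: subtract the service r and clamp the result to [0, b]."""
    out = [0.0] * (b + 1)
    for value, prob in enumerate(x):
        out[min(max(value - r, 0), b)] += prob
    return out


def mean(p):
    return sum(value * prob for value, prob in enumerate(p))


def hbsp(a, r, b, eps=1e-9, max_iter=100_000):
    """Iterate Q(k) = bound(Q(k-1) convolved with A) until the mean settles."""
    q = [1.0]  # Q(0): the buffer is empty with probability 1
    for _ in range(max_iter):
        nxt = bound(convolve(q, a), r, b)
        if abs(mean(nxt) - mean(q)) <= eps:
            return nxt
        q = nxt
    raise RuntimeError("HBSP did not converge within max_iter iterations")


# Hypothetical workload: 0..3 units arrive per hyperperiod, E[A] = 1.4 < r = 2,
# yet the peak arrival (3 units) exceeds the service rate.
arrivals = [0.1, 0.5, 0.3, 0.1]
steady = hbsp(arrivals, r=2, b=10)
print([round(p, 4) for p in steady[:4]])
```

With these made-up numbers most of the probability mass stays at an empty buffer, and the remaining mass decays quickly with the buffer length, which is the stable behaviour expected when the mean arrival is below the service rate.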

The calculus of (X, A_i) is -dependent and may have a high computational cost for some scheduling algorithms. This is the reason why workload isolation simplifies the analysis of real-time systems. System stability requires computing the pending execution time in a hyperperiod. This time corresponds to the processing of the data stored in the buffer, i.e. data that could not be processed during a hyperperiod. The buffer length in this situation can be obtained by adapting equation 3. In this equation, the expression $Q(k-1) \otimes A$ is the required processing time in hyperperiod k assuming a virtual processor that only executes stream A (workload isolation). When workload isolation does not hold, it has to be replaced by the time that the processor devotes to processing $Q_i(k-1) \otimes A_i$:

$Q_i(k) = \Phi_r^l\big((X(k-1),\ Q_i(k-1) \otimes A_i)\big)$ (5)

where X(k-1) is the interfering task set for hyperperiod k. After the first iteration, the task set that interferes with A_i may have changed. This is because the workloads in the new hyperperiod have to be convolved with the corresponding pending workload. In other words, in each iteration k the interfering workload has to be recalculated as:

$X(0) = X, \qquad X(k) = \{Q_i(k-1) \otimes A_i,\ i = 1 \ldots m\}$ (6)

System stability therefore becomes more complex to compute and to prove. In this case, the condition for convergence is $\sum_{i=1}^{m} E[X_i] < r$, although the proof has been omitted.

A. Analysis of response times of the GPS algorithm

The GPS algorithm [15] is based on sharing the processor bandwidth among a set of tasks in a weighted way. Each task $X_i$ is assigned a percentage of the processor bandwidth, referred to as $\omega_i$. When some task finishes its execution, its processor bandwidth is reassigned to the rest of the tasks proportionally to their percentages, so the relation $\omega_i/\omega_j$ is preserved for every pair of tasks $X_i$, $X_j$.
The GPS theory assumes an ideal fluid model in which the processor bandwidth can be split among tasks over any infinitesimal interval of time. This ideal behaviour has been approximated by some algorithms [16]. This section presents the analysis of the response time of the GPS algorithm using histograms as an example of the interferences method. The analysis is first introduced through an example, and then generalised in algorithmic form. The example calculates the response time of a task $X_0$ as a consequence of the interference of tasks $\{X_1, X_2\}$. Task $X_0$ has an execution time that is uniformly distributed in the interval [0, 8]. Its histogram is shown in figure 7a in solid bars. In a first approximation to the problem, we assume that the interfering tasks $X_1$ and $X_2$ have deterministic execution times of 4 units and 2 units respectively, also shown in figure 7a. This set of tasks is executed using a GPS algorithm with the following bandwidth percentages: $\omega_0 = 3/10$, $\omega_1 = 6/10$ and $\omega_2 = 1/10$, so $\omega_0 = \frac{1}{2}\omega_1$, $\omega_0 = 3\omega_2$, and $\omega_0 + \omega_1 + \omega_2 = 1$.

[Fig. 7: Example of the interferences method. (a) Task set and bandwidth ω; (b) Interference in interval (0,2); (c) Interference in interval (2,6); (d) Interference in interval (6,8); (e) Resulting interference in (0,8); (f) Result of simulation]

The first step to calculate the response time pmf of $X_0$ is to find out the order and the time at which the deterministic tasks finish their execution. In the above example, it is easy to see that $X_1$ will finish first, by time $4/\omega_1 = 6.67$, so up to this time it has been interfering with task $X_0$. By that time, $X_0$ has executed 2 computation units because $\omega_0 = \frac{1}{2}\omega_1$. Once $X_1$ has finished, its bandwidth is reassigned to $X_0$ and $X_2$ in such a way that $\omega_0/\omega_2$ is preserved. So the new bandwidth percentage of $X_0$ is now $\omega_0 = \frac{3/10}{3/10 + 1/10} = \frac{3}{4}$, and $\omega_2 = \frac{1}{4}$. Task $X_2$ will finish when it completes its 2 execution units.
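The finish times used in this example can be checked with a small fluid simulation of GPS. The sketch below is an illustration under stated assumptions, not part of the paper: the function name `gps_finish_times` is hypothetical, the processor is assumed to have unit rate, all weights are assumed to be strictly positive, and $X_0$ is given its longest execution time (8 units) so that both interfering tasks complete before it.

```python
def gps_finish_times(work, weight):
    """Finish time of each task under ideal fluid GPS on a unit-rate processor.

    work[i] is the execution demand of task i and weight[i] its nominal
    bandwidth share. Shares of finished tasks are redistributed among the
    remaining tasks in proportion to their weights.
    """
    remaining = dict(enumerate(work))
    finish = [0.0] * len(work)
    now = 0.0
    while remaining:
        total = sum(weight[i] for i in remaining)
        # Time until the next completion at the current service rates.
        dt = min(remaining[i] * total / weight[i] for i in remaining)
        now += dt
        for i in list(remaining):
            remaining[i] -= dt * weight[i] / total
            if remaining[i] <= 1e-12:
                finish[i] = now
                del remaining[i]
    return finish


# X0 (8 units, worst case), X1 (4 units) and X2 (2 units) from the example.
times = gps_finish_times([8, 4, 2], [0.3, 0.6, 0.1])
print([round(t, 2) for t in times])  # [14.0, 6.67, 12.0]
```

The second and third entries are the completion times of $X_1$ and $X_2$; the first is the response time of $X_0$ when it needs its full 8 units.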
Since the relation $\omega_0/\omega_2 = 3$ is always preserved, $X_2$ will interfere with task $X_0$ until $X_0$ executes $\frac{\omega_0}{\omega_2} \cdot 2 = 6$ time units. Henceforth $X_0$ will have the full processor bandwidth, so $\omega_0 = 1$. The different values of $\omega_0$ with respect to the execution times of $X_0$ are represented with a dashed line in figure 7a. Once the evolution of $\omega_0$ has been established, the pmf of the execution time of $X_0$ is calculated by intervals with the same $\omega_0$. In this case, three intervals are considered: [0, 2[, [2, 6[ and [6, 8[.

In interval [0, 2[, the value of $\omega_0$ is 3/10. If the execution time of $X_0$ belongs to this interval, the task will be executed $1/\omega_0 = 10/3 = 3.33$ times slower than if it had the full processor bandwidth, so the execution times have to be multiplied by this factor. This means that the response time is obtained by scaling up the histogram X-axis in this interval by a factor of 10/3. This scaling is done via linear interpolation. The resulting histogram is shown in figure 7b. The procedure can be expressed as:

    interval[] = getinterval(0,2) = [1/8, 1/8]
    result[] = scale(10/3, interval) = [1/28, 1/28, 1/28, 1/28, 1/28, 1/28, 1/28]

In interval [2, 6[, the value of $\omega_0$ is 3/4. If the execution time of $X_0$ is in this interval, the histogram interval has to be scaled up by a factor of $1/\omega_0 = 4/3$. But the calculation of the response time in this interval needs to take into account the execution time of previously finished tasks. Task $X_1$ finishes just before this interval starts, so the scaled interval must be shifted to the right by 7 positions (6.67 rounded up). In statistical terms, this means the convolution of the scaled interval with a "finished" histogram that has 7 zeros and a one in position 8 (the one function). The result of this convolution is shown in figure 7c. Algorithmically:

    interval[] = getinterval(2,6,X_0) = [1/8, 1/8, 1/8, 1/8]
    sint[] = scale(4/3, interval) = [1/10, 1/10, 1/10, 1/10, 1/10]
    finished[] = one(7) = [0, 0, 0, 0, 0, 0, 0, 1]
    csint[] = sint[] ⊗ finished[] = [0, 0, 0, 0, 0, 0, 0, 1/10, 1/10, 1/10, 1/10, 1/10]
    result[] = sum([result[], csint[]])

In interval [6, 8[, the value of $\omega_0$ is 1. The histogram in this interval has to be scaled by a factor of 1. This value has to be convolved with the execution time of previously finished tasks, which in this case includes $X_1$ and $X_2$ (12). The result of this convolution is shown in figure 7d. The particularisation of the algorithm for this case is:

    interval[] = getinterval(6,8,X_0) = [1/8, 1/8]
    sint[] = scale(1, interval) = [1/8, 1/8]
    finished[] = one(12)
    csint[] = sint[] ⊗ finished[] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1/8, 1/8]
    result[] = sum([result[], csint[]]) = [1/28, 1/28, 1/28, 1/28, 1/28, 1/28, 1/28, 0.1, 0.1, 0.1, 0.1, 0.1, 1/8, 1/8]

Variable result in the previous algorithm accumulates the probabilities for each interval. The resulting histogram is obtained by summing up the probabilities of all intervals. This histogram is shown in figure 7e. These results match almost perfectly the results of the packet-by-packet simulation of figure 7f, which supports the correctness of the algorithm.

The general algorithm for computing the interference of a deterministic task set is shown in figure 8 and can be easily understood once the previous example has been introduced. The task whose response time has to be computed is represented as A, a struct with two fields: A.ω, which is the nominal processor utilisation of A, and A.ρ[], which is an array with the pmf of the execution time of A. The set of deterministic tasks that interfere with A is denoted by the array of structs DX[]. Each element i of the array is a structure with two fields: DX(i).ω, which is the nominal processor utilisation of DX(i), and DX(i).τ, which is the (deterministic) execution time of DX(i). The algorithm starts by invoking procedure getUtilisations to calculate the values of the utilisation percentage ω and their corresponding intervals. It also calculates the finish time of each interval. In the previous example: W[] = [3/10, 3/4, 1], T[] = [0, 2, 6, 8], F[] = [0, 6.67, 12]. Once these values and intervals of ω have been obtained, the algorithm performs as detailed in the example: the values of the execution times for each interval are scaled up by a factor of 1/ω and convolved with the execution time of previously finished deterministic tasks.

    Algorithm result[] = detGPSInterference(A, DX[])
      A: struct of the interfered task
      DX[]: array of struct of deterministic tasks
      result: pmf of the interfered task
      [W[], T[], F[]] = getUtilisations(A, DX[]);
      result = [];
      finished = [1];
      for i = 1:length(T[])-1
        interval[] = getinterval(T(i), T(i+1), A.ρ[]);
        sint[] = scale(1/W(i), interval);
        finished[] = one(F(i));
        csint[] = sint[] ⊗ finished[];
        result[] = sum(result[], csint[]);
      end

    Fig. 8: Algorithm for deterministic interferences with GPS.
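The values of W, T and F quoted for the example can be reproduced with a short transcription of the getUtilisations procedure of figure 9. This is a hedged sketch rather than the authors' code: the function name `get_utilisations`, the tuple layout of the interfering tasks and the comparison `tau <= a_len` in the main loop are assumptions, and Python's `round` is used where the figure calls `round`.

```python
def get_utilisations(a_weight, a_len, dx):
    """Utilisation intervals of a task A interfered by deterministic tasks.

    a_weight: nominal bandwidth share of A.
    a_len:    largest execution time in the pmf of A.
    dx:       list of (execution_time, weight) pairs, one per interfering task.
    Returns (W, T, F): shares of A, interval limits and accumulated times.
    """
    # Express each interfering execution time on the time axis of A.
    scaled = sorted((round(tau * a_weight / w), w) for tau, w in dx)
    w_tot = a_weight + sum(w for _, w in dx)
    T, W = [0], [a_weight / w_tot]
    for i, (tau, w) in enumerate(scaled):
        if tau <= a_len:
            w_tot -= w  # the finished task releases its share
            if i == 0 or tau != scaled[i - 1][0]:
                T.append(tau)
                W.append(a_weight / w_tot)
            else:
                W[-1] = a_weight / w_tot  # same instant: update the share only
    if T[-1] < a_len:
        T.append(a_len)
    F = [0.0]
    for i in range(1, len(T)):
        F.append(F[i - 1] + (T[i] - T[i - 1]) / W[i - 1])
    return W, T, F


# Example of the text: A = X0 (share 3/10, support up to 8), X1 and X2.
W, T, F = get_utilisations(0.3, 8, [(4, 0.6), (2, 0.1)])
print([round(w, 2) for w in W])  # [0.3, 0.75, 1.0]
print(T)                         # [0, 2, 6, 8]
print([round(f, 2) for f in F])  # [0.0, 6.67, 12.0, 14.0]
```

Because the sketch follows the final loop of figure 9 literally, F gets one more entry (14) than the three values quoted in the text; that last entry is the time at which $X_0$ itself would finish in its longest case.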
Finally, the probability values of each interval are summed element by element using procedure sum.

Algorithm getUtilisations (figure 9) calculates the values of the utilisation percentage ω of a task A as a consequence of the interference of a deterministic task set DX, the intervals of the pmf of A to which these values apply, and the amount of finished deterministic task computation in each interval. It starts by scaling the execution times of the deterministic tasks to the value of A, according to their utilisation factors: if, for example, task i has twice the utilisation factor of A, then its execution time is divided by 2. Then, the interfering tasks are sorted according to their execution time. Value ω_tot is initialised to the sum of all utilisation factors (usually 1). The heart of the algorithm is the loop which starts in line 8. The loop filters repeated values of time in array T[]. In each pass, ω_tot is updated by subtracting the ω of the finished tasks, and the new value of A.ω is readjusted according to the GPS algorithm as A.ω/ω_tot.

Once the interference of a set of deterministic tasks has been calculated, the algorithm for computing the interference of a set of tasks (not necessarily deterministic) can be obtained using the previous one. The general algorithm is presented in figure 10. The general idea is to decompose each non-deterministic task into a set of deterministic tasks. For example, a task with a pmf [0, 0.3, 0.7] would be decomposed into two tasks with pmfs [0, 0.3] and [0, 0, 0.7]. The corresponding pmfs for each case then have to be combined in a weighted sum according to their probabilities: 0.3 and 0.7. The algorithm of figure 10 computes all the combinations of deterministic task sets that can be formed with an array of non-deterministic tasks X[].
This is a double array DXA[][] where DXA(i)[] is a deterministic task set whose elements are characterised by three parameters: DX(i).ω is the utilisation factor, DX(i).τ is the deterministic execution time, and DX(i).ρ is the probability of this execution time. The algorithm detGPSInterference is invoked for each deterministic task set DXA(i)[], and the results for each case are summed with weight DX(i).ρ.
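The decomposition into deterministic task sets and the final weighted sum can be sketched as follows. The names `deterministic_task_sets`, `weighted_mix` and `point_mass_at_total` are hypothetical, and the last one is only a stand-in for the per-set analysis (detGPSInterference in the text) so that the example runs on its own; it is an assumption for illustration, not the GPS analysis itself.

```python
from itertools import product


def deterministic_task_sets(tasks):
    """Enumerate every deterministic combination of a set of random tasks.

    tasks: list of (weight, pmf) with pmf[v] = P(execution time = v).
    Yields (probability, [(execution_time, weight), ...]) per combination.
    """
    choices = [
        [(value, weight, prob) for value, prob in enumerate(pmf) if prob > 0]
        for weight, pmf in tasks
    ]
    for combo in product(*choices):
        q = 1.0
        for _, _, prob in combo:
            q *= prob
        yield q, [(value, weight) for value, weight, _ in combo]


def weighted_mix(tasks, analyse):
    """Sum the pmfs returned by analyse(dx), weighted by each combination."""
    result = []
    for q, dx in deterministic_task_sets(tasks):
        pmf = analyse(dx)
        if len(pmf) > len(result):
            result.extend([0.0] * (len(pmf) - len(result)))
        for k, p in enumerate(pmf):
            result[k] += q * p
    return result


def point_mass_at_total(dx):
    """Placeholder analysis: all probability at the total interfering time."""
    return [0.0] * sum(tau for tau, _ in dx) + [1.0]


# One task with the pmf [0, 0.3, 0.7] of the text and one deterministic task.
tasks = [(0.6, [0, 0.3, 0.7]), (0.1, [0, 0, 1.0])]
for q, dx in deterministic_task_sets(tasks):
    print(q, dx)
print(weighted_mix(tasks, point_mass_at_total))
```

The number of combinations is the product of the number of non-zero classes of each interfering pmf, which is one way to see why this method is much more expensive than the workload isolation analysis.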

    Algorithm [W[],T[],F[]] = getUtilisations(A, DX[])
      A: struct of the interfered task
      DX[]: array of struct of deterministic tasks
      W[]: array of utilisations
      T[]: array of times of utilisation changes
      F[]: array of finished times
     1 for i=1:length(DX[])
     2   DX(i).τ = round(DX(i).τ*(A.ω/DX(i).ω));
     3 end
     4 DX[] = sortbytime(DX[]);
     5 ω_tot = A.ω + Σ DX(i).ω
     6 T[]=[0]; W[]=[A.ω/ω_tot];
     7 F[]=[0]; f=0;
     8 for i=1:length(DX[])
     9   if ( DX(i).τ ≤ length(A) )
    10     ω_tot = ω_tot - DX(i).ω;
    11     if ( i==1 | (i>1 & (DX(i).τ ≠ DX(i-1).τ)) )
    12       T[]=[T, DX(i).τ];
    13       W[]=[W, A.ω/ω_tot];
    14     else
    15       W(end)=A.ω/ω_tot;
    16     end
    17   end
    18 end
    19 if ( T(length(T)) < length(A) )
    20   T=[T,length(A)];
    21 end
    22 for i=2:length(T[])
    23   F[]=[F, F(i-1)+(T(i)-T(i-1))/W(i-1)];
    24 end

    Fig. 9: Algorithm for determining the intervals of ω

    Algorithm result = GPSInterference(A, X[])
      A: struct of the interfered task
      X[]: array of struct of the interfering tasks
      result: pmf of the interfered task
      result=[0];
      for i=1:length(X[])
        DXA[][] = combinations(DXA[][], X(i));
      end
      [l,w] = size(DXA[][]);
      for i=1:l
        DR(i)[] = detGPSInterference(A, DXA(i)[]);
        q=1;
        for j=1:length(X[])
          q = q*DXA(i)(j).ρ;
        end
        WDR(i)[] = sum(WDR(i)[], q*DR(i)[]);
      end
      for i=1:l
        result[] = sum(result[], WDR(i)[]);
      end

    Fig. 10: Algorithm for computing interferences using GPS

VI. CONCLUSIONS

This paper has presented a stochastic method for calculating response times in systems where the average load can occasionally be higher than one during system overloads. System stability is proposed as a soft real-time requirement alternative to the deadline requirement. It has been shown that system stability can be studied in an algorithm-independent fashion under the hypothesis of workload isolation. This hypothesis holds approximately for some scheduling algorithms. A method based on calculating task interferences has been developed for the GPS family of algorithms and it has been shown to have a much higher computational complexity. An interesting issue is how different the solutions obtained by assuming workload isolation and by the interferences method are, but this has been left as future work. The developed method is applicable in the fields of real-time systems and network calculus.

REFERENCES

[1] J. W. Liu, Real-Time Systems, Vol. 1, Pearson.
[2] L. Abeni, G. Buttazzo, QoS guarantee using probabilistic deadlines, in: Proc. of the Euromicro Conference on Real-Time Systems.
[3] D. Ferrari, D. Verma, A scheme for real-time channel establishment in wide-area networks, IEEE Journal of Selected Areas Communication 8 (2) (1990).
[4] Z. L. Zhang, D. Towsley, J. Kurose, Statistical analysis of the generalized processor sharing scheduling discipline, IEEE Journal of Selected Areas Communication 14 (6) (1995).
[5] W. E. Leland, M. S. Taqqu, W. Willinger, D. V. Wilson, On the self-similar nature of Ethernet traffic (extended version), IEEE/ACM Transactions on Networking 2 (1) (1994).
[6] M. Zukerman, T. D. Neame, R. G. Addie, Internet traffic modeling and future technology implications, in: IEEE Infocom.
[7] R. G. Addie, M. Zukerman, T. D. Neame, Broadband traffic modeling: Simple solutions to hard problems, IEEE Communications Magazine (1998).
[8] T. Tia, Z. Deng, M. Shankar, M. Storch, J. Sun, L. Wu, J. Liu, Probabilistic performance guarantee for real-time tasks with varying computation times.
[9] M. Gardner, Probabilistic analysis and scheduling of critical soft real-time systems, PhD thesis, University of Illinois, Urbana-Champaign (1999).
[10] J. Lehoczky, Real-time queueing network theory, in: Proc. of the 18th IEEE Real-Time Systems Symposium, 1997.
[11] J. Díaz, D. García, K. Kim, C. Lee, L. L. Bello, J. López, S. L. Min, O. Mirabella, Stochastic analysis of periodic real-time systems, in: Proc. of the 23rd IEEE Real-Time Systems Symposium, 2002.
[12] J. Díaz, Tecnicas estocasticas para el calculo del tiempo de respuesta en sistemas de tiempo real, PhD thesis, Universidad de Oviedo, Spain (2003).
[13] P. Skelly, M. Schwartz, S. Dixit, A histogram-based model for video traffic behavior in an ATM multiplexer, IEEE/ACM Transactions on Networking 1 (4) (1993).
[14] S.-K. Kweon, K. G. Shin, Real-time transport of MPEG video with a statistically guaranteed loss ratio in ATM networks, IEEE Transactions In Parallel and Distributed Computing 12 (4) (2001).
[15] A. K. Parekh, R. G. Gallager, A generalized processor sharing approach to flow control in integrated services networks: The single node case, IEEE/ACM Transactions on Networking 1 (3) (1993).
[16] J. Bennett, H. Zhang, WF2Q: Worst-case fair weighted fair queueing, in: Proc. IEEE INFOCOM, 1996.
[17] M. Spuri, G. C. Buttazzo, Scheduling aperiodic tasks in dynamic priority systems, Real-Time Systems 10 (2) (1996).
[18] L. Kleinrock, Queueing Systems. Volume 2: Computer Applications, Wiley-Interscience, New York.

D-P domain feasibility region in dynamic priority systems

Patricia Balbastre, Ismael Ripoll and Alfons Crespo
Department of Computer Engineering, Technical University of Valencia, Spain
{patricia,iripoll,

Abstract

In the design of a real-time application it is fundamental to know how a change in the task parameters would affect the feasibility of the system. Relaxing the classical assumptions on static task sets with fixed periods and deadlines can give higher resource utilisation and better performance. But changes to task parameters must always maintain feasibility. In practice, period and deadline modifications are only necessary on single tasks. Our work focuses on finding the feasibility region of deadlines and periods (called the D-P feasibility region) for a single task in the context of dynamic, uniprocessor scheduling of hard real-time systems. This way, designers can choose the deadline and period pairs that best fit the application requirements. We provide an exact and an approximate algorithm to calculate this region. We will show that the approximate solution is very close to the exact one and takes considerably less time.

1 Introduction

Real-time systems are often designed using a set of periodic activities running on top of a real-time operating system. Timing parameters, such as computation times, periods and deadlines, are susceptible to adjustment during the design phase of the system. In practice, deadlines and periods are chosen to satisfy certain performance requirements. For hard real-time applications, the feasibility of the schedule has to be guaranteed a priori. In the context of real-time task scheduling of flexible applications, Earliest Deadline First (EDF) seems more attractive than Rate Monotonic (RM), since it makes better use of CPU resources.
Traditionally, many of the works on EDF scheduling have focused on tasks with deadlines equal to periods. This assumption greatly simplifies the feasibility analysis, but limits the applicability of the results. In control applications, setting deadlines shorter than periods allows jitter and delay to be reduced. Variable delays and jitter may degrade the performance of the system and even jeopardise its stability [25, 27]. In real-time databases, temporal freshness of data is assured and workload minimised by assigning a deadline lower than the transaction's period, as in the More-Less model proposed in [28]. Flexible applications, such as multimedia systems, need to execute periodic activities, but task periods do not need to be as rigid as in control applications. Moreover, control applications can benefit from executing at different rates for different operating conditions. Relaxing the classical assumptions on static task sets with fixed periods and deadlines can give higher resource utilisation and better control performance. One of the typical strategies is to adapt periods, computation times and deadlines on-line in a way that optimises overall control performance [10, 19]. But changes to task parameters must always maintain feasibility. Thus, it is necessary to provide a feasibility region to the control designers. This feasibility region can be obtained by means of a sensitivity analysis, which is a generalisation of feasibility analysis [8]. Sensitivity analysis is a promising approach to deal with design uncertainties and allows the system designer to keep track of the flexibility of the system. In the literature, sensitivity analysis has been applied considering how a change in a task affects the rest of the tasks of the system.

(Footnote: This work was partially funded by the Spanish Government Research Office (CICYT) under grant DPI C02-02 and TIN C03 and by the European Union's Sixth Framework Programme (FP6/2005/IST/ ).)
We are interested in how a change in a task parameter affects the rest of the parameters of the same task.

1.1 Motivating example

Let us consider the task set with the parameters listed in Table 1 (see Section 2 for notation details). The example illustrates a task set where deadlines are lower than periods, which is used in control applications to limit the jitter and latency of control tasks. The execution chronogram is depicted in Figure 1(a). As can be seen in the chronogram, the CPU is not fully utilised and task periods and deadlines can be reduced. Indeed, in a real-time control application, generally the shorter the period, the better the performance [24]. If $T_1$ is a control task but $T_2$ is not, then we are interested in the set of feasible deadline and period assignments for task $T_1$.

[Table 1. Two tasks example (columns C, D, P; one row per task)]

[Figure 1. Execution chronogram of two periodic tasks with different period and deadline assignment: (a) Two periodic tasks with D < P; (b) $T_1$ = (2,2,7); (c) $T_1$ = (2,4,5)]

Figures 1(b) and 1(c) show two alternatives of period and deadline assignment for task $T_1$ while maintaining feasibility. In the first case, the goal is to reduce the task deadline as much as possible, and then the period has been reduced. In the second case, the period has been shortened to its minimum value and afterwards the deadline has been reduced. In both cases the task set is on the feasibility borderline, although the order in which the reduction is made (first period or first deadline) leads to different task parameters. Which alternative is better from the application point of view? It depends on the kind of application. Even in control applications, assigning a short period is not always better than assigning a shorter deadline. In Figure 1(b), assigning the minimum possible deadline to $T_1$ allows a very low jitter but, on the other hand, assigning the shortest possible period to $T_1$ as in Figure 1(c) can lead to better control performance. In any case, it depends on the dynamics of the controlled system. As stated in [1], the effect of delays is not the same in every control loop, meaning that sometimes the best alternative is the one depicted in Figure 1(b) and sometimes it is the one shown in Figure 1(c). Therefore, it is a designer's decision.
To integrate the design phase with the real-time implementation, we will provide the range of feasible deadlines and periods for a task so that system designers can choose the appropriate values to adapt quickly to the system dynamics or to improve the system performance.

1.2 Related work

Sensitivity analysis has focused on permissible changes to task WCETs, mainly because it has been applied to fault tolerance design. In this sense, [15, 26, 22] define the critical scaling factor as the largest possible scaling factor for task computation times for which the task set remains schedulable. These works assume Rate Monotonic priority assignment and deadlines equal to periods. Sensitivity analysis for computation times using EDF scheduling has been performed by Balbastre et al. in [2]. Moreover, the Optional Computation Window (OCW) is defined as the possibility that a task can execute in n activations in a window of m consecutive invocations with a computation time larger than the one initially assigned. Deadlines less than or equal to periods are assumed. The analysis tool MAST [21] includes a slack calculation tool that calculates the system, processing resource or transaction slacks by repeating the analysis in a binary search algorithm in which computation times are successively increased or decreased. This implies executing a feasibility test in each iteration, making the computational complexity very high.

Sensitivity analysis for task periods with dynamic priorities can be found in [9] by Buttazzo et al. In that paper, periods are modeled as springs with given elastic coefficients and minimum lengths. Requested variations in task execution rates or overload conditions are managed by changing the rates based on the spring elastic coefficients. This work assumes deadlines equal to periods and EDF scheduling. In [24] task periods are chosen to minimise a cost function, preserving feasibility under Rate Monotonic.
A similar problem is addressed in [7, 8] using the exact feasibility region in the domain of task frequencies. Rate analysis in the domain of embedded systems is analysed in [20], and for multimedia streams in [18]. However, we are concerned with rate and deadline analysis for hard real-time systems.

Regarding deadline sensitivity analysis, there are some papers which assign new deadlines to periodic tasks in order to achieve a secondary objective. Cervin et al. [11] calculate new deadlines for control tasks in order to guarantee closed-loop stability of real-time control systems. Baruah et al. [4] developed two methods to minimise the output jitter of tasks by assigning shorter relative deadlines. The first method does not achieve the minimum deadline but has polynomial time complexity. The second method calculates the minimum deadline at the expense of a higher computational complexity. Finding the minimum deadline of a periodic task has also been independently addressed by Balbastre et al. [3] and by Hoang et al. [14]. The feasibility region for task deadlines in EDF when deadlines are less than periods is formally presented by Bini and Buttazzo [6]; however, that work does not present any algorithm to derive this region. This is mainly due to the intrinsic complexity of the problem.

1.3 Contributions and outline

The main contribution of this paper is the characterisation of the feasibility region in the domain of deadlines and periods (D-P feasibility region) for one task. We will provide both the exact feasibility region and an approximate solution with reduced complexity. To calculate this region, it is necessary to calculate the minimum deadline and the minimum period of a task. The minimum deadline calculation is proposed in [3]. As far as the authors are aware, there is no work on finding the minimum period of a periodic task in EDF when deadlines are different from periods. This is another contribution of our work. Our final goal is to present efficient and applicable algorithms that can be used directly by system designers.

The remainder of this paper is organised as follows: Section 2 introduces the computational model and the assumptions used. Section 3 presents the problem definition. To calculate the feasibility region we need to compute the minimum deadline and period of a task, which is detailed in Sections 4 and 5, respectively.
In Section 6, the characterisation of the D-P feasibility region is detailed. We have also implemented the proposed algorithms and conducted simulations to assess the validity and effectiveness of our proposals; the results are presented in Section 7. Finally, Section 8 highlights some conclusions and discusses future lines of research.

2 System model and notation

Let $\tau = \{T_1, \ldots, T_n\}$ be a periodic task system scheduled under EDF. Each task $T_i \in \tau$ has the temporal parameters $T_i = (C_i, D_i, P_i)$, where $P_i$ is the period, $D_i$ is the deadline relative to the start time, and $C_i$ is the worst-case execution time, with $C_i \le D_i$. The total utilisation factor of $\tau$ is $U_\tau = \sum_{i=1}^{n} C_i / P_i$. Each task $T_i$ produces a sequence of jobs $J_{ik}$ ($k = 0, 1, 2, \ldots$), each of which must be completed by its absolute deadline $d_{ik} = k P_i + D_i$. Furthermore, the following definitions are needed:

Definition 1 The period improvement $\beta_j$ for a task $T_j \in \tau$ is computed from the ratio between the shortened period ($P'_j$) and the original period, that is:

$$\beta_j = 1 - \frac{P'_j}{P_j}$$

Thus, $\beta_j = 0$ means that no reduction is achieved, whereas a value of $\beta_j$ approaching 1 means that the maximum possible period reduction was achieved.

Definition 2 [5] Let

$$H_\tau(t) = \sum_{i=1}^{n} C_i \left\lfloor \frac{t + P_i - D_i}{P_i} \right\rfloor .$$

It denotes the maximum cumulative execution time requested by jobs of $\tau$ whose absolute deadlines are less than or equal to $t$.

2.1 EDF feasibility when $D_i \le P_i$

The Earliest Deadline First scheduling algorithm was first described in [17], and it was shown to be optimal in [12]. The feasibility test for EDF (when $D_i < P_i$) consists of checking that the inequality $H_\tau(t) \le t$ holds for all $t$. Leung and Merrill showed in [16] that the schedulability condition only needs to be checked in the interval $[0, P)$, where $P$ is the hyperperiod. Baruah et al. [5] proposed a more accurate feasibility test, with the same schedulability condition but with a shorter interval to check. Ripoll et al. [23] presented a more accurate interval.
Moreover, [23] proposes a new schedulability test based on the idea of the Initial Critical Interval (ICI), that is, the first interval $[0, R_\tau)$ in which there is no idle time. Other authors refer to this interval as the first busy period. These ideas are summarised in the next theorem:

Theorem 1 [23] $\tau$ is feasible if and only if:

$$H_\tau(t) \le t \quad \forall t < B_\tau$$

where

$$B_\tau = \min(L_\tau, R_\tau) \quad \text{and} \quad L_\tau = \frac{\sum_{i=1}^{n} C_i \left(1 - \frac{D_i}{P_i}\right)}{1 - U_\tau}$$

In fact, to determine whether $\tau$ is feasible there is no need to check that $H_\tau(t) \le t$ at every time instant in the interval $[0, B_\tau]$; it suffices to check it at the designated scheduling points.

Definition 3 The set $S_\tau$ of scheduling points for $\tau$ is defined by:

$$S_\tau = \{ d_{ik} \;/\; 1 \le i \le n,\ 1 \le k \le K,\ d_{ik} \le P \}$$

where $K = P / P_i$ denotes the number of activations of $T_i$ in $[0, P)$.
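The definitions of this section can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: tasks are integer tuples (C, D, P) with D <= P, jobs are indexed from k = 0 as in the model above, and the check runs up to the hyperperiod rather than up to the tighter bound of Theorem 1. The function names (`utilisation`, `demand`, `scheduling_points`, `edf_feasible`) are hypothetical and do not come from the paper.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# A task is a tuple (C, D, P): WCET, relative deadline and period.
# Assumption of this sketch: integer parameters with D <= P.

def utilisation(tasks):
    """U = sum of C_i / P_i, kept exact with Fraction."""
    return sum(Fraction(c, p) for c, _, p in tasks)

def demand(tasks, t):
    """H(t): execution time requested by jobs whose absolute deadline is <= t."""
    return sum(c * ((t + p - d) // p) for c, d, p in tasks)

def scheduling_points(tasks, bound):
    """Sorted absolute deadlines d_ik = k*P_i + D_i (k = 0, 1, ...) up to bound."""
    return sorted({k * p + d
                   for _, d, p in tasks
                   for k in range(bound // p + 1)
                   if k * p + d <= bound})

def edf_feasible(tasks):
    """Check H(t) <= t at every scheduling point up to the hyperperiod."""
    if utilisation(tasks) > 1:
        return False
    hyper = reduce(lambda a, b: a * b // gcd(a, b), (p for _, _, p in tasks))
    return all(demand(tasks, t) <= t for t in scheduling_points(tasks, hyper))

print(edf_feasible([(1, 3, 4), (2, 5, 6)]))  # True
print(edf_feasible([(2, 2, 4), (2, 3, 4)]))  # False: H(3) = 4 > 3
```

Using the hyperperiod as the checking interval keeps the sketch short; replacing it with $B_\tau$ from Theorem 1 would only reduce the number of points visited.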

3 Problem statement

As formally defined in [8], the feasibility region in the X-space for a task set $\tau$ is the set of values of X such that $\tau$ is feasible. In our paper, the feasibility region is redefined as follows:

Definition 4 The feasibility region in the $X_1, \ldots, X_m$-space for a task $T_i$ is the set of values of $X_1, \ldots, X_m$ of $T_i$ such that $\tau$ is feasible.

According to this definition, the D-P feasibility region for $T_i \in \tau$ is the set of values $(D_i, P_i)$ such that the task set $\tau$ is feasible. For simplicity, we will refer to the points $(D_i, P_i)$ within the D-P feasibility region as feasible points. It is important to note that all timing parameters of tasks other than $T_i$ remain unchanged.

To build this region, we need to know the shortest values for deadlines and periods, that is, we need to compute the minimum period and the minimum deadline of a periodic task. These calculations provide two points in the D-P space:

Point $A = (D_A, P_A)$: $D_A$ results from first computing the minimum deadline of $T_i$; $P_A$ is then derived by calculating the minimum period for $T_i$ with a deadline equal to $D_A$.

Point $B = (D_B, P_B)$: $P_B$ results from first computing the minimum period of $T_i$; $D_B$ is then derived by calculating the minimum deadline for $T_i$ with a period equal to $P_B$.

The resulting region is depicted in Figure 2. Note that it is not possible that $D_B < D_A$, since $D_A$ is the minimum deadline of $T_i$. For the same reason, it is not possible that $P_A < P_B$, since $P_B$ is the minimum period of $T_i$. It is possible that A = B. All the points in the lined area generate a feasible EDF schedule. We can state that the exact feasibility region coincides with the region depicted in Figure 2 along the continuous lines. However, the exact area from A to B (dotted lines) is unknown. One of the contributions of this paper is to demonstrate that all the points belonging to the line AB are feasible.
Up to now, we only know that the area depicted in Figure 2 is a sufficient but not necessary feasibility region. Before studying the D-P feasibility region in greater depth, we need to know how to compute the minimum deadline and the minimum period of a periodic task in order to obtain points A and B. This is the aim of the next two sections.

[Figure 2. D-P feasibility region of $T_i$ (first approach)]

4 Minimum deadline of a periodic task

Finding the minimum deadline of a periodic task has been independently addressed by Balbastre et al. in [3] and by Hoang et al. in [14]. Although both algorithms are optimal, in the sense that they obtain the minimum deadline, and both run in pseudo-polynomial time, it is shown in [3] that Balbastre's algorithm (called Deadlinemin) is much faster than Hoang's algorithm.

The Deadlinemin algorithm, which minimises the deadline of a task $T_i$, is presented in Listing 1. The algorithm starts at $t = d_{i0}$ and computes the slack time at the scheduling point immediately before $t$. If the slack is greater than $C_i$, the algorithm continues backwards to the next scheduling point; otherwise, a candidate for the minimum deadline is stored. The procedure is repeated for all jobs of $T_i$ whose deadline is in the interval $[0, R_\tau)$.

5 Minimum period of a periodic task

Up to now, there is no efficient method to calculate the minimum period of a periodic task in EDF when deadlines differ from periods. In the case of deadlines equal to periods, the minimum period of a periodic task in EDF is easily deduced by assigning the remaining processor utilisation to the task whose period we want to minimise. However, the minimisation when deadlines are less than periods is more complex. In this section, a method to minimise a task period is formally presented.

Let us assume that we add a task $T_i$ to the feasible task set $\tau$. The only known parameter of $T_i$ is its computation time $C_i$. We want to calculate the minimum period of task $T_i$.
The following theorem provides a way to calculate it.

Theorem 2 Let $\tau$ be a feasible set of $n$ periodic tasks. Let $\tau' = \tau \cup T_i$ with $T_i = (C_i, P_i, P_i)$. Then, $\tau'$ is schedulable in $(0, I]$ if and only if:

$$P_i \ge \max_{0 \le t \le I} (P_i^t)$$

where

$$P_i^t = \frac{H_\tau(t)}{A_i(t)} + C_i \quad \text{and} \quad A_i(t) = \frac{t - H_\tau(t)}{C_i} .$$
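A short algebraic check may help in reading Theorem 2. It assumes the reading in which $A_i(t) = (t - H_\tau(t))/C_i$ is taken without rounding, as the identity $A_i(t)\,C_i = t - H_\tau(t)$ used in the proof suggests. The closed form below is derived here for illustration and is not stated in the paper.

```latex
\begin{align*}
P_i^t &= \frac{H_\tau(t)}{A_i(t)} + C_i
       = \frac{C_i\,H_\tau(t)}{t - H_\tau(t)} + C_i \\
      &= \frac{C_i\,H_\tau(t) + C_i\,\bigl(t - H_\tau(t)\bigr)}{t - H_\tau(t)}
       = \frac{C_i\,t}{t - H_\tau(t)}
\end{align*}
```

Under this reading, every instant $t$ with slack $t - H_\tau(t) > 0$ imposes a lower bound on the period that grows as the slack shrinks. For instance, with the hypothetical values $C_i = 2$, $t = 12$ and $H_\tau(12) = 8$, we get $A_i(12) = 2$ and $P_i^{12} = 8/2 + 2 = 6$, which agrees with $C_i t / (t - H_\tau(t)) = 24/4 = 6$.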

Listing 1. Deadlinemin algorithm

    function Deadlinemin(T_i) is
      deadline := C_i;
      k := ceil(R_tau / P_i);
      D_min_i := 0;
      for s in 0 .. (k - 1) loop
        t := min{(s * P_i) + D_i, R_tau};
        deadline := C_i;
        while t > s * P_i + C_i loop
          -- t is a scheduling point of some other task T_j
          if exists j /= i, j <= n / t = (ceil(t / P_j) - 1) * P_j + D_j then
            if t - H_tau(t) < C_i then
              deadline := H_tau(t) + C_i - s * P_i;
              break while;
            end if;
          end if;
          t := t - 1;
        end while;
        D_min_i := max(D_min_i, deadline);
      end for;
      return D_min_i;
    end Deadlinemin;

Proof Let $t \in (0, I]$. We have to demonstrate that the processor demand function of task set $\tau'$ is still feasible, that is, $H_{\tau'}(t) \le t$. Introducing a new task implies that the processor demand function increases. As $\tau' = \tau \cup T_i$, the following expression holds:

$$H_{\tau'}(t) = H_\tau(t) + C_i \left\lfloor \frac{t}{P_i^t} \right\rfloor$$

Substituting $P_i^t$ by its value and taking into account that $A_i(t)\,C_i = t - H_\tau(t)$, we have:

$$\begin{aligned}
H_{\tau'}(t) &= H_\tau(t) + C_i \left\lfloor \frac{t}{\frac{H_\tau(t)}{A_i(t)} + C_i} \right\rfloor \\
&\le H_\tau(t) + C_i \left( \frac{t}{\frac{H_\tau(t)}{A_i(t)} + C_i} \right) \\
&= H_\tau(t) + C_i \left( \frac{t\,A_i(t)}{H_\tau(t) + A_i(t)\,C_i} \right) \\
&= H_\tau(t) + C_i \left( \frac{t\,A_i(t)}{H_\tau(t) + t - H_\tau(t)} \right) \\
&= H_\tau(t) + C_i \left( \frac{t\,A_i(t)}{t} \right) \\
&= H_\tau(t) + C_i\,A_i(t) \\
&= H_\tau(t) + t - H_\tau(t) = t
\end{aligned}$$

Then, if $P_i \ge \max_t (P_i^t)$, no deadline is missed, since $H_{\tau'}(t) \le t$ for all $t \in (0, I]$.

Theorem 2 provides a way to calculate off-line the minimum period of a periodic task in EDF. Moreover, note that we always use $H_\tau(t)$ instead of $H_{\tau'}(t)$, so there is no need to recalculate the demand function at every scheduling point. The theorem can be used both to calculate the minimum period of a new task entering the system and to reduce the period of an existing task.

The algorithm to calculate the minimum period of a periodic task (Periodmin algorithm) is presented in Listing 2. Some comments about the pseudocode are needed. To ensure the feasibility of the system, we must calculate $P_i^t$ for every scheduling point up to the feasibility bound $B_{\tau'}$ ($I \ge B_{\tau'}$). However, note that this bound depends on the task periods, so $B_{\tau'}$ must be recalculated at every $t \in S_{\tau'}$.
Let us call $B_{\tau'}^t$ the feasibility bound of $\tau'$ when $P_i = P_i^t$. We need to define the initial value $P_i^0$ to compute $B_{\tau'}^0$ at the first scheduling point. As the period of $T_i$ can never be smaller than the minimum period of the corresponding system with D = P, we set $P_i^0 = \frac{C_i}{1 - U_\tau}$. This first value gives the initial checking interval $(0, B_{\tau'}^0]$. In the next iteration, $P_i^1$ can take the following values:

$P_i^1 = P_i^0$: The checking interval is still the same.

$P_i^1 > P_i^0$: Then, the new checking interval is $(0, B_{\tau'}^1]$. Note that, in this case, $B_{\tau'}^1 < B_{\tau'}^0$.

Therefore, when executing the Periodmin algorithm, the checking interval (end_time $= \min(B_{\tau'}^t)$) shrinks at each iteration, which makes the algorithm highly efficient (see Section 7 for runtime simulations).

The algorithm presented in Listing 2 has pseudo-polynomial time complexity in the size of the problem instance. However, as stated in [4], this bound is quite reasonable in practice whenever the system has a relatively small utilisation. Moreover, minor changes to some periods may drastically reduce the complexity to polynomial time [13].

6 D-P feasibility region

Section 3 introduced a first approach to the D-P feasibility region. Once we know how to calculate the minimum deadline and the minimum period of a periodic task, we can calculate points A and B. Listing 3 shows the pseudocode that computes points A and B of the D-P feasibility region of $T_i$, which takes pseudo-polynomial time. The algorithm starts the calculation of point A with the original deadline and period of $T_i$ ($D_i$, $P_i$), and stores them in $D_0$ and $P_0$ so that they can be assigned again to $T_i$ for the calculation of point B.
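The two-pass construction of points A and B can be illustrated with a brute-force Python sketch. It deliberately does not implement Deadlinemin or Periodmin: both minimisations are replaced here by a linear search over integer values, validated with a processor-demand test up to the hyperperiod. The sketch assumes integer task parameters and the constrained-deadline model (the period search starts at max(C_i, D_i)); the names `feasible`, `with_params`, `min_deadline`, `min_period` and `calculate_ab` are hypothetical.

```python
from fractions import Fraction
from math import gcd

def feasible(tasks):
    """Processor-demand test for synchronous tasks (C, D, P) with D <= P."""
    if sum(Fraction(c, p) for c, _, p in tasks) > 1:
        return False
    hyper = 1
    for _, _, p in tasks:
        hyper = hyper * p // gcd(hyper, p)
    # Absolute deadlines of every job released inside one hyperperiod.
    points = {k * p + d for _, d, p in tasks for k in range(hyper // p)}
    return all(sum(c * ((t + p - d) // p) for c, d, p in tasks) <= t
               for t in points)

def with_params(tasks, i, d, p):
    """Copy of the task list in which task i gets deadline d and period p."""
    return tasks[:i] + [(tasks[i][0], d, p)] + tasks[i + 1:]

def min_deadline(tasks, i):
    """Smallest integer deadline of task i (period fixed) keeping the set feasible."""
    c, d0, p = tasks[i]
    # The original set is assumed feasible, so the search always succeeds.
    return next(d for d in range(c, d0 + 1)
                if feasible(with_params(tasks, i, d, p)))

def min_period(tasks, i):
    """Smallest integer period of task i (deadline fixed, P >= D) keeping feasibility."""
    c, d, p0 = tasks[i]
    return next(p for p in range(max(c, d), p0 + 1)
                if feasible(with_params(tasks, i, d, p)))

def calculate_ab(tasks, i):
    """Points A = (D_A, P_A) and B = (D_B, P_B), each from the original (D, P)."""
    _, d0, p0 = tasks[i]
    d_a = min_deadline(tasks, i)
    p_a = min_period(with_params(tasks, i, d_a, p0), i)
    p_b = min_period(tasks, i)
    d_b = min_deadline(with_params(tasks, i, d0, p_b), i)
    return (d_a, p_a), (d_b, p_b)

print(calculate_ab([(1, 3, 6), (3, 4, 6)], 0))  # ((1, 4), (2, 3))
```

In this small example A = (1, 4) and B = (2, 3): minimising the deadline first leaves less room to shorten the period, and vice versa. The linear search is exponentially slower than the algorithms of Listings 1 and 2 on realistic task sets and is meant only to make the definition of A and B tangible.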

Listing 2. Periodmin algorithm

    function Periodmin(T_i) is
      P0_i := C_i / (1 - U_tau);
      P_i := P0_i;
      tau' := tau + T_i;
      end_time := B_tau';
      t := 0;
      while (t <= end_time) loop
        t := next scheduling point in S_tau';
        Pt_i := H_tau(t) / A_i(t) + C_i;
        if Pt_i > P_i then
          P_i := Pt_i;
          end_time := B_tau';
        end if;
      end while;
      return P_i;
    end Periodmin;

Listing 3. A and B calculation

    procedure Calculate_AB(in T_i; out A, B) is
      P_0 := P_i;
      D_0 := D_i;
      D_A := Deadlinemin(T_i);
      P_A := Periodmin(T_i);
      P_i := P_0;
      D_i := D_0;
      P_B := Periodmin(T_i);
      D_B := Deadlinemin(T_i);
      A := (D_A, P_A);
      B := (D_B, P_B);
    end Calculate_AB;

The exact D-P feasibility region can be larger than the one depicted in Figure 2, which means that there may exist feasible points inside the rectangle defined by vertices A and B. Moreover, we state that the exact feasibility region is larger than or equal to the area represented in Figure 3. In the remainder of the section we will demonstrate that any point inside the lined area depicted in Figure 3 generates a feasible EDF schedule. Before that, an interesting property is introduced:

[Figure 3. D-P feasibility region of $T_i$ (second approach)]

Property 1 Let $T_1(C_1, D_1, P_1)$ and $T_2(C_2, D_2, P_2)$, where $D_1 \le D_2$ and $P_1 \ge P_2$. There is a positive value $k = \frac{D_2 - D_1}{P_1 - P_2}$, $0 \le k \le \infty$, such that:

$$\begin{cases} (0, k] & \rightarrow\ d_{1k} \le d_{2k} \\ [k, \infty] & \rightarrow\ d_{1k} \ge d_{2k} \end{cases}$$

Proof We are interested in the k-th activation of $T_1$ and $T_2$ in which the scheduling point $d_{1k}$ is greater than or equal to the scheduling point $d_{2k}$. Substituting the expression of a scheduling point in the k-th activation, $d_{1k} = k P_1 + D_1$ and $d_{2k} = k P_2 + D_2$:

$$d_{1k} \ge d_{2k} \;\Rightarrow\; k P_1 + D_1 \ge k P_2 + D_2 \;\Rightarrow\; k \ge \frac{D_2 - D_1}{P_1 - P_2}$$

An example to graphically illustrate this property is depicted in Figure 4.
In this example, from the second activation of $T_1$ and $T_2$ ($k = 1$) onwards, every scheduling point of $T_1$ is greater than the scheduling point of $T_2$ in the same activation.

[Figure 4. Example of Property 1]

The following theorem demonstrates that the points in the line AB of Figure 3 are feasible points.

Theorem 3 Let $X(D_X, P_X)$, $Y(D_Y, P_Y)$ and $Z(D_Z, P_Z)$ be three aligned points in the D-P space, such that $D_X \le D_Z \le D_Y$ and $P_X \ge P_Z \ge P_Y$. If X and Y are feasible points, then Z is feasible too.

Proof If X and Y are feasible, then $H_X(t) \le t$ and $H_Y(t) \le t$. (To simplify the notation, $H_W(t)$ denotes the cumulative execution time function $H_\tau(t)$ of $\tau$, where $T_i = (C_i, D_W, P_W) \in \tau$ and W is any point in the D-P space.)

We want to demonstrate that $H_Z(t) \le t$. To do so, we will demonstrate that in any activation $k$ the k-th scheduling point of Z is greater than or equal to the k-th scheduling point of either X or Y. By Property 1 the following condition holds:

$$\begin{cases} \left(0, \frac{D_X - D_Z}{P_Z - P_X}\right] & \rightarrow\ d_{Xk} \le d_{Zk} \\ \left[\frac{D_Z - D_Y}{P_Y - P_Z}, \infty\right] & \rightarrow\ d_{Yk} \le d_{Zk} \end{cases}$$

The uncertainty occurs in the interval $\left[\frac{D_X - D_Z}{P_Z - P_X}, \frac{D_Z - D_Y}{P_Y - P_Z}\right]$. However, if the three points X, Y and Z are aligned in the D-P space, the lines X-Z and Z-Y have the same slope. Therefore:

$$\frac{D_X - D_Z}{P_Z - P_X} = \frac{D_Z - D_Y}{P_Y - P_Z}$$

and we know that the scheduling points of Z are always greater than or equal to the scheduling points of X or Y in the same activation. Note that $H_\tau(t)$ is a non-decreasing step function that increases by $C_i$ units of time at every scheduling point of $T_i \in \tau$. As points X, Y and Z have the same computation time, the steps of H have the same size. If all the scheduling points of Z are greater than or equal to the scheduling points of X or Y in the same activation, then it is easy to see that:

$$H_Z(t) \le H_X(t)\ \ \forall t \quad \text{or} \quad H_Z(t) \le H_Y(t)\ \ \forall t$$

Theorem 3 states that if two points X and Y are feasible, then any point Z in the line XY will also be feasible. Hence, any point E in the line AB of Figure 3 is feasible, since both A and B are feasible points.

Listing 4 shows the algorithm that generates the approximated feasibility region, that is, the region depicted in Figure 3. We will refer to this algorithm as the Approx-DP-region algorithm. Once the points A and B have been calculated, all the points E in the perimeter of the region between A and B are obtained using the equation of the line AB. Simulations have shown that there may exist feasible points below the AB line, meaning that the approximated region is sufficient but not necessary. It is an open issue to formally characterise the exact D-P feasibility region.
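The interpolation along the AB line described above can be sketched in a few lines of Python. The sketch assumes that A and B are already known and given as integer pairs (D, P), and that one point is wanted per integer period between them; the function name `approx_dp_boundary` and the sample values are hypothetical.

```python
from fractions import Fraction

def approx_dp_boundary(a, b):
    """Points (D_E, P_E) on segment AB, one per integer period between P_B and P_A."""
    (d_a, p_a), (d_b, p_b) = a, b
    if p_a == p_b:
        # A and B coincide, so the boundary is a single point.
        return [(Fraction(d_a), p_a)]
    # Line D = slope * P + intercept through A and B, kept exact with Fraction.
    slope = Fraction(d_b - d_a, p_b - p_a)
    intercept = Fraction(p_b * d_a - p_a * d_b, p_b - p_a)
    low, high = sorted((p_a, p_b))
    return [(slope * p_e + intercept, p_e) for p_e in range(low, high + 1)]

# Hypothetical points A = (2, 10) and B = (6, 4).
for d_e, p_e in approx_dp_boundary((2, 10), (6, 4)):
    print(f"P_E = {p_e:2d}  D_E = {float(d_e):.2f}")
```

Each boundary point costs a constant number of arithmetic operations, which is the reason the approximated region is so much cheaper to obtain than a perimeter that needs a minimum-deadline computation per point.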
Nevertheless, Listing 5 shows a method to obtain the exact region (called Exact-DP-region) by calculating the intermediate points L between A and B, that is, by calculating the minimum deadline for each given period between $P_A$ and $P_B$. Another way to calculate the exact D-P region is to calculate the minimum period for each given deadline between $D_A$ and $D_B$.

It is important to note that the calculation of each point between A and B in the Approx-DP-region algorithm has a constant-time cost (O(1)), while the calculation of the exact perimeter with the Exact-DP-region algorithm requires executing the Deadlinemin algorithm, which takes pseudo-polynomial time and implies recalculating the hyperperiod ($P$) and the $H_\tau$ function. Therefore, the Approx-DP-region algorithm is expected to be much more efficient than the Exact-DP-region algorithm. Indeed, in Section 7 we will show that Approx-DP-region is a tight approximation to the exact D-P feasibility region and that its runtime is considerably lower.

Listing 4. Approximated D-P region algorithm

    procedure Approx_DP_region(T_i) is
      Calculate_AB(T_i, A, B);
      slope := (D_B - D_A) / (P_B - P_A);
      b := ((P_B * D_A) - (P_A * D_B)) / (P_B - P_A);
      for P_E in P_A .. P_B loop
        D_E := slope * P_E + b;
      end loop;
    end Approx_DP_region;

Listing 5. Exact D-P region algorithm

    procedure Exact_DP_region(T_i) is
      Calculate_AB(T_i, A, B);
      for P_L in P_A .. P_B loop
        P_i := P_L;
        D_L := Deadlinemin(T_i);
      end loop;
    end Exact_DP_region;

7 Experimental evaluation

This section describes a set of experiments to evaluate the performance of the Periodmin and the DP-region algorithms. A number of tests have been run; specifically, $10^4$ synthetic task sets have been generated for each utilisation from $U_\tau = 0.5$ to $U_\tau = 0.95$ in steps of 0.05, resulting in $10^5$ total simulations.
Each task was generated by randomly choosing the task period as an integer between 10 and 250 and the task utilisation between 0 and 1, and then selecting the task computation time in such a way that the total system utilisation is approximately equal to the desired load. To generate tasks with deadlines less than periods, deadlines were randomly decreased by 40% to 90% of the task period. A generated task is added to the task set only if the set remains schedulable.

We have evaluated the performance and the runtime of Periodmin, Approx-DP-region and Exact-DP-region. To evaluate the runtime of the proposed algorithms, the number of processor cycles needed by each algorithm has been measured on an AMD Athlon 64 2 GHz (Dual-Core). To avoid external interference, interrupts were disabled during the execution of the algorithms.

7.1 Evaluation of the Periodmin algorithm

To evaluate the performance of the algorithm presented in Listing 2 in reducing the period of a task $T_i$, we have measured the average period improvement ($\beta_i$) defined in Section 2. For each task set, we have randomly chosen one of the tasks $T_i$ and applied the Periodmin algorithm to it. The results are depicted in Figure 5. As expected, the period improvement decreases as the total utilisation increases. It is also worth observing that a greater period reduction is achieved as the number of tasks increases. The main reason for this behaviour is the experiment design. As the workload generator tries to obtain a set of tasks for a specific utilisation, the period of a task is, in general, greater in a set of 10 tasks than in a set of 3 tasks. As the reduction depends on the remaining utilisation, the minimum period is more or less the same for a task in a set of 3 or 10 tasks; thus the period improvement is greater as the number of tasks increases.

We have also tested the complexity of the algorithm as a function of the total processor utilisation for different numbers of tasks. The runtime in milliseconds is depicted in Figure 6. As expected, the complexity increases with the number of tasks and the utilisation.

[Figure 5. Period improvement of Periodmin algorithm. Period improvement ($\beta_j$) versus utilisation (%) for 3, 5, 8 and 10 tasks.]

[Figure 6. Periodmin algorithm runtime. Time (milliseconds) versus utilisation (%) for 3, 5, 8 and 10 tasks.]
In order to compare the Periodmin algorithm with other methods, we have implemented a binary search to obtain the minimum period of a task. The results are shown in Figure 7. The binary search is implemented by iteratively reducing the search interval bounded by upper and lower values for the period. In each iteration, the feasibility test is executed for the candidate period to decide whether the period should be increased or decreased in the next iteration. Note that this method is much faster than successively reducing the task period from an upper bound, as in the MAST tool [21]; even so, the Periodmin algorithm is faster.

[Figure 7. Periodmin vs Binary search runtime comparison (5 tasks). Time (milliseconds) versus utilisation (%).]

7.2 Evaluation of D-P region algorithms

We now compare the Approx-DP-region and Exact-DP-region algorithms. We expect Approx-DP-region to be faster than Exact-DP-region, but this is useless if the approximated region differs greatly from the exact region. Therefore, the first experiment evaluates how close the approximated area is to the exact one. The metric used is $D_L / D_E$, that is, the ratio between the points in the perimeter of the exact area (L) and the points belonging to the AB line (E). The results for 10 tasks are presented in Figure 8, where we can observe that the area calculated by the Approx-DP-region algorithm is very close to the real area calculated by the Exact-DP-region algorithm. Indeed, the error is below 0.8% in the worst case. For low utilisations, A and B are probably the same point in the D-P space, meaning that the approximated region coincides with the exact region and $D_L / D_E = 1$. For higher utilisations, we can observe that $D_L / D_E$ is very close to 1, which means that the two regions differ in a few points. Similar results are obtained for different numbers of tasks.

Figures 9 and 10 show the processor cycles needed by both algorithms as a function of the utilisation of the task set, for 5 and 10 tasks. As both algorithms execute the Calculate_AB algorithm, only the execution inside the for loop has been measured. Note that a logarithmic scale has been used in order to show the large differences between the two algorithms. As expected, the Approx-DP-region algorithm runs several orders of magnitude faster than Exact-DP-region. In conclusion, the Approx-DP-region algorithm is a good alternative for obtaining the D-P feasibility region efficiently.

8 Conclusions and future work

This paper addresses the problem of finding the feasibility region for the deadline and period of the same task. The aim is to provide system designers with a tool to choose the best D and P values to adapt to a dynamic environment and to enhance the performance of the selected task. The feasibility region is redefined, and exact and approximated solutions are provided, so designers can analyse the influence of deadline variations of a task on the period of the same task and vice versa. The approximated region is very close to the exact region (the difference is less than 0.2% in most cases) and it is computationally less expensive (by several orders of magnitude). To calculate this region, the minimum period of a task must be known. We have also presented a pseudo-polynomial method to calculate the minimum period of a periodic task.
Our analysis is done for one task at a time. However, it is easy to see that the analysis can be performed for more than one task in a sequential way. The order in which tasks are selected determines the shape of the regions, but this would be the first step towards constructing the D-P feasibility region of the entire task set. This is part of our ongoing research. Our future work is also focused on extending the feasibility region to the domain of task computation times. The idea is to construct a three-domain feasibility region (D, P and C space) where designers can choose the best D-P-C combination without jeopardising feasibility.

[Figure 8. Quality of the solution provided by Approx-DP-region algorithm (10 tasks). $D_L / D_E$ versus utilisation.]

[Figure 9. Approx vs Exact D-P region. Runtime comparison (5 tasks). Processor cycles versus utilisation (%).]

[Figure 10. Approx vs Exact D-P region. Runtime comparison (10 tasks). Processor cycles versus utilisation (%).]

References

[1] P. Albertos, M. Salgado, and M. Olivares. Are delays in digital control implementation always bad? In Asian Control Conference, Shanghai.

[2] P. Balbastre, I. Ripoll, and A. Crespo. Analysis of window-constrained execution time systems. Journal of Real-Time Systems, 35(2).
[3] P. Balbastre, I. Ripoll, and A. Crespo. Minimum deadline calculation for periodic real-time tasks in dynamic priority systems. IEEE Transactions on Computers, volume 57.
[4] S. Baruah, G. Buttazzo, S. Gorinsky, and G. Lipari. Scheduling periodic task systems to minimize output jitter. In Sixth Conference on Real-Time Computing Systems and Applications, pages 62-69.
[5] S. Baruah, A. Mok, and L. Rosier. Preemptively scheduling hard real-time sporadic tasks on one processor. In IEEE Real-Time Systems Symposium.
[6] E. Bini and G. Buttazzo. The space of EDF feasible deadlines. In 19th Euromicro Conference on Real-Time Systems.
[7] E. Bini and M. Di Natale. Optimal task rate selection in fixed priority systems. In IEEE Real-Time Systems Symposium.
[8] E. Bini, M. Di Natale, and G. Buttazzo. Sensitivity analysis for fixed priority real-time systems. In 18th Euromicro Conference on Real-Time Systems, pages 13-22.
[9] G. Buttazzo, G. Lipari, and L. Abeni. Elastic task model for adaptive rate control. In IEEE Real-Time Systems Symposium, December.
[10] A. Cervin and J. Eker. Feedback scheduling of control tasks. In Proceedings of the 39th IEEE Conference on Decision and Control.
[11] A. Cervin, B. Lincoln, J. Eker, K.-E. Årzén, and G. Buttazzo. The jitter margin and its application in the design of real-time control systems. In Proceedings of RTCSA.
[12] M. Dertouzos. Control robotics: the procedural control of physical processes. In IFIP Congress.
[13] M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company.
[14] H. Hoang, G. Buttazzo, M. Jonsson, and S. Karlsson. Computing the minimum EDF feasible deadline in periodic systems. In Proceedings of RTCSA, August.
[15] J. Lehoczky, L. Sha, and Y. Ding. The rate monotonic scheduling algorithm: Exact characterization and average case behaviour. In IEEE Real-Time Systems Symposium.
[16] J. Leung and R. Merrill. A note on the preemptive scheduling of periodic, real-time tasks. Information Processing Letters, 18.
[17] C. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. JACM, 23:46-68.
[18] Y. Liu, S. Chakraborty, and R. Marculescu. Generalized rate analysis for media-processing platforms. In Proceedings of the RTCSA.
[19] P. Martí, J. M. Fuertes, and G. Fohler. Jitter compensation for real-time control systems. In IEEE Real-Time Systems Symposium.
[20] A. Mathur, A. Dasdan, and K. Gupta. Rate analysis for embedded systems. ACM Trans. on Design Automation of Electronic Systems, 3(3).
[21] J. L. Medina, M. González-Harbour, and J. M. Drake. MAST real-time view: A graphic UML tool for modeling object-oriented real-time systems. In IEEE Real-Time Systems Symposium.
[22] S. Punnekkat, R. Davis, and A. Burns. Sensitivity analysis of real-time task sets. In Proceedings of the Conference of Advances in Computing Science, pages 72-82.
[23] I. Ripoll, A. Crespo, and A. Mok. Improvement in feasibility testing for real-time tasks. Journal of Real-Time Systems, 11:19-40.
[24] D. Seto, J. Lehoczky, and L. Sha. Task period selection and schedulability in real-time systems. In IEEE Real-Time Systems Symposium.
[25] K. Shin and X. Cui. Computing time delay and its effects on real-time control systems. IEEE Trans. on Control Systems Technology, 3(2).
[26] S. Vestal. Fixed-priority sensitivity analysis for linear compute time models. IEEE Transactions on Software Engineering, 20(4), April.
[27] B. Wittenmark, J. Nilsson, and M. Törngren. Timing problems in real-time control systems. In Proceedings of the American Control Conference, Jan.
[28] M. Xiong and K. Ramamritham. Deriving deadlines and periods for real-time update transactions. In IEEE Real-Time Systems Symposium, pages 32-43, 1999.

Providing Memory QoS Guarantees for Real-Time Applications

A. Marchand, P. Balbastre, I. Ripoll and A. Crespo
Universidad Politécnica de Valencia, Spain
{patricia, iripoll,

Abstract

Nowadays, systems often integrate a variety of applications whose service requirements are heterogeneous. Consequently, systems must be able to concurrently serve applications which rely on different constraints. This raises the problem of the dynamic distribution of the system resources (CPU, memory, network, etc.). Therefore, integrated Quality of Service (QoS) management is needed to assign resources efficiently according to the various application demands. In this paper, we focus on a dynamic approach to QoS management for memory resource allocation based on the Skip-Over model. We detail our solution and show how it improves the service of task memory requests while providing them with guarantees. Quantitative experiments using the TLSF allocator have been performed in order to evaluate the memory failure probability with and without the memory QoS manager.

Keywords: Real-time, Quality of Service, Memory Management.

1 Introduction

Nowadays, new real-time applications require more flexibility, and the ability to adjust system resources to load conditions is of major importance. The system resources (CPU, memory, energy, network, disk, etc.) that an application can use can be adapted to the global needs. Up to now, all the efforts have been focused on CPU, energy and network management, while memory has not been considered as a dynamic resource by the real-time community. Recently, a new algorithm for dynamic memory allocation (TLSF) [16] has solved the problem of the worst-case bound while maintaining the efficiency of the allocation and deallocation operations, which allows a reasonable use of dynamic memory management in real-time applications.
The proposed algorithm, with a constant cost Θ(1), opens new possibilities with respect to the use of dynamic memory in real-time applications. There is an increasing number of emerging applications that use large amounts of memory, such as multimedia systems, video streaming, video surveillance, virtual reality, scientific data gathering, or data acquisition in control systems.

Viewing memory as a dynamic resource to be shared between real-time tasks implies managing this resource and giving some kind of guarantees. CPU scheduling techniques can be adapted and used to deal with memory management. A memory management system that adjusts memory resources to meet changing demands and user needs is defined in [14]. The architectural framework that realises this approach allows adaptive allocation of memory resources to applications involving both periodic and aperiodic tasks. However, memory overruns are not considered in this model. Since fragmentation can lead to significant system degradation, the possibility that a memory request fails must be taken into account in the real-time model, and it must be avoided as much as possible.

When considering CPU use, several scheduling policies have been proposed to perform the CPU adaptation from different points of view. In particular, resource-based algorithms have been developed to characterise the timing requirements and processor capacity reservation requirements of real-time applications ([17, 2, 1, 10, 8]). Some works based on a job skipping scheme [9, 12], which provide flexible task models, have also been introduced. More recently, enhanced scheduling results based on the EDL (Earliest Deadline as Late as Possible) algorithm [6] have been proposed in [15] to optimise resource allocation in a job skipping context. Additional approaches, such as the weakly-hard model [3], provide more general frameworks for QoS-based systems.
1.1 Summary and contributions

Memory allocation is the problem of maintaining an application's heap space by keeping track of allocated and freed blocks. The decision to be made by the memory allocator is where to place the requested block in the heap. The allocator has no information about when the blocks will be

freed after they are allocated. The order of these requests is entirely up to the application. In some cases, the allocator may be unable to satisfy a request because no free block of the requested size exists. It is then up to the application program to deal with such a failure in an appropriate manner. This paper proposes a framework to minimise the number of failed memory requests. The proposed methodology is based on skippable tasks: specifically, we adapt the Skip-Over model used in CPU scheduling to manage memory overruns.

2 Skip-over based CPU overload management

Different techniques have been proposed to deal with CPU overload management. Efforts are oriented towards scheduling techniques that include quality features in the scheduling of tasks while letting these tasks meet certain real-time constraints. To represent such quality of service constraints, Hamdaoui and Ramanathan [9] proposed a model called (m,k)-firm deadlines. It guarantees that a statistical number of deadlines will be met, by using a distance-based priority scheme to increase the priority of an activity in danger of missing more than m deadlines over a window of k requests. If $m = k$, the system becomes a hard-deadline system. The inherent problem of this model is that, in some cases, constraints can be satisfied even when many consecutive instances miss their deadlines (e.g. the (100,1000)-firm deadline is respected even if, over 1000 activations of the task, 100 consecutive instances miss their deadlines). This problem is solved for the special case $m = k - 1$, for which the (m,k) model reduces to the Skip-Over model [12]. The skip-over scheduling algorithms skip some task invocations according to a skip factor. The overload is thereby reduced, exploiting skips to increase the feasible periodic load.
West and Poellabauer [18] proposed a windowed lost rate, which specifies that a task can tolerate x missed deadlines over a finite window of y consecutive instances. In [7], Bernat et al. introduce a general framework for specifying tolerance of missed deadlines under the definition of weakly hard constraints. In what follows, we focus on the Skip-Over approach. Known results about the feasibility of periodic task sets under this model are also recalled.

2.1 Model description

The Skip-Over model [12] deals with the problem of scheduling periodic tasks which allow occasional deadline violations (i.e. skippable periodic tasks) on a uniprocessor system. A task $\tau_i$ is characterized by a worst-case computation time $C_i$, a period $T_i$, a relative deadline equal to its period, and a skip parameter $s_i$. This parameter represents the tolerance of the task to missed deadlines: the distance between two consecutive skips must be at least $s_i$ periods. When $s_i$ equals infinity, no skips are allowed and $\tau_i$ is a hard periodic task. Every task $\tau_i$ is divided into instances, where each instance occurs during a single period of the task. Every instance of a task is either red or blue [12]. A red task instance must complete before its deadline, whereas a blue task instance can be aborted at any time. However, if a blue instance completes successfully, the next task instance is still blue.

Two scheduling algorithms were introduced about ten years ago by Koren and Shasha in [12]. The first is the Red Tasks Only (RTO) algorithm: red instances are scheduled as soon as possible according to the Earliest Deadline First (EDF) algorithm [13], while blue ones are always rejected. The second is the Blue When Possible (BWP) algorithm, an improvement of the first: BWP schedules blue instances whenever their execution does not prevent the red ones from completing within their deadlines.
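The red/blue rule of the model can be sketched in Python as follows. This is only an illustration, not code from the paper: the class `SkipState` and the helper `colors_when_every_blue_is_skipped` are hypothetical names, and the sketch assumes that a task starts with $s_i - 1$ red instances and that the $s_i - 1$ instances following a skip are red, so that two skips are always at least $s_i$ periods apart.

```python
class SkipState:
    """Tracks the color of successive instances of one skippable task."""

    def __init__(self, s):
        if s < 2:
            raise ValueError("skip parameter must be at least 2")
        self.s = s
        # Assumption: the task starts with s - 1 red instances.
        self.red_left = s - 1

    def color(self):
        """Color of the instance about to be released."""
        return "red" if self.red_left > 0 else "blue"

    def record(self, completed):
        """Register the outcome of the current instance."""
        if self.color() == "red":
            if not completed:
                raise RuntimeError("a red instance must not be skipped")
            self.red_left -= 1
        elif not completed:
            # A blue instance was skipped: the next s - 1 instances are red.
            self.red_left = self.s - 1
        # A completed blue instance leaves the next instance blue.


def colors_when_every_blue_is_skipped(s, n):
    """Colors of the first n instances if each blue instance is skipped."""
    state, seen = SkipState(s), []
    for _ in range(n):
        current = state.color()
        seen.append(current)
        state.record(completed=(current == "red"))
    return seen
```

With `s = 3`, `colors_when_every_blue_is_skipped(3, 6)` gives red, red, blue, red, red, blue, i.e. one skip every three instances; passing `float("inf")` models a hard task whose instances are always red. The sketch only tracks colors: whether a blue instance actually runs is decided by the scheduler, which under BWP lets it run only when no red instance would miss its deadline.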
In other words, blue instances are served in the background relative to red instances.

2.2 Feasibility of skippable periodic task sets

Liu and Layland [13] have shown that a task set $\{\tau_i; 1 \le i \le n\}$ is schedulable if and only if its cumulative processor utilization (ignoring skips) is no greater than 1, i.e.,

$\sum_{i=1}^{n} \frac{C_i}{T_i} \le 1$. (1)

Koren and Shasha proved that determining whether a set of periodic occasionally skippable tasks is schedulable is NP-hard [12]. However, they have shown the following necessary condition for the schedulability of a given set $\Gamma = \{\tau_i(C_i, T_i, s_i)\}$ of periodic tasks that allow skips:

$\sum_{i=1}^{n} \frac{C_i (s_i - 1)}{T_i s_i} \le 1$. (2)

In [4], Caccamo and Buttazzo introduced the notion of equivalent utilization factor, defined as follows.

Definition 1 Given a set $\Gamma = \{\tau_i(C_i, T_i, s_i)\}$ of n periodic tasks that allow skips, the equivalent utilization factor is defined as:

$U_p^* = \max_{L \ge 0} \frac{\sum_i D(i,[0,L])}{L}$ (3)

where

$D(i,[0,L]) = \left( \left\lfloor \frac{L}{T_i} \right\rfloor - \left\lfloor \frac{L}{T_i s_i} \right\rfloor \right) C_i$. (4)

They also provided a necessary and sufficient condition for guaranteeing a feasible schedule of a set of skippable tasks which are deeply-red (i.e. all tasks are synchronously activated and the first $s_i - 1$ instances of every task $\tau_i$ are red) [5]:

Theorem 1 A set $\Gamma$ of skippable periodic tasks, which are deeply-red, is schedulable if and only if its equivalent utilization factor satisfies

$U_p^* \le 1$. (5)

2.3 Illustrative example

The following scheduling example uses two tasks $\tau_i(C_i, T_i, s_i)$ defined according to the Skip-Over model. Task $\tau_1$ is a hard real-time periodic task ($s_1 = \infty$), while task $\tau_2$ allows deadline skips ($s_2 = 3$).

[Figure 1. A Skip-Over schedule for $\tau_1(4, 6, \infty)$ and $\tau_2(1, 2, 3)$, with the skipped instances of $\tau_2$ marked.]

The system is overloaded ($U_p = \sum_{i=1}^{n} \frac{C_i}{T_i} = \frac{4}{6} + \frac{1}{2} = 1.17$), but the tasks are schedulable provided $\tau_2$ skips exactly one instance every 3.

3 Skip-Over based memory overload management

3.1 New task model and notations

In this section, we formally define the task model used. Let $\tau = \{\tau_1, \ldots, \tau_n\}$ be a periodic task system. It is assumed that a periodic task requiring dynamic memory requests an amount of memory each period. This amount of memory is allocated as the result of one or several dynamic memory requests. Allocated memory is freed after some time interval by the same or another task. Note that in this model it is not relevant which task frees the memory: once it has been allocated, the relevant aspect is the holding time. Taking this behavior into account, each task $\tau_i \in \tau$ has the following temporal parameters: a worst-case computation time $C_i$, a period $T_i$, a relative deadline $D_i$, the dynamic memory needs $M_i$ and an additional parameter $s_i$ which gives the tolerance of this task to memory failures. Thus, a real-time set of periodic tasks consists of tasks $\tau_i = (C_i, T_i, D_i, M_i, s_i)$. In addition, $M_i$ can be described by a 2-tuple $M_i = (g_i, h_i)$, where $g_i$ is the maximum amount of memory requested each period and $h_i$ is the maximal time during which allocations persist in memory (expressed as a number of periods of task $\tau_i$). Consequently, the amount of memory used by the application in the worst case is given by $\sum_i h_i g_i$.

Additionally, dynamic memory allocation introduces a level of memory fragmentation, or wasted memory. Following the processor model, this wasted memory can be considered as spatial overhead. In this paper it is referred to as $M_w$. Moreover, the allocator uses a data structure to organise the available free blocks ($M_{ds}$). Considering these aspects, the total amount of memory $M_T$ needed to fulfill the application requirements can be expressed as:

$M_T = \sum_i h_i g_i + M_w + M_{ds}$ (6)

3.2 Memory feasibility of skippable periodic task sets

For memory feasibility, we have shown in [14] that a task set $\{\tau_i; 1 \le i \le n\}$ is memory-schedulable if and only if its cumulative memory utilization (ignoring skips) is no greater than the total memory $M_T$ assigned to the application:

$\sum_{i=1}^{n} h_i g_i \le M_T$. (7)

As an analogy to the processor demand criteria [11], we turned to another form of schedulability test: the memory demand criteria.

Definition 2 Given a set $\Gamma = \{\tau_i(C_i, T_i, M_i, s_i)\}$ of n skippable periodic tasks with memory constraints, the equivalent memory utilization factor is defined as:

$M = \max_{L \ge 0} \sum_i D(i,[0,L])$ (8)

where

$D(i,[0,L]) = \left( \left\lfloor \frac{L}{T_i} \right\rfloor - \left\lfloor \frac{L}{T_i s_i} \right\rfloor - \left\lfloor \frac{L - T_i h_i}{T_i} \right\rfloor + \left\lfloor \frac{L - T_i h_i}{T_i s_i} \right\rfloor \right) g_i$.

Proof: Let $D(i,[0,L[)$ denote the total memory demand within $[0,L[$ for task $\tau_i$. First, let us evaluate the amount of memory requested by a task $\tau_i$ over the interval $[0,L[$. Within any interval $[0,L[$, the number of periods observed for every task $\tau_i$ is equal to $\left\lfloor \frac{L}{T_i} \right\rfloor$, thus involving a

total demand for memory allocations equal to $\left\lfloor \frac{L}{T_i} \right\rfloor g_i$. According to the Skip-Over definition, every task $\tau_i$ is allowed to skip one instance every $s_i$ task activations. Thus, for every task $\tau_i$, the total skipped memory allocations within $[0,L[$ amount to $\left\lfloor \frac{L}{T_i s_i} \right\rfloor g_i$. Let us now evaluate the amount of memory released by a task $\tau_i$ over the interval $[0,L[$. Without skips, this amount would be equal to $\left\lfloor \frac{L - T_i h_i}{T_i} \right\rfloor g_i$, taking into account the fact that task $\tau_i$ does not perform any memory release within the interval $[0, T_i h_i[$. However, a skippable periodic task $\tau_i$ releases no memory once every $T_i s_i$ time units. Hence, we have to subtract from the previous quantity the amount of memory corresponding to skipped task instances (i.e. non-allocated memory), which is equal to $\left\lfloor \frac{L - T_i h_i}{T_i s_i} \right\rfloor g_i$. Consequently, the total amount of memory remaining at time instant $t = L$ for task $\tau_i$ is

$D(i,[0,L]) = \left( \left\lfloor \frac{L}{T_i} \right\rfloor - \left\lfloor \frac{L}{T_i s_i} \right\rfloor - \left\lfloor \frac{L - T_i h_i}{T_i} \right\rfloor + \left\lfloor \frac{L - T_i h_i}{T_i s_i} \right\rfloor \right) g_i$.

It follows that the maximal memory utilization is given by $M = \max_{L \ge 0} \sum_i D(i,[0,L])$.

Theorem 2 A set $\Gamma$ of skippable periodic tasks, which are deeply-red, is memory-schedulable if and only if

$M \le M_T$. (9)

3.3 Illustrative example

The following example shows the memory allocations that a task allowing skips can perform (see Figure 2). Task $\tau_i$ has period $T_i = 10$, skip factor $s_i = 3$ and a data persistence time in memory ($h_i$) equal to 4 periods.

[Figure 2. Memory requests of a skippable periodic task ($s_i = 3$).]

Observe that skips do not necessarily happen in a regular fashion, but that the distance between every two skips is at least $s_i$ periods, thus providing a minimal Quality of Service (QoS) for memory allocations to tasks. That is, after suffering a memory failure (i.e. $g_i = 0$ for the corresponding period), memory allocations must be satisfied during at least $s_i - 1$ periods of the task.

4 R-MRM Implementation Framework

The R-MRM (Robust-Memory Resource Controller) implementation framework is the enabling technology for efficiently managing memory-constrained tasks using Skip-Over principles. It provides users with an operational framework to manage real-time applications. This framework has the following generic characteristics: the ability to dynamically handle task memory requests; a recovery service invoked for failed requests; and a robust and fair service for tasks.

Robust service is the ability to queue memory requests and ensure guaranteed allocation of these requests to the memory. Primarily used for handling requests, this capability is crucial for responding to tasks and for a reliable R-MRM implementation. It is typically implemented using reliable queues with store-and-forward as well as rejection capabilities. The implementation framework partly relies on the framework previously proposed in [14]. R-MRM is a component that mediates between tasks and the dynamic memory allocator, as depicted in Figure 3.

[Figure 3. R-MRM external interaction view]

R-MRM offers two kinds of operations that a task can perform: memory request and free request. The memory request function takes the memory size requested and a deadline for this request. If there is memory available to serve the request (this is evaluated according to a memory granting policy), the allocation is done by means of the malloc function provided by the dynamic allocator. If not, the requesting task is blocked until the request can be solved or the deadline is reached. Solving the request means that other tasks free memory blocks during this interval (by calling the free request function) that can fulfill the memory needs of this request.
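The demand-based tests of Definitions 1 and 2 can be evaluated numerically. The Python sketch below is illustrative only and rests on stated assumptions: the rounding brackets of the demand formulas are taken to be floors, the release terms are taken as zero before $T_i h_i$, all parameters are integers, and the maximum over $L$ is approximated by scanning integer values up to a finite horizon chosen by the caller. All function names are hypothetical.

```python
from fractions import Fraction
import math


def skipped(x, period, s):
    """Instances skipped within a span of length x (none for hard tasks)."""
    return 0 if math.isinf(s) else x // (period * s)


def cpu_demand(L, C, T, s):
    """Processor demand of one task over [0, L], as in equation (4)."""
    return (L // T - skipped(L, T, s)) * C


def equivalent_utilization(tasks, horizon):
    """Equation (3), with L scanned over the integers 1..horizon."""
    return max(
        Fraction(sum(cpu_demand(L, C, T, s) for C, T, s in tasks), L)
        for L in range(1, horizon + 1)
    )


def memory_demand(L, g, h, T, s):
    """Memory still held by one task at time L (proof of Definition 2)."""
    past = max(0, L - T * h)  # assumption: nothing is released before T*h
    requested = L // T - skipped(L, T, s)
    released = past // T - skipped(past, T, s)
    return (requested - released) * g


def equivalent_memory(tasks, horizon):
    """Equation (8), with L scanned over the integers 0..horizon."""
    return max(
        sum(memory_demand(L, g, h, T, s) for g, h, T, s in tasks)
        for L in range(horizon + 1)
    )
```

For the two CPU tasks (4, 6, ∞) and (1, 2, 3) of Section 2.3, `equivalent_utilization([(4, 6, math.inf), (1, 2, 3)], 60)` evaluates to 1 even though the plain utilization is 7/6. For the memory task of Figure 2 with $g_i = 1$, `equivalent_memory([(1, 4, 10, 3)], 300)` evaluates to 3, below the bound $h_i g_i = 4$ of equation (7).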

[Figure 4. R-MRM internal view]

When a task wants to free previously allocated memory, it calls the free request function. In addition to freeing the allocated memory, tasks that were blocked waiting for available memory are re-evaluated according to a recovery policy (indicated in Listing 2 by the re-evaluate Failed requests queue label). The pseudo-code of the memory request and free request operations is shown in Listings 1 and 2.

Listing 1. Memory request

    function memory_request(size, deadline) is
        if (memory granting policy)
            insert in Granted requests queue
            malloc(size);
        else
            if (red task)
                rejection policy;
                insert in Granted requests queue;
                malloc(size);
            else
                insert in Failed requests queue
                the calling task is blocked
            end if;
        end if;
    end memory_request;

Listing 2. Free request

    function free_request(int ptr) is
        free(ptr);
        re-evaluate Failed requests queue;
    end free_request;

When a blocked blue task reaches its deadline, it is immediately discarded by the R-MRM component: it exits the Failed requests queue and a memory request failure is sent to the task.

The R-MRM component embeds three kinds of policies to address all its functionalities: a memory granting policy, a rejection policy and a recovery policy. All these policies are implemented in a centralized manner by the Skip-Over based Memory Controller sub-component, a foundational piece of the R-MRM solution. It holds the information needed for defining and provisioning memory. It implements the functions needed to keep track of the available memory in the system, manages the Failed and Granted request queues, and controls the timer associated with the deadline of each blocked blue task in the Failed requests queue.

4.1 Framework Policies

The component implements a memory granting policy that determines whether a memory request can be granted. If it cannot (i.e.
the request fails), the failed requests are re-evaluated according to a recovery policy. Moreover, the R-MRM component can apply a rejection policy so as to give priority to the most important tasks (i.e. red tasks in the Skip-Over model). In the following, we consider the case of a real-time application whose heap space has been properly sized according to condition (9) of Theorem 2. We now describe the three policies in more detail.

The memory granting policy

We are interested in a granting policy that controls access to the application's heap space efficiently according to the skip-over constraints of all the tasks. When a memory allocation request is submitted, the Skip-Over based Memory Controller evaluates the amount of memory already allocated and compares it to the total amount of memory granted to the application. Consequently, it is always possible to know the amount of memory available at the current time in order to decide whether to accept the request.

Let t be the current time, which coincides with the arrival of a memory request I. Upon arrival, request $I(d, g, h)$ is characterized by its deadline d, its maximum amount of memory g and the maximal time h during which the allocation persists in memory. We assume that several memory requests are present in the Granted requests queue at time t. Let $I(t) = \{I_i(d_i, g_i, h_i), i = 1 \ldots req(t)\}$ denote the memory request set supported by the machine at t. Then, the acceptance problem to solve when any memory request I occurs reduces to the test of a necessary and sufficient condition:

Theorem 3 Memory is granted to request I if and only if, considering the request set $I(t) \cup \{I\}$, we have:

$\sum_{i=1}^{req(t)+1} h_i g_i \le M_T$ (10)

If there is enough memory available, the allocation is granted; otherwise the request undergoes a recovery process that attempts the request again later on.

The recovery policy

If the memory granting policy determines that there is not enough memory to serve the request, the task is temporarily put into the Failed requests queue (see Figure 4), where it waits for another attempt at being accepted. According to the Skip-Over model, tasks in this queue are only blue and can exit the queue in the following cases:

When a sufficient amount of memory to serve its request is freed (by another task or tasks). In this case, the request can be granted and the task exits the Failed requests queue.

When the deadline is reached. Then, the task exits the Failed requests queue with a failure.

The rejection policy

Considering the working hypothesis that memory can always be granted to red tasks, the problem of a red request failure is always resolvable, provided one (or several) previously accepted blue tasks is (are) removed from memory. Therefore, the rejection decision consists merely in determining which blue task has to be rejected. The criterion set for rejection consists in identifying the blue task with the least actual failure factor. Hence, we propose as a metric the Task Failure Factor ($TFF_i$), defined as the ratio between the number of failures observed for task $\tau_i$ since initialization time and its number of activations at time t:

$TFF_i(t) = \frac{nb\_failures_i(t)}{\left\lfloor t / T_i \right\rfloor}$ (11)

That is, the ready blue task whose failure ratio $TFF_i(t)$, computed from the initialization time, is the least is the candidate for rejection. Ties are broken in favor of the task with the earliest deadline. Note that this is an arbitrary metric.
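The interplay of the three policies can be sketched in Python as follows. This is a simplified model under explicit assumptions, not the authors' implementation: each request is charged a single `size` (standing for the $h_i g_i$ term of equation (10)), blocking and deadline timers are left out, ties between equal failure factors are resolved by queue order rather than by deadline, and the names `Request` and `MemoryController` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Request:
    task: str
    size: int             # memory charged to the request (h_i * g_i in eq. 10)
    red: bool             # red requests must always be served
    failures: int = 0     # failures of the task since initialization
    activations: int = 1  # activations of the task so far

    def tff(self):
        """Task Failure Factor, in the spirit of equation (11)."""
        return self.failures / self.activations


class MemoryController:
    def __init__(self, total):
        self.total = total   # M_T, memory assigned to the application
        self.granted = []    # Granted requests queue
        self.failed = []     # Failed requests queue (blue requests only)

    def can_grant(self, req):
        """Granting policy: accept only if the total stays within M_T."""
        return sum(r.size for r in self.granted) + req.size <= self.total

    def memory_request(self, req):
        """Return True if memory is granted, False if the request is queued."""
        if not self.can_grant(req) and req.red:
            self._reject_blue_requests(req)
        if self.can_grant(req):
            self.granted.append(req)
            return True
        if req.red:
            raise MemoryError("heap too small for the red requests")
        self.failed.append(req)
        return False

    def _reject_blue_requests(self, req):
        """Rejection policy: evict blue requests, least TFF first."""
        blue = sorted((r for r in self.granted if not r.red), key=Request.tff)
        for victim in blue:
            if self.can_grant(req):
                break
            self.granted.remove(victim)
            victim.failures += 1

    def free_request(self, req):
        """Release a granted request, then re-evaluate the failed queue."""
        self.granted.remove(req)
        served = []
        for waiting in list(self.failed):
            if self.can_grant(waiting):
                self.failed.remove(waiting)
                self.granted.append(waiting)
                served.append(waiting)
        return served
```

As a usage example, with `MemoryController(100)` a blue request of size 60 is granted, a second blue request of size 50 is queued, and a red request of size 60 evicts the first blue request; freeing the red request then lets the queued request through.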
For instance, we might equally base the rejection criterion on a sliding window whose size would need to be specified, for example as a function of task periods. The criterion could also rely on the task having the greatest number of successive successes since its last failure. This could be the subject of another study.

5 Simulation Study

In this section, we evaluate how effectively the proposed task model and scheduling scheme solve the problem of guaranteeing memory allocation according to the QoS specification inherently provided by the Skip-Over task model. The objective is to show the influence of the skip parameter $s_i$ on the probability that the system experiences memory failures.

5.1 Experiments

To evaluate the actual performance of our solution, we constructed a simulator that models the behavior of the R-MRM component. The evaluation was performed by means of four simulation experiments, all operating with the memory request and free request operations presented in Listings 1 and 2. The proposed scenarios (see Table 1) were specially designed to provide a comprehensive and comparative analysis of the proposed approach with regard to the memory and QoS requirements previously described.

Table 1. Simulation scenarios

    Test    s_i
    1       ∞
    2       10
    3       6
    4       2

This makes it possible to evaluate the capability of the system to exploit skips so as to resolve memory failures, thus always guaranteeing memory service for the most important tasks (i.e. red tasks). The $s_i$ parameters are identical for all tasks in order to show clearly the influence of the QoS specification on the observed memory failure probability. Experiments were carried out on 100 randomly generated task sets with period $T_i$ uniformly distributed in the range [...] and maximal amount of memory $G_i^{max}$ uniformly distributed in the range [...]. Deadlines are equal to periods.
Additional parameters, $G_i^{avg}$ and $G_i^{stdev}$, define a normal distribution (average and standard deviation) used by task $\tau_i$ to request memory blocks, which are held during an interval drawn from a uniform distribution between $h_i^{min}$ and $h_i^{max}$ periods.

5.2 Results

Our results are shown in several different ways in Figures 5 to 12. In all cases, the x-axis displays the different

percentages of memory provided for the application with respect to the total live memory (i.e. $\sum_i h_i g_i$) given by the task specification itself.

[Figures 5 to 8 plot, against the memory level with respect to the total live memory (%), the curves of Test 1 ($s_i = \infty$), Test 2 ($s_i = 10$), Test 3 ($s_i = 6$) and Test 4 ($s_i = 2$). Figure 5. Number of failed requests according to $s_i$. Figure 6. Number of retries according to $s_i$. Figure 7. Number of solved requests according to $s_i$. Figure 8. Number of overruns according to $s_i$.]

Four output parameters have been evaluated: the number of failed requests; the number of retries, i.e. the number of times a memory request is re-attempted; the number of solved requests; and the number of overruns, which occur when a task reaches its deadline without having received its memory allocation.

Figures 5, 6, 7 and 8 show the absolute memory failure probability for each output parameter for the four tested scenarios. First, notice that all the curves decrease as the percentage of memory given to the application increases, which is the expected behavior. Note also that at the remarkable point where the amount of memory is exactly equal to the total live memory (i.e. 100%), we observe a non-zero memory failure probability. This is due to the spatial overhead induced by the dynamic memory allocator (here TLSF). For a zero memory failure probability, the amount of memory assigned to the application must be higher, to take into account the data structures that the dynamic memory allocator needs in order to function (see equation 6).
On the other hand, as expected, we observe that the memory failure probability for $s_i = \infty$ (i.e. no skips allowed) is significantly higher than in the other scenarios. Interestingly, the curves for $s_i = 10$, $s_i = 6$ and $s_i = 2$ have almost identical distributions for a memory level greater than or equal to 95% of the total live memory. In Figures 9, 10, 11 and 12, the relative memory failure probability is shown for each scenario. First, note that even when $s_i = \infty$, the memory failure ratio is relatively low, which underlines the good behavior of the TLSF allocator. For instance, for a memory level greater than or equal to 100%, the failed requests ratio observed

is less than 1% of the total memory requests.

[Figures 9 to 12 plot, against the memory level with respect to the total live memory (%), the curves of Test 1 ($s_i = \infty$), Test 2 ($s_i = 10$), Test 3 ($s_i = 6$) and Test 4 ($s_i = 2$). Figure 9. Influence of $s_i$ on the ratio of the number of failures to the number of mallocs. Figure 10. Influence of $s_i$ on the ratio of the number of solved requests to the number of failures. Figure 11. Influence of $s_i$ on the ratio of the number of overruns to the number of mallocs. Figure 12. Influence of $s_i$ on the ratio of the number of overruns to the number of failures.]

Results for $s_i \in \{2, 6, 10\}$ improve the memory failure probability. For instance, with $s_i = 2$ and a memory level equal to 100%, the R-MRM has more than a factor of 2 fewer memory failures than the R-MRM used without skips. Moreover, as expected, this advantage becomes less significant as the memory level increases. For high memory levels (> 105%), the R-MRM tends to behave in the same way in every scenario considered. Finally, we see that the ratio of solved requests is higher for small values of $s_i$. For instance, for a memory level equal to 100%, the R-MRM applied to memory requests with $s_i = 2$ resolves more than 1.7 times as many failed requests as the R-MRM with $s_i = \infty$. In summary, the R-MRM used with skips gives the best results in terms of both memory failure probability and memory failure resolution capability.

6 Conclusions
While feasibility and schedulability analysis from the CPU point of view is well understood, memory analysis for real-time systems has received less attention. In this paper we addressed the problem of scheduling real-time task sets with memory constraints. In particular, we presented a memory feasibility analysis for skippable periodic task sets. The memory feasibility test contained in this paper represents the first known result for periodic real-time tasks based on the Skip-Over model. Our main contribution was to design a component for skip-over based memory overload management and to evaluate it. Through the results, we showed to what extent the proposed R-MRM component can minimize the memory failure probability, while a QoS level (i.e., the skip parameter established by the application programmer) is always guaranteed for tasks. The strong point of this approach lies in the memory guarantees provided by the component. We believe that the present approach is promising for enhancing the performance of memory-constrained applications and for applying memory analysis in the real-time field.

References

[1] L. Abeni and G. Buttazzo. Resource reservation in dynamic real-time systems. Journal of Real-Time Systems, 27(2).
[2] L. Abeni, T. Cucinotta, G. Lipari, L. Marzario, and L. Palopoli. QoS management through adaptive reservations. Journal of Real-Time Systems, 29(2-3).
[3] G. Bernat, A. Burns, and A. Llamosi. Weakly hard real-time systems. IEEE Trans. Comput., 50(4).
[4] M. Caccamo and G. Buttazzo. Exploiting skips in periodic tasks for enhancing aperiodic responsiveness. In Proceedings of the 18th IEEE Real-Time Systems Symposium (RTSS 97), San Francisco, California.
[5] M. Caccamo and G. Buttazzo. Optimal scheduling for fault-tolerant and firm real-time systems. In Proceedings of the Fifth Conference on Real-Time Computing Systems and Applications (RTCSA 98), Hiroshima, Japan.
[6] M. Chetto and H. Chetto. Some results of the earliest deadline scheduling algorithm. IEEE Transactions on Software Engineering, 15(10).
[7] A. B. G. Bernat and A. Llamosi. Weakly-hard real-time systems. In IEEE Transactions on Computers, volume 50.
[8] C. Hamann, J. Loser, L. Reuther, S. Schonberg, J. Wolter, and H. Hartig. Quality-assuring scheduling: Using stochastic behavior to improve resource utilization. In 22nd IEEE Real-Time Systems Symposium.
[9] M. Hamdaoui and P. Ramanathan. A dynamic priority assignment technique for streams with (m,k)-firm deadlines. IEEE Transactions on Computers, 44.
[10] K. Jeffay, F. D. Smith, A. Moorthy, and J. Anderson. Proportional share scheduling of operating system services for real-time applications. In IEEE RTSS.
[11] K. Jeffay and D. Stone. Accounting for interrupt handling costs in dynamic priority task systems. In Proceedings of the 14th IEEE Real-Time Systems Symposium (RTSS 93), Raleigh-Durham, NC.
[12] G. Koren and D. Shasha. Skip-over algorithms and complexity for overloaded systems that allow skips. In Proceedings of the 16th IEEE Real-Time Systems Symposium (RTSS 95), Pisa, Italy.
[13] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM, 20(1):46-61.
[14] A. Marchand, P. Balbastre, I. Ripoll, M. Masmano, and A. Crespo. Memory Resource Management for Real-Time System. In Proceedings of the 19th Euromicro Conference on Real-Time Systems, Pisa, Italy.
[15] A. Marchand and M. Silly-Chetto. Dynamic real-time scheduling of firm periodic tasks with hard and soft aperiodic tasks. Real-Time Systems, 32(1-2):21-47.
[16] M. Masmano, I. Ripoll, A. Crespo, and J. Real. TLSF: A new dynamic memory allocator for real-time systems. In Proceedings of the 16th Euromicro Conference on Real-Time Systems, Catania, Italy.
[17] C. W. Mercer, S. Savage, and H. Tokuda. Processor capacity reserves for multimedia operating systems. Technical report, Pittsburgh, PA, USA.
[18] R. West and C. Poellabauer. Analysis of a window-constrained scheduler for real-time and best-effort packet streams. In Proceedings of the 21st IEEE Real-Time Systems Symposium (RTSS 00), Orlando, Florida (USA).


3. Sistemas Operativos y Middleware


Operating System Support for Execution Time Budgets for Thread Groups

Mario Aldea Rivas and Michael González Harbour
Universidad de Cantabria
Santander, SPAIN
{mgh, aldeam}@unican.es

Abstract

The recent Ada 2005 standard introduced a number of new real-time services, including the capability of creating and managing execution time budgets for groups of tasks. This capability has many practical applications in real-time systems in general, and therefore it is also interesting for real-time operating systems. In this paper we present an implementation of thread group budgets inside a POSIX real-time operating system, which can be used to implement the new Ada 2005 services. The architecture and details of the implementation are shown, as they may be useful to other implementers of this functionality defined in the new standard.

Keywords: Real-time systems, Execution time budgets, Thread groups, CPU time, Ada.

1. Introduction (*)

(*) This work has been funded by the Plan Nacional de I+D+I of the Spanish Government under grant TIC C03 (THREAD project), by Ada Core, and by the European Union's Sixth Framework Programme under contracts FP6/2005/IST/ (FRESCOR project) and IST (ARTIST2 NoE). This work reflects only the author's views; the EU is not liable for any use that may be made of the information contained herein.

In hard real-time systems it is essential to monitor the execution times of all tasks and detect situations in which the estimated worst-case execution time (WCET) is exceeded. This detection was usually available in systems scheduled with cyclic executives, because the periodic nature of the cycle allowed checking that all initiated work had been completed at each cycle. In event-driven concurrent systems the same capability should be available, and it can be accomplished with execution time clocks and timers.

This need for managing execution time is recognized in standards related to real-time systems. The POSIX standard [4] defines services for execution time measurement and budget overrun detection, and its associated real-time profiles [5] require implementations to support these services. The recent Ada 2005 standard introduced a number of new real-time services intended to provide applications with a higher degree of flexibility. In particular, this standard defines capabilities for measuring the execution time of individual tasks, and the ability to detect and handle execution-time budget overruns.

As real-time applications evolve towards an increased level of complexity, issues such as the composability of independently developed application components and support for legacy code introduce the need for supporting different levels of hierarchy in the scheduling mechanism. This leads to a hierarchical concurrency model with different layers, and with capabilities for establishing boundaries that protect different parts of the application. In this context of hierarchical scheduling it is often required to bound the execution time of a group of activities that are inside the same protection boundary, so that they cannot interfere with activities in other protection boundaries by using up more resources than they should. This need introduces a requirement on the underlying implementation to support the measurement of the execution times of groups of tasks, and the handling of potential budget overruns, in a way similar to what is usually done for individual tasks.

Following this general requirement, the Ada 2005 standard defines services for execution-time budgets for groups of tasks, and is now a step ahead of the real-time extensions to POSIX, which still have no such service.
In this paper we propose an implementation of a mechanism to support execution-time budgets for thread groups inside a POSIX operating system. The API of this implementation could be used as a basis for a future extension to POSIX. It will also be used to implement the task group budgets defined in Ada 2005. The architecture and details of the implementation are shown, as they may be useful to other implementers of this functionality defined in the new standard. Some performance metrics are provided.

The paper is organized as follows. Section 2 discusses the services related to thread group budgets that are currently available in the platform chosen for this implementation, MaRTE OS and GNAT. Section 3 introduces the services designed to represent sets of threads. Section 4 discusses the implementation of the execution time clocks for groups of threads, while Section 5 does the same for budgets and their associated handlers. Section 6 provides some performance metrics and, finally, Section 7 gives our conclusions.

2. Background

The implementation of execution time budgets for thread groups presented in this paper has been developed in MaRTE OS [1] [2], a real-time operating system (RTOS) that follows the POSIX.13 [5] minimum real-time system profile and is mostly written in Ada. It is available for the ix86 architecture as a bare machine, and it can also be configured as a POSIX-thread library for GNU/Linux.

The GNAT run-time library has been adapted to run on top of MaRTE OS, which is itself being extended in a joint effort between AdaCore and the University of Cantabria. The objective is to provide a platform fully compliant with Ada 2005, available for industrial, research, and teaching environments. The implementation of thread group budgets presented in this paper is part of this effort.

Two of the new Ada 2005 real-time services are closely related to thread group budgets and are already available in MaRTE OS and GNAT [3]:

- Timing events are defined in Ada 2005 as an effective and efficient mechanism to execute user-defined time-triggered procedures without the need to use a task. They are very efficient because the event handler may be executed directly in the context of the interrupt handler, avoiding the need for a server task.
- Execution time clocks and timers are defined in Ada 2005 as a standardized interface to obtain the execution time consumed by a task, together with a mechanism for creating handlers that are triggered when the execution time of a task reaches a given value. This provides the means to execute a user-defined action when the execution time assigned to a specific task expires.

Timing events have been implemented in MaRTE OS through a service that we call timed handlers, which is useful not only to implement its Ada counterpart but also to other applications, as a general-purpose RTOS mechanism.

MaRTE OS supports the execution-time clocks and timers defined in POSIX.1, which would be appropriate to implement their counterparts in Ada. However, the timers defined in POSIX to detect execution time overruns use an operating system signal to notify their expiration, and signals are a very scarce resource inside an RTOS. Besides, the signal is usually handled by a thread that waits to accept it, a mechanism that introduces relatively high overheads, mainly because the handler must be a thread, with the associated context switch costs. This is the same reason that motivated the new "timing events" mechanism for regular time management.

As a consequence, the Ada implementation of execution time clocks and timers has been achieved in MaRTE OS through the "timed handler" mechanism, which allows the event to be handled directly inside the hardware timer interrupt handler, thus avoiding the use of a signal and the double context switch that would otherwise be necessary.

To implement thread group budgets inside MaRTE OS we follow an approach similar to that used for execution time budgets for individual threads: we create the appropriate execution time clocks for thread groups and extend the "timed handler" mechanism to support these new clocks as well.

3. Thread sets

Before creating the execution time clocks for thread groups or sets, it is necessary to specify a mechanism to represent the groups themselves. Instead of defining a mechanism specific to execution-time clocks, we have chosen to create an independent RTOS object that represents a group of threads. In this way, future extensions that require handling groups of threads can use these same objects. Examples of such new services might be related to the requirements for supporting hierarchical scheduling, for instance suspending or resuming a group of threads atomically.

A thread set is implemented by a record that may be extended in the future to add functionality. This record has the following fields:

- Set: A list of the threads belonging to the set.
- Iterator: A reference to the current thread in the list, used when iterating through marte_threadset_first and marte_threadset_next.

A restriction has been made so that a thread can belong to only one thread set. This restriction is also made in the Ada 2005 standard, and its rationale is that in the hierarchical scheduling environment for which thread groups are useful, threads belong to only one specific scheduling class, and therefore to one specific set. This restriction allows a more efficient implementation, because at each context switch only one Consumed_Time field needs to be updated: that of the single set to which the running thread belongs.

Threads can be added to and removed from a thread set dynamically.
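The thread set record and the one-set-per-thread restriction described above can be illustrated with a minimal C sketch. This is not the MaRTE OS implementation (which is written mostly in Ada): the type names struct tcb and struct thread_set, the fixed capacity MAX_THREADS_PER_SET, and the helper names threadset_add and threadset_del are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS_PER_SET 8   /* hypothetical fixed capacity */

struct thread_set;

/* Hypothetical, much-reduced thread control block. */
struct tcb {
    struct thread_set *set;     /* NULL when the thread is in no set */
};

/* Hypothetical thread set: the member list plus an iterator position. */
struct thread_set {
    struct tcb *members[MAX_THREADS_PER_SET];
    size_t count;
    size_t iterator;
};

/* Add a thread. Fails (-1) if the set is full or if the thread
 * already belongs to some set (one-set-per-thread restriction). */
int threadset_add(struct thread_set *s, struct tcb *t)
{
    assert(s != NULL && t != NULL);
    if (t->set != NULL || s->count == MAX_THREADS_PER_SET)
        return -1;
    s->members[s->count++] = t;
    t->set = s;
    return 0;
}

/* Remove a thread. Fails (-1) if it is not a member of this set. */
int threadset_del(struct thread_set *s, struct tcb *t)
{
    assert(s != NULL && t != NULL);
    if (t->set != s)
        return -1;
    for (size_t i = 0; i < s->count; i++) {
        if (s->members[i] == t) {
            /* Fill the hole with the last member. */
            s->members[i] = s->members[--s->count];
            break;
        }
    }
    s->iterator = 0;            /* any iteration in progress is restarted */
    t->set = NULL;
    return 0;
}
```

Because each TCB holds a single back-pointer, the membership check and the lookup of the set of the running thread are both constant-time, which is the efficiency argument made in the text.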

Every thread has a pointer in its thread control block (TCB) to the set it belongs to. This field is null if the thread does not belong to any thread set.

The C language API to manage thread sets from the application level is the following:

    // create an empty thread set
    int marte_threadset_create (marte_threadset_id_t *set_id);

    // destroy a thread set
    int marte_threadset_destroy (marte_threadset_id_t set_id);

    // empty an existing thread set
    int marte_threadset_empty (marte_threadset_id_t set_id);

    // add a thread to a set
    int marte_threadset_add (marte_threadset_id_t set_id,
                             pthread_t thread_id);

    // delete a thread from a set
    int marte_threadset_del (marte_threadset_id_t set_id,
                             pthread_t thread_id);

    // check thread membership
    int marte_threadset_ismember (marte_threadset_id_t set_id,
                                  pthread_t thread_id);

    // reset the iterator and get the first thread id
    int marte_threadset_first (marte_threadset_id_t set_id,
                               pthread_t *thread_id);

    // advance the iterator and get next thread id
    int marte_threadset_next (marte_threadset_id_t set_id,
                              pthread_t *thread_id);

    // check whether the iterator can be advanced
    int marte_threadset_hasnext (marte_threadset_id_t set_id);

    // get the set associated with the given thread
    int marte_threadset_getset (marte_threadset_id_t *set_id,
                                pthread_t thread_id);

4. Execution time clocks for thread groups

To implement execution time clocks for groups of threads we add the following information to the object that represents a thread set:

- Consumed_Time: CPU time consumed by all the tasks in the group. Every time a thread of a given set leaves the CPU, the time it has consumed since its last activation is added to the Consumed_Time of its thread set. This is done even if there is no timed event associated with the set, because the value of the execution-time clock may be read at any time by the application.
- Group_Timed_Event: A reference to the internal RTOS execution time event, used by the scheduling mechanism.
A set can be associated with at most one such event.

The API to obtain an execution-time clock from a thread set is:

    // get the execution-time clock of a thread set
    int marte_getgroupcpuclockid (marte_threadset_id_t set_id,
                                  clockid_t *clock_id);

The returned id represents a clock that can be read and set through the standard POSIX API for clocks, i.e., using the functions clock_gettime, clock_settime, etc. Group clocks can also be used as the base for POSIX timers and MaRTE OS timed events, like any other clock defined in the system. They cannot, however, be used as the base for the clock_nanosleep operation, as is also the case with the single-thread CPU-time clocks. POSIX leaves this behavior unspecified, and Ada does not define execution time as a type that can be used in the equivalent delay statements.

POSIX requires type clockid_t to be defined as an arithmetic type, and therefore clock ids are implemented as 32-bit unsigned numbers. The value stored in a clock id can have different interpretations:

- Special values for the regular calendar-time clock CLOCK_REALTIME, the execution time clock of the current thread CLOCK_THREAD_CPUTIME_ID, and the monotonic clock CLOCK_MONOTONIC.
- A pointer to a thread control block, when the clock is the CPU-time clock of a particular thread.
- A pointer to a thread set, when it is a thread group clock.

5. Timed events based on a group clock

Group clocks can be used as the base of timers and timed handlers. When a timer or a timed handler is armed, a MaRTE OS timed event is enqueued in the system event queues.

Time-based events in MaRTE OS are of two kinds: standard time and execution time. They are kept in separate priority queues because they cannot be compared with each other for ordering. Events based on group clocks are a special case of execution time events.
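The accounting rule behind the group clock (Section 4) is simple enough to sketch in C: whenever a thread leaves the CPU, the slice it has just consumed is charged both to the thread and to the single set it belongs to. The sketch below is only an illustration under stated assumptions; the names struct group, struct thread, on_dispatch and on_preempt are hypothetical, time is modeled as a plain nanosecond counter, and none of this is taken from the MaRTE OS sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thread set reduced to its group clock. */
struct group {
    uint64_t consumed_ns;       /* plays the role of Consumed_Time */
};

/* Hypothetical thread control block. */
struct thread {
    uint64_t used_cpu_ns;       /* per-thread CPU-time clock */
    uint64_t dispatched_at_ns;  /* time of the last activation */
    struct group *set;          /* NULL if the thread is in no set */
};

/* Called when the thread gets the CPU. */
void on_dispatch(struct thread *t, uint64_t now_ns)
{
    t->dispatched_at_ns = now_ns;
}

/* Called when the thread leaves the CPU: charge the elapsed slice
 * to the thread and, unconditionally, to its set (if any), so that
 * the group clock is correct whenever the application reads it. */
void on_preempt(struct thread *t, uint64_t now_ns)
{
    assert(now_ns >= t->dispatched_at_ns);
    uint64_t slice = now_ns - t->dispatched_at_ns;
    t->used_cpu_ns += slice;
    if (t->set != NULL)
        t->set->consumed_ns += slice;
}
```

For example, if two threads of the same set run for 300 ns and 200 ns respectively, the group clock reads 500 ns, while a thread outside the set leaves it untouched.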
An execution time event has the following information:

- CPU_Time: The event will expire when the execution time consumed by the associated task reaches this value.
- Group_Expiration_Time: The event will expire when the Consumed_Time field of the task set associated with the event reaches this value. This field is only used in events based on a group clock.
- Is_Based_On_Group_Clock: A boolean used to identify events based on group clocks.
- Base_Clock: A clock id representing the clock used as the timing base of the event. It can be a thread CPU-time clock or a group clock.
- Task_Where_Queued: A pointer to the task in whose queue the event is currently enqueued.

Execution time events are kept in a queue associated with the task on which the event is based, stored as the CPU_Time_Timed_Event_Queue in the task control block. Every time a new thread gets the CPU, the event at the head of the standard-time event queue and the event at the head of the running task's CPU_Time_Timed_Event_Queue are compared. The hardware timer is programmed to expire at the more urgent of the two.

Events based on group clocks are special CPU-time events that jump between the CPU_Time_Timed_Event_Queue queues of the threads in the group. Each time the system schedules a task T included in a thread set that has an associated event, the following actions are performed in the Do_Scheduling internal kernel operation:

    -- Set CPU_Time of the event according to the
    -- time consumed by T
    T.Set.Group_TE_Ac.CPU_Time :=
      T.Used_CPU_Time +
      (T.Set.Group_TE_Ac.Group_Expiration_Time - T.Set.Consumed_Time);

    -- Move Group_TE_Ac from one task to another
    if T.Set.Group_TE_Ac.Task_Where_Queued /= null then
       -- Dequeue from the list it was queued
       Dequeue (T.Set.Group_TE_Ac,
                T.Set.Group_TE_Ac.Task_Where_Queued.CPU_Time_TEs_Q);
    end if;

    -- Enqueue in T's list
    Enqueue_In_Order (T.Set.Group_TE_Ac, T.CPU_Time_TEs_Q);
    T.Set.Group_TE_Ac.Task_Where_Queued := T;

Dequeue and enqueue operations are very fast, because the number of CPU-time events associated with a task will usually be very small, either one or two: a CPU-time event and a group event. Consequently, the number of extra operations required at each context switch to manage these clocks is kept small, and the implementation can schedule the threads efficiently with an acceptable overhead, as can be seen in the following performance metrics section.

6. Performance metrics

The support for group budgets has already been implemented in MaRTE OS.
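The extra work performed at each context switch, whose cost is measured in this section, is essentially the deadline translation from Section 5: the remaining group budget is re-expressed on the CPU-time clock of the thread being dispatched. A minimal C sketch of that arithmetic follows. The function names group_event_cpu_time and next_timer_expiry are hypothetical, and the second function is only an assumption about how a CPU-time expiration could be compared with a standard-time one; the paper does not give that detail.

```c
#include <assert.h>
#include <stdint.h>

/* Translate a group expiration into the CPU-time clock of thread T:
 * CPU_Time := T.Used_CPU_Time + (Group_Expiration_Time - Consumed_Time).
 * All arguments are in the same time unit. */
uint64_t group_event_cpu_time(uint64_t thread_used,
                              uint64_t group_expiration,
                              uint64_t group_consumed)
{
    assert(group_expiration >= group_consumed);
    return thread_used + (group_expiration - group_consumed);
}

/* Assumed comparison step: convert the CPU-time expiration into an
 * absolute time (valid only while the thread keeps running) and
 * return the more urgent of it and the next standard-time event. */
uint64_t next_timer_expiry(uint64_t now,
                           uint64_t std_event_at,
                           uint64_t thread_used,
                           uint64_t cpu_event_at)
{
    assert(cpu_event_at >= thread_used);
    uint64_t cpu_expiry = now + (cpu_event_at - thread_used);
    return cpu_expiry < std_event_at ? cpu_expiry : std_event_at;
}
```

As a worked case in microseconds: with a group expiration of 10000, a group Consumed_Time of 7000 and a dispatched thread that has used 2000 of CPU time, the event is queued at 5000 on that thread's clock, that is, 3000 of budget remain whichever thread of the group runs next.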
Execution time accounting introduces a small overhead: enabling this service in MaRTE OS increases the context switch time by less than 5%. Group execution time accounting increases the context switch time by another 4%, for a total increase of about 9% with respect to a system with no CPU-time accounting on an x86 architecture.

The overheads of budget overrun detection are also relatively small. Table 1 compares the overheads of two detection mechanisms, as measured on a 3.4 GHz Pentium IV. The first one is implemented using a regular POSIX timer that sends a signal when the budget expires, and a handler thread that blocks waiting to accept the signal. The second mechanism is implemented using the new timed handler service. We can see that the overhead of the second mechanism is much smaller.

Table 1. Overhead of budget overrun notification mechanism (the measured values are missing from this transcript)

    Metric                           Time (μs), using timer       Time (μs), using
                                     and auxiliary thread         timed handlers
    From user's thread to handler
    From handler to user's thread
    Total time

7. Conclusion

As the complexity of real-time systems grows, hierarchical scheduling and partitioning are mechanisms used to cope with it, by helping to establish protection boundaries and easing the composability of independently developed application components. One of the requirements of this partitioning is time protection among the different groups of tasks in the hierarchy, which can be achieved by using thread group budgets such as those specified in the new Ada 2005 standard.

This paper has presented an implementation of the support needed to provide such budgeting services in a real-time operating system called MaRTE OS. The paper describes the architecture and details of the implementation, together with the rationale for the main design decisions, so that this information can be used by other implementers of this functionality, either as part of Ada run-time systems or as part of a general-purpose RTOS. The implementation has proven to be straightforward, and the overheads introduced are small, both in the context switch times and in the budget overrun notification mechanism.

As future work, the functionality defined in Ada 2005 for group budgets will be implemented. It is anticipated that support for the Ada group budgets will be a simple package built on top of the MaRTE OS implementation described in this paper.

References

[1] Aldea Rivas M. and González Harbour M. MaRTE OS: Minimal Real-Time Operating System for Embedded Applications. Universidad de Cantabria. marte.unican.es/

[2] Aldea Rivas M. and González Harbour M. MaRTE OS: An Ada Kernel for Real-Time Embedded Applications. Proceedings of the International Conference on Reliable Software Technologies, Ada-Europe-2001, Leuven, Belgium, Lecture Notes in Computer Science, LNCS 2043, May 2001, pp. 305-316.
[3] Aldea Rivas M. and Ruiz J.F. Implementation of new Ada 2005 real-time services in MaRTE OS and GNAT. International Conference on Reliable Software Technologies, Ada-Europe-2007, Switzerland.
[4] IEEE Std :2004 Edition, Information Technology - Portable Operating System Interface (POSIX). The Institute of Electrical and Electronics Engineers.
[5] IEEE Std Information Technology - Standardized Application Environment Profile - POSIX Realtime and Embedded Application Support (AEP). The Institute of Electrical and Electronics Engineers.
[6] S. Tucker Taft, Robert A. Duff, Randall L. Brukardt, Erhard Ploedereder, Pascal Leroy (Eds.). Ada 2005 Reference Manual. Language and Standard Libraries. International Standard ISO/IEC 8652/1995(E) with Technical Corrigendum 1 and Amendment 1. Number 4348 in Lecture Notes in Computer Science, Springer-Verlag (2006).

A Virtual Machine for Critical Real-Time Systems
(original title: Una Máquina Virtual para Sistemas de Tiempo Real Críticos)

José A. Pulido, Santiago Urueña, Juan Zamorano and Juan A. de la Puente
Grupo de Sistemas de Tiempo Real y Arquitectura de Servicios Telemáticos
Universidad Politécnica de Madrid (UPM)

Abstract

The ASSERT virtual machine provides an execution environment with deterministic timing behavior for real-time systems with high-integrity requirements, such as those used on board spacecraft. To this end, it is configured as a platform that accepts software components whose timing behavior conforms to a set of rules, and rejects the execution of any other kind of component. The software components are generated automatically from a high-level description of the system, which can include elements derived from functional models.

Keywords: Real-time systems, high-integrity systems, model-based development.

1. Introduction

On-board real-time systems in spacecraft often have very strict safety requirements. Functional actions must be guaranteed to execute within the time intervals set by the corresponding algorithm, because the incorrect operation of many of these systems can seriously compromise the safety of property, or even of people in the case of a manned mission. Their behavior must therefore be determined in every possible situation, and the verification and validation process for this kind of system is consequently very strict. The European Space Agency (ESA) has developed strict standards for the development and quality control of on-board software [4, 5]. Among other requirements, these standards establish the need to perform response time analysis on on-board real-time software.
The current trend is towards separating the methods used to develop functional code from those used to guarantee the specified timing behavior. For the former, it is common to develop the software from models built with tools such as Simulink®. Code generated in this way is integrated manually into a concurrency structure, which may be based on a cyclic executive [9] or, more recently, on a real-time kernel [10]. This manual intervention complicates both the software development process and its verification and validation: engineers who are experts in the functional part of the project often have to take care of the real-time requirements related to the concurrent aspects of the system as well, which complicates their work and therefore increases the number of errors introduced along the process.

The ASSERT project (Automated proof based System and Software Engineering for Real-Time) aims to improve system and software development processes in the aerospace domain by applying formal methods and software component models. The goal is to develop a set of building blocks for use in open environments that can be shared by different teams working in this domain. In this context, a new software development process has been devised that separates functional aspects (for example, control algorithms) from concurrency and real-time mechanisms.
This approach lets engineers specialized in the aerospace domain concentrate on the design of the functional algorithms, while the corresponding code is inserted semi-automatically into a set of containers that define the concurrency elements (tasks and the synchronization elements between them) and the timing elements (periodic or sporadic behavior, deadlines, etc.). The containers run on a virtual machine that guarantees that the synchronization and timing elements behave correctly. Taken as a whole, the ASSERT development process ensures that the timing requirements specified for the system are correctly reflected in properties of the execution entities.

This paper describes the concept and characteristics of the ASSERT virtual machine. It first gives an introductory presentation of the development process followed, and then describes some implementation aspects. Finally, it refers to the preliminary application of these concepts in some industrial pilot projects.

2. The ASSERT development process

The ASSERT development process [1] is based on the following principles:

- The definition of a specific metamodel that allows expressing the properties of interest in the application domain considered.
- The separation between the aspects related to functionality, concurrency and interface, and those that depend on the execution platform.
- The formal definition of properties, and of transformations between models that guarantee that timing properties are preserved.

The metamodel is a precise definition of the elements that can be used to build models of computer systems in a given domain [7]. For ASSERT, a metamodel has been defined based on the Ravenscar profile [2], a subset of the concurrency part of the Ada language [8] aimed at building high-integrity systems with predictable and analyzable timing behavior. This metamodel is called RCM (Ravenscar Computation Model).

The separation between functional and timing aspects is achieved through the notion of container. A container is a generic component into which the functional elements of the software are inserted, and which ensures the remaining properties of the software by construction. ASSERT distinguishes two kinds of containers:

- Application-level containers (APLC). These define the components of the system (for example, controllers or sensors), including their functional content and the interface through which they interact with each other, all in an abstract way that is independent of the execution platform.
- Virtual-machine-level containers (VMLC). These define the components that run directly on the execution platform, the ASSERT virtual machine (ASSERT VM). Examples of VMLC are periodic and sporadic tasks and shared data objects.

Figure 1 shows a general outline of the development process established in ASSERT. The figure highlights the separation between the different views of the system (functional, interface, concurrency, ...) and distinguishes, on either side of the vertical line, the part visible to system designers (on the left) from the part that remains hidden behind a process of automatic transformations.

The ASSERT development process lets the engineers in charge of developing the system concentrate on its functional aspect, working with the models at the highest level of abstraction (APLC). These elements cannot be executed directly on the virtual machine. Automatic transformations convert the APLC model into a semantically equivalent one made up exclusively of VMLC. The last step uses a set of predefined patterns to generate, from the VMLC, the source code that will finally run on the ASSERT virtual machine.

An important feature of this scheme is that the semantics of all component types is based on a single metamodel, RCM. This formal basis allows models to be transformed into one another while preserving their properties, including timing properties.
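The container idea, in which functional code is plugged into a fixed structure that owns all timing behavior, can be illustrated with a small sketch. ASSERT generates Ada code under the Ravenscar profile; the C fragment below is only an analogy, and every name in it (functional_op, struct cyclic_container, container_step, count_samples) is hypothetical. The container decides when the functional operation runs; the operation itself knows nothing about time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Functional code is passed in as a plain callback with no notion of time. */
typedef void (*functional_op)(void *state);

/* Hypothetical cyclic container: a fixed structure owning the timing. */
struct cyclic_container {
    uint64_t period_us;
    uint64_t next_release_us;
    functional_op op;
    void *state;
};

/* One scheduling step: run the functional operation if its release
 * time has arrived. The next release is computed from the previous
 * release, not from the current time, so the period does not drift.
 * Returns 1 if the operation ran, 0 otherwise. */
int container_step(struct cyclic_container *c, uint64_t now_us)
{
    assert(c != NULL && c->op != NULL && c->period_us > 0);
    if (now_us < c->next_release_us)
        return 0;
    c->op(c->state);
    c->next_release_us += c->period_us;
    return 1;
}

/* Example of functional content: count the samples processed. */
void count_samples(void *state)
{
    ++*(int *)state;
}
```

The same functional callback could be dropped unchanged into a container with a different period, which is the separation of concerns that the APLC to VMLC transformation is meant to preserve.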
Figure 2 shows how the content of an APLC is embedded in the fixed structure of one or more VMLC, which already includes the mechanisms needed to preserve the properties defined at the higher level.

[Figure 1: ASSERT component model and views.]

[Figure 2: ASSERT transformation scheme.]

3. The ASSERT virtual machine

The ASSERT virtual machine is the execution platform for the final elements of the real-time system (VMLC). To be able to guarantee the real-time properties specified at the higher levels of abstraction, it only accepts entities that are legal from the point of view of the chosen computation model, RCM. These entities, which take the form of virtual machine components, correspond to the following archetypes:

- Active components: periodic or sporadic, they execute an action at regular intervals (periodic components) or every time a given event occurs (sporadic components).
- Protected components: they contain objects shared by two or more active components, protected against concurrent access to guarantee their integrity by providing mutual exclusion on every access.
- Passive components: they contain objects used by a single active component only. They therefore need no protection against concurrent access.

The component archetypes follow the Ravenscar concurrency model (RCM). The ASSERT virtual machine supports the execution of instances of these archetypes with different functional content and specific timing attributes (period, worst-case execution time, deadline, etc.). In addition, it includes supervision mechanisms that detect timing errors while the system is running and execute corrective actions whenever possible. These mechanisms make it possible to:

- Ensure the periodic execution of periodic components.
- Monitor the response time of every action, immediately detecting any violation of the specified deadline.
- Measure the processor execution time of every action.
- Assign an execution time quota to every component, and detect overruns of that quota.
- Ensure a minimum time between activations for sporadic components.
- Use the time during which the processor is idle to correct overload situations without compromising the deadlines of the other tasks.

With these features, the ASSERT virtual machine not only runs applications that conform to the Ravenscar computational model, but can also implement a temporal partitioning mechanism. This guarantees that the correct execution of one application is not affected by problems arising from the faulty behavior of another application, even though both are executed by the same processor. In addition, the ASSERT virtual machine supports the distributed execution of parts of a system on different computers connected by a local network with deterministic timing behavior.

The choice of the Ravenscar computation model suggests using the Ada language to implement the virtual machine. The current language standard [8] supports all the elements of the Ravenscar profile, together with a set of timing and supervision mechanisms that make it possible to meet the above requirements and to provide temporal isolation. The ASSERT virtual machine has therefore been built around an Ada 2005 compiler (GNAT), together with a real-time kernel specialized for the Ravenscar profile, ORK [10]. The current version runs on a computer based on the LEON2 processor, a radiation-hardened implementation of the SPARC V8 architecture.
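Two of the supervision mechanisms listed earlier in this section, the minimum time between activations of sporadic components and the execution time quota, reduce to simple checks that can be sketched in C. The sketch is illustrative only: the ASSERT virtual machine is built on Ada 2005 and ORK, and the names struct sporadic, sporadic_try_release, struct budget and budget_charge are hypothetical. The policy of holding back an early activation, rather than handling it some other way, is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state of a sporadic component. */
struct sporadic {
    uint64_t min_separation_us; /* minimum time between activations */
    uint64_t last_release_us;
    int released_once;          /* 0 until the first activation */
};

/* Returns 1 and records the release if the activation respects the
 * minimum separation; returns 0 if it arrives too early and must be
 * held back (assumed policy). */
int sporadic_try_release(struct sporadic *s, uint64_t now_us)
{
    assert(s != NULL);
    assert(!s->released_once || now_us >= s->last_release_us);
    if (s->released_once &&
        now_us - s->last_release_us < s->min_separation_us)
        return 0;
    s->last_release_us = now_us;
    s->released_once = 1;
    return 1;
}

/* Hypothetical execution time quota of a component. */
struct budget {
    uint64_t quota_us;
    uint64_t used_us;
};

/* Charge executed time to the budget; returns 1 when the quota has
 * been exceeded, 0 while the component is still within it. */
int budget_charge(struct budget *b, uint64_t executed_us)
{
    assert(b != NULL);
    b->used_us += executed_us;
    return b->used_us > b->quota_us;
}
```

With a minimum separation of 1000 μs, an activation 850 μs after the previous one is held back, while one arriving exactly 1000 μs later is accepted. A component with a 500 μs quota is flagged only once its accumulated execution time goes beyond 500 μs.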
The architecture of the virtual machine also contains other components intended to support distribution (Figure 3): a protocol stack for the message transfer service (MTS) of SOIS (Spacecraft On Board Interface Services) [3] over a Spacewire local network, and a middleware layer based on PolyORB-HI [6], a middleware specialized for high-integrity systems and compatible with the Ravenscar profile.

[Figure 3: Architecture of the ASSERT virtual machine. Two nodes, each consisting of generated application code, ASSERT middleware, SOIS MTS comms services, comms drivers and the ASSERT RT kernel on LEON 2 hardware, connected by a SpaceWire communication channel.]

[Figure 4: LEON2 computer.]

4. Industrial experience

Within the ASSERT project, three pilot projects have been defined, led by major public and private entities in the aerospace sector, including the European Space Agency (ESA), in order to validate the technologies and methods developed in the project, including the virtual machine described in this paper. These projects are:

- HRI: aimed at the domain of long-lived satellites with no possibility of maintenance;
- MPC: focused on distributed systems based on flotillas of satellites;
- MA3S: focused on mission-critical systems on board unmanned spacecraft.

The final version of the ASSERT virtual machine has been used successfully in these projects. The HRI project is especially relevant: its final presentation showed a system consisting of a satellite with two nodes (LEON2) communicating over a Spacewire bus, carrying an Imager and an Altimeter that collect topographic data for later study. This system used real software, including the management of telemetry and telecommands (data sent and received by the satellite, respectively), guidance and navigation control, and everything else needed for the real operation of a satellite.

5. Conclusions and future work

The approach to real-time software development described in this paper is based on the following principles:

- Separation of the functional aspects of the system from those related to synchronization and timing (the concurrency and real-time model);
- Definition of a formal concurrency model, the Ravenscar computation model, on which the models of the system at different levels of abstraction are based;
- Software development based on models and on transformations between models that preserve the timing properties of the system;
- Execution of the software on a specialized platform, the ASSERT virtual machine, which guarantees by construction that the timing requirements of the system are met;
- Isolation between applications, which prevents a faulty application from compromising the deadlines established for the others.

This approach has shown its viability with the construction of a system with real hardware and software, coordinated by the European Space Agency. In the near future, the structure and implementation of the virtual machine are expected to be improved with the following developments:

- Spatial partitioning, which isolates in memory the data belonging to each application and thus protects it against erroneous accesses caused by another, faulty application.
- Improved temporal isolation, using the new hardware available in future versions of the LEON hardware platform.

Acknowledgements

This work has been partially funded by the Plan Nacional de I+D+I (project TIN C03-01) and by the 7th Framework Programme of the European Union (ASSERT project, IST). The authors wish to thank José Redondo and Jorge López for their contribution to the implementation of the virtual machine. The fundamental ideas on the component model and the transformations are due to the team led by Tullio Vardanega at the University of Padua. The work on the PolyORB-HI middleware was carried out at ENST Paris under the direction of Laurent Pautet and Jérôme Hugues. The MTS software that is part of the virtual machine was developed at SciSys by Stuart Fowell and Marek Prochazka.

References

[1] Matteo Bordin and Tullio Vardanega. Correctness by construction for high-integrity real-time systems: A metamodel-driven approach. In Nabil Abdennadher and Fabrice Kordon, editors, 12th International Conference on Reliable Software Technologies, Ada-Europe 2007, number 4498 in LNCS. Springer-Verlag.
[2] Alan Burns, Brian Dobbing, and George Romanski. The Ravenscar tasking profile for high integrity real-time programs. In Lars Asplund, editor, Reliable Software Technologies, Ada-Europe'98, number 1411 in LNCS. Springer-Verlag.
[3] Consultative Committee for Space Data Standards (CCSDS). CCSDS Spacecraft On-board Interface Services Green Book. CCSDS G-0.4, December. Draft.
[4] ECSS. ECSS-E-40 Part 1B: Space engineering - Software - Part 1: Principles and requirements, November. Available from ESA.
[5] ECSS. ECSS-Q-80B Space Product Assurance - Software Product Assurance. Available from ESA.
[6] Jérôme Hugues, Bechir Zalila, and Laurent Pautet. Middleware and tool suite for high integrity systems. In Proceedings of RTSS-WiP'06, pages 14, Rio de Janeiro, Brazil, December. IEEE.
[7] OMG. MDA Guide Version 1.0.1.
[8] S. T. Taft, R. A. Duff, R. L. Brukardt, E. Plödereder, and P. Leroy, editors. Ada 2005 Reference Manual. Language and Standard Libraries. International Standard ISO/IEC 8652/1995(E) with Technical Corrigendum 1 and Amendment 1. Number 4348 in Lecture Notes in Computer Science. Springer-Verlag.
[9] Juan Zamorano, Alejandro Alonso, and Juan Antonio de la Puente. Building safety critical real-time systems with reusable cyclic executives. Control Engineering Practice, 5(7), July.
[10] Juan Zamorano and José F. Ruiz. GNAT/ORK: An open cross-development environment for embedded Ravenscar-Ada software. In Eduardo F. Camacho, Luis Basañez, and Juan Antonio de la Puente, editors, Proceedings of the 15th IFAC World Congress. Elsevier Press, 2003.

Middleware based on XML technologies for achieving true interoperability between PLC programming tools

E. Estevez, M. Marcos, F. Perez, D. Orive
Automatic Control and Systems Engineering Department, University of the Basque Country, Bilbao, Spain

Abstract: Industrial Process Measurement and Control Systems are used in most industrial sectors to achieve production improvement, process optimization, and time and cost reduction. Integration, reuse, flexibility and optimization are demanded in order to adapt to a rapidly changing and competitive market, and standardization is key to achieving these goals. International standardization efforts have led to the definition of the IEC 61131 standard. Part 3 of this standard defines a software model for defining automation projects as well as 5 programming languages. Nowadays, most Programmable Logic Controller (PLC) vendors follow this standard, although each programming tool adds particularities and stores the automation project in a different manner. Consequently, even when tools use the same software model and the same programming languages, source code cannot be reused across them. This work presents an infrastructure that allows transferring source code from one PLC programming tool to any other, transparently to the users. The proposal consists of a textual expression of the software model and the programming languages, as well as the mechanisms, based on XML technologies, to achieve tool interoperability.

1. INTRODUCTION

Nowadays most industrial sectors use Programmable Logic Controllers (PLCs) to control their productive systems. In recent years, technological advances in these controllers have enabled production improvement, process optimization, and time and cost reduction. On the other hand, for many years only proprietary programming languages could be used for vendor-specific equipment. Although some languages, such as ladder diagram or instruction list, were very widespread, their implementations used to be rather different. The need for standardization in the field was obvious, covering everything from the hardware and configuration issues up to the programming languages.

In 1993, the International Electrotechnical Commission (IEC) published the IEC 61131, International Standard for Programmable Controllers (IEC, 2003). The IEC 61131-3 standard deals with the software model and programming languages for Industrial Process Measurement and Control Systems (IPMCS) (Lewis, R.W, 1998), (John, K.H and Tiegelkamp M, 2001). In this sense, it has provoked a movement towards Open Systems in this application field. Thus, the so-called Open PLCs, open architecture controllers that replace a PLC with a computer, have begun to appear in the market.

Nowadays, most PLC vendors are making a great effort to become IEC 61131-3 compliant. This offers great advantages to control system engineers, as the programming techniques become vendor independent. However, the standard does not specify an import/export format; it specifies only the elements of the software model and the mechanisms to be offered to the user in order to graphically define an application. Thus, every tool uses its own storage format and commonly offers a set of Application Program Interface (API) functions or, alternatively, an import/export option. As a result, code programmed in one tool cannot be reused in others; it has to be edited again. In order to achieve true reusability, interoperability among tools is needed.

There are international organizations, such as PLCopen (1992), a vendor- and product-independent worldwide association, whose mission is to be the leading association resolving topics related to control programming. Its main goal is to support the use of international standards in this field. PLCopen has several technical and promotional committees (TCs). In particular, TC6 for XML has defined an open interface between all different kinds of software tools, which provides the ability to transfer the information that is on the screen to other platforms. The eXtensible Markup Language (XML) (W3C, 2006a) was selected for defining the interface format and, in April 2004, the first XML schema for the graphical languages was released for comments (W3C, 2004).

Nevertheless, the PLCopen interface is not universally supported yet. Besides, the proposed interface focuses mainly on transferring what is on the screen and thus adds graphical information as well as new elements to those used by the standard. Finally, it does not impose an architectural style; it assumes that the code being transferred is correct.

The goal of the work presented here goes further, as an interoperability middleware is proposed. It consists of a common XML format for representing the IEC 61131-3

software model and languages and the mechanisms to import/export information from/to every tool.

The layout of the paper is as follows: Section 2 briefly describes the elements of the IEC 61131-3 software model. Section 3 presents the interoperability middleware that acts as a common road for achieving true interoperability. Finally, Section 4 illustrates an example of interoperability between two PLC programming tools.

2. THE IEC 61131-3 SOFTWARE MODEL

This section describes the software model proposed by the IEC 61131-3 standard in order to identify the architectural style and composition rules that any IEC 61131-3 compliant application must meet. The architectural style is identified in a component-based fashion. The component-based strategy aims at managing complexity, shortening time-to-market, and reducing maintenance requirements by building systems with existing components.

Generally speaking, software architectures are defined as configurations of components and connectors. An architectural style defines the patterns and the semantic constraints on a configuration of components and connectors. As such, a style can define a set or family of systems that share common architectural semantics (Medvidovic, N. and Taylor, R.N., 1997).

In (E. Estevez, et al, 2007a) the different components and connectors that define the IEC 61131-3 software model are identified. Two types of components can be distinguished: those that do not encapsulate code (Configurations, Resources and Tasks) and the Program Organization Units (POUs), which encapsulate code. The latter can be Programs, Function Blocks and Functions.

On the other hand, in this model, the variables act as connectors. They represent the communication between software components. In fact, their visibility identifies the components that are involved in the communication:

- VAR_ACCESS identifies the communication between programs residing in different configurations.
- VAR_GLOBAL at configuration level identifies the communication between programs residing in different resources of the same configuration.
- VAR_GLOBAL at resource level identifies the communication between programs of the same resource.
- VAR_LOCAL identifies the communication among nested POUs.

In Fig. 1 the IEC 61131-3 software model architectural style is illustrated using a meta-model expression. Every component and connector has its own characteristics as defined in (E. Estevez, et al, 2007a).

Fig. 1. Architectural style of IEC 61131-3 software model

The architectural style needs to be combined with a set of composition rules in order to assure a correct software architecture definition. In Table 1 the identified composition rules are summarized.

TABLE 1. COMPOSITION RULES FOR IEC 61131-3 SW MODEL
Id | Rule
1 | The type of a Global Variable must be elementary or previously defined by the programmer
2 | The value of a Global Variable must be in concordance with its type
3 | The type of POU formal parameters must be elementary or previously defined by the programmer
4 | The value of POU instance parameters must be in concordance with its type
5 | An Access variable must give permission to a previously defined variable
6 | Resources of the same configuration must be downloaded to the same processor
7 | Resource POU instances can only be organized by tasks of the same resource

3. INTEROPERABILITY MIDDLEWARE

The proposed interoperability middleware is formed by two main modules. The first one consists of representing the IEC 61131-3 software model in a standard and generic format. The second one comprises the integration mechanisms that connect the tools through the middleware, achieving code exchange.

The proposed middleware is based on XML technologies. In particular, XML schema (W3C, 2004) jointly with schematron rules (Rick, J., 2006) has been used for defining a common and generic format of the software model. This format takes into account both the architectural style and the composition rules. Integration techniques involve related XML technologies, such as XML stylesheets and the Document Object Model (DOM) (W3C, 2005) or the Simple API for XML (SAX, 2004), jointly with a programming language. Fig. 2 illustrates the general scenario of the proposed middleware for achieving true interoperability between any PLC programming tools.

Fig. 2. General scenario of interoperability of PLC programming tools
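As an illustration of what a composition rule means in practice, the following minimal Python sketch checks rule 5 of Table 1 (an access variable must refer to a previously defined variable) on a small XML fragment. It only mimics, in procedural code, the kind of referential check that a key/keyref constraint expresses declaratively. The element and attribute names (configuration, variable, access, target) are hypothetical and are not taken from the schema proposed in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical project fragment: element and attribute names are invented
# for illustration and do not come from the paper's XML schema.
PROJECT = """
<configuration name="Cfg1">
  <globalVars>
    <variable name="Speed" type="INT"/>
  </globalVars>
  <accessVars>
    <access name="A_Speed" target="Speed"/>
    <access name="A_Level" target="Level"/>
  </accessVars>
</configuration>
"""

def undefined_access_targets(xml_text):
    """Return the names of access variables whose target is not declared."""
    root = ET.fromstring(xml_text)
    # Declared variable names play the role of the 'key'.
    declared = {v.get("name") for v in root.iter("variable")}
    # Each access variable target plays the role of a 'keyref'.
    return [a.get("name") for a in root.iter("access")
            if a.get("target") not in declared]

violations = undefined_access_targets(PROJECT)
print(violations)
```

In this fragment only A_Level is reported, because its target Level is never declared. A schema validator would reject such a document before any code transfer takes place.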

The following sub-sections define both modules in more detail: a generic representation format of the software model and the integration mechanisms.

3.1. Standard format for Reusable Code

As commented above, the generic format proposed for representing the IEC 61131-3 software model uses the W3C schema and schematron rules. In particular, each model element is represented as an XML schema element. The architectural style is expressed by means of the choice, sequence and multiplicity mechanisms of W3C schema (E. Estevez, et al, 2007a,b). The composition rules, in turn, are enforced by means of the key/keyref constraints of the W3C schema (Van der Vlist, E., 2002) and also by means of schematron rules.

Fig. 3 gives a general overview of the IEC 61131-3 standard software model. It illustrates the architectural style as well as composition rule number five of Table 1, which is implemented by means of the key/keyref mechanism. The complete definition of this module is available in (swengml, 2007). This interface could also be very useful for testing the degree to which a programming tool is IEC 61131-3 compliant.

Fig. 3. Architectural style of IEC 61131-3 software model in sweng markup language

3.2. Integration Techniques

In this section, the different integration techniques that allow transferring information between programming tools and the interoperability middleware are analyzed. The integration techniques depend on the export/import format of the PLC programming tool, as well as on the API the tool offers. XML technologies can be very useful for implementing tool integration. Three tool categories can be distinguished:

a) Tools that import/export information in XML format, such as Multiprog™ from KW software (KW software, 2006) or Unity Pro™ from Schneider (Unity Pro TM, 2007).
b) Tools that import/export information in structured text format, e.g. CoDeSys™ from 3S (Smart Software Solutions) (CoDeSys, 2006) (Automation Alliance, 2006).
c) Tools that import/export information in any other format, e.g. ISaGRAF™ from ICS Triplex (ICS triplex, ISaGRAF, 2006) and Simatic Administrator™ from Siemens (Administrador Simatic, 2006).

The following sub-sections describe the selected integration technique for each type of PLC programming tool.

Tools with import/export option in XML format. If the tool provides an XML interface, or it allows exporting/importing projects to/from an XML file, the integration is practically direct. In this case, it is necessary to develop an XML stylesheet (W3C, 2006b). This XML technology can be used for processing an input XML file coming from the tool, filtering the information and transforming it to give as output the reusable code expressed in the format proposed in this work (tool2standard.xsl). XSLT technology offers two types of templates that can be used to define the processing of the input file. A match template contains the processing to be applied to a particular XML element. This processing can be organized by means of the so-called named templates (Tidwell, D., 2001). The same XML technology is also used for filtering and transforming the reusable code expressed in standard form to the format of the target PLC programming tool (standard2tool.xsl).

In the first case the XSL match and named templates are tool dependent. The match templates of the second case are known, as they correspond to the elements of the IEC 61131-3 standard. Nevertheless, the algorithms they contain are again tool dependent. In Table 2 the templates needed to transform the reusable code in standard format to the tool format are summarized.

Tools that import/export structured text. Although the number of tools that allow exporting/importing information in XML format is increasing, this is not yet common. This sub-section describes the integration techniques for those tools that allow exporting/importing code to/from

structured text format. In this case, it is necessary to know the file structure.

There are different technologies that allow transforming structured text into XML, such as the Chaperon project (Stephan M, 2000). The transformation can also be achieved by developing an application that takes the structured text file as input and uses DOM or SAX methods to generate an XML file. Finally, an XML stylesheet is needed to transform this XML file to the standard format (see Fig. 4).

Fig. 4. General scenario for importing/exporting reusable code in structured text format

If the tool needs to import source code, XML stylesheets can be used to transform the code expressed in the generic XML format into a text file that follows the structure that the tool expects. Besides the templates of Table 2, it is also necessary to indicate to the XSLT processor the extension of the resulting text file. Fig. 4 illustrates the general scenario for exporting and importing reusable code.

TABLE 2. LIST OF STANDARD2TOOL.XSL TEMPLATES
Templates | Characteristics
<xsl:template match="sw:sweng"> | Main template; it guides the transformation
<xsl:template match="sw:datatypes"> | Generates the derived data types
<xsl:template match="sw:pous"> | Organizes the POU structure
<xsl:template match="*" mode="pou"> | Selects the type of POU
<xsl:template match="sw:interface">, <xsl:template match="sw:variables">, <xsl:template match="sw:body">, <xsl:template match="sw:fbd"> | A set of templates for generating the POU interface and functionality expressed in any of the 5 languages of IEC 61131-3
<xsl:template match="sw:configuration">, <xsl:template match="sw:resource">, <xsl:template name="globalvars">, <xsl:template match="sw:task">, <xsl:template match="sw:proginst"> | A set of templates for generating the automation project itself

Tools that import/export in any other format. Currently, this is the most common case: tools neither have an XML interface nor export/import code to/from structured text. In this case the integration technique consists of developing an application that makes use of the functions provided by the tool API, as well as the methods and functions offered by SAX or DOM. Thus, the integration techniques for capturing information from the tool and expressing it in a generic format consist of an application that generates an initial XML file and an XML stylesheet for filtering and transforming the information to the standard format. The application programming language depends on the tool API and it also needs:

- The functions provided by the tool API for getting information from the automation project.
- The functions or methods provided by SAX or DOM, which are very useful for generating an initial XML file.

In the same way, the integration techniques for setting information coming from the generic XML file into the tool consist of an XML stylesheet that adds the tool particularities to the XML file. Furthermore, an application is needed that reads this XML file and sets all the information in the storage format of the target tool. The programming language of this application also depends on the tool API and it makes use of:

- SAX or DOM methods for reading and manipulating the input XML file.
- Tool API functions for setting information into the target tool storage format.

Fig. 5. General scenario for getting/setting information via tool API

Fig. 5 illustrates the general scenario for getting information from the tool via its API and transforming it to the generic format. In the same way, this figure also illustrates the mechanisms for setting the code expressed in the standard format (ReusableCode.xml) into the target tool storage format.

4. CASE STUDY

This section illustrates the proposed interoperability framework as applied to transferring one POU programmed in CoDeSys™ from 3S to Multiprog™ from KW software. The first tool allows exporting/importing information to/from structured text format.
The second follows the interface proposed by PLCopen TC6 XML. The code to be transferred

is a very simple example of a function (inrange) written in the Function Block Diagram (FBD) language. This POU checks whether the content of a variable lies between a minimum and a maximum. To do this, three standard functions have been used (LT, GT and AND). Fig. 6 illustrates this POU programmed in the CoDeSys™ PLC programming tool.

Fig. 6. POU example programmed in CoDeSys™

The first step towards interoperability is to export the code in text format. This definition contains the POU interface, formed by three input formal parameters, and the body, expressed by reserved commands that represent in text format the functionality originally written in FBD (note that this is a particularity of the CoDeSys™ tool).

In the second step (see Fig. 4) an application programmed in Java using DOM translates the textual file into an XML file. An XML stylesheet is applied to this file, obtaining ReusableCode.xml (illustrated in Fig. 7).

Fig. 7. Project expressed in generic format (ReusableCode.xml)

Fig. 8. Example in PLCopen TC6 XML format
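The second step above can be sketched as follows. This is a minimal illustration in Python using the standard xml.dom.minidom module rather than the Java application mentioned in the text. The text export shown here, the parameter names and the XML element names (pou, interface, inputVar) are all assumptions made for the example. The real CoDeSys™ export format is tool specific, and the tool-specific body commands are deliberately left out.

```python
import re
from xml.dom.minidom import Document

# Hypothetical text export of the POU interface. The real export format is
# tool specific; the body (reserved commands) is omitted on purpose.
EXPORT = """\
FUNCTION inrange : BOOL
VAR_INPUT
    value : INT;
    min : INT;
    max : INT;
END_VAR
END_FUNCTION
"""

def export_to_dom(text):
    """Build a DOM tree describing the POU interface found in the text export."""
    doc = Document()
    header = re.search(r"FUNCTION\s+(\w+)\s*:\s*(\w+)", text)
    pou = doc.createElement("pou")              # hypothetical element name
    pou.setAttribute("name", header.group(1))
    pou.setAttribute("returnType", header.group(2))
    doc.appendChild(pou)
    interface = doc.createElement("interface")
    pou.appendChild(interface)
    # Collect every "name : TYPE;" declaration inside the VAR_INPUT block.
    block = re.search(r"VAR_INPUT(.*?)END_VAR", text, re.S)
    for name, type_ in re.findall(r"(\w+)\s*:\s*(\w+)\s*;", block.group(1)):
        var = doc.createElement("inputVar")
        var.setAttribute("name", name)
        var.setAttribute("type", type_)
        interface.appendChild(var)
    return doc

dom = export_to_dom(EXPORT)
print(dom.documentElement.toprettyxml(indent="  "))
```

The resulting tree is only an initial XML file. In the scenario described in the paper, a stylesheet would then filter and transform it into the generic format.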

The third step consists of transforming the generic format XML file to the PLCopen TC6 XML grammar. This transformation is done by means of an XML stylesheet. The file resulting from the transformation is illustrated in Fig. 8. This file contains graphical information, sometimes as attributes of an XML element (e.g. a block has width and height attributes) and sometimes as a new element of the schema (e.g. a child element of a block holds its position on the screen). Finally, the Multiprog™ tool can import this file. The resulting project is illustrated in Fig. 9.

Fig. 9. InRange POU reused in Multiprog™ KW

Thus, by means of the proposed integration middleware, the inrange POU initially programmed in CoDeSys™ can be reused in the Multiprog™ PLC programming tool. It is important to remark that the transformations, which are the task of the middleware, are transparent to the user.

5. CONCLUSIONS

This paper has presented a formal approach that allows transferring source code among PLC programming tools from different vendors. True interoperability between tools is thus achieved by means of the proposed middleware.

The potential of XML technologies has been used for developing the interoperability middleware. In particular, XML schema jointly with schematron rules forms the core of the proposed middleware. They have been used to express the IEC 61131-3 software model in XML, taking into account the architectural style and the composition rules.

The middleware also offers mechanisms for tool integration making use of related XML technologies, such as XSLT, SAX and DOM. These techniques allow both transforming any tool model to a generic XML format and obtaining vendor-understandable information from this generic XML format.

7. ACKNOWLEDGEMENTS

This work was financed by the MCYT&FEDER under DPI and DIPE

REFERENCES

Administrador Simatic, available at:
Automation Alliance, 2006. Available at:
CoDeSys, CoDeSys of Smart Software Solutions, available at:
E. Estévez, M. Marcos and D. Orive (2007a). Automatic Generation of PLC Automation Projects from Component-Based Models. The International Journal of Advanced Manufacturing Technology. Springer London. [Online] Available at: 6/
E. Estévez, M. Marcos, D. Orive, E. Irisarri and F. Lopez (2007b). XML based Visualization of the IEC 61131-3 Graphical Languages. Proc. of the 5th International Conference on Industrial Informatics (INDIN 2007), pp: Vienna, Austria.
IEC (2003). International Electrotechnical Commission. IEC International Standard IEC 61131-3 Programmable Controllers. Part 3: Programming Languages.
ICS triplex, ISaGRAF, available at:
John, K.H and Tiegelkamp M. (2001). Programming Industrial Automation Systems. Springer.
KW software, available at:
Lewis, R.W. (1998). Programming Industrial Control Systems using IEC 1131-3. IEE Control Engineering Series.
Medvidovic, N. and Taylor, R.N. (1997). Exploiting architectural style to develop a family of applications. IEE Proc. Software Eng. 144 (5-6), pp:
PLCopen (1992). Web site:
Rick J. (2006). Resource Directory (RDDL) for Schematron 1.5. Web site:
SAX (2004). Simple API for XML (SAX). Web site:
Stephan M. (2000). Chaperon Project. Web site:
swengml (2007). IEC 61131-3 Markup Language. Available at:
Tidwell, D. (2001). XSLT. Ed. O'Reilly.
Unity Pro TM, 2007. Unity Pro™ from Schneider is available at:
Van der Vlist, E. (2002). XML Schema. Ed. O'Reilly.
W3C (2004). XML Schema Part 0: Primer (Second Edition), W3C REC-xmlschema. Available at:
W3C (2005). Document Object Model (DOM). Web site:
W3C (2006a). Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation. Available at:
W3C (2006b). Extensible Stylesheet Language (XSL) Version 1.1, W3C Proposed Recommendation PR-xsl

Real-Time Distribution Middleware from the Ada Perspective¹

Héctor Pérez, J. Javier Gutiérrez, Daniel Sangorrín, and Michael González Harbour
Computers and Real-Time Group, Universidad de Cantabria, Santander, SPAIN
{perezh, gutierjj, daniel.sangorrin, mgh}@unican.es

Abstract. Standards for distribution middleware sometimes impose restrictions and often allow the implementations to decide on aspects that are fundamental to the correct and efficient behaviour of the applications using them, especially when these applications have real-time requirements. This work presents a study of two standard approaches for distribution middleware and the problems associated with their implementations in real-time systems. Moreover, the paper considers the problem of integrating the distribution middleware with a new generation of scheduling mechanisms based on contracts. The standards considered are RT-CORBA and the Distributed Systems Annex (DSA) of the Ada programming language.

Key words: distribution middleware, real-time, real-time communications, RT-CORBA, Ada DSA, performance.

¹ This work has been submitted to the 13th International Conference on Reliable Software Technologies, Ada-Europe.
² This work has been funded in part by the Spanish Ministry of Science and Technology under grant number TIC C03-02 (THREAD), and by the IST Programme of the European Commission under project FP6/2005/IST/ (FRESCOR). This work reflects only the author's views; the EU is not liable for any use that may be made of the information contained herein.

1 Introduction²

The concept of a distributed application is not new; it has existed since two computers were first connected. However, the programming techniques for these systems have evolved greatly and they have become especially relevant in the last decade. Traditionally, message-passing mechanisms were used for communication among the parts of a distributed application, and this communication was coded explicitly by the programmer. Since then, new object distribution techniques have evolved, for instance using Remote Procedure Calls (RPCs), which allow operations to be used transparently regardless of whether the functionality is offered in the local processor or in a remote one.

The object distribution paradigm is probably the most relevant in current industrial applications, and an important example is the CORBA standard [12]. This standard also includes other, higher-level distribution techniques such as CCM (CORBA Component Model) or DDS (Data Distribution Service), but their degree of

acceptance in industry is still lower. CORBA provides an object distribution mechanism based on a language for the specification of interfaces (IDL, Interface Definition Language) that enables the use of different programming languages in the development of an application.

In addition to distribution standards, there are programming languages that allow the development of distributed applications. This is the case of Java (a de facto standard) with its specification for distributed systems, Java RMI (Java Remote Method Invocation) [17], based on the distribution of objects. Also, the Ada standard allows distribution through its DSA (Distributed Systems Annex, Annex E) [19], which supports both distribution of objects and RPCs.

This work focuses on the real-time perspectives of distribution with the CORBA and Ada standards. It does not consider Java RMI, as real-time aspects have not yet been fully addressed in commercial implementations. RT-CORBA [13] offers the CORBA specification for real-time systems and, although Ada's DSA is not specifically designed for real-time systems, several works demonstrate that it is possible to write real-time implementations within the standard [14][7][8].

The aims of this work are to make a comparative study of the functionality offered by these standards for implementing distributed real-time applications, to analyse some of their implementations from the viewpoint of the management of calls to remote resources, and to evaluate experimentally the response times that can be obtained in remote calls in order to get an idea of the overheads introduced. Sometimes, real-time distribution middleware is developed over operating systems and network protocols that are not real time. In this work some of the middleware implementations are evaluated over a real-time platform.
The evolving complexity of real-time systems has led to the need for more sophisticated scheduling techniques, capable of simultaneously satisfying multiple types of requirements, such as hard real-time guarantees and quality of service requirements, in the same system. To better handle the complexity of these systems, instead of asking the application to interact directly with the scheduling policies, scheduling services of a higher level of abstraction are being designed, usually based on the concept of resource reservations [2]. The FRESCOR European Union project [3], in which we are participating, is aimed at investigating these aspects by creating a contract-based scheduling framework. In this framework the application specifies its requirements for resource usage through service contracts that are negotiated with the system to obtain a resource reservation that optimizes the quality of service, while providing guarantees of a minimum level of resource usage. The FRESCOR framework is aimed at managing multiple resources such as processors, communication networks, memory, disk bandwidth, and energy.

In such an environment it is necessary to study the relationship between the distribution middleware and the scheduling framework. In [8], some initial ideas were given about the integration of middleware and advanced scheduling services, and in this paper we extend those ideas to address the problem of handling distributed transactions.

Our research group has been working for several years with real-time distributed systems programmed in Ada. We believe that although the DSA is not very popular, it

enables straightforward programming of distributed systems, and it makes it possible to incorporate into the real-time middleware concepts such as distributed transactions [8] and flexible scheduling [2][3]. Thus, another objective of this work is to establish the basis for incorporating the experience acquired in systems programmed in Ada into the world of RT-CORBA.

The document is organized as follows. Section 2 presents the basic characteristics of the distribution middleware based on RT-CORBA and Ada's DSA, and their implementations. Section 3 analyses in detail the aspects of scheduling, distribution mechanisms, and management of the remote calls proposed in the two standards and their implementations. The evaluation and comparison of the response times of calls to remote operations for these implementations is dealt with in Section 4. Section 5 proposes the integration of the distribution middleware with the scheduling framework for flexible scheduling. Finally, Section 6 draws the conclusions and considers future work.

2 Real-Time Distribution Middleware

This section describes the scheduling models of RT-CORBA and of the DSA for the execution of remote calls and discusses how the distributed transaction model can be supported. Furthermore, the different implementations to be analysed, all of which are open source, are briefly introduced.

A distributed transaction is defined as a part of an application consisting of multiple threads executing code in multiple processing nodes, and exchanging messages with information and events through one or more communication networks. In a transaction, events arriving at the system trigger the execution of activities, which can be either task jobs in the processors or messages in the networks. These activities, in turn, may generate additional events that trigger other activities, and this gives rise to a chain of events and activities, possibly with end-to-end timing requirements [8]. This model is traditionally used for analysing the response time in real-time distributed applications.

The main characteristics of the architecture proposed by RT-CORBA in its specification [12] with respect to scheduling are the following:

- Use of threads as scheduling entities, to which an RT-CORBA priority can be applied and for which there are functions for conversion to the native priorities of the system on which they run.
- Use of two models for the specification of the priority of remote calls (following the Client-Server model): Client_Propagated (the invocation is executed in the remote node at the priority of the client, which is transmitted with the request message), and Server_Declared (all the requests to a particular object are executed at a priority preset in the server). In addition, the user can define priority transformations that modify the priority associated with the server. This is done with two functions called inbound (which transforms the priority before running the server's code) and outbound (which transforms the priority with which the server makes calls to other remote services).

- Definition of Threadpools as mechanisms for managing remote requests. The threads in the pool may be preallocated, or can be created dynamically. There may be several groups of threadpools, each group using a specific priority band.
- Definition of Priority-Banded Connections. This mechanism is proposed for reducing priority inversions when a transport protocol without priorities is used.

Ada's DSA does not have any mechanism for transmission of priorities, so the way priorities are handled is left to each implementation. What is specified is that it must provide mechanisms for executing concurrent remote calls, as well as mechanisms for waiting until the return of the remote call. The communication among active partitions is carried out in a standard way using the Partition Communication Subsystem (PCS). The concurrency and the real-time mechanisms are supported by the language itself with tasks, protected types and the services specified in Annex D. In [4], a mechanism for handling the transmission of priorities in the DSA is proposed. This mechanism is in principle more powerful than that of RT-CORBA, as it allows total freedom in the assignment of priorities both in the processors and in the communication networks used.

The specification of RT-CORBA incorporates a chapter dedicated to dynamic scheduling, which basically introduces two concepts:
- The possibility of introducing other scheduling policies in addition to the fixed priority policy, such as EDF (Earliest Deadline First), LLF (Least Laxity First), and MAU (Maximize Accrued Utility). The scheduling parameters are defined as a container that can contain more than one simple value, and can be changed by the application dynamically.
- The Distributable Thread, which allows end-to-end scheduling and the identification of Scheduling Segments, each of which can be run on a processor. This concept is similar to the distributed transaction presented in [8].
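The two RT-CORBA priority models described in this section can be illustrated with a minimal Python sketch. The function name, its parameters and the point at which the inbound transformation is applied are assumptions made only for illustration; they are not taken from the RT-CORBA API.

```python
def execution_priority(model, client_priority, declared_priority=None, inbound=None):
    """Return the priority at which a remote invocation would run on the server.

    Hypothetical helper: `model` is "CLIENT_PROPAGATED" or "SERVER_DECLARED",
    and `inbound` is an optional user-defined priority transformation.
    """
    if model == "CLIENT_PROPAGATED":
        # The client priority travels with the request message.
        base = client_priority
    elif model == "SERVER_DECLARED":
        # All requests to the object run at a priority preset in the server.
        if declared_priority is None:
            raise ValueError("SERVER_DECLARED needs a declared priority")
        base = declared_priority
    else:
        raise ValueError(f"unknown priority model: {model}")
    # Assumption: the inbound transformation is applied just before the
    # server code runs; with no transformation the priority is unchanged.
    return inbound(base) if inbound is not None else base
```

For example, `execution_priority("SERVER_DECLARED", 12, declared_priority=20)` ignores the client priority and yields 20, whereas the Client_Propagated model would yield 12.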
Ada included in its latest revision the scheduling policies EDF and Round Robin as part of its Real-Time Systems Annex (Annex D). Nevertheless, it does not contemplate the existence of distributed transactions. Neither RT-CORBA nor Ada's DSA considers the possibility of passing scheduling parameters to the communication networks.

This work analyses and assesses the following implementations of RT-CORBA and the DSA:
- TAO [18] is an open source implementation of RT-CORBA that has been evolving for several years. The applications are programmed in C++ and the version we have used (1.5) runs on Linux and TCP/IP. It is offered as an implementation of the complete specification.
- PolyORB [15][20] is presented as a schizophrenic middleware that can support distribution with different personalities such as CORBA, RT-CORBA, or DSA. It is distributed with the GNAT compiler [1] and in principle it is envisaged for applications programmed in Ada. The version used (2007) supports CORBA and some basic notions of RT-CORBA (priorities and their propagation), and allows

distribution through the DSA although it does not incorporate the scheduling mechanisms. The execution platform is Linux and TCP/IP.
- GLADE [14] is the original implementation of the DSA offered by GNAT [1] to support the development of distributed applications with real-time requirements. The scheduling is done through fixed priorities and implements two policies for distribution of priorities in the style of RT-CORBA (Client Propagated and Server Declared). The 2007 version is used, and once again the execution platform is Linux and TCP/IP.
- RT-GLADE is a modification of GLADE that optimizes its real-time behaviour. There are two versions: in the first one [7] free assignment of priorities in remote calls is permitted (both in the processors and in the communication networks). The second version [8] proposes a way of incorporating distributed transactions into the DSA and giving support to different scheduling policies in a distributed system. The execution platform is MaRTE OS [9] and the network protocol is RT-EP [10]. This communication protocol is based on token passing in a logical ring over standard Ethernet, and it supports three different scheduling policies: fixed priorities, sporadic servers, and resource reservations through contracts [2][3].

3 Analysis of Distribution Middleware Implementations

The objective of this section is to analyse the mechanisms for management of remote calls used by the implementations of RT-CORBA or DSA to support their respective specifications. It also discusses the properties of the solutions adopted and whether they can be improved, both in the standards and in their implementations.

3.1 Implementations of RT-CORBA and DSA

From the viewpoint of management of remote calls, TAO defines several elements that can be configured [16]:
- Number of ORBs. The ORB is the management unit of the calls to a service.
There may be several or only one, given that each ORB can accept requests from different parts of the application.
- The strategy of the concurrency server. Two models are defined: the reactive one, in which a single thread provides service to multiple connections; and thread-per-connection, in which the ORB creates a thread to serve each new connection.
- The threadpools. Two types of thread groups are defined with two different behaviours. In the ORB-per-Thread model each thread has an associated ORB that accepts and processes the services requested. In the Leader/Followers model the user can create several threads and each ORB will select them in turns so they await and attend new requests arriving from the network.

For the management of remote calls, PolyORB defines the following configurable elements [15]:

- ORB tasking policies. Four policies are defined:
  - No_Tasking: the ORB does not create threads and uses the environment task to process the jobs
  - Thread_Pool: a set of threads is created at start-up time; this set can grow up to an absolute maximum, and unused threads are removed from it if its size exceeds a configurable intermediate value
  - Thread_per_Session: a thread is created for each session that is opened
  - Thread_per_Request: a thread is created for each request that arrives and is destroyed when the job is done
- Configuration of the tasking runtimes. It is possible to choose among a Ravenscar-compliant, no tasking, or full tasking runtime system.
- ORB control policies. Four policies are defined that affect the internal behaviour of the middleware:
  - No Tasking: a loop monitors I/O operations and processes the jobs
  - Workers: all the threads are equal and they monitor the I/O operations alternately
  - Half Sync/Half Async: one thread monitors the I/O operations and adds the requests to a queue, and the other threads process them
  - Leader/Followers: similar to TAO, the threads take turns in awaiting and processing requests

The implementation of the DSA carried out in GLADE defines a group of threads to process the requests with similar parameters to those of PolyORB in terms of the number of threads (minimum number of threads created at start-up time, stable value and absolute maximum), and uses another two intermediate threads for the requests; one awaits the arrival of requests from the network, and the other one processes these requests and selects one of the threads of the group to finally process the job.
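As an illustration of the Half Sync/Half Async control policy listed above, the following Python sketch uses one thread that stands in for the I/O monitor and a small group of worker threads that take requests from a queue. It is only a sketch of the pattern, not PolyORB code; the function name, the request format and the number of workers are assumptions.

```python
import queue
import threading

def run_half_sync_half_async(requests, handler, n_workers=3):
    """Process (request_id, payload) pairs with one I/O thread and a worker group."""
    pending = queue.Queue()      # queue between the I/O thread and the workers
    results = {}
    results_lock = threading.Lock()
    stop = object()              # sentinel telling a worker to finish

    def io_thread():
        # Stands in for the thread that monitors the network and enqueues requests.
        for request in requests:
            pending.put(request)
        for _ in range(n_workers):
            pending.put(stop)

    def worker():
        while True:
            item = pending.get()
            if item is stop:
                break
            request_id, payload = item
            value = handler(payload)
            with results_lock:
                results[request_id] = value

    threads = [threading.Thread(target=io_thread)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The queue decouples reception from execution, which is the step that the first version of RT-GLADE is described as removing.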
The modifications made to GLADE to obtain the first version of RT-GLADE eliminated one of the intermediate threads, so that there was a thread waiting for requests arriving from the net, which in turn activated one of the threads of the group to carry out the job (similar to the Half Sync/Half Async of PolyORB but without the intermediate queue). In the second version of RT-GLADE, an API was provided to allow an explicit configuration of the threads that execute the jobs, and they are designed to wait directly on the net. This is done through the definition of communication endpoints which handle the association with the remote thread and support the scheduling parameters for the network. These parameters, which can be complex, are associated with the appropriate entity when a distributed transaction is installed and do not need to be transmitted each time the remote service is called.

TAO, PolyORB, and GLADE all use the priority assignment policies defined in RT-CORBA. In contrast, in the first version of RT-GLADE [7] free assignment of priorities is allowed for the remote services and for the request and reply messages. This approach enables the use of optimization techniques in the assignment of priorities in distributed systems.

On the other hand, in the second version of RT-GLADE [8], the definition of the connection endpoints allows the programming of distributed transactions, which are identified just by specifying a small number at the beginning of the transaction. Moreover, the transaction is executed with the scheduling parameters associated to its threads and messages. This concept is similar to the distributable thread of RT-CORBA, except that this specification never takes the network scheduling into account. TAO implements this part of the dynamic scheduling of RT-CORBA, in which dynamic changing of the scheduling parameters of a scheduling segment is permitted [5].

In this work, we have made a prototype porting of PolyORB to the MaRTE OS [9] real-time operating system and we have adapted it to the RT-EP real-time network protocol [10]. The personality of CORBA (PolyORB-CORBA) allows the use of the control policies of the ORB defined in PolyORB. The DSA personality of PolyORB does not currently allow the definition of any particular control policy. For this personality (PolyORB-DSA), a basic version of the scheduling defined in [8] has been implemented over our real-time platform.

3.2 Discussion

This section discusses the analogies and differences found both in the implementations and in the standards themselves, and their suitability for real time, through the objectives that the real-time distribution middleware must pursue. Among these objectives are the following:
- Allow a schedulability analysis of the complete application. Although the middleware is executed in the processor and it is no more than a user of the networks through clearly separate interfaces, in many cases the timing behaviour of the networks has a strong influence on the overall response times, and therefore the networks should be scheduled with appropriate techniques [6].
The middleware should incorporate the ability to specify the scheduling parameters of the networks through suitable models. RT-GLADE could be used as a reference [8].
- Transactions or distributable threads. In agreement with the previous point, the transactions or distributable threads should incorporate all the information about scheduling in the processors and networks, either in the model proposed by RT-CORBA or in the one proposed in RT-GLADE [8].
- Control of remote calls. The task models implemented in TAO and PolyORB can be used as a reference, adding an extra case in which there is one thread per kind of request, directly waiting on the net (as in the second version of RT-GLADE). The latter case can be useful in flexible scheduling environments when threads execute under contracts and the cost of negotiating or changing contracts is very high. In the case when there are intermediate threads for managing remote calls (GLADE, RT-GLADE or PolyORB) it is important to control their scheduling parameters. This is also the case of groups of threads in which threads can execute with different parameters each time.

- Allow the free assignment of scheduling parameters. This is the approach used in RT-GLADE. In RT-CORBA there is a specification for static real-time systems, and an extension for dynamic real-time systems (see Section 3 in [13]). The specification for static systems imposes restrictions on the assignment of priorities, but these restrictions are removed in the specification for dynamic systems, in which it is possible for the implementation to define scheduling policies.
- Integration of the distribution middleware itself with more complex scheduling frameworks that allow building flexible real-time systems.

Finally, and although this could be considered a subjective criterion, the distribution middleware should pursue the aim of programming simplicity. CORBA is far from fulfilling this aim.

4 Evaluating Distribution Middleware Implementations

The objective of this section is to give an idea of the overhead introduced by the analysed implementations in a distributed application. A hardware platform consisting of two 800-MHz AMD Duron processors and a 100-Mbps Ethernet, and the following two software platforms have been used:
- Linux kernel with TCP/IP to evaluate the implementations of TAO, PolyORB with CORBA personality and GLADE.
- MaRTE OS 1.6.9a with RT-EP to evaluate PolyORB-CORBA, PolyORB-DSA and MaRTE OS 1.4 for RT-GLADE. The first version of RT-GLADE is used, which is operative over the same platform to which PolyORB was ported.

The tests will measure the execution time of a remote operation that adds two integers and returns the result. The measurement is carried out from the time when the call is made until the response is returned. This operation will be carried out in two modes: alone and with four other clients carrying out the same operation, but with a lower priority.
The objective is not to obtain exhaustive measurements of the platform, but an idea of the performance that can be achieved with the middleware. In all the tests the operation to be evaluated is executed 10,000 times, and the average, maximum, and minimum times are evaluated. Table 1 shows the results of the measurements taken with the Linux platforms, using the middleware configurations that introduce the least overhead. For the case of a single client in TAO the reactive concurrency model with a single thread in the group has been used. In PolyORB the model with full tasking without internal threads has been used for the experiment with one client. For the five-client case both in TAO and in PolyORB a configuration of a group of 5 threads with a Leader/Followers model has been used. In GLADE, a static group of threads equal to the number of clients is defined. The priority specification model for TAO, PolyORB and GLADE was the client propagated one. In order to make the middleware overhead measurements comparable, the temporal cost of using the net is also evaluated. Thus, Table 1 includes the average, maximum and minimum times for the case when a message is sent and a

response is received; the program on the server side answers immediately upon reception.

Table 1. Response times in Linux for one and five clients
Columns: times for one client (μs): Avg., Max., Min.; times for the highest priority client with five clients (μs): Avg., Max., Min.
Rows: TAO, PolyORB-CORBA, GLADE, Stand-alone network

Table 2. Response times in MaRTE OS for one and five clients
Columns: times for one client (μs): Avg., Max., Min.; times for the highest priority client with five clients (μs): Avg., Max., Min.
Rows: PolyORB-CORBA, PolyORB-DSA, RT-GLADE, Stand-alone network

In the results obtained for one client in Linux, it can be observed that GLADE achieves better times than TAO and PolyORB, which indicates that its code is lighter. The maximum times obtained for TAO should be noted, because they are remarkably higher than the average times both for one and for five clients. The explanation may be that Linux is not designed for hard real-time systems and introduces jitter. The average values for one and five clients show large differences in PolyORB and GLADE, while in TAO they are relatively similar. We can conclude that TAO manages the priorities and the queues better on this platform.

Table 2 shows the results of the measurements carried out over the three implementations on the MaRTE OS/RT-EP platform. The configuration of PolyORB-CORBA is the same as for Linux. The PolyORB-DSA configuration creates a task explicitly to attend the remote requests. The number of threads in the group for RT-GLADE is made equal to the number of clients, that is, five. As for the RT-EP network, the parameter corresponding to the delay between arbitration tokens is set to a value of 150 μs. This value limits the overhead in the processor due to the network. A simple transmission in the network is also evaluated for the same reason as in the case of Linux.
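The measurement procedure used in this section (repeating the operation many times and reporting the average, maximum and minimum times in microseconds) can be sketched in Python as follows. This is not the test harness used for the tables; the function name is hypothetical and `operation` stands for any callable that performs one complete remote call and waits for its reply.

```python
import time

def measure_response_times(operation, repetitions=10_000):
    """Run `operation` repeatedly and summarize its response times in microseconds."""
    samples_ns = []
    for _ in range(repetitions):
        start = time.perf_counter_ns()
        operation()                       # call and wait for the response
        samples_ns.append(time.perf_counter_ns() - start)
    return {
        "avg_us": sum(samples_ns) / len(samples_ns) / 1000,
        "max_us": max(samples_ns) / 1000,
        "min_us": min(samples_ns) / 1000,
    }
```

Measuring a bare network round trip with the same function gives the stand-alone network figure against which the middleware overhead can be compared.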
From the results obtained in the evaluation on the real-time platform, it can be observed that, firstly, the network protocol has a greater latency and it makes the times of a simple round-trip transmission higher than in Linux; the trade-off is that this is a predictable network with less dispersion among the values of the measurements. Furthermore, the minimum and average times of RT-GLADE for one client are also

greater than those of GLADE over Linux, although the maximum time remains within a bound indicating a much lower dispersion. An important part of the response times obtained for RT-GLADE is due to the network, but also to the operating system and the dynamic memory manager used [11] (to make the timing predictable). If we observe the times of RT-GLADE for five clients, we can see that only the minimum time is worse than in GLADE, although with less difference; in contrast the average time and above all the maximum are now clearly better. The increase in all the times of RT-GLADE with respect to the case of one client is reasonable and can be justified by the blocking times that can be suffered both in the processor and in the network.

In the measurement of the times of PolyORB-CORBA over MaRTE OS we have found a great disparity in the measurements for five clients depending on the priorities used in them. After analysing the PolyORB code, we found that its implementation of the Leader/Followers model does not really fit this model. Instead of having a single thread awaiting the remote request to later execute it and send the response, there is still an intermediate thread that separates the reception of messages from the network from the remote execution. Thus, PolyORB implements two groups of threads: one for RT-CORBA (it is necessary to create it explicitly to support the model of priorities), and the other which corresponds to the concurrency support of CORBA. The threads that serve the clients' requests are taken from the RT-CORBA group, but the intermediate threads are taken from the other group and they are executed at the intermediate priority of the system, which can introduce large priority inversions depending on the priorities of the servers. This is a part which must be improved to guarantee smaller bounds on the worst-case response times.
In any case, the measurements reflected in Table 2 for PolyORB-CORBA with five clients have been obtained in a best-case scenario in which the low-priority clients are not preempted by any of the threads in the thread pool. Furthermore, in PolyORB-DSA we have substituted the scheduling with a very simple version implementing an experimental prototype of the model defined in [8]. The response times obtained are worse than those of RT-GLADE, but better than those of PolyORB-CORBA. Therefore, with respect to the times of PolyORB over MaRTE OS, it is again shown, by comparing the results with those of RT-GLADE, that the pure implementation of the DSA can be much lighter than that of RT-CORBA. Comparing the tests of PolyORB-CORBA for one and for five clients it can be seen that there is an important difference between the minimum and maximum times for five clients, which is due to the priority inversion introduced by the intermediate tasks.

5 Integration of the Distribution Middleware with a Contract-Based Scheduling Framework

The FRESCOR (Framework for Real-time Embedded Systems based on COntRacts) EU project [3] has the objective of providing engineers with a scheduling framework that represents a high-level abstraction that lets them concentrate on the specification of the application requirements, while the system transparently uses advanced real-time scheduling techniques to meet those requirements. In order to keep the framework independent of specific scheduling schemes, FRESCOR introduces an interface between the applications and the scheduler, called the service contract. Application requirements related to a given resource are mapped to a contract, which can be verified at design time by providing off-line guarantees, or can be negotiated at runtime, when it may or may not be admitted. As a result of the negotiation a virtual resource is created, representing a certain resource reservation. The resources managed by the framework are the processors, networks, memory, shared resources, disk bandwidth, and energy; additional resources could be added in the future.

Careful use of virtual resources allows different parts of the system (whether they are processes, applications, components, or schedulers) to use budgeting schemes. Not only can virtual resources be used to help enforce temporal independence, but a process can interact with a virtual resource to query its resource usage and hence support the kinds of algorithms where execution paths depend on the available resources.

When distribution middleware is implemented on operating systems and network protocols with priority-based scheduling, it is easy to transmit the priority at which a remote service must be executed inside the messages sent through the network. However, this solution does not work if more complex scheduling policies, such as the FRESCOR framework, are used. Sending the contract parameters of the RPC handler and the reply message through the network is inefficient because these parameters are large in size. Dynamically changing the scheduling parameters of the RPC handler is also inefficient because dynamically changing a contract requires an expensive renegotiation process.
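The idea of negotiating a contract at runtime, with a virtual resource created only if the contract is admitted, can be sketched in Python as below. The admission test used here (total requested budget over period must not exceed the capacity of the resource) is an assumption chosen for simplicity; the text does not state which test FRESCOR applies, and the class and method names are hypothetical.

```python
from fractions import Fraction

class ProcessorContracts:
    """Toy contract manager for one processor (illustrative assumption only)."""

    def __init__(self, capacity=Fraction(1)):
        self.capacity = capacity
        self.virtual_resources = {}   # virtual resource id -> reserved fraction
        self._next_id = 1

    def negotiate(self, budget, period):
        """Return a virtual resource id if the contract is admitted, else None."""
        demand = Fraction(budget, period)
        reserved = sum(self.virtual_resources.values(), Fraction(0))
        if reserved + demand > self.capacity:
            return None               # contract not admitted
        vres_id = self._next_id
        self._next_id += 1
        self.virtual_resources[vres_id] = demand
        return vres_id
```

Because negotiation inspects every existing reservation, repeating it on each remote call would be costly, which is consistent with the renegotiation cost mentioned above.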
The solution proposed in [8] consisted in explicitly creating the network and processor schedulable entities required to establish the communication and execute the remote calls. The contracts of these entities are negotiated and created before they are used. They are then referenced with a short identifier that can be easily encoded in the messages transmitted. For identifying these schedulable entities the transactional model is used and the identifier, called an Event_Id, represents the event that triggers the activity executed by the schedulable entity. In the current FRESCOR framework, support for the transactional model is being built. A tool called the Distributed Transaction Manager (DTM) is a distributed application responsible for the negotiation of transactions in the local and remote processing nodes in a FRESCOR system that implements the contract-scheduling framework. Managing distributed transactions cannot be done on an individual processing node because it requires dynamic knowledge of the contracts negotiated in the other nodes, leading to a distributed consensus problem. The objective of the Distributed Transaction Manager is to allow the remote management of contracts in distributed systems, including capabilities for remote negotiation and renegotiation, and management of the coherence of the results of these negotiation processes. In this way, FRESCOR provides support for distributed global activities or transactions consisting of multiple actions executed in processing nodes and synchronized through messages sent across communication networks.
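A minimal Python sketch of this idea follows: the scheduling parameters are installed once on the receiving node, and each message carries only a short identifier. The two-byte header, the dictionary of installed entities and the field names are assumptions for illustration, not the encoding used in [8].

```python
import struct

# Assumed wire format: a 2-byte unsigned Event_Id in network byte order,
# followed by the marshalled arguments of the call.
HEADER = struct.Struct("!H")

def encode_request(event_id, payload):
    """Prefix the payload with the short identifier of the triggering event."""
    return HEADER.pack(event_id) + payload

def decode_request(message, installed):
    """Look up the pre-negotiated entity for the Event_Id found in the message."""
    (event_id,) = HEADER.unpack_from(message)
    return installed[event_id], message[HEADER.size:]

# Hypothetical table filled in when the transaction is installed.
installed_entities = {7: {"budget_us": 500, "period_us": 10_000}}
message = encode_request(7, struct.pack("!hh", 2, 3))
```

With this layout the request adds two bytes per message regardless of how large the contract parameters are.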

The implementation of the DTM contains an agent in every node, which listens to messages either from the local node or from remote nodes, performs the requested actions, and sends back the replies. In every node there is also a DTM data structure with the information used by the corresponding agent. Part of this information is shared with the DTM services invoked locally from the application threads. This architecture could benefit from the presence of a distribution middleware, by making the agents offer operations that could be invoked remotely, thus simplifying the current need for a special communications protocol between the agents.

The current version of the transaction manager limits its capabilities just to the management of remote contracts. In the future, the DTM should also provide full support for the transactional model, integrated with the distribution middleware. For this purpose the following services would need to be added to it:
- Specification of the full transaction with identification of its activities, remote services and events, and contracts for the different resources (processors and networks).
- Automatic deployment of the transaction in the middleware. This would require:
  - choosing unused Event_Ids for the transaction events
  - choosing unused ports in the involved nodes, for the communications
  - creating send endpoints for the client-side of the communications, using the desired contracts and networks
  - creating receive endpoints for the reception of the reply in the client-side of the communications, using the desired networks, ports, and event ids
  - creating the necessary RPC handlers with their corresponding contracts
  - creating the receive endpoints of the server-side of the communications using the desired contracts and networks
  - creating the send endpoints of the server-side of the communication using the desired contracts and networks
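The deployment steps listed above could be planned along the following lines. This Python sketch only computes a deployment plan for a single remote call; the function names, the starting values for identifiers and ports, and the layout of the plan are all assumptions, since the text describes these services as future work.

```python
def take_unused(used, start):
    """Return the smallest value >= start that is not in `used`, and reserve it."""
    candidate = start
    while candidate in used:
        candidate += 1
    used.add(candidate)
    return candidate

def plan_rpc_deployment(client, server, used_event_ids, used_ports, contracts):
    """Plan the endpoints and handler needed for one RPC of a transaction."""
    request_event = take_unused(used_event_ids, 1)
    reply_event = take_unused(used_event_ids, 1)
    server_port = take_unused(used_ports[server], 1024)   # 1024 is arbitrary
    client_port = take_unused(used_ports[client], 1024)
    return {
        "request_event": request_event,
        "reply_event": reply_event,
        "client_port": client_port,
        "server_port": server_port,
        "steps": [
            ("client send endpoint", client, contracts["request_msg"], server_port),
            ("client receive endpoint", client, client_port, reply_event),
            ("RPC handler", server, contracts["handler"], request_event),
            ("server receive endpoint", server, server_port, request_event),
            ("server send endpoint", server, contracts["reply_msg"], client_port),
        ],
    }
```

Keeping the sets of used identifiers and ports per node in one place is what would let a manager such as the DTM avoid collisions between transactions.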
All this deployment would be done by the DTM from the information of the transaction, which could be written using a suitable deployment and configuration language. After this initialization, the transaction would start executing, its RPCs would be invoked and the middleware would automatically direct them through the appropriate endpoints and RPC handlers almost transparently. We would just specify the appropriate event ids. With the described approach we would achieve a complete integration of the distribution middleware and the transactional model in a system managed through a resource reservation scheduler. 6 Conclusions and Future Work The work presented here reports an analysis and evaluation of some implementations of distribution middleware from the viewpoint of their suitability for the implementation of real-time systems. Specifically, the following aspects have been

highlighted: the way remote calls are managed, the mechanisms for establishing the scheduling parameters, and the importance of giving support to the transactions or distributable threads.

The time measurements have been carried out over Linux as the native operating system of the middleware analysed, and over a real-time platform based on the MaRTE operating system and the RT-EP real-time network protocol, to which PolyORB has been ported in this work. In the measurements obtained, it can be observed that the implementations of Ada's DSA are lighter than the implementations of RT-CORBA. This demonstrates that Ada could be a good option for programming distributed systems, and that it could find its niche in medium-sized embedded distributed real-time systems. The measurements on the real-time platform also show that the predictability has a cost in terms of overhead in the network and in memory management.

Furthermore, new mechanisms for contract-based resource management in a distributed real-time system have been identified, and the necessity to integrate the distribution middleware with them has been described, together with some ideas on future work needed to support this integration.

Our work will continue with experimentation on the PolyORB real-time platform that we already have, given our experience in Ada and in GLADE. The objective will be to progress with the improvement of the real-time aspects of this platform both for the DSA and for RT-CORBA, and to integrate the distributed transaction model along with their managers and the new contract-based scheduling mechanisms for processors and networks using the ideas described in this paper.

References

1. Ada-Core Technologies, The GNAT Pro Company,
2. M. Aldea, G. Bernat, I. Broster, A. Burns, R. Dobrin, J.M. Drake, G. Fohler, P. Gai, M. González Harbour, G. Guidi, J.J. Gutiérrez, T. Lennvall, G. Lipari, J.M. Martínez, J.L. Medina, J.C. Palencia, and M. Trimarchi. FSF: A Real-Time Scheduling Architecture Framework. Proc. of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2006, San Jose (CA, USA),
3. FRESCOR project web page:
4. J.J. Gutiérrez, and M. González Harbour. Prioritizing Remote Procedure Calls in Ada Distributed Systems. Proc. of the 9th International Real-Time Ada Workshop, ACM Ada Letters, XIX, 2, pp , June
5. Y. Krishnamurthy, I. Pyarali, C. Gill, L. Mgeta, Y. Zhang, S. Torri, and D.C. Schmidt. The Design and Implementation of Real-Time CORBA 2.0: Dynamic Scheduling in TAO. Proc. of the 10th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'04), Toronto (Canada), May
6. J. Liu. Real-Time Systems. Prentice Hall,
7. J. López Campos, J.J. Gutiérrez, and M. González Harbour. The Chance for Ada to Support Distribution and Real Time in Embedded Systems. Proc. of the International Conference on Reliable Software Technologies, Palma de Mallorca, Spain, in LNCS, Vol. 3063, Springer, June
8. J. López Campos, J.J. Gutiérrez, and M. González Harbour. Interchangeable Scheduling Policies in Real-Time Middleware for Distribution. Proc. of the 11th International Conference on Reliable Software Technologies, Porto (Portugal), in LNCS, Vol. 4006, Springer, June
9. MaRTE OS web page,
10. J.M. Martínez, and M. González Harbour. RT-EP: A Fixed-Priority Real Time Communication Protocol over Standard Ethernet. Proc. of the 10th International Conference on Reliable Software Technologies, York (UK), in LNCS, Vol. 3555, Springer, June
11. M. Masmano, I. Ripoll, A. Crespo, and J. Real. TLSF: A New Dynamic Memory Allocator for Real-Time Systems. Proc. of the 16th Euromicro Conference on Real-Time Systems, Catania (Italy), June
12. Object Management Group. CORBA Core Specification. OMG Document, v3.0 formal/ , July
13. Object Management Group. Realtime CORBA Specification. OMG Document, v1.2 formal/ , January
14. L. Pautet, and S. Tardieu. GLADE: a Framework for Building Large Object-Oriented Real-Time Distributed Systems. Proc. of the 3rd IEEE Intl. Symposium on Object-Oriented Real-Time Distributed Computing, (ISORC'00), Newport Beach, USA, March
15. PolyORB web page,
16. I. Pyarali, M. Spivak, D.C. Schmidt, and R. Cytron. Optimizing Thread-Pool Strategies for Real-Time CORBA. Proc. of the ACM SIGPLAN Workshop on Optimization of Middleware and Distributed Systems (OM 2001), Snowbird, Utah, June
17. Sun Developer Network,
18. TAO web page,
19. S. Tucker Taft, Robert A. Duff, Randall L. Brukardt, Erhard Ploedereder, and Pascal Leroy (Eds.). Ada 2005 Reference Manual. Language and Standard Libraries. International Standard ISO/IEC 8652:1995(E) with Technical Corrigendum 1 and Amendment 1. LNCS 4348, Springer,
20. T. Vergnaud, J. Hugues, L. Pautet, and F. Kordon. PolyORB: a Schizophrenic Middleware to Build Versatile Reliable Distributed Applications. Proc. of the 9th International Conference on Reliable Software Technologies, Palma de Mallorca (Spain), in LNCS, Vol. 3063, June 2004.


4. Distributed Systems


Integration of RT-CORBA in Eyebot Robots

Manuel Díaz, Daniel Garrido, Luis Llopis, Raúl Luque
{mdr, dgarrido,
Depart. Lenguajes y Ciencias de la Computación, Universidad de Málaga, Grupo de Ingeniería del Software (GISUM)

Abstract

The development of embedded systems can benefit from the paradigms, methodologies and techniques proposed by Software Engineering. In particular, using CORBA could considerably simplify communications by abstracting the communications platform. However, the constraints of this kind of system make it difficult to use CORBA implementations, which were designed for systems with greater processing or memory capacity. This work presents the integration of ROFES, an implementation of RT-CORBA and minimumCORBA, into a type of robot called Eyebot, which incorporates a wide range of elements such as motor, wheels, radio, camera, etc. An application developed with ROFES that allows the robot to be controlled remotely is also presented.

1. Introduction

The progressive spread of distributed real-time embedded systems into every area of daily life has long been a fact. Nevertheless, as is also well known, developing these systems is not an easy task, and it can benefit from applying the techniques, models, paradigms, etc. provided by Software Engineering. In this regard, the OMG (Object Management Group) defines both the RT-CORBA specification [OMGRT02] [OMGRT05] and minimumCORBA [OMG98] for the use of CORBA in this kind of system. The benefits that CORBA can bring are broad, ranging from a high-level, object-based communication paradigm to independence from the communications platform or operating system for the developer.
However, few implementations allow RT-CORBA or minimumCORBA to be used in embedded systems with tight memory and processing constraints. This paper presents the migration of ROFES [ROFES], an RT-CORBA implementation, to Eyebot robots [EYEBOT]. These robots offer many possibilities for developing distributed embedded applications, but constraints like those just mentioned make it difficult to use CORBA on them [GORAPPA].

The paper is structured as follows. Section 2 gives a brief introduction to ROFES, RT-CORBA and minimumCORBA. Section 3 presents the main characteristics of the Eyebot robot, covering both hardware and software. Section 4 details the process of migrating ROFES to the Eyebot robots. Section 5 presents an application, built with ROFES, for operating the robot. Finally, some conclusions are presented.

2. RT-CORBA and minimumCORBA

The Real-Time CORBA for Embedded Systems (ROFES) project implements a version of CORBA [OMG02][VINOS][SCHMIDT] for real-time embedded systems, following the OMG standards Real-Time CORBA Specification version 1.1 of August 2002 and minimumCORBA of 17 August [...]. The project is being developed by the Chair of Operating System group at RWTH Aachen University, Germany.

The RT-CORBA specification defines an optional set of extensions and a set of scheduling services that allow CORBA to be used as one more component of a real-time system. The aim of these extensions is to support resource management so that the activities of applications with real-time constraints execute predictably. RT-CORBA 1.1 imposes a scheduling model based on fixed priorities. It depends on a real-time operating system that schedules the threads representing the system's activities and that offers a mutex as the synchronization mechanism for access to shared resources. If the operating system does not provide some of these mechanisms, they have to be developed. This was the case in the work presented here, in areas such as synchronization and threads.

Real-time embedded systems need a reduced version of CORBA that they can run, since standard CORBA implementations normally require systems with more memory or processing capacity. This reduced version is called minimumCORBA, and it defines a profile, or subset, of CORBA. The specification has since been superseded by the Common Object Request Broker Architecture for embedded (CORBA/e) specification of 3 August 2006 [OMG06]. Nevertheless, ROFES, the RT-CORBA implementation used here, relies on minimumCORBA.

The CORBA features omitted by this profile matter in CORBA applications for general-purpose systems. However, these features carry a cost in resources, and there is a significant class of applications for which that cost cannot be justified, embedded systems among them. minimumCORBA defines a single profile that preserves the main benefits of CORBA: application portability and interoperability between ORBs. The following goals were recognized when choosing this profile:

- Any feature retained must be widely applicable in the world of resource-limited systems.
- minimumCORBA should be fully interoperable with CORBA.
- minimumCORBA should support the complete IDL.

In general, the features that provide the dynamic aspects of CORBA are omitted.

3. Description of the Eyebot

The Eyebot is a mobile robot designed for education and research in robotics and real-time embedded systems. This section briefly describes the Eyebot hardware, the RoBIOS operating system (Robot Basic Input Output System) [BRAUNL] and the programming interface provided by the manufacturer. The characteristics and limitations of these elements influence the process of migrating ROFES to the robots.

3.1. Hardware elements

The Eyebot hardware includes, among others, the following elements:

- Motorola microprocessor running at 35 MHz, with 512 KB of Flash ROM, of which 128 KB are reserved for the operating system and the rest for user programs. It also has 1 MB of RAM, expandable to 2 MB, used to store and run user programs.
- Sensors: sensors that detect the robot's movements and infrared sensors that detect the presence and position of nearby obstacles.
- Digital camera: 80x60 pixel, 24-bit resolution, mounted at the front, able to capture images of the robot's surroundings.
- Motors that drive the wheels.
- Serial port, through which applications can be downloaded to the Eyebot and which also serves for communication between the robot and a computer.
- Radio module, with Wifi or Bluetooth technology, for communication between one robot and others or a computer.
- Parallel port used as an external debugger.
- LCD screen on which messages from the running program can be displayed.
- Buttons for user interaction.
- Internal microphone and speaker for capturing and playing sounds.

As can be seen, the robot's hardware includes a variety of elements that make it very versatile. Figures 1 and 2 show different views of the robot.

Figure 1. Front view of the Eyebot

Figure 2. Rear view of the Eyebot

3.2. Operating system

The Eyebot includes an operating system called RoBIOS, which consists of three elements: a console, the HDT table listing the connected hardware devices, and the programming interface. The operating system manages the system's resources and presents the user with a console for interacting with the robot. From the console, programs can be loaded and run, and the robot's sensors and motors can be tested. The operating system holds a table, called the HDT (Hardware Description Table), that defines the hardware devices actually connected to the robot. To make these resources easier to reach from user programs, RoBIOS also offers a programming interface with functions suited to each type of device, such as the camera.

3.3. Programming interface

The Eyebot programming interface is a set of functions for accessing and manipulating the sensors, actuators and devices connected to the robot. These functions are written in C and are linked with the user program, so the different elements are easy to access. Some characteristics of this API that are significant for the migration of ROFES are described here.

RoBIOS has two different types of scheduler, cooperative and preemptive:

- Cooperative: the running task is not preempted; it keeps the processor until it hands the token to another task itself. The task that takes the token is the highest-priority task in the ready queue.
- Preemptive: the running task is the highest-priority task that is neither blocked nor suspended, so lower-priority tasks can be preempted.

ROFES is based on threads, so the threads provided by RoBIOS must be used and/or adapted.
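The difference between the two scheduling modes described above can be illustrated with a small, self-contained sketch. It is not RoBIOS code: the Task record, the function names, and the convention that a larger number means a higher priority are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical task record; these names are illustrative, not RoBIOS types.
struct Task {
    int priority;  // assumption: a larger value means a higher priority
    bool ready;    // false if the task is blocked or suspended
};

// Index of the highest-priority ready task, or -1 if no task is ready.
// On equal priorities the first task found wins.
int highest_ready(const std::vector<Task>& tasks) {
    int best = -1;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        if (tasks[i].ready &&
            (best < 0 || tasks[i].priority > tasks[best].priority)) {
            best = static_cast<int>(i);
        }
    }
    return best;
}

// Chooses the task that runs next.
// Preemptive: always the highest-priority ready task.
// Cooperative: the current task keeps the processor until it yields.
int next_task(const std::vector<Task>& tasks, int current,
              bool current_yielded, bool preemptive) {
    if (!preemptive && !current_yielded && current >= 0 &&
        tasks[current].ready) {
        return current;
    }
    return highest_ready(tasks);
}
```

With a low-priority task running and a high-priority task ready, the cooperative mode keeps the low-priority task until it yields, whereas the preemptive mode switches at once.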
For synchronization, RoBIOS provides semaphores, which are also needed to implement, for example, the mutexes offered by ROFES. Finally, timers are also available and may be needed by the scheduling mechanisms. Applications for the Eyebot are developed by cross-compilation on Linux, and the programs are downloaded through the serial port.

4. Migration of ROFES to the Eyebot robot

The ROFES project implements an ORB intended to run on real-time embedded systems such as the Eyebot. Even so, the hardware resources available on this robot, memory for example, are insufficient to run that ORB. There are also software limitations: the cross-compiler for the Eyebot does not provide all the functions needed to build ROFES for the robot. To solve the memory problem, the following implementation strategy was adopted:

- The functions that open and read shared libraries are not available; only the static ROFES libraries can be built.
- The methods for the communication protocols that the ROFES library implements but that are not used are not available.

Accordingly, all code related to opening and reading shared libraries is removed, along with all the classes that implement the unused communication protocols. All code specific to the operating systems on which the ROFES library can be built is removed as well. As for the software limitations, the following problems arise:

- The POSIX threads library is not available. Threads can only use the functions provided by the Eyebot's RoBIOS.
- Functions provided by the cross-compiler that RoBIOS reimplements cannot be used, because at link time the compiler reports that it cannot find the library containing the function. This happens, for example, with sleep(n), which puts the process to sleep for n seconds; RoBIOS provides OSWait(n) instead, which puts the process to sleep for n/100 seconds.
- Insufficient program stack size.

Once all these problems are solved, the result is an ORB able to run on the Eyebot using the ROFES libraries. The ORBs communicate over the serial port, and communication protocols that use the Bluetooth or Wifi modules supplied by the Eyebot manufacturer can be added. Beyond these limitations, the migration required a study of the structure of the ROFES source code in order to modify its various parts. Some errors in ROFES itself also had to be fixed, since it is a project under development.

Changes in the ORB. Because of the memory limitations mentioned above, every element that was not indispensable or no longer made sense has been removed from the ORB core, for example the elements related to communication protocols that would no longer be used on the robot. The main changes are:

- The number of ORBs that can be initialized on the Eyebot is limited to 1. This is not a serious limitation for the developer, since usually only one ORB is initialized.
- Configuration changes: some of the configuration options accepted by the ORB have been removed and others modified, for example to allow communication through the mechanisms the robot offers. Thus the ORBDefaultEndpoint option accepts the values rs232, bluetooth and wireless, depending on whether communication takes place over the serial port, the Bluetooth module or the wireless module, respectively.

The real-time ORB did not need to be modified.

Changes in the object adapter. The object adapter is the ORB element that locates remote objects. The main modification was to remove the code associated with the unused communication protocols. The same modification was needed in the real-time object adapter.

Priority management. The range of priorities accepted by threads on the Eyebot is defined by the constants MIN_PRI and MAX_PRI, with values 1 and 8 respectively. ROFES must be changed to recognize these values. For example, the constant logical_div_native, whose value is used to translate a CORBA priority to a native priority and vice versa, must be redefined.
The value of this constant is the maximum CORBA priority divided by MAX_PRI, since Eyebot threads can take only eight distinct values. These constants are defined as:

    int RTCORBA::PriorityMapping::os_maxPriority = MAX_PRI;
    int RTCORBA::PriorityMapping::os_minPriority = MIN_PRI;
    #define logical_div_native (32767 / MAX_PRI)

Since the PriorityMapping class translates thread priorities between the CORBA priority and the operating-system priority, all code in this class involved in defining and obtaining the priorities supported by the operating systems on which ROFES can be built has been removed. The methods to_native and to_corba have also been reimplemented to take into account only the priority values valid for RoBIOS. The code of to_native is as follows:

    if ((corba_priority < minpriority) || (corba_priority > maxpriority)) {
        ROFES_LEAVE;
        return false;
    }
    native_priority = corba_priority / logical_div_native;
    // Priorities start at 1
    native_priority++;
    if (native_priority > os_maxPriority)
        native_priority = os_maxPriority;
    ROFES_LEAVE;
    return true;

It is worth noting that the methods the_priority(priority) and the_priority() of the RTCORBA::Current class, which set and get the priority of the current thread, have no effect, because RoBIOS provides no function for reading or changing a thread's priority once the thread has been initialized. On the Eyebot, a thread's priority is set only when the thread is created.

Synchronization. The CORBA::OSMutex class uses the mutex operations specific to each operating system. RoBIOS provides only three semaphore functions: OSSemInit, OSSemP and OSSemV. These functions are used in the lock and unlock methods to emulate the mutexes provided by ROFES.

Thread pools. The main problem in migrating ROFES to the Eyebot was creating and managing threads through the functions provided by RoBIOS. The problem is that OSSpawn expects, as one of its input parameters, a pointer to a C function containing the code to execute, not a pointer to a C++ method.
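The usual way around this kind of restriction is a static "trampoline" method, which is the approach the paper goes on to describe. The following is a minimal sketch of that pattern, not the ROFES code: spawn_c_task is a hypothetical stand-in for a C-only spawn API such as OSSpawn (it simply runs the entry point immediately so the example works on a desktop), and the Worker class and its members are invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a C-only spawn API: it accepts a plain
// function pointer and offers no way to pass a user argument.
// Here it just calls the entry point at once.
typedef void (*c_entry_t)(void);
void spawn_c_task(c_entry_t entry) { entry(); }

class Worker {
public:
    Worker() : runs_(0) {}

    // Stores this object in the static slot, then hands the static
    // trampoline to the C-style API.
    void start() {
        current_ = this;
        spawn_c_task(&Worker::trampoline);
    }

    int runs() const { return runs_; }

private:
    // Ordinary member function holding the code the thread should run.
    void run() { ++runs_; }

    // A static member function has no implicit `this`, so its address
    // converts to an ordinary function pointer.
    static void trampoline() {
        assert(current_ != nullptr);
        current_->run();
    }

    static Worker* current_;  // object the next spawned task will run
    int runs_;
};

Worker* Worker::current_ = nullptr;
```

Because the object is passed through a single static slot, this sketch supports only one pending object at a time; code that starts several threads would need to protect or index that slot.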

Fortunately, pointers to static C++ methods are compatible with ordinary pointers to C functions. But a new problem arises: static methods can use only other static methods and attributes, which is rather limiting. To solve these problems, a static attribute is created to store the reference to the created object, together with a static method that calls the C++ method containing the code the thread must execute. Of the classes involved in implementing Threadpools in ROFES, only Threadpool_impl needs to be modified. All code related to the functions that create and manage threads on the operating systems on which ROFES can be built has been removed. Another modification was to change the stack size for the threads. For example, when the camera functions are used, the value must be at least [...] bytes.

Changes in GIOP. The General Inter-ORB Protocol defines the messages needed to establish communication between servers and clients. ROFES includes GIOP implementations for some platforms, such as TCP/IP and CAN. For the migration to RoBIOS, the GIOPDevice and GIOPServer classes had to be modified to add the option of using the robot's serial port. In the GIOPDevice class, all code related to the CAN network implementation and to reading shared libraries has been removed.
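For orientation, every GIOP message begins with a fixed 12-byte header: the four characters "GIOP", a two-byte version, one flags byte whose lowest bit gives the byte order, one message-type byte, and a four-byte message size. The sketch below decodes such a header from a raw buffer. It follows the header layout of the GIOP specification, but the GiopHeader struct and parse_giop_header function are illustrative names and are not part of ROFES.

```cpp
#include <cassert>
#include <cstring>
#include <stdint.h>

// Decoded view of the fixed 12-byte GIOP message header (illustrative type).
struct GiopHeader {
    uint8_t major;          // GIOP major version
    uint8_t minor;          // GIOP minor version
    bool little_endian;     // bit 0 of the flags octet
    uint8_t message_type;   // 0 = Request, 1 = Reply, ...
    uint32_t message_size;  // size of the body that follows the header
};

// Parses the header; returns false if the magic is not "GIOP".
bool parse_giop_header(const unsigned char buf[12], GiopHeader& out) {
    if (std::memcmp(buf, "GIOP", 4) != 0) {
        return false;
    }
    out.major = buf[4];
    out.minor = buf[5];
    out.little_endian = (buf[6] & 0x01) != 0;
    out.message_type = buf[7];
    // The size field uses the byte order announced in the flags octet.
    const unsigned char* s = buf + 8;
    if (out.little_endian) {
        out.message_size = static_cast<uint32_t>(s[0]) |
                           (static_cast<uint32_t>(s[1]) << 8) |
                           (static_cast<uint32_t>(s[2]) << 16) |
                           (static_cast<uint32_t>(s[3]) << 24);
    } else {
        out.message_size = (static_cast<uint32_t>(s[0]) << 24) |
                           (static_cast<uint32_t>(s[1]) << 16) |
                           (static_cast<uint32_t>(s[2]) << 8) |
                           static_cast<uint32_t>(s[3]);
    }
    return true;
}
```

A receiver can read exactly twelve bytes, decode them this way, and then knows how many further bytes belong to the message body.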
In the GIOPConnector class, the code needed to create the devices that handle communication over the serial port, Bluetooth and wireless has been added, as the following code shows:

    _major = profile->major();
    _minor = profile->minor();
    if (proto)
        proper = proto->transport_protocol_properties;
    if (profile->id() == IOP::TAG_RS232IOP) {
        _device = new RS232IOP::RS232Device(proper);
    #ifdef HAVE_BLUETOOTH
    } else if (profile->id() == IOP::TAG_BLUETOOTHIOP) {
        _device = new BLUETOOTHIOP::BLUETOOTHDevice(proper);
    #endif
    #ifdef HAVE_WIRELESS
    } else if (profile->id() == IOP::TAG_WIRELESSIOP) {
        _device = new WIRELESSIOP::WIRELESSDevice(proper);
    #endif
    ...

The IOR reference mechanism of ROFES objects also had to be modified, since the unused communication protocols were removed and new protocols were added, such as communication over the serial port. Changes were therefore made in different parts of ROFES so that the new protocols are taken into account.

Serial port communication protocol. The implementation of this protocol in the ROFES project cannot be used, because RoBIOS provides its own functions for communicating through the serial port. The RS232Device class sends and receives GIOP messages through the Eyebot's serial port. The method receiveheader(*buff), which reads from the serial port the first twelve bytes that make up the GIOP message header, is shown below:

    while (bytes < 12) {
        diff = OSRecvRS232(&(buff[bytes]), SERIAL1);
        if (diff == 0) {            // Read succeeded
            bytes++;
        } else if (diff != 10) {    // Read error
            if (!hang_on--) {       // No retries left
                ROFES_LEAVE;
                return false;
            }
        } else {
            ROFES_PRINT_ERROR1("error=%d\n", diff);
            ROFES_LEAVE;
            return false;
        }
    }

5. Console for the Eyebot robot

Once a version of CORBA able to run on the Eyebot was available, an application for the Eyebot was designed that creates an ORB and activates an object giving access to the robot's hardware components. The object active in the Eyebot's ORB has the following purposes:

- Read the PSD infrared sensors.
- Obtain the Eyebot's speeds.
- Control the camera servo and the kicker servo.
- Control the Eyebot's type of movement and speed.
- Obtain images from the Eyebot's camera.

- Report to the clients, if there are any, the errors produced by the execution of the RoBIOS functions.

A second application has also been developed that communicates with this object. Its functions are:

- Display the values read by the Eyebot's PSD infrared sensors.
- Display the linear and angular speeds at which the Eyebot is moving.
- Set the new position of the robot's servos.
- Set the robot's new type of movement, that is, the new linear and angular speed, the turning angle and the distance to travel.
- Display the images obtained by the Eyebot's camera.
- Inform the user of the exceptions raised while sending and receiving requests to the object.

So that the applications can perform their functions, the following IDL interface is defined:

    module TestEyebot {
        typedef char Image[15252];
        interface ControlEyebot {
            exception NoCamera {};
            exception InitErrorCamera {};
            exception ReleaseCamera {};
            exception WrongHandleServo {};
            exception WrongHandleVW {};
            void initcamservo() raises (NoCamera, InitErrorCamera, WrongHandleServo);
            void releasecamservo() raises (ReleaseCamera, WrongHandleServo);
            Image getimage();
            void turnservo(in short phi) raises (WrongHandleServo);
            void setspeed(in float v, in float w) raises (WrongHandleVW);
            void getspeed(out float v, out float w) raises (WrongHandleVW);
            void drivestraight(in float dist, in float v) raises (WrongHandleVW);
            void driveturn(in float phi, in float w) raises (WrongHandleVW);
            void drivecurve(in float dist, in float phi, in float v) raises (WrongHandleVW);
            void getvaluepsd(out short front, out short left, out short right);
            void hit() raises (WrongHandleServo);
        };
    };

The size of the Image type is not defined as a constant because the ROFES IDL compiler does not translate constants to C++ correctly.
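Since the array length is a bare literal, it may help to see where 15252 comes from and what it implies for the serial link. The sketch below is only an illustration: the constant and function names are invented, the 82 x 62 x 3 dimensions are the ones the text gives for a colour frame, and the 10 bits per byte (8N1 framing) and any baud rate passed in are assumptions, not figures from the paper.

```cpp
#include <cassert>
#include <cstddef>

// Frame dimensions quoted in the text: 82 x 62 pixels, 3 bytes per pixel.
const std::size_t kImageCols = 82;
const std::size_t kImageRows = 62;
const std::size_t kBytesPerPixel = 3;
const std::size_t kImageBytes = kImageCols * kImageRows * kBytesPerPixel;

// The IDL typedef hard-codes this value, so a compile-time check
// catches any drift between the two.
static_assert(kImageBytes == 15252, "IDL Image length must equal 82 * 62 * 3");

// Rough time to move `bytes` over a serial link at `baud` bits per second.
// Assumes 10 bits on the wire per byte (8N1) and ignores GIOP overhead.
double serial_transfer_seconds(std::size_t bytes, unsigned long baud) {
    assert(baud > 0);
    return static_cast<double>(bytes) * 10.0 / static_cast<double>(baud);
}
```

Under an assumed rate of 115200 baud, serial_transfer_seconds(kImageBytes, 115200UL) evaluates to roughly 1.3 seconds per uncompressed frame.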
The size of this type must equal the size of the types defined in RoBIOS for storing images. The type stores a colour image of size 82x62x3 obtained by the Eyebot's camera. The exceptions defined cover the errors that can occur while executing the RoBIOS functions used. Their meaning is as follows:

- NoCamera: no camera is installed on the Eyebot.
- InitErrorCamera: error while reserving the resources needed to use the camera.
- ReleaseCamera: error while releasing the resources assigned to the camera.
- WrongHandleServo: the servo handle is wrong.
- WrongHandleVW: the handle of the V-Omega interface is wrong.

The following methods interact with the robot's camera:

- initcamservo: resets and initializes the Eyebot's camera and the servo that controls the camera position.
- releasecamservo: releases the resources assigned to the robot's camera and to the camera servo.
- getimage: obtains an 82x62 colour image from the camera.
- turnservo: sets the new position of the camera servo.

The following methods obtain and/or modify the type of movement performed by the robot:

- setspeed: sets the new linear and angular speeds of the Eyebot.
- getspeed: obtains the current linear and angular speed of the Eyebot.
- drivestraight: the robot moves forwards or backwards in a straight line for dist metres at speed v.
- driveturn: the robot turns left or right by phi degrees at speed w.
- drivecurve: the robot moves forwards or backwards along a curve of phi degrees for dist metres at speed v.

The getvaluepsd method returns the values read by the Eyebot's three infrared sensors. Finally, the hit method moves the kicker servo. As an example, the getimage method returns an 82x62 colour image captured by the Eyebot's camera.
To implement this method, compressing the image in PNG format was considered at first. However, that would require porting the PNG library to RoBIOS and adding the PNG header to the captured image, since the RoBIOS functions return a matrix holding the image's pixel map. This would consume the scarce resources available on the Eyebot for running CORBA, and the time saved by sending the compressed image over the serial port is not significant compared with the time needed to send it uncompressed. The implementation of this method is therefore:

    colimage colimg;
    TestEyebot::Image_slice* img = Image_alloc();
    CAMGetColFrame(&colimg, FALSE);
    img = Image_dup((Image_slice*) colimg);
    return img;

As can be seen, functions of the API provided by RoBIOS, such as CAMGetColFrame, are used. Figure 3 shows the client application, developed with the Qt library [QT3], from which the Eyebot robot can be controlled.

Figure 3. Front view of the Eyebot

6. Conclusions

This paper has presented the integration of a CORBA environment on a real-time embedded system. The RT-CORBA and minimumCORBA specifications help ensure that the inherent limitations of embedded systems do not prevent obtaining a CORBA core that keeps its main features and can become one more component of the real-time embedded system. It has been shown that, even with an operating system as basic as the Eyebot's RoBIOS, it is possible to obtain a CORBA core with its main features, able to communicate with other CORBA environments or to form part of a more complex CORBA system.

Finally, it should be noted that the minimumCORBA specification has become obsolete. It has been replaced by the Common Object Request Broker Architecture for embedded specification, the CORBA/e profile, which adds more functionality to the CORBA versions that run on embedded systems, since these systems have more resources with every passing day.
References

[BRAUNL] T. Bräunl. Embedded Robotics: Mobile Robot Design and Applications with Embedded Systems. Springer.
[EYEBOT]
[GORAPPA] S. Gorappa, J.A. Colmenares, H. Jafarpour, R. Klefstad. Tool-based Configuration of Real-time CORBA Middleware for Embedded Systems. ISORC.
[OMG98] Object Management Group. minimumCORBA.
[OMG02] Object Management Group. The Common Object Request Broker: Architecture and Specification. Version 3.0.
[OMG06] Object Management Group. Common Object Request Broker Architecture (CORBA) for embedded Specification.
[OMGRT02] Object Management Group. Real-Time CORBA Specification. Version 1.1.
[OMGRT05] Object Management Group. Real-Time CORBA Specification. Version 1.2.
[QT3]
[ROFES] Chair of Operating System, RWTH Aachen University, Germany.
[SCHMIDT] D.C. Schmidt, F. Kuhns. An overview of the Real-time CORBA Specification. IEEE Computer, special issue on Object-Oriented Real-time Distributed Computing, June.
[VINOS] M. Henning, S. Vinoski. Programación Avanzada en CORBA con C++. Addison Wesley.

An Ada 2005 Technology for Distributed and Real-Time Component-based Applications

Patricia López Martínez, José M. Drake, Pablo Pacheco, Julio L. Medina
Departamento de Electrónica y Computadores, Universidad de Cantabria, Santander, SPAIN
{lopezpa,drakej,pachecop,medinajl}@unican.es

Abstract: The concept of interface in Ada 2005 significantly facilitates its use as the basis for a software components technology. This technology, taking advantage of the resources that Ada offers for real-time systems development, would be suitable for component-based real-time applications that run on embedded platforms with limited resources. This paper proposes a model-based technology for the implementation of distributed real-time component-based applications with Ada 2005. The proposed technology uses the specification of components and the framework defined in the LwCCM standard, modifying it with some key features that make the temporal behaviour of the applications executed on it predictable and analyzable with schedulability analysis tools. Among these features, the dependency on CORBA is replaced by specialized communication components called connectors, the threads required by the components are created and managed by the environment, and interception mechanisms are placed to control their scheduling parameters on a per-transaction basis. This effort aims to lead to a new IDL to Ada mapping, a prospective standard of the OMG.

Keywords: Ada 2005, Component-based technology, embedded systems, real-time, OMG standards

1 Introduction¹

While in the general-purpose software applications domain the component-based software engineering (CBSE) approach is progressing as a promising technology to improve productivity and to deal with the increasing complexity of applications, in the embedded and real-time systems domain its usage has evolved significantly more slowly.
The main reason for this delay is that the best-known CBSE technologies, like EJB, .NET, or CCM, are inherently heavy and complex: they introduce overheads that are not easily predictable, and they do not scale well enough to fit the significant restrictions on the availability of resources usually suffered by embedded systems. Trying to find an appropriate solution to this problem, European research projects like COMPARE [1] and FRESCOR [2] tackle, from different points of view, the development of a real-time component-based technology compatible with embedded systems. Their approach is based on the Container/Component model pattern defined in the LwCCM specification developed by the OMG [3], but avoids using CORBA as communication middleware, which is too heavy for this kind of application. With this pattern, the interaction of the component with the run-time environment is carried out entirely through the container, whose code is generated by automatic tools in order to isolate the component developer from the details of the code of the execution environment.

1. This work has been funded by the European Union's FP6 under contracts FP6/2005/IST/ (FRESCOR project) and IST (ARTIST2 One). This work reflects only the author's views; the EU is not liable for any use that may be made of the information contained herein.

The recent modification of the Ada language specification [4], known as Ada 2005, provides an enhanced option for the implementation of fully Ada-native component-based technologies, which is well suited to embedded platforms. On top of the known assets of Ada for real-time and distributed applications, like native support for concurrency, scheduling policies, synchronization mechanisms, and remote invocations, Ada 2005 includes the concept of interface, which allows the direct implementation of the services offered and required by components (Facets and Receptacles in LwCCM, respectively). Additionally, Ada 2005 handles incomplete types, which enable the definition of cross-references, very frequently used in component-based applications.

This paper proposes a component-based technology based on Ada. It implements the LwCCM framework, with the container/component model, and both the code of the environment and the code of the components are written in Ada 2005. The technology adds mechanisms to the running environment and extends the specification of the components in such a way that the timing behaviour of the final application is totally controlled by the automatically generated execution environment.
In this way, real-time models of the application can be elaborated and analysed in order to verify its schedulability when it is run on closed platforms, or to define the resource usage contracts required to operate in open environments like FRESCOR [2][5]. The description and deployment of applications and components in the technology follow the Deployment and Configuration of Component-Based Distributed Applications standard of the OMG [6] (D&C). The paper focuses on the description of the framework that is the base of the technology, particularly on the resources used to guarantee the required predictability.

Various proposals dealing with the adaptation of CBSE to real-time systems have appeared in recent years, though none of them has fully satisfied the industry requirements [7]. In the absence of a standard, some companies have developed their own solutions, adapted to their corresponding domains. Examples of that kind of technology are Koala [8], developed by Philips, and Rubus [9], developed by Arcticus Systems and used by Volvo. These technologies have been successfully applied in the companies where they were created, but they do not enable the emergence of an inter-enterprise software components market. However, they have served as the basis of other academic approaches. The Robocop component model [10] is based on Koala and adds some features to support the analysis of real-time properties; Bondarev et al. [11] have developed an integrated environment for the design and performance analysis of Robocop models. Similarly, Rubus has been used as the starting point of the SaveCCT technology [12]. The component concept in SAVE is applied at a very low granularity; even so, under appropriate assumptions for concurrency, simple RMA analysis can be applied and the resulting timing properties introduced as quality attributes of the assemblies. SaveCCT focuses on control systems for the automotive domain. In a similar way, COMDES-II [13] encapsulates control tasks following a hierarchical composition scheme, applied in an ad-hoc C-based RT-kernel. The technology presented in this paper follows the idea proposed by PECT (Prediction-Enabled Component Technology) [14], which proposes to include in the component-based technology sets of constraints that make it possible to predict the behaviour of an assembly of components before its execution, based on properties of the components. In our case, this approach is applied to obtain the complete real-time model of the application.

Though the Ada language is significantly used in the design and implementation of embedded real-time systems, we have not found references to its usage in support of component-based environments. This is probably due to the lack of support for multiple inheritance in the old versions of the language.

The rest of this paper is organized as follows. Section 2 describes the two main processes involved in a components technology, emphasizing the main contributions of the proposal. Section 3 describes in detail the reference model of the framework and the aspects included for developing analyzable applications. Section 4 details the architecture and classes to which a component is mapped in the technology. Finally, Section 5 presents our conclusions and future work.

2 Real-time component-based development

A component technology defines two different development processes, shown in Figure 1.
The component development process comprises the specification, implementation, and packaging of components as reusable and independently distributable entities, while the development of component-based applications includes the specification, configuration, deployment and launching of applications built as assemblies of available components. Both processes are independent and are carried out by different agents at different stages; however, they need to be coordinated because the final products of the first process are the inputs for the second. So, in order to guarantee their coherence, a component technology must define a set of rules about the kind of products and information that are generated in each phase of the process, and the formats in which they are supplied.

A key aspect of a component technology is the opacity of the components. This means that, during application development, components must be used without any knowledge of the internal details of their implementation or code. To achieve this, a component is stored as a package that includes, together with the implementation code, other files which supply models and complementary information (metadata) about different aspects (functional and non-functional) of the component, required for its usage.

A component development process starts when the specifier, who is an expert in a particular application domain, elaborates the specification of a component that provides a concrete functionality demanded in the domain. The developer implements this specification according to a certain technology and elaborates the models that describe the installation requirements of the component. This work is supported by automatic tools, which generate the skeletons for the code of the component based on the selected technology. Therefore, the developer's task is reduced to designing and implementing the specific business code of the component, without having to be aware of internal details of the technology. Finally, the packager gathers all the information required to make use of the component, and creates and publishes the distributable element that constitutes the component.

Fig. 1. Main processes in a component technology (components development and application development)

Relevant aspects of the proposed technology related to component development are:

The methodology for functional specification of components and the framework proposed by the LwCCM specification have been adopted as the basis for the technology. Hence, a container/component model is used in the component implementations, but CORBA is replaced by simpler static communication mechanisms with predictable behaviour, suitable for the execution platform.

Remote communication between components is achieved by using Connectors. These are components whose code is completely and automatically generated by the tools, and they encapsulate all the support for interactions among components.

Component implementations are generated in Ada 2005, so the set of Ada packages to which a component is mapped, as well as the code structure of all the elements that form the LwCCM framework, have been defined.

An automatic code generation tool has been developed.
It takes the specification of a component as input and generates all the code elements that provide support for the component inside the framework, including the frames in which the developer must write the business code of the component.

The technology follows the D&C specification for the description of the package that represents the distributable component.

In order to apply the technology to hard real-time component-based applications, both standard specifications, D&C and LwCCM, have been extended with new elements that are used to describe the temporal behaviour of components and the requirements they impose on the resources in order to meet timing requirements:

- The D&C specification has been extended in order to associate a temporal behaviour model with the specifications and implementations of components. This real-time model is used to describe the temporal responses of the component and the configuration parameters that it admits. Although this paper does not detail the modelling approach used, which is explained in [15], the basic idea is that the real-time model of a component is a parameterized model, which describes the temporal behaviour of the component with references to the models of the platform on which the component is executed and to the models of the other components that it uses in order to implement its functionality. These real-time models have the composability properties required to generate the real-time model of the complete application by composition of the individual real-time models of the software and hardware components that form it. This real-time model can be used to obtain the response times of services, analyze schedulability or evaluate the scheduling parameters required to satisfy the timing requirements imposed on the application. In our case, the real-time models of the components are formulated according to the MAST model [16], so that the set of tools offered by the MAST environment can be used to analyze the system.
- With the purpose of controlling the threading characteristics (number and assignment of threads and scheduling parameters) of the components used in the technology, the functional specification of a component, as described in LwCCM, has been refined. A component cannot create threads inside its business code. Instead, for each thread that a component requires, it declares a port in its specification. This port implements one of the predefined interfaces OneShotActivation or PeriodicActivation (see Section 3). When the component is instantiated, the environment provides the thread for the execution of the port.
Scheduling parameters for the thread are assigned as configuration properties of the instance in the deployment plan.
- New mechanisms have been introduced in the container/component model to define the scheduling parameters with which each invocation received by a component is executed. The run-time control of these parameters is done by means of interceptors, which can be introduced in the framework for each operation offered by a component.

The application development process consists of assembling component instances, chosen from those that have been previously developed and stored in the repository of the design environment. This process is carried out by three different agents in three consecutive phases. The assembler builds the application by choosing the required component instances and connecting them according to their instantiation requirements. This work is guided by the functional specification of the application, the real-time requirements of the application, and the description of the available components. The result of this first stage is a description of the application as a composite component, which is useful by itself. The planner (usually the same as the assembler) takes this description and decides the deployment of the application, which means choosing the nodes on which each component instance will be installed, and the communication mechanism between instances. The result of this stage is the deployment plan, which completely describes the application and the way in which it is planned to be executed. Finally, the executor deploys, installs, and executes the application, taking the deployment plan and the information about the execution platform as inputs. This work is usually assisted by automatic tools.

Relevant aspects of the proposed technology regarding application development are:

The D&C specification is taken as the basis for the process of designing and deploying an application. D&C defines the structure of the deployment plan that guides this process. It describes the component instances that form the application, their connections, the configuration parameters for each instance and the assignment of instances to nodes.

A deployment tool processes the information provided by the deployment plan. It selects the code of the components suitable for the target platform and generates the code required to support the execution of the components in each node. Specifically, it automatically generates the connectors, which provide the communication mechanisms between remote component instances, as well as the code for the main procedures executed on each node.

The specific aspects included to support hard real-time applications are:
- Once the planner has developed the deployment plan, the local or remote nature of each connection between component ports is defined. Then, an automatic tool generates the code of the connectors based on the selected communication service and its corresponding configuration parameters, which were assigned to the connection in the deployment plan. The communication service used must exhibit predictable behaviour; hence, the tool also generates the real-time models that describe the temporal behaviour of those connectors. These models are later composed with the real-time models of the other components in order to build the analysis model of the complete application.
- Based on the deployment plan, a tool elaborates the real-time model of the application by composition of the real-time models of the components that form it (connectors included) and the models of the platform resources, which should also be stored in the repository. This model is used either to analyze the schedulability of the application under a certain workload, or to calculate the resource usage contracts necessary to guarantee its operation in an open contractual environment [5]. These contracts will be negotiated, prior to the application execution, by the launching tool.
- The execution environment includes a special internal service and interception mechanisms that manage in an automated way the scheduling parameters of the threads involved in the application execution. The configuration parameters of this service, whose values may be obtained by schedulability analysis, are specified in the deployment plan and assigned to the service at launching time.

3 Reference model of the technology

The proposed technology is based on the reusability (with no modification) of the business code of the components, and the complete generation by automatic tools of the code that adapts the component to the execution environment. This code is generated according to the reference model shown in Figure 2. It takes the LwCCM framework as a starting point, and adds to it the features required to control the real-time behaviour of the application execution. Each of the elements that take part in the execution environment is explained below.

Component: A component is a reusable software module that offers a well-defined business functionality. This functionality is specified through the set of services that the component offers to other components, grouped in ports called facets, and the set of services it requires from other components, grouped in ports called receptacles. With the purpose of having complete control of the threading and scheduling characteristics of an application, and with a view to being able to analyze it, components in our technology are passive. The operations they offer through their facets are made up of passive code that can call protected objects. This does not mean that there cannot be active components in the framework: concurrency is provided by means of activation ports. When a component requires a thread to implement its functionality, it declares a port that implements one of the two special interfaces defined in the framework: OneShotActivation or PeriodicActivation. These two kinds of ports are recognized by the environment, which creates and activates the corresponding threads. The interface OneShotActivation declares a run() procedure, which is executed once by the created thread after the component is instantiated, connected and configured. The interface PeriodicActivation declares an update() procedure, which is invoked periodically. A component can declare several activation ports, each of them representing an independent unit of concurrency managed by the component, independent of the business invocations.
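
As an illustration of activation ports, the following minimal sketch models a passive component whose threads are supplied by the environment. It is written in Python rather than Ada purely for brevity and is not the generated code: Environment, SamplerComponent and period_s are hypothetical names, the component implements both interfaces itself instead of declaring separate port objects, and scheduling parameters are omitted. The periodic loop is a plain timed wait, not a real-time periodic release.

```python
import threading
from abc import ABC, abstractmethod


class OneShotActivation(ABC):
    """Activation port whose run() is executed once by an environment thread."""

    @abstractmethod
    def run(self):
        ...


class PeriodicActivation(ABC):
    """Activation port whose update() is invoked periodically."""

    @abstractmethod
    def update(self):
        ...


class SamplerComponent(OneShotActivation, PeriodicActivation):
    """Hypothetical passive component: it never creates threads itself."""

    def __init__(self):
        self.log = []
        self._lock = threading.Lock()

    def run(self):
        with self._lock:
            self.log.append("run")

    def update(self):
        with self._lock:
            self.log.append("update")


class Environment:
    """Hypothetical execution environment that owns every thread."""

    def __init__(self):
        self._stop = threading.Event()
        self._threads = []

    def activate(self, component, period_s):
        # period_s stands in for a value taken from the deployment plan.
        new_threads = []
        if isinstance(component, OneShotActivation):
            new_threads.append(threading.Thread(target=component.run))
        if isinstance(component, PeriodicActivation):
            new_threads.append(
                threading.Thread(target=self._periodic, args=(component, period_s))
            )
        for thread in new_threads:
            thread.start()
        self._threads.extend(new_threads)

    def _periodic(self, port, period_s):
        # Event.wait returns False on timeout, so update() runs once per period
        # until shutdown() sets the event.
        while not self._stop.wait(period_s):
            port.update()

    def shutdown(self):
        self._stop.set()
        for thread in self._threads:
            thread.join()
```

The point of the design is visible in SamplerComponent: its business code contains no thread creation, so the number of threads and their parameters remain under the control of the environment.
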
Activation ports are declared in the component specification (in the IDL file), and all the elements required for their execution are created by the code generation tool. Their configuration parameters, which include the scheduling parameters of the threads as well as the activation period (in the case of PeriodicActivation ports), are assigned for each component instance in the deployment plan.

Adapter: It represents the part of the component code that provides the run-time support for the business code. All the platform-related aspects are included in the adapter. Its code is automatically generated according to the component/container model. With this programming approach the component developer does not need to know any detail about the underlying technology, and is only in charge of developing the business code.

Fig. 2. Reference model of the technology

Connector: It represents the mechanism through which a component communicates with another component connected to it by a port. In our technology, a connector has the same structure as a component, but its business code is also generated by the deployment tool, based on:

The interface of the connected ports. The connectors are generated from a set of templates which are adapted so that they implement the operations of the required interface.

The location of the components (local vs. remote), and the type of invocation (synchronous or asynchronous). Combinations of these characteristics lead to different types of connectors:
- For local and synchronous invocations the connector is not necessary: the client component invokes the operation directly on the server.
- For local and asynchronous invocations the connector requires an additional thread to execute the operation (through activation ports).
- If the invocation is distributed, the connector is divided into two fragments: the proxy fragment, which is instantiated in the client node, and the servant fragment, which is instantiated in the server node. The communication between the two fragments is achieved by means of the communication service selected for the connection. In this case, the connector can also implement synchronous or asynchronous invocations, including the required mechanisms in the proxy fragment.

The communication service or middleware used for the connection and its corresponding configuration parameters, which are assigned for each connection between ports in the deployment plan.

Interceptors: The concept of interception is taken from QoSforCCM [17]. It provides a way to support the management of non-functional features of the application. An interceptor makes it possible to incorporate calls to the environment services inside the sequence of an invocation by executing certain actions before and after the operation is executed on the component.
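
The interception mechanism can be pictured with a small sketch in which a facet wrapper calls each registered interceptor before and after delegating the operation to the business code. This is a simplified Python illustration, not the generated Ada code: FacetWrapper, TraceInterceptor, the invoke method and the send_response hook are hypothetical, and the signature given here to register_server_interceptor is an assumption.

```python
class ServerInterceptor:
    """Hook interface: actions run before and after an operation."""

    def receive_request(self, operation):
        pass

    def send_response(self, operation):  # hypothetical "after" hook
        pass


class TraceInterceptor(ServerInterceptor):
    """Hypothetical interceptor that records the order of events."""

    def __init__(self, trace):
        self.trace = trace

    def receive_request(self, operation):
        self.trace.append("before " + operation)

    def send_response(self, operation):
        self.trace.append("after " + operation)


class FacetWrapper:
    """Forwards invocations to the facet implementation via interceptors."""

    def __init__(self, facet_impl):
        self._impl = facet_impl
        self._interceptors = {}  # operation name -> list of interceptors

    def register_server_interceptor(self, operation, interceptor):
        self._interceptors.setdefault(operation, []).append(interceptor)

    def invoke(self, operation, *args):
        chain = self._interceptors.get(operation, [])
        for interceptor in chain:
            interceptor.receive_request(operation)
        try:
            # Delegate to the business code of the facet.
            return getattr(self._impl, operation)(*args)
        finally:
            # "After" actions run even if the business code raises.
            for interceptor in reversed(chain):
                interceptor.send_response(operation)
```

Because the hooks live in the wrapper, the facet implementation stays unaware of them, which matches the idea that interception is hidden from the component developer.
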
The support for interceptors is introduced in the adapter, so it is hidden from the component developer. Their introduction is optional for each operation, and it is specified in the deployment plan. In our technology they are used to control the scheduling parameters with which each received invocation is executed. Based on the configuration parameters assigned to it in the deployment plan, each interceptor knows the scheduling parameter that corresponds to the current invocation, and uses the SchedulingParameterService to modify it in the invoking thread. With this strategy, the following schemes for assigning scheduling parameters can be implemented:

Client propagated: The scheduling parameters are those of the client that makes the invocation.

Server declared: The scheduling parameters are defined in the server component and they are the same for all the received invocations.

Transaction controlled: The scheduling parameters of an invocation depend on the transaction [16] and the particular step inside the transaction in which the invocation takes place. This scheme enables better schedulability results since it makes it possible to impose at run time scheduling parameters that may be different for each invocation in the context of an end-to-end flow [18]. The values of these parameters are obtained from the analysis using holistic priority assignment tools.

SchedulingParameterService: It is an internal environment service which is invoked by the interceptors to change the scheduling parameters of the invoking thread. The kind of scheduling parameters that are effectively used depends strongly on the execution platform: it may be a single priority, a deadline, or the contract to use in the case of a FRESCOR flexible scheduling platform.

4 Architecture of a component implementation

There are two complementary aspects that a component implementation must address:

The component has to implement the functionality that it offers through its facets, making use of its own business logic and the services of other components.

The implementation must include the resources necessary to instantiate, connect and execute the component in the corresponding platform. This aspect is addressed by implementing the appropriate interfaces, which allow the component to be managed in a standard way; in our case, those defined by LwCCM.

Each aspect requires knowledge of a different domain. For the first aspect, an expert in the application domain corresponding to the component functionality is required. For the second, however, what is required is an expert in the corresponding component technology. The proposed architecture for a component implementation seeks a structural pattern that makes the Ada packages implementing each aspect independent of each other. Besides, packages that implement the technology-related aspects are to be automatically generated according to the component specification. With this approach, the component developer only has to design the business code of the component.
The proposed architecture is based on the reference architecture proposed by LwCCM, but adapted for:

Making use of the abstraction, security and predictability characteristics of Ada.

Including the capacity for controlling threading characteristics of the components.

Facilitating the automatic generation of code, taking the IDL3 specification of the component as input and generating the set of classes that represent a component in the technology.

Providing a well-defined frame in which the component developer designs and writes the business code.

In the proposed technology, the architecture of a component is significantly simplified as a consequence of the usage of connectors. When two connected components are installed in different nodes, the client component only interacts locally with the proxy fragment of the connector, while the server component only interacts locally with the servant fragment of the connector. Therefore, all the interactions between components are local, since it is the connector that hides the communication mechanisms used for the interaction.
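
To make the role of a distributed connector concrete, the sketch below splits a connector into a proxy fragment and a servant fragment that exchange messages through a stand-in communication service. It is illustrative Python under simplifying assumptions: QueueTransport is a hypothetical in-process substitute for the real communication service, messages are plain tuples, and only synchronous invocations are shown.

```python
import queue
import threading


class QueueTransport:
    """Hypothetical stand-in for the communication service of a connection."""

    def __init__(self):
        self.requests = queue.Queue()
        self.replies = queue.Queue()


class ServantFragment:
    """Runs on the server node and invokes the server facet locally."""

    def __init__(self, transport, facet):
        self._transport = transport
        self._facet = facet

    def serve_one(self):
        # Block until one request arrives, execute it, and send the reply.
        operation, args = self._transport.requests.get()
        result = getattr(self._facet, operation)(*args)
        self._transport.replies.put(result)


class ProxyFragment:
    """Runs on the client node and looks like the server facet to the client."""

    def __init__(self, transport):
        self._transport = transport

    def __getattr__(self, operation):
        # Only called for names not found normally, i.e. facet operations.
        if operation.startswith("_"):
            raise AttributeError(operation)

        def remote_call(*args):
            self._transport.requests.put((operation, args))
            return self._transport.replies.get()  # synchronous: wait for reply

        return remote_call
```

The client calls the proxy as if it were the server facet, and the server facet is called by the servant as if the client were local; swapping QueueTransport for another service would not touch either component.
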

Fig. 3. Example of Component Wrapper Structure for ComponentX

For each component, four Ada packages are generated. Three of them are completely generated by the tool, while the last package leaves blank spaces in which the component developer should include the business code of the component.

The first package represents the adapter (or container) of the component. It includes the set of resources that adapt the business code of the component to the platform, following the interaction rules imposed by the technology. It defines three classes:

The wrapper class of the component, called {ComponentName}_Wrapper, which represents the outermost class of the component. It offers the equivalent interface of the component, which LwCCM establishes as the only interface that can be used by clients or by the deployment tool to access the component. To that end, the class implements the CCMObject interface, which, among others, offers operations to access the component facets, or to connect the corresponding server components to the receptacles. Besides, the capacity to incorporate interceptors is achieved by implementing the Client/ServerContainerInterceptorRegistration interfaces, a modified version of the interfaces with the same name defined in QoSforCCM [17].
As shown in Figure 3, this class is a container which aggregates or references all the elements that form the component:
- The component context, through which components access their receptacles.
- The home, which represents the factory used to create the component instance.
- The executor of the component, which represents its real business code implementation. Its structure is explained below.
- An instance of a facet wrapper class for each facet of the component. The facet wrappers capture the invocations received by the component and transfer them to the corresponding facet implementations, which are defined in the executor. They are also the place in which the interceptors for managing non-functional features are included.

The class that represents the context implementation, called {ComponentName}_Context. It includes all the information and resources required by the component to access the components that are connected to its receptacles.
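
The aggregation structure of the wrapper and its context can be summarised in a short sketch. The Python below is only an approximation of the generated Ada classes: provide_facet, connect and set_session_context mirror operation names used in the paper, while the constructor arguments, set_connection, get_connection and the dictionary-based bookkeeping are assumptions made for illustration. The home is left out.

```python
class ComponentContext:
    """Gives the business code access to what is connected to its receptacles."""

    def __init__(self):
        self._connections = {}

    def set_connection(self, receptacle, server_facet):
        self._connections[receptacle] = server_facet

    def get_connection(self, receptacle):
        return self._connections[receptacle]


class ComponentWrapper:
    """Outermost class: the entry point for clients and the deployment tool."""

    def __init__(self, executor, facets):
        self.context = ComponentContext()
        self.executor = executor
        self._facets = dict(facets)  # facet name -> facet wrapper object
        # Hand the context to the executor so it can reach its receptacles.
        executor.set_session_context(self.context)

    def provide_facet(self, name):
        """Return the object through which clients use the named facet."""
        return self._facets[name]

    def connect(self, receptacle, server_facet):
        """Attach a server facet to one of the component's receptacles."""
        self.context.set_connection(receptacle, server_facet)
```

In this arrangement the executor only ever sees the context, so it can be written without knowing how, or to what, its receptacles were connected.
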

The {ComponentName}_Home_Wrapper, which implements the equivalent interface of the home of the component. It includes the class (static) procedures that are used as factories for component instantiation.

The rest of the generated Ada packages contain the classes that represent the implementation of the business code of the component (the executor). The LwCCM standard fixes a set of rules that define the programming model to follow in order to develop a component implementation. Taking the IDL3 specification of a component, CCM defines a set of abstract classes and interfaces which have to be implemented, either automatically or by the user, to develop the functionality of the component. This set of root classes and interfaces is grouped in the generated package {ComponentName}_Exec.

The {ComponentName}_Exec_Impl package includes the concrete classes for the component implementation. They inherit from the classes defined in the previous package. The class that represents the component implementation, {ComponentName}_Exec_Impl, which is shown in Figure 4, has two attributes:

A reference to the component context. It is set by the environment through the set_session_context() operation, and it is used to access the receptacles.

An aggregated object of the {ComponentName}_Impl class, whose skeleton is generated by the tool and has to be completed by the developer.

The {ComponentName}_Impl class, represented in Figure 4, is defined in a new package, in order to hide the environment internals from the code developer. It represents the reference frame in which the developer introduces the business code. Relevant elements of that class are:

For each facet offered by the component, a facet implementation object is aggregated.

Each activation port defined in the specification of the component represents a thread that is required by the component to implement its functionality.
To implement those threads, two kinds of Ada task types are defined. The OneShotActivationTask executes the run() procedure of the corresponding port once, while the PeriodicActivationTask executes the update() procedure of the corresponding port periodically. Both types of task receive as a discriminant, at instantiation, a reference to the data structure that qualifies their execution, including scheduling parameters, period, state of the component, etc. These threads are activated and terminated by the environment by means of standard procedures that LwCCM includes in the CCMObject interface to control the lifecycle of the component.

Fig. 4. Example of Component Implementation Structure for ComponentX

All the implementation elements (facet implementations, activation tasks, etc.) operate according to the state of the component, which is unique for each instance. For that reason, the state has been implemented as an independent aggregated class, which can be accessed by the rest of the elements, avoiding cyclic dependencies.

Most of the code of this class is generated automatically; the component developer only has to write the body of the activation port procedures (run and update), and the set of operations offered by each of the facet implementations.

For connectors, the generated structure is exactly the same, but the business code, which in this case consists of the code required to implement remote invocations, is also generated automatically by the deployment tool.

The code generation tool follows a very simple scheme. It is based on a set of parameterized source code templates that represent all the possible code blocks which can appear in any of the four packages to which a component specification is mapped. The parameters represent the identifiers or options inside the blocks which depend on the implemented component. The tool generates the whole code by inserting consecutive blocks according to the elements defined in the specification of the component (ports, operations, etc.). For each element, its corresponding identifiers and qualifiers are extracted, and they are used to replace the corresponding parameters in the code block.
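
The template scheme can be illustrated with Python's standard string.Template. The template texts, the parameter names (component, port, operation) and the dictionary-based specification below are invented for this sketch; the actual tool works from the IDL3 specification and its own Ada templates, which are not shown in the paper.

```python
from string import Template

# Hypothetical parameterized code blocks, one per kind of specification element.
TEMPLATES = {
    "header": Template("package ${component}_Impl is"),
    "operation": Template(
        "   procedure ${operation} (Self : in out ${port}_Facet);"
    ),
    "footer": Template("end ${component}_Impl;"),
}


def generate(spec):
    """Emit consecutive blocks, substituting identifiers taken from the spec."""
    lines = [TEMPLATES["header"].substitute(component=spec["name"])]
    for port in spec["facets"]:
        for operation in port["operations"]:
            # One block per operation, with its identifiers filled in.
            lines.append(
                TEMPLATES["operation"].substitute(
                    port=port["name"], operation=operation
                )
            )
    lines.append(TEMPLATES["footer"].substitute(component=spec["name"]))
    return "\n".join(lines)
```

For a specification with one facet PortA offering Start and Stop, generate returns a four-line package skeleton with one procedure declaration per operation. Template.substitute raises KeyError when a parameter is missing, which is a convenient way to catch incomplete specifications early.
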
The currently available Ada mapping for IDL [19] is based on Ada 95, so for the development of the code generation tool it has been necessary to define new mappings for some IDL types in order to benefit from the new concepts introduced in Ada 2005. The main change concerns the usage of interfaces. The old mapping for the IDL interface type leads to a complex Ada structure, while it can now be mapped directly to an Ada interface. Besides, some data structures defined in IDL, for example the sequence type, can now be implemented with the new Ada 2005 containers.

5 Practical experience

At the time of the first attempts to validate the proposed technology, there was no real-time operating system with support for Ada 2005 applications, so the tests were run on a Linux platform, using the GNAT (GAP) 2007 compiler. The connectors for the communication between remote components were built using the native Ada Distributed Systems Annex (DSA), Annex E of the Ada specification. The implementation of DSA used was GLADE [20]. Distributed test applications were developed and executed successfully. The platforms used in this evaluation were sufficient for the conceptual validation of the technology, since from the point of view of the software architecture the final code is equivalent, but of course they are not appropriate for the validation of the timing properties of real-time applications.

The recently released new version of MaRTE_OS [21] now provides support for the execution of Ada 2005 applications, and makes it possible to test the technology in a hard real-time environment. A real-time communication middleware is still lacking. An enhanced version of GLADE that enables priority assignment for messages exists for MaRTE_OS & GNAT [22], but it has not been ported to the new versions. To overcome this limitation, we have developed simpler connectors using a link-layer real-time protocol. Our first tests on a real-time platform have been done with connectors that directly use the RT-EP [23] protocol for the communication between remote components. The same application tested on the Linux platform was used on MaRTE_OS and, as expected, the code of the components did not require any modification; the only necessary change was the development of new connectors suitable for the new communication service (RT-EP).

6 Conclusion and future work

This paper proposes a model-based technology for the development of real-time component-based applications. The usage of the Ada language for its implementation makes it particularly suitable for applications that run on embedded nodes with limited resources and strict timing requirements. The technology is based on the D&C and LwCCM standard specifications, which have been extended in order to support the development of applications with predictable and analyzable behaviour. The key features of this technology have been specified and tested successfully. Nevertheless, some challenges remain for this community to face. The most rewarding of them is the availability of an Ada-native communication middleware, here used in the development of connectors, which must exhibit predictable behaviour and allow priorities to be assigned to messages based on the transactional (or so-called end-to-end flow) model.
Our aim is to develop the connectors using the Ada Distributed Systems Annex so that applications rely only on the Ada run-time infrastructure with no additional middleware, which is highly desirable when targeting small embedded systems. As future work, more tests have to be carried out in order to quantify the concrete overheads introduced by the technology. A planned enhancement is the construction of a graphical environment that integrates all the stages of development of an application: design, code generation, analysis and, finally, execution. Another effort that has been started in the OMG and arises from this work is the elaboration of an updated version of the mapping from IDL to Ada 2005 [24].

References

[1] IST project COMPARE: Component-based approach for real-time and embedded systems
[2] IST project FRESCOR: Framework for Real-time Embedded Systems based on Contracts
[3] OMG: "Lightweight Corba Component Model", ptc/ , November 2003
[4] T. Taft et al., editors. Ada 2005 Reference Manual. Int. Standard ISO/IEC 8652/1995(E) with Technical Corrigendum 1 and Amendment 1. LNCS 43-48, Springer-Verlag 2006.

[5] Aldea M. et al: FSF: A Real-Time Scheduling Architecture Framework. Proc. of 12th RTAS Conference, San Jose (USA), April 2006
[6] OMG: "Deployment and Configuration of Component-Based Distributed Applications Specification", version 4.0, Formal/ , April 2006
[7] A. Möller, M. Åkerholm, J. Fredriksson and M. Nolin. Evaluation of Component Technologies with Respect to Industrial Requirements. Proc. of 30th Euromicro Conference on Software Engineering and Advanced Applications, August
[8] R. Ommering, F. Linden, J. Kramer: "The koala component model for consumer electronics software". In: IEEE Computer, IEEE (2000)
[9] Lundbäck K-L., Lundbäck J., Lindberg M.: "Component based development of dependable real-time applications" Arcticus Systems,
[10] Bondarev E., de With P., Chaudron M. Predicting Real-Time Properties of Component-Based Applications. Proc. of 10th RTCSA Conference, Goteborg, August
[11] Bondarev E. et al. CARAT: a toolkit for design and performance analysis of component-based embedded systems. Proc. of DATE 2007 Conference, April
[12] M. Åkerholm et al. The SAVE approach to component-based development of vehicular systems. Journal of Systems and Software, Vol. 80, 5, May 2007
[13] Ke X., Sierszecki K. and Angelov C. COMDES-II: A Component-Based Framework for Generative Development of Distributed Real-Time Control Systems. Proc. of 13th RTCSA Conference, August 2007
[14] K. C. Wallnau. Volume III: A Technology for Predictable Assembly from Certifiable Components. Technical report, Software Engineering Institute, Carnegie Mellon University, April 2003, Pittsburgh, USA.
[15] P. López, J.M. Drake, and J.L. Medina: "Real-Time Modelling of Distributed Component-Based Applications" Proc. of 32nd Euromicro Conference on Software Engineering and Advanced Applications, Croatia, August
[16] M. González Harbour, J.J. Gutiérrez, J.C. Palencia and J.M. Drake: "MAST: Modeling and Analysis Suite for Real-Time Applications" Proc.
of the Euromicro Conference on Real-Time Systems, June
[17] OMG: Quality of Service for CORBA Components, ptc/ April 2006
[18] OMG: A UML Profile for MARTE, ptc/ , August 2007
[19] OMG: Ada Language Mapping Specification - Version 1.2. October
[20] L. Pautet and S. Tardieu. GLADE: a Framework for Building Large Object-Oriented Real-Time Distributed Systems. Proc. of the 3rd IEEE Intl. Symposium on Object-Oriented Real-Time Distributed Computing, Newport Beach, USA, March
[21] M. Aldea and M. González. MaRTE OS: An Ada Kernel for Real-Time Embedded Applications. Proc. of the International Conference on Reliable Software Technologies, Ada-Europe 2001, Leuven, Belgium, Springer LNCS 2043, May
[22] J.M. Martínez and M. González. RT-EP: A Fixed-Priority Real Time Communication Protocol over Standard Ethernet. Proc. of the 10th Int. Conference on Reliable Software Technologies, Ada-Europe 2005, York (UK), June 2005
[23] López-Campos, J.-J. Gutiérrez and M. González-Harbour: "The chance for Ada to support distribution and real-time in embedded systems" Proc. of the 8th Intl. Conference on Reliable Software Technologies, Ada-Europe 2004, Palma de Mallorca, Spain, June
[24] J. Medina. Status report of the Ada2005 expected impact on the IDL to Ada Mapping. OMG documents mars/ and mars/

An Architecture to Support Dynamic Service Composition in Distributed Real-Time Systems

Iria Estévez-Ayres, Dpto. Ing. Telemática, Univ. Carlos III de Madrid, Leganés, Madrid, Spain
Luís Almeida, DET / IEETA-LSE, Universidade de Aveiro, Aveiro, Portugal, lda@det.ua.pt
Marisol García-Valls, Pablo Basanta-Val, Dpto. Ing. Telemática, Universidad Carlos III de Madrid, Leganés, Madrid, Spain, {mvalls, pbasanta}@it.uc3m.es

Abstract

Recently, new trends in application development for distributed platforms, such as the composable services model, attempt to provide more flexibility in system design, deployment and execution. Such trends, and particularly the composable services model, can also be beneficial in real-time distributed embedded systems, providing a means to support more adaptive behaviors, reacting to the execution environment or coping with system reconfiguration. This paper explores a relatively new direction, which is the extension of the service-based model to dynamic, i.e. run-time, composition in real-time distributed environments, in order to support the level of flexibility and adaptability referred to above. The paper proposes an architecture to support such dynamic service composition that is based on the Flexible Time Triggered (FTT) communication paradigm. To achieve the desired goal, we also redefine the concepts of service and service-based application in the context of the FTT paradigm. Finally, we show experimental results obtained with a prototype implementation of the proposed architecture that confirm its feasibility and good temporal behavior.

1 Introduction

In the last few years, distributed software systems have become more dynamic, allowing transparent distribution, self-reconfiguration, portability and migration of applications, etc. As a consequence, new application development paradigms have emerged, such as those based on the usage of multiple services dispersed in the environment [22].
These new paradigms, used in conjunction with the composable service model, provide more flexibility in application development and execution. Instead of monolithic applications resident in one single node, it is possible to create applications dynamically from existing services, possibly remote, enhancing the reuse of code and decreasing the development time.

The composable service model can also be advantageously used in the development of distributed real-time embedded systems, identifying services with clear interfaces that are executed according to a specific sequence. The services available, possibly with multiple versions, can be shared among different applications and can change on line, providing a high level of adaptability to varying operational conditions and enhancing the efficiency in the use of system resources [19]. Examples include networked control systems, where different controllers and filters can be seen as services that can be shared and updated on line, and also distributed multimedia systems, in which the different services correspond to filters and encoders/decoders, possibly with different levels of Quality of Service (QoS).

The dynamic composition of services in this kind of application can be interesting not only from the application development point of view, by means of automatic composition of software components, but mainly as a way to support on-line updating and reconfiguration of services, for example to provide dynamic QoS management, load balancing and fault tolerance, as explained next:

Dynamic software and hardware updating: Whenever a new version of a service appears in the system, it is analyzed to check whether the performance of the respective application can be improved by using this new version, according to a pre-specified metric. Depending on this analysis, the system might switch service versions and recompose the applications that use them.
Fault tolerance: Different service implementations are used as backups to ensure the survival of the system if one of the nodes involved fails. In this situation, upon a service failure, a failure detector requests

the removal of the failed service, causing the recomposition of the application with another available version of that service.

Load balancing: Whenever the performance of an application degrades because excessive load in a given node is causing poor performance in one of its services, the system can look for other versions of that service residing in different nodes and currently exhibiting better performance, and invoke the recomposition of the application to exploit those alternatives. This allows maximizing application performance at each moment. Worst-case response time analysis can be used to deduce the current performance level of a given service as a function of the load in the respective node.

As a practical example, consider a vision system for textile inspection based on a parallel computer cluster [3]. Each acquisition node delivers camera frames to one among n computing nodes. However, acquisition nodes are sometimes idle, for example when the fabric roll is changed, and thus the load of the computing nodes varies. Using a compositional framework such as the one referred to above, we can consider the processing of each vision data stream as an application and distribute the load of the computing nodes on line to enhance the performance of the whole system. Another example concerns high-availability systems, such as those involved in energy transportation [6]. A fault tolerance solution using dynamic service composition can provide the required flexibility, reusability and portability at a reasonable cost, instead of an ad hoc solution at application level that lacks such attributes and would be more expensive. Also, solutions at hardware and OS level alone do not solve all the dependability problems of those applications, as indicated by the end-to-end argument [21].
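The three scenarios above (updating, fault tolerance and load balancing) share one mechanism: re-selecting the best available version of a service and recomposing the application. The following is only a minimal sketch of that selection step, assuming that worst-case response time is the pre-specified metric; the class, field and node names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceVersion:
    """One implementation of a service. All names here are hypothetical."""
    service: str    # logical service the version implements
    node: str       # node where this version resides
    wcrt_ms: float  # current worst-case response time on that node


def select_version(versions: List[ServiceVersion], service: str) -> Optional[ServiceVersion]:
    """Return the available version of `service` with the lowest WCRT, or None."""
    candidates = [v for v in versions if v.service == service]
    return min(candidates, key=lambda v: v.wcrt_ms, default=None)


available = [
    ServiceVersion("filter", "node_a", wcrt_ms=4.0),
    ServiceVersion("filter", "node_b", wcrt_ms=6.5),
]

# Normal operation: the best-performing version is used.
initial = select_version(available, "filter")

# Fault tolerance: node_a fails, its version is removed, the backup takes over.
after_fault = select_version([v for v in available if v.node != "node_a"], "filter")

# Updating: a new, faster version appears and is preferred on recomposition.
available.append(ServiceVersion("filter", "node_c", wcrt_ms=2.5))
after_update = select_version(available, "filter")
```

Load balancing fits the same pattern: a new response time analysis changes `wcrt_ms` for the overloaded node, and the next selection picks a different version. A real system would also have to check schedulability before switching, which this sketch leaves out.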
However, in order to be effective for distributed real-time embedded systems, the proposed compositional framework must allow for on-line addition/removal of services and nodes, transparent code mobility, scalability and continued timeliness. This combination of attributes is not trivial to achieve and requires adequate architectural support. Previous work [7, 9] focused on the definition of a framework that allowed the off-line composition of static real-time applications from existing downloadable services, but this framework did not deal with dynamic distributed real-time systems or with the possibility of using remote services. In this paper we define a model and propose an architecture that integrate techniques inspired by emerging models of distributed computing for on-line composition of real-time services, offering the possibility of dynamic reconfiguration of software and hardware. To build this architecture we use the Flexible Time Triggered (FTT) communication paradigm [17], which provides the operational flexibility and the timeliness required to support dynamic composition of real-time systems based on services allocated in nodes connected to a real-time network.

The rest of the paper is organised as follows: Section 2 briefly introduces the background of this work; Section 3 presents the underlying application model; Section 4 describes the proposed approach; Section 5 presents preliminary experimental results; and, finally, Section 6 outlines the main conclusions.

2 Related Work

The component service model has recently been applied to distributed environments such as Ubiquitous Computing and Peer-to-Peer, to develop generic frameworks that provide end-to-end application QoS via QoS-aware middleware systems [16, 10, 11, 13]. However, these systems are not suitable for composing real-time applications, since the specified QoS does not take into account the underlying hardware and the scheduling requirements of the whole system.
Integration of QoS characteristics into component technology has also been studied in [5]. However, these approaches aim at a rather static composition environment and at enterprise applications based on components and not strictly on services.

In the real-time world, component-based software development (CBSD) [4, 12] is an emerging development paradigm that enables the transition from monolithic to open and flexible systems. This allows systems to be assembled from a pre-defined set of components explicitly developed for multiple usages. However, none of these approaches can be directly applied to dynamic composition of real-time components (or, more generally, applications) in a distributed environment, because they are focused on the design phase rather than on the execution phase.

Within the context of reconfigurable real-time systems, Tesanovic's work [23] on reusable software using aspects and components defines a WCET-based composition model for systems that execute in a single node, and is therefore not suitable for distributed applications. Wang's work [24], on the other hand, studies the use of aspects and components in a COTS solution, CIAO, the Component Integrated ACE ORB. This solution is too heavy to be applied to distributed embedded systems based on simple microcontrollers.

Another line of work that bears some relationship with the scope of this paper is the use of multiple modes. In a way, multiple modes also correspond to dynamic reconfigurations. Traditionally, real-time systems are assumed to be formed by a fixed set of tasks that are scheduled to fulfil the application requirements and exhibit a desired behavior. Applications needing to exhibit more than one behaviour are known as multi-moded applications [20]. These applications consist of a fixed number of operating modes, each one producing a different behaviour, implemented by different task sets. The set of modes is established off line and thus, despite the dynamism introduced by allowing on-line mode changes, the overall run-time flexibility is still limited, particularly with respect to the level targeted by the architecture that we propose in this paper. In fact, we focus on run-time flexible systems in which the composition of the applications is not established a priori and can change during the system lifetime due to the arrival/departure of new versions of the services that compose the applications, and due to the possible composition of new applications with the existing services.

More related work can be found in the field of fault tolerance, where active, passive and hybrid replication mechanisms have been developed [15]. The architecture proposed in this paper supports a passive replication approach in which backup services are activated upon the failure of active ones. A particular aspect is that previous approaches seem to have focused on using similar service or component instances, i.e. replicas, as backup elements. In our case, however, the backups do not need to be similar and can even present different levels of performance or QoS. When necessary, the system should use the best backup service available to compose each application. This may support more efficient and thus less expensive fault-tolerant replication mechanisms. N-version software [1] is a particular approach that targets high coverage of software faults. This approach also uses several different versions of each software entity, but it is normally associated with active replication and voting mechanisms that are capable of handling broader fault models but are substantially more expensive and less flexible than the approach proposed in this paper.
In the remainder of the paper we propose and describe an architecture that exploits the FTT paradigm [17] to support dynamic service composition in distributed environments. The composition is driven by specific performance (QoS) metrics defined a priori, thus being QoS aware. The FTT paradigm is particularly suited to support this model because it concentrates in a single node, known as the FTT master, the temporal properties of all message streams and tasks currently running in the system, and distributes to the remaining nodes the schedules generated at run time. Thus, changes in the system configuration are readily and synchronously applied to the whole system. Moreover, most of the system management overhead is concentrated in one node only, i.e., the FTT master, which can be replicated for fault tolerance purposes [8], while the remaining nodes can be relatively simple. This paradigm has been implemented over Controller Area Network (CAN) [8] using simple 8-bit microcontrollers and over Ethernet, both in shared [18] and switched [14] modes. However, the FTT paradigm by itself does not provide the notions of service and application, nor the means for service composition. This extension is the main contribution of this paper.

3 Application model

We consider a distributed system executing recurrent applications that span several nodes. Each application is composed of a set of services, possibly residing in different nodes, that are invoked in a given sequence. Services are materialized in tasks, each with worst-case computation time $C_i^t$ and relative deadline $D_i^t$. Given the recurrent nature of the applications, tasks are also recurrent, with period $T_i^t$. Global synchronization is available, thus supporting the use of a release offset $O_i^t$. Tasks communicate with each other across the system by exchanging messages, which are read at the beginning of their execution and generated at the end, in every periodic instance.
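To make the task parameters concrete, the sketch below stores C, D, T and O in a small record and derives the release instant and absolute deadline of each periodic instance. The record layout, the method names and the sample values are illustrative assumptions; the paper only defines the four parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Recurrent task; all times are in the same (arbitrary) integer unit."""
    name: str
    C: int      # worst-case computation time
    D: int      # relative deadline
    T: int      # period
    O: int = 0  # release offset, meaningful because nodes are globally synchronized

    def release(self, k: int) -> int:
        """Release instant of the k-th periodic instance (k = 0, 1, ...)."""
        return self.O + k * self.T

    def absolute_deadline(self, k: int) -> int:
        """Instant by which the k-th instance must have finished."""
        return self.release(k) + self.D


# Hypothetical task: released at 3, 13, 23, ... with deadlines 10 units later.
task = Task(name="filter", C=2, D=10, T=10, O=3)
releases = [task.release(k) for k in range(3)]
deadlines = [task.absolute_deadline(k) for k in range(3)]
```

Messages can be described with the same four parameters, so the same record could be reused for them.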
We classify tasks in four groups according to the way they interact with one another. A task needing data from other tasks is a consumer; a task that generates data for other tasks is a producer; a task can also be both simultaneously, i.e., a consumer/producer; tasks that do not interact with others are called stand-alone. The tasks in the first three groups are generically called interactive, and their interactions are supported by message passing with multicast, i.e. several consumers may receive a message sent by one producer. There is no control flow between tasks and messages. Their interface is based on data buffers which are read/written by the tasks. Messages are then considered independent entities, triggered autonomously by the network, each with worst-case transmission time $C_i^m$, relative deadline $D_i^m$, period $T_i^m$ and release offset $O_i^m$.

Hence, an application corresponds to an ordered graph of services (tasks) and the respective interactions (messages), as represented in fig. 1(a), which is executed periodically. A corresponding transaction (or data stream, as in [2]) is then defined as the sequence of task executions and message transmissions between the triggering of the first producer and the termination of the last consumer involved in an application. Typically, all tasks and messages in a transaction share the same period, but not necessarily, as in multirate control systems in which inner loops execute at rates that are integer multiples of the lower rate at which an outer loop executes.

If we have multiple implementations (versions) of a task on different nodes, e.g. tasks 2A and 2B in fig. 1(b), the system must decide, using a composer engine and according to an appropriate parameter, e.g., the worst-case execution time (WCET), the worst-case response time (WCRT) or another QoS metric, which service version will be used to compose the application. However, for transparent composition we need a mechanism to hide the existence of different implementations of the same service and make it transparent to the other tasks that interact with that service (fig. 1(c)). Such a mechanism is basically a system manager that controls the execution of tasks and the transmission of messages across the distributed system, including the respective multiple versions, thus inherently playing the role of composer. To implement this mechanism we use the FTT paradigm, which already includes a system manager to control task executions and message transmissions by means of specific trigger messages, and we add the necessary structures to support the composition, namely to describe the applications and their associated services as well as all the service versions available and their properties.

Figure 1. Application model abstraction: (a) Distributed application abstraction; (b) Different versions of service 2; (c) Hiding the different versions

4 Applying the model to FTT

4.1 FTT paradigm principles

The FTT paradigm [17] uses an asymmetric architecture, with one master node controlling multiple slave nodes. The master holds the system requirements (communications and computations) and executes the scheduling policies and on-line admission control. Since the requirements and the operational data and mechanisms are located in just one node, this architecture supports fast and atomic changes to the system configuration. Master redundancy is used to tolerate master crashes.

The scheduling of the tasks and messages is carried out on line in the master, for a time window called the Elementary Cycle (EC). At the beginning of each EC, the master distributes across the system the schedule for that EC, called the EC-schedule, using trigger messages (TMs), fig. 2. The TM (see its structure in table 1) is the only control message sent per EC, and it may trigger the transmission of several slave messages, called synchronous messages, as well as task executions. These messages use indirect source addressing with unique identifiers assigned to all sources. Moreover, the interface between tasks and synchronous messages is carried out by data buffers, as considered in the application model above, without explicit control signals.

Beyond the tasks and messages scheduled by the master, the FTT paradigm also considers the existence of a specific window inside each EC to support the transfer of non-periodic messages, called asynchronous, e.g., related to management actions, signalling and logging (CM and NRTM). These, however, do not interfere with the scheduled, i.e., synchronous, messages (SM).

Figure 2. FTT Elementary Cycle (each EC starts with the TM, followed by the synchronous window and the asynchronous window)

Some important features of the FTT paradigm are:

Slave nodes are not aware of the particular scheduling policy, and the master can change it on the fly, autonomously or on demand;

Slave nodes are synchronized upon the reception of the TM;

All dispatching information per EC is conveyed in the TM;

The master holds all current timing requirements of the system;

Any changes to such requirements are subject to on-line admission control to filter out changes that could jeopardize the system schedulability;

The activation periods, the deadlines and the offsets are integer multiples of the EC duration.

4.2 Temporal model of an application

In the FTT paradigm the transmission of messages and the execution of tasks can be triggered independently of each other by the central scheduler. Alternatively, the scheduler may trigger just the message transmissions, while the associated tasks are triggered by the reception of these messages, using callback functions.
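Because periods and offsets are integer multiples of the EC duration, the central scheduler's triggering decision for a given EC reduces to integer arithmetic. The sketch below illustrates this under strong simplifying assumptions: it ignores the scheduling policy, the limited capacity of the synchronous window and admission control, and simply lists every stream that becomes ready in an EC. The `Stream` record and the `ec_schedule` function are hypothetical names, not part of any FTT implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stream:
    """A synchronous message (or task) known to the master; times are in ECs."""
    ident: int   # identifier that would be carried in the trigger message
    period: int  # activation period, in ECs
    offset: int  # release offset, in ECs


def ec_schedule(streams: List[Stream], ec: int) -> List[int]:
    """Identifiers that become ready, and are therefore triggered, in EC number `ec`."""
    return [
        s.ident
        for s in streams
        if ec >= s.offset and (ec - s.offset) % s.period == 0
    ]


streams = [
    Stream(ident=1, period=2, offset=0),  # ready in ECs 0, 2, 4, ...
    Stream(ident=2, period=4, offset=1),  # ready in ECs 1, 5, 9, ...
]
schedule = [ec_schedule(streams, ec) for ec in range(6)]
```

In this toy run, stream 1 is triggered in ECs 0, 2 and 4, stream 2 in ECs 1 and 5, and EC 3 carries no synchronous traffic. Slaves only need the list for the current EC, which is why they can remain unaware of the policy the master uses to build it.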
In any case, the application transactions are triggered by the central scheduler, which resides in a network component, and thus we call this a network-centric approach [2, 14]. The network-centric approach requires the definition of appropriate message offsets to achieve a low transaction end-to-end delay. In the example shown in fig. 3, messages $m_1$ and $m_2$ are scheduled by the central scheduler and trigger tasks $\tau_2$ and $\tau_3$, respectively. Task $\tau_1$ is triggered by the central scheduler with offset 0 and represents the start of the respective transaction. The offset $O_2^m$ must be larger than, but as close as possible to, $O_1^m + W_1^m + W_2^t$, where $W_1^m$ and $W_2^t$ are the worst-case response times of message $m_1$ and task $\tau_2$, respectively, considering all possible interferences. If a new implementation of task $\tau_2$ becomes available with a shorter $C_2^t$, and thus a shorter $W_2^t$, the central manager can reduce $O_2^m$ accordingly, thus decreasing the transaction end-to-end delay. The worst-case response time analysis must

be executed on line whenever a service is added to or removed from a node. The application composer must then deduce the appropriate offsets, for example using the techniques proposed in [2]. Notice, however, that both the response time analysis and the generation of the required offsets are beyond the scope of this work.

Table 1. Original structure of the Trigger Message

Field             Sub-field   Size      Bits          Values
Type              TM type     2 bytes   [b15, b12]    TM MESG ID
                  Master ID             [b11, b0]     [0, 4096)
TM Flags          Reserv.     2 bytes   No def        No def
                  Num Sec.              [b7, b0]      [0, 256)
Num. synch. msgs              2 bytes   [b15, b0]     [0, 65536)
ID                            2 bytes   [b15, b0]     [0, 65536)
Time Tx                       1 byte    [b7, b0]      [0, 256)
...

Figure 3. A transaction and its timeline

4.3 Proposed architecture

The proposed system architecture is an extension to the FTT architecture, which substantially simplifies the overall design task by reusing and exploiting the operational flexibility features of the FTT paradigm. Moreover, it also allows easy porting to different hardware platforms for which FTT protocol implementations are available. Fig. 4 shows an implementation over FTT-Ethernet. Using other communication platforms, e.g., CAN, just requires replacing the FTT-Ethernet block by an FTT-CAN block, whose interface is nevertheless similar. Basically, the internal architecture of the FTT master is complemented with an extra layer that handles applications and services.
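The layering described above, an application and service layer on top of an interchangeable FTT block, can be pictured with the following sketch. It is an assumption-laden illustration rather than the authors' design: the operation names are modelled on the addmessage(id) and delmessage(id) labels of the paper's architecture figure, while the classes, the in-memory stand-in for the FTT block and the message identifiers are hypothetical. Admission control and atomicity of the change are deliberately left out.

```python
from typing import List, Protocol


class FTTLayer(Protocol):
    """Operations the upper layer is assumed to need from any FTT block."""

    def add_message(self, msg_id: int) -> None: ...

    def del_message(self, msg_id: int) -> None: ...


class InMemoryFTT:
    """Stand-in for an FTT-Ethernet or FTT-CAN block; it only records scheduled ids."""

    def __init__(self) -> None:
        self.scheduled: List[int] = []

    def add_message(self, msg_id: int) -> None:
        if msg_id not in self.scheduled:
            self.scheduled.append(msg_id)

    def del_message(self, msg_id: int) -> None:
        if msg_id in self.scheduled:
            self.scheduled.remove(msg_id)


class Composer:
    """Upper layer: replaces the message of one service version by another."""

    def __init__(self, ftt: FTTLayer) -> None:
        self.ftt = ftt

    def replace(self, old_id: int, new_id: int) -> None:
        self.ftt.del_message(old_id)
        self.ftt.add_message(new_id)


ftt = InMemoryFTT()
composer = Composer(ftt)
ftt.add_message(1)
ftt.add_message(12)
composer.replace(12, 22)  # the identifiers are arbitrary sample values
```

Since `Composer` depends only on the two operations of `FTTLayer`, swapping the network technology would mean supplying another class with the same two methods, which is the portability argument made in the text.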
At the application level, the new layer keeps track of the applications being executed and the services that compose them while, at the service level, it manages the several versions of the available services, their location, worst-case execution/response times, possibly other QoS parameters and the messages involved. All such information is stored in the Application-Service-Nodes Database. When a new service implementation arrives at the system, it notifies the master using an asynchronous message (Lookup service). The master then analyzes the respective worst-case response time as well as the performance and schedulability of the target application and of the whole system (Application Admission Control). If a given QoS parameter is improved, e.g., the application end-to-end delay, and the schedulability of the system is not jeopardized, the master (Composer) makes the necessary changes at the application level to replace the old service implementation with the new one within the respective application transaction. These changes are then passed down to the FTT layer as new messages/tasks to be added or replaced in the FTT System Requirements Database.

Figure 4. Proposed architecture

In order to achieve the transparency referred to in Section 3, the unique identifier of each message is broken into two parts, a service-level identifier and an application-level identifier. The former distinguishes different versions of the same service, while the latter identifies the service within an application. The consumers of a message only check the application-level identifier, i.e., they listen to any service version indistinctly. On the other hand, the master and the producers look at the whole identifier, which allows scheduling one given service implementation among several available.

Figure 5. Example of reconfiguration: (a) Configuration of nodes; (b) New node attach; (c) Reconfiguration

Fig. 5(a) depicts an illustrative simplified scenario in which every node implements only one service and the application spans three nodes. The message identifiers have two digits, the first one being the service-level identifier and the second one the application-level identifier. At run time, the Trigger Message triggers the transmission of the synchronous messages 01 (version 0, service 1) and 12 (version 1, service 2). So, nodes 1 and 2a, which hold the scheduled service versions, transmit their messages over the network in a broadcast fashion. These messages are received by nodes 2a and 3, respectively. The X in the service-level identifier means "don't care".

When a new node joins the network, it uses a reserved asynchronous message to send all its information to the master, fig. 5(b). The master assigns an identifier to each new service version (2b in this case) and decides to replace the other version, 2a. Thus, it starts scheduling message 22 (version 2, service 2) instead of 12, fig. 5(c). The following service in the application, i.e., 3, continues receiving any message that has application-level identifier 2, either 12 or 22, and thus it is not aware of the change.

5 Experimental Results

To verify the practical feasibility of the proposed architecture, we implemented it over the FTT-SE protocol, which uses Switched Ethernet [14]. Then we defined an experimental setup (fig. 6) using computers running RT-Linux. The FTT EC is set to 1 millisecond. The system executes one synthetic application composed of three services, similarly to what is shown in fig. 1(b).

Figure 6. Configuration of nodes (Master, Producer, Intermediate 1, Intermediate 2 and Consumer nodes connected through a 10 Mb/s Ethernet switch; the node labels mention EPIC 266 MHz Intel Mobile Pentium MMX and Intel Celeron 735 MHz machines with 128 MB RAM)

5.1 Two similar implementations of a consumer/producer task

This experiment verifies the switching between two versions of a service that corresponds to a consumer/producer task. The versions reside in the nodes Intermediate 1 and Intermediate 2 and exhibit similar temporal parameters, namely $O = 200$ EC, $T = 1000$ EC, $D = T$. The master switches between those implementations every 10 seconds. The consumer node receives both the producer message and the consumer/producer messages and logs their reception instants (fig. 7). We can see that the switching between both service versions does not cause any significant extra delay. The end-to-end delay experienced by the whole application stays around 25.3 microseconds with a jitter of ±30 nanoseconds and rare peaks of 100 nanoseconds caused by the run-time system, despite the switching between versions. Similarly, the interactivation delay of the consumer task stays steadily near 1 second, as expected, again with a jitter of ±30 nanoseconds. This experiment shows that the proposed architecture is able to switch between service implementations in different nodes, on line, maintaining the same QoS and with the remaining tasks not being aware of the switching.

Figure 7. Experiment 1: Message arrivals at the consumer node (series: Producer, Intermediate Consumer 1, Intermediate Consumer 2; horizontal axis: time)

5.2 Two implementations of a consumer/producer task with different QoS

In this experiment we change a temporal parameter of one of the consumer/producer tasks. One has an offset of $O_{2A}^m = 100$ EC for producing the second message, and the other has $O_{2B}^m = 200$ EC for the same message. This offset has a direct impact on the QoS of the application, leading to different end-to-end delays. We can see (figures 8 and 9) how the end-to-end delay changes from 150 to 250 microseconds every time the service implementation is switched, as expected. The interactivation delay (fig. 10) shows peaks of ±100 milliseconds when the switching occurs, due to the difference between the offsets of the transmissions carried out by the two versions of the intermediate service.

This experiment shows that when we switch between two different profiles, i.e., implementations, of a service, the consumer of the messages produced by that service does not notice the switching, except for an anticipation or delay of its activation, caused by the different timing parameters of the profiles, which lead to a corresponding change in the end-to-end delay. This result shows that the proposed architecture can support on-line updating of code, even when its timing characteristics are changed.

Figure 8. Experiment 2: Message arrivals at the consumer node (series: Producer, Intermediate Consumer 1, Intermediate Consumer 2; horizontal axis: time)

Figure 9. Experiment 2: End to end delay

Figure 10.
Experiment 2: Interactivation delay 6 Conclusions and Future Work Next generation distributed real time systems will require more flexibility, and the ability of reacting and changing their behaviour at run time, i.e. adaptability. Service based approaches can be used in the development of such systems, in order to provide this desired flexibility by means of the definition of a model and an architecture that supportdynamic service composition, allowing the applications to change dynamically the set of services that compose them, supporting for each service the definition of different versions or profiles.these aproaches also offer the possibility of providing dynamic QoS management, load balancing and fault tolerance. This paper proposed an architecture to support dynamic service composition in a distributed embedded real time environment. This architecture allows an application to be composedof existing services dispersed in the network, and to dynamically switch between different versions of any given service in a transparent way with respect to service location, timing features or set of consumers of its messages. The architecture is built as an extension to the FTT paradigm. This extension required the introduction of the 7

137 4. Sistemas Distribuidos 129 notion of service and service based application in the FTT scope.also, a prototype of the proposed architecture was implemented as proof of concept, and the experimental results show that the architecture is able to switch between service implementations in different nodes, without affecting the remaining services in the application, and that it can support on line updating of code, even when its timing characteristics are changed. Future work will consider the integration of offset configuration tools and schedulability analysis that can be applied on-line. Acknowledgements This work has been partially funded by e-magerit (S- 0505/TIC/0251), project funded by the Education Council of the Goverment of the Region of Madrid, the European Social Fund, and the European Regional Development Fund; also by a grant from IEETA, Universidade de Aveiro, Portugal; and by the European Commission (ARTIST2 NoE, IST ). References [1] A. A. Avizienis. The Methodology of N-Version Programming. In C. Lyu, editor, Software Fault Tolerance, pages John Wiley & Sons Ltd, [2] M. Calha and J. Fonseca. Data streams an analysis of the interactions between real time tasks. In ETFA 2005, 10th IEEE Conference on Emerging Technologies for Factory Automation, pages , Catania, Italy, September [3] J. Cano, J. Perez-Cortes, J. Valiente, R. Paredes, and J. Arlandis. Textile Inspection with a Parallel Computer Cluster. In 5th International Conference on Quality Control By Artificial Vision. QCAV-2001, Le Creusot (France), [4] I. Crnkovic and M. Larsson. A case study: Demands on Component based Development. In Proc. of 22nd Int. Conf. of Software Engineering, Limerick (Ireland), June [5] M. A. de Miguel, J. Ruiz, and M. García-Valls. QoS Aware Component Frameworks. In Proc. of the International Workshop on Quality of Service, May [6] G. Deconinck, T. A. V. Vincenzo De Florio, and E. Verentziotis. The EFTOS Approach to Dependability in Embedded Supercomputing. 
IEEE Transactions on reliability, 51(1):76, March [7] I. Estevez-Ayres, M. Garcia-Valls, and P. Basanta-Val. Enabling WCET based Composition of Service based Real Time Applications. ACM SIGBED Review, 2(3):25 29, July [8] J. Ferreira, L. Almeida, J. A. Fonseca, P. Pedreiras, E. Martins, G. Rodriguez-Navas, J. Rigo, and J. Proenza. Combining operational flexibility and dependability in FTT-CAN. IEEE Transactions on Industrial Informatics, 2(2):95 102, May [9] M. García-Valls, I. Estévez-Ayres, P. Basanta-Val, and C. Delgado-Kloos. CoSeRT: A Framework for Composing Service Based Real Time Applications. In C. Bussler and A. Haller, editors, Business Process Management Workshops, volume LNCS 3812, pages Springer, [10] X. Gu and K. Nahrstedt. Distributed Multimedia Service Composition with Statistical QoS Assurances. IEEE Transactions on Multimedia, 8(1): , February [11] X. Gu and K. Nahrstedt. On Composing Stream Applications in Peer-to-Peer Environments. IEEE Transactions on Parallel and Distributed Systems, 17(8): , August [12] D. Isovic and C.. Norström. Components in Real time Systems. In Proc. of the 8th Conf. on Real Time Computing Systems and Applications, Tokyo, [13] J. Jin and K. Nahrstedt. A Distributed Approach for QoS Service Multicast with Geometric Location Awareness. IEEE Distributed Systems Online, 4(6), June [14] R. Marau, L. Almeida, and P. Pedreiras. Enhancing Real Time Communication over COTS Ethernet switches. In WFCS 2006, IEEE 6th Workshop on Factory Communication Systems, Turin, Italy, June [15] S. Mullender, editor. Distributed systems. Addison Wesley, 2nd edition, Chapters 7 and 8. [16] K.. Nahrstedt, D. Xu, D. Wichadakul, and B. Li. QoS Aware Middleware for Ubiquitous and Heterogeneous Environments. IEEE Communications Magazine, 39(2): , Nov [17] P. Pedreiras and L. Almeida. The Flexible Time-Triggered (FTT) Paradigm: An Approach to QoS Management in Distributed Real-Time Systems. In IPDPS 03: Proc. 
of the 17th International Symposium on Parallel and Distributed Processing, April [18] P. Pedreiras, P. Gai, L. Almeida, and G. Buttazzo. FTTethernet: a flexible real-time communication protocol that supports dynamic QoS management on Ethernet-based systems. IEEE Transactions on Industrial Informatics, 1(3): , August [19] D. Prasad, A. Burns, and M. Atkin. The Measurement and Usage of Utility in Adaptive Real Time Systems. Journal of Real Time Systems, 25(2/3): , [20] J. Real and A. Crespo. Mode Change Protocols for Real- Time Systems: A Survey and a New Proposal. Real Time Systems Journal, 26(2): , March [21] J. Saltzer, D. Reed, and D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems, 2(4): , [22] M. Satyanarayanan. Pervasive computing: vision and challenges. IEEE Personal Communications, 8(4):10 17, August [23] A. Te sanovic, D. Nyström, J. Hansson, and C. Norström. Aspects and components in real time system development: Towards reconfigurable and reusable software. Journal of Embedded Computing, Feb [24] N. Wang, C. D. Gill, D. C. Schmidt, and V. Subramonian. Configuring Real Time Aspects in Component Middleware. In R. Meersman and Z. Tari, editors, CoopIS/DOA/ODBASE (2), volume LNCS 3291, pages Springer,


5. Development Tools


JAVA PLATFORM FOR THE DEVELOPMENT OF EMBEDDED APPLICATIONS WITH TIMING CONSTRAINTS

Jaime Viúdez, Juan A. Holgado
Departamento de Lenguajes y Sistemas Informáticos, Universidad de Granada
{jviudez, jholgado}@ugr.es

Abstract

Java brings undeniable benefits to application development and maintenance, making it easier to implement flexible, robust, reusable and secure programs. In addition, its portability in theory guarantees execution on different platforms without recompiling the programs. However, all these desirable properties come at a price, paid in performance, in a lack of predictability and in excessive memory requirements, which is unacceptable for applications that must run on embedded environments with limited processing and memory resources and with real-time requirements. This paper presents the development of a new Java platform, which we have named JAHASE (Java para el Control del Hardware de Sistemas Empotrados, i.e. Java for the hardware control of embedded systems), that eases the development of applications on a range of commercial microcontrollers and embedded systems that use Java as their main programming language. The platform solves the portability problem present in small-scale systems and adds new mechanisms and functionality, some of them inspired by the RTSJ specification, giving the developer a set of utilities such as unified management of the I2C, SPI and 1-Wire digital buses, or the asynchronous event model. Since the JAHASE platform can be used on different commercial embedded offerings, its behaviour depends on the specific characteristics of the embedded device and of the runtime environment.
For this reason, JAHASE includes a component for implementing tests or experiments that measure more directly the performance offered by the embedded device.

1 INTRODUCTION

The development of embedded devices requires a high level of integration between hardware and software components in order to obtain a product with a low unit cost that allows mass production [4]. The developer must select the most appropriate hardware components according to the needs for dependability, cost, performance, power consumption, lifetime, etc., from a range of 8-, 16- or 32-bit microprocessors, DSPs (digital signal processors) or ASICs (application-specific integrated circuits), as well as memory modules, input/output devices, counters, etc. The developer must also integrate the most suitable software components, which means installing an executive or a runtime environment that eases the later loading of the final application, and choosing a flexible and safe programming language that allows a program to be optimized for the specific needs of the application and the characteristics of the hardware platform.

Most applications for embedded environments are still built using a mix of C and assembly language. In this way, developers can implement simpler programs that achieve maximum performance for the capabilities of the device, with minimum memory and energy consumption. However, the use of unsafe procedural languages such as C has a negative impact on software development, especially for complex applications: development becomes a tedious and error-prone task, and the resulting software is very hard to maintain and reuse [16].
Java is a modern object-oriented programming language that has been applied successfully to desktop and enterprise applications, and it has recently been gaining the attention of embedded systems developers [10][17-18][28]. Java provides the infrastructure and mechanisms needed to develop robust, flexible, secure and reusable applications that, in theory, can run on any hardware platform thanks to its portability. Java can therefore be an interesting alternative for the design of embedded systems, since it can simplify the design, development, testing and maintenance of complex applications.

However, all these desirable characteristics require sacrificing other properties that can be very important for a specific application running on an embedded system with limited memory and processing resources, such as performance, temporal determinism or memory consumption. Several solutions have been adopted that partly overcome some of these difficulties, but none of them can be considered a standard solution [17-18, 20, 28]. Each of these proposals provides a Java platform that, partly following some of the standard specifications, imposes restrictions on the Java virtual machine (JVM) on which the Java applications run, and adds new mechanisms and facilities for embedded application development by including new proprietary libraries that generally do not conform to any standard specification. When some of these proposals are compared with each other, we find different mechanisms or even different programming models for performing the same functions, as happens with timer management. The divergences between the libraries can cause incompatibilities between the embedded solutions, which the application developer must watch for.
For this reason, this paper proposes the development of a new software platform, JAHASE (JAva para el control del HArdware de Sistemas Empotrados), with three objectives: (a) to ease the portability of Java programs across embedded systems even when they have different JVMs, (b) to offer embedded developers new facilities such as watchdog services, different types of timers, logging services, memory access, communications, etc., and (c) to provide a test bench that eases the measurement of the actual performance of the embedded environment in use. The platform consists of a set of packages and classes that provide a uniform, coherent and transparent framework for working with the different embedded environments and with the facilities they offer, such as access to digital or analog input and output ports, bus interfaces (I2C, CAN, 1-Wire, SPI), timers or interrupt handlers. This software platform is an important component of the JOVIM middleware (Java Open Virtual Instrument Middleware) [8], and it is used to control a home-automation scale model [31], as well as in the development of an open architecture for home-automation control [9].

2 JAVA FOR EMBEDDED SYSTEMS

Developing applications for embedded environments requires knowledge of the hardware characteristics of the embedded environment, the runtime environment used, and the programming language in which the application will be implemented. Despite the benefits that Java offers as a programming language, some of its particularities can cause difficulties when developing applications for embedded systems. Below we analyse some of these characteristics, focusing on aspects of Java that can compromise performance and the fulfilment of timing constraints.
2.1 JVM EXECUTION MODELS

The Java virtual machine (JVM) is the core of the Java paradigm and determines the performance of Java programs. To run a Java program, it must first be compiled into a portable binary format (a neutral intermediate code called bytecode) contained in .class files, so that the bytecodes in those files can then be executed by the JVM. The JVM not only translates the bytecodes into native instructions, it also verifies all bytecodes before their execution. There are four JVM execution models:

a) Interpreted: this is the classic execution mode, in which the JVM interprets the bytecodes of the .class file one at a time and executes the corresponding native operation or sequence of operations on the software platform of the device (its runtime environment). This mode satisfies the portability goal of the Java philosophy, but it also gives the worst performance, so this model is not currently used in JVM implementations.

b) Just-in-Time (JIT): bytecodes are translated to native code the first time a method is invoked or a class is loaded. The resulting native code sequence is kept in RAM for fast execution the next time the same bytecode is called. This mode is faster than the previous one, but it requires a significant amount of RAM to hold both the Java code and its corresponding native code. For this reason there are variations of the JIT algorithm, such as dynamic adaptive compilation (DAC) [6]. Most commercial virtual machines follow this approach, such as Sun's HotSpot and IBM's J9.

c) Ahead-of-Time (AOT) or cross compilation: the Java bytecodes, or even the Java source code itself, are translated directly into the binary format of the device's native platform using a cross compiler. This method is similar to the one used by C/C++ applications with shared libraries. The resulting executable can include the object file plus all precompiled libraries in a single file, or the precompiled libraries can be linked at run time. In either case, it can only be executed by the specific runtime environment for which it was built. This mode achieves higher performance with less RAM, but the portability, platform independence and flexibility of Java are sacrificed. This is the model most frequently used in virtual machines for embedded devices, for example JAMAICA, which complies with the RTSJ specification [23], LEJOS, JELATINE [1], etc.

d) Java processor: this mode is the fastest, since the bytecodes are interpreted and executed directly by the processor hardware, without any translation or even an underlying operating system. There are two ways to execute bytecodes in hardware: with a Java processor [7] or with an accelerator chip [29]. In the first, the general-purpose processor is replaced by a specific hardware processor that executes JVM bytecodes as its native instruction set, so it can only execute Java bytecodes. In the second, a coprocessor is added alongside the general-purpose microprocessor and translates the Java bytecodes into instruction sequences for the general-purpose processor. With this type, the same hardware can be used to execute mixed code.
There are several Java processors with different implementations of this approach, such as the aJile JEM2, Cjip, etc.

2.2 STANDARDIZATION OF THE LANGUAGE AND THE LIBRARIES

The Java language rests on two basic pillars: (a) the specification of the language and the libraries, and (b) the implementation of the language. The specification of the language and the libraries defines the elements and structure of the language, establishing both its syntax and its semantics. Functionality is obtained through the API libraries, which are organized into a set of packages. In order to control the development and evolution of the language and the libraries, the standardization of the specifications is regulated through the Java Community Process (JCP), in which specialized groups of companies and academic organizations can develop new library (API) specifications for specific application fields and for emerging technologies, such as Bluetooth (JSR-82), wireless technology (JSR-185), real-time programming (JSR-1), Java TV (JSR-927), etc. The implementation of the language, in turn, involves implementing the JVM on which the Java applications will run, as well as the libraries, which may include native code, in compliance with the standard specification they satisfy.

In the market for embedded Java devices there is a wide range of products that satisfy some of the following specifications:

- Java Card [12]. The first version of this specification was introduced in 1996. It is aimed at applications for small smart cards with 8/16/32-bit processors and strong memory restrictions, in the range of 1 KB of RAM and 16 KB of ROM.
- PersonalJava [18]. Introduced in 1997 and compatible with JDK 1.1.8, now discontinued, it targets embedded devices with 2.5 MB of ROM and at least 1 MB of RAM. This specification has been replaced by J2ME/CDC. However, some manufacturers still sell embedded systems with libraries that satisfy it, such as TINI [25].
- EmbeddedJava. Introduced in 1998, it is compatible with the discontinued JDK 1.1 and targets low-end devices with 512 KB of ROM and 512 KB of RAM without graphical interfaces. This specification has been replaced by J2ME/CLDC.
- J2ME/CLDC (JSR-30) [11]. Introduced in 2000, its natural market is devices with wireless connectivity, such as mobile phones, pagers, etc., and it has become the de facto standard for most embedded Java devices, including TV sets, etc. It requires at least 192 KB of memory and a processor of [...] bits.
- J2ME/CDC (JSR-36) [11]. Introduced in 2001, it replaces the PersonalJava specification for embedded devices with at least 2 MB of memory and 32-bit processors. It mainly targets PDAs and embedded devices with more resources.
- RTSJ (JSR-1) [13]. The specification for programming real-time applications was introduced in 2001. It extends the Java language, without modifying its semantics for threads that have no real-time characteristics, to support real-time tasks with schedulers that can be changed depending on the type of real-time system. RTSJ was designed to extend the J2SE specification (not J2ME), so one of its main problems is the large amount of memory its implementations need, which hinders its use in embedded devices, especially low-end ones. For example, the Timesys reference implementation requires 2.6 MB of memory for the JVM plus 2.5 MB for the libraries.
- JSR-302. Also known as Safety Critical Java Technology, it is a specification based on JSR-1 (Real-Time Specification for Java) that groups a minimal set of features for systems with safety-critical requirements (valid for embedded systems). This specification is still under development.

Using an implementation of an API library that conforms to one of the specifications above on an embedded system determines the Java semantics and the programming possibilities available. Manufacturers of embedded devices overcome some of Java's weaknesses by developing new libraries that complement the mechanisms and means offered to embedded systems developers.
Therefore, on the one hand, manufacturers advertise the compatibility of their embedded devices with standard Java specifications (such as PersonalJava), and on the other hand they include proprietary libraries to complete the software development environment for the devices.

GARBAGE COLLECTOR

The garbage collector (GC) is a fundamental element of the JVM. It relieves the programmer of the task of releasing the resources (objects) that are no longer in use. This great benefit comes with the following problems:
- Overhead, because its action is usually expensive in computation time.
- Non-determinism, because it can reclaim the memory of objects that are no longer reachable at any moment, invalidating the deadlines of other tasks.
- Memory fragmentation: Java applications frequently create temporary objects, which leaves holes in memory when they are freed.

There are several ways to address these problems:
- Implement new garbage collection algorithms that are more efficient, preventive, that optimize memory resources, etc. There is a new generation of real-time garbage collectors.
- Define new memory models, so that the GC acts only on temporary objects and not on objects that need to persist for longer.
- Disable the GC in the applications.

CONCURRENCY MODEL

Concurrency allows several threads to run, which simplifies the writing of a program by separating it into independent tasks. However, implementing concurrent applications in Java requires synchronization mechanisms to control access to shared data under mutual exclusion. The concurrency support of standard Java has weak semantics that are not sufficiently specified.
Thus, there is no guarantee that the highest-priority threads are always the ones running, or that threads of equal priority run in fixed time slices [33]. In most cases this is addressed by including in the runtime environment a real-time operating system that hosts a Java virtual machine using the native thread model, i.e. one that maps Java threads to operating system threads. However, this is not enough to guarantee real-time behaviour. The other, more reasonable solution is to include an RTSJ virtual machine, since this specification defines a safe concurrency model together with schedulers that make it possible to guarantee that the timing constraints of the running tasks are met. Concurrency support is not always necessary, and in some cases it is not used because the embedded system has too few resources for a complete runtime environment. A widely used alternative in embedded systems is the technique based on cyclic executives.
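As an illustration of the cyclic-executive technique mentioned above, the following is a minimal sketch in Java, not code from JAHASE or from any of the platforms discussed. The class name `CyclicExecutive` and its methods are hypothetical, and the sketch uses current Java syntax for brevity. It runs a fixed table of frames in order on a single thread and counts frames that overran their period. `Thread.sleep` gives no hard real-time guarantee, so on a real device the wait would rely on whatever timer facility the platform offers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-threaded cyclic executive: a fixed table of frames run in order. */
final class CyclicExecutive {
    private final List<Runnable[]> frames = new ArrayList<>();
    private final long frameNanos;

    CyclicExecutive(long frameMillis) {
        this.frameNanos = frameMillis * 1_000_000L;
    }

    /** Adds one frame; its tasks run back to back at the start of the frame. */
    void addFrame(Runnable... tasks) {
        frames.add(tasks.clone());
    }

    /** Runs the whole table majorCycles times and returns the number of frame overruns. */
    int run(int majorCycles) {
        int overruns = 0;
        long nextRelease = System.nanoTime();
        for (int cycle = 0; cycle < majorCycles; cycle++) {
            for (Runnable[] frame : frames) {
                for (Runnable task : frame) {
                    task.run();
                }
                // Absolute release times avoid accumulating drift from frame to frame.
                nextRelease += frameNanos;
                long slack = nextRelease - System.nanoTime();
                if (slack <= 0) {
                    overruns++;          // the frame took longer than its period
                    continue;
                }
                try {
                    Thread.sleep(slack / 1_000_000L, (int) (slack % 1_000_000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return overruns;     // stop early if the thread is interrupted
                }
            }
        }
        return overruns;
    }
}
```

A table with two frames, for example `exec.addFrame(readSensors, control); exec.addFrame(readSensors);`, runs `readSensors` in every frame and `control` in every second frame, with no threads, priorities or locks involved.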

DYNAMIC CLASS LOADING

Dynamic class loading is a facility unique to Java that allows Java components to be loaded at run time. It allows classes to be loaded regardless of their location and type (applets, servlets, beans). The steps carried out are the following:
- Find and load the bytecodes associated with the new class (lazy class loading).
- Verify the format of the .class file.
- Link the class with the data structures managed by the JVM.
- Prepare and initialize the class.
- Execute the constructor to instantiate the new object.

The problem with dynamic loading is that it penalizes the execution of applications and requires a resource consumption that can be excessive. For this reason, some embedded Java systems do not support this functionality, given the scarce resources available. The following alternatives can be used to solve this problem: (a) static class loading, in which a binary is generated (similar to a compiled C program), or (b) defining new, lighter class-loading mechanisms.

HARDWARE ACCESS

The Java security model does not allow direct access to the hardware of the target device, because Java has no pointers, only references. This gives greater safety and control over the programs that are running. It is therefore not possible to control directly the contents of hardware registers or memory locations, which makes it impossible to implement drivers for input/output devices in pure Java. The hardware can only be accessed using Java native methods through the JNI (Java Native Interface) extension, which allows calls to external procedures written in other programming languages such as C or assembly language.
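The JNI route described above can be sketched as follows. This is only an illustration under stated assumptions, not the API of JAHASE or of any vendor library: the interface `RegisterAccess`, the classes `NativeRegisterAccess`, `SimulatedRegisterAccess` and `Registers`, and the library name `regaccess` are all hypothetical. Only the `native` keyword and `System.loadLibrary` are standard Java. The native side (the C or assembly functions) is not shown, so the JNI-backed class works only where such a library exists; the simulated class lets the same driver code run on a host.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstraction: application code reaches registers only through this interface. */
interface RegisterAccess {
    int read(int address);
    void write(int address, int value);
}

/** JNI-backed variant. The native functions would be written in C or assembly. */
final class NativeRegisterAccess implements RegisterAccess {
    static {
        // "regaccess" is an assumed library name; it is loaded only when this class is first used.
        System.loadLibrary("regaccess");
    }

    private static native int nativeRead(int address);
    private static native void nativeWrite(int address, int value);

    @Override public int read(int address) { return nativeRead(address); }
    @Override public void write(int address, int value) { nativeWrite(address, value); }
}

/** Pure-Java stand-in that models 8-bit registers, useful for host-side testing. */
final class SimulatedRegisterAccess implements RegisterAccess {
    private final Map<Integer, Integer> registers = new HashMap<>();

    @Override public int read(int address) { return registers.getOrDefault(address, 0); }
    @Override public void write(int address, int value) { registers.put(address, value & 0xFF); }
}

/** Driver-style helper written once against the interface. */
final class Registers {
    private Registers() { }

    /** Read-modify-write that sets a single bit (0..7) of an 8-bit register. */
    static void setBit(RegisterAccess hw, int address, int bit) {
        hw.write(address, hw.read(address) | (1 << bit));
    }
}
```

Keeping the native calls behind a small interface confines the platform-specific part to one class, so a helper such as `Registers.setBit(hw, 0x10, 3)` is written once and works with either implementation.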
However, the efficiency of this mechanism depends heavily on the runtime environment, and its performance is very poor. For this reason, most manufacturers of embedded devices provide their own interfaces to native methods to improve the overall performance of applications.

MEMORY MANAGEMENT

Java relieves the programmer of the responsibility for allocating and freeing the memory needed by an application: it does not allow direct access to memory, and it limits memory use to the heap. Memory is allocated by the JVM as it is needed when classes are instantiated, while deallocation is performed automatically by the garbage collector at specific instants, once the created objects are no longer reachable. Memory management must be kept under control in applications with real-time constraints, since it can be one of the causes of non-determinism. In Java, the garbage collector can interfere and cause missed deadlines, so solutions that control the garbage collector must be adopted. One possible solution to prevent the effect of the garbage collector is to remove it from the JVM, in which case either a mechanism to free the occupied memory must be added, or the memory is not freed but overwritten. Another solution is to define an object pool, so that the same memory area can be reused without the intervention of the garbage collector, because the pooled objects never become unreachable. In RTSJ this is solved by defining new memory areas such as immortal memory and scoped memory.

3 JAVA EMBEDDED DEVICES

This section describes some commercial Java embedded devices. The selected devices meet the following restrictions.
First, they may include 8-, 16- or 32-bit processors, but they must be commercial solutions with a unit cost known through their web sites or distributors. Second, the main programming language used for the devices must be Java. Third, the devices must not belong to the domain of mobile devices with wireless connectivity, such as PDAs, pagers or mobile phones. Table 1 summarizes the particular characteristics of the embedded systems studied.

3.1 JAVELIN STAMP

The Javelin Stamp [14] is a small board from Parallax Inc. intended to make the development and deployment of small prototype systems simple. It includes analog-to-digital (A/D) and digital-to-analog (D/A) converters and generic input/output (I/O) pins. It supports a subset of the 1.2 specification. It does not support multithreading, garbage collection or interfaces, so it is not standard.

3.2 AJILE

The aJile JEMcore [2] is a direct-execution Java processor (a hardware implementation of the JVM). Developing and deploying applications is complex, since it requires handling and configuring specific applications such as JemBuilder and the Charade loader/debugger.

3.3 TINI

Tiny InterNet Interface [30] is a hardware-software specification for embedded platforms developed by Dallas Semiconductor (now Maxim Integrated Products). It partially supports Sun's JDK, since it has neither dynamic class loading nor object finalization. Developing and deploying applications requires specific tools (TINI SDK), which generate a proprietary file (.tini), because it does not support the direct execution of Java (.class) files as aJile does.

3.4 SNAP

SNAP (Simple Network Application Platform) [24] is a board from Imsys Tech based on the Cjip Java processor. It plugs into a carrier board that provides additional hardware and is compatible with the TINI hardware, and it is able to execute Java bytecodes directly. The Cjip supports and conforms to J2ME-CLDC. It includes an implementation of the MIDP profile together with other specific class packages. Applications are programmed and downloaded via telnet and FTP.

3.5 EJC

EJC (Embedded Java Controller) [25] is a general-purpose embedded Java platform designed by Snijder Micro Systems and aimed at Java applications with an Internet connection, like TINI.
The hardware is based on a controller of the EC200 family, and its JVM follows a JIT execution model, in which Java bytecodes must first be translated to native code (by the JVM) before they are executed, as in TINI and Javelin Stamp. It supports and conforms to the PersonalJava specification, version 1.2. In addition, Snijder includes the com.snijder package for handling low-level features. Applications are programmed and downloaded via telnet and FTP.

4 JAHASE PLATFORM

4.1 GENERAL DESCRIPTION

JAHASE is a Java-based software platform that provides a framework for accessing the hardware of embedded systems in a uniform and simple way, hiding the complexity and heterogeneity that arise across manufacturers. This programming environment lets embedded application programmers manipulate the hardware independently of its architecture. Applications are thereby decoupled from the embedded device in use, so the hardware can be changed or upgraded as the needs of the embedded application change. JAHASE also gives the programmer high-level functionality that may be absent from the underlying hardware, such as timer or watchdog management, or from the underlying JVM, such as memory allocation that excludes garbage collector intervention. The benefits are numerous: extensibility, uniform treatment of the hardware, application portability, a simpler programming model, and the ability to increase the number of embedded systems controlled simply and quickly.

4.2 FEATURES OF JAHASE

This framework offers a set of features and functions that ease the programming of embedded and real-time applications.
In summary, the main features are the following:
1. Uniform treatment of all the inputs/outputs of the system as a whole, regardless of whether they are attached to digital buses through port expanders or belong to the microcontroller itself.
2. Uniform access to the different types of digital buses available on an embedded platform, for example I2C, SPI or 1-Wire.

Table 1. Embedded system features summary

Feature | Javelin | Ajile | Snap | EJC | TINI
Resources (Speed) | ~25 MHz | 100 MHz / 66 MHz | (value lost) MHz | 74 MHz | 75 MHz
Processor | RISC Ubicom SX48AC | ajile aj-100 / aj-80 | Imsys Cjip | Cirrus Logic EP7312 (+ slave PIC 16LF872) | Maxim/Dallas Semiconductor DS80C
Java Execution Model | JVM, Java interpreter | Direct on-chip Java bytecode execution | Direct on-chip Java bytecode execution | JVM, Java interpreter | JVM, Java interpreter
Concurrence/Synchronization | limited (1) | Standard (2) | Standard | Standard | Standard
Performance | low | high | normal | high | normal
Efficiency | low | high | normal | high | normal
Real-Time Support | Real-time capable (low-level programming), no schedulers | Real-time threads and scheduling schemes (Piano roll, Periodic threads) support | Real-time OS, deterministic timers support | Real-time OS, thread RTOS-Java priority mapping support | Real-time OS, but no Java real-time support
Latency | high | low | high | low | high
Reliability | low | high | medium | high | medium
Standardization | low | medium | high | medium | high
Ease of Development | medium | low | medium-high | high | medium-high
Flexibility | low | medium | medium | medium | medium
Portability | none | none | none | none | none
Scalability | low | high | medium | high | medium
Integration | low | medium | high | medium | medium
JNI | none | none | Partially supported | none | TNI (3)

Rows whose cell-to-column alignment is not recoverable in this copy (cell values as printed):
- Resources (Memory): 1 KB on-chip, 8 MB external RAM, 2 MB Flash; 8 KB on-chip, 64 MB external RAM, 16 MB Flash; 64 KB on-chip, 1 MB external RAM, 1 MB Flash; 32 KB RAM, 32 KB program EEPROM; 32 KB on-chip, 2-8 MB external RAM.
- Power: 150 mW; 260 mW; 1700 mW (with active Ethernet); 120 mW.
- Internal and external connectivity: 16 I/O pins, PWM, RS…; … I/O pins, SPI, I2C, CAN, 1-Wire, Ethernet, RS-232, IrDA; 8 I/O pins, SPI, I2C, CAN, 1-Wire, Ethernet, RS-232; 15 I/O pins, I2C, 1-Wire, Ethernet, RS…; …Wire, CAN, Ethernet, RS-232.
- Java Standard Compliance: Non-standard (Java based); J2ME CLDC compliant; J2ME CLDC, MIDP profile compliant; PersonalJava compliant; Java 1.2 compliant subset; Java subset support (not compliant).
- Security vs HW Access: no values survive.

(1) Javelin supports programming Virtual Peripherals that run in pseudo-concurrency, but it does not support multithreaded programming.
(2) Ajile has some additional features such as PeriodicThread, PeriodicTimer, PianoRoll, etc.
(3) TNI: Tini Native Interface, a subset of JNI.

3. Communication abstraction: communication is performed independently of the underlying technology (e.g. RS232, Ethernet, etc.).
4. Uniform access to files in RAM or flash memory, with persistent storage when the embedded system supports a file system.
5. Logging. Chronologically ordered entries can be written to logs while applications run, with different logging levels (Warning, Error, Info, etc.). Logs can be written to memory, to the console or to persistent storage.
6. Object pool. Objects can be reused to optimize memory usage.
7. Virtual memory access controller. It creates memory spaces that can be accessed as if they were physical memory.
8. Generic asynchronous event model that unifies interrupts and software events, and allows the functionality of the associated handlers to be extended.
9. Software and hardware watchdog. A given task, or a system reset (the classic behaviour), can be triggered when the application blocks inside the region protected by the watchdog.
10. Uniform access to the system clock. The real time of the system can be obtained generically.
11. Timers: one-shot and periodic. They ease the programming of periodic and non-periodic tasks.
12. Stopwatches and utilities for measuring worst-case execution time (WCET). They make time measurements simple.
13. Performance measurement through a set of tests that the user can customize.

Several objectives were taken into account in the development of JAHASE.
First, advanced design techniques were used, applying design patterns such as Factory, Adapter and Proxy [15], which ease both the development and the maintenance of the platform. Second, the classes implemented in JAHASE are optimized to reduce memory consumption and increase execution speed. Third, a generic scheme based on class hierarchies was followed, so that new classes can be added to extend JAHASE to new embedded environments.

The JAHASE programming environment follows the host-target scheme commonly used to develop applications for embedded systems. In a first stage the Java application is compiled through JAHASE, which includes all the common, generic packages of the platform, checks for errors, and then adds the classes specific to the chosen embedded system. The vendor-specific scripts and tools are then invoked to create the application image (bytecodes in some cases, a binary file in others, etc.), depending on the execution model of the JVM.

4.2 JAHASE ARCHITECTURE

Two layers can be distinguished in the JAHASE architecture: a low-level layer that abstracts the underlying hardware and makes low-level access uniform, and an upper layer that contains the high-level components and services, which give more flexibility when developing applications on this platform. Figure 1 shows both the component level and the hardware access level of the JAHASE architecture. The components currently supported are:
1. Access to digital buses.
2. Logging services.
3. Memory access utilities.
4. Time services.
5. Communications.
6. Input/output utilities (hardware).
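The two-layer split can be illustrated with the time services component. The following is a minimal sketch written for a desktop JVM, not JAHASE code: the class names SystemClock, HostClock, FakeClock and Stopwatch are hypothetical, and the real platform may organize its time services differently. It only shows how an upper-layer service can be written against a generic class while each platform supplies its own low-level implementation.

```java
/** Generic layer: what every platform must provide (hypothetical name). */
abstract class SystemClock {
    /** Current time in milliseconds, as reported by the platform. */
    abstract long nowMillis();
}

/** Hardware-specific layer: implementation for a desktop JVM. */
final class HostClock extends SystemClock {
    @Override
    long nowMillis() {
        return System.currentTimeMillis();
    }
}

/** A second "platform": a manually advanced clock, useful for tests. */
final class FakeClock extends SystemClock {
    private long now;

    FakeClock(long start) {
        now = start;
    }

    void advance(long millis) {
        now += millis;
    }

    @Override
    long nowMillis() {
        return now;
    }
}

/** Upper-layer service: depends only on the generic SystemClock. */
final class Stopwatch {
    private final SystemClock clock;
    private long startedAt;

    Stopwatch(SystemClock clock) {
        this.clock = clock;
    }

    void start() {
        startedAt = clock.nowMillis();
    }

    long elapsedMillis() {
        return clock.nowMillis() - startedAt;
    }
}

final class LayeringDemo {
    public static void main(String[] args) {
        FakeClock clock = new FakeClock(1000);
        Stopwatch watch = new Stopwatch(clock);
        watch.start();
        clock.advance(250);
        System.out.println("elapsed ms: " + watch.elapsedMillis());
    }
}
```

Replacing FakeClock with HostClock changes nothing in Stopwatch, which is the property the architecture aims for when the target hardware is swapped.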

Fig. 1. JAHASE architecture. Elements shown: Java application; JAHASE components (DigitalBuses, Logging, Memory, Time, Com, CPU), with I2C, SPI, 1-Wire and CAN under DigitalBuses; the uniformity and abstraction layer; the vendor libraries (Javelin Stamp, EJC, TINI, JPro, SNAP); J2ME, J2SE, RTSJ; JVM.

Each component has an associated package that groups all the classes, packages and interfaces related to that component, which gives the platform a higher degree of structure. The JAHASE package contains the generic services and the hardware-independent high-level functionality, whereas the HDK (Hardware Development Kit) package contains the specific implementations for each type of supported hardware. The classes in the HDK package extend the generic classes of the JAHASE package, so each JAHASE class has one extension per supported type of embedded hardware, unless the same implementation serves several hardware systems.

4.3 DIGITAL BUS COMPONENT

This section examines one of the basic components of JAHASE in detail. To make access to the different types of digital buses that an embedded system may support uniform, a master-slave model is used. The model is modular and extensible, and it treats devices attached to an I2C, SPI, 1-Wire or CAN bus in the same generic way. The master-slave model is a software version of the model used physically. The master, the embedded system, is in charge of bus synchronization and of starting communication with any of the slave devices, which encapsulate other electronic devices such as ADC converters, I/O expanders, temperature sensors, etc.
The advantage of a digital bus is that it simplifies wiring, since two or three wires are enough to reach the electronic device. It also makes it possible to extend the capacity of the embedded system in some respect, for example by providing more I/O than the embedded system originally has, or by giving access to a memory unit. To virtualize the master-slave model in software (Figure 2), a virtual master is needed that internally uses an instance of the concrete master for the selected type of digital bus. We also need to know the concrete slave we want to work with. This is necessary because, although the slaves share the communication protocol fixed by the digital bus, they do not share the data they are able to interpret. A virtual slave must therefore be implemented for each family of slaves, such as ADC converters, temperature sensors, etc. JAHASE includes some virtual slaves by default, such as the virtual slave for the National Semiconductor LM75, which handles a temperature sensor over I2C, or the virtual slave for the Philips PCF8574, which adds 8 digital inputs/outputs over the I2C bus. A mechanism is also needed that lets the programmer specialize the virtual slaves as required.

Fig. 2. Generic master-slave model for digital buses (a master and slaves 1 to N attached to a generic digital bus).

Component design. Depending on the type of embedded system in use, the API library supplied by the manufacturer for accessing the different types of digital buses is different, as are the mechanisms defined to access them. This forces a concrete implementation for each device and for each type of embedded system, according to the type of bus used, and that implementation has to fit the JAHASE programming model. To handle digital buses with the virtual master-slave model, a set of classes was created in which inheritance and the Factory pattern hide both the type of bus used internally and the type of underlying embedded system (hardware). With this design each type of slave is implemented only once, because a slave accesses the digital bus exclusively through the generic read and/or write primitives of the bus.
Fig. 3. Detail of the JAHASE class hierarchy for digital buses. Classes shown:
- DigitalBus: # m_name: String, # m_speed: int; + read(byte[], int, int): int, + setSlaveAddress(int): int, + setSpeed(int): int, + write(byte[], int, int): int. Specialized by I2C::I2CBus, SPI::SPIBus and onewire::OneWireBus.
- I2C::AjileI2CBus (- bus: I2C), I2C::SnijderI2CBus (- bus: I2CMasterBus, - esclavo: I2CDevice) and I2C::SnapI2CBus (- bus: I2CPort), each with + read(byte[], int, int): int, + setSlaveAddress(int): int, + setVelocidad(int): int, + write(byte[], int, int): int.
- Slave: # m_address: int, # m_name: String; # directRead(int): int, # directWrite(int, int): int. Specialized by I2C::I2CSlave, SPI::SPISlave and onewire::OneWireSlave.
- I2C::Pcf8591_I2CSlave, I2C::LM75_I2CSlave and I2C::Pcf8574_I2CSlave, each with a read-only String constant naming the device and its own # directRead(int): int and # directWrite(int, int): int.
- Association: one DigitalBus contains 0..* Slave objects.

As Figure 3 shows, the main classes that support the master-slave model sit at the root of the inheritance trees: DigitalBus and Slave, which hold the methods and attributes common to all digital bus technologies. The DigitalBus class is the virtual master that abstracts access to any of the types of digital bus supported by an embedded system. It essentially contains the methods and attributes common to all types of digital bus, and it is extended by classes specialized for each type of digital bus and for each type of embedded system. These subclasses must implement the abstract bus access methods using the specific API of each hardware platform.
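The relationship between the virtual master and its platform-specific subclasses can be sketched as follows. This is an illustration under stated assumptions, not the JAHASE source: the method names follow Fig. 3, but the bus-type constants, the SimulatedI2CBus class (which stands in for a vendor-backed class such as SnapI2CBus by keeping slave registers in memory) and the BusFactory class are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Virtual master: generic operations, as named in Fig. 3. */
abstract class DigitalBus {
    static final int I2C = 0; // hypothetical bus-type constants
    static final int SPI = 1;

    protected final String name;
    protected int speed;

    DigitalBus(String name) {
        this.name = name;
    }

    abstract int setSlaveAddress(int address);
    abstract int setSpeed(int hz);
    abstract int read(byte[] buf, int off, int len);
    abstract int write(byte[] buf, int off, int len);
}

/** Stand-in for a platform-specific bus: slaves are simulated as byte arrays. */
final class SimulatedI2CBus extends DigitalBus {
    private final Map<Integer, byte[]> devices = new HashMap<>();
    private int current = -1;

    SimulatedI2CBus(String name) {
        super(name);
    }

    /** Attaches a simulated device whose registers are the given array. */
    void attach(int address, byte[] registers) {
        devices.put(address, registers);
    }

    @Override
    int setSlaveAddress(int address) {
        if (!devices.containsKey(address)) {
            return -1; // no device answers at this address
        }
        current = address;
        return 0;
    }

    @Override
    int setSpeed(int hz) {
        speed = hz;
        return 0;
    }

    @Override
    int read(byte[] buf, int off, int len) {
        byte[] regs = devices.get(current);
        if (regs == null) {
            return -1;
        }
        int n = Math.min(len, regs.length);
        System.arraycopy(regs, 0, buf, off, n);
        return n;
    }

    @Override
    int write(byte[] buf, int off, int len) {
        byte[] regs = devices.get(current);
        if (regs == null) {
            return -1;
        }
        int n = Math.min(len, regs.length);
        System.arraycopy(buf, off, regs, 0, n);
        return n;
    }
}

/** Factory: callers name a bus type and never see the concrete class. */
final class BusFactory {
    static DigitalBus getDigitalBus(int busType, String name) {
        switch (busType) {
            case DigitalBus.I2C:
                return new SimulatedI2CBus(name);
            default:
                throw new IllegalArgumentException("unsupported bus type: " + busType);
        }
    }
}

final class BusDemo {
    public static void main(String[] args) {
        DigitalBus bus = BusFactory.getDigitalBus(DigitalBus.I2C, "Bus_I2C");
        ((SimulatedI2CBus) bus).attach(0x48, new byte[] {0x19, 0x00});
        bus.setSlaveAddress(0x48);
        byte[] buf = new byte[2];
        int n = bus.read(buf, 0, 2);
        System.out.println(n + " bytes read, first = " + buf[0]);
    }
}
```

The cast in BusDemo is needed only to populate the simulation. Code that merely uses the bus works with the DigitalBus type, so replacing the concrete class does not affect it.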
The Slave class contains all the generic attributes and methods of a virtual slave for any type of digital bus. For each type of digital bus and each concrete device there is a class that specializes Slave according to the semantics of that bus. To implement a concrete device such as the Philips PCF8591, the slave class of the bus it belongs to, I2CSlave, is extended in a new class PCF8591_I2CSlave. An application that uses JAHASE can rely on this digital bus access mechanism to control any type of slave device attached to the bus, using one specific class per slave. Digital buses are obtained from the static class JAHASE.cpus.CPU by calling a getDigitalBus method, which is given the desired type of bus (SPI, I2C, 1-Wire, etc.), as Figure 4 shows. The white arrows represent the order of the calls, whereas the filled arrows indicate the inheritance relationships between the classes.

The specific implementation of a given digital bus for each type of embedded system lives in the HDK package hierarchy, and it always extends the class of the corresponding bus type in the JAHASE package hierarchy (shown in the diagram by the red HDK circles and their inheritance arrows towards JAHASE).

Fig. 4. Diagram for the creation of digital buses. Elements shown: in the JAHASE hierarchy, cpus.CPU with getDigitalBus(id, busType, busName), digitalBus.DigitalBus, and I2C.I2CBus, SPI.SPIBus and onewire.OneWireBus, each with getDigitalBus; in the Hdk hierarchy, digitalBus.DigitalBus with getDigitalBus(id, busType, busName), and the extensions SnapI2CBus, AjileI2CBus, SnijderI2CBus; SnapSPIBus, AjileSPIBus, SnijderSPIBus; SnapOWIBus, AjileOWIBus, SnijderOWIBus.

Application example. To show how the digital bus access component works, the following example accesses an LM92 temperature sensor over the I2C bus.

    package example.buses;

    import JAHASE.digitalBus.*;
    import JAHASE.digitalBus.I2C.*;
    import JAHASE.cpus.*;

    /** Example of use of the I2C bus, a kind of DigitalBus. */
    public class PruebaLM92 {
        public static void main(String[] args) {
            // Initializes whichever supported embedded system is in use.
            CPU.initialize();
            // Virtual master for I2C bus number 0.
            DigitalBus bus = CPU.getDigitalBus(0, DigitalBus.I2C, "Bus_I2C");
            // Virtual slave for the LM92 attached to that bus.
            TemperatureSlave sensor = new LM92_I2CSlave(0, "LM92_sensor", bus);
            while (true) {
                System.out.println("-Temperatura: " + sensor.getTemperature());
                try {
                    Thread.sleep(1000); // sample once per second
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }

Fig. 5. Example of use of the digital buses (for an I2C LM92 temperature sensor).

The example in Figure 5 shows how simply a given electronic device can be reached through the I2C digital bus, in this case an I2C LM92 temperature sensor.
First, the static call initialize of the CPU class initializes the JAHASE platform so that it can work on the concrete embedded system. Next, the virtual master of the concrete bus is obtained by invoking the getDigitalBus method with the desired type of bus as an argument. Then an instance of LM92_I2CSlave is created, whose arguments specify the bus used and the physical address of the port to which the LM92 device is connected. The rest of the code in Figure 5 is a simple example of how to sample the temperature sensor periodically and print the current temperature value.

Extending the component. To implement a virtual slave for a concrete device, the slave class of the bus the device belongs to must be extended. This gives access to the semantics of the communication protocol associated with the bus, which is implemented in the directRead and directWrite methods. The communication protocol of the digital bus establishes in general terms how reads and writes are carried out between the devices on the bus, but it does not specify the format or bit mask of the data, nor the order in which each device attached to the bus interprets them; that is defined by the manufacturer of each device. The implementation of the virtual slave must therefore establish the data protocol specific to each device using the directRead and directWrite primitives. The extension mechanism makes the virtual slave completely independent of the underlying type of embedded system, and it could even be independent of the type of digital bus, if the data protocol used by the virtual slave were the same on every digital bus.
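The split between the bus protocol (raw bytes) and the device data protocol (what the bytes mean) can be sketched with the LM75, whose temperature register holds a 9-bit two's complement value in the upper bits of a 16-bit word, in steps of 0.5 degrees Celsius. The code is an illustration for a desktop JVM, not the JAHASE implementation: ByteBus, this simplified Slave and Lm75Slave are hypothetical, and the meaning given here to the directRead(int) argument (a byte count) is an assumption, since the paper only shows its signature.

```java
/** Stand-in for the generic read primitive of a digital bus (hypothetical). */
interface ByteBus {
    /** Reads up to len bytes into buf starting at off; returns the count read. */
    int read(byte[] buf, int off, int len);
}

/** Generic virtual slave: fetches raw bytes, knows nothing about their meaning. */
abstract class Slave {
    protected final int address;
    protected final ByteBus bus;

    Slave(int address, ByteBus bus) {
        this.address = address;
        this.bus = bus;
    }

    /** Reads len bytes (1 to 4) and packs them, most significant byte first. */
    protected int directRead(int len) {
        byte[] buf = new byte[len];
        if (bus.read(buf, 0, len) != len) {
            throw new IllegalStateException("short read from slave " + address);
        }
        int value = 0;
        for (int i = 0; i < len; i++) {
            value = (value << 8) | (buf[i] & 0xFF);
        }
        return value;
    }
}

/** Device-specific data protocol: the LM75 temperature register layout. */
final class Lm75Slave extends Slave {
    Lm75Slave(int address, ByteBus bus) {
        super(address, bus);
    }

    /** Temperature in degrees Celsius, in steps of 0.5. */
    double getTemperature() {
        int raw = directRead(2);              // 16-bit register, MSB first
        int halfDegrees = ((short) raw) >> 7; // sign-extend, keep the top 9 bits
        return halfDegrees * 0.5;
    }
}

final class SlaveDemo {
    public static void main(String[] args) {
        // Simulated bus that always answers 0x1900, the LM75 encoding of 25 degrees.
        ByteBus bus = (buf, off, len) -> {
            buf[off] = 0x19;
            buf[off + 1] = 0x00;
            return 2;
        };
        System.out.println(new Lm75Slave(0x48, bus).getTemperature()); // prints 25.0
    }
}
```

Lm75Slave contains no platform code at all. Supporting another sensor on the same bus means writing another small subclass with a different decoding step, which is the kind of specialization the text describes.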
This independence is achieved by working through the abstract methods of the DigitalBus superclass, which dynamic binding resolves to the access methods of the concrete bus in the specialized classes.

5 ANALYSIS OF THE EXECUTION ENVIRONMENT

When working with embedded systems it is important to know the physical capabilities of the system, the execution environment or operating system installed, how an application is loaded onto the device, and the programming facilities the system offers. All this information helps the developer implement applications optimized for the specific characteristics of the embedded environment.

Although the JAHASE platform provides a common framework for working with different types of embedded platforms, its behaviour is tied to the specific characteristics of the device in use, so a test bench is needed that makes the capabilities of each device explicit. JAHASE supplies a component that eases the design of test applications, and it also ships a set of them for the developer to use.

5.1 DESIGN OF AN EXPERIMENT

This section explains a programming model that we designed to support the design of experiments on the JAHASE environment. The model is adaptable to different ranges of experiments, extensible so that new types of experiment can be added, and reusable for testing the different platforms supported by JAHASE. An automatic mechanism is also needed that can check the system against a series of experiments, either predefined or designed by the developer. The generic model basically includes:
1. Main program skeleton: it starts the JAHASE environment, creates a logger object to record the measurement data, and runs each experiment; the data can later be dumped to the console or to a file.
2. Method for adding load and/or noise: to check how other threads of execution influence the experiment itself, different levels of load and/or noise were designed (load defined by the number of threads and their activation period).
3. Common method for performing the different types of measurements or tests: through the methods declared in an experiment interface (Tester.java), any experiment object that implements it can be run within this scheme (method makeTest(subtype, logger, pattern)).
4. Extension mechanism for adding new types of measurements or tests: the Tester interface must be implemented and the name of the new class added to the list of experiments to run (either in a text file or in an array of strings in the main program itself).

Figure 6 shows the relationship between the classes of the model, including three examples (performance, dispatch latency and periodic timer). The Test class contains the main method, where a Logger object is used as a log to capture the output of the experiments. The experiments are run through successive calls to the makeTest method of the Tester interface (implemented differently for each type of experiment), which gives a generic and flexible model for adding new experiments.

Classes shown in the diagram:
- Test: + TESTS: String[] {readonly}; + main(String[]): void.
- «interface» Tester: + destroy(): void, + enableOverload(boolean): void, + getOverloadDesc(int): String, + getOverloadDescs(): String[], + getOverloadType(): int, + getOverloadTypes(): int[], + isOverloadEnabled(): boolean, + makeTest(int, Logger, String): void, + setOverload(int): void.
- GenericTester: implements the overload-related methods of Tester (destroy, enableOverload, getOverloadDesc, getOverloadDescs, getOverloadType, getOverloadTypes, isOverloadEnabled, setOverload).
- PerformanceTester, PeriodicTimerTester, DispatchLatencyTester: each with + makeTest(int, Logger, String): void.
- logging::Logger: # m_level: int, # m_prefix: String; + flush(): void, + getHandler(): VHandler, + getLevel(): int, + getPrefix(): String, + log(int, String): void, + log(int, String, Throwable): void, + setHandler(VHandler): void, + setLevel(int): void, + setPrefix(String): void.

Fig. 6.
Class diagram of the test model.

For a time-based measurement it is important to take into account properties such as the resolution, precision and granularity of the clocks used during the experiment and of the results obtained [27]. Although embedded systems usually include a high-resolution clock based on a hardware clock, exposed through a platform-specific class, the Java specifications up to JDK 1.5 do not include a mechanism for measuring time below the millisecond threshold.

5.2 EXPERIMENTS PERFORMED

This section presents the experiments that are most significant with respect to real-time characteristics: performance, delay in timer activation, memory management, latencies, and jitter in the activation of periodic tasks. All the experiments were run with seven different levels of system overload. Three concrete targets were considered: the SNAP embedded environment from Imsys, the EJC microcontroller from Snijder, and the JSTIK embedded board from Systronix, based on the ajile aj-100 Java processor. The data are compared against a neutral platform: a SUN ULTRA workstation with a dual-core AMD Opteron 1214 processor at 2.2 GHz (each core), 2 MB of cache (1 MB per core) and 3 GB of RAM, running 32-bit Windows XP and a HotSpot JVM based on JDK.

Performance

-Description: To obtain a measure of performance, a series of tests measures the capacity of the JVM of the embedded environment to process different kinds of operations on primitive types (byte, int, etc.), on collections (Vector, array, etc.) and on Strings. This experiment is based on the benchmark developed by Systronix [26].
-Variable to measure: Number of operations per second.
-Measurement procedure: The time taken to perform copy operations n times on primitive types (byte, integer, double, float), arrays and String objects is measured. The operations per second of a platform are then the total number of operations performed divided by the elapsed time.
-Results: According to Table 2, the system based on the ajile microcontroller is superior in almost all measurements, which is explained by its fairly high clock frequency and by its execution model based on a Java processor. In some arithmetic operations, however, the EJC system obtains better results despite its lower clock frequency. The SNAP platform, by contrast, cannot be used for applications that demand high performance from the environment.

Table 2.
Performance Summary

Timers

-Description: The time elapsed between firings is measured and compared with the theoretical value previously configured for the different types of JAHASE timers, OneShotTimer (single shot), PeriodicTimer (periodic) and CycledTimer (cyclic), which yields the delay in the firing of the timer.
-Measurement procedure: A timestamp is recorded before the timer is started, and another each time the timer fires. If the timer is periodic, its precision can be estimated over several firings. For a one-shot timer, the time from start to activation is measured several times. The maximum and minimum are recorded as well as the mean, and for periodic timers the dispersion is also measured.
-Variable to measure: Delay in the firing of the timer.
-Results: Table 3 shows that the JSTIK platform has practically no delay in the activation of the periodic timer, in either mean or maximum values, possibly because the delays lie below one millisecond, which is the finest resolution that can be measured on this platform. The workstation yields very low values, undoubtedly because of the difference in performance of this class of machine. On the other embedded systems, by contrast, the firing of the periodic timer starts to lag as the overload level of the system increases, and the increase is almost linear in both the mean and the maximum delay.

Table 3. Timers Summary

According to Figure 7, the JSTIK platform is the most stable of all: it shows no delay even under the highest possible system load, apart from an occasional delay of 1 ms due to the low precision of the clock used for the measurements. This platform is therefore suitable for developing real-time applications. In the other cases, including the workstation, local delays are observed in the activation of the periodic timer, but they do not accumulate over successive activations. Only the SNAP platform shows excessive local delays that accumulate when the system is overloaded.

Fig. 7. VPeriodicTimer Maximum Summary

Memory Resource Management

-Description: This experiment measures the capacity of the embedded environment for memory management. JAHASE manages two types of memory: the conventional Heap memory of any JVM, which has an associated garbage collector that frees and defragments memory automatically after use, and static memory or Memory Spaces, a type of memory (similar to immortal memory in RTSJ) in which the garbage collector is prevented from acting. This experiment is based on the Systronix HeapTest [26].
-Variable to measure: State of the memory before and after the test, and the time needed to run the test.
-Measurement procedure: Blocks of different sizes are allocated several times and the time taken is evaluated; then the time taken to copy blocks of data of different sizes within memory is measured. To make the experiment realistic, the process is carried out by several tasks that manage different parts of memory. The experiment is run for both types of memory: memory managed by the garbage collector and memory not managed by it.
-Results: Figure 8 shows the total time of the tests and the time required by the garbage collector (GC Time). The total time (TotalTime) is the time taken by three threads running concurrently. The experiment shows, first, that the total time needed for successive allocation and release of memory increases as the system load increases. The time needed is far greater on the SNAP platform, on the order of 2500 seconds (approximately 40 minutes) per run, whereas on the other platforms it is almost two orders of magnitude smaller.

Fig. 8. Total Time and Total Garbage Collector Time Summary

Although SNAP is by far the slowest, the number of garbage collector runs, and therefore the time devoted to garbage collection, is lower than on the EJC and very similar to that of the JSTIK. This means that the garbage collector implemented on those platforms is much more efficient and less intrusive. Although the EJC invokes the garbage collector more often and spends a considerable amount of time in it, it still achieves a fairly good total time because it keeps the time needed to allocate new memory under better control.

Latencies

This experiment measures the latency of the system from several points of view: the dispatch latency due to context switching, the latency due to the synchronization mechanism, and the latency associated with event activation.

a. Dispatch Latency
-Description: The dispatch latency is the total time from the moment a task is interrupted by another of higher priority until the higher-priority task occupies the processor.
-Variable to measure: Dispatch latency.
-Measurement procedure: This latency is measured the other way round, since the situation is symmetric. A higher-priority task records a timestamp just before calling yield() to return control to the scheduler, which immediately hands control to the blocked lower-priority task. So that the lower-priority task can record its timestamp, the higher-priority task must set a flag in a protected shared resource. The measurement is repeated a given number of times.

b. Event Latency
-Description: The event latency is the total time from the occurrence of an event until it is handled.
-Variable to measure: Event latency.
- Measurement procedure: To measure the event latency, a timestamp is taken just before firing an AsyncEvent and another at the moment its AsyncEventHandler is activated. This is repeated a given number of times.

c. Synchronization latency
- Description: The synchronization latency is the time a thread takes, from the moment it tries to acquire the lock, until it is running inside the critical section.
- Variable to measure: Synchronization latency.
- Measurement procedure: A timestamp is recorded just before invoking a method protected by synchronized and another just after entering the critical section. The time that invoking a non-synchronized method would take is then subtracted from the difference.
- Results: According to Table 4 and Figure 9, the dispatch latency grows linearly on both the EJC and the SNAP (on the latter with a much steeper slope, as expected). On the JSTIK, however, it is very low even under high overload, with results very close to those of the workstation. The event latency (Table 4) behaves much like the dispatch latency: on the EJC and the SNAP it grows linearly with the load level, while on the JSTIK it remains stable and very low (below one millisecond). As for the minimum values, on the EJC the event latency is always above 2 ms, while on the SNAP the minimum is 4 ms with no overload. These figures can be relevant when events must be handled within a very tight time margin.

Table 4. Latencies Summary

The synchronization latency again depends on the overload level, even for the JSTIK, which is affected by high overloads. The JSTIK shows several very pronounced peaks under high overload, which makes it a poor choice for applications that use a large number of synchronized methods on a heavily loaded system. The EJC, by contrast, is now the embedded system that responds best to method synchronization at high load levels: its latency grows linearly with the load, which makes it more predictable.

Jitter in the activation of periodic tasks

Jitter measures the fluctuation in the activation of a periodic task, that is, the difference between the maximum and the minimum delay in the activation of the task. We carried out two types of experiments. In the first, which we call the synchronous model, the periodic activations are programmed by a thread that, inside a loop, blocks for the time given by the period using delay-based instructions such as sleep, as shown below:

    while (true) {
        sleep(period);
        // periodic task code
    }

To measure the jitter we study the delay between the actual activation of the periodic task and the expected one.
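As a rough illustration, the synchronous model and the jitter definition given above could be sketched in plain Java as follows. This is only a sketch under stated assumptions: the names SyncJitterSketch, measureDelays and jitter are hypothetical and not part of JAHASE; only Thread.sleep and System.nanoTime are used; and, unlike the loop above, the sketch sleeps until the next expected activation instant rather than for a fixed period, so that individual delays do not accumulate as drift.

```java
// Sketch of the synchronous model: a loop that sleeps until each expected
// activation instant and records how late the activation actually is.
// All names are illustrative; they do not belong to any platform API.
final class SyncJitterSketch {

    /** Runs the given number of periodic activations and returns each delay in ms. */
    static double[] measureDelays(long periodMs, int activations) throws InterruptedException {
        double[] delaysMs = new double[activations];
        long startNs = System.nanoTime();
        for (int k = 0; k < activations; k++) {
            // Theoretical activation instant of this release, fixed by the start time.
            long expectedNs = startNs + (k + 1) * periodMs * 1_000_000L;
            long remainingNs = expectedNs - System.nanoTime();
            if (remainingNs > 0) {
                Thread.sleep(remainingNs / 1_000_000L, (int) (remainingNs % 1_000_000L));
            }
            // Delay between the actual and the expected activation.
            delaysMs[k] = (System.nanoTime() - expectedNs) / 1e6;
            // ... the periodic task code would run here ...
        }
        return delaysMs;
    }

    /** Jitter as defined in the text: maximum delay minus minimum delay. */
    static double jitter(double[] delaysMs) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double d : delaysMs) {
            min = Math.min(min, d);
            max = Math.max(max, d);
        }
        return max - min;
    }
}
```

A call such as SyncJitterSketch.jitter(SyncJitterSketch.measureDelays(200, 100)) would give the jitter in milliseconds over 100 activations of a 200 ms task; the value obtained depends on the underlying JVM and on the system load.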
Fig. 9. Dispatch Latency Summary (Average & Deviation). Average dispatch latency, Time (ms), against overload level (None, Very Low, Low, Medium, High, Very High) for SNAP, JStik, EJC and Sun.

In the second type of experiment, which is based on JAHASE timers and which we call the asynchronous model, we use the event-based mechanism: the periodic task is activated by an asynchronous event that is fired when the time programmed in the timer expires. At that moment control is transferred to an event handler that was blocked waiting for timer events. We call the difference between the actual and the expected activation time of the task the TimerDelay. In both experiments the periodic task runs with a period of 200 ms.

- Description: This experiment measures the jitter of the embedded environment, that is, the fluctuations in the activation of periodic tasks. Since JAHASE runs on top of the JVM of the embedded device, the measurement depends essentially on the underlying JVM.
- Variable to measure: Jitter, that is, the maximum spread between the actual activation and the theoretical activation.
- Measurement procedure: The activation instants of a periodic task are recorded relative to the instants at which it should actually have been activated (which are computed before the periodic task is started).
- Results: From these results (Table 5), the jitter of the different systems in the absence of overload is as follows:
  - SNAP: around 3-4 milliseconds.
  - EJC: around 1-2 milliseconds.
  - JSTIK: around one millisecond.
  - SUN Ultra 20: below one millisecond.

Table 5. Jitter Times (ms)

However, as the load level increases the times grow, which makes the system less predictable and less suitable for applications with strict timing requirements.

Fig. 10. Jitter (Synchronous Model). Average jitter, Time (ms), against overload level (None, Very Low, Low, Medium, High, Very High) for SNAP, JStik, EJC and Sun.

Once again the JSTIK gives very good results, almost exactly matching the theoretical timing (zero jitter), although the EJC can also be used because its jitter is not excessive at high overload levels.

Fig. 11. Jitter (Asynchronous Model). Average jitter, Time (ms), against overload level (None, Very Low, Low, Medium, High, Very High) for SNAP, JStik, EJC and Sun.

6 CONCLUSIONS AND FUTURE WORK

Java is a very powerful platform for developing embedded applications with strict performance requirements. However, there is a large number of specifications, mechanisms and programming models specific to each device manufacturer, which hinders the adoption of Java by embedded systems developers.

This work has presented the main characteristics of a significant sample of the most popular embedded Java solutions, and proposes a high-level platform, based on the abstract mechanisms provided by the different manufacturers, that allows the low-level resources of all the analysed devices to be handled in a homogeneous and coherent way.

In the future we plan to extend the platform to cover mechanisms and technologies not yet supported, such as an adaptation to an RTSJ-compliant JVM like Jamaica or JRate, as well as to design a JVM for small-scale embedded systems with very limited memory and processing resources.


6. Control Systems


A Taxonomy on Prior Work on Sampling Period Selection for Resource-Constrained Real-Time Control Systems

Camilo Lozoya, Manel Velasco, Pau Martí, José Yépez, Frederic Pérez, Josep Guàrdia, Jordi Ayza, Ricard Villà and Josep M. Fuertes
Automatic Control Department, Technical University of Catalonia

Abstract

In this paper we present a non-exhaustive taxonomy of prior work on sampling period selection for resource-constrained real-time control systems. The selection of sampling periods for real-time control tasks determines resource utilization (or, alternatively, task set schedulability) as well as overall control performance. Ultimately, it determines the sequence of control task instance executions over time, that is, the schedule. Different schedules are obtained depending on which criterion is used to select sampling periods, what real-time paradigm is demanded of the underlying execution platform, who should decide which task to execute, when the decision is taken, and how the decision is enforced. To analyze all these aspects, ten papers from the last decade, one per year from 1998 to 2007, have been selected and categorized. The taxonomy, although incomplete, reveals key tendencies and raises important research challenges.

1. Introduction

Traditionally, computer-controlled systems are implemented using real-time periodic tasks. Each periodic task is statically assigned a period obtained by following well-established procedures that mandate periodic sampling and control. However, the embedded systems market requires systems with more and better functionality at lower prices. As a consequence, control applications must be implemented on platforms where resources are scarce and/or where increasing performance is a must. The traditional static periodic approach to control systems implementation fails to minimize resource utilization and to maximize control performance.
To provide solutions that fulfil the tight demands posed by modern embedded systems, the control and real-time communities have shown in recent years a renewed interest in deriving novel sampling period selection methods for the efficient implementation of real-time control systems. In this paper we present a non-exhaustive taxonomy of some of these methods. Although most of them focus on the CPU, many can also be adapted to networks or battery-limited platforms.

Many of the novel methods for sampling period selection go beyond just finding the best values for control task periods. They provide complete real-time frameworks tailored to the effective concurrent execution of control tasks. They are characterized by which criterion is used to select sampling periods, thus establishing what real-time paradigm is demanded of the underlying execution platform, who should decide which task to execute, when the decision is taken, and how the decision is enforced.

To analyze these frameworks and their key properties in terms of Which, What, Who, When and How, ten papers [1]-[10] from the last decade (1998-2007) have been arbitrarily selected and categorized. Papers are listed in chronological order in the references section to provide an initial view of the contributions over time. The selection of one paper per year, although incomplete in terms of time coverage and number of papers, collects a wide variety of approaches while revealing the key existing tendencies in control task scheduling and raising important research challenges.

2. Taxonomy

A key aspect of these new methods is the theoretical criterion used to obtain the set of sampling periods. From the criterion, key aspects are derived and analyzed in order to construct the taxonomy summarized in Table 1.

2.1. Criterion

Two main criteria can be identified: an optimization approach or bounding the intersampling dynamics.

Table 1. Taxonomy of sampling period selection approaches.

             Which        What         Who          When          How
Ref.         Criterion    Triggering   Triggering   Solving       Solution    Timing constraints        Sched.
                          paradigm     entity       the problem
[1] Set98    Optimizat.   TT           Coord.       Off-line      Periods     Static periodic           EDF/FP
[2] Arz99    Bound dyn.   ET           Task         On-line       Periods     Aperiodic ET              Missing
[3] Reh00    Optimizat.   TT           Coord.       Off-line      Sequences   Static pseudo periodic    Cyc. Ex.
[4] Hri01    Bound dyn.   TT           Coord.       Off-line      Sequences   Static pseudo periodic    Cyc. Ex.
[5] Pal02    Optimizat.   TT           Coord.       Off-line      Periods     Static periodic           EDF
[6] Cha03    Optimizat.   TT           Coord.       Off-line      Periods     Static periodic           EDF/FP
[7] Mar04    Optimizat.   TT           Coord.       On-line       Periods     Varying periodic          EDF
[8] Hen05    Optimizat.   TT           Coord.       On-line       Periods     Varying periodic          EDF
[9] Tab06    Bound dyn.   ET           Task         On-line       Periods     Aperiodic TT              Missing
[10] Lem07   Bound dyn.   ET           Task         On-line       Periods     Aperiodic TT              Elastic sch.

In the optimization approaches ([1], [3], [5], [6], [7], [8]) sampling periods are selected to solve an optimization problem. They assume that there is a cost function parameterized in terms of control performance and sampling periods that has to be minimized or maximized depending on whether it denotes penalty or benefit. The optimization problem domain is restricted by closed-loop stability and task set schedulability constraints.

In the approaches based on bounding the intersampling dynamics ([2], [4], [9], [10]), sampling periods are selected to keep each closed-loop dynamics within predefined thresholds. The thresholds, which are derived from purely control-theoretical approaches, are used to bound changes in the dynamics or to ensure closed-loop stability.

It is important to identify whether the theoretical criteria capture the dual problem posed by modern embedded systems: minimizing resource utilization and maximizing control performance.
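To make the optimization criterion more concrete, consider a deliberately simplified instance, which is an assumption of this sketch and not the formulation of any of the reviewed papers: a linear cost J = sum_i alpha_i * h_i that penalizes long periods h_i, minimized subject to the utilization constraint sum_i c_i / h_i <= U for execution times c_i. Shortening periods lowers the cost but raises utilization, so the constraint is active at the optimum, and the Lagrangian conditions give h_i = sqrt(c_i / alpha_i) * (sum_j sqrt(c_j * alpha_j)) / U. Closed-loop stability constraints are ignored here, and the names PeriodSelectionSketch, optimalPeriods, utilization and cost are hypothetical.

```java
// Sketch only: linear cost and a single utilization constraint are assumptions.
final class PeriodSelectionSketch {

    /**
     * Periods h[i] that minimize sum(alpha[i] * h[i]) subject to
     * sum(c[i] / h[i]) = uMax, from the closed-form Lagrangian solution.
     */
    static double[] optimalPeriods(double[] c, double[] alpha, double uMax) {
        double s = 0.0;
        for (int i = 0; i < c.length; i++) {
            s += Math.sqrt(c[i] * alpha[i]);
        }
        double[] h = new double[c.length];
        for (int i = 0; i < c.length; i++) {
            h[i] = Math.sqrt(c[i] / alpha[i]) * s / uMax;
        }
        return h;
    }

    /** Processor utilization sum(c[i] / h[i]) of a set of periodic tasks. */
    static double utilization(double[] c, double[] h) {
        double u = 0.0;
        for (int i = 0; i < c.length; i++) {
            u += c[i] / h[i];
        }
        return u;
    }

    /** Value of the assumed linear cost sum(alpha[i] * h[i]). */
    static double cost(double[] alpha, double[] h) {
        double j = 0.0;
        for (int i = 0; i < alpha.length; i++) {
            j += alpha[i] * h[i];
        }
        return j;
    }
}
```

With c = {1, 2}, alpha = {4, 1} and U = 0.5, the resulting periods use exactly half of the processor and give a lower cost than assigning both tasks the same period at the same utilization, which illustrates how the cost function and the schedulability constraint jointly capture the resource/performance duality.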
In all the optimization approaches the duality is captured by the cost function and the optimization constraints. The bounding approaches, however, usually capture only control performance issues. Utilization, and ultimately schedulability, is therefore not addressed.

2.2. Triggering paradigm and entity

These two criteria influence whether the period selection solution requires a real-time architecture following a time-triggered (TT) or an event-triggered (ET) paradigm. All the solutions to the optimization approaches require a time-triggered architecture, while almost all the solutions based on bounding closed-loop dynamics require an event-triggered architecture; the exception is [4]. The classification also considers who is in charge of selecting sampling periods (the triggering entity). All solutions requiring a TT architecture are based on a global coordinator that decides the best periods for the set of control tasks. In the solutions requiring an ET architecture, by contrast, the control tasks themselves are in charge of deciding their periods.

2.3. When to solve the problem

The previous classification (TT vs. ET) relates to whether the period selection is performed off-line or on-line. In all ET approaches ([2], [9], [10]) periods are derived on-line. In the TT approaches, however, some solutions have to be computed off-line ([1], [3], [4], [5], [6]) while the rest are computed on-line ([7], [8]). It is important to identify when the sampling periods are selected for two main reasons. The first is computational overhead, which may be a disadvantage for on-line approaches. The second is the ability to adapt to workload changes: varying available resources or varying demands from the control applications also have to be accounted for, which may be an advantage for on-line approaches.

2.4. Solution and its enforcement using real-time technology

Once periods are selected, they must be enforced by the underlying real-time architecture. Therefore, it is important to examine how the solutions are enforced.
This can be analyzed in a three-step procedure: first, looking at the solution in more detail; second, looking at the type of timing constraints demanded; and third, looking at scheduling policies capable of enforcing such timing constraints.

Solution. Although the taxonomy reviews methods for sampling period selection, two of the methods ([3], [4]) do not establish sampling periods. Rather, they provide periodic sequences of ordered control task instances. All the others provide periods, time intervals or timing bounds that establish when tasks have to be executed, i.e., they provide diverse timing constraints for control tasks.

Timing constraints. All solutions following a TT paradigm impose periodic timing constraints on control tasks. In particular, [1], [5] and [6] specify static periodic timing constraints. That is, the outcome of solving their particular methods off-line is a set of periods for control tasks that will not change at run-time, named static periodic. Similarly, [3] and [4] specify static periodic sequences of ordered task instances that will not change at run-time. Looking at a single task, this is a static pseudo-periodic execution. Finally, the on-line solutions to the frameworks presented in [7] and [8] provide periods for control tasks that will change at run-time, named varying periodic.

The timing of the solutions demanding an event-triggered architecture is, in the general case, aperiodic. However, different types of aperiodicity can be distinguished. In [2], the execution is purely aperiodic and is triggered whenever an external event is detected by specific hardware (an event detector). For control safety reasons, an upper bound is imposed in order to force an execution if no events are detected. The triggering condition cannot be predicted. In [9], although tasks execute aperiodically, a lower bound on the inter-arrival time of job executions is predicted at each job execution, thus indicating a sporadic task behavior. Finally, in [10], the execution is again aperiodic, but with the advantage that at each job execution the deadline of the next job is predicted. Both approaches [9] and [10] can be considered aperiodic time-triggered.

Scheduling. All solutions demanding a time-triggered architecture can enforce the derived timing constraints for control tasks using well-known scheduling strategies.
In particular, [3] and [4] can exploit cyclic executives, while the rest can exploit scheduling policies for periodic tasks, such as earliest deadline first (EDF) and fixed priority (FP). Table 1 specifies for each of these solutions which scheduling policy can be applied. For the on-line approach [7], a specific resource allocator that computes the sampling periods on-line is required before the EDF dispatching. In the other on-line approach [8], EDF can be applied directly because the computation of the periods is performed periodically by a particular task named the feedback scheduler task.

For the solutions demanding an event-based architecture, a scheduling policy that can enforce the presented solution is lacking in the general case. Only the result provided in [10] integrates the presented event-triggered control with existing scheduling theory. At each job execution the deadline for the following job is predicted and elastic scheduling is invoked to accommodate the new timing demands, considering the whole task set. However, if elastic scheduling cannot meet them, problems may occur.

2.5. Discussion

The presented taxonomy mainly considers key real-time aspects of the reviewed methods. However, some other aspects have been omitted. Which task model is used in terms of avoiding or reducing sampling and latency jitters? In the optimization problem, are the solutions general or do they depend on each controlled system? Are they exact (closed forms) or approximate? Does solving them also yield the appropriate controller gains? Looking more at control aspects, questions not analyzed are: Which types of controllers do the presented solutions support? Are observers considered? Is noise also considered? Overall, many questions have not been analyzed. However, the taxonomy is not closed, and all the previous questions can be incorporated.
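The schedulability side of the time-triggered solutions, for which Table 1 lists EDF and FP as applicable policies, can be illustrated with the classical utilization tests for independent periodic tasks whose deadlines equal their periods. The sketch below is an illustration under stated assumptions rather than anything taken from the reviewed papers: the class name UtilizationTests is hypothetical, FP is assumed to use rate-monotonic priorities, and the FP test shown is only sufficient (a task set that fails it may still be schedulable).

```java
// Classical utilization-based tests for periodic tasks with deadline = period.
// c[i] is the worst-case execution time and h[i] the period of task i.
final class UtilizationTests {

    /** Total utilization sum(c[i] / h[i]). */
    static double utilization(double[] c, double[] h) {
        double u = 0.0;
        for (int i = 0; i < c.length; i++) {
            u += c[i] / h[i];
        }
        return u;
    }

    /** EDF: the task set is schedulable if and only if U <= 1. */
    static boolean edfSchedulable(double[] c, double[] h) {
        return utilization(c, h) <= 1.0;
    }

    /** Rate-monotonic FP: sufficient test U <= n * (2^(1/n) - 1). */
    static boolean rmSufficient(double[] c, double[] h) {
        int n = c.length;
        double bound = n * (Math.pow(2.0, 1.0 / n) - 1.0);
        return utilization(c, h) <= bound;
    }
}
```

For three tasks the rate-monotonic bound is about 0.78, so a task set with utilization 0.95 passes the EDF test but not the sufficient FP test; an on-line period selector that keeps the utilization below the relevant bound could use such a check before committing new periods.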
In addition, many existing papers on sampling period selection (and related issues) for real-time control systems have been neither cited nor analyzed. They could also be included in the taxonomy. However, the main tendencies that the taxonomy reveals, and that we analyze next, would remain the same (or very similar).

3. Tendencies

To identify tendencies, we focus on the When and What columns of Table 1, reading them chronologically, from top to bottom. The When column, which refers to whether the sampling period selection is performed off-line or on-line, clearly shows a tendency toward on-line approaches. This tendency reflects, and aims at meeting, the demands of modern embedded systems, which are required to work in dynamic environments and to adapt to available resources that can change abruptly, or to the resource demands of control applications, which can vary depending on the state of the controlled plants.

The What column refers to whether the presented solution requires a time-triggered or an event-triggered real-time architecture. The first conclusion that can be drawn is that TT solutions are more common than ET solutions. More revealing, however, is that it shows a tendency toward event-triggered approaches. A reason for this trend could be the wish to push the logic of the periodic on-line approaches ([7] or [8]) to the limit. In on-line approaches, sampling periods are held until the sampling period selection procedure takes place and new periods are derived. This logic has been shown to be effective at minimizing resource utilization and/or maximizing control performance. Pushing this logic to the limit means executing the sampling period selection procedure each time a control task instance executes, which may provide even better resource utilization and better control performance.

4. Research challenges

Key tendencies indicate a current interest in on-line event-triggered approaches. However, there is a recognized lack of scheduling support for event-based control-theoretical results [2], a problem that can also be identified by looking at the last column of Table 1: scheduling solutions for [2] and [9] are missing, and the scheduling solution adopted by [10] may not be able to fulfil the demanded timing.

A key property of solutions demanding a TT architecture is that they include resource constraints in the theoretical criterion. Therefore, task set schedulability is already considered, for example in terms of utilization tests, as mentioned in Section 2.1. However, the criteria used to derive sampling periods for the ET approaches do not consider resource constraints in the formulation. Therefore, the existing solutions do not implicitly solve the scheduling problem.

To overcome this limitation, several solutions can be envisioned. In the last two reviewed papers, [9] and [10], tasks decide their rate of progress in an event-triggered fashion. Specifically, at each control task instance execution, the same information used to compute the next value of the control signal is also used to decide when the control task has to be executed again. The latter decision could also consider resource utilization. As mentioned earlier, resource utilization is usually expressed in terms of the utilization factor. However, this measure is atemporal and pessimistic. A more appealing metric for deciding when the next instance has to be executed may be the synthetic utilization factor, which depends on time and was developed for aperiodic scheduling. Future work will explore this approach.

A more radical solution envisions the dispatching of aperiodic jobs without characterizing them with timing constraints (periods and deadlines).
The solutions presented in [9] and [10] translate, at each job execution, control demands into timing constraints for the next job. An alternative vision could be to avoid translating control demands into timing constraints. The approach would require all control tasks to be always ready to execute, and a global coordinator that each time picks the most appropriate task for execution. In this case, concepts like task set schedulability will not hold, and equivalent concepts should be derived. For example, schedulability could mean stability, in the sense of determining how many control loops can be kept stable. Future work will also explore this approach.

5. Conclusions

A 10-year taxonomy of prior work on sampling period selection for resource-constrained real-time control systems has been presented. The taxonomy shows that current trends point to on-line event-triggered approaches, that is, to selecting periods at run-time, generating aperiodic executions of task instances. In addition, research challenges for effectively building this type of solution have been presented and discussed.

References

[1] D. Seto, J.P. Lehoczky, L. Sha, Task Period Selection and Schedulability in Real-Time Systems, IEEE Real-Time Systems Symposium, 1998.
[2] K.-E. Årzén, A Simple Event-Based PID Controller, 14th World Congress of IFAC, January 1999.
[3] H. Rehbinder and M. Sanfridson, Integration of off-line scheduling and optimal control, 12th Euromicro Conference on Real-Time Systems, 2000.
[4] D. Hristu-Varsakelis, Feedback control systems as users of a shared network: communication sequences that guarantee stability, 40th IEEE Conference on Decision and Control, 2001.
[5] L. Palopoli, C. Pinello, A. L. Sangiovanni-Vincentelli, L. Elghaoui and A. Bicchi, Synthesis of Robust Control Systems under Resource Constraints, Hybrid Systems: Computation and Control, 2002.
[6] R. Chandra, X. Liu and L. Sha, On the Scheduling of Flexible and Reliable Real-Time Control Systems, Real-Time Systems 24(2), March 2003.
[7] P. Martí, C. Lin, S. Brandt, M. Velasco and J.M. Fuertes, Optimal State Feedback Based Resource Allocation for Resource-Constrained Control Tasks, 25th IEEE Real-Time Systems Symposium, Lisbon, Portugal, December 2004.
[8] D. Henriksson and A. Cervin, Optimal On-line Sampling Period Assignment for Real-Time Control Tasks Based on Plant State Information, 44th IEEE Conference on Decision and Control and European Control Conference ECC 2005, December 2005.
[9] P. Tabuada and X. Wang, Preliminary results on state-triggered scheduling of stabilizing control tasks, 45th IEEE Conference on Decision and Control, December 2006.
[10] M. Lemmon, T. Chantem, X. Hu and M. Zyskowski, On Self-Triggered Full Information H-infinity Controllers, Hybrid Systems: Computation and Control, April 2007.

Distributed Control of parallel robots using passive sensor data

Asier Zubizarreta, Itziar Cabanes, Marga Marcos, Dario Orive, Charles Pinto
University of the Basque Country

Abstract

The present article introduces a novel control architecture for parallel robots. A closed form of the dynamic model of parallel robots is difficult to obtain, due to the complex kinematic relations of this kind of mechanism. However, with the extra data provided by passive sensors, kinematic and dynamic modelling can be simplified. The dynamic model can be used to implement advanced control techniques that improve the efficiency of parallel robots. In this paper, monoarticular and multiarticular control techniques are implemented on a 5R parallel robot, showing that the use of the extra sensor data leads to better and more accurate control.

1 Introduction

Accuracy and high-speed operation are two opposing characteristics required by current robotic applications. However, current serial robots cannot operate at high speed without showing poor accuracy, due to their serial chain structure. As an alternative, Parallel Kinematic Robots have been proposed [14]. This kind of mechanism is composed of an end-effector platform and a group of serial subchains joining it to a fixed base. This structure provides a high stiffness that makes these robots appropriate for high loads or for tasks in which high speed and accuracy are required. However, to exploit the full potential of these robots, advanced control techniques based on the dynamic model are necessary. This can be a difficult task, as these mechanisms have highly coupled kinematics and dynamics, which in most cases cannot be solved in closed form and require numerical approaches. As there is no widely adopted general approach to the dynamic modelling of parallel robots, most control approaches in the literature are PID-based monoarticular control techniques, leaving model-based control an almost unexplored area.
As most of the control techniques are based in serial robot control schemes, researchers contributions in this area can be grouped in two general approaches: monoarticular and multiarticular control. In monoarticular control each actuated joint is considered separately as a system, so the rest of the robot is considered as a disturbance over it. However, its efficiency is poor in high-speed or precision tasks due to effect of the dynamics of the rest of the robot. To increase the rejection factor of this disturbance, Chiacchio, et al. [2] propose to add to the traditional servocontrol scheme an external acceleration loop. As acceleration is difficult to measure, they propose an State-Space filter to calculate it. They also conclude that including a feedfordward loop with the inverse dynamics of the actuated joint improves the control efficiency. Feed-fordward loops are also implemented on [1] to reduce the effect of dynamic coupling between actuators in a 2DOF parallel robot. Another proposed technique to reduce the positioning error is to implement a PD plus gravity compensation control, which requires calculating the gravity term of the dynamic model of the robot. Gunawardana and Ghorbel in [6],[8],[7] apply this control scheme to the 5R parallel manipulator. The gravity term is calculated based on the dynamic modelling method proposed by the authors and making use of the principle of virtual work and reduced modelling technique. Su, et al. [18] combines this technique with a cross-coupling control, in order to compensate the disturbances of the dynamics of the rest of the robot on the actuated joint. Multiarticular control, on the other side, considers the whole robot as a system and controls its actuated joints considering the coupling between

the joints. If the model is accurate, this technique is more efficient than the previous one. The most widespread control technique in this group is Computed Torque Control (CTC), whose basic idea is to compensate the nonlinear dynamics of the robot using its inverse dynamic model. Because this model is difficult to obtain, few approaches can be found in the literature. Codourey [3] obtains an inverse dynamic model and uses it to implement the CTC scheme, which is found to reduce the tracking errors in a pick-and-place application compared to monoarticular control schemes. A similar study with the same conclusions is carried out by Denkena and Holtz [4], in which the CTC scheme is applied to the 6-DOF PaLiDa robot. However, to implement the CTC scheme accurately, the parameters of the inverse dynamic model must be identified, which is not an easy task in most cases. To avoid this, some adaptive control schemes have been combined with CTC schemes. Honegger et al. [9] apply this method to the 6-DOF Hexaglide, and Pietsch et al. [17] to the 2-DOF PORTYs robot. Another technique to compensate the unmodeled dynamics and the parameter variation is the Dual Model-Based Control proposed by Li [10].

Along with these two approaches, some authors have tried to apply other control techniques to parallel robots, most of them directly exported from serial ones. That is the case in [19], where Vivas and Poignet apply a predictive control scheme to the H4 robot. Garrido [5] applies vision control techniques to a redundant 2-DOF planar parallel robot. Another interesting approach is Design for Control, proposed by Li [11], where the objective is to design parallel robots so that a simple dynamic model is obtained. The basic idea is that complex dynamics require complex control techniques, and simple dynamics require simple control.
In this way, a simple controller can be used to control the system, obtaining a performance level similar to that obtained with an advanced control technique.

As stated above, advanced model-based control techniques are required to achieve precise, high-speed robot operation. However, due to the construction of parallel robots, the control loops, even those of advanced controllers, consider only the active joints. From a control engineering point of view, the rest of the mechanical structure, that is, the structure from the active joints to the end effector, therefore remains in open loop. As a result, the accuracy of the end-effector positioning depends on the accuracy of the model and of the parameter identification.

To improve control, some authors have proposed the use of extra sensors in passive joints [13]. Although sensors can be difficult to introduce in the mechanical structure, their advantages compensate the effort. For instance, with a certain number of sensors in strategic passive joints, the direct kinematic model can be obtained analytically in most cases, and the dynamic model can be simplified. This approach is applied by Marquet et al. [12] to the H4 robot, using simple monoarticular PID control. Their results show an increase in positioning accuracy.

In this paper, based on this approach, a modified CTC scheme is presented. The objective is to introduce passive sensor data in the control scheme in order to increase positioning accuracy and speed. For that purpose, simulations on the 5R parallel robot are presented.

The rest of the paper is organized as follows: Section 2 presents the 5R parallel robot and its kinematic and dynamic model. Section 3 describes the CTC control scheme with extra sensor data. Section 4 presents control performance data based on Matlab-ADAMS co-simulation. Section 5 summarizes the main results.
2 Kinematic and Dynamic Modelling of 5R parallel robot considering passive joints

The 5R 2-DOF parallel robot consists of 4 mobile links connected by 5 revolute joints. Active and passive joints are identified in Figure 1, where $q_{ai}$ correspond to the actuated or active joints and $q_{pi}$ to the passive ones, and

$$q = \begin{bmatrix} q_a \\ q_p \end{bmatrix}, \qquad \dot{q} = \frac{dq}{dt}, \qquad \ddot{q} = \frac{d}{dt}\left(\frac{dq}{dt}\right).$$

Cartesian coordinates will be denoted as $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$. Kinematic and dynamic models considering only active sensors are obtained in [8].

Figure 1: 5R Robot Structure

2.1 Kinematic Modelling

In this section, the kinematic model of the 5-bar mechanism considering the passive joints is presented.

Figure 2: Triangle Decomposition

Considering the triangle decomposition of Figure 2, direct and inverse kinematic models can be derived.

2.1.1 Direct Kinematic Model

Using the Direct Kinematic Model, the cartesian coordinates $\mathbf{x}$ can be obtained from the articular coordinates $q$. Based on Figure 2, the direct kinematics relating $q$ and $\mathbf{x}$ can be easily obtained as a redundant equation system:

$$
\begin{aligned}
x &= L_1 \cos q_{a1} + l_1 \cos(q_{a1} + q_{p1}) = L + L_2 \cos q_{a2} + l_2 \cos(q_{a2} + q_{p2}) \\
y &= L_1 \sin q_{a1} + l_1 \sin(q_{a1} + q_{p1}) = L_2 \sin q_{a2} + l_2 \sin(q_{a2} + q_{p2})
\end{aligned}
\tag{1}
$$

2.1.2 Inverse Kinematic Model

The Inverse Kinematic Model relates the cartesian coordinates $\mathbf{x}$ to the articular ones $q$. Based on Figure 2 and using the Cosine Theorem and trigonometric relations, the following expressions can be obtained:

$$
\begin{aligned}
q_{a1} &= \pm \arccos\left(\frac{L_1^2 + (x^2 + y^2) - l_1^2}{2 L_1 \sqrt{x^2 + y^2}}\right) + \arctan\frac{y}{x} \\
q_{a2} &= \pi \pm \arccos\left(\frac{L_2^2 + \left((L - x)^2 + y^2\right) - l_2^2}{2 L_2 \sqrt{(L - x)^2 + y^2}}\right) - \arctan\frac{y}{L - x} \\
q_{p1} &= \mp\left[\pi - \arccos\left(\frac{L_1^2 + l_1^2 - (x^2 + y^2)}{2 L_1 l_1}\right)\right] \\
q_{p2} &= \mp\left[\pi - \arccos\left(\frac{L_2^2 + l_2^2 - \left((L - x)^2 + y^2\right)}{2 L_2 l_2}\right)\right]
\end{aligned}
\tag{2}
$$

2.1.3 Jacobian Matrix

From expression (1), differentiating with respect to time:

$$J_e \dot{q}_a + J_s \dot{q}_p = 0 \tag{3}$$

With:

$$
J_e = \begin{bmatrix}
-L_1 \sin q_{a1} - l_1 \sin(q_{a1} + q_{p1}) & L_2 \sin q_{a2} + l_2 \sin(q_{a2} + q_{p2}) \\
L_1 \cos q_{a1} + l_1 \cos(q_{a1} + q_{p1}) & -L_2 \cos q_{a2} - l_2 \cos(q_{a2} + q_{p2})
\end{bmatrix}
$$

$$
J_s = \begin{bmatrix}
-l_1 \sin(q_{a1} + q_{p1}) & l_2 \sin(q_{a2} + q_{p2}) \\
l_1 \cos(q_{a1} + q_{p1}) & -l_2 \cos(q_{a2} + q_{p2})
\end{bmatrix}
$$

2.2 Dynamic Modelling

To implement the CTC scheme considering passive joint sensor data, it is necessary to include the passive joint information in the Inverse Dynamic Model (IDM). For that purpose, a reduced-model-based method is used, based on those presented by Nakamura [16], Murray [15] and Ghorbel [7]. The main idea is to divide the 5R robot through the end effector, obtaining 2 serial subchains.
These two subchains are considered as fully actuated and their dynamic model is obtained using the Lagrangian or Newton-Euler formulation. Therefore, a set of 4 equations is obtained, corresponding to both passive and active joints:

$$\tau_r = \begin{bmatrix} \tau_{a1} \\ \tau_{a2} \\ \tau_{p1} \\ \tau_{p2} \end{bmatrix} = D_r \ddot{q} + C_r \dot{q} + G_r \tag{4}$$

Using the proposed formulation, the virtual torques of the reduced model $\tau_r$ are projected onto the active articular coordinate space $\tau = \begin{bmatrix} \tau_{a1} \\ \tau_{a2} \end{bmatrix}$ by means of the transformation matrix $T$, defined as:

$$T = \begin{bmatrix} I \\ -J_s^{-1} J_e \end{bmatrix} \tag{5}$$

This matrix has been obtained by several authors [7][15][16] applying the Principle of Virtual Work.
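As an illustration of how the velocity constraint (3) and the transformation matrix (5) could be evaluated numerically, the following is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that the passive angles are measured relative to the preceding link (as the sums $q_{ai} + q_{pi}$ in the Jacobian suggest) and that, following (3), $J_e$ is the block multiplying the active joint rates. The function names `closure`, `jacobians` and `transformation` are hypothetical; the link lengths are the kinematic parameters listed in Section 4.

```python
import math

# Kinematic parameters reported in Section 4.
L, L1, L2, l1, l2 = 0.35, 0.15, 0.3, 0.25, 0.2


def closure(qa, qp):
    """Loop-closure residual: tip of subchain 1 minus tip of subchain 2.

    Assumption: passive angles are relative to the preceding link.
    The residual is zero for any configuration the real robot can reach.
    """
    (a1, a2), (p1, p2) = qa, qp
    fx = (L1 * math.cos(a1) + l1 * math.cos(a1 + p1)
          - L - L2 * math.cos(a2) - l2 * math.cos(a2 + p2))
    fy = (L1 * math.sin(a1) + l1 * math.sin(a1 + p1)
          - L2 * math.sin(a2) - l2 * math.sin(a2 + p2))
    return [fx, fy]


def jacobians(qa, qp):
    """Return (Je, Js) such that Je*dqa + Js*dqp = 0, as in equation (3)."""
    (a1, a2), (p1, p2) = qa, qp
    s1, c1 = math.sin(a1 + p1), math.cos(a1 + p1)
    s2, c2 = math.sin(a2 + p2), math.cos(a2 + p2)
    Je = [[-L1 * math.sin(a1) - l1 * s1, L2 * math.sin(a2) + l2 * s2],
          [L1 * math.cos(a1) + l1 * c1, -L2 * math.cos(a2) - l2 * c2]]
    Js = [[-l1 * s1, l2 * s2],
          [l1 * c1, -l2 * c2]]
    return Je, Js


def transformation(qa, qp):
    """Return the 4x2 matrix T = [I; -Js^{-1} Je] of equation (5)."""
    Je, Js = jacobians(qa, qp)
    det = Js[0][0] * Js[1][1] - Js[0][1] * Js[1][0]  # vanishes at singular poses
    inv = [[Js[1][1] / det, -Js[0][1] / det],
           [-Js[1][0] / det, Js[0][0] / det]]
    lower = [[-sum(inv[i][k] * Je[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return [[1.0, 0.0], [0.0, 1.0]] + lower
```

With this convention, multiplying `T` by a vector of active joint rates returns the rates of all four joints, and the passive part satisfies (3) by construction. The determinant of $J_s$ is proportional to $\sin\big((q_{a1}+q_{p1}) - (q_{a2}+q_{p2})\big)$, so the sketch fails when the two distal links are aligned; a real controller would need to guard against that case.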

So, the inverse dynamic model with respect to the articular coordinates $q$ is:

$$\tau = T^T (D_r \ddot{q} + C_r \dot{q} + G_r) \tag{6}$$

3 Control Schemes using extra sensor data

Based on the kinematic and dynamic models, two control strategies are analyzed: monoarticular control, with one PID control loop for each actuator, and the CTC scheme, which uses the inverse dynamic model of the 5R. The implementation of these two schemes, when applied only to active joints, is the same as on serial robots. However, when the extra passive sensor data are considered, the system becomes redundant, as there are more error signals than actuators.

In monoarticular control, as the robot has only two actuators, the transformation matrix $T$ is required to generate a combined error signal, as proposed in [12] (Figure 3). This matrix is built from the Jacobian matrices of (3), and it projects the 4-dimensional error $e = q_d - q$ onto the 2-dimensional error $e'$ needed to implement the control loops (equation (7)).

Figure 3: PID Scheme with Extra Sensors

$$e' = T^T e \tag{7}$$

Based on this idea, the CTC scheme can be modified to consider passive joint sensor information, as can be seen in Figure 4. In this case, the 4 error signals are multiplied by $T^T D_r$ in the feedback loop to obtain a 2-dimensional decoupled error signal. The IDM, defined in equation (6), needs the coordinates and velocities of the 4 passive and active joints to generate the feedforward compensation torque. The control algorithm can be written as stated in equation (8):

$$\tau = T^T D_r (K_p e + K_v \dot{e}) + T^T (D_r \ddot{q}_d + C_r \dot{q} + G_r) \tag{8}$$

where $\ddot{q}_d$ is the desired acceleration.

Figure 4: CTC Scheme with Redundant Sensors

As stated in Section 1, in theory adding extra sensor information leads to a scheme with improved robustness against model uncertainties.

4 Simulation and Results

To validate the proposed control schemes, a set of experiments has been conducted with the 5R parallel robot.
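Before the experiments are described in detail, the two control laws of Section 3, the combined error (7) and the modified CTC law (8), can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Simulink implementation used in the paper: the gains `Kp` and `Kv` are taken as scalars, the reduced inertia matrix `Dr` and the vector `h` (standing for $C_r \dot{q} + G_r$) are supplied by the caller because the paper does not give the dynamic model in closed form, and all function names are hypothetical.

```python
def matvec(A, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def combined_error(T, e):
    """Equation (7): project the 4 joint errors onto the 2 actuators."""
    return matvec(transpose(T), e)


def ctc_torque(T, Dr, h, q, dq, q_des, dq_des, ddq_des, Kp, Kv):
    """Equation (8): tau = T^T Dr (Kp e + Kv de) + T^T (Dr ddq_des + h).

    T  : 4x2 transformation matrix of equation (5)
    Dr : 4x4 inertia matrix of the reduced model (caller-supplied)
    h  : 4-vector standing for Cr*dq + Gr (caller-supplied)
    Kp, Kv : scalar gains (an assumption of this sketch)
    """
    e = [d - a for d, a in zip(q_des, q)]
    de = [d - a for d, a in zip(dq_des, dq)]
    # Dr multiplies both the feedback term and the desired acceleration,
    # so the two terms of (8) can be grouped before projecting with T^T.
    a_ref = [dd + Kp * x + Kv * y for dd, x, y in zip(ddq_des, e, de)]
    generalized = [m + n for m, n in zip(matvec(Dr, a_ref), h)]
    return matvec(transpose(T), generalized)
```

Both functions take the full 4-dimensional joint vectors (active joints first, then passive ones) and return 2-dimensional quantities, one per actuator, which is how the redundancy introduced by the passive sensors is resolved in the scheme.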
Using the ADAMS multibody software, the robot has been modelled and its parameters identified. Then, the control loop has been implemented in the Matlab/Simulink environment, using ADAMS as the plant. In this co-simulation environment, the 4 control schemes previously introduced (PID and CTC, with and without redundant sensors) have been studied.

Model parameters have been identified as follows.
Kinematic parameters: $L = 0.35$, $L_1 = 0.15$, $L_2 = 0.3$, $l_1 = 0.25$, $l_2 = 0.2$.
Link center of mass positions: $l_{cL1} = 0.071$, $l_{cL2} = 0.146$, $l_{cl1} = 0.11$, $l_{cl2} = 0.1$.
Link masses: $m_{L1}$ = , $m_{l1}$ = , $m_{l2}$ = , $m_{L2}$ =
Passive sensor masses: $m_{si}$ = with i = 1, 2, 3.
Link inertia moments: $I_{L1}$ = , $I_{L2}$ = , $I_{l1}$ = , $I_{l2}$ =
Sensor inertia moments:

$I_{si}$ = with i = 1, 2, 3. All data in SI units.

The PID control has been tuned experimentally, obtaining the following parameters: $K_{p1} = 300$, $K_{d1} = 10$, $K_{i1} = 0.1$ and $K_{p2} = 1000$, $K_{d2} = 20$, $K_{i2}$ = The $K_p$ and $K_d$ gains of the CTC scheme have been tuned to obtain a maximum overshoot of 10% and a peak time of 1 ms.

An example nonsingular trajectory has been defined in cartesian coordinates. Using the inverse kinematics defined in Section 2.1.2, the trajectories for the 4 active and passive joints have been derived and introduced in the control schemes.

The comparative analysis of the control schemes consists of studying the ISE, IAE and ITAE performance indexes of the end-effector positioning error. To show the effect of parameter uncertainty, the model parameters have been randomly modified by 10% of their actual value and 10 iterations of co-simulation have been run. Results are summarized in Table 1. Table 2 shows the relative improvement percentage of the extended approaches, which consider passive sensor data, with respect to the traditional schemes.

The first conclusion, as stated previously by other authors, is that the model-based CTC scheme outperforms the monoarticular PID control. Second, the results show that when passive joint sensors are introduced, each control scheme performs better than its counterpart that considers only active joints. It is also worth noting that, regarding the ITAE index, the improvement is higher in the y coordinate than in the x coordinate. In the case of the PID control scheme, the ITAE variation percentage in the x coordinate is negative, but near zero. This implies that, in that coordinate, the proposed scheme performs similarly to the classical one, due to model uncertainties. However, due to the effect of gravity, the improvement in the y coordinate is much higher. Finally, the use of passive joint sensors also increases the robustness to model parameter uncertainty. This shows the effectiveness of the approach.
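The performance indexes used in this comparison have standard definitions: ISE is the integral of $e^2$, IAE the integral of $|e|$, and ITAE the integral of $t\,|e|$. The sketch below shows one way they could be computed from a sampled error signal with trapezoidal integration. The function names are hypothetical, and the sign convention of `relative_improvement` (positive when the extended scheme has the lower index) is an assumption, since the paper does not state the formula behind Table 2.

```python
def performance_indexes(t, e):
    """Return (ISE, IAE, ITAE) of a sampled error signal.

    t : sample times in seconds, increasing
    e : error samples taken at those times
    The integrals are approximated with the trapezoidal rule.
    """
    ise = iae = itae = 0.0
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        ise += 0.5 * dt * (e[k - 1] ** 2 + e[k] ** 2)
        iae += 0.5 * dt * (abs(e[k - 1]) + abs(e[k]))
        itae += 0.5 * dt * (t[k - 1] * abs(e[k - 1]) + t[k] * abs(e[k]))
    return ise, iae, itae


def relative_improvement(baseline, extended):
    """Percentage by which `extended` lowers an index relative to `baseline`.

    Assumed convention: positive means the extended scheme is better.
    """
    return 100.0 * (baseline - extended) / baseline
```

Because ITAE weights the error by time, it penalizes errors that persist late in the trajectory more heavily than ISE or IAE, which is consistent with the steady gravity load on the y coordinate showing up most clearly in that index.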
5 Real-Time Implementation on Labview-RT

Figure 5: Reference Trajectory

The strategies proposed above are going to be implemented on the 5R parallel robot prototype designed and built at the Control Engineering Department, with the collaboration of the CompMech Research Group of the Department of Mechanical Engineering. The prototype structure (Figure 6) is made of four aluminium links. The actuated joints are driven by two Maxon EC32 motors, each controlled by an EPOS 24/5 position controller. The three passive joints have been fitted with absolute encoders (500 pulses per turn).

Figure 6: 5R Prototype with Aluminium structure

The control system is implemented on a commercial PC running Labview-RT, using a classical Host-Target architecture. Communication with the motor controllers uses the CANopen protocol. The passive joint sensors are read using RS-