Actas del WSRFAI 2013



Actas del WSRFAI 2013
Luis Baumela (Editor), Universidad Politécnica de Madrid
September 2013
ISBN


Organization

Invited Speakers:
Christian Theobalt, Max Planck Institut / Saarland University

Local Organizing Committee:
Luis Baumela, Universidad Politécnica de Madrid
José Miguel Buenaposada, Universidad Rey Juan Carlos
Antonio Fernández Baldera, Universidad Politécnica de Madrid
Pablo Márquez Neila, Universidad Politécnica de Madrid

Program Committee:
José Luis Alba, Universidade de Vigo
Luis Baumela, Universidad Politécnica de Madrid
José Miguel Buenaposada, Univ. Rey Juan Carlos
Joan Martí, Universitat de Girona
Luisa Micó, Universitat d'Alacant
Carlos Orrite, Universidad de Zaragoza
Roberto Paredes, Universitat Politècnica de València
Francisco Perales, Universitat de les Illes Balears
Nicolás Pérez de la Blanca, Universidad de Granada
Filiberto Pla, Universitat Jaume I
María Vanrell, Universitat Autònoma de Barcelona
Jordi Vitrià, Universitat de Barcelona


Conference Program

September, 19th

Posters Session 1
p. 1   Human Action Recognition with Limited Labelled Data
       Mario Rodriguez, Carlos Orrite, Carlos Medrano
p. 5   A Web-based Tool for Training Radiologists in Medical Imaging
       Christian Mata, Arnau Oliver, Joan Martí
p. 9   Human-Computer Interaction for Optical Music Recognition tasks
       Jorge Calvo-Zaragoza, Jose Oncina
p. 13  A Multimodal Genre Recognition Prototype
       Jose Francisco Bernabeu, Carlos Pérez Sancho, Pedro José Ponce de León Amador, Jose M. Iñesta, Jorge Calvo-Zaragoza
p. 17  SAGH: a supervised image hashing technique
       Guillermo García, Mauricio Villegas, Roberto Paredes
p. 21  Multibiometric Authentication Based on Facial, Fingerprint and Hand Geometric Adaptative Templates
       Alejandro Vidal, Francisco José Perales
p. 25  Un Estudio Comparativo de Descriptores de Características para la Segmentación de Sinapsis y Mitocondrias
       Kendrick Cetina, Pablo Márquez-Neila, Luis Baumela
p. 29  Real-Time Multiple-Face Age Estimation in Portable Devices
       Elisardo González-Agulla, Enrique Argones-Rúa, José Luis Alba-Castro
p. 33  Towards the Improvement of Breast Density Estimation: Removing the Effect of Compression Paddle
       Mohamed Abdel-Nasser, Jaime Melendez, Meritxell Arenas, Domenec Puig Valls
p. 37  Modelos computacionales de visión: una metodología bio-inspirada
       Maria Vanrell
p. 41  Segmentation of Breast Masses Through Texture-Based Classification
       Rosario Denaro, Jaime Melendez, Domenec Puig
p. 45  Index of Authors


Human Action Recognition with Limited Labelled Data

Mario Rodriguez, Carlos Orrite, Carlos Medrano
I3A, University of Zaragoza, Spain

Abstract

Recording enough videos for training robust human action recognition systems is highly time consuming. However, this time can be reduced by using the large amount of videos, recorded in a wide range of scenarios, available on the internet. Into a modified version of the discrete Hidden Markov Model, designed to overcome the constraints of the classical approach when dealing with scarce training data, we introduce an initial transfer learning stage for the codebook creation. Human silhouettes are the input of our system and, as some human poses are shared among actions, actors or scenarios, we combine old shots with shots from the current scenario so as to create a robust codebook. We have tested this framework on the IXMAS dataset and confirmed its usefulness.

1 Introduction

The potential benefit of an automatic video understanding system in ambient intelligence applications (e.g. visual monitoring of elderly and disabled people at home) has stimulated much research in computer vision, especially in the areas related to human motion analysis; see the recent survey [10].

[Figure 1. Samples from real world scenarios and sources of variability.]

However, in most real scenarios the amount of labelled footage is not enough to train a robust human action recognition system. For instance, it is impossible for a newly installed activity recognition system to collect a sufficient amount of clean and precisely labelled training videos in a short period. Therefore, it becomes desirable to develop a machine learning procedure able to cope with this problem. On the other hand, nowadays there exists a large amount of human action videos elsewhere, some of them coming from public datasets.
Although these videos may not be directly relevant to the current recognition task, it is always possible to extract useful information from them and boost the current recognition task. We can notice in the images of Figure 1a the high variability involved in human activities. In Figure 1b the main sources of variability are represented: (i) actor, with different shapes, clothes and performances; (ii) scenario, changing the background or illumination; and (iii) camera settings. Despite the efforts made in general-purpose human action recognition systems, the described sources of variability force the recording of training videos in the target scenario. However, the recording process is either expensive in time, because obtaining enough repetitions involves waiting until enough actions happen in the normal evolution of the scene, or expensive in accuracy, if we use actors performing the desired actions, which implies a lack of natural behaviour. So, the aim of this work is to minimize the limitations arising from the need for these training examples. Our approach is based on the premise that several human poses coincide among different actions. This coincidence allows us to learn most of the possible human poses using just an adequate set of human actions. In addition to the labelled sequences in the new scenario, we combine them in a transfer learning stage with the available databases in order to extract key human poses shared among actions, and then we model an adequate pose codebook.

2 Proposed system

Every recognition system needs a training stage where several parameters are estimated. Usually, this initial process is carried out by learning from several examples of the desired classes in the specific scenario, and the number of examples is both crucial for the accuracy of the results and expensive to increase. In particular, we propose the use of a modified Hidden Markov Model (HMM) recognition system suitable for limited training examples, called Fuzzy Observation HMM (FO-HMM) [4]. The HMM is a widely used tool in sequence recognition systems [6]. It is a probabilistic model in which the modelled process is assumed to be a Markov chain composed of a finite set of states. The states are not observable (they are hidden), but every observation of the sequence generated by the HMM obeys a probability distribution associated with a specific state. The HMM is then defined by three sets of parameters: (i) a vector with the initial state probabilities, (ii) a matrix with the state transition probabilities, and (iii) a probability distribution per state defining the possible observations in that state. In relation to the third set of parameters, the classical approaches can be divided into two: (i) the discrete-HMM, where the possible observations in a state belong to a codebook, and (ii) the continuous-HMM, where the possible observations are n-dimensional real vectors. With the aid of several sequences of observations, and using a defined topology of the HMM, it is possible to obtain the model parameters using optimization methods. In recognition, the likelihood that a sequence has been generated by a specific model is compared across models.

2.1 Fuzzy Observation-HMM

[Figure 2. Observation probability distribution. Continuous-HMM (a). Discrete-HMM (b). FO-HMM (c).]

With a limited number of labelled sequences, both classical approaches suffer from limitations, mainly in the training of the observation probability distribution.
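For reference, the likelihood evaluation described above (comparing, per class model, the probability that the model generated the observed sequence) can be sketched with the standard forward algorithm for a discrete HMM. The two-state, two-codeword parameters below are toy values for illustration, not taken from the paper.

```python
# Forward algorithm: likelihood of an observation sequence under a
# discrete HMM defined by initial probabilities pi, transition matrix A
# and per-state emission distributions B over a codebook.

def forward_likelihood(pi, A, B, obs):
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]   # initialisation
    for o in obs[1:]:                                  # induction
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)                                  # termination

# Toy two-state model with a codebook of two codewords.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood(pi, A, B, [0, 1, 0]))
```

In recognition, one such model is trained per action class and the class whose model yields the highest likelihood for the test sequence is selected.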
Taking into account that we are working with sequences of real vectors codifying human poses, the use of the continuous-HMM seems reasonable. However, the limited training data produces some unwanted results. Looking at the continuous example in Figure 2a, we see red dots representing the training observations and two ellipses surrounding them representing a Mixture of Gaussians (MoG) that models the probability distribution of the observations. With an adequate number of observations it is possible to train a reliable MoG representing the probability distribution but, since we only have sparse information, the obtained model is unreliable. Human motion has a high variability, and the probability of new actions landing in positions outside the model is too high for a robust system. The discrete-HMM, on the other hand, constrains the freedom of a real vector to a specified codebook. In Figure 2b we see the same red dots in the same data space as in the continuous example, but in this case we force each observation to belong to a specific codeword of a designed codebook. The codebook, in turn, is trained with the available examples by quantizing the data space and, again, a higher number of training examples will produce a more suitable codebook. Moreover, the scarce training examples can lead to zero probability for some codewords, so that the likelihood estimate becomes zero because of just one outlier. In order to avoid this effect some kind of regularization is usually applied, but without any intelligence. Finally, in Figure 2c we can see a modification in the use of the observations for training and testing the HMMs that we call Fuzzy Observations (FO). In our approach, we use a similar type of model to the discrete one, using the same codebook but applying it in a different way. We know that we are working with n-dimensional real vectors, but the winner-takes-all rule removes much of the position information, so we replace this rule with a fuzzy assignation.
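The fuzzy assignation can be sketched as follows. The exponential distance kernel is an assumption made for illustration; the paper only states that each codeword's weight depends on the distance of the observation to it.

```python
import math

# Fuzzy observation: instead of assigning an observation to its single
# nearest codeword (winner-takes-all), give every codeword a weight that
# decays with distance and normalise the weights to sum to one.
# The exp(-distance) kernel is an illustrative assumption.

def fuzzy_assign(x, codebook):
    weights = [math.exp(-math.dist(x, c)) for c in codebook]
    total = sum(weights)
    return [w / total for w in weights]

def winner_takes_all(x, codebook):
    return min(range(len(codebook)), key=lambda j: math.dist(x, codebook[j]))

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(fuzzy_assign((0.1, 0.1), codebook))
print(winner_takes_all((0.1, 0.1), codebook))  # 0
```

Under winner-takes-all the observation contributes only to codeword 0; under the fuzzy assignment every codeword receives a non-zero share, which is what removes the zero-probability problem.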
Every observation has a probability of belonging to every codeword, depending on its distance to that codeword. With this modification we solve the zero-probability problem, and we also assign a more reliable probability to the codewords: some examples may land in an unclear place that the winner-takes-all rule forces to a single codeword, whereas the FO distributes it among all codewords. The use of FO implies an adaptation of the discrete-HMM algorithms, explained in [4].

2.2 Action recognition using FO-HMM with Transfer Learning

We show in Figure 3 the diagram of the proposed framework. [Figure 3. Fuzzy observation HMM with Transfer Learning improvement. Diagram blocks: Training / Test, Silhouettes, Transfer Learning, Clustering + FO, FO-HMM, Maximum Likelihood.] Due to the availability of the silhouettes in the used databases, we select this feature as the representation of the human poses, codifying each action as a sequence of silhouettes. These features are not the most appropriate for a recognition system, and current systems use other methods such as SIFT or HOGs [8][3], but since our objective is to evaluate the performance of the Transfer Learning and the FO-HMM, silhouettes are acceptable. Additionally, as the silhouette space has a large dimensionality, we use Principal Component Analysis (PCA) [1], widely used in the literature, to reduce the dimensionality. The limited number of labelled data for training gives rise to a sparse projected data space, which produces at least two problems: first, the data space is not conveniently modelled with the obtained clusters and, second, the system lacks the whole variability information. The second problem implies a rigid system in which introducing new actions is a handicap. Given the large amount of human action videos available elsewhere, and the premise of the existence of many shared poses among actions, we propose the use of an inductive transfer learning stage [5] to create the codebook, as shown in the diagram of Figure 3.

The idea is to introduce pose information from external videos, yielding a better model of the data space and giving the system the freedom to introduce new classes. Although an in-depth study of the transfer learning needs would probably lead to better performance, we suggest that an intuitive selection of the point of view and the performed actions in the selected databases, or even just a random selection, will lead to an improvement of the system. Let us suppose that we have a pool of pose silhouettes belonging to existing databases and covering a wide range of actions. We train a PCA and we project both the transferred poses and the new-scenario poses into the same space. Afterwards, we proceed with a clustering algorithm, obtaining the desired codebook. We use k-means as a general-purpose clustering algorithm [2] because it has been widely used in the literature and allows us to evaluate our approach. As we have seen, FO-HMM is designed to deal with scarce labelled training sequences, so we use this approach with the obtained codebook, training a FO-HMM per class. The recognition process consists of evaluating a test action with every model and deciding the class by Maximum Likelihood.

3 Approach Evaluation

In order to evaluate the performance of the approach we have selected two public databases: one used as the source dataset from which the transfer learning is done, and the other used as the target where the system is set up. As the source for the transfer learning we use the virtual dataset ViHASi [7]. This dataset has been virtually created with 20 action classes, 9 different actors and 40 perspective camera views. The interest of using a virtual dataset instead of a real scenario lies in the ease of implementing new action examples with the desired characteristics. Without loss of generality we can use the virtual dataset for our system, and the results can be extrapolated to real datasets. On the other hand, the new scenario is simulated with the IXMAS multi-view dataset [9]. IXMAS contains 13 actions, each performed 3 times by 11 actors. The recording has been made from 5 viewpoints. As we consider a fixed-viewpoint new scenario, we use only one camera and, following the authors' suggestion, we discard two of the actors due to their irregular performance, so we have 27 repetitions per action. The initial experiments are carried out over 5 classes of IXMAS (sit down, walk, wave, punch and kick) so as to allow an experiment where new action classes are included. Both datasets are provided with the silhouettes already extracted.
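The codebook construction (pool the source and target poses, then cluster) can be sketched with plain Lloyd-style k-means. The 2-D points below stand in for PCA-projected silhouettes and are invented for illustration.

```python
# Build a pose codebook by pooling PCA-projected silhouettes from the
# source (transfer) dataset and the new scenario, then running k-means.
# Plain Lloyd iterations with a deterministic initialisation for the sketch.

def kmeans(points, k, iters=20):
    centers = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        # Keep the old center if a cluster goes empty.
        centers = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers

source_poses = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)]  # e.g. poses from the source dataset
target_poses = [(0.1, 0.2), (5.1, 4.9)]              # scarce new-scenario poses
codebook = kmeans(source_poses + target_poses, k=2)
print(codebook)
```

One FO-HMM per class is then trained against this shared codebook, so codewords learned mostly from transferred poses remain available when new classes appear in the target scenario.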
Our main premise is that many poses are shared among classes, which implies that the obtained clusters should be composed of samples from several classes. In Table 1 we observe the ratio of clusters composed of samples from two or more classes (R num. cls.) as well as the average number of classes that compose a cluster (Avg. cls.). For this study we have used the selected 5 IXMAS classes. Attending to these results, we can conclude that the clustering algorithm performs as expected, because most of the clusters have samples from several classes. Having shown that the clusters obey the initial premise, we have carried out some experiments that validate the usefulness of this feature in the whole system.

[Table 1. Average number of classes composing the clusters; columns: K, Avg. cls, R num. cls.]

For training the transfer learning we have selected 7 actions from ViHASi (collapse, grenade, hero door slam, jump kick, punch, walk and walk turn 180) and a viewpoint from the 20 perspectives with the same vertical angle. With only one sequence per class for training the 5 selected action classes in the new scenario, we have obtained the following results: the use of clusters from the new scenario only (FO-HMM) and the combination with transfer-learning clusters (FO-HMM+TL) provide similar success rates, 59.7% and 59.2% respectively. Figure 4 shows how the introduction of new actions affects the performance. As the number of new actions increases, the performance of the system gets lower and lower; however, the average success rate stays higher when transfer learning is used.

[Figure 4. Average success rate increasing the number of classes.]

4 Conclusions

We have shown that transfer learning can improve the recognition rate when limited labelled data is available for training. However, due to the large amount of available human action video, in-depth research into the selection of adequate information sources is important. Using our approach it is possible to create a set of pretrained clusters covering many features, so that setting up the system in a new scenario only implies the selection of the right clustering and a short training stage. In the future, recognition systems will be scenario independent, or at least able to adapt automatically. Although we are still far from this, new research in transfer learning will help to move toward this goal. In this regard, some improvements remain to be made in the selection of the codebook source data, not to mention a better selection of the pose features, the dimensionality reduction process, and the clustering algorithm.
Acknowledgements

This work is partially supported by Spanish Grant TIN (MICINN) and FEDER, and by the regional government DGA-FSE. Mario Rodriguez holds an FPI grant from the MICINN.

References

[1] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.
[2] A. K. Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), June.
[3] I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Conf. on Computer Vision & Pattern Recognition, June.
[4] C. Orrite, M. Rodríguez, and M. Montañés. One-sequence learning of human actions. In A. Salah and B. Lepri, editors, Human Behavior Understanding, volume 7065. Springer Berlin / Heidelberg.
[5] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), October.
[6] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), Feb.
[7] H. Ragheb, S. Velastin, P. Remagnino, and T. Ellis. ViHASi: virtual human action silhouette data for the performance evaluation of silhouette-based action recognition methods. In Proceedings of the 1st ACM workshop on Vision networks for behavior analysis, VNBA 08, pages 77-84, NY (USA), ACM.
[8] P. Scovanner, S. Ali, and M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition. In Proceedings of the 15th international conference on Multimedia, MULTIMEDIA 07, New York, NY, USA, ACM.
[9] D. Weinland, R. Ronfard, and E. Boyer. Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding, 104(2-3).
[10] D. Weinland, R. Ronfard, and E. Boyer. A survey of vision-based methods for action representation, segmentation and recognition. Computer Vision and Image Understanding, 115(2), Feb.

A Web-based Tool for Training Radiologists in Medical Imaging

Christian Mata, Arnau Oliver, Joan Martí
Computer Vision and Robotics Group, Universitat de Girona
{cmata,

Abstract

The massive growth in applications of radiological imaging and image-guided treatments has met a worldwide shortage of trained radiologists. Therefore, several radiology training environments that combine traditional learning opportunities with advanced e-learning platforms have recently been developed with the goal of supporting the acquisition of radiological expertise. A new research web-based tool, customized to incorporate learning capabilities for the training and guidance of newly employed or resident radiologists, is presented. The tool allows panels of experts to collaborate at different hospitals and research centres by integrating a Picture Archiving and Communication System (PACS) to store the Digital Imaging and Communications in Medicine (DICOM) files, as well as a database with the Extensible Markup Language (XML) files containing the experts' annotations for each clinical case. A first prototype is being evaluated on digital mammography images.

1. Introduction

Nowadays, medical imaging plays a fundamental role in achieving a correct diagnosis for many diseases. However, it involves changes concerning the management, storage and retrieval of digital images. Thus, in order to improve the current imaging systems, it is necessary to develop architectures and medical databases, as well as PACS [1] systems. Besides, systems based on web applications [2-4] are particularly interesting due to recent advances in computer technology and in communications systems and protocols, which have allowed the development of online software and data recovery in a reliable and inexpensive way [5]. Therefore, the use of Web Services applied to the medical domain has experienced an important growth during the last years, and has become a communications system in itself.
The main advantages of web applications are obvious: easy accessibility, since users can connect from anywhere in the world; a time-saving and efficient mechanism to find specialized information, which can be saved directly to one's computer without the tiring task of searching in libraries [6]; high levels of security; the downloading of medical annotations using query forms; and a high level of interactivity among health professionals. Moreover, the introduction of the XML (Extensible Markup Language) file format [7] allows information exchange regardless of the platform, thus introducing a standard format to include diagnoses on the images [8]. The purpose of this work is to provide a useful tool to the medical and scientific communities in order to manage a mammographic image database, including the associated diagnoses, featuring the advantages and functionalities of a web service (i.e. authentication, security and data retrieval).

2. System Architecture

The proposed system has been designed as a web-based application within the Zend Apache server [9], which is also used as a MySQL database server for data storage. The application links both the PACS and the XML servers, which store clinical cases (as DICOM and annotation files). All data transmissions between users and the web server are encrypted in order to ensure complete confidentiality of the data. Currently, the Secure Socket Layer (SSL) cryptographic protocol is used. Queries to the prototype are performed through a web form and provide the user with a list of clinical cases according to the query parameters, including the image and the annotation files.

[Figure 1. Architecture model of the prototype.]

Figure 1 shows the system architecture of the prototype. The application is stored in a Zend Apache server, while this server is configured with a MySQL database manager. In addition, when the PACS is linked, a database (pacsdb) is also created in MySQL and is therefore directly related to the Zend Apache server. This architecture allows access to the PACS server for analysing new cases while keeping them unchanged, since the information concerning the annotations is stored in the MySQL database. The application is also related to the eXist manager [10], responsible for the XML database, which incorporates its own server for access and management via a web interface. We opted for the creation of the XML annotation files in the database in order to preserve the integrity of the original DICOM files and to ensure their full support if they come from different acquisition systems. All XML files stored in the eXist database are related to their corresponding mammographic images, as these are the files containing the records of the diagnoses made by radiologists. For this reason, it can be seen in the architecture model of Figure 1 that there is a direct relationship between the MySQL database and the eXist manager. Finally, we have defined the key field identifier for the files with their breast imaging, so that the database and the files are correctly matched.

In order to facilitate access to the application, and with the idea of serving both the medical experts present at the hospital itself and outside professionals, we decided to design a web application. One of the key elements in this application is data and information security across the network, especially when it comes to confidential data related to medical examinations. In this sense, the three modules shown in Figure 1 that constitute our system architecture (Zend Server, PACS and eXist) have internal security measures to avoid vulnerabilities in the transfer of information, so that besides using user ID control and private keys, the algorithms have systems that protect and encrypt the information during its transmission. Finally, the access system has been designed to facilitate both local and external users. Users should log in correctly using a unique username and password, regardless of whether the connection is via the intranet or external.

3. Web-application

We propose the use of the web-application prototype as part of the problem-based learning paradigm. This section describes the most important features of the application. It has been divided into three different subsections: identification and security; control panel and accessibility; and the management and training query form.

3.1 Identification and security

Firstly, secure user access to the application is ensured by a username and password generated by the system administrator. The system incorporates mechanisms of authentication and protection against fraudulent use of identity. The system has a three-level security setting for user profiles. The 1st level (Standard User) allows users to submit queries on the various medical cases stored in the prototype (images and diagnosis). This level corresponds to the profile of a student or auxiliary medical personnel who uses the application for remote diagnostics, training, etc. The 2nd security level, in addition to the permissions of the 1st level, allows access to the PACS and eXist for the addition and modification of the mammographic database content and clinical cases (images and diagnosis). Essentially, it corresponds to the profile of an advanced student or a medical specialist partner in the construction of the mammographic database, who needs access to the system locally or remotely. The 3rd security level serves for system administrators, adding user account management to the list of permissions.
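The three cumulative levels can be summarised as a small permission table. The role names and permission labels below are illustrative stand-ins; the paper does not publish the actual permission model.

```python
# Three-level security scheme: each level adds permissions on top of the
# previous one. Permission names are illustrative, not from the system.

PERMISSIONS = {
    1: {"query_cases"},                                             # standard user
    2: {"query_cases", "add_cases", "edit_cases"},                  # advanced student / specialist
    3: {"query_cases", "add_cases", "edit_cases", "manage_users"},  # administrator
}

def allowed(level, action):
    return action in PERMISSIONS.get(level, set())

print(allowed(2, "manage_users"))  # False
```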

3.2 Control panel and accessibility

Once a user is correctly identified in the system, the home page of the prototype appears in the browser as the control panel for their profile. Depending on the type of user accessing the system, the control panel shows different possibilities.

3.3 Prototype functionality

The most important part, associated with the learning capabilities for the training and guidance of newly employed staff or students, is shown in this section. Therefore, the management and training form, the finding of results, and the characterization analysis of the mammographic findings are presented. This e-learning platform uses queries to the prototype through a web form and provides the user with a list of clinical cases according to the query parameters, including both the image and the annotation files. The first step required from the student in this training process is the use of the query form. In order to obtain desirable results, the user should select appropriate criteria, as directed by the expert or medical staff. The aim is to find mammographic studies matching the criteria selected by the user. The search is performed over all the XML files stored in the eXist database, and the list of results is shown in the web interface. In addition to a comprehensive list of clinical cases contained in the database, the query forms allow searching according to some specific medical criteria: reported diagnosis, BIRADS classification, presence of abnormalities (micro-calcifications, structural distortions, masses, etc.). Depending on the volume of records obtained in the results list, the application uses automatic paging to group records in a more visual way, as can be seen in Figure 2. For each of the records obtained, the system allows three actions: see information on the DICOM image stored in the PACS server, check the diagnosis in the associated XML file, and display the mammographic image.
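A sketch of the kind of criteria-based search the query form performs over the stored XML annotations is shown below. The schema (element and attribute names) is hypothetical, since the paper does not publish the annotation format.

```python
import xml.etree.ElementTree as ET

# Filter an XML annotation (hypothetical schema) by BIRADS category and
# by the presence of a given abnormality type.

SAMPLE = """
<case id="c001">
  <diagnosis birads="4">
    <finding type="mass"/>
    <finding type="microcalcification"/>
  </diagnosis>
</case>
"""

def matches(xml_text, birads=None, finding=None):
    diag = ET.fromstring(xml_text).find("diagnosis")
    if birads is not None and diag.get("birads") != birads:
        return False
    if finding is not None:
        types = {f.get("type") for f in diag.findall("finding")}
        if finding not in types:
            return False
    return True

print(matches(SAMPLE, birads="4", finding="mass"))  # True
```

In the real system the equivalent filtering would run server-side against the eXist XML database rather than over in-memory strings.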
As for the characterization and analysis of the mammographic findings, three different actions are available once the records have been obtained: display the information contained in the DICOM file; get the diagnosis linked to the case, stored in an XML file; and display the digitized images contained in the DICOM file. When the list of results is obtained, the user can get information about all studies and compare them in order to perform a classification and selection of the best results according to the preferences selected in the query form. This is the educational part of this e-learning tool and it is useful to understand its methodology. For each case, a student can display the acquisition information contained in the DICOM file, read annotations about the study included in the XML file, and display the images included in each study. In Figure 3, two examples of DICOM images associated with the selected study are displayed. It also shows overlays marked by doctors and their individual annotations. This information is very important because users should compare and classify studies in order to later download all the cases useful for learning purposes.

[Figure 2. Results obtained after the training query form.]
[Figure 3. DICOM images with their associated overlays.]

The last step, when students have studied all mammographic cases, is to download all the cases useful for them. An important feature of the system is that the records can be visualized on the screen or downloaded locally to the workplace selected by the user. The system offers the possibility to download a

single record or use an automatic selection. Once selected, the DICOM files, together with their associated XML files, become part of a ZIP file which is downloaded by the user.

4. Conclusions and Future Work

Digital medical imaging technologies have become beneficial in modern medical practices and health care systems, providing powerful tools for diagnosis, treatment and surgery. The volume of medical images being generated has grown rapidly, due to an increase in the number of clinical exams performed in digital form and to the large range of available modalities. Therefore, the demand for online medical imaging systems that allow visualization and processing has increased significantly. Although public databases of mammographic images are available nowadays, many of them have been obtained by analog acquisition and subsequent digitization of the films. For breast imaging, this process involves an inherent loss of quality, depending on the digitizer system, a loss of acquisition information, and non-standardization of image formats. The proposed system architecture is able to manage and centralize a public database of digitally acquired mammographic images. Furthermore, using the proposed web application, a user can access information both locally and remotely. The prototype has been designed as a web-based application within the Zend Apache server, which is also used as a MySQL database server for data storage. The application links both the PACS and the XML servers, which store the clinical cases (as DICOM and annotation files). Database queries are performed through a web form and provide the user with a list of clinical cases according to the query parameters, including both the image and the annotation files. As future work to improve the prototype, we propose to define a single XML file format for the medical community, corresponding to the annotations of the images associated with mammography.
Moreover, we plan to define custom search forms that ensure optimal results and are tailored to the needs of the user. Finally, further improvements could be investigated in terms of security and data transfer, including new algorithms that improve data encryption and provide a more secure and robust system.

References

[1] R.H. Choplin, J.M. Boehme, and C.D. Maynard. Picture archiving and communication systems: an overview. Radiographics, January, (12).
[2] H. Munch and U. Engelman. Web-based distribution of radiological images from PACS to EPR. International Congress Series.
[3] Y. Lin, D. Feng, and T.W. Cai. A Web-based Collaborative System for Medical Image Analysis and Diagnosis. Conference on Research and Practice in Information Technology, (2): 93-95.
[4] J. Zhang, J. Sun, and J.N. Stahl. PACS and Web-based image distribution and display. Computerized Medical Imaging and Graphics, (27, 2-3).
[5] J. Kim, D. Feng, and T. Cai. A Web-based medical image data processing and management system. ACM International Conference Proceeding Series, (9): 89-91.
[6] P. Wunderbaldinger, W. Schima, K. Turetschek, T.H. Helbich, A.A. Bankier, and C.J. Herold. World Wide Web and Internet: applications for radiologists, (9).
[7] Extensible Markup Language (XML).
[8] L. Chung, T. Hsu, C. Deng, C. Woei, and H. Chia. A Web-based Solution for Viewing Large-Sized Microscopic Images. Journal of Digital Imaging, (22).
[9] Zend server company official webpage.
[10] eXist official web page.

Human-Computer Interaction for Optical Music Recognition tasks

Jorge Calvo-Zaragoza, Dept. of Software and Computing Systems, University of Alicante, Spain
Jose Oncina, Dept. of Software and Computing Systems, University of Alicante, Spain

Abstract

The need to digitise music scores has led to the development of Optical Music Recognition (OMR) tools. Unfortunately, the performance of these systems is still far from providing acceptable results. This situation forces the user to be involved in the process, owing to the need to correct the mistakes made during recognition. However, this correction is performed over the output of the system, so these interventions are not exploited to improve the performance of the recognition. This work sets out the scenario in which human and machine interact to accurately complete the OMR task with the least possible effort for the user.

1. Introduction

Music is one of the main components of cultural heritage. Over the centuries, musical scores have been stored and preserved in cathedrals, libraries and museums to ensure their survival. However, this measure has restricted access to these scores. Digitizing music scores allows greater dissemination and integrity of this culture, so for decades much effort has been devoted to the development of tools for this purpose. Nowadays, editing tools are available that allow point-and-click placement of musical symbols on empty scores. Although this mechanism can be very accurate, its use is very time-consuming. Moreover, digital instruments (such as MIDI keyboards) from which the musical information can be transferred directly to the PC while playing the score can also be found. However, this mechanism cannot be completely accurate or capture all the nuances of the score. Furthermore, this method requires the user to be able to play the piece perfectly, which is not a trivial matter.
The emergence of Optical Music Recognition (OMR) [1] systems represented a more comfortable alternative for the user. These systems take a scanned image of a score and try to extract its musical information in order to export it to a digital format (such as MIDI, MusicXML or PDF). OMR arose in 1966 [7] and is related to fields like Image Processing, Document Analysis and Pattern Recognition. A good example of a common framework for performing OMR processes can be found in [8]. Unfortunately, despite several research efforts, these systems are far from achieving good accuracy rates, especially for handwritten scores [9]. The scores are analysed by OMR tools and their output has to be corrected using a conventional score editor. Hence, the user has to be inevitably involved in the transcription process. However, this involvement is unexploited, since it could be used to improve the performance of the OMR system itself. A scenario can be envisaged in which the user and the machine interact to achieve the perfect transcription of a score with the least effort for the user. Previous studies have shown how to take advantage of interactive frameworks for pattern recognition tasks [11, 12]. Following this line, we present in this paper an overview of human-computer interaction for OMR tasks. The remainder of this work is organised as follows: Section 2 presents the field of study, its foundations and its main scenarios to exploit. Section 3 enumerates some of the first areas to explore in the near future. Finally, Section 4 concludes the paper.

2. Human-Computer Interaction for Optical Music Recognition

As occurs in other fields of pattern recognition, the user is the most reliable source to validate an OMR process. The productivity that can be gained from human assistance in OMR tasks is an issue that is yet to be explored. Until now, research has focused on improving the accuracy and speed of the algorithms involved in the process.
A deep analysis is needed of the main weaknesses of these algorithms and of how human

participation can improve their performance or be used to produce new algorithms. In this case, there are two critical processes pertaining to OMR system performance: segmentation and classification. The segmentation stage aims to detect and isolate the musical symbols existing in the score. This process is quite complex and requires a comprehensive procedure. It usually involves the following steps:

1. Preprocessing: correction of rotation, binarisation, scaling.
2. Staff line detection and removal.
3. Symbol isolation.

The preprocessing stage is focused on providing robustness to the system. If the later stages always receive as input an image with the staff lines aligned with respect to the horizontal axis, with equal relative sizes, and where the only possible values for a pixel are background or foreground, the system tends to generalise more easily. Each of these steps can be addressed in different ways, and in the literature each author chooses the techniques considered most appropriate. Staff line detection and removal is one of the most critical aspects of the process, since both the detection and the classification of musical symbols rely on its accuracy. Much research has been conducted concerning this step (a good comparative study can be found in [2]). Although this stage can be approached in many ways, it ultimately becomes a trade-off between keeping information and reducing noise. Aggressive approaches greatly reduce the noise but can eliminate relevant information, whereas less harmful processes end up producing a large number of noisy areas. Finally, symbol isolation is performed by searching for the remaining meaningful objects in the score. The main problem is that some of the musical symbols are broken by the earlier stages (especially by staff line detection and removal). Once single pieces of the score have been isolated, a hypothesis about the type of each one is emitted in the classification stage.
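The staff-line detection and removal step can be illustrated with a minimal row-projection sketch: rows in the binary image whose proportion of foreground pixels exceeds a threshold are taken as staff-line candidates. The threshold and the naive row-blanking below are illustrative assumptions; the algorithms compared in [2] are far more robust, and the crude removal here deliberately exhibits the trade-off discussed in the text.

```python
# Minimal row-projection sketch of staff-line detection and removal.
# binary_image is a list of rows of 0 (background) / 1 (foreground).
# The 0.8 threshold is illustrative, not a published value.

def detect_staff_rows(binary_image, threshold=0.8):
    """Return indices of rows that are mostly foreground pixels."""
    staff_rows = []
    for y, row in enumerate(binary_image):
        if sum(row) / len(row) >= threshold:
            staff_rows.append(y)
    return staff_rows

def remove_staff_rows(binary_image, staff_rows):
    """Naive removal: blank out detected rows entirely. This can break
    symbols that cross the staff, illustrating the noise/information
    trade-off discussed above."""
    rows = set(staff_rows)
    return [[0] * len(row) if y in rows else row
            for y, row in enumerate(binary_image)]
```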
The high variability of handwritten music symbols (see Fig. 1) is the main difficulty to overcome. In addition, all errors committed in previous stages (incorrect binarisation, partially extracted symbols, remains of staff lines, etc.) are carried over, which further complicates recognition.

Figure 1. Four handwritten eighth notes: example of variability in music notation.

Based on what has been explained above, there are several scenarios in which the user could assist the machine. Depending on how this is done and the objective pursued, we can divide this assistance into four categories: error detection, online learning, active learning and supervision. It should be emphasized that the purpose of these scenarios is to reduce user effort. Since human participation is mandatory (OMR systems are not sufficiently accurate, and a score with errors is not acceptable) the real goal is to make the effort required to obtain a perfect transcription less than it would be if conventional OMR systems were applied and the output subsequently edited. The next subsections describe these four scenarios. It should be noted that these scenarios are not mutually exclusive; they can be combined depending on the suitability of each one at each particular time.

2.1. Error Detection

In the error detection scenario, the user simply marks the places where the system has made a mistake. This information can help to improve both the segmentation and the classification stages. For segmentation assistance, the user can mark a place where there is a symbol and the system has not found anything, or, conversely, indicate an area in which the machine believes there is a symbol but there is actually nothing. In this way, the system can get to know the current sheet better and extract more reliable features of the staff. Regarding classification, marking a symbol which has been misclassified is a way of obtaining useful domain knowledge.
These errors can be propagated to all those symbols which have been classified in a similar way, and the learned model can be modified to adapt to this change. Moreover, if a definite order is followed in the corrections (for example, left-to-right), each correction implicitly validates the previous hypotheses, which could be very helpful for improving the subsequent guesses.

2.2. Online Learning

Online learning is a machine learning model in which the true label of a certain hypothesis is discovered and used to improve the performance of the learning algorithm. In an online learning scenario for OMR, the user does not only mark where there is a classification error

but also specifies the correct hypothesis. In this case, the learned classification model receives even more useful information, being able to modify itself to suit the corrections given by the user. As in the previous scenario, each correction can also validate the previous hypotheses. This scenario implies greater support for the system, but it also requires more active user involvement.

2.3. Active Learning

Active learning refers to a machine learning approach in which the algorithm can query an oracle to learn the true label of a certain sample. Further details about active learning can be found in [10]. At any given time, the OMR process may require the user's expert assistance. If this information is correctly analysed, it could be beneficial to both the segmentation and classification stages. To obtain a more accurate segmentation process, the machine could ask the user to label some areas of the image. For instance, the user could indicate whether a particular piece of the image is empty of musical symbols, so that the algorithm has a seed to learn about interesting (in any sense) areas. This may provide greater knowledge of the score, which could be exploited to understand the features of the staff (beneficial for detecting staff lines) or of the sheet (beneficial for binarisation). Moreover, once the symbols have been isolated, the machine could ask the user to label some of the detected symbols. Thus, the algorithm obtains domain information and might be able to make future hypotheses more accurately.

2.4. Supervision

In this scenario, the user acts as a reviewer of the procedures involved. After each step, the system shows the output and the user must validate or reject the result. Therefore, the process only progresses when every step has been accepted by the user. The main advantage of this framework is that mistakes are not carried through the different steps, so the final result will be free of errors.
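The accept/reject loop of the supervision scenario can be sketched as follows. This is only a sketch: the step functions and the `ask_user` callback are hypothetical placeholders, and a real system would retune a rejected step's parameters rather than simply rerun it.

```python
# Sketch of the supervision scenario: each stage's output must be
# accepted by the user before the pipeline advances, so mistakes are
# not carried through. Step functions and ask_user are hypothetical.

def supervised_pipeline(image, steps, ask_user):
    """steps: list of (name, function); ask_user(name, output) -> bool.
    A rejected step is rerun (in a real system, with retuned
    parameters) until the user accepts its output."""
    data = image
    for name, step in steps:
        output = step(data)
        while not ask_user(name, output):
            output = step(data)  # placeholder for a retuned retry
        data = output
    return data
```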
This validation makes it possible to avoid the blind selection of magic numbers in the segmentation stage. After each rejection, the procedures involved could tune some of their parameters to fit the current score. In the classification stage, each hypothesis about a symbol could be presented to the user. The user's verdict not only helps to finally achieve a transcription free of errors; the machine could also learn from each rejection, as happens in the error detection scenario. Furthermore, each validation provides a completely reliable prefix, which can be used to improve the accuracy of the next hypotheses. This scenario is especially interesting because each action requires minimal user effort. Nevertheless, the process could drag on for too long.

3. Future developments

As a starting point, the first developments must address the basics of the interaction between the human and the system. Specifically, we think that three aspects must be considered first:

1. Development of an OMR system optimised to reduce human interactions. It has been demonstrated that, in sequential pattern recognition tasks, giving the most likely output is not optimal for reducing human corrections [6]. This condition has not yet been taken into account in any current OMR system.

2. Development of a system for online recognition of handwritten musical notation. When the user has to correct a misclassification of a symbol, he can search for the correct label in a list of musical symbols. However, it is more natural and comfortable to draw the correct symbol over the score itself. To this end, it is necessary to develop a system that recognises musical symbols from the strokes drawn by a user. Although some work has already been done on this issue [5, 3, 4], it is still unexplored how offline and online classification can be combined most profitably.

3. Analysis of the exploitation of user assistance in the segmentation stage.
The process of segmentation is a key stage in achieving a good transcription. So far it has not been analysed how this process can be improved using the assistance of the user. Several procedures are involved in this stage (see Section 2), so it would be appropriate to locate the most interesting aspects in order to focus human efforts on what is most profitable. Since the staff line detection and removal step is one of the keys to the performance of an OMR system, developing a new algorithm that takes advantage of user feedback in any of the presented scenarios is advisable.

It should be noted that each of these items represents a new and independent line of research, given the magnitude of the field involved. Once these avenues have been explored and exploited, it would be interesting to see how they can be combined in the future to produce an efficient human-computer OMR system.
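As a rough illustration of the online symbol recognition mentioned in item 2, a minimal recogniser could resample each pen stroke to a fixed number of points and classify it by nearest neighbour against labelled templates. This is only a sketch under strong assumptions: the published approaches cited ([5, 3, 4]) use neural networks or HMMs, and the templates here are hypothetical.

```python
# Sketch of online symbol recognition from pen strokes: resample each
# stroke (list of (x, y) points) and pick the nearest labelled
# template. Nearest-neighbour matching is an illustrative stand-in
# for the neural-network/HMM methods in the literature.
import math

def resample(stroke, n=16):
    """Pick n roughly evenly spaced points along a stroke."""
    idx = [round(i * (len(stroke) - 1) / (n - 1)) for i in range(n)]
    return [stroke[i] for i in idx]

def classify_stroke(stroke, templates):
    """templates: list of (label, stroke). Return the nearest label."""
    def dist(a, b):
        return sum(math.dist(p, q) for p, q in zip(resample(a), resample(b)))
    return min(templates, key=lambda t: dist(stroke, t[1]))[0]
```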

4. Conclusions

This work aims to establish the starting point for research in Human-Computer Interaction for Optical Music Recognition tasks. First, the need to develop such systems has been explained. Owing to the inaccuracy of state-of-the-art OMR systems, the user has to correct the mistakes, so it is worth using this unavoidable effort to improve the score recognition process. Focusing on the OMR process, four scenarios were presented according to the nature of the user intervention: error detection, where the user only marks where an error has been made; online learning, where the user corrects the errors made by the system; active learning, where the user is asked to provide the system with the label of a specific sample; and supervision, where the user sequentially accepts or rejects the steps involved in the process. Regarding future research, the three open lines that should be explored first have been listed: development of an OMR system optimised to reduce human interactions, development of a system for online recognition of handwritten music notation, and analysis of the exploitation of user assistance in the segmentation stage. In the future, the intention is to achieve a system that optimises the user effort needed to obtain the perfect transcription of a music score.

References

[1] D. Bainbridge and T. Bell. The Challenge of Optical Music Recognition. Language Resources and Evaluation, 35:95-121.
[2] C. Dalitz, M. Droettboom, B. Pranzas, and I. Fujinaga. A comparative study of staff removal algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(5).
[3] S. E. George. Online pen-based recognition of music notation with artificial neural networks. Computer Music Journal, 27(2):70-79, June.
[4] K. C. Lee, S. Phon-Amnuaisuk, and C.-Y. Ting. Handwritten music notation recognition using HMM: a non-gestural approach. In 2010 International Conference on Information Retrieval and Knowledge Management (CAMP).
[5] H. Miyao and M. Maruyama. An online handwritten music score recognition system. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR), volume 1.
[6] J. Oncina. Optimum algorithm to minimize human interactions in sequential computer assisted pattern recognition. Pattern Recognition Letters, 30(5).
[7] D. Pruslin. Automatic recognition of sheet music. Sc.D. dissertation, Massachusetts Institute of Technology.
[8] A. Rebelo, G. Capela, and J. S. Cardoso. Optical recognition of music symbols: a comparative study. IJDAR, 13(1):19-31.
[9] A. Rebelo, I. Fujinaga, F. Paszkiewicz, A. Marcal, C. Guedes, and J. Cardoso. Optical music recognition: state-of-the-art and open issues. International Journal of Multimedia Information Retrieval, pages 1-18.
[10] B. Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison.
[11] A. H. Toselli, E. Vidal, and F. Casacuberta. Multimodal Interactive Pattern Recognition and Applications. Springer.
[12] E. Vidal, L. Rodríguez, F. Casacuberta, and I. García-Varea. Computer assisted pattern recognition. In Proceedings of the 4th Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms, June.

A Multimodal Genre Recognition Prototype

Bernabeu, J.F., Pérez-Sancho, C., Ponce de León, P.J., Iñesta, J.M., Calvo-Zaragoza, J.
University of Alicante
{jfbernabeu, cperez, pierre, inesta,

Abstract

In this paper, a multimodal and interactive prototype to perform music genre classification is presented. The system is oriented to multi-part files in symbolic format, but it can be adapted using a transcription system to transform audio content into music scores. This prototype uses different sources of information to give a possible answer to the user. It has been developed to allow a human expert to interact with the system to improve its results. In its current implementation, it offers a limited range of interaction and multimodality. Further development aimed at full interactivity and multimodal interaction is discussed.

1. Introduction

In this paper, a multimodal and interactive prototype to perform music genre classification is presented. Classification of music into different categories is an important task for the retrieval and organisation of music libraries. In our team, several engines to solve this task have been developed. However, music genre recognition is a difficult task due to its subjective nature. Genre classification involves many aspects. For example, genre labels are inherently subjective and influenced by a number of cultural, artistic, and market trends, so perfect results cannot be expected [3]. Moreover, the success rate can differ depending on the particular classifier and the data used to train and test the system. Nevertheless, the combination of several sources can improve the success rate, as shown in [6]. Obtaining descriptive features of an object from different information sources makes it possible to produce a deeper and more informative description of it. A number of papers can be found in the literature where pattern recognition is based on multimodal information.
In [9] the authors explain how multimodality in human interaction and multimedia information processing can help to improve performance in different pattern recognition tasks, like manuscript text processing or gesture recognition from image sequences. In [4] the authors consider a video sequence as a multimodal information source, obtaining features of a different nature from speech, audio, text, shapes, or colors. This approach works under an early-fusion scheme, where features are combined in a compact representation for a single decision. Other approaches use a late-fusion scheme, where various classifiers are applied to the different information sources and then combined into a decision. For example, in [5] a multiple classifier system for OCR is presented, based on hidden Markov models that provide individual decisions; their combination is performed with a voting system. In the present work, we present a multimodal genre recognition GUI to help the user make a decision in the difficult task of classifying a multi-track MIDI file into a given music genre. The GUI provides the user with several classifiers built from different data sources. Some of these classifiers use the information contained in the melody part. Hence, the GUI provides a tool to find out which track contains the main melody. Finally, the user can combine the several classifiers to obtain a proper classification. The next section gives a system overview, including descriptions of its core classification engines and auxiliary modules. Next, its current interaction capabilities are discussed, and finally, some conclusions and further development lines are presented.

2. System design

The multimodal genre recognition GUI consists of two main modules: the melody track selection (MTS) module and the genre classification (GC) module. The basic operation mode is described below. A user chooses the multi-track MIDI file he wants to classify.
Then, the MTS module performs the operations needed to return the track with the highest probability of being the melody. The MTS module is described in more detail in section 2.1. Once a melody track has been selected,

the flow of information arrives at the GC module. The GC module needs a track labeled as melody, since some of the genre classification engines assume that the features are extracted from a melody line. The GC module is described in more detail in section 2.2. Finally, the system returns the genre with the highest probability. After presenting the basic operation of the system, we explain the different modules in more detail, pointing out the machine learning techniques used by the different engines to make the classification decisions.

2.1. Melody track selection (MTS) module

The function of the MTS module is to help the user make the melody track selection decision. For this, we need to assume that, if the melody exists, it is contained in a single voice or track and does not move among several tracks. This assumption is also made by other authors [2], as there is empirical evidence that this is the case for much of today's symbolically encoded western music. At this point, the system needs an engine that gives the probability of each track being the main melody. A possible strategy is to use the metadata information found in MIDI files. However, metadata present some drawbacks, such as unreliability, subjectivity, and the fact that they can be missing. Another drawback of this approach is that such a method would obviously tell us nothing about the content of melody tracks. Hence, it was not considered here. Instead, a version of our melody track selector [10] was used for this task, as described below. First, empty tracks and tracks playing on the percussion channel (MIDI channel 10) are filtered out. Each remaining track is described by a vector of numeric descriptors extracted from the track content. Some features describe the track as a whole, while others characterise particular aspects of its content. These descriptors are the input to a classifier that assigns to each track its probability of being a melody.
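The selection rule built on these per-track probabilities, including the NO MELODY pseudo-track with the heuristic prior p0 = 0.22 used by the system, can be sketched as follows. The classifier itself (a random forest over track descriptors) is abstracted into the input probabilities, and the re-normalisation step is omitted for brevity, since it does not change which track wins.

```python
# Sketch of the melody-track decision rule: tracks whose melody
# probability does not exceed the heuristic threshold p0 = 0.22 are
# ruled out; if none survives, the file is declared to have no melody.
# The probabilities would come from the random forest classifier.

def select_melody_track(track_probs, p0=0.22):
    """track_probs: {track_id: melody probability from the classifier}.
    Return the winning track id, or 'NO_MELODY' if no track beats p0."""
    candidates = {t: p for t, p in track_probs.items() if p > p0}
    if not candidates:
        return "NO_MELODY"
    return max(candidates, key=candidates.get)
```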
A random forest classifier, an ensemble of decision trees, was chosen as the classifier. The WEKA toolkit was used to implement the system. There is a possibility that the MIDI file does not have a melody track. To solve this problem, an additional track named NO MELODY, with a heuristic fixed probability p0 = 0.22, is added, and each track probability is then re-normalized. Thus p0 acts as a threshold, in such a way that a track i is considered as a melody candidate only if its probability pi > p0. If pi <= p0 for all tracks, a NO MELODY answer is given for the file.

Figure 1. MTS module.

The GUI has several classifiers which were trained with different corpora. Specifically, four models were built using different data in the training phase. The files were downloaded from a number of freely accessible Internet sites. First, three corpora (JAZ200, CLA200, and PR200), made up of 200 files each, were created to set up the system and tune the parameter values. JAZ200 contains jazz files, CLA200 has classical pieces, and PR200 contains pop-rock songs. A fourth corpus, named ALL600, is the union of these three corpora. The user can choose each model at any time by selecting the corresponding radio button (see Fig. 1). The right side shows the result, where each track gets its probability of being a melody displayed as a progress bar. Empty and percussion tracks are not shown by default, but the user has the option to see them. Also, a slider control allows the user to listen to a specific section of the file, and mute/solo buttons are provided for each track.

2.2. Genre classification (GC) module

The function of the GC module is to help the user decide which genre corresponds to a target file. The working hypothesis is that melodies from the same musical genre may share some common low-level features, permitting a suitable pattern recognition system, based on statistical descriptors, to assign the proper musical genre to them.
For this, it uses several engines that compute the probability of a file belonging to a given genre. The several genre classifiers are now explained in more detail.

SVM based on melodic content features. The first classifier is a Support Vector Machine (SVM) classifier. The input data are based on statistical features of melodic content, like melodic, harmonic, and rhythmic descriptors. There are 49 descriptors in total, designed according to those used in musicological studies. For training the classifier, each sample is represented as a labeled vector of statistical descriptors computed from each available melody segment (see [8]). The SVM implementation in Weka has been used to build this classifier.

N-grams (notes). The second classifier is an N-gram classifier. The N-grams are used here as music words that capture relevant information from the data and are suitable for a text categorization approach [7]. To do this, we use a representation that combines pitch and note durations using relative measures. The encoding method makes use of pitch intervals and inter-onset time ratios (IOR) to build series of symbols of a given length (N). There are two possible encodings: coupled (intervals and IOR are encoded together) and decoupled (separate symbols). Once the MIDI information has been converted into a sequence of symbols, a language model is built from a training set of documents and used as a classifier. Given a new, previously unseen sequence of words, classification is done by selecting the class most likely to have generated that sequence. In this work, the building and evaluation of the language models has been performed using the CMU SLM Toolkit, and a combination of two techniques, interpolation of models and the Witten-Bell discounting method, has been used to solve the problem of unseen samples. 4-gram models have been used here.

N-grams (chords) and metadata. Actually, this can be seen as three classifiers: the first, N-grams (chords), using the chords provided by the harmonic structure of the music sequence; the second, Metadata, using the instrumentation information contained in the MIDI file metadata; and the third, Combined, using an early combination of both data sources. In the three cases, the features give a single vector that is the input to a classifier after a feature selection procedure. Each file in the dataset is represented as a vector x ∈ {0, 1}^(H+I), where each component x_i ∈ {0, 1} codes the presence or absence of the i-th feature.
H denotes the number of chords in the dictionary of possible harmonic combinations considered, H = 312 different chords in this work (see [7] for more details), and I is the number of possible instruments which, assuming the General MIDI standard for the sequence, will be 128 instruments plus 3 percussion sets; therefore, I = 131. There is a probability of each feature associated with each class, depending on the frequencies found in the training set for the items in the classes. The decision is taken by combining these probabilities through a Naïve Bayes classifier. These classifiers are described in more detail in [6]. In order to select the features that contribute the most to class discrimination, a feature ranking has been established based on the Average Mutual Information (AMI) [1], which provides a measure of how much information about a class a single feature is able to provide.

Training set. The corpus 9GDB contains both melodic and harmonic information (including tonality). It consists of 856 files in MIDI and Band-in-a-Box formats. It is divided into three musical genres: academic, jazz, and popular music. A second split of this database divides each genre into three subgenres, resulting in a total of 9 music subgenres.

Figure 2. GC module.

This hierarchical structure allows the user to compare the classifiers at different levels, either at the first level with three broad genres, or at the second level with all nine subgenres, making the tool more versatile (see [7] for details). Each classifier was trained with this corpus, but each one provides the user with different aspects on which to base a decision. As explained above, some of them use the melody information and others the information contained in all the tracks or in the metadata. That is, each classifier uses different sources of information as input and can provide different answers for the same input file.
In order to provide a mechanism to tune the final selection recommended by the system, the user can combine the classifiers, assigning a weight to each model as in a linear combination of the different classifiers.

3. User interaction

Music genre classification is clearly subjective and involves different aspects. Therefore, interaction with a human expert is needed to assess and validate the answers given by the different automatic systems. This interaction begins with the selection of which information

the system uses and ends with the validation or correction of the automatic classification. The goal is to minimize the number of interactions that a human expert has to perform to obtain a reliable genre classification when labeling a database of MIDI files.

3.1. Interaction with the MTS module

When working with the MTS module, the user can hear the different tracks of the multi-part file and is provided with mute/solo buttons to select the tracks he wants to hear while selecting the melody track. The user can see the probability of each track. Moreover, the user can select among the several classifiers and can show or hide the percussion and empty tracks.

3.2. Interaction with the GC module

The main interaction with the GC module is to tune the final selection recommended by the system. The user can combine the classifiers, assigning a weight to each model as in a linear combination of the different classifiers. To do this, each classifier has a slider bar to modify its weight in the final selection (see Fig. 2). Finally, the user has the option to change the selection recommended by the system if he considers that it is not appropriate.

4. Conclusions

In its current development state, this multimodal interactive music genre classifier prototype is capable of classifying multi-part music files. It can use several sources of information extracted from a MIDI file, such as melody features, melody notes, chords, and metadata information. The system allows the user to interact with both modules, MTS and GC, selecting and tuning the several classifiers involved. This prototype is still at an early stage of development. It is conceived as a platform for interactive multimodal research in the context of symbolic music data.
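The weighted combination that the user performs with the slider bars (section 3.2) can be sketched as a linear combination of the per-genre probabilities produced by each classifier. Genre names and numbers below are illustrative, not taken from the system.

```python
# Sketch of the user-weighted linear combination of classifiers:
# each classifier's per-genre probabilities are scaled by its slider
# weight, and the genre with the highest combined score is returned.

def combine_classifiers(outputs, weights):
    """outputs: list of {genre: probability} dicts, one per classifier.
    weights: matching list of user-chosen slider weights."""
    combined = {}
    for probs, w in zip(outputs, weights):
        for genre, p in probs.items():
            combined[genre] = combined.get(genre, 0.0) + w * p
    return max(combined, key=combined.get)
```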
New features are planned for the near future, including:
- improved interface usability,
- addition of new source data inputs, such as audio multi-part files,
- addition of new user input modalities, such as live MIDI instrument input,
- addition of new genre classifiers using different data sources, such as the bass track or the percussion track,
- addition of new classifiers based on different methods, such as tree grammars or tree automata.

The system can also be extended to use the user's feedback: this way the classifiers could be trained incrementally with new samples classified by the user. The system could also provide a mechanism to save the classifier weights tuned by the user and to train the classifiers with user datasets, allowing the user to change the genre hierarchy.

Acknowledgments. This work was supported by the projects DRIMS (TIN C02) and PROMETEO/2012/017.

References
[1] T. M. Cover and J. A. Thomas. Elements of information theory. Wiley-Interscience, New York, NY, USA.
[2] A. Friberg and S. Ahlbäck. Recognition of the melody in a polyphonic symbolic score using perceptual knowledge. In Proceedings of the 4th Conference on Interdisciplinary Musicology, Thessaloniki, Greece.
[3] S. Lippens, J. Martens, M. Leman, B. Baets, H. Meyer, and G. Tzanetakis. A comparison of human and automatic musical genre classification. In Proceedings of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 2004, volume 4, 2004.
[4] Z. Liu, Y. Wang, and T. Chen. Audio feature extraction and analysis for scene segmentation and classification. Journal of VLSI Signal Processing Systems, pages 61-79.
[5] M. Liwicki and H. Bunke. Combining on-line and off-line systems for handwriting recognition. In Ninth International Conference on Document Analysis and Recognition (ICDAR), volume 1, 2007.
[6] T. Pérez-García, C. Pérez-Sancho, and J. M. Iñesta. Harmonic and instrumental information fusion for musical genre classification. In Proc. of
ACM Multimedia Workshop on Music and Machine Learning (MML 2010), pages 49-52, Florence, Italy, October 2010. ACM.
[7] C. Pérez-Sancho. Stochastic Language Models for Music Information Retrieval. PhD thesis, Alicante, Spain, July.
[8] P. J. Ponce de León. A statistical pattern recognition approach to symbolic music classification. PhD thesis, Alicante, Spain, September.
[9] G. Rigoll and S. Müller. Statistical pattern recognition techniques for multimodal human computer interaction and multimedia information processing. Survey paper, Int. Workshop Speech and Computer, pages 60-69.
[10] D. Rizo, P. J. Ponce de León, C. Pérez-Sancho, A. Pertusa, and J. M. Iñesta. A pattern recognition approach for melody track selection in MIDI files. In R. Dannenberg, K. Lemström, and A. Tindale, editors, Proc. of the 7th Int. Symp. on Music Information Retrieval, ISMIR 2006, pages 61-66, Victoria, Canada, 2006.

SAGH: a supervised image hashing technique
Guillermo García, Mauricio Villegas and Roberto Paredes
ITI/DSIC, Universitat Politècnica de València
Camí de Vera s/n, València, Spain

Abstract. Hashing techniques have become very popular for solving the content-based image retrieval problem in gigantic image databases because they make it possible to represent feature vectors with compact binary codes. Binary codes provide speed and are memory-efficient. Different approaches have been taken by researchers, some of them based on the Spectral Hashing objective function, among these the recently proposed Anchor Graph Hashing. In this paper we propose an extension to the Anchor Graph Hashing technique which deals with supervised/label information. This extension is based on representing the samples in an intermediate semantic space that comes from the definition of an equivalence relation on an intermediate geometric code.

1 Introduction

Hashing methods map the high-dimensional representation into a binary representation with a fixed number of bits. Binary codes are very storage-efficient: since relatively few bits are required, millions of images can be stored in computer memory. Moreover, computing the Hamming distance between binary codes is very fast, as it can be performed efficiently with a bitwise XOR operation followed by counting the number of set bits [1, 2]. The design of the hash function is crucial, this being essentially the only difference among all these methods. Generally, the different methods learn a hash function that preserves the topology of the samples in the original space, i.e. images that are near in the original high-dimensional space share the same (or similar) binary code, while images far apart in the original space have very different binary codes. These methods work with unsupervised information, so the preservation of the geometric topology is the only goal to pursue. However, when there is additional information available, which could be supervised (i.e.
labels annotated by a human), better performance can be obtained by methods which try to preserve the semantic topology. Since images that are visually different can contain similar semantic concepts, in these cases the hash code should be designed to (also) preserve the semantic topology. In this paper, an extension of the Anchor Graph Hashing technique is proposed, which makes it capable of dealing with supervised information and produces a binary embedding that preserves not only the geometric topology, but also the semantic topology of the data.

2 Notation and background

Let $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$ be the set of $n$ feature vectors extracted from training images and represented in a $d$-dimensional space. The goal is to learn a binary embedding function of $q$ bits, $f : \mathbb{R}^d \to \{-1, 1\}^q$, where for convenience the binary symbols have been defined as $-1$ and $1$. The training set $X$ produces the set of binary codes $Y = \{y_1, \ldots, y_n\} \subset \{-1, 1\}^q$, and for an arbitrary input test sample, the same mapping would be used so that the Hamming distance can be employed to find its nearest neighbors in the training set. Additionally, in the supervised scenario we assume that each training sample $x_i$ has an associated label vector $t_i \in \mathbb{R}^l$ which provides semantic information about the sample. Usually the label vector $t_i$ is a binary vector indicating the presence or absence of each one of the $l$ terms, $t_i \in \{-1, 1\}^l$.

The hashing function $f$ should preserve the topology of the data, assigning similar codes to near samples and dissimilar codes to far samples. To this end, a very effective performance measure is the sum of similarity-weighted squared differences between codes. More concretely, Spectral Hashing [3] proposed the constrained optimization:

$$\min_Y \; \frac{1}{2} \sum_{i,j=1}^{n} \|y_i - y_j\|^2 a_{ij} = \mathrm{Tr}(Y^T L Y) \qquad (1)$$
$$\text{s.t.} \quad Y \in \{-1, 1\}^{n \times q}, \quad \mathbf{1}^T Y = 0, \quad Y^T Y = n I_{q \times q}$$

where $L = \mathrm{diag}(A\mathbf{1}) - A$, $A \in \mathbb{R}^{n \times n}$ being the

similarity matrix having $a_{ij}$ as its components, and $Y \in \{-1, 1\}^{n \times q}$ is a matrix having in each row the code $y_i$ of one training sample. However, this integer optimization problem is NP-hard. In order to obtain a tractable optimization, the authors proposed a spectral relaxation, dropping the integer constraint and allowing $Y \in \mathbb{R}^{n \times q}$. Then an approximate solution given by $\mathrm{sgn}(Y)$ yields the final desired hash codes.

In [4] the authors introduced a very effective approach for image hashing based on the formulation just mentioned, Anchor Graph Hashing (AGH). This unsupervised technique aims at capturing and preserving the semantic topology assuming that close-by points usually share labels. The key idea proposed in [4] is to avoid computing the whole similarity matrix $A$ for all the $n$ samples. To this end, a small set of $m$ points, with $m \ll n$, called anchors, is selected (e.g. using k-means cluster centers). With these anchors, the matrix $A$ is approximated as $\hat{A} = Z \Lambda^{-1} Z^T$, where $\Lambda = \mathrm{diag}(Z^T \mathbf{1})$, and the matrix $Z \in \mathbb{R}^{n \times m}$ is highly sparse, each row having only $s$ values different from zero, which correspond to the similarity values of the $s$ nearest anchors. Because of this sparsity, the solution can be obtained by an eigenvalue decomposition of a much smaller $m \times m$ matrix, instead of the $n \times n$ matrix $A$. For further details, the reader should refer to [4].

3 Supervised AGH

The aim of Anchor Graph Hashing is to preserve the original topology by mapping near images to near hash codes. The results shown in [4] and the reduced computational complexity make this technique a very interesting hashing method for large-scale scenarios. As mentioned above, the main assumption of AGH is that close-by images share labels. However, images that are far apart in the original space could also share labels and thus be very close in the semantic space.
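The anchor-based approximation described above can be sketched as follows; the anchor selection (random sampling instead of k-means) and the Gaussian kernel width are simplifying assumptions, not the reference implementation of [4]:

```python
# Minimal sketch of the anchor-graph approximation A_hat = Z Lambda^{-1} Z^T.
# Random anchors and a unit-width Gaussian kernel are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, d, m, s = 200, 16, 10, 3        # samples, dims, anchors, nearest anchors

X = rng.normal(size=(n, d))
anchors = X[rng.choice(n, size=m, replace=False)]  # stand-in for k-means centers

# Build the sparse matrix Z: each ROW keeps only the s nearest anchors.
D = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)  # squared dists, n x m
Z = np.zeros((n, m))
for i in range(n):
    nearest = np.argsort(D[i])[:s]
    sims = np.exp(-D[i, nearest])          # illustrative Gaussian similarities
    Z[i, nearest] = sims / sims.sum()      # rows sum to one

Lam = np.diag(Z.sum(axis=0))               # Lambda = diag(Z^T 1)
A_hat = Z @ np.linalg.inv(Lam) @ Z.T       # low-rank approximation of A
```

Because $\hat{A}$ has rank at most $m$, the spectral problem in (1) reduces to an $m \times m$ eigendecomposition, which is what makes AGH tractable for large $n$; note also that the rows of $\hat{A}$ sum to one by construction.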
Taking into account that nowadays images are represented using very low-level features, mainly based on bags of visual words, this second assumption is reasonable and motivates the supervised scenario. We propose an extension to AGH that considers the side information provided by the label vectors $t$, when such information is available. Note that the hashing function of AGH depends on the similarity between the input sample and the $m$ anchor vectors, and since label information is not available for test samples, it cannot be introduced into the similarity matrix $A$ as can be done for other methods. Therefore the label information has to be included in an indirect way.

Figure 1. SAGH can be seen as a repeated application of AGH. A first embedding is obtained from the original space to a geometric code. A second embedding is obtained from an intermediate semantic space to the final binary hash code.

Our extension, which we call Supervised AGH (SAGH), is based on a two-step application of AGH that uses the label information in an indirect way, together with the definition of an equivalence relation. The resulting procedure can be summarized as follows: first the training samples are embedded into a geometric binary code, then a semantic representation is derived from this geometric code, and finally a new embedding is performed into the desired binary hash code.

3.1 The proposed SAGH approach

In the proposed Supervised AGH, the intermediate semantic representation of the samples allows semantically similar samples to be coded with similar binary codes despite being far apart in the original representation space. From this perspective, SAGH not only can achieve better performance because it uses the label information, it can also potentially encode the data with shorter binary codes.
SAGH is performed by first applying the standard AGH to the training set, providing an initial hash code of $p$ bits, $U = \{u_1, \ldots, u_n\} \subset \{-1, 1\}^p$, usually with $p > q$ in order to produce a sparser distribution of the data. We will refer to this first hash code as the geometric code. As mentioned above, the goal of this first hashing is to produce a semantic embedding of the training data. To this end, we define an equivalence relation $\sim$ on the set $X$. Two samples are equivalent under this relation if they have the same geometric code:

$$x_i \sim x_j \iff u_i = u_j \qquad (2)$$

The equivalence class of a particular sample $x \in X$ is then defined as:

$$[x] = \{x' \in X : u_{x'} = u_x\}$$

With this definition we propose a semantic representation of a particular training sample $x_i$ as:

$$v_i = \frac{1}{|[x_i]|} \sum_{x \in [x_i]} t_x \qquad (3)$$

where $|[x_i]|$ is the number of elements in the equivalence class and $t_x$ is the label vector associated with the sample $x$. In fact, with this definition all the samples inside an equivalence class share the same semantic representation. Thus, each equivalence class has an associated semantic representation that we denote by $v_{[x]}$. Alternative equivalence relations could be defined in order to group samples according to different strategies. For instance, several geometric codes could be obtained from applications of AGH with different parameters, e.g. number of nearest anchors $s$, anchor selection, etc., yielding different geometric codes for the same sample and allowing better equivalence relations to be defined.

Independently of the definition of the equivalence relation, equation (3) maps geometric codes $u \in \{-1, 1\}^p$ into semantic representations $v \in \mathbb{R}^l$. As a result we have a representation in a semantic space, $V = \{v_1, \ldots, v_n\} \subset \mathbb{R}^l$. This set of semantic representations for the $n$ training samples is used as the input to a second AGH that produces an embedding into the final desired binary representation $y \in \{-1, 1\}^q$ with $q$ bits. This second AGH uses as input the different intermediate semantic codes $v$ generated by the proposed approach. In principle the number of possible different semantic codes can be $\min(n, 2^p)$, but in practice it is much smaller. Figure 1 illustrates the SAGH mechanism. With this two-step hashing, points that were far apart in the original space but have similar semantic information should have similar intermediate semantic codes, and thus will be mapped to nearby codes in the definitive binary space.

3.2 Hashing query images

The process to obtain a hash code for a query image follows a similar procedure.
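The training-side construction of equations (2) and (3), grouping samples by geometric code and averaging the label vectors of each equivalence class, can be sketched as follows (the codes and labels are toy values, not real AGH output):

```python
# Sketch of the SAGH training-side semantic representation (eqs. (2)-(3)):
# samples sharing a geometric code form an equivalence class, and each class
# is represented by the mean of its members' label vectors.
from collections import defaultdict
import numpy as np

U = [(1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, -1, 1)]      # geometric codes u_i
T = np.array([[1, -1], [1, 1], [-1, 1], [1, -1]], float)  # label vectors t_i

classes = defaultdict(list)        # geometric code -> indices of its members
for i, u in enumerate(U):
    classes[u].append(i)

# v_[x]: average label vector over each equivalence class (eq. 3)
V_class = {u: T[idx].mean(axis=0) for u, idx in classes.items()}
V = np.array([V_class[u] for u in U])  # per-sample semantic representation
```

Samples 0, 1 and 3 share the geometric code (1, -1, 1), so they receive the same semantic vector: the mean of their labels.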
For a query image $\hat{x}$, a geometric code $\hat{u}$ is produced using the first AGH mapping. This geometric code could be the same (i.e. at Hamming distance zero) as some geometric code $u_i$ seen in the training step; then $\hat{x} \sim x_i$ and we assign the same intermediate semantic code, $\hat{v} = v_i$. But the geometric code $\hat{u}$ could also be one that did not appear in the training step, in which case there is no semantic code to assign to it. For this case we propose the following procedure. First we find the radius $R$ of the minimum Hamming ball around $\hat{u}$ that contains a non-empty geometric code $u$:

$$R = \min r \in \{1, \ldots, p\} \quad \text{s.t.} \quad \exists x \in X : d(u_x, \hat{u}) = r \qquad (4)$$

where $d(\cdot, \cdot)$ is the Hamming distance. This radius defines the set $B_R$ of the different equivalence classes inside this Hamming ball. We propose to obtain the semantic embedding of the query point as the average of all the semantic representations associated with the equivalence classes inside $B_R$:

$$\hat{v} = \frac{1}{|B_R|} \sum_{[x] \in B_R} v_{[x]} \qquad (5)$$

Finally, the semantic code $\hat{v}$ is embedded, again using AGH, into the definitive hash code $\hat{y}$. It is important to note that the hashing obtained by SAGH is affected mainly by the definition of the equivalence relation (2), the procedure to obtain the semantic representation associated with each equivalence class (3), and the semantic embedding for those query images that fall into empty geometric codes (5).

4 Experiments

In order to assess the performance of the proposed SAGH technique, we have performed experiments using a dataset widely used in the literature. This dataset is a version of the CIFAR dataset [5], which consists of 64,185 images selected from the Tiny Images dataset [6]. The original Tiny Images are 32x32 pixels, although here they are represented by grayscale GIST descriptors [7] computed at 8 orientations and 4 different scales, resulting in 320-dimensional feature vectors.
These images have been manually grouped into 11 ground-truth classes (airplane, automobile, bird, boat, cat, deer, dog, frog, horse, ship and truck), so we shall refer to this version of the dataset as CIFAR-11; it is the same dataset that was used in [8]. For comparison, we also ran the experiments with other hashing techniques from the literature for which code was freely available.

4.1 CIFAR-11

To estimate the performance of the different methods on the CIFAR-11 dataset, we employed a 5-times repeated hold-out procedure. In each of the five rounds,

1 kriz/cifar.html

3,000 images were randomly selected for the test set and the remainder was left as the training set. The final results are the average over the five partitions.

Figure 2. CIFAR-11 dataset. On the left, average precision of the top-500 ranked images. On the right, average precision for a Hamming radius of 2.

The results are presented in Figure 2. The performance is measured using the class labels as ground truth. One of the graphs in the figure presents the average precision for the first 500 retrieved images as the number of bits varies; this measures the hash ranking performance. For retrieved images at exactly the same Hamming distance, a random reordering was applied. The other graph in the figure shows the average precision for a Hamming radius of 2; this measures the hash lookup performance. As expected, the two supervised methods, SAGH and ITQ-CCA, perform much better than all of the unsupervised methods. This is quite understandable, since the labels of CIFAR-11 are manually assigned and not noisy, so there is much to gain by using this additional information. The performance of SAGH is better than that of ITQ-CCA. Note that in this case, because there are only 11 classes, ITQ-CCA is limited to a maximum of 10 bits, which is a severe limitation. As can be observed, the performance of the proposed SAGH is also better than that of its unsupervised counterpart AGH. This confirms that the proposal is effectively capable of taking advantage of the additional information to achieve better performance. In these results the same behavior as in [4] is observed for both AGH and SAGH: the precision at a Hamming radius of 2 does not decrease for large code sizes, although the performance is not better than for fewer bits.
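The Hamming-distance computations underlying both measures reduce to a bitwise XOR followed by a popcount, as noted in the introduction. A minimal sketch with toy 8-bit codes (values illustrative):

```python
# Sketch of hash lookup and ranking with packed binary codes: the Hamming
# distance is a bitwise XOR followed by a popcount. Toy 8-bit codes below.

def hamming(a: int, b: int) -> int:
    # XOR sets a bit wherever the two codes differ;
    # counting the set bits gives the Hamming distance.
    return bin(a ^ b).count("1")

database = {0b10110010: "img0", 0b10110000: "img1", 0b01001101: "img2"}
query = 0b10110011

# Hash lookup: all items within a Hamming radius of 2 of the query.
hits = [name for code, name in database.items() if hamming(code, query) <= 2]

# Hash ranking: database codes sorted by distance to the query.
ranked = sorted(database, key=lambda code: hamming(code, query))
nearest = database[ranked[0]]
```

In practice the codes are packed into machine words so the XOR and popcount run in a handful of CPU instructions per comparison, which is what makes exhaustive scans over millions of codes feasible.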
5 Conclusions

In this paper we have proposed an extension of the Anchor Graph Hashing technique which is capable of taking advantage of supervised/label information. This extension is based on representing the samples in an intermediate semantic space that comes from the definition of an equivalence relation on an intermediate geometric code. The results show that our approach is a very effective way to incorporate such supervised information into standard AGH. Standard AGH is clearly outperformed by our SAGH on the CIFAR dataset, where the supervised information can be considered very clean. Moreover, SAGH is clearly the best technique on this dataset, compared even with the state-of-the-art ITQ-CCA.

References
[1] Knuth, D.E.: The Art of Computer Programming, Volume I: Fundamental Algorithms, 3rd Edition. Addison-Wesley (1997)
[2] Wegner, P.: A technique for counting ones in a binary computer. Commun. ACM 3 (1960) 322
[3] Salakhutdinov, R., Hinton, G.: Semantic hashing. Int. J. Approx. Reasoning 50 (2009)
[4] Liu, W., Wang, J., Kumar, S., Chang, S.F.: Hashing with graphs. In Getoor, L., Scheffer, T., eds.: Proceedings of the 28th International Conference on Machine Learning (ICML-11), New York, NY, USA, ACM (2011) 1-8
[5] Krizhevsky, A.: Learning multiple layers of features from tiny images. Master's thesis (2009)
[6] Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30 (2008)
[7] Oliva, A., Torralba, A.: Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vision 42 (2001)
[8] Gong, Y., Lazebnik, S.: Iterative quantization: A procrustean approach to learning binary codes. In: CVPR (2011)

Multibiometric Authentication Based on Facial, Fingerprint and Hand Geometry Adaptive Templates
Alejandro Vidal, Francisco J. Perales
Dep. Matemáticas e Informática de la UIB
Universitat de les Illes Balears

Abstract. Biometric authentication systems are usually based on feature extraction. Features are a collection of measurable details, obtained from the biometric trait, that define the identity of a certain person. This collection of data is known as a template, and it is stored in the database. The quality of the acquired biometrics must be controlled in order to model the identity of the individual in a unique and distinct way. The creation and update of templates is a critical task for the correct use of a biometric application. In this paper we propose the implementation of a model that, using biometric-independent tools, aims to improve the templates stored in the database, in what we have called the "adaptive biometric template". A new multimodal version using fingerprint, hand geometry and facial information is developed using a PCA method. Some good results are presented.

1. Introduction

A generic biometric system can be defined with a very simple working paradigm that is used in most applications and commercial solutions. First, when a security system is installed, the biometric traits of all users with access to the resource must be acquired, thus creating a database that models the identity of the individuals by means of templates. This step or working mode is known as enrolment: each time a new genuine user wants to access the resource we must enroll him, acquiring his biometrics. Currently, user identities stored as templates in the database do not change after enrolment and remain invariable in the database. Subsequently, users who want to enter the system must present their biometrics for comparison with the stored templates, to verify that their identity is found in the database and thus grant them access.
At this moment the biometric security system works in authentication mode. With this working scheme the enrolment process is essential, since it is the only moment when the templates stored in the database are modified. These templates are the main link between the designed user model and the user's real identity, and they remain static as long as we do not acquire another set of biometric traits, a process that can be annoying to the user. But achieving biometric templates that represent the user's identity accurately can be difficult as a consequence of several factors: it is not easy to measure the quality of a biometric trait (the only objective values are the FMR and FNMR [1]); the user's biometrics may not be in good condition at the time of acquisition (e.g., dry fingers in a fingerprint system or irritated eyes in iris detection [2]); and, though it is not a desirable feature, some biometrics, like voice, can change with time.

Figure 1. Working scheme of a traditional biometric system.

Moreover, if the number of database users is high, a manual control over biometric template quality

that rejects incorrect traits and acquires a new (correct) set of traits may not be possible. Two problems arise from the limitation explained above: first, the need to update biometric templates in order to accommodate them to the real evolution of the traits in the individuals; and second, the proper selection of templates in order to discard deficiencies or errors in acquisition, thereby reducing the error rates associated with authentication. In the next sections we propose a new adaptive biometric template system. The proposed system improves the template update process, increasing inter-class differences and reducing intra-class differences, using the standard authentication procedure to attain more precise ROC curves. Our system is also designed in an open way, so that in the future new templates from other biometric features can be included, thereby offering a multibiometric approach.

2. Adaptive biometric templates

Many schemes that bring successful solutions to these problems have been implemented. X. Jiang and W. Ser [3] propose a recursive technique for improving biometric templates that computes average values of the minutiae included in each instance of a fingerprint template. Other known methods are biometric-independent, like the ones proposed by Jain, Ross and Uludag [4], which use binary trees between the different instances that form a template (dendrograms, DEND method) or average distances of similarity between these instances (MDIST). Scheidat et al. [5], on the other hand, approach the update problem as if it were a cache-page issue: they propose the use of classic replacement algorithms (FIFO, LRU, clock) for replacing biometric traits that have become obsolete.
The paradigm that we will show next (implemented in the structure of a multimodal biometric library) does not intend to replace any of the techniques mentioned above, whose efficiency and performance have been proved. The main idea is to provide an automatic tool supporting adaptive biometric templates that, using the information obtained from the accesses of the different users, can make the templates stored in the database more different from each other and more similar to the real trait of the individual. The working scheme until now was:
1. Acquire the user's biometrics and store their features in a biometric template in the database.
2. When a user tries to access:
   a. Verify that the biometrics given are similar to the ones stored in the template.
We propose the following:
1. Acquire the user's biometrics and store their features in a biometric template in the database.
2. When a user tries to access:
   a. Verify that the biometrics given are similar to the ones stored in the template.
   b. Store the biometric trait used in the access.
3. Periodically, and for each user:
   a. Evaluate the quality of the biometric traits used in the accesses.
   b. If the quality of these traits is better, include them in the template; otherwise reject them.
For the implementation of this system we need a second biometric database, parallel to the main database. The goal of this second database is to store the different access attempts that occur when the security system works in authentication mode, for their later evaluation. The information stored about these attempts is:
- Date and time of the access.
- Name of the user whose identity was claimed in the access.
- Set of biometric features given in the access.
The entries to the database are stored in different lists. First, for each user we store a list of all the successful accesses he has made, so we can examine directly the evolution of the biometric trait along the different genuine entries.

3.
New multimodal adaptive biometrics

This version is a new, improved version of the previous work. In this approach, the system is able to manage new kinds of biometric features. In particular, we extend the previous work using hand geometry and facial information.

3.1 Hand Geometry Biometric

With respect to hand geometry, there are different algorithms based on different characteristics; they can be summarized as:
- Algorithms based on geometric feature extraction.
- Algorithms based on analysis of the shape of the hand, without feature extraction.
- Algorithms based on the analysis of the impressions taken from the lines of the palm.
- Algorithms combining the analysis of the palm lines together with the shape of the hand.
We have selected the algorithms that obtain the characteristics based on functions that make use of segmentation of the hand and the search for convex polygons in a two-dimensional image. The reasons for this choice are the ease of implementation of this algorithm, the fact that it is quite widespread in this environment, and the good results offered in conjunction with the economical systems that make up the original

research (low-cost devices). The references have proved very helpful as a point of departure for the development of this biometric. In the next figures we can see examples of the procedures and criteria used to evaluate the hand biometric algorithm.

Figure 2. Image hand characteristics.

We apply a simple threshold pre-processing, a contour detection algorithm (Canny) and finally a characteristic capture process, based on two original images and an arithmetic mean based on a Euclidean distance function:

$$\sum_{j=1}^{d} \frac{(q_j - r_j)^2}{\sigma_j^2} < t$$

where $Q = (q_1, q_2, \ldots, q_d)$ represents the feature vector to be verified, $R = (r_1, r_2, \ldots, r_d)$ represents the vector stored in the database associated with an identity, and $\sigma_j^2$ is the variance of feature $j$ over all registered templates (a measure of the relative importance of the feature). The verification is successful if the distance between $R$ and $Q$ is less than the threshold value $t$ that we set.

3.2 Facial Biometric

For facial information we have used the PCA technique. The PCA algorithm is probably the most commonly used subspace projection for face recognition. We used the OpenCV library for image processing. To detect a face in frontal view, one can choose one of these cascaded Haar classifiers that come with OpenCV:
"haarcascade_frontalface_default.xml"
"haarcascade_frontalface_alt.xml"
"haarcascade_frontalface_alt2.xml"
"haarcascade_frontalface_alt_tree.xml"
After this step, we normalize the face illumination value and the size proportions (aspect ratio), apply erosion filtering, and convert the colour images from RGB to gray level. Finally, we generate the eigenface sets from the preprocessed set of captured images.
The first step for face recognition is face detection; the OpenCV library makes it fairly easy to detect a face in an image with its Haar classifier cascades (also known as the Viola-Jones method for detecting faces). The OpenCV function "cvHaarDetectObjects" can detect objects; among the arguments one passes is the desired object classifier, an .xml file that contains a series of data obtained from a training process that teaches OpenCV to recognize an object, used through functions like "cvSetImagesForHaarClassifierCascade". Besides, the default OpenCV distribution comes with several different classifiers: for detecting frontal faces (the one used in this work), profile faces (side view), eyes, nose, mouth, the whole body, etc.; but of all these, the only completely reliable results are those obtained from front views of faces.

Figure 3. Mean capture image and two eigenfaces.

After the eigenface database generation we can apply the biometric evaluation, summarized in three steps:
1. Calculate the distance between the detected image and the stored templates.
2. Select the template that is closest to the analyzed image.
3. If this distance is less than a threshold value, recognize the individual assigned to the compared template; otherwise classify it as unknown.

3.3 Fingerprint Biometric

For the fingerprint biometric we used the VeriFinger feature extraction algorithm, in combination with an optical fingerprint sensor. This algorithm detects the bifurcations and endings in the ridge lines that form the fingerprint in order to build the user templates.

Figure 4. Characteristics detected by VeriFinger.

The whole process is integrated into a multibiometric API, in which we can select the type of combination and the algorithm used. Initially, we implemented some tests based on four users. The next table shows some preliminary results of the proposed system.

Table 1. Results with User3 templates (% of similarity).

User    Face    Fingerprint  Hand    Password
User1   13.85%  -            63.13%  20%
User2   30.02%  0.30%        65.98%  14.29%
User3   82.84%  92.90%       100%    100%
User4   23.89%  0.30%        66.86%  42.86%

We are at the moment evaluating more users and several combination algorithms. For these new tests we obtained the following results, based on twelve users (S1-S12). We used the following threshold acceptance values for the five different tests (T1-T5):
Facial threshold: 70.00% (5/5 recognitions)
Hand geometry threshold: 70.00% (4/5 recognitions)
Fingerprint threshold: 10.00% (5/5 recognitions)
We try to recognize user S1 in all the tests.

Table 2. Biometric facial tests: % of similarity for users S1-S12 over tests T1-T5.

Table 3. Biometric hand geometry tests: % of similarity for users S1-S12 over tests T1-T5.

Weights used in the weighted sum model:
Fingerprint weight: 2
Facial weight: 1
Hand geometry weight: 0.25

Table 5. Weighted sum model tests: % of similarity for users S1-S12 over tests T1-T5.

4. Conclusions and future work

In this paper we propose a solution to a common problem in most biometric systems: the update and selection of biometric templates in a database. The solution developed here offers a new paradigm of biometric authentication that intends to achieve two goals at a time: the evolution of the stored templates with the real trait of the individual, and the selection of those features that are characteristic of the individual (reducing intra-class differences) and that also differentiate him from other individuals (increasing inter-class differences). The system proposed here has been validated with real users in a university environment, obtaining successful and promising results.
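The weighted sum model with the weights and the 35% acceptance threshold listed above can be sketched as follows; the similarity scores of the two probe attempts are illustrative:

```python
# Sketch of the weighted sum model used to fuse the three biometrics.
# Weights and acceptance threshold follow the values reported above;
# the per-biometric similarity scores are illustrative.

WEIGHTS = {"fingerprint": 2.0, "facial": 1.0, "hand": 0.25}
THRESHOLD = 35.0  # percent

def fused_similarity(scores: dict) -> float:
    """Weighted average of per-biometric similarity percentages."""
    total_w = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS) / total_w

def accept(scores: dict) -> bool:
    return fused_similarity(scores) >= THRESHOLD

genuine  = {"fingerprint": 92.9, "facial": 82.8, "hand": 100.0}
impostor = {"fingerprint": 0.3, "facial": 30.0, "hand": 66.0}
```

Giving the most reliable modality (fingerprint) the largest weight lets a strong fingerprint match dominate the decision, while the weakly discriminative hand geometry score can only nudge the fused value.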
Table 4. Biometric Fingerprint Tests - % of similarity, S1-S12, T1-T5.

For the combination of the three biometrics we use the weighted sum model, with a threshold value of 35.00% (5/5 recognitions) and weights adjusted on the basis of the previous results: the largest weight goes to the biometric with the best results (fingerprint, weight 2), an intermediate weight to the facial biometric (weight 1), and the smallest weight to the biometric with the poorest results (hand geometry, weight 0.25).

A new version is proposed based on facial features and hand geometry. The proposed API can integrate in an easy way all the combinations between biometric modalities. We need a more exhaustive validation with a larger number of users in order to determine the best-performing multibiometric combinations.
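The weighted-sum fusion just described (weights 2 / 1 / 0.25 and a 35% acceptance threshold) can be sketched as follows. The weight-normalized averaging and the handling of missing modalities are our assumptions, since the paper does not spell them out:

```python
def weighted_sum_score(scores, weights):
    """Fuse per-modality similarity scores (in %) with the weighted sum model.
    Missing scores (None) are skipped together with their weight (assumption)."""
    num = sum(w * s for s, w in zip(scores, weights) if s is not None)
    den = sum(w for s, w in zip(scores, weights) if s is not None)
    return num / den

# weights from the paper: fingerprint 2, face 1, hand geometry 0.25
WEIGHTS = (2.0, 1.0, 0.25)
THRESHOLD = 35.0  # acceptance threshold (% of similarity)

fused = weighted_sum_score((92.90, 82.84, 100.0), WEIGHTS)  # User3-like scores
print(round(fused, 2), fused >= THRESHOLD)  # -> 90.35 True
```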

Un Estudio Comparativo de Descriptores de Características para la Segmentación de Sinapsis y Mitocondrias

Kendrick Cetina, Pablo Márquez-Neila, Luis Baumela
Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid

Abstract

We present a comparative study of five recently proposed discriminative feature descriptors for the segmentation of synapses and mitochondria in FIB/SEM (Focused Ion Beam Scanning Electron Microscopy) image stacks.

1. Introduction

Deciphering the architecture of the brain is one of the most important challenges facing science [4]. The advances of recent years in brain-tissue imaging methods enable the automated acquisition of high-resolution serial image blocks [5, 8]. The analysis of these images makes it possible to build detailed maps of the cellular structures and their connections, from which we will be able to better understand the basic cognitive functions of the brain, such as learning and memory, as well as their associated pathologies [7]. Two of the cellular structures that attract the greatest interest are synapses and mitochondria. Synapses are the fundamental mechanism of communication between neurons; quantifying them and identifying their different types and distribution is essential to understand both how the brain works and its pathologies [1]. The morphology and distribution of mitochondria are also of great importance in cell physiology [2] and in synaptic function [10]. Likewise, atypical mitochondrial morphologies or distributions are an indication of abnormal cellular states or of neurodegenerative diseases [3]. Although there are tools for manually analyzing this type of images and segmenting both structures [14], their complexity (see Fig. 4) and the enormous number of neurons contained in even a small section of the brain mean that the only practical solution involves some kind of automated processing. Recently, works have appeared proposing segmentation algorithms for synapses [9] and for mitochondria [11, 6] that employ different discriminative features for each of these structures. In this article we study and compare some of these discriminative features in a joint synapse and mitochondria segmentation problem.

2. Descriptors

This section describes the descriptors covered by the study. We begin with the simplest general-purpose descriptors and proceed in increasing order of sophistication. GRIMS and ray descriptors are designed specifically for the kind of electron microscopy images analyzed in this article.

2.1 Simple Window and Histogram

A descriptor based on a simple window is built by ordering and storing in a vector the n x n pixels surrounding a given pixel. The histogram-based descriptor takes, for each pixel, a neighbourhood of size n x n over which it computes the histogram of its grey levels. In [12], a histogram is used together with the ray descriptors of [15] (see Section 2.4) as elements of the feature vector for mitochondria segmentation. In our experiments we have tested the window histogram and the ray descriptors separately. A scheme of these two features is shown in Figure 1.

2.2 LBP

LBP [13] takes, for each pixel, a number k of neighbouring points at a distance r, where r is the radius from the central pixel to its neighbours. If the value of the central pixel is greater than that of the neighbour, a 0 is stored; otherwise, a 1.

Thus a binary code of k digits is generated. The descriptor of a pixel is given by the histogram of the LBP codes obtained in a neighbourhood of size n x n. This process is illustrated in Figure 2.

Figure 1: Histogram and Simple Window descriptors.

Figure 2: Depending on the values of the neighbouring pixels, LBP generates a binary code from which a real value is then obtained.

Figure 3: The function c returns the position c of the contour or edge in image I closest to position m in the direction defined by the angle θ.

2.3 GRIMS

GRIMS (Gaussian Rotation Invariant and Multi Scale) descriptors work by applying to each image of the stack a series of linear operators at different scales: derivatives of order zero, one and two. Since the operators are applied to each image of the stack independently, we consider a single image in the following discussion. We use a Gaussian filter whose standard deviation defines the scale of the operators. The linear operators used are

{ G_σ, σ ∂G_σ/∂x, σ ∂G_σ/∂y, σ² ∂²G_σ/∂x², σ² ∂²G_σ/∂x∂y, σ² ∂²G_σ/∂y² },   (1)

where G_σ is a Gaussian filter of scale σ and each operator is applied to the image by convolution (∗). We call the results of applying these operators to the image s00, s10, s01, s20, s11 and s02, where the subscripts denote the order of the derivatives in x and y.
The feature vector computed for each pixel of the image at scale σ is

{ s00, √(s10² + s01²), λ1, λ2 },   (2)

where √(s10² + s01²) is the gradient magnitude and λ1 and λ2 are the eigenvalues of the Hessian matrix:

λ1 = (1/2) ( s20 + s02 + √( (s20 − s02)² + 4 s11² ) ),   (3)
λ2 = (1/2) ( s20 + s02 − √( (s20 − s02)² + 4 s11² ) ).   (4)

This procedure is repeated for several scales σ0, ..., σ(n−1); since there are 4 features per scale, we finally obtain a feature vector of size 4n. In our experiments we use n = 4 different scales, so in this case we obtain a 16-dimensional feature vector.

2.4 Rays

Ray descriptors [15] have been used in [12] for supervoxel-based mitochondria segmentation. They are efficient at analyzing such shapes and at extracting discriminative features from figures shaped like mitochondria. Ray features depend on the function

c = c(I, m, θ),   (5)

which computes the position c of the contour or edge in image I closest to position m in the direction defined by the angle θ (see Figure 3). The ray descriptor for each pixel of a stack image is given by

f_Rays(I, m, θ) = [f_ndist, f_norm, f_ori],   (6)

where

f_ndist(I) = ‖c(I, m, θ) − m‖,   (7)
f_norm(I) = ‖∇I(c(I, m, θ))‖,   (8)
f_ori(I) = ( ∇I(c(I, m, θ)) / ‖∇I(c(I, m, θ))‖ ) · (cos θ, sin θ).   (9)

f_ndist is the Euclidean distance from point m to point c, where c is the nearest edge. f_norm returns the norm of the gradient at point c. f_ori considers the orientation of the nearest edge point in the direction θ; when the evaluated point m lies at or near the centre of a closed shape, f_ori tends to return values close to 1. The final step to obtain the feature matrix is to align the descriptors to a canonical orientation so as to make them rotation invariant. This matters because two very similar mitochondria with different orientations could otherwise produce very different ray features. It is solved by reordering the feature vector so that it starts with the feature aligned with the direction of maximum variance.

3. Experiments

For the experiments we used a stack of 11 images with their labels, 5 of which were used for training. Figure 4 shows an example of the images used and their labels.

Figure 4: Stack image (a) and its labels (b): the mitochondria class in grey, the synapse class in white, and everything else in black.

For the histogram-based descriptor we varied the number of bins and the window size to optimize performance. However, as the window grows, the percentage of correctly classified synapses decreases. The thin shape of synapses means that a small window classifies them correctly; when the window size grows, unnecessary pixels are taken into account and the classifier's performance drops.

LBP showed the worst performance among the methods in our experiments, because electron microscopy images have a very high noise level and uniform illumination, conditions under which LBP performs very poorly. We analyzed different radius and window values in search of the best performance.

The ray descriptors use a one-vs-rest SVM classifier with an RBF kernel. The free parameters C and γ were chosen with cross-validation; the optimal value obtained for γ was 0.1. One of the main problems we experienced was data balancing: our images contain mostly background, few pixels of the mitochondria class and even fewer of the synapse class, which biases learning. We ran tests balancing the data to obtain the best results.

The performance of the above methods can be seen qualitatively in Figure 5 and quantitatively in the ROC curves for the mitochondria and synapse classes in Figure 6.

4. Conclusions

A Gaussian classifier was used, and a one-vs-rest SVM classifier was also tried for the ray descriptors. The segmentation results have not been regularized. The accuracy of the simple-window descriptors improves as the window grows; this is because the window must be large enough to contain a complete part of a mitochondrion or synapse and thus carry enough information for classification. The performance of GRIMS for mitochondria segmentation is superior to that of the other descriptors considered; the ray descriptors performed even below a simple window used as a descriptor. As for synapse segmentation, the thin, elongated shape of synapses calls for a smaller pixel window. In our experiments we used a larger window for the segmentation of mitochondria, which worsens synapse segmentation. Even so, in the case of synapses the GRIMS, Simple Window and Histogram descriptors perform similarly, marginally better for the latter two.
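As an illustration of the best-performing descriptor, the per-scale GRIMS features of Section 2.3 (scale-normalized Gaussian derivatives, gradient magnitude and Hessian eigenvalues) can be sketched with SciPy's Gaussian derivative filters. This is our own sketch, not the authors' implementation, and the scale values are arbitrary:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def grims_features(img, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Per-pixel GRIMS vector: 4 features per scale -> 4*len(sigmas) channels."""
    feats = []
    for s in sigmas:
        # order=(oy, ox) selects the derivative order along each axis
        d = lambda oy, ox: gaussian_filter(img, s, order=(oy, ox))
        s00 = d(0, 0)
        s10, s01 = s * d(0, 1), s * d(1, 0)          # scale-normalized 1st derivatives
        s20, s02 = s**2 * d(0, 2), s**2 * d(2, 0)    # scale-normalized 2nd derivatives
        s11 = s**2 * d(1, 1)
        grad = np.sqrt(s10**2 + s01**2)              # gradient magnitude, Eq. (2)
        root = np.sqrt((s20 - s02)**2 + 4 * s11**2)  # shared term of Eqs. (3)-(4)
        l1 = 0.5 * (s20 + s02 + root)                # Hessian eigenvalues
        l2 = 0.5 * (s20 + s02 - root)
        feats += [s00, grad, l1, l2]
    return np.stack(feats, axis=-1)

img = np.random.rand(32, 32)
print(grims_features(img).shape)  # -> (32, 32, 16)
```

With n = 4 scales this yields the 16-dimensional per-pixel feature vector described in the text.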

Figure 5: Segmentation results: (a) Rays, (b) GRIMS, (c) Histogram, (d) Simple Window, (e) LBP.

Figure 6: ROC curves (true positive rate vs. false positive rate) for the mitochondria and synapse classes: GRIMS, Histogram, Simple Window, Rays with one-vs-rest SVM classifier, Rays with Gaussian classifier, and LBP.

References

[1] L. Blazquez-Llorca, A. Merchán-Pérez, J. R. Rodríguez, J. Gascón, and J. DeFelipe. FIB/SEM technology and Alzheimer's disease: Three-dimensional analysis of human cortical synapses. Journal of Alzheimer's Disease, 34(4).
[2] S. Campello and L. Scorrano. Mitochondrial shape changes: orchestrating cell pathophysiology. EMBO Rep.
[3] D.-H. Cho, T. Nakamura, and S. Lipton. Mitochondrial dynamics in cell death and neurodegeneration. Cellular and Molecular Life Sciences, 67(20).
[4] J. DeFelipe. From the connectome to the synaptome: An epic love story. Science, 330(6008).
[5] W. Denk and H. Horstmann. Serial block-face scanning electron microscopy to reconstruct three-dimensional tissue nanostructure. PLoS Biol, 2(11):e329.
[6] R. Giuly, M. Martone, and M. Ellisman. Method: automatic segmentation of mitochondria utilizing patch classification, contour pair classification, and automatically seeded level sets. BMC Bioinformatics, 13(1):29.
[7] N. Kasthuri and J. W. Lichtman. Neurocartography. Neuropsychopharmacology.
[8] G. Knott, H. Marchman, D. Wall, and B. Lich. Serial section scanning electron microscopy of adult brain tissue using focused ion beam milling. Journal of Neuroscience, 28(12).
[9] A. Kreshuk, C. N. Straehle, C. Sommer, U. Koethe, M. Cantoni, G. Knott, and F. A. Hamprecht. Automated detection and segmentation of synaptic contacts in nearly isotropic serial electron microscopy images. PLoS ONE, 6(10):e24899.
[10] D. Lee, K.-H. Lee, W.-K. Ho, and S.-H. Lee. Target cell-specific involvement of presynaptic mitochondria in post-tetanic potentiation at hippocampal mossy fiber synapses. The Journal of Neuroscience.
[11] A. Lucchi, K. Smith, R. Achanta, G. Knott, and P. Fua. Supervoxel-based segmentation of mitochondria in EM image stacks with learned shape features. IEEE Transactions on Medical Imaging, 31(2).
[12] A. Lucchi, K. Smith, R. Achanta, G. Knott, and P. Fua. Supervoxel-based segmentation of mitochondria in EM image stacks with learned shape features. IEEE Transactions on Medical Imaging, 31(2).
[13] T. Mäenpää. The Local Binary Pattern Approach to Texture Analysis: Extensions and Applications. Oulun yliopisto.
[14] B. Schmid, J. Schindelin, A. Cardona, M. Longair, and M. Heisenberg. A high-level 3D visualization API for Java and ImageJ. BMC Bioinformatics, 11(1):274.
[15] K. Smith, A. Carleton, and V. Lepetit. Fast ray features for learning irregular shapes. In Computer Vision, 2009 IEEE 12th International Conference on. IEEE.

Real-Time Multiple-Face Age Estimation in Portable Devices

Elisardo González-Agulla, Enrique Argones Rúa and José Luis Alba-Castro

Abstract

Age estimation through facial analysis has been the target of much research in the last few years. There are now commercial solutions that offer good age-estimation performance on good-quality frontal face shots, but their performance decreases when dealing with video sequences in real conditions and with constrained computational resources. In this paper we present a complete video analysis system that processes a video stream on a portable device and estimates, in real time, the age of any number of persons in the scene at distances from 0.1 to 5.4 meters.

1. Introduction

Estimating human age from face images is a challenging and recently very active topic of great interest to the computer vision community, because it can improve many practical applications such as human-computer interaction, forensics, biometrics, video surveillance, audience estimation, etc. [1][2]. It can be argued that face-based age estimation is even more challenging than face recognition. In addition to the classical problems that affect face recognition, such as pose, illumination and expression, age estimation is largely affected by internal and external factors such as genes, ethnicity, lifestyle, illness, sun exposure, etc. This great variety of inputs prevents a faithful modeling of the aging process and explains why it is so difficult, even for human beings, to accurately estimate the age of a person from a single picture [3]. Research in this field has lately intensified thanks to the compilation of face databases that include the exact age of the donors, such as FG-NET [4], MORPH [5] or BIOSECURE [6]. This label information makes it possible to train multiple-class classifiers (child, young, young-adult, adult, senior) and age regressors [3][7].
Elisardo González-Agulla and José Luis Alba-Castro are with the AtlantTIC Research Center of the University of Vigo, Spain. Enrique Argones Rúa was at the same institution when this work was done; he is now with GRADIANT.

Support Vector Classifiers and Regressors have been successfully applied before [8]. Classification errors and MAE (Mean Absolute Error) similar to those of human beings have been obtained with good-quality, frontal or near-frontal face images. So, when dealing with video sequences, one question arises: which frame or frames contain the best-quality face images for age estimation? In this paper we show a system that answers this question by continuously analyzing the tracked faces in a video sequence to select the best shots with regard to the statistics of the dataset used for training. The buffer with the best shots feeds an age estimator that yields a guessed age and a confidence interval; this guess is updated if better shots are gathered. In the next sections we describe the whole process from the video source to the asynchronous writing into a database, taking into account the constrained computational resources. Section 2 deals with the analysis of the scene and the simultaneous tracking of the faces present in it; it also explains how features are extracted from the best-quality face sub-images. Section 3 is dedicated to the age regressor, Section 4 describes the decision mechanism, and Section 5 concludes the paper with an example of performance on a smartphone.

2. Analysis of people in the scene

The compilation of the OpenCV library [9] for many operating systems has made it possible to run state-of-the-art face detection algorithms on almost any embedded video source. Nevertheless, some important adaptations have to be made to optimize the simultaneous tracking of existing faces and the search for new ones appearing in the scene. Two main contributions to speeding up the detection process have been developed.
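The best-shot selection outlined above (keep only the highest-quality face crops seen so far for each tracked face) can be sketched with a fixed-capacity buffer. All names here are ours, and the quality score stands in for the paper's combined filter value:

```python
import heapq

class BestShotBuffer:
    """Keep the N highest-quality face shots seen so far for one track."""
    def __init__(self, size=5):
        self.size = size
        self.heap = []  # min-heap of (quality, frame_id): worst kept shot on top

    def offer(self, quality, frame_id):
        """Insert a shot if it beats the current worst; return True if kept."""
        if len(self.heap) < self.size:
            heapq.heappush(self.heap, (quality, frame_id))
            return True
        if quality > self.heap[0][0]:
            heapq.heapreplace(self.heap, (quality, frame_id))
            return True
        return False

    def best(self):
        return sorted(self.heap, reverse=True)  # best shots first

buf = BestShotBuffer(size=2)
for fid, q in enumerate([0.3, 0.8, 0.1, 0.9]):
    buf.offer(q, fid)
print(buf.best())  # -> [(0.9, 3), (0.8, 1)]
```

Whenever the buffer contents improve, the age estimator can be re-run to update the guessed age, as the paper describes.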
First, the image is scanned in parallel by different threads, each one in charge of a scale range. In this way, people at different distances from the camera can be detected at the same time. Depending on the resolution of the

video stream, a different number of threads is automatically launched to scan different distances of the scene. Second, every time a candidate face is detected, a new thread is created and a dedicated detector-tracker tandem is set up. In this way the main program can be devoted to searching for new faces appearing in the scene at a configurable rate, while secondary threads keep track of previously detected candidates. New candidate faces in the scene are also rapidly evaluated using motion-based foreground-background segmentation.

2.1 Tracking and collecting good-quality faces

Every thread in charge of a candidate face has to keep track of it, following the flow diagram represented in Figure 1.

Figure 1: Flow diagram of the tracking thread for a candidate face.

Once a candidate face is detected, a meanshift tracker [10] working on a chromaticity space and a Kalman filter [11] on the location and size of the face are launched to predict the position of the face in the next frame. A new observation is then searched for in a neighborhood of the predicted location, to minimize the computational burden of the thread. If a preset time passes without a new detection, the candidate face is discarded; otherwise, the location and size of the new observation update the tracker. Now the previous question arises: is that face good enough to feed the age regressor? In Section 4 we will explain how the regressor works, but for now we just need to know that the dataset used to train it was composed mainly of frontal faces of medium to high quality (as are most publicly available face datasets). A face whose features can hardly be explained by the dataset statistics does not yield a reliable age estimation. Therefore a compliant quality filter has been defined, and all images, whether for training, testing or operation, have to pass that filter before feeding the age regressor.
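The per-track loop just described (predict, search near the prediction, discard on timeout) can be sketched with a constant-velocity predictor standing in for the Kalman filter. This is a simplification of the paper's meanshift + Kalman tandem, and all names are ours:

```python
class FaceTrack:
    """Minimal stand-in for a detector-tracker tandem: a constant-velocity
    position predictor plus a no-detection timeout."""
    def __init__(self, x, y, max_misses=10):
        self.x, self.y = float(x), float(y)
        self.vx = self.vy = 0.0
        self.misses = 0
        self.max_misses = max_misses  # frames allowed without a new detection

    def predict(self):
        # predicted search centre for the next frame
        return self.x + self.vx, self.y + self.vy

    def update(self, detection):
        """Feed one frame's observation (or None); False means discard track."""
        if detection is None:
            self.misses += 1
            return self.misses <= self.max_misses
        dx, dy = detection
        self.vx, self.vy = dx - self.x, dy - self.y
        self.x, self.y = float(dx), float(dy)
        self.misses = 0
        return True

track = FaceTrack(100, 50)
track.update((104, 52))   # a new observation updates position and velocity
print(track.predict())    # -> (108.0, 54.0)
```

Searching for the next observation only around `predict()` mirrors the paper's strategy of limiting the per-thread computational burden.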
The filter has two components: one that measures the blurriness of the face image by means of a Laplacian and a Sobel filter, and another that measures left-right symmetry, making a Blurriness-Symmetry (B-S) filter. The proportion of each component is set during training [12]. However, in a real scenario it is not unusual for a false detection to contain a sharply symmetric sub-image that would easily pass the B-S filter but was never seen during training. To minimize that risk, a statistical filter is placed in cascade with the B-S filter: the distance to the subspace created with Principal Component Analysis of the filtered training set. A combined quality score then represents the quality of the face candidate with regard to the particular training set. Figure 2 shows some candidate face images with their corresponding combined filter values.

2.2 Feature extraction

Once the candidate face has passed the filter, it is preprocessed to gain robustness against illumination changes using the retina model [13]. Then a geometrically normalized grid is virtually overlaid on the face image and anchored to the eye positions, so that faces are weakly aligned. This grid has a fixed number of nodes at which a texture descriptor is evaluated. We use a 15x10 grid and a multi-scale, multi-orientation set of Gabor filters (5 scales and 8 orientations, see Eq. 1) [14]. The magnitudes of the 40-dimensional complex-valued vector Jj(x) resulting from the convolution of the Gabor filters at every node represent the texture of the face around that node (Eq. 1), and the 6K-element concatenated vector represents the texture of the face. The shape of the face, an important descriptor for age estimation, is implicitly represented within this vector.
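A minimal sketch of the two B-S components might look as follows. It is illustrative only: the paper combines Laplacian and Sobel responses with a proportion learned during training, which we replace here with a single Laplacian term and a fixed 50/50 mix:

```python
import numpy as np

def laplacian_energy(img):
    """Mean absolute response of a 3x3 Laplacian: low values = blurry image."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):          # valid 2-D convolution, written with shifts
        for j in range(3):
            out += k[i, j] * img[i:i + h - 2, j:j + w - 2]
    return np.abs(out).mean()

def symmetry_score(img):
    """1.0 for a perfectly left-right symmetric face crop, lower otherwise."""
    diff = np.abs(img - img[:, ::-1]).mean()
    return 1.0 - diff / (img.max() - img.min() + 1e-9)

def bs_score(img, w_blur=0.5):
    # hypothetical fixed 50/50 mix; the paper learns this proportion in training
    return w_blur * laplacian_energy(img) + (1 - w_blur) * symmetry_score(img)

face = np.tile(np.array([0.0, 1.0, 1.0, 0.0]), (4, 1))  # symmetric toy "face"
print(symmetry_score(face))  # -> 1.0
```

The cascaded statistical filter would then reject candidates whose PCA-subspace reconstruction distance is too large, even if their B-S score is high.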
