A Specification and Validating Parser for Simplified Technical Spanish
Thesis submitted to the University of Limerick for the degree of M.Sc. in Software Localisation
Presented by Remedios Ruiz Cascales
Supervised by Dr. Richard F. E. Sutcliffe
Acknowledgements
I would first like to thank my supervisor, Dr. Richard F. E. Sutcliffe, for his constant support. Without his help this project would not have come to light. Many thanks also to the University of Limerick, which has at all times provided me with everything that I needed to develop this work. I would also like to thank the CASA staff who kindly showed me around their headquarters and provided me with essential material for this research. Finally, I would like to express my gratitude to my parents and my brother, José, and his wife, Laura, who have always encouraged and supported me in the development of this thesis.
Abstract
A Controlled Language is a subset of a natural language which has a restricted lexicon and controlled grammatical structures. The objective of a Controlled Language is to improve readability, standardisation, accessibility and translatability of documentation. In the last 30 years, there has been a steady increase in the design of Controlled Languages and related applications in different domains. A well-known example is the language designed by AECMA (European Association of Aerospace Industries) in order to increase the readability of aircraft maintenance manuals. Some attempts have been made to produce specifications for languages other than English. Examples include FREM (Français Rationalise Entendu Modulaire) for French aircraft maintenance manuals, ScaniaSwedish for truck maintenance documentation, and Controlled Siemens Documentary German for software documentation. However, no research has been undertaken on the development of a Controlled Spanish. The objective of this work is to design a Controlled Spanish for aircraft maintenance manuals which is similar to AECMA Simplified English and FREM. The stages involved in the work include the development of a number of writing rules in conjunction with a dictionary, and the design and evaluation of a Controlled Spanish checker to verify conformance with some of the rules. The result comprises the Simplified Technical Spanish (STS) Specification, which includes 36 rules, the STS General Vocabulary, which consists of a list of 875 words, and the STS Parser, which has been designed to identify five different types of error which relate to six rules in the STS Specification. Being the first of its type, this project is intended as a contribution towards the improvement of the readability and translatability of technical documentation in Spanish.
Table of Contents

Chapter 1 Introduction
  Outline
  Objectives
  Controlled Languages
  The STS Specification
  The STS Parser and its Evaluation
  Structure of the Thesis
Chapter 2 Controlled Languages: An Introduction
  Outline
  Controlled Languages Background
  Definitions of Controlled Language
  Controlled Languages and Sublanguages
  Controlled Languages and Style Guides
  AECMA Simplified English (SE)
  Other Controlled Languages
  Acceptance and Implementation
  Summary
Chapter 3 Controlled Language Checkers
  Outline
  Controlled Language Checkers
  Controlled Language Checkers Currently in Use
  Summary
Chapter 4 STS Design: Objectives and Method
  Outline
  Aircraft Maintenance Documentation in Spain
  Justification for a Controlled Spanish
  The Method
  The STS Writing Rules
  The STS General Vocabulary
  Summary
Chapter 5 STS Specification
  Outline
  5.2 The STS Writing Rules
    Words
    Noun Phrases
    Prepositional Phrases
    Verbs
    Sentences
    Procedures
    Descriptive Writing
    Warnings and Cautions
    Punctuation and Word Counts
  The STS General Vocabulary
  Summary
Chapter 6 STS Parser: Description
  Outline
  The Robust Layered Parser
  The Simplified Spanish Parser
  Checks Built into the Parser
  The Grammar
  Summary
Chapter 7 STS Parser: Evaluation
  Outline
  Objectives
  Method
  Results using Metric 1
  Results using Metric 2
  Conclusions
  Summary
Chapter 8 Final Conclusions
  Introduction
  Objectives
  Key Results
  Future Research
  Summary
Bibliography
Appendix I Reference Corpus
Appendix II The STS Writing Rules
Appendix III The STS General Vocabulary
Appendix IV The Xerox Xelda Spanish Part-of-Speech Tagset
Appendix V Reference Corpus Descriptive Sentences for Analysis
Appendix VI Reference Corpus Procedural Sentences for Analysis
Appendix VII Sample Output Produced by the STS Parser on Analysis of Descriptive Sentences
Appendix VIII Sample Output Produced by the STS Parser on Analysis of Procedural Sentences
Appendix IX Description of Errors
Appendix X Distribution of Error Types in Descriptive Sentences: STS Parser Results
Appendix XI Distribution of Error Types in Procedural Sentences: STS Parser Results
Appendix XII Distribution of Error Types in Descriptive Sentences: Manually Determined Results
Appendix XIII Distribution of Error Types in Procedural Sentences: Manually Determined Results
Appendix XIV Distribution of Errors Undetected by the STS Parser in Descriptive Sentences
Appendix XV Distribution of Errors Undetected by the STS Parser in Procedural Sentences
Appendix XVI Controlled-Language Acronyms
Appendix XVII Controlled Language Research and Development Sites
Chapter 1 Introduction

1.1 Outline
This chapter starts by stating the objectives of the research. It then outlines the field of Controlled Languages and describes how a specification for Simplified Technical Spanish was drawn up. It then introduces the Validating Parser which was designed for this language and explains how it was evaluated. Finally, the structure of the thesis is summarised.

1.2 Objectives
The objectives of the work presented here are:
- To develop a controlled Spanish for the aviation industry similar in spirit to AECMA Simplified English (SE) (AECMA, 1998),
- To build and evaluate a validating parser for the new language.
In order to meet these objectives, a study of controlled languages (CLs) was first carried out. Following this, a corpus of aircraft documentation was obtained and studied, leading to a complete specification for Simplified Technical Spanish (STS). A parser for this language was then developed and its performance measured. These activities are outlined in the following sections.

1.3 Controlled Languages
A Controlled Language (CL) is a subset of a natural language with a restricted grammar and lexicon. CLs are commonly used in technical domains to reduce ambiguity and to facilitate the translation of technical documentation. We carried out a thorough study of previous work on CLs, paying special attention to AECMA SE. As will be shown, there are many CLs for English and a growing number for other languages. These CLs nonetheless share many common aspects, which we were able to adopt in the specification for STS.

1.4 The STS Specification
In order to develop the controlled Spanish specification, we studied a reference corpus (1.45 MB) kindly provided by Construcciones Aeronáuticas Sociedad Anónima (CASA), the
leading aircraft manufacturer in Spain (Appendix I). By examining the AECMA specification in detail and by comparing constructs in English with their equivalents in Spanish, it was possible to draw up a specification for STS, the first of its kind for the Spanish language. The STS Guide contains 39 writing rules which are divided into 9 sections: words, noun phrases, prepositional phrases, verbs, sentences, procedures, descriptive writing, warnings and cautions, and punctuation and word counts. The following is a summary of the main STS Rules:
- Use only those words contained in the STS Lexicon,
- Do not use different terms for the same item,
- Use articles wherever possible,
- Do not attach more than three prepositional phrases in a sentence,
- Do not use the infinitive with a value other than final or imperative,
- Do not use the past participle with a value other than adjectival,
- Do not use the present participle with a value other than modal,
- Do not use the subjunctive mood,
- Avoid the use of the passive,
- The sentence length limit is 30 words for descriptive sentences and 25 words for procedural sentences,
- Use accents (´) wherever needed, including on capital letters.
See Appendix II for the complete version of the STS Rules. The STS Lexicon is composed of a technical terms database and a General Vocabulary. The database is divided into technical names and manufacturing processes. The General Vocabulary contains 875 words. Having developed the specification, the next step was to build a validating parser for it.

1.5 The STS Parser and its Evaluation
The Robust Layered Parser (Sutcliffe, 2000) was taken as a basis for the STS Parser design. This is a simple parser, but it is easily adapted to different languages and is well suited to applications requiring a partial analysis. The linguistic features of other CL checkers currently in use were studied, focussing in particular on the Boeing Simplified English Checker (BSEC) (Wojcik, Harrison and Bremer, 1993).
This led to a specification for a parser which would detect five types of error. These are directly related to Rules 6.1 & 7.1,
1.1, 4.3, 3.1 and 6.2 respectively (see Appendix IX). Following the development of a parser to meet the above specification, an evaluation was carried out using two different metrics. The first measures its performance relative to the five errors which it was designed to detect. The second measures its performance relative to all errors in the test collection, including those which the parser was not designed to detect. Under Metric 1, Precision and Recall figures of 0.87 and 0.97 respectively were recorded. The figures under Metric 2 were 0.87 and 0.68 respectively. These results suggest that the performance with respect to the five chosen error types is quite good, but that additional checks should be added in order to generate a production system for use by technical writers.

1.6 Structure of the Thesis
Chapter 2 is an introduction to the field of CLs and includes a literature review of previous work undertaken in this field. Chapter 3 analyses the most important CL tools and describes some checkers currently in use. Chapter 4 explains the method followed for the design of the STS Specification and the creation of the General Vocabulary. Chapter 5 presents the specification and vocabulary themselves. Chapter 6 then describes the STS Validating Parser. An evaluation of it is presented in Chapter 7. Finally, Chapter 8 summarises the work undertaken, draws conclusions and makes recommendations for future research.
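The Precision and Recall figures quoted in Section 1.5 follow the usual information-retrieval definitions: precision is the proportion of reported errors that are genuine, and recall is the proportion of genuine errors that are reported. The following is a minimal sketch of that arithmetic, not the evaluation code used in this work. The function name and all counts are hypothetical and are chosen only to illustrate how widening the reference set of errors can lower recall while leaving precision unchanged, provided the set of reported errors stays the same.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for one evaluation run."""
    flagged = true_positives + false_positives   # everything the checker reported
    present = true_positives + false_negatives   # everything it should have reported
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / present if present else 0.0
    return precision, recall


# Hypothetical counts, for illustration only (not taken from the evaluation).
# Narrow reference set: only the error types the checker was built to detect.
narrow = precision_recall(true_positives=8, false_positives=2, false_negatives=1)

# Wide reference set: the same checker output, but three further errors of
# types the checker does not look for are now counted as misses.
wide = precision_recall(true_positives=8, false_positives=2, false_negatives=4)

print("narrow reference set:", narrow)
print("wide reference set:  ", wide)
```

In this sketch the two runs share the same numerator and the same number of reported errors, so only the recall denominator changes between them.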
Chapter 2 Controlled Languages: An Introduction

2.1 Outline
This chapter gives an overview of Controlled Languages (CLs). It starts by investigating the background of CLs. Then it discusses some definitions adopted in the literature and establishes the differences between CLs and sublanguages, and between CLs and style guides. Next, the main features of AECMA Simplified English are presented, together with a brief description of other CLs. Finally, the issue of acceptance and implementation of CLs is addressed.

2.2 Controlled Languages Background
The notion of CL was first proposed by C. K. Ogden, a British writer and linguist, in his book Basic English, A General Introduction with Rules and Grammar (Ogden, 1932). Basic English was a subset of the English language which, with 850 words, was intended to be able to give the sense of anything which might be said in English. At the time, Ogden's objective was to create a language which would allow people of different countries to intercommunicate easily. Basic English differed from previous attempts to construct universal languages in that it was a perfectly well-formed part of English, rather than some entirely artificial or hybrid construction such as Esperanto (Janton, 1994). Although Basic English was never popular in the 1930s, the idea was taken up later, in the 1970s, by multinational companies such as Caterpillar, Ericsson, Kodak and Xerox. By using a controlled English, their objective was to write technical manuals which service engineers and mechanics with limited English skills could easily read, thus avoiding translation costs. Although this was the original intention, two important additional benefits of CLs were discovered: firstly, improved readability not only for non-native speakers but also for native speakers of English and, secondly, better results when using machine translation (MT) and translation memory (TM) applications.
As a consequence of the advantages that a CL can offer, there has been a steady increase in the design of CLs and related applications in different domains in the last thirty years. While the majority of work on CLs has focused on English, some attempts have been made to produce specifications for other languages. Examples include GIFAS Français Rationalisé (Lux, 1998), a CL for French aircraft maintenance manuals, ScaniaSwedish (Almqvist and
Hein, 1996), a CL for truck maintenance documentation, Controlled Siemens Documentary German (Schachtl, 1998) for software documentation and Controlled Chinese (Zhang and Shiwen, 1998). However, while shallow parsers have been developed for Spanish (e.g. Gala Pavia, 1999), as well as grammar checkers (e.g. Ramírez and Sánchez, 1996), little research has been undertaken on the development of a controlled Spanish.

2.3 Definitions of Controlled Language
The literature contains many definitions of CLs, and they are broadly similar. Here are some examples:
Van der Eijck, De Koning and Van der Steen (1996: 64): "Controlled languages are constructed languages that have precise coverage bounds and are designed to satisfy linguistic constraints such as greatly reduced ambiguity."
Hayes, Maxwell and Schmandt (1996: 84): "The kinds of control usually considered include restricting the vocabulary used in a document, restricting the allowable meanings of particular words or phrases, restricting the kinds of syntactic constructions that may be used, and restricting the overall complexity of sentences. A collection of restrictions of these kinds is said to define a Controlled English (CE)."
Heald and Zajac (1998: 124): "A Controlled Language is an extreme application of writing rules as expounded in technical writing manuals."
Huijsen (1998: 2): "A controlled language is an explicitly defined restriction of natural language that specifies constraints on lexicon, grammar, and style."
In summary, the objective of a controlled language is to improve readability, standardisation, accessibility and translatability of documentation. This is achieved through increased terminological consistency and standardisation, generally simplified sentence structure, and standardised document format and layout.

2.4 Controlled Languages and Sublanguages
A sublanguage is the language used in a specific domain, such as the special set of terms and language used by those knowledgeable in biology, geology, or artificial intelligence. Each of these domains has a vocabulary and grammatical structures specific to its needs. According to Kittredge (1996) this idea of sublanguage would coincide with the concept of natural sublanguage. The critical difference between a CL and a sublanguage is that the
terms, syntax and semantics of a CL are actively and purposefully proscribed, generally with particular objectives in mind, while the restrictions of a sublanguage are unspecified and evolve naturally.

2.5 Controlled Languages and Style Guides
There are similarities between CLs and style guides, as both aim to allow information to be transmitted accurately, quickly and economically. However, CLs are designed with additional objectives: the usability, retrievability, extractability, and translatability of documents. Also, style guides place fewer constraints on vocabulary and grammatical constructions than CLs, making them less restrictive. Lux (1998) has shown that style guides are predecessors of controlled languages.

2.6 AECMA Simplified English (SE)
AECMA SE comprises a restricted vocabulary of 1,565 words with an additional set of 57 rules for using that vocabulary (AECMA, 1998). SE originated in 1979, when the Association of European Airlines (AEA) asked the European Association of Aerospace Industries (AECMA) to investigate the readability criteria of its aircraft maintenance documentation. Through its Documentation Working Group (DWG), AECMA set up a project group called the Simplified English Working Group (SEWG) to research the problem and provide a solution. SE was the result of this initiative. After analysing existing texts, a draft list of verbs was identified in The Aerospace Industries Association (AIA) of America also joined the project. In 1986 an initial issue of the SE Guide was made (AECMA Document PSC ). In 1987, the Air Transport Association (ATA) of America, in its Specification 100 (the Definitive Reference for commercial aircraft support documentation), made AECMA SE a mandatory requirement. In 1995, the SE Guide was re-issued with many new features (Farrington, 1996). The author of an SE text can use only three sources of words. Firstly, there are Approved Words from the SE Guide.
These constitute the base vocabulary which contains 1,565 words. There are 196 verbs in the base vocabulary and these are approved in four forms: the infinitive, the third person singular, the past simple and the past participle. Manufacturers then add Technical Names and Manufacturing Processes to the base vocabulary. For the Boeing Simplified English Checker these comprise 7,000 extra terms (Wojcik, 1998).
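The three word sources described above can be pictured as three sets that are consulted in turn. The following is a minimal sketch under stated assumptions, not part of the SE Guide or of any existing checker: the set contents are hypothetical samples rather than actual SE entries, the function names are illustrative, and no morphological analysis is attempted, so each approved verb form would have to be listed explicitly.

```python
import re

# Hypothetical sample entries; these are NOT taken from the SE Guide.
APPROVED_WORDS = {"remove", "the", "cover", "install", "a", "new"}
TECHNICAL_NAMES = {"actuator", "fuselage"}
MANUFACTURING_PROCESSES = {"anodize", "ream"}


def classify_word(word):
    """Return the source that licenses `word`, or None if it is unapproved."""
    w = word.lower()
    if w in APPROVED_WORDS:
        return "approved word"
    if w in TECHNICAL_NAMES:
        return "technical name"
    if w in MANUFACTURING_PROCESSES:
        return "manufacturing process"
    return None


def unapproved_words(sentence):
    """List the alphabetic tokens of `sentence` found in none of the sources."""
    tokens = re.findall(r"[^\W\d_]+", sentence)  # runs of letters, accents included
    return [t for t in tokens if classify_word(t) is None]


print(unapproved_words("Remove the actuator cover quickly."))
```

Keeping the three sources separate, rather than merging them into one set, mirrors the division between the fixed base vocabulary and the manufacturer-specific additions.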
There are 57 Writing Rules in the SE Guide (AECMA, 1998). The following are some of the better known ones:
- A sentence length limit of 20 words (25 for descriptive text),
- A paragraph limit of 6 sentences,
- A compound noun length limit of 3 words,
- A prohibition on progressive be and perfective have,
- A prohibition on the passive in procedures (discouraged in descriptions),
- A prohibition on the -ing form of the verb,
- A requirement that sequential steps be in separate sentences,
- A requirement that words only be used in their approved sense,
- A recommendation that articles be used where possible.

2.7 Other Controlled Languages
Some examples of controlled languages in English other than AECMA SE are:
- Agilent Technologies English (Smartny, 2002),
- Attempto Controlled English (ACE) (Fuchs and Schwitter, 1996),
- Boeing Technical English (Wojcik, Holmback, and Hoard, 1998),
- Caterpillar Technical English (Kamprath, Adolphson, Mitamura, and Nyberg, 1998),
- Diebold Controlled English (Moore, 2000),
- Ericsson English (Ericsson, 2000),
- General Motors CASL (Means and Godden, 1996),
- Global English (Means, Chapman and Liu, 2000),
- Kodak English (Kodak, 2000),
- Nortel Standard English (Smartny, 2002),
- Océ Technologies English (Smartny, 2002),
- Perkins PACE (Douglas and Hurst, 1996),
- Xerox MCE (Xerox, 2001).
Examples in other languages include:
- Controlled Chinese (Zhang and Shiwen, 1998),
- Controlled Siemens Documentary German (Schachtl, 1998),
- GIFAS FR for French (Barthe, 1998),
- ScaniaSwedish (Almqvist and Hein, 1996).
The CLs mentioned above are briefly described below.

The Chemical Analysis Group of Agilent Technologies (Smartny, 2002) has adopted a Controlled English to write materials for Mass Spectrometry and Gas chemical analysis equipment.

Attempto Controlled English (ACE) (Fuchs and Schwitter, 1996) is a subset of English with restricted grammar, a domain-specific vocabulary, and a small set of construction and interpretation principles. Fuchs and Schwitter (1996: 125): "ACE is a computer processable subset of English for writing requirements specifications. Domain specialists can express their concepts in ACE in a direct and natural way using the objects of the language as abstract entities. Specifications written in ACE are textual views of formal specifications in logic."

Boeing Technical English (Wojcik, Holmback, and Hoard, 1998) is a general-purpose controlled English writing standard intended to result in clear and precise technical documents. The idea originates from Boeing's need to improve the readability and consistency of a wide variety of documents for customers of different linguistic backgrounds.

Caterpillar Technical English (CTE) (Kamprath, Adolphson, Mitamura, and Nyberg, 1998) is the Controlled English developed at Caterpillar Corporation for heavy machinery documentation. The first controlled language developed at Caterpillar was called Caterpillar Fundamental English (CFE), which later became CTE.

Diebold Controlled English (Moore, 2000) is a specification whose objective is to facilitate a translation process that uses human translators. It comprises a vocabulary and a set of grammar rules.

Ericsson English (Pathinfo, 2002) is the CL used for maintenance manuals relating to Ericsson equipment.

General Motors uses two different CL systems, each intended for a different purpose: Controlled Automotive Service Language (CASL) and Global English (GE).
Controlled Automotive Service Language (CASL) (Means and Godden, 1996) is a CL that restricts
grammar and terminology of vehicle service information in order to achieve high-quality machine translation into French. Global English (GE) (Means, Chapman and Liu, 2000) is a CL which consists of 15 rules. It has been adopted by General Motors University (GMU) for all their training courses for English in the U.S. and overseas, and to facilitate translation. The objective is to improve comprehension for non-native speakers of English and to improve translatability.

Kodak International Service Language (KISL) (Kodak, 2000) is the CL used at Kodak for their maintenance manuals. The main concept behind KISL is one word, one meaning. It has a vocabulary of around 1,000 words.

Nortel Standard English (Smartny, 2002) is a controlled English vocabulary for global customer documentation in telephony. Nortel Networks is a rapidly growing manufacturer of telephony equipment.

Océ Technologies English (Smartny, 2002) is a CL for writing global product support documentation. Océ Technologies, B. V. is a multinational Dutch manufacturer of high-speed copiers. The Océ line of copiers is well known for its reliability and is thus used for applications such as printing credit card and customer billing statements.

Perkins Approved Clear English (PACE) (Douglas and Hurst, 1996) is the CL for Perkins International Limited, based in Peterborough, England. PACE was based on the Caterpillar model. It consists of a lexicon of approximately 2,500 words, together with a set of ten writing rules (Pym 1990):
1. Keep sentences short.
2. Omit redundant words.
3. Order the parts of the sentence logically.
4. Do not change construction in mid-sentence.
5. Take care with the logic of and and or.
6. Avoid elliptical constructions.
7. Do not omit conjunctions or relatives.
8. Adhere to the PACE dictionary.
9. Avoid strings of nouns.
10. Do not use -ing unless the word appears thus in the PACE dictionary.
Finally, Xerox Multinational Customised English (MCE) (Xerox, 2001) is a CL for its documentation division.

Turning to languages other than English, Controlled Chinese (CC) (Zhang and Shiwen, 1998) is a precisely defined subset of Chinese. It has a restricted lexicon and grammar. The objective is to make all aspects of Chinese text manipulation easier.

Controlled Siemens Documentary German (CSDG) (Schachtl, 1998) is a Controlled German developed at Siemens Corporation to improve the translatability of its technical documents, thus allowing the use of their machine translation tool TopTrans.

GIFAS Français Rationalisé (FR) (Barthe, 1998) is a CL for the French aerospace industry. It was originally based on AECMA SE. The objective of FR is to improve the quality of aircraft documentation in French.

ScaniaSwedish (Almqvist and Hein, 1996) is a Controlled Swedish for truck maintenance documentation. By writing in ScaniaSwedish, the technical writers aim to facilitate high-quality translation of documentation into seven European languages in a full version, and partly into three others.

2.8 Acceptance and Implementation
There has been some resistance to using CLs because of the difficulties many authors face when first using CL tools. Both the CL itself and the checking/correction tools must be learned, and this can be frustrating or inconvenient for engineers and technical writers (Goyvaerts, 1996; Godden, 1998). However, CLs do meet their objectives:
Holmback, Shubert and Spyridakis (1996: 168): "With relatively complex documents, the use of SE (Simplified English) will significantly improve comprehension (and) translations of the SE (Simplified English) versions of the procedures produced significantly higher ratings for style match and significantly fewer minor omissions than translations of the non-SE versions."

2.9 Summary
In this chapter we presented the background of CLs and gave a brief description of the various CLs in English and other languages. Frequently, those organisations that adopt a CL develop tools which help them to use it for their documentation. In the next chapter, we
explain what language technology can do to facilitate the use of CLs and describe some of the most important CL checkers being used at present.
Chapter 3 Controlled Language Checkers

3.1 Outline
In this chapter, we first present a general description of the most common tools used by technical writers to help them comply with a CL specification. Then we discuss the key linguistic features of the main CL checkers implemented in a number of companies.

3.2 Controlled Language Checkers
The most commonly used tools to help technical writers in their process of writing in a CL are grammar checkers, terminology checkers, style checkers and alternative generators. A CL checker usually integrates some or all of these tools.

CL grammar checkers indicate whether the controlled text is grammatical or not by performing text parsing and pattern matching against a set of pre-defined grammar rules.

CL terminology checkers specify whether the words used in the controlled text are allowed or not by using a terminology database against which the text is checked. CL terminology databases are usually developed through corpus analysis. They may be general, domain specific, or company specific.

CL style checkers analyse document types, formats, and layouts. Some of the stylistic conventions which can be checked are date and currency formats, table formats, and spelling variants.

CL alternative generators provide a possible phrase or sentence which can substitute for one which does not conform to the specifications of the CL. In most cases, a modification of the original text will be sufficient, but in some cases, an entirely new sentence or expression will be necessary.

3.3 Controlled Language Checkers Currently in Use

The Boeing Simplified English Checker
The Boeing Simplified English Checker (BSEC) is used to check text conformance with AECMA SE. It is a CL application that has been in production since It uses a grammar formalism based on Generalised Phrase Structure Grammar (Gazdar, 1985). A set of over 350 rules is used to achieve a broad coverage of English technical writing (Wojcik, Harrison and Bremer, 1993).
It is based on a bottom-up, left-corner parser which first generates a forest of parses, each taking a different interpretation of the input and then makes a best guess of the correct syntactical analysis.
ERROR TYPE         DESCRIPTION
POS                A known word is used in an incorrect part of speech.
NON-SE             An unapproved word is used.
MISSING ARTICLE    Articles must be used wherever possible in SE.
PASSIVE            Passives are usually illegal.
TWO-COMMAND        Commands may not be conjoined when they represent sequential activities. Simultaneous commands may be conjoined.
ING                Progressive participles may not be used in SE.
COMMA ERROR        A violation of comma usage.
WARNING/CAUTION    Warnings and cautions must appear in a special format. Usually, an error arises when a declarative sentence has been used where an imperative one is required.

Figure 3.1 Error Types Detected by the Boeing Simplified English Checker (Wojcik, Harrison and Bremer, 1993)

Some of the more important requirements of AECMA Simplified English that the BSEC can detect are: sentence length (20 or 25 words), paragraph length (6 sentences), noun cluster length (3 words or less), missing articles (based on count and mass distinctions), unapproved verbal auxiliaries (passive, progressive, perfect, modals), unapproved -ing participles, multiple commands in a single sentence, and warning, caution and note errors. Figure 3.1 shows a list of the error types detected by BSEC. No other Simplified English checker is as complete or accurate in support of Simplified English requirements as BSEC. The Boeing checker also catches some grammatical and stylistic errors that are not explicitly addressed in the SE standard. Among other things, it detects subject-verb agreement errors, double word errors, misspelled words and punctuation problems.
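Of the requirements listed above, the length limits are the simplest to check mechanically. The following is a minimal sketch of such a check, assuming naive sentence splitting on end punctuation and word counting on whitespace. It is not the BSEC implementation; the function names and the labels "SENTENCE LENGTH" and "PARAGRAPH LENGTH" are hypothetical and do not appear in Figure 3.1.

```python
import re

# Limits as quoted in the text: 20 words per sentence in procedures,
# 25 in descriptive text, and 6 sentences per paragraph.
SENTENCE_LIMITS = {"procedural": 20, "descriptive": 25}
PARAGRAPH_LIMIT = 6


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def check_lengths(paragraph, text_type="procedural"):
    """Return a list of (label, count) pairs for each limit that is exceeded."""
    limit = SENTENCE_LIMITS[text_type]
    sentences = split_sentences(paragraph)
    problems = []
    if len(sentences) > PARAGRAPH_LIMIT:
        problems.append(("PARAGRAPH LENGTH", len(sentences)))
    for sentence in sentences:
        n_words = len(sentence.split())
        if n_words > limit:
            problems.append(("SENTENCE LENGTH", n_words))
    return problems


long_sentence = " ".join(["word"] * 21) + "."
print(check_lengths(long_sentence, "procedural"))
print(check_lengths(long_sentence, "descriptive"))
```

A production checker would need a more careful tokeniser, since abbreviations and decimal numbers would mislead the splitter used here. The noun cluster limit is left out because it requires part-of-speech information.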
Figure 3.2 The Boeing Meaning-Based Checker Architecture (Holmback, Duncan, and Harrison, 2000)

Boeing has also developed an experimental Meaning-Based Checker (BMBC) to generate more accurate analyses. The BMBC builds on the syntax-based BSEC by adding the capability 1) to determine when an approved word is used in an unapproved meaning and 2) to select only those alternatives for an ambiguous unapproved word that are appropriate for the meaning in which it is used (Holmback, Duncan and Harrison, 2000).

POS         WORD          APPROVED MEANING                            UNAPPROVED MEANINGS
Noun        noise         unwanted sound                              electronic interference
Verb        extinguish    stop combustion                             turn off
Adj         clear         without blockage                            transparent; obvious
Prep        by            manner                                      near
Adv         together      in one place                                at the same time
Tech Noun   application   piece of software with a specific purpose   result or action of applying
Tech Adj    telescopic    retracting into                             seeing with a telescope

Table 3.3 Some Simplified English Word Sense Restrictions in BMBC (Holmback, Duncan, and Harrison, 2000)
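The sense restrictions in Table 3.3 lend themselves to a lookup keyed on part of speech and word. The following is a minimal sketch of such a data structure, holding the first five rows of the table. It is an illustration only and is not how the BMBC is implemented; the names used are hypothetical, and the sense passed to the lookup is assumed to come from a separate word sense disambiguation step that is not shown.

```python
from typing import NamedTuple, Optional, Tuple


class SenseRestriction(NamedTuple):
    approved: str                 # the single approved meaning
    unapproved: Tuple[str, ...]   # meanings that must be flagged


# The first five rows of Table 3.3, keyed on (part of speech, word).
SENSE_RESTRICTIONS = {
    ("Noun", "noise"): SenseRestriction("unwanted sound", ("electronic interference",)),
    ("Verb", "extinguish"): SenseRestriction("stop combustion", ("turn off",)),
    ("Adj", "clear"): SenseRestriction("without blockage", ("transparent", "obvious")),
    ("Prep", "by"): SenseRestriction("manner", ("near",)),
    ("Adv", "together"): SenseRestriction("in one place", ("at the same time",)),
}


def is_approved_sense(pos: str, word: str, sense: str) -> Optional[bool]:
    """True if `sense` is the approved meaning, False if it is not,
    and None if the table holds no restriction for this word."""
    entry = SENSE_RESTRICTIONS.get((pos, word.lower()))
    if entry is None:
        return None
    return sense == entry.approved


print(is_approved_sense("Verb", "extinguish", "turn off"))
```

Keying on the part of speech as well as the word keeps entries for the same spelling in different categories apart.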