Overview and Coverage
The Japanese Multiword Expression Lexicon (JMWEL) is a comprehensive database with a rich set of grammatical attributes, fine-tuned for phrase-based Natural Language Processing (NLP) applications such as machine translation (MT) and information retrieval (IR), as well as for morphological and syntactic analysis of a wide range of Japanese documents. It contains about 140,000 entries covering almost all kinds of linguistically idiosyncratic but commonly used Japanese phrases: idioms, quasi-idioms, collocations, quasi-collocations, clichés, quasi-clichés, proverbs, and old sayings.
JMWEL consists of eight basic sub-lexicons, each corresponding to a grammatical function: nominal, verbal, adjectival, adjective-verbal, adverbial, adnominal, connective, and functional expressions. In addition, JMWEL has five topic-based sub-lexicons: standard-idioms, onomatopoeic-expressions, adverbs/sayings/clichés, syntactically ill-formed-phrases, and four-kanji-idioms.
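As a rough illustration of this organization, the two families of sub-lexicons can be modeled as enumerations. This is only a sketch: the class names, the `group_by_sublexicon` helper, and the placeholder entries (`expr_a` and so on) are hypothetical and do not reflect JMWEL's actual file layout or field names.

```python
from enum import Enum


class BasicSubLexicon(Enum):
    """The eight basic sub-lexicons, one per grammatical function."""
    NOMINAL = "nominal"
    VERBAL = "verbal"
    ADJECTIVAL = "adjectival"
    ADJECTIVE_VERBAL = "adjective-verbal"
    ADVERBIAL = "adverbial"
    ADNOMINAL = "adnominal"
    CONNECTIVE = "connective"
    FUNCTIONAL = "functional"


class TopicSubLexicon(Enum):
    """The five topic-based sub-lexicons."""
    STANDARD_IDIOMS = "standard-idioms"
    ONOMATOPOEIC_EXPRESSIONS = "onomatopoeic-expressions"
    ADVERBS_SAYINGS_CLICHES = "adverbs/sayings/clichés"
    SYNTACTICALLY_ILL_FORMED_PHRASES = "syntactically ill-formed-phrases"
    FOUR_KANJI_IDIOMS = "four-kanji-idioms"


def group_by_sublexicon(entries):
    """Group (expression, BasicSubLexicon) pairs by their sub-lexicon.

    Hypothetical helper: every basic sub-lexicon gets a key, even if empty.
    """
    grouped = {category: [] for category in BasicSubLexicon}
    for expression, category in entries:
        grouped[category].append(expression)
    return grouped


# Placeholder entries, not real JMWEL data.
sample = [
    ("expr_a", BasicSubLexicon.VERBAL),
    ("expr_b", BasicSubLexicon.NOMINAL),
    ("expr_c", BasicSubLexicon.VERBAL),
]
grouped = group_by_sublexicon(sample)
```

Keeping the grammatical and topic-based categories in separate enumerations mirrors the distinction drawn above. Whether a single expression can belong to both kinds of sub-lexicon is not stated here, so the sketch makes no assumption about it.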
Notable features of JMWEL are summarized as follows:
- Extensive collection of expressions
- Careful description of the notational variants of each expression
- Detailed description of the morphological and syntactic structure of each expression
- Description of the internal modifiability of each expression, i.e. whether gaps may occur within it
The following table shows the basic information fields that are common to all sub-lexicons. Some sub-lexicons have a few additional fields specific to them.