Description
Most multiword expressions (MWEs), especially verbal ones such as "to pull one's leg", "to make up for sth" or "to pay a visit", are semantically non-compositional. Therefore, their automatic identification in running text is a prerequisite for semantically oriented downstream applications. This talk will offer a summary of recent developments in MWE identification, including multilingual corpus annotation and the construction of computational identification models. We will analyze the results of the PARSEME shared task on automatic identification of verbal MWEs and show that this task is harder than related tasks. We will then argue that the reasons for this difficulty lie both in the nature of the MWE phenomenon and in its distributional properties. We will also offer a comparative analysis of state-of-the-art systems, which are particularly sensitive to unseen data. On this basis, we claim that, in order to make strong headway in MWE identification, the community should focus on coupling the identification of MWEs with their discovery, via syntactic MWE lexicons. Such lexicons need not achieve a linguistically complete modeling of MWE behavior, but they should provide minimal morphosyntactic information covering potential uses of MWEs, so as to complement existing MWE-annotated corpora. We define requirements for such a minimal NLP-oriented lexicon, and we propose a roadmap for the MWE community driven by these requirements.