Complexity and Variation in Language and its Application in Medical Coding

This project is built around two poles: "Complexity, Variation and Frequency in natural languages: The interfaces between linguistic theory, experimental psycholinguistics and computational linguistics" (ComplexVar) and "Development of a multiontological categorizer for medical coding" (MiCMaC). The first pole pursues basic empirical and theoretical research, while the second has an applied orientation. The two poles share the goal of better understanding how negation and non-local dependencies in grammar are computed linguistically. Both phenomena are important challenges for the encoding of medical texts as well as for linguistics, psycholinguistics and computational linguistics.

Complexity, Variation and Frequency in natural languages: The interfaces between linguistic theory, experimental psycholinguistics and computational linguistics

The goals of ComplexVar are to elucidate some of the central issues raised by complexity, variation and frequency in formal descriptions of natural languages, and to deepen scientific collaboration between the language-related disciplines of Syntax, Psycholinguistics and Computational Linguistics. ComplexVar is composed of four interrelated research modules. Two modules are devoted to the study of word order: the Syntax of Word Order and Variation module, and the Frequency module, which tackles cross-linguistic variation from a computational perspective. The third module studies interference effects in syntax and their impact on language acquisition and processing. Finally, the Negation module investigates the structural and pragmatic properties of metalinguistic negation. All modules address theoretical questions about variation and complexity empirically, using different methodologies: classical linguistic analysis, experimental methods and computational methods.

Development of a multiontological categorizer for medical coding

In Switzerland, we are facing increasing pressure on clinical coding. Clinical coding is crucial for moving unambiguously from narrative texts in clinical records to structured data, which opens the way to the continuous exchange of information. However, no system currently in use in Switzerland fulfills the needs of clinical coding. In this context, the MiCMaC project proposes to develop a tool to assist clinical coding with terminologies of major importance for the clinical domain, such as LOINC and SNOMED-CT. Unlike existing tools for coding clinical data, the MiCMaC tool will not only find appropriate concepts in a terminology but also combine them according to the SNOMED-CT language to construct sentences. Such processing requires addressing two natural language processing challenges: negation and non-local dependencies. Collaboration with the ComplexVar project will be essential for identifying relevant strategies. Specifically, the project will: 1) obtain and prepare a large corpus of clinical documents; 2) identify key examples of the two challenges; 3) investigate and evaluate different NLP approaches (e.g. grammar-based vs. probabilistic); and 4) code the clinical corpus with SNOMED-CT.
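To make the negation and non-local dependency challenges concrete, the following minimal Python sketch implements a simplified NegEx-style rule: a concept mention counts as negated if a negation trigger occurs within a fixed window of tokens before it. This is an illustration under stated assumptions, not the MiCMaC method. The trigger and terminator lists, the window size and the function names are hypothetical, and concepts are matched by plain string comparison rather than against LOINC or SNOMED-CT.

```python
import re

# Hypothetical, deliberately tiny trigger list; real NegEx-style systems use
# much larger curated lists of pre- and post-negation phrases.
NEGATION_TRIGGERS = ("no", "denies", "without", "absence of", "negative for")
# Hypothetical words that close a negation scope (a crude stand-in for syntax).
SCOPE_TERMINATORS = ("but", "however", "although")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """Return True if any mention of `concept` has a negation trigger
    within `window` tokens to its left, stopping at scope terminators."""
    tokens = tokenize(sentence)
    target = tokenize(concept)
    n = len(target)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] != target:
            continue
        # Look back at most `window` tokens before the concept mention.
        left = tokens[max(0, start - window):start]
        # Truncate the scope after the closest terminator, if there is one.
        for i in range(len(left) - 1, -1, -1):
            if left[i] in SCOPE_TERMINATORS:
                left = left[i + 1:]
                break
        # Pad with spaces so multi-word triggers match on word boundaries.
        context = " " + " ".join(left) + " "
        if any(f" {t} " in context for t in NEGATION_TRIGGERS):
            return True
    return False


if __name__ == "__main__":
    print(is_negated("Patient denies chest pain.", "chest pain"))     # True
    print(is_negated("No fever but persistent cough.", "cough"))      # False
    s = "No history of diabetes, hypertension, heart disease or stroke in the family."
    print(is_negated(s, "stroke"))             # False: "No" lies outside the window
    print(is_negated(s, "stroke", window=10))  # True
```

The last example shows why a fixed window is brittle: "No" governs "stroke" across a coordinated list, a non-local dependency that a five-token window misses, while simply widening the window risks over-negating unrelated concepts. This trade-off is one motivation for comparing grammar-based approaches, which model negation scope structurally, with probabilistic ones.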