Domains of expertise
To process data efficiently, a strategy for building and handling knowledge is needed. Each project both requires and creates knowledge, so this new knowledge must be stored, made accessible, and linked within a coherent network of meaning. This strategy revolves around three axes:
building knowledge in a meaningful way and saving it so that new projects can access it easily; adding semantics to multiple types of data; and handling the evolution of that knowledge when new discoveries come.
CONCEPTUAL REPRESENTATION OF KNOWLEDGE
Conceptual representation of knowledge cannot be limited to reference textbooks and dictionaries. It is therefore crucial to explore new ways to represent and efficiently use this knowledge; formal languages, graph databases, and ontology engines are examples of such approaches.
Lexico-semantic resources are the fuel of the knowledge engineering domain: they carry the semantics and allow the coherent linking of data generated in the healthcare setting. Building and maintaining these resources is a major role of this domain.
One of the two main branches of Artificial Intelligence (AI) is the symbolic, knowledge-driven approach. Symbolic AI systems represent language phenomena via logical rules.
Lexical analysis is the first step of Natural Language Processing tasks. Words or combinations of words (compound words) occur in texts in their inflected forms (conjugated verbs, plurals, etc.) or as variants such as abbreviations. Lexico-semantic resources such as formalized electronic dictionaries of simple and compound biomedical terms are therefore a valuable asset.
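As an illustration, a dictionary-based lexical analysis step can be sketched in a few lines of Python. The lexicon entries below are invented for the example; a real resource would contain thousands of simple and compound biomedical terms.

```python
# Hypothetical miniature lexico-semantic dictionary: inflected or
# abbreviated surface forms map to canonical entries; multi-word keys
# stand for compound terms.
LEXICON = {
    "fractures": "fracture",
    "IRM": "imagerie par résonance magnétique",
    "insuffisance rénale": "insuffisance rénale",  # compound term
}

def analyze(tokens):
    """Greedy longest-match lexical analysis, preferring compound entries."""
    result, i = [], 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in LEXICON:
            result.append(LEXICON[bigram])  # compound term matched first
            i += 2
        else:
            result.append(LEXICON.get(tokens[i], tokens[i]))
            i += 1
    return result
```

The longest-match strategy ensures that "insuffisance rénale" is recognized as one compound entry rather than two unrelated words.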
Local grammars and finite-state automata are developed to describe specific rules of the French medical language (syntactic constraints, lexical ambiguities) and to locate occurrences of these patterns in texts. Information and relation extraction tasks are then performed by pattern matching.
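To make the pattern-matching idea concrete, here is a minimal sketch that approximates a tiny "local grammar" for French fracture mentions with a regular expression (a simple kind of finite-state pattern); the pattern and example sentence are invented for illustration.

```python
import re

# Toy local grammar: "fracture du/de la/de l' <site>", approximated as a
# regex rather than a full finite-state automaton toolkit.
FRACTURE_PATTERN = re.compile(
    r"fracture\s+(?:du|de\s+la|de\s+l')\s*(?P<site>\w+)",
    re.IGNORECASE,
)

def extract_fracture_sites(text):
    """Return the anatomical sites of fracture mentions found in text."""
    return [m.group("site") for m in FRACTURE_PATTERN.finditer(text)]
```

A real system would combine many such patterns and handle negation and uncertainty, but the extraction principle is the same: locate occurrences, then capture the informative sub-spans.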
KNOWLEDGE REPRESENTATION AND REASONING
Representation of information and logical relations via rules, knowledge graphs, ontologies, and semantic networks, so that a computer system can solve complex tasks.
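A knowledge graph can be sketched as a set of subject-predicate-object triples, with a small reasoner that follows "is_a" links transitively. The concept names below are invented for the example, not taken from any real ontology.

```python
# Minimal knowledge graph as subject-predicate-object triples.
TRIPLES = {
    ("scaphoid fracture", "is_a", "wrist fracture"),
    ("wrist fracture", "is_a", "fracture"),
    ("fracture", "is_a", "injury"),
}

def ancestors(concept, triples=TRIPLES):
    """Return all concepts reachable via is_a edges (transitive closure)."""
    found, frontier = set(), {concept}
    while frontier:
        nxt = {o for (s, p, o) in triples if p == "is_a" and s in frontier}
        frontier = nxt - found
        found |= nxt
    return found
```

This kind of subsumption reasoning is what lets a query for "injury" retrieve documents that only mention a scaphoid fracture; production systems delegate it to ontology engines or graph databases rather than hand-written loops.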
To extract valuable information from textual data, an essential first step is to determine automatically which category a document belongs to.
When subtle categories need to be defined a posteriori (i.e. they were not present when the document's description or metadata was generated), automatic classification algorithms save a tremendous amount of manual work and enable finer, quicker knowledge extraction.
Possible applications range from automatic classification of whole documents (e.g. retaining only radiology reports that describe a new scaphoid fracture) to classification of shorter textual items (e.g. automatic mapping of medical concepts to international or local medical classifications such as ICD-10 or CHOP).
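As a minimal sketch of such a classifier, here is a tiny multinomial Naive Bayes implemented with the standard library only; the labels and training snippets are invented, and a real system would train on thousands of annotated reports with a proper NLP pipeline.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (label, text) pairs. Returns a bag-of-words model."""
    priors, counts, vocab = Counter(), defaultdict(Counter), set()
    for label, text in docs:
        priors[label] += 1
        for w in text.lower().split():
            counts[label][w] += 1
            vocab.add(w)
    return priors, counts, vocab

def classify(model, text):
    """Pick the label with the highest smoothed log-probability."""
    priors, counts, vocab = model
    total_docs = sum(priors.values())
    best, best_lp = None, float("-inf")
    for label in priors:
        lp = math.log(priors[label] / total_docs)
        denom = sum(counts[label].values()) + len(vocab)  # Laplace smoothing
        for w in text.lower().split():
            lp += math.log((counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

Even this naive bag-of-words baseline illustrates the payoff: once trained, it categorizes new documents in microseconds, replacing manual triage.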
With “You shall know a word by the company it keeps”, J. R. Firth laid, as early as 1957, the foundation for modern word representations, also known as “word embeddings”.
The basic idea is to represent a piece of textual data (for example a word) as a vector rather than an index in a vocabulary, enabling automatic, unsupervised “learning” from co-occurrences in large textual corpora.
This technique performs very well at handling words with different meanings (e.g. “river bank” versus “bank account”) and at capturing similarities between concepts (“broken bones” will be represented similarly to “fractured bones” or “fracture of a bone”). French medical narratives are very specific and call for advanced, specialized word embeddings.
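The distributional idea behind embeddings can be sketched directly: represent each word by the counts of its neighbours within a window, then compare words with cosine similarity. The three-sentence corpus is invented; real embeddings (word2vec, fastText, contextual models) learn dense vectors from millions of sentences instead of raw counts.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector of neighbour counts."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        words = sent.lower().split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vecs[w][words[j]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) \
        * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

Words that appear in the same contexts ("broken" and "fractured" both next to "bone") end up with similar vectors, which is exactly the behaviour Firth's quote anticipates.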
Data generation is increasing exponentially, and automatic tools to extract meaningful information are becoming critical in many fields. In healthcare, up to 80% of valuable information is hidden in free text.
For example, automatic detection of adverse drug effects improves patient safety and makes it possible to find correlations across large collections of patients.
Information retrieval tools aim to find specific pieces of information in large data collections, which can then be analyzed manually within a reasonable timeframe or used in various clinical decision support systems.
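A classic retrieval baseline is TF-IDF ranking, sketched below over an invented mini-corpus: each document is scored by summing the tf-idf weights of the query terms, so rare query words weigh more than common ones.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Return document indices sorted by descending TF-IDF score."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)

    def idf(term):
        df = sum(term in doc for doc in tokenized)
        return math.log((n + 1) / (df + 1)) + 1  # smoothed inverse doc freq

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append(sum(tf[t] * idf(t) for t in query.lower().split()))
    return sorted(range(n), key=lambda i: -scores[i])
```

Production search engines add stemming, field weighting, and learned ranking on top, but this term-weighting core is still the starting point of most information retrieval pipelines.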