Multimedia Multilingual Integration
The IMM project arises in a context where the volume of data produced and disseminated worldwide is growing rapidly, doubling every year. The project aims to meet the need for tools that help a user performing monitoring to extract knowledge from an unstructured data stream (mainly text and audio). That knowledge must be useful at a given moment for producing a report or making a decision.
- Monitoring application platform and prototype
The primary objective is to deploy a platform for testing, developing and evaluating components and applications, contributed by the partners, that analyze multimedia and multilingual content (information extraction, speech transcription, translation, information retrieval, and graph analysis).
- Adaptation to a new language, particularly a less widely used language
To deploy and evaluate the complete processing chain, each component of the platform must be able to process data in several languages: French, English and Arabic are mandatory, and Russian, Chinese and Persian are optional.
- Robustness to noise and adaptation of the processing system to document style
The goal is to study how resources can be built from a corpus in order to adapt a system to a particular style. This is a keystone of the project, since it will make the models robust to noise and to variations in style. The system must be able to adapt its processing to the properties and salient characteristics of the documents being analyzed.
- Advanced information extraction
Improvements in the analysis of individual documents and in the extraction of basic semantic information, such as named entities, should benefit higher-level functions, such as extracting facts from documents with diverse content and navigating within search results.
For graph analysis, the studies to be undertaken will notably address scaling up, support for multiple information items on the nodes and links of the network, handling of its dynamic nature, and the development of visualization tools suited to large-scale networks. The targeted application domains are contingency management, cybersecurity and strategic monitoring.
Doctoral theses supported by the project
- Real-time analysis of diffusion processes on large-scale social networks (Université Pierre et Marie Curie (LIP6))
- Towards coherent probabilistic knowledge bases (EDIPS / CNRS)