Environment and context of the research
Artificial intelligence is now critical to the competitiveness of French industry through its contribution to innovation-based growth. In this context, the integration and safe use of artificial-intelligence-based technologies is essential to support engineering, industrial production and the development of innovative products and services. The « industrialization of artificial intelligence for mission-critical systems » is one of the major objectives of the national Grand Défi programme Confiance.ai. This industrialization imperative requires an environment that supports design, validation and testing, with a focus on reinforcing confidence and explainability, and ultimately on enabling the certification of artificial intelligence. A group of major industrial companies in the fields of defense, transportation and energy has been formed to define the roadmap of the Confiance.ai program, with the support of leading academic partners. The SystemX Technological Research Institute coordinates this program.
The IRT SystemX, located at the heart of the world-renowned Paris-Saclay scientific campus, has the ambition of becoming a world-class technological research center in the field of digital systems engineering. Its mission is to generate new knowledge and technological solutions based on breakthroughs in digital engineering, and to disseminate its skills across all economic sectors.
The subject of the thesis has been defined by the consortium gathered within the Confiance.ai program, more precisely in the EC7 project (“embarquabilité de l’IA”, i.e., embeddability of AI).
The thesis will be supervised by Thomas CARLE (lecturer at IRIT*), Christine ROCHANGE (professor at IRIT*), and Eric JENN (Center of Competence leader at IRT Saint Exupéry** and referent supervisor in the Confiance.ai EC7 project). The PhD student will be enrolled at the École Doctorale Mathématiques, Informatique et Télécommunications de Toulouse (EDMITT).
The research activity will be carried out within the TRACES team of the IRIT lab. Static timing analysis is at the heart of the team’s research activities, with contributions to the modeling of microprocessors, to the determination of possible execution paths within application source and binary code, and to the analysis of task interferences in multi-core platforms. The team has developed OTAWA, an open-source toolset for timing analysis, that is well known in the community and has been used by several research groups as well as industrial partners.
Within the IRT SystemX, the doctoral student will report to the scientific axis « Infrastructures numériques » (digital infrastructures), headed by Makhlouf Hadji.
The position is based in Toulouse.
* Institut de Recherche en Informatique de Toulouse (IRIT), under the joint supervision of CNRS, INPT and Universités Toulouse I, II and III.
** Institut de Recherche Technologique Saint Exupéry, Toulouse
Graphics Processing Units (GPUs) are already used as accelerators for neural network (NN) inference computations in many applications and systems, including those embedded in cars, robots, etc. However, when it comes to integrating GPUs into critical real-time systems, several challenges remain to be addressed.
An important challenge is to guarantee that the system's timing constraints are met. Indeed, mature timing analysis techniques target traditional processors (CPUs), which exhibit low instruction-level parallelism, whereas GPUs implement a completely different execution model that relies heavily on thread-level parallelism.
The objective of this thesis is to study and develop new methods for the timing analysis of NN-based applications accelerated on GPUs. These methods will rely on abstract interpretation to over-approximate the execution model of GPUs, and will leverage the particular properties of NN-based applications. Indeed, such applications display typical computation patterns that can be exploited in the analysis: they are mainly composed of matrix products and convolutions implemented as regular loops that access memory in predictable patterns. This thesis will take advantage of this regularity to facilitate the analysis of NN applications running on GPUs.
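As a minimal illustration of this regularity, the sketch below enumerates the memory accesses of a 1-D convolution loop nest. The sizes N and K are arbitrary assumptions; the point is that both loop bounds are statically known and every access address is an affine function of the loop indices, so an analyzer can predict the full access pattern without running the code.

```python
def conv1d_accesses(N, K):
    """Return the input indices read at each output index of a 1-D convolution.

    The underlying loop nest is:
        for i in range(N - K + 1):      # output index, bound known statically
            for j in range(K):          # kernel index, bound known statically
                acc += x[i + j] * w[j]  # affine access: address = i + j
    Every address is i + j, hence fully predictable at analysis time.
    """
    return {i: [i + j for j in range(K)] for i in range(N - K + 1)}

# Example with an 8-element input and a 3-tap kernel.
accesses = conv1d_accesses(N=8, K=3)
# Output 0 reads x[0], x[1], x[2]; output 5 reads x[5], x[6], x[7].
```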
State of the Art
The very specific execution model of GPUs does not fit existing micro-architectural static analysis techniques. In particular, the SIMT (Single Instruction Multiple Threads) execution mode, including the handling of control-flow divergence among synchronous threads, and the hardware scheduling scheme require a substantial extension of the usual timing models. In addition, due to the closed nature of GPU devices, details of these mechanisms must be inferred from more or less reliable sources and validated by micro-benchmark-based reverse engineering techniques.
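The timing impact of SIMT control-flow divergence can be sketched with a toy cost model: when a branch splits a warp, the two sides are serialized and every thread pays for both. The cycle counts below are invented for illustration, not taken from any vendor documentation.

```python
# Invented per-branch costs for illustration only.
THEN_CYCLES, ELSE_CYCLES = 40, 25

def warp_branch_cycles(taken_mask):
    """Cycles spent by a warp on an if/else, given which threads take the branch.

    Under lockstep SIMT execution, a side is executed (with a partial thread
    mask) as soon as at least one thread needs it; divergent warps therefore
    pay the cost of both sides.
    """
    cycles = 0
    if any(taken_mask):                # some thread takes the 'then' side
        cycles += THEN_CYCLES
    if not all(taken_mask):            # some thread takes the 'else' side
        cycles += ELSE_CYCLES
    return cycles

uniform = warp_branch_cycles([True] * 32)                    # no divergence
divergent = warp_branch_cycles([True] * 16 + [False] * 16)   # both sides paid
```

A static analysis must conservatively assume the divergent case whenever it cannot prove that all threads of a warp follow the same path.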
Few contributions have been made to the Worst-Case Execution Time (WCET) analysis of a task running on a GPU. Hirvisalo [2014] and Betts and Donaldson [2013] focus on handling branch divergence among the threads of a same warp. A more general analysis and modelling of a GPU's behaviour is reported by Berezovskyi [2015]. Surprisingly, there has been no follow-up to these works in the last six years, probably because GPUs were not seriously considered for hard real-time systems until very recently. With the emergence of applications with very high computation requirements, such as the perception, planning, localization, prediction and control modules found in autonomous vehicles, which are mostly based on neural network algorithms, interest in GPU-based platforms for real-time systems has been renewed. This has driven recent research aimed at uncovering hidden (undocumented) details of how GPUs process code. Hayes et al. [2019] introduce a systematic method to decode GPU (CUDA) binaries. Other papers use micro-benchmarking techniques to complete or clarify the public GPU documentation, in particular with regard to the memory system [Mei and Chu, 2017], synchronisation facilities [Yang et al., 2018] and internal scheduling algorithms [Olmedo et al., 2020].
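The micro-benchmarking idea behind these reverse-engineering works can be illustrated with a simulated pointer-chase experiment: sweep the working-set size and look for the jump in access latency that reveals a cache capacity. All parameters here (a 4 KiB cache, hit and miss latencies) are invented; a real experiment would time actual memory accesses instead of calling a model.

```python
# Assumed hardware parameters for the simulation (purely illustrative).
CACHE_BYTES, HIT_CYCLES, MISS_CYCLES = 4096, 4, 80

def chase_latency(working_set_bytes):
    """Simulated average latency of a pointer chase over a given working set."""
    return HIT_CYCLES if working_set_bytes <= CACHE_BYTES else MISS_CYCLES

def infer_cache_size(sizes):
    """Smallest power-of-two bound on capacity, from the first latency jump.

    `sizes` must be sorted and doubling; the capacity lies below the first
    working-set size whose latency exceeds the baseline.
    """
    base = chase_latency(sizes[0])
    for s in sizes:
        if chase_latency(s) > base:
            return s // 2
    return None

sizes = [512 * 2**k for k in range(8)]   # 512 B .. 64 KiB
inferred = infer_cache_size(sizes)       # recovers the 4 KiB capacity
```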
The thesis aims to develop a complete WCET analysis framework for neural networks accelerated on GPUs. This will be achieved according to the following schedule:
- The PhD student will start by conducting a thorough review of the state of the art regarding all aspects of the analysis: thread divergence modelling, warp scheduling policies, memory hierarchies, latencies and interference.
- He or she will then design the models and algorithms needed to derive a safe yet precise WCET for a warp in isolation, and implement them in the OTAWA open-source WCET analysis framework (developed and maintained at IRIT). These models will primarily target the specifics of neural networks, mostly regular loops with predictable memory access patterns, but more general patterns will also be considered.
- Once tested and validated, these models will be extended to allow the analysis of a thread block composed of multiple warps. This step involves gathering information on the policy implemented by the warp scheduler, by looking for already available information and by designing micro-kernels to test relevant hypotheses and reverse engineer the warp scheduler if need be.
- The last step of the thesis involves accounting for the impact that multiple kernels running in parallel on a GPU can have on each other.
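As a toy illustration of the warp-in-isolation step, the sketch below computes a structural WCET over the kind of regular loop nests found in NN kernels: loop bounds are static, so a loop's worst case is simply its bound times the body's worst case, and a conditional contributes its worst branch. The program tree and per-block cycle costs are invented; a real analyzer such as OTAWA works on binary code and a detailed micro-architectural model.

```python
def wcet(node):
    """Worst-case cycles of a program fragment given as a small syntax tree."""
    kind = node[0]
    if kind == "basic":                 # ("basic", cycles): a basic block
        return node[1]
    if kind == "seq":                   # ("seq", child, ...): sequence
        return sum(wcet(c) for c in node[1:])
    if kind == "if":                    # ("if", then, else): keep worst branch
        return max(wcet(node[1]), wcet(node[2]))
    if kind == "loop":                  # ("loop", bound, body): static bound
        return node[1] * wcet(node[2])
    raise ValueError(f"unknown node kind: {kind}")

# A 6x3 convolution-style nest: outer loop of 6 iterations, each running a
# 2-cycle setup block and an inner loop of 3 five-cycle MAC blocks.
program = ("loop", 6, ("seq", ("basic", 2), ("loop", 3, ("basic", 5))))
bound = wcet(program)   # 6 * (2 + 3 * 5) = 102 cycles
```

The static loop bounds are exactly what the regularity of NN kernels provides for free; for general code they must themselves be computed or supplied by annotations.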
Techniques developed in this thesis will be exercised on one or several industrial use cases of the Confiance.ai project.
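One candidate hypothesis the scheduler-probing micro-kernels could test is a simple round-robin issue policy among ready warps. The model below is hypothetical (one instruction issued per cycle, no memory stalls); comparing its predicted completion order against measured timestamps is the kind of evidence such reverse engineering relies on.

```python
from collections import deque

def round_robin_finish_times(warp_instr_counts, cycles_per_instr=1):
    """Cycle at which each warp retires under one-instruction round-robin issue.

    `warp_instr_counts[w]` is the number of instructions warp w must execute.
    Returns a dict mapping warp id to its completion cycle.
    """
    ready = deque(enumerate(warp_instr_counts))   # (warp id, remaining instrs)
    finish, clock = {}, 0
    while ready:
        warp, remaining = ready.popleft()
        clock += cycles_per_instr                 # issue one instruction
        if remaining == 1:
            finish[warp] = clock                  # last instruction retired
        else:
            ready.append((warp, remaining - 1))   # back to the ready queue
    return finish

# Two warps of 3 instructions and one of 1: the short warp retires first.
times = round_robin_finish_times([3, 3, 1])
```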
-  V. Hirvisalo, ‘On Static Timing Analysis of GPU Kernels’, presented at the 14th International Workshop on Worst-Case Execution Time Analysis (WCET 2014), 2014.
-  A. Betts and A. Donaldson, ‘Estimating the WCET of GPU-Accelerated Applications Using Hybrid Analysis’, in 2013 25th Euromicro Conference on Real-Time Systems, Los Alamitos, CA, USA, Jul. 2013, pp. 193–202. doi: 10.1109/ECRTS.2013.29
-  K. Berezovskyi, ‘Timing Analysis of General-Purpose Graphics Processing Units for Real-Time Systems: Models and Analyses’, Faculdade de Engenharia da Universidade do Porto, Departamento de Engenharia Electrotécnica e de Computadores, 2015.
-  A. B. Hayes, F. Hua, J. Huang, Y. Chen, and E. Z. Zhang, ‘Decoding CUDA Binary’, in 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Feb. 2019, pp. 229–241. doi: 10.1109/CGO.2019.8661186.
-  X. Mei and X. Chu, ‘Dissecting GPU Memory Hierarchy Through Microbenchmarking’, IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 1, pp. 72–86, Jan. 2017, doi: 10.1109/TPDS.2016.2549523.
-  M. Yang, N. Otterness, T. Amert, J. Bakita, J. H. Anderson, and F. D. Smith, ‘Avoiding Pitfalls when Using NVIDIA GPUs for Real-Time Tasks in Autonomous Systems’, 2018.
-  I. S. Olmedo, N. Capodieci, J. L. Martinez, A. Marongiu, and M. Bertogna, ‘Dissecting the CUDA scheduling hierarchy: a Performance and Predictability Perspective’, in 2020 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Apr. 2020, pp. 213–225. doi: 10.1109/RTAS48715.2020.000-5.
Candidates must hold a Master's or engineering degree in information processing, computer science or telecommunications, ideally with a specialty in embedded systems development.
Strong interest in hardware aspects of software engineering, formal methods and machine learning
Open-mindedness and innovation, autonomy, rigor and tenacity
Teamwork and teaching skills, listening and interpersonal skills (customer relations)