Emerging high-performance computing systems consist of multiple latency-optimized processors (e.g., CPUs) and throughput-optimized processors (e.g., GPUs) interconnected by a high-performance network. The performance of such heterogeneous systems depends directly on the processor memory hierarchy. This course covers the trade-offs in designing a high-performing memory hierarchy for CPUs, GPUs, and heterogeneous systems consisting of multiple CPUs and GPUs. We will show that different design constraints yield different memory-hierarchy solutions. The lectures will cover fundamental design concepts and state-of-the-art research in virtual memory, cache hierarchies, and main memory systems.
The course will also include personal experiences with commercializing research ideas, as well as discussion topics on areas for continued research.
Aamer Jaleel is a Principal Research Scientist in the Architecture Research Group (ARG) at NVIDIA. Prior to joining NVIDIA, Dr. Jaleel was a Principal Engineer in the Versatile Systems and Simulation Advanced Development (VSSAD) group at Intel. During his decade-long career at Intel, Dr. Jaleel's research contributed to performance-modeling enhancements and cache-hierarchy improvements in Intel's next-generation microprocessors. Dr. Jaleel received his Ph.D. in Electrical Engineering from the University of Maryland, College Park in 2006. He received his B.S. and M.S. in Computer Engineering, also from the University of Maryland, College Park, in 2000 and 2002, respectively. Dr. Jaleel has co-authored more than a dozen patents and over 30 technical publications.