Stanford HPC Summer Speaker Series: HPC Workload Profiling Using VTune Amplifier XE – July 12th at noon

Date for Tutorial: Tuesday, 12 July, 2016
Time: Noon
Duration: 1.5 hours
Location: Peterson Engineering Laboratory, 550 Panama Mall, Room 200
Title: HPC Workload Profiling Using VTune Amplifier XE
Hybrid programming models that use both OpenMP (OMP) and MPI for efficient parallel scalability are becoming more complex. Hardware advances add to the complexity of software development: designs such as Intel® Xeon Phi™ processors, with many cores, multiple vector processing units (VPUs) per core, and an optional fast MCDRAM, offer excellent vector performance for HPC workloads. For serial performance, workload developers need to use all the features of the core design, including complex floating-point and integer SIMD instruction sets such as SSE, AVX2, AVX-512, …, to obtain the highest floating-point throughput and thus the shortest execution time. Learning the best compiler options for a particular workload, as well as the memory layout of the system (for example, NUMA), is also important. For parallel performance tuning, OMP and MPI scalability requires a detailed OMP performance analysis and an MPI communication profile. An OMP analysis may cover load imbalance across the OMP threads, lock and wait time, thread synchronization, … An MPI communication profile can help reduce the cost of communication. The Intel Parallel Studio suite includes a comprehensive set of performance tools that can be used effectively for these tasks. In particular, the powerful Intel VTune Performance Analyzer tool is well suited to deep-dive performance characterization of HPC workloads. In this presentation, we will cover Intel VTune Performance Analyzer and give a hands-on demo of its use in studying HPC workload performance.
Speaker Bio:
Thanh Phung is a senior HPC engineer at Intel who leads HPC workload performance characterization and performance tuning. Thanh joined Intel in 1992, working for the Supercomputer System Division (SSD) as an on-site HPC scientist at NASA/Ames and Caltech. From 1998 to 2000, Thanh worked for Intel developing HPC tools for optical proximity correction (OPC) lithography. Since 2000, Thanh has worked in Intel SSG/DPD/TCAR, specializing in the use of performance tools such as Intel VTune Performance Analyzer and ITAC (for message profiling) for deep-dive HPC workload performance analysis, vectorization tuning with SIMD/AVX2/AVX512, and OMP/MPI/hybrid programming and scalability. Thanh received a Ph.D. in Chemical Engineering, with an emphasis in CFD, from Caltech in 1992.