IWOMP 2020 – Conference Program

Due to the COVID-19 pandemic, the IWOMP 2020 workshop will be completely virtual. Tutorial Day will take place on Monday, Sept. 21,
followed by the technical program, Sept. 22-24.

To help accommodate EU participation, the program was changed to three 6-hour days.

The Tutorial Day will, however, be 7 hours in length.

All times are given for US Central Daylight Time (CDT), UTC -5.


Monday, September 21 | Tutorials

All times are shown in US Central Daylight Time (CDT) zone.

Time Tutorial Name Presenter
8:00 – 10:00 OpenMP Common Core
In this tutorial, we’ll undertake an interactive exploration of the common core of OpenMP. Through discussion and demos, we will cover the 20 features of OpenMP that programmers use all the time. This will give you what you need to become an OpenMP programmer. It will also prepare you for the more advanced OpenMP tutorials later in the day. The tutorial will be based on the C programming language, though Fortran programmers should be able to keep up as we avoid the more complex corners of C. The tutorial is aimed at novice OpenMP programmers, but more experienced OpenMP programmers will benefit as well; they’ll just need to download the exercises (I’ll provide a link to a GitHub repository) so they can explore on their own during the tutorial, adding all the complexity they need to stay engaged.
Tim Mattson, Intel
Tim Mattson is a parallel programmer obsessed with every variety of science (Ph.D. Chemistry, UCSC, 1985). He is a senior principal engineer in Intel’s parallel computing lab. Tim has been with Intel since 1993 and has worked with brilliant people on great projects including: (1) the first TFLOP computer (ASCI Red), (2) MPI, OpenMP and OpenCL, (3) two different research processors (Intel’s TFLOP chip and the 48 core SCC), (4) data management systems (Polystore systems and Array-based storage engines), and (5) the GraphBLAS API for expressing graph algorithms as sparse linear algebra. Tim is passionate about teaching. He’s been teaching OpenMP longer than anyone on the planet with OpenMP tutorials at every SC’XY conference but one since 1998. He has published five books on different aspects of parallel computing, the latest (published November 2019) titled “The OpenMP Common Core: making OpenMP Simple Again”.
10:00 – 10:30 Break
10:30 – 12:30 Advanced OpenMP
This tutorial will cover several advanced topics of OpenMP:

  • NUMA Aware programming
  • Vectorization / SIMD
  • Advanced Tasking

Michael Klemm, Intel

Dr. Michael Klemm is part of the Datacenter Ecosystem Engineering organization of the Intel Architecture, Graphics and Software group. His focus is on High Performance and Throughput Computing. He holds an M.Sc. in Computer Science and a Doctor of Engineering degree (Dr.-Ing.) in Computer Science from the Friedrich-Alexander-University Erlangen-Nuremberg, Germany. Michael’s research focus is on compilers and runtime optimizations for distributed systems. His areas of interest include compiler construction, design of programming languages, parallel programming, and performance analysis and tuning. In 2016, Michael was appointed Chief Executive Officer of the OpenMP Architecture Review Board.
12:30 – 13:00 Break
13:00 – 15:00 OpenMP Offload (GPUs)
While most HPC developers have MPI experience, many are not familiar with the latest OpenMP features and how to use them in their codes. Modern OpenMP can be used in creative ways to map parallelism to current and emerging parallel architectures. This tutorial describes recent and new features of OpenMP for accelerators, with a focus on those features that have proven important for applications. The OpenMP 5.1 specification is scheduled for release at SC’20 with exciting new functionality that improves support for accelerators. It is therefore important for developers to be aware not only of the current standard and what is available today, but also of what is coming next and what will be available in the exascale time frame (2021). The tutorial is expected to cover the following topics:

  • Overview of what is available today in OpenMP 5.0 for GPUs
  • An overview of the accelerator programming model
    • Examples on how to map data structures
    • How to exploit the parallelism in the target regions
  • How to use OpenMP target and tasking constructs to manage and orchestrate work and communication between the CPUs and accelerators and inter-node communication.
  • Some examples of success stories on how applications have used OpenMP to leverage GPUs on HPC systems.
  • Other uses: using OpenMP offload via other frameworks such as RAJA and Kokkos
  • A deeper introduction to OpenMP 5.1 and a preview of the latest features in the new spec to manage memory for heterogeneous shared memory spaces, unified addresses / memories, deep copy, detach tasks, etc.

Oscar Hernandez, Oak Ridge National Laboratory &
Tom Scogland, Lawrence Livermore National Laboratory
Oscar Hernandez received a PhD in Computer Science from the University of Houston. He is a senior staff member of the Computer Science Research (CSR) Group, which supports the Programming Environment and Tools for the Oak Ridge Leadership Computing Facility (OLCF). He represents ORNL in specification bodies such as OpenACC and OpenMP, and in the benchmarking body SPEC/HPG. He is currently part of the SOLLVE ECP team and is the application liaison for the project. At ORNL he also works closely with application teams, including the CAAR and INCITE efforts, and constantly interacts with them to address their programming model and tools needs via HPC software ecosystems. He is currently working on the programming environment for Summit and works very closely with the vendors to track the evolution of the next-generation programming models. He has worked on many projects funded by DOE, DoD, NSF, and industrial partners in the oil and gas industry.


Thomas R. W. Scogland received his PhD degree in computer science from Virginia Tech in 2014. He is a computer scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory. His research interests include parallel programming models, heterogeneous computing and resource management at scale. He serves on the OpenMP Language Committee, the WG14-C and WG21-C++ committees, and as co-chair of the Green500.

Tuesday, September 22 | Program

All times are shown in US Central Daylight Time (CDT) zone.  Presentation Slides are provided at the discretion of the paper authors,
and will not be available until the presentation or shortly afterwards.

Time Presentation
08:00 Introduction

Dan Stanzione, Director of TACC (UT, Texas)

08:10 Keynote I: A Tale of Four Packages

Jack Dongarra, Distinguished Professor, University of Tennessee, Oak Ridge National Laboratory and University of Manchester

Abstract. In this talk we will look at four software packages for dense linear algebra. These packages have been developed over time. Some of these packages are intended as research vehicles and others as production tools. The four packages are:

  • BALLISTIC, to sustain the LAPACK and ScaLAPACK packages
  • PLASMA, research into multicore implementations
  • MAGMA, research into GPU implementations
  • SLATE, dense linear algebra for exascale machines

We will look at how OpenMP is being used in each.


Speaker Profile: Jack Dongarra holds an appointment at the University of Tennessee, Oak Ridge National Laboratory, and the University of Manchester. He specializes in numerical algorithms in linear algebra, parallel computing, use of advanced-computer architectures, programming methodology, and tools for parallel computers. His awards include:

  • in 2004 he was awarded the IEEE Sid Fernbach Award;
  • in 2008 he was the recipient of the first IEEE Medal of Excellence in Scalable Computing;
  • in 2010 he was the first recipient of the SIAM Special Interest Group on Supercomputing’s award for Career Achievement;
  • in 2011 he was the recipient of the IEEE Charles Babbage Award;
  • in 2013 he received the ACM/IEEE Ken Kennedy Award;
  • in 2019 he received the ACM/SIAM Computational Science and Engineering Prize;
  • in 2020 he received the IEEE Computer Pioneer Award.

He is a Fellow of the AAAS, ACM, IEEE, and SIAM, a foreign member of the Russian Academy of Sciences, a foreign member of the British Royal Society, and a member of the US National Academy of Engineering.

09:00 Papers I: Performance Methodologies
FAROS: A Framework To Analyze OpenMP Compilation Through Benchmarking and Compiler Optimization Analysis

Giorgis Georgakoudis, Johannes Doerfert, Ignacio Laguna and Tom Scogland

Evaluating the Efficiency of OpenMP Tasking for Unbalanced Computation on Diverse CPU Architectures

Stephen L. Olivier

10:00 Break
10:30 Papers II: Applications
A Case Study of Porting HPGMG from CUDA to OpenMP Target Offload

Christopher Daley, Hadia Ahmed, Samuel Williams and Nicholas Wright

P-Aevol: an OpenMP Parallelization of a Biological Evolution Simulator, Through Decomposition in Multiple Loops

Laurent Turpin, Thierry Gautier, Jonathan Rouzaud-Cornabas and Christian Perez

Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application

Eric Raut, Jie Meng, Mauricio Araya-Polo and Barbara Chapman

12:00 Break
12:30 Papers III: OpenMP Extensions
Unified Sequential Optimization Directives in OpenMP

Brandon Neth, Tom Scogland, Michelle Mills Strout and Bronis R. de Supinski

Support Data Shuffle Between Threads in OpenMP

Anjia Wang, Xinyao Yi and Yonghong Yan

13:30 Sponsor Tech Talk I

Wednesday, September 23 | Program

All times are shown in US Central Daylight Time (CDT) zone. Presentation Slides are provided at the discretion of the paper authors,
and will not be available until the presentation or shortly afterwards.

Time Presentation
08:00 Keynote II: Programming Models for a Mixed-Signal AI Inference Accelerator

Eric Stotzer, Mythic Inc.

Abstract. This talk will cover Mythic’s hybrid mixed-signal computing architecture and unique software development tools, including a Deep Neural Network (DNN) graph compiler. In addition, some ideas will be proposed on how OpenMP can be used to program this type of architecture. Mythic’s Intelligence Processing Units (IPUs) combine analog compute-in-memory acceleration with digital processing elements. They are designed for high-performance and power-efficient AI inference acceleration.

The Mythic IPU is a tile-based dataflow architecture. Each tile has an analog compute array, flash memory for weight storage, local SRAM memory, a single-instruction multiple-data (SIMD) unit, and a control processor. The tiles are interconnected with an efficient on-chip router network.  Mythic has built a unique suite of development tools, including a DNN graph compiler, to enable the rapid deployment of AI inference applications on the IPU. The tools perform such actions as mapping DNNs to tiles, setting up dataflow conditions, and analog-aware program transformations.


Speaker Profile: Eric Stotzer (Ph.D. University of Houston, 2010) is a Fellow and Director of Compiler Research at Mythic.  He is currently working on developing new programming models for mixed-signal AI inference accelerators. Eric was previously with Texas Instruments for 31 years where he was a Distinguished Member Technical Staff. Over the years, he has worked on compilers and tools for micro-controllers and digital signal processors as well as hardware/software co-design efforts for new processor architectures. Eric was the TI representative on the OpenMP Architecture Review Board and co-chair of the OpenMP accelerator subcommittee. Eric is a co-author of Using OpenMP – The Next Step, The MIT Press.

09:00 Papers IV: Performance Studies
Performance Study of SpMV Towards an Auto-tuned and Task-based SpMV (LASs Library)

Sandra Catalan, Tetsuzo Usui, Leonel Toledo, Xavier Martorell, Jesús Labarta and Pedro Valero-Lara

A Case Study on Addressing Complex Load Imbalance in OpenMP

Fabian Orland and Christian Terboven

10:00 Break
10:30 Papers V: Tools
On-the-fly Data Race Detection with the Enhanced OpenMP Series-Parallel Graph

Nader Boushehrinejadmoradi, Adarsh Yoga and Santosh Nagarakatte

AfterOMPT: An OMPT-based tool for Fine-Grained Tracing of Tasks and Loops

Igor Wodiany, Andi Drebes, Richard Neill and Antoniu Pop

Co-designing OpenMP Programming Model Features with OMPT and Simulation

Matthew Baker, Oscar Hernandez and Jeffrey Young

12:00 Break
12:30 Papers VI: NUMA
sOMP: Simulating OpenMP Task-based Applications with NUMA Effects

Idriss Daoudi, Philippe Virouleau, Thierry Gautier, Samuel Thibault and Olivier Aumage

Virtflex: Automatic Adaptation to NUMA Topology Change for OpenMP Applications

Runhua Zhang, Alan L. Cox and Scott Rixner

13:30 Sponsor Tech Talk II

Thursday, September 24 | Program

All times are shown in US Central Daylight Time (CDT) zone.  Presentation Slides are provided at the discretion of the paper authors,
and will not be available until the presentation or shortly afterwards.

Time Presentation
08:00 Keynote III: 
Mark Papermaster, CTO, AMD
09:00 Papers VII: Compilation Techniques
Using OpenMP to Detect and Speculate Dynamic DOALL Loops

Bruno Chinelato Honorio, João P. L. de Carvalho, Munir Skaf and Guido Araujo

ComPar: Optimized Multi-Compiler for Automatic OpenMP S2S Parallelization

Idan Mosseri, Lee-Or Alon, Re’Em Harel and Gal Oren

10:00 Break
10:30 Papers VIII: Heterogeneous Computing
OpenMP Device Offloading to FPGAs Using the Nymble Infrastructure

Jens Huthmann, Lukas Sommer, Artur Podobas, Andreas Koch and Kentaro Sano

Data Transfer and Reuse Analysis Tool for GPU-offloading Using OpenMP

Alok Mishra, Abid Malik and Barbara Chapman

Toward Supporting Multi-GPU Targets via Taskloop and User-defined Schedules

Vivek Kale, Wenbin Lu, Anthony Curtis, Abid Malik, Barbara Chapman and Oscar Hernandez

12:00 Break
12:30 Papers IX: Memory
Preliminary Experience with OpenMP Memory Management Implementation

Adrien Roussel, Patrick Carribault and Julien Jaeger

Memory Anomalies in OpenMP

Lechen Yu, Joachim Protze, Oscar Hernandez and Vivek Sarkar

13:30 Sponsor Tech Talk III