GPU Programming Workshop

The overwhelming interest in the GPU Hackathon hosted at Brookhaven National Laboratory (www.bnl.gov/gpuhackathon) makes it clear that there is strong demand for hands-on training in GPU programming. We will partner with NVIDIA, and in particular its PGI compiler team, to organize a three-day hands-on GPU programming workshop/mini-Hackathon for teams that were not admitted to the Brookhaven Hackathon.

Registration required.

DAY 1: The first day will consist of introductory lectures given by PGI compiler engineer Matt Colgrove (NVIDIA) and is open to the public.

DAYS 2 & 3: The second and third days will be spent working on the Hackathon application teams' codes. GPU experts will be present to answer questions and help the teams get started with their GPU porting efforts.

Agenda for Day 1 (all times Eastern Time):

8:30 - 9:00 - Morning assembly, greetings, and coffee

9:00 - 10:30 - Day 1, Lesson 1

10:30 - 10:45 - Break

10:45 - 12:00 - Day 1, Lesson 2

12:00 - 1:00 - Lunch

1:00 - 2:45 - Day 1, Lesson 3

2:45 - 3:00 - Break

3:00 - 5:00 - Day 1, Lesson 4

Topics to be covered across lessons:

OpenACC Training Agenda

  • 1) Introduction to accelerated computing
      • a) Your first OpenACC program
      • b) Simple OpenACC program in C, Fortran, C++
      • c) Data management
      • d) Compute placement
      • e) Building the program
      • f) Compiler feedback
      • g) Running the program

  • 2) Parallel and Loop directives
      • a) Gang, worker, vector concepts
      • b) Parallel construct clauses
      • c) Kernels construct
      • d) Loop directive and loop clauses
      • e) Reduction clause
      • f) Collapse clause
      • g) Independent clause
      • h) Cache directive

  • 3) Data directives, data lifetimes
      • a) Predetermined private
      • b) Implicit private
      • c) Implicit and explicit attributes
      • d) Present and Present_or_ clauses
      • e) default(none)
      • f) Private and firstprivate
      • g) Unstructured or dynamic data lifetimes
      • h) Host_data construct
      • i) Update directive

  • 4) Optimization strategies
      • a) Performance measurement
      • b) Collapse clause
      • c) Strides, data structures
      • d) Async clause for data, update, compute
      • e) Wait directive
      • f) Wait with async
      • g) Inlining procedure calls
      • h) The Routine directive
      • i) Atomic operations

  • 5) C++ and issues related to deeply nested structs
      • a) pgc++ and GNU compatibility
      • b) "this" pointers, data members, class methods
      • c) seq routine auto-generation
      • d) PGI's acc_attach() extension
      • e) The future of deep copy
      • f) The future of NVIDIA unified memory

  • 6) CUDA-like interoperability
      • a) C with CUDA C example, Fortran with CUDA Fortran example
      • b) Deviceptr clause
      • c) API routines to manage device data
      • d) CUDA-specific API routines
      • e) Interaction and integration with CUDA C and CUDA Fortran
      • f) Interfacing with CUDA libraries from both host and device

  • 7) PGI-specific features
      • a) Multiple device support, deviceid clause and API routines
      • b) Host as a device
      • c) Multiple threads
      • d) Command line options, shortloop clause
      • e) Low-level optimization
      • f) Fortran 90 API routines
      • g) C support for multidimensional (float**) arrays

Days 2 and 3 will nominally run from 9:00 am to 5:00 pm each day.

Actual start and end times may vary on the day.

Speaker

Matthew Colgrove, NVIDIA

Date

Monday, June 26, 2017 to Wednesday, June 28, 2017

Time

Mon 8:00 am - 5:00 pm, Tue 10:30 am - 5:00 pm, Wed 9:00 am - 5:00 pm

Location

IACS Seminar Room