Week One: Introduction to Heterogeneous Computing, Overview of CUDA C, and
Kernel-Based Parallel Programming, with a lab tour and a programming assignment
on vector addition in CUDA C.
Week Two: Memory Model for Locality, Tiling for Conserving Memory Bandwidth,
Handling Boundary Conditions, and Performance Considerations, with a programming
assignment on simple matrix-matrix multiplication in CUDA C.
Week Three: Parallel Convolution Pattern, with a programming assignment on
tiled matrix-matrix multiplication in CUDA C.
Week Four: Parallel Scan Pattern, with a programming assignment on parallel
convolution in CUDA C.
Week Five: Parallel Histogram Pattern and Atomic Operations, with a programming
assignment on parallel scan in CUDA C.
Week Six: Data Transfer and Task Parallelism, with a programming assignment on
parallel histogram in CUDA C.
Week Seven: Introduction to OpenCL, Introduction to C++ AMP, and Introduction to
OpenACC, with a programming assignment on vector addition using streams in
CUDA C.
Week Eight: Course Summary and Other Related Programming Models (Thrust, Bolt,
and CUDA Fortran), with a programming assignment on simple matrix-matrix
multiplication in a choice of OpenCL, C++ AMP, or OpenACC.
Week Nine: Complete any remaining lab assignments, with optional bonus
programming assignments in a choice of OpenCL, C++ AMP, or OpenACC.