

## CUDA Technical Training

Volume I: Introduction to CUDA Programming

Prepared and Provided by NVIDIA

Q2 2008

## **Table of Contents**

| Section                               | Slide |
|---------------------------------------|-------|
| Introduction to GPU Computing         | 1     |
| CUDA Programming Model Overview       | 10    |
| CUDA Programming – The Basics         | 25    |
| Performance Optimization              | 56    |
| G8x Hardware                          | 62    |
| Memory Optimizations                  | 67    |
| Execution Configuration Optimizations |       |
| Instruction Optimizations             | 115   |
| CUDA Libraries                        |       |
| CUBLAS                                |       |
| CUFFT                                 | 142   |
| Additional CUDA Topics                | 157   |
| CUDA Texture Functionality            |       |
| CUDA Fortran Interoperability         |       |
| CUDA Event API                        |       |
| Device Management                     | 174   |
| CUDA Graphics Interoperability        | 176   |




















































































































































































































































## Floating Point Characteristics

|                                    | G8x                                | SSE                                            | IBM Altivec                 | Cell SPE                             |
|------------------------------------|------------------------------------|------------------------------------------------|-----------------------------|--------------------------------------|
| Format                             | IEEE 754                           | IEEE 754                                       | IEEE 754                    | IEEE 754                             |
| Rounding modes for FADD and FMUL   | Round to nearest and round to zero | All 4 IEEE: round to nearest, zero, +inf, -inf | Round to nearest only       | Round to zero/truncate only          |
| Denormal handling                  | Flush to zero                      | Supported, 1000's of cycles                    | Supported, 1000's of cycles | Flush to zero                        |
| NaN support                        | Yes                                | Yes                                            | Yes                         | No                                   |
| Overflow and infinity support      | Yes                                | Yes                                            | Yes                         | No infinity, only clamps to max norm |
| Flags                              | No                                 | Yes                                            | Yes                         | Some                                 |
| Square root                        | Software only                      | Hardware                                       | Software only               | Software only                        |
| Division                           | Software only                      | Hardware                                       | Software only               | Software only                        |
| Reciprocal estimate accuracy       | 24 bit                             | 12 bit                                         | 12 bit                      | 12 bit                               |
| Reciprocal sqrt estimate accuracy  | 23 bit                             | 12 bit                                         | 12 bit                      | 12 bit                               |
| log2(x) and 2^x estimates accuracy | 23 bit                             | No                                             | 12 bit                      | No                                   |

© NVIDIA Corporation 2008
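
To make the rounding-mode row concrete, here is a minimal sketch that is not part of the original slides. It assumes a CUDA toolkit that provides the `__fadd_rn` and `__fadd_rz` intrinsics; the kernel and helper names (`rounding_demo`, `run_rounding_demo`) are hypothetical. It adds 1.0f and 1.5 × 2^-24, a sum that falls between two adjacent single-precision values, so the two rounding modes available for FADD give different results.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo kernel: add a and b once per rounding mode.
__global__ void rounding_demo(float a, float b, float *out)
{
    out[0] = __fadd_rn(a, b);  // round to nearest
    out[1] = __fadd_rz(a, b);  // round toward zero (truncate)
}

// Hypothetical host helper: runs the kernel and copies both sums back.
cudaError_t run_rounding_demo(float a, float b, float *rn, float *rz)
{
    float *d_out = NULL;
    float h_out[2] = {0.0f, 0.0f};

    cudaError_t err = cudaMalloc((void **)&d_out, 2 * sizeof(float));
    if (err != cudaSuccess) return err;

    rounding_demo<<<1, 1>>>(a, b, d_out);
    err = cudaGetLastError();  // catches launch failures
    if (err == cudaSuccess) {
        // A blocking cudaMemcpy is ordered after the kernel on the default stream.
        err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    }

    cudaFree(d_out);
    *rn = h_out[0];
    *rz = h_out[1];
    return err;
}

int main(void)
{
    const float a = 1.0f;
    const float b = 1.5f / 16777216.0f;  // 1.5 * 2^-24, i.e. 0.75 ulp of 1.0f
    float rn, rz;

    cudaError_t err = run_rounding_demo(a, b, &rn, &rz);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("round to nearest: %.9g\n", rn);
    printf("round to zero:    %.9g\n", rz);
    return 0;
}
```

Because the exact sum lies 0.75 ulp above 1.0f, round to nearest should return the next representable value above 1.0f, while round to zero should return exactly 1.0f.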






























































































## Pinned memory example

Pinned memory provides a fast PCI-e transfer speed and enables use of streams:

- Allocation needs to be done with cudaMallocHost
- Use new Fortran 2003 features for interoperability with C.

```fortran
use iso_c_binding
! The allocation is performed by C function calls. Define the C pointer as type (C_PTR)
type(C_PTR) :: cptr_A, cptr_B, cptr_C
! Define Fortran arrays as pointer.
real, dimension(:,:), pointer :: A, B, C

! Allocating memory with cudaMallocHost.
! The Fortran arrays, now defined as pointers, are then associated with the C pointers using the
! new interoperability defined in iso_c_binding. This is equivalent to allocate(A(m1,m1))
res = cudaMallocHost ( cptr_A, m1*m1*sizeof(fp_kind) )
call c_f_pointer ( cptr_A, A, (/ m1, m1 /) )

! Use A as usual.
! See example code for cudaMallocHost interface code
```
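
For comparison, the same pinned-allocation pattern can be sketched in CUDA C. This is not the slide's Fortran code or its interface module; the helper name `pinned_roundtrip` and the buffer size are hypothetical. It relies only on CUDA runtime calls (`cudaMallocHost`, `cudaMemcpyAsync`, `cudaStreamSynchronize`, `cudaFreeHost`) and copies a buffer to the device and back through a stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: send n floats to the device and back through a stream,
// using pinned host buffers. Returns the mismatch count, or -1 on a CUDA error.
int pinned_roundtrip(size_t n)
{
    const size_t bytes = n * sizeof(float);
    float *h_src = NULL, *h_dst = NULL, *d_buf = NULL;
    cudaStream_t stream;
    int result = -1;

    if (cudaStreamCreate(&stream) != cudaSuccess) return -1;

    // Pinned (page-locked) host memory comes from cudaMallocHost, not malloc.
    bool ok = cudaMallocHost((void **)&h_src, bytes) == cudaSuccess
           && cudaMallocHost((void **)&h_dst, bytes) == cudaSuccess
           && cudaMalloc((void **)&d_buf, bytes) == cudaSuccess;

    if (ok) {
        for (size_t i = 0; i < n; ++i) h_src[i] = (float)i;

        // Both copies are queued on the stream; the host waits only at the sync.
        ok = cudaMemcpyAsync(d_buf, h_src, bytes, cudaMemcpyHostToDevice, stream) == cudaSuccess
          && cudaMemcpyAsync(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost, stream) == cudaSuccess
          && cudaStreamSynchronize(stream) == cudaSuccess;
    }

    if (ok) {
        result = 0;
        for (size_t i = 0; i < n; ++i)
            if (h_dst[i] != h_src[i]) ++result;
    }

    // Release only what was actually allocated.
    if (h_src) cudaFreeHost(h_src);
    if (h_dst) cudaFreeHost(h_dst);
    if (d_buf) cudaFree(d_buf);
    cudaStreamDestroy(stream);
    return result;
}

int main(void)
{
    int mismatches = pinned_roundtrip(1 << 20);  // 1M floats, 4 MB per buffer
    if (mismatches < 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("mismatches: %d\n", mismatches);
    return 0;
}
```

Pinned buffers are released with `cudaFreeHost` rather than `free`, mirroring the `cudaMallocHost` allocation.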



















## Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

## Trademarks

NVIDIA, the NVIDIA logo, CUDA and Tesla are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

## Copyright

© 2008 NVIDIA Corporation. All rights reserved.

