GPU Parallelism in HPC
Overview
Tutorial: 30 min
Learn how to target GPUs using Numba.
Understand the difference between kernel functions and device functions.
We will use Numba's GPU programming support to accelerate our code.
Kernel Function
A kernel function is a GPU function that is called from CPU code. It cannot return a value directly, so any result must be written to an array passed in as an argument. A kernel launch also defines how the GPU thread hierarchy (threads, blocks, and grids) is used.
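import math
from numba import cuda

@cuda.jit
def polar_to_cartesian(rho, theta, out):
    # A kernel cannot return x and y, so it writes them to an output array.
    out[0] = rho * math.cos(theta)
    out[1] = rho * math.sin(theta)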
Device Functions
Device functions perform computations on the GPU and can be invoked only from within kernels or other device functions. Unlike a kernel function, a device function can return a value like a normal function.
import math
from numba import cuda

@cuda.jit(device=True)
def polar_to_cartesian(rho, theta):
    x = rho * math.cos(theta)
    y = rho * math.sin(theta)
    return x, y  # device functions may return values
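@vectorize can also target the GPU when it is given target='cuda'. The resulting function is applied element by element and can call device functions.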
import math
from numba import cuda, vectorize

@cuda.jit(device=True)
def polar_to_cartesian(rho, theta):
    x = rho * math.cos(theta)
    y = rho * math.sin(theta)
    return x, y

# Compile an element-wise function that runs on the GPU.
@vectorize(['float32(float32, float32, float32, float32)'], target='cuda')
def polar_distance(rho1, theta1, rho2, theta2):
    x1, y1 = polar_to_cartesian(rho1, theta1)
    x2, y2 = polar_to_cartesian(rho2, theta2)

    return ((x1 - x2)**2 + (y1 - y2)**2)**0.5
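When trying out a GPU function like polar_distance, it can help to have a plain-Python version of the same arithmetic to compare a few results against. The sketch below is not Numba code and does not run on the GPU; the names polar_to_cartesian_reference and polar_distance_reference are hypothetical helpers introduced here for illustration only.

```python
import math

def polar_to_cartesian_reference(rho, theta):
    """CPU-only version of the polar-to-Cartesian conversion."""
    return rho * math.cos(theta), rho * math.sin(theta)

def polar_distance_reference(rho1, theta1, rho2, theta2):
    """CPU-only Euclidean distance between two points given in polar form."""
    x1, y1 = polar_to_cartesian_reference(rho1, theta1)
    x2, y2 = polar_to_cartesian_reference(rho2, theta2)
    return ((x1 - x2)**2 + (y1 - y2)**2)**0.5

# Two points on the unit circle, a quarter turn apart.
d = polar_distance_reference(1.0, 0.0, 1.0, math.pi / 2)
print(round(d, 6))  # approximately 1.414214 (the square root of 2)
```

Because the GPU version is declared with float32 inputs and output, its results should be expected to match this double-precision reference only within a small tolerance, not exactly.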
Thread Indexing
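When launching a kernel, you must also specify the thread arrangement: the number of blocks per grid and the number of threads per block. Both are given in square brackets between the kernel name and its arguments.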
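import math
import numpy as np
from numba import cuda

@cuda.jit
def increment_a_2D_array(an_array):
    x, y = cuda.grid(2)  # absolute 2D position of this thread in the grid
    # The grid may be larger than the array, so skip out-of-range threads.
    if x < an_array.shape[0] and y < an_array.shape[1]:
        an_array[x, y] += 1

an_array = np.zeros((1000, 1000))  # example input; the shape is arbitrary

threadsperblock = (16, 16)
# Round up so that the blocks cover the whole array along each axis.
blockspergrid_x = math.ceil(an_array.shape[0] / threadsperblock[0])
blockspergrid_y = math.ceil(an_array.shape[1] / threadsperblock[1])
blockspergrid = (blockspergrid_x, blockspergrid_y)
increment_a_2D_array[blockspergrid, threadsperblock](an_array)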
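The launch configuration above rounds the number of blocks up, so the grid usually contains more threads than the array has elements, which is why the kernel checks its indices before writing. The following is a minimal pure-Python sketch of that arithmetic, with no GPU involved. The helper names blocks_needed and global_index and the 100 x 40 shape are assumptions made for illustration; global_index mirrors, for one axis, the block index times block size plus thread index calculation that cuda.grid performs.

```python
import math

def blocks_needed(n_items, threads_per_block):
    """Smallest number of blocks whose threads cover n_items elements."""
    return math.ceil(n_items / threads_per_block)

def global_index(block_idx, block_dim, thread_idx):
    """Absolute position of a thread along one axis of the grid."""
    return block_idx * block_dim + thread_idx

# Hypothetical example: a 100 x 40 array with 16 x 16 threads per block.
shape = (100, 40)
threadsperblock = (16, 16)
blockspergrid = tuple(
    blocks_needed(n, t) for n, t in zip(shape, threadsperblock)
)

# Enumerate every thread the launch would create and count the ones
# that pass the bounds check used in the kernel.
launched = 0
in_bounds = 0
for bx in range(blockspergrid[0]):
    for by in range(blockspergrid[1]):
        for tx in range(threadsperblock[0]):
            for ty in range(threadsperblock[1]):
                x = global_index(bx, threadsperblock[0], tx)
                y = global_index(by, threadsperblock[1], ty)
                launched += 1
                if x < shape[0] and y < shape[1]:
                    in_bounds += 1

print(blockspergrid)  # (7, 3)
print(launched)       # 5376 threads launched
print(in_bounds)      # 4000 threads map to a real array element
```

In this example 1376 of the launched threads fall outside the array. Without the bounds check in the kernel, those threads would try to access elements that do not exist.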
You can learn more about thread indexing in the tutorial Introduction to Parallel Programming Using Python.
Key Points
@vectorize can target GPUs.
Device functions can only be invoked from other device functions or from kernel functions.