GPU Parallelism in HPC

Overview

Tutorial: 30 min

Learn how to target GPUs using Numba.

Understand the difference between kernel functions and device functions.

We will use Numba's GPU programming support to accelerate our code.

Kernel Function

A kernel function is a GPU function that is called from CPU code and cannot return a value directly; any result must be written to an array passed as an argument. It also defines how the GPU thread hierarchy (threads, blocks, and grids) is used.

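import math
from numba import cuda

@cuda.jit
def polar_to_cartesian(rho, theta, x, y):
    i = cuda.grid(1)  # global index of this thread
    if i < rho.shape[0]:  # kernels cannot return, so write into x and y
        x[i] = rho[i] * math.cos(theta[i])
        y[i] = rho[i] * math.sin(theta[i])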
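
To make the "cannot return values" rule concrete, the following is a minimal CPU-only sketch of what a one-dimensional kernel launch does: every (block, thread) pair runs the same function body, and results leave the function only through output arrays. The names `polar_kernel` and `launch_1d` are hypothetical and exist only for this illustration; this is an emulation in plain Python, not how Numba executes kernels.

```python
import math

def polar_kernel(i, rho, theta, x, y):
    """Body run by one emulated thread whose global index is i."""
    if i < len(rho):  # guard: the grid may hold more threads than elements
        x[i] = rho[i] * math.cos(theta[i])
        y[i] = rho[i] * math.sin(theta[i])
    # No return statement: output goes through x and y.

def launch_1d(kernel, blocks_per_grid, threads_per_block, *args):
    """Hypothetical CPU stand-in for kernel[blocks, threads](*args)."""
    for block_idx in range(blocks_per_grid):
        for thread_idx in range(threads_per_block):
            # Same formula as cuda.grid(1):
            # threadIdx.x + blockIdx.x * blockDim.x
            i = thread_idx + block_idx * threads_per_block
            kernel(i, *args)

rho = [1.0, 2.0, 3.0, 4.0, 5.0]
theta = [0.0, math.pi / 2, math.pi, 0.0, math.pi / 2]
x = [0.0] * len(rho)
y = [0.0] * len(rho)

threads_per_block = 4
# Round up so every element gets a thread: 2 blocks, 8 threads for 5 elements
blocks_per_grid = math.ceil(len(rho) / threads_per_block)
launch_1d(polar_kernel, blocks_per_grid, threads_per_block, rho, theta, x, y)
```

On a real GPU the loop iterations run concurrently as threads, which is why each thread must work on its own index and why the bounds check is needed for the surplus threads in the last block.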

Device Functions

Device functions perform computations on the GPU and can only be invoked from within kernels or other device functions. Unlike a kernel function, a device function can return a value, like a normal function.

@cuda.jit(device=True)
def polar_to_cartesian(rho, theta):
    x = rho * math.cos(theta)
    y = rho * math.sin(theta)
    return x, y

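@vectorize can also target the GPU. With target='cuda', Numba compiles the function into a ufunc that is applied element-wise on the GPU, and the function can call device functions.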

import math
from numba import cuda, vectorize

@cuda.jit(device=True)
def polar_to_cartesian(rho, theta):
    x = rho * math.cos(theta)
    y = rho * math.sin(theta)
    return x, y

@vectorize(['float32(float32, float32, float32, float32)'], target='cuda')
def polar_distance(rho1, theta1, rho2, theta2):
    x1, y1 = polar_to_cartesian(rho1, theta1)
    x2, y2 = polar_to_cartesian(rho2, theta2)

    return ((x1 - x2)**2 + (y1 - y2)**2)**0.5
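
When checking the output of a GPU ufunc like the one above, a plain-Python reference with the same arithmetic can help. The sketch below is such a reference, assuming scalar inputs; the `_cpu` names are hypothetical and are not part of Numba.

```python
import math

def polar_to_cartesian_cpu(rho, theta):
    """CPU reference: convert polar coordinates to Cartesian."""
    x = rho * math.cos(theta)
    y = rho * math.sin(theta)
    return x, y

def polar_distance_cpu(rho1, theta1, rho2, theta2):
    """CPU reference: Euclidean distance between two polar points."""
    x1, y1 = polar_to_cartesian_cpu(rho1, theta1)
    x2, y2 = polar_to_cartesian_cpu(rho2, theta2)
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

# (rho=1, theta=0) and (rho=1, theta=pi) lie at opposite ends of a
# diameter of the unit circle, so their distance is close to 2.
d_diameter = polar_distance_cpu(1.0, 0.0, 1.0, math.pi)

# (3, 0) and (4, pi/2) form the legs of a 3-4-5 right triangle.
d_triangle = polar_distance_cpu(3.0, 0.0, 4.0, math.pi / 2)
```

Because the GPU version is declared with float32 inputs and output, its results should agree with this float64 reference only up to single-precision rounding, so compare with a tolerance rather than for exact equality.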

Thread Indexing

When launching a kernel, you must also specify the thread arrangement: the number of blocks per grid and the number of threads per block.

@cuda.jit
def increment_a_2D_array(an_array):
    x, y = cuda.grid(2)
    if x < an_array.shape[0] and y < an_array.shape[1]:
        an_array[x, y] += 1

threadsperblock = (16, 16)
blockspergrid_x = math.ceil(an_array.shape[0] / threadsperblock[0])
blockspergrid_y = math.ceil(an_array.shape[1] / threadsperblock[1])
blockspergrid = (blockspergrid_x, blockspergrid_y)
increment_a_2D_array[blockspergrid, threadsperblock](an_array)
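
The block-count arithmetic above can be checked on the CPU. The sketch below assumes an example array shape of (100, 50), which is not from the original, and uses a hypothetical helper `blocks_per_grid`. It then walks through every (block, thread) pair, computes the same indices that cuda.grid(2) would give, and counts how many pass the bounds check.

```python
import math

def blocks_per_grid(shape, threads_per_block):
    """Round up in each dimension so the grid covers the whole array."""
    return tuple(math.ceil(n / t) for n, t in zip(shape, threads_per_block))

shape = (100, 50)          # assumed example array shape
threadsperblock = (16, 16)
blockspergrid = blocks_per_grid(shape, threadsperblock)

total_threads = 0
in_bounds = 0
for bx in range(blockspergrid[0]):
    for by in range(blockspergrid[1]):
        for tx in range(threadsperblock[0]):
            for ty in range(threadsperblock[1]):
                # Emulates x, y = cuda.grid(2)
                x = tx + bx * threadsperblock[0]
                y = ty + by * threadsperblock[1]
                total_threads += 1
                if x < shape[0] and y < shape[1]:
                    in_bounds += 1

print(blockspergrid, total_threads, in_bounds)
```

The grid is launched with more threads than there are array elements because the shape is not a multiple of the block size. The surplus threads are the reason the kernel needs the `if x < an_array.shape[0] and y < an_array.shape[1]` guard.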

You can learn more about thread indexing in the tutorial Introduction to Parallel Programming Using Python.

Key Points

@vectorize can target GPUs.

Device functions can only be invoked from other device functions or from kernel functions.