Streams
Overview
Tutorial: 10 min
Learn how to use GPU streams with Numba.
Understand how to create and use CUDA streams in Numba.
A stream is a sequence of operations that the GPU executes in the order they were issued. Operations in different streams have no guaranteed order relative to each other and may run concurrently, which lets the GPU overlap independent kernels and data transfers and make better use of its resources. In Numba, you create a stream with cuda.stream() and pass it to kernel launches and memory transfers.
from numba import cuda
import numpy as np

# Define a simple element-wise addition kernel
@cuda.jit
def add_kernel(a, b, c):
    tx = cuda.threadIdx.x   # thread index within the block
    bx = cuda.blockIdx.x    # block index within the grid
    bw = cuda.blockDim.x    # number of threads per block

    pos = tx + bx * bw      # global index of this thread

    if pos < a.size:        # skip threads past the end of the array
        c[pos] = a[pos] + b[pos]

# Create two streams
stream1 = cuda.stream()
stream2 = cuda.stream()

# Initialize data
size = 1000000
a_cpu = np.arange(size, dtype=np.float32)
b_cpu = np.arange(size, dtype=np.float32) * 2

# Allocate device memory; each kernel gets its own output array so the
# two streams never read and write the same buffer at the same time
a_gpu = cuda.to_device(a_cpu)
b_gpu = cuda.to_device(b_cpu)
c_gpu = cuda.device_array(size, dtype=np.float32)
d_gpu = cuda.device_array(size, dtype=np.float32)

# Define block and grid dimensions
threads_per_block = 256
blocks_per_grid = (size + (threads_per_block - 1)) // threads_per_block

# Launch two independent kernels in different streams
add_kernel[blocks_per_grid, threads_per_block, stream1](a_gpu, b_gpu, c_gpu)
add_kernel[blocks_per_grid, threads_per_block, stream2](b_gpu, b_gpu, d_gpu)

# Wait for the streams to complete
stream1.synchronize()
stream2.synchronize()

# Copy results back to host
c_cpu = c_gpu.copy_to_host()
d_cpu = d_gpu.copy_to_host()
Key Points
Streams let independent operations run concurrently on a GPU.
Numba allows you to create and manage CUDA streams for parallel execution.