7. Launch kernels that do real work
Write kernels that run across thousands of threads, then drive them from the host: copy input data to the device, launch the kernel, and copy the results back. You will build vector addition and simple array transforms while learning how to choose a launch configuration and how to check for errors.
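The following is a minimal sketch of the host/device workflow described above. It assumes the CUDA runtime API and an `nvcc` toolchain. The `CUDA_CHECK` macro, the `scale` kernel, the `run_vector_add` helper, and the 256-thread block size are illustrative choices, not part of the course material. The sketch launches a one-thread-per-element vector addition, then a simple in-place transform written as a grid-stride loop. It checks errors both at launch time and after the kernels finish.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// One thread per element: c[i] = a[i] + b[i].
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the array
        c[i] = a[i] + b[i];
    }
}

// Simple in-place array transform using a grid-stride loop, so any grid size covers n.
__global__ void scale(float* x, float factor, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x) {
        x[i] *= factor;
    }
}

// Hypothetical helper: computes c = 2 * (a + b) on the GPU, returns the number of wrong elements.
int run_vector_add(int n) {
    std::vector<float> h_a(n), h_b(n), h_c(n);
    for (int i = 0; i < n; ++i) {
        h_a[i] = static_cast<float>(i);
        h_b[i] = 2.0f * i;
    }

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    CUDA_CHECK(cudaMalloc(&d_c, bytes));

    // Host -> device copies of the inputs.
    CUDA_CHECK(cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice));

    // Launch configuration: 256 threads per block, enough blocks to cover n (ceiling division).
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    CUDA_CHECK(cudaGetLastError());  // reports launch errors such as an invalid configuration

    scale<<<blocks, threads>>>(d_c, 2.0f, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernels ran

    // Device -> host copy of the result (a blocking copy on the default stream).
    CUDA_CHECK(cudaMemcpy(h_c.data(), d_c, bytes, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaFree(d_c));

    // Verify on the host; every value here is exactly representable as a float.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        float expected = 2.0f * (h_a[i] + h_b[i]);
        if (h_c[i] != expected) ++mismatches;
    }
    return mismatches;
}

int main() {
    int n = 1 << 20;  // 1,048,576 elements -> 4096 blocks of 256 threads
    int bad = run_vector_add(n);
    std::printf("%s (%d mismatches)\n", bad == 0 ? "OK" : "FAILED", bad);
    return bad == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

To build it, you might save the file as `vector_add.cu` and run `nvcc vector_add.cu -o vector_add`. The `if (i < n)` guard in `vector_add` is needed because rounding the block count up usually launches a few more threads than there are elements. The grid-stride loop in `scale` handles that case implicitly. Checking `cudaGetLastError()` right after a launch catches configuration errors. `cudaDeviceSynchronize()` catches faults that only appear once the kernel actually runs.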