b78242cc38
Add a port of cpi.c to CUDA. Use a GPU kernel to compute the partial area on each process, then sum the partial areas with a final MPI_Reduce from device memory into CPU memory. This is intended to be used as a smoke test for functioning GPU support.
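A minimal sketch of the approach described above, assuming a CUDA-aware MPI build (one that accepts device pointers as MPI buffers) and the midpoint-rule integration of 4/(1+x^2) over [0,1] that cpi.c performs. The kernel name cpi_kernel, the fixed interval count n = 10000, the single-block launch, and the round-robin rank-to-GPU mapping (rank % device count) are illustrative assumptions, not the committed code.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>
#include <mpi.h>

#define THREADS 256  // block size; must be a power of two for the tree reduction

// Abort every rank if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                        \
    do {                                                                        \
        cudaError_t err_ = (call);                                              \
        if (err_ != cudaSuccess) {                                              \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            MPI_Abort(MPI_COMM_WORLD, 1);                                       \
        }                                                                       \
    } while (0)

__device__ double f(double x) { return 4.0 / (1.0 + x * x); }

// Hypothetical single-block kernel. Like cpi.c, this rank owns the intervals
// i = rank+1, rank+1+nprocs, ...; the threads split those further with a
// stride of THREADS, then the block reduces to one partial area in *mypi.
__global__ void cpi_kernel(int n, int rank, int nprocs, double h, double *mypi)
{
    __shared__ double partial[THREADS];
    double sum = 0.0;
    for (long i = rank + 1 + (long)threadIdx.x * nprocs; i <= n;
         i += (long)THREADS * nprocs) {
        double x = h * ((double)i - 0.5);  // midpoint of interval i
        sum += f(x);
    }
    partial[threadIdx.x] = sum;
    __syncthreads();
    for (int s = THREADS / 2; s > 0; s >>= 1) {  // shared-memory tree reduction
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) *mypi = h * partial[0];
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Assumed device mapping: spread ranks round-robin over the visible GPUs.
    int ndev = 0;
    CUDA_CHECK(cudaGetDeviceCount(&ndev));
    if (ndev == 0) MPI_Abort(MPI_COMM_WORLD, 1);
    CUDA_CHECK(cudaSetDevice(rank % ndev));

    const int n = 10000;  // number of intervals (fixed here for illustration)
    const double h = 1.0 / (double)n;

    double *d_mypi = nullptr;
    CUDA_CHECK(cudaMalloc(&d_mypi, sizeof(double)));
    cpi_kernel<<<1, THREADS>>>(n, rank, nprocs, h, d_mypi);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());  // partial area must be ready before MPI reads it

    // Send buffer is device memory, receive buffer is host memory:
    // this only works with a CUDA-aware MPI library.
    double pi = 0.0;
    MPI_Reduce(d_mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        const double PI25DT = 3.141592653589793238462643;
        std::printf("pi is approximately %.16f, Error is %.16f\n",
                    pi, std::fabs(pi - PI25DT));
    }

    CUDA_CHECK(cudaFree(d_mypi));
    MPI_Finalize();
    return 0;
}
```

Build flags depend on the MPI installation; one common pattern is nvcc -ccbin mpicxx cpi_cuda.cu -o cpi_cuda, then mpirun -np 2 ./cpi_cuda. If the MPI library is not CUDA-aware, the MPI_Reduce call on a device pointer is likely to fail or crash, which is the kind of breakage a smoke test like this is meant to surface.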