CUDA Streams Usage#

This applies to the nvptx plugin only.

The library provides elements that perform asynchronous movement of data and asynchronous operation of computing constructs. This asynchronous functionality is implemented by making use of CUDA streams [1].

The primary means by that the asynchronous functionality is accessed is through the use of those OpenACC directives which make use of the async and wait clauses. When the async clause is first used with a directive, it creates a CUDA stream. If an async-argument is used with the async clause, then the stream is associated with the specified async-argument.

Following the creation of an association between a CUDA stream and the async-argument of an async clause, both the wait clause and the wait directive can be used. When either the clause or directive is used after stream creation, it creates a rendezvous point whereby execution waits until all operations associated with the async-argument, that is, stream, have completed.

Normally, the management of the streams that are created as a result of using the async clause, is done without any intervention by the caller. This implies the association between the async-argument and the CUDA stream will be maintained for the lifetime of the program. However, this association can be changed through the use of the library function acc_set_cuda_stream. When the function acc_set_cuda_stream is called, the CUDA stream that was originally associated with the async clause will be destroyed. Caution should be taken when changing the association as subsequent references to the async-argument refer to a different CUDA stream.