CUDA Streams Usage
This applies to the nvptx plugin only.
The library provides elements that perform asynchronous movement of data and asynchronous operation of computing constructs. This asynchronous functionality is implemented by making use of CUDA streams [1].
The primary means by which the asynchronous functionality is accessed is through the use of those OpenACC directives which make use of the async and wait clauses. When the async clause is first used with a directive, it creates a CUDA stream. If an async-argument is used with the async clause, then the stream is associated with the specified async-argument.
Following the creation of an association between a CUDA stream and the async-argument of an async clause, both the wait clause and the wait directive can be used. When either the clause or the directive is used after stream creation, it creates a rendezvous point whereby execution waits until all operations associated with the async-argument, that is, with the stream, have completed.
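As a minimal sketch of the behaviour described above, the following C fragment launches two independent loops with different async-arguments and then waits on each of them. The function name scale_and_offset and the queue numbers 1 and 2 are illustrative assumptions, not part of the library. If the file is compiled without OpenACC support, the pragmas are ignored and the loops simply run one after the other on the host.

```c
#include <stddef.h>

/* Hypothetical helper: doubles a[] and increments b[] using two
   separate async-arguments.  */
void scale_and_offset(float *a, float *b, int n)
{
    /* First use of async(1): with the nvptx plugin, a CUDA stream is
       created and associated with async-argument 1.  */
    #pragma acc parallel loop async(1) copy(a[0:n])
    for (int i = 0; i < n; i++)
        a[i] *= 2.0f;

    /* A different async-argument, so this work is queued separately
       and may overlap with the loop above.  */
    #pragma acc parallel loop async(2) copy(b[0:n])
    for (int i = 0; i < n; i++)
        b[i] += 1.0f;

    /* Rendezvous points: execution waits here until all operations
       associated with async-arguments 1 and 2 have completed.  */
    #pragma acc wait(1)
    #pragma acc wait(2)
}
```

After the function returns, both arrays hold their updated values on the host, because the wait directives cover the asynchronous copy-out as well as the computation.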
Normally, the management of the streams that are created as a result of using the async clause is done without any intervention by the caller. This implies that the association between the async-argument and the CUDA stream will be maintained for the lifetime of the program. However, this association can be changed through the use of the library function acc_set_cuda_stream. When acc_set_cuda_stream is called, the CUDA stream that was originally associated with the async clause will be destroyed. Caution should be taken when changing the association, as subsequent references to the async-argument refer to a different CUDA stream.
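The following sketch shows one way the association might be changed. It assumes the prototype int acc_set_cuda_stream(int async, void *stream) from openacc.h, the CUDA driver API call cuStreamCreate from cuda.h, and that the caller has already initialized the device (for example with acc_init(acc_device_nvidia)) so that a CUDA context is current. The function name, the queue number 1, and the USE_CUDA_STREAM opt-in macro are hypothetical. Without OpenACC and that macro, the stream handling is compiled out and the loop runs on the host.

```c
#include <stddef.h>

/* USE_CUDA_STREAM is a hypothetical opt-in macro for this sketch.  */
#if defined(_OPENACC) && defined(USE_CUDA_STREAM)
#include <openacc.h>
#include <cuda.h>
#endif

/* Hypothetical helper: increments every element of v using
   async-argument 1, after optionally rebinding that async-argument
   to a stream created by the caller.  */
void increment_on_rebound_queue(int *v, int n)
{
#if defined(_OPENACC) && defined(USE_CUDA_STREAM)
    CUstream stream;

    /* Assumption: the device is already initialized, so a CUDA
       context is current on this thread.  */
    if (cuStreamCreate(&stream, CU_STREAM_DEFAULT) == CUDA_SUCCESS) {
        /* From here on, async(1) refers to `stream`.  Any stream the
           library previously associated with 1 is destroyed.  */
        acc_set_cuda_stream(1, stream);
    }
#endif

    /* Work queued with async(1) now runs on the rebound stream.  */
    #pragma acc parallel loop async(1) copy(v[0:n])
    for (int i = 0; i < n; i++)
        v[i] += 1;

    /* Waits on whichever stream is currently associated with 1.  */
    #pragma acc wait(1)
}
```

Rebinding before any work is queued with the async-argument, as done here, avoids the situation the caution above warns about, where earlier and later references to the same async-argument end up on different streams.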