The smallest computational unit in CUDA is a thread, which runs on a scalar processor. Each thread must therefore be associated with one processor in the AGM. Threads are grouped into a block, a computing unit that executes on its multiprocessor independently of other blocks. Because the AGM was developed on the basis of real graphics multiprocessors, this computing unit must also be associated with the AGM. When developing a parallel computing kernel in CUDA, only the processing of a single block is considered, because one block cannot change the computation data of another.
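
The thread and block hierarchy described above can be illustrated with a small host-side sketch. This is plain C++, not CUDA runtime code, and it is not the AGM itself: `LaunchConfig`, `thread_body`, `run_block`, and `launch` are hypothetical names introduced only for illustration. The sketch assumes that each block owns a private scratch buffer (playing the role of CUDA `__shared__` memory) and that the threads of a block may be emulated by a serial loop, whereas on a real GPU they run in parallel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a kernel launch; not part of the CUDA API.
struct LaunchConfig {
    std::size_t num_blocks;         // plays the role of gridDim.x
    std::size_t threads_per_block;  // plays the role of blockDim.x
};

// Work of one "thread": square the single element it owns and store the
// result in its block's scratch buffer. The global index mirrors the usual
// CUDA expression blockIdx.x * blockDim.x + threadIdx.x.
inline void thread_body(std::size_t block_idx, std::size_t thread_idx,
                        const LaunchConfig& cfg,
                        const std::vector<int>& input,
                        std::vector<int>& block_scratch) {
    const std::size_t global_idx = block_idx * cfg.threads_per_block + thread_idx;
    // Threads that fall past the end of the input contribute zero.
    block_scratch[thread_idx] =
        (global_idx < input.size()) ? input[global_idx] * input[global_idx] : 0;
}

// Work of one "block": run all of its threads, then reduce the scratch buffer.
// The buffer is local to this call, so no other block can read or modify it.
inline int run_block(std::size_t block_idx, const LaunchConfig& cfg,
                     const std::vector<int>& input) {
    std::vector<int> scratch(cfg.threads_per_block, 0);
    for (std::size_t t = 0; t < cfg.threads_per_block; ++t) {
        thread_body(block_idx, t, cfg, input, scratch);
    }
    int sum = 0;
    for (int v : scratch) {
        sum += v;
    }
    return sum;
}

// Run every block and collect one partial result per block. The blocks can be
// visited in forward or reverse order; because they are independent, the
// partial results do not depend on that order.
inline std::vector<int> launch(const LaunchConfig& cfg,
                               const std::vector<int>& input,
                               bool reverse_order = false) {
    std::vector<int> partial(cfg.num_blocks, 0);
    for (std::size_t i = 0; i < cfg.num_blocks; ++i) {
        const std::size_t b = reverse_order ? cfg.num_blocks - 1 - i : i;
        partial[b] = run_block(b, cfg, input);
    }
    return partial;
}
```

Calling `launch(LaunchConfig{2, 3}, {1, 2, 3, 4, 5})` gives one sum of squares per block, and passing `true` as the third argument yields the same values. This is the property that lets a kernel be written with only a single block in mind.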