Fine-grained Thread Execution Mechanism for GPGPU
While GPUs are successful at computing homogeneously parallel problems, in which every thread performs the same computation on different data (such as matrix calculation and image processing), there have been few attempts to apply GPUs to heterogeneously parallel problems. This project aims to develop fine-grained, nested thread execution mechanisms for GPGPU so that parallel computations can be written easily by forking threads from within GPU cores. With such mechanisms, a wider variety of programs could be accelerated by GPUs.
As a first step, we propose a thread execution model based on DynaSOAr, a highly efficient parallel object allocator for GPGPU, and investigate its performance and overheads, as well as optimizations and compilation techniques for it.
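The details of the proposed model are not given here. As a rough illustration only, the sketch below assumes that each fine-grained thread is represented as an allocated object (in the spirit of DynaSOAr allocating objects on the device). It also assumes that execution proceeds in rounds: every live thread object runs once per round and may fork children by allocating new objects for the next round. `Task`, `run_rounds`, and `grain` are hypothetical names. This is a sequential Python simulation of that scheduling idea, not DynaSOAr's API or this project's implementation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical fine-grained 'thread object': sums data[lo:hi]."""
    lo: int
    hi: int


def run_rounds(data, grain=64):
    """Sequentially simulate an assumed round-based, object-per-thread model.

    In each round every live Task runs once. A task whose range exceeds
    `grain` forks two child Tasks (allocated for the next round); otherwise
    it does its share of the work directly. In this assumed model, a GPU
    would run each round as one parallel launch over all live Task objects,
    and updates to `total` and to the next-round pool would need atomics.
    Returns (total, number_of_rounds).
    """
    grain = max(grain, 1)  # grain 0 would keep splitting size-1 tasks forever
    live = [Task(0, len(data))]
    total = 0
    rounds = 0
    while live:
        next_round = []
        for t in live:
            if t.hi - t.lo > grain:
                mid = t.lo + (t.hi - t.lo) // 2
                next_round.append(Task(t.lo, mid))  # fork left child
                next_round.append(Task(mid, t.hi))  # fork right child
            else:
                total += sum(data[t.lo:t.hi])  # leaf task: do the work
        live = next_round  # finished tasks are freed; children become live
        rounds += 1
    return total, rounds


total, rounds = run_rounds(list(range(1, 1001)), grain=64)
print(total, rounds)  # 500500 5
```

Forking here means allocating an object rather than launching a kernel. The number of rounds therefore grows with the depth of the fork tree (five rounds for 1000 elements with a grain of 64), while the width of each round is the number of live objects.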