Abstract:
Memory fragmentation is a widely studied problem of dynamic memory allocators. It is well known that fragmentation can lead to premature out-of-memory errors and poor cache performance.

With the recent emergence of dynamic memory allocators for SIMD accelerators, memory fragmentation is becoming an increasingly important problem on such architectures. Nevertheless, it has received little attention so far. Memory-bound applications on SIMD architectures such as GPUs can experience an additional slowdown due to less efficient vector load/store instructions.

We propose CompactGpu, an incremental, fully parallel, in-place memory defragmentation system for GPUs. CompactGpu is an extension to the DynaSOAr dynamic memory allocator and defragments the heap in a fully parallel fashion by merging partly occupied memory blocks. We developed several implementation techniques for memory defragmentation that are efficient on SIMD/GPU architectures, such as finding defragmentation block candidates and fast pointer rewriting based on bitmaps.

Benchmarks indicate that our implementation is very fast, with performance gains that typically exceed the compaction overhead. It can also decrease overall memory usage.
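Illustrative Code Sketch (CUDA):
The abstract names two GPU-friendly building blocks: finding defragmentation block candidates and bitmap-based pointer rewriting. The sketch below is not CompactGpu's or DynaSOAr's actual code; BlockInfo, find_candidates, rewrite_pointers, and the 64-slot block layout are assumptions chosen for illustration. It only shows the flavor of the approach: a popcount over a block's allocation bitmap classifies the block by fill level, and the popcount of the bits below an object's slot yields its rank among the occupied slots, which a compaction scheme can use to compute the object's new location without per-object forwarding pointers.

#include <cstdio>

// Hypothetical per-block metadata: a 64-slot block with an allocation bitmap.
// (Illustrative only; not the actual DynaSOAr/CompactGpu data structures.)
struct BlockInfo {
  unsigned long long alloc_bitmap;  // bit i set => slot i is occupied
};

// Step 1: find defragmentation candidates. One thread inspects one block and
// marks it as a candidate if it is non-empty but at most `threshold` slots full.
__global__ void find_candidates(const BlockInfo* blocks, int num_blocks,
                                int threshold, int* is_candidate) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < num_blocks) {
    int used = __popcll(blocks[tid].alloc_bitmap);  // number of occupied slots
    is_candidate[tid] = (used > 0 && used <= threshold) ? 1 : 0;
  }
}

// Step 2 (one piece of it): bitmap-based pointer rewriting. The rank of an
// object among the occupied slots of its source block can be recomputed from
// the source bitmap alone: it is the popcount of all bits below the object's
// old slot. One thread rewrites one pointer.
__global__ void rewrite_pointers(const BlockInfo* source_blocks,
                                 const int* src_block_of_ptr,
                                 const int* old_slot_of_ptr, int num_ptrs,
                                 int* new_rank_of_ptr) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < num_ptrs) {
    unsigned long long bm = source_blocks[src_block_of_ptr[tid]].alloc_bitmap;
    int slot = old_slot_of_ptr[tid];                      // 0..63
    unsigned long long below = bm & ((1ull << slot) - 1); // bits below the slot
    new_rank_of_ptr[tid] = __popcll(below);               // rank among occupied slots
  }
}

int main() {
  // Tiny demo: one block with 7 of 64 slots occupied, one pointer into slot 5.
  BlockInfo h_block = {0x0000000000000F31ull};  // slots 0,4,5,8,9,10,11 occupied
  int h_src = 0, h_slot = 5;

  BlockInfo* d_block;
  int *d_cand, *d_src, *d_slot, *d_rank;
  cudaMalloc(&d_block, sizeof(BlockInfo));
  cudaMalloc(&d_cand, sizeof(int));
  cudaMalloc(&d_src, sizeof(int));
  cudaMalloc(&d_slot, sizeof(int));
  cudaMalloc(&d_rank, sizeof(int));
  cudaMemcpy(d_block, &h_block, sizeof(BlockInfo), cudaMemcpyHostToDevice);
  cudaMemcpy(d_src, &h_src, sizeof(int), cudaMemcpyHostToDevice);
  cudaMemcpy(d_slot, &h_slot, sizeof(int), cudaMemcpyHostToDevice);

  find_candidates<<<1, 32>>>(d_block, 1, /*threshold=*/32, d_cand);
  rewrite_pointers<<<1, 32>>>(d_block, d_src, d_slot, 1, d_rank);

  int h_cand, h_rank;
  cudaMemcpy(&h_cand, d_cand, sizeof(int), cudaMemcpyDeviceToHost);
  cudaMemcpy(&h_rank, d_rank, sizeof(int), cudaMemcpyDeviceToHost);
  printf("candidate=%d new_rank=%d\n", h_cand, h_rank);  // expect: candidate=1 new_rank=2
  return 0;
}

Both kernels are embarrassingly parallel and operate on one machine word per block, which is the kind of bitmap-based bookkeeping that maps well onto SIMD hardware.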
Reference:
Massively Parallel GPU Memory Compaction (Matthias Springer and Hidehiko Masuhara), In Proceedings of the ACM SIGPLAN International Symposium on Memory Management (ISMM 2019) (Harry Xu, ed.), 2019.
Bibtex Entry:
@inproceedings{springer2019ismm,
author = {Matthias Springer and Hidehiko Masuhara},
title = {Massively Parallel {GPU} Memory Compaction},
booktitle = {Proceedings of the ACM SIGPLAN International Symposium on Memory Management (ISMM 2019)},
year = 2019,
editor = {Harry Xu},
location = {Phoenix, AZ},
isbn = {978-1-4503-6722-6},
pages = {14--26},
numpages = {13},
url = {http://doi.acm.org/10.1145/3315573.3329979},
doi = {10.1145/3315573.3329979},
keywords = {GPUs, dynamic allocation, fragmentation, CUDA, Ikra, DynaSOAr, C++},
month = jun,
date = {2019-06-23},
organization = {{ACM}},
pdf = {ismm2019.pdf},
abstract = {Memory fragmentation is a widely studied problem of dynamic memory allocators. It is well known that fragmentation can lead to premature out-of-memory errors and poor cache performance.
With the recent emergence of dynamic memory allocators for SIMD accelerators, memory fragmentation is becoming an increasingly important problem on such architectures. Nevertheless, it has received little attention so far. Memory-bound applications on SIMD architectures such as GPUs can experience an additional slowdown due to less efficient vector load/store instructions.
We propose CompactGpu, an incremental, fully parallel, in-place memory defragmentation system for GPUs. CompactGpu is an extension to the DynaSOAr dynamic memory allocator and defragments the heap in a fully parallel fashion by merging partly occupied memory blocks. We developed several implementation techniques for memory defragmentation that are efficient on SIMD/GPU architectures, such as finding defragmentation block candidates and fast pointer rewriting based on bitmaps.
Benchmarks indicate that our implementation is very fast, with performance gains that typically exceed the compaction overhead. It can also decrease overall memory usage.}
}