MengJia Yin
Wuhan University

Published : 2 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 2 Documents
Search

Optimized Merge Sort on Modern Commodity Multi-core CPUs Ming Xu; Xianbin Xu; MengJia Yin; Fang Zheng
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 14, No 1: March 2016
Publisher : Universitas Ahmad Dahlan

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.12928/telkomnika.v14i1.2741

Abstract

Sorting is a kind of widely used basic algorithms. As the high performance computing devices are increasingly common, more and more modern commodity machines have the capability of parallel concurrent computing. A new implementation of sorting algorithms is proposed to harness the power of newer SIMD operations and multi-core computing provided by modern CPUs. The algorithm is hybrid by optimized bitonic sorting network and multi-way merge. New SIMD instructions provided by modern CPUs are used in the bitonic network implementation, which adopted a different method to arrange data so that the number of SIMD operations is reduced. Balanced binary trees are used in multi-way merge, which is also different with former implementations. Efforts are also paid on minimizing data moving in memory since merge sort is a kind of memory-bound application. The performance evaluation shows that the proposed algorithm is twice as fast as the sort function in C++ standard library when only single thread is used. It also outperforms radix sort implemented in Boost library.
A Hybrid Sorting Algorithm on Heterogeneous Architectures Ming Xu; Xianbin Xu; Fang Zheng; Yuanhua Yang; Mengjia Yin
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 13, No 4: December 2015
Publisher : Universitas Ahmad Dahlan

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.12928/telkomnika.v13i4.1896

Abstract

Nowadays high performance computing devices are more common than ever before. The capacity of main memories becomes very huge, CPUs get more cores and computing units that have greater performance. There are more and more machines get accelerators such as GPUs, too. Take full advantages of modern machines that use heterogeneous architectures to get higher performance solutions is a real challenge. There are so much literatures on only use CPUs or GPUs, however, research on algorithms that utilize heterogeneous architectures is comparatively few. In this paper, we propose a novel hybrid sorting algorithm that let CPU cooperate with GPU. To fully utilize computing capability of both CPU and GPU, we used SIMD intrinsic instructions to implement sorting kernels that run on CPU, and adopted radix sort kernels that implemented by CUDA(Compute Unified Device Architecture) that run on GPU. Performance evaluation is promising that our algorithm can sort one billion 32-bit float data in no more than 5 seconds.