Optimization of a Parallel CFD Code and Its Performance Evaluation on Tianhe-1A

Authors

  • Yonggang Che, Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, Hunan
  • Lilun Zhang, Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, Hunan
  • Chuanfu Xu, Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, Hunan
  • Yongxian Wang, Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, Hunan
  • Wei Liu, Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, Hunan
  • Zhenghua Wang, Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, Hunan

Keywords:

Computational fluid dynamics, slightly compressible model, large-scale parallel computing, uniprocessor optimizations, in-memory grid exchange, scalability, efficiency

Abstract

This paper describes our experience tuning a parallel CFD code to improve its performance and flexibility on large-scale parallel computers. The code solves the incompressible Navier-Stokes equations on three-dimensional structured grids, based on the novel Slightly Compressible Model. High-level loop transformations and argument-based code specialization are used to optimize its uniprocessor performance. Static arrays are converted into dynamically allocated arrays to improve flexibility. The grid generator is coupled with the flow solver so that they can exchange grid data in memory. A detailed performance evaluation is performed. The results show that our uniprocessor optimizations speed up the flow solver by a factor of 1.38 to 3.93 on the Tianhe-1A supercomputer. The in-memory grid data exchange reduces application startup time by nearly two orders of magnitude. The optimized code exhibits excellent parallel scalability on realistic test cases. On 4096 CPU cores, it achieves a strong-scaling parallel efficiency of 77.39% and a maximum performance of 4.01 Tflops.

Published

2015-02-10

How to Cite

Che, Y., Zhang, L., Xu, C., Wang, Y., Liu, W., & Wang, Z. (2015). Optimization of a Parallel CFD Code and Its Performance Evaluation on Tianhe-1A. Computing and Informatics, 33(6), 1377–1399. Retrieved from http://147.213.75.17/ojs/index.php/cai/article/view/1393