EXECUTION REPLAY AND DEBUGGING OF DISTRIBUTED MULTI-THREADED PARALLEL PROGRAMS
Abstract
Clusters of shared-memory symmetric multiprocessors are increasingly used for high-performance computing. To conveniently exploit both the parallelism within nodes and the parallelism between nodes, programming models for communicating threads are being developed. However, most of these models result in programs that exhibit non-deterministic behavior. This makes cyclic debugging impossible unless an efficient execution replay system is provided. This article describes such an execution replay system for a distributed thread programming model that combines synchronization primitives for threads sharing the same node with communication primitives for threads on different nodes. The execution replay system combines the most efficient trace size reduction technique for shared memory, based on the use of logical clocks, with a very efficient compression technique for trace data that originates from the test functions used in non-blocking communications.
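The abstract does not say how the trace data from test functions is compressed. As an illustration only, the sketch below assumes one plausible scheme: instead of logging the result of every non-blocking test, the recorder logs the number of failed tests that preceded each successful one, and the replayer reproduces the same sequence of outcomes from those counts. The names `ProbeRecorder`, `ProbeReplayer`, `observe`, and `test` are hypothetical and do not come from the article.

```python
from collections import deque


class ProbeRecorder:
    """Record phase (hypothetical): keep one counter per successful test."""

    def __init__(self):
        self.failures = 0  # failed tests seen since the last success
        self.trace = []    # one integer per successful test

    def observe(self, succeeded):
        """Feed in the outcome of a real test call; returns it unchanged."""
        if succeeded:
            self.trace.append(self.failures)
            self.failures = 0
        else:
            self.failures += 1
        return succeeded


class ProbeReplayer:
    """Replay phase (hypothetical): re-create outcomes from the counters."""

    def __init__(self, trace):
        self.pending = deque(trace)
        self.remaining = self._next_count()

    def _next_count(self):
        # None marks the end of the recorded trace.
        return self.pending.popleft() if self.pending else None

    def test(self):
        """Return the outcome that the recorded run saw at this call."""
        if self.remaining is None:
            raise RuntimeError("recorded trace exhausted")
        if self.remaining > 0:
            self.remaining -= 1
            return False
        # A real replay system would block here until the message has
        # actually arrived, so that reporting success is safe.
        self.remaining = self._next_count()
        return True


# Usage: six test outcomes shrink to three integers and replay identically.
outcomes = [False, False, True, True, False, True]
recorder = ProbeRecorder()
for outcome in outcomes:
    recorder.observe(outcome)
replayer = ProbeReplayer(recorder.trace)
replayed = [replayer.test() for _ in outcomes]
```

In this sketch, failed tests that occur after the last success are never written to the trace. A long polling loop costs a single integer regardless of how many times it spins, which is where the size reduction comes from under this assumption.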
Published
2012-03-01
How to Cite
Kergommeaux, J. C. de, Ronsse, M., & Bosschere, K. D. (2012). EXECUTION REPLAY AND DEBUGGING OF DISTRIBUTED MULTI-THREADED PARALLEL PROGRAMS. Computing and Informatics, 19(6), 511–526. Retrieved from http://147.213.75.17/ojs/index.php/cai/article/view/575
Section
Articles