High-Performance Computing (HPC) systems are increasingly moving towards an architecture that is deeply hierarchical. However, the execution model with single-level parallelism embodied in legacy parallel programming models falls short in exploiting the multi-level parallelism opportunities in both hardware architectures and applications. This makes the use of richer execution models imperative in order to fully exploit hierarchical parallelism. Partitioned Global Address Space (PGAS) languages such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a globally shared address space with locality awareness. While UPC provides a welcome improvement over message-passing libraries, users still program with a single level of parallelism in the context of SPMD. In this paper, we explore two explicit hierarchical programming approaches based on UPC to improve programmability and performance on hierarchical architectures. The first approach orchestrates computations on multiple sets of thread groups; the second extends UPC with nested, shared-memory multi-threading. This paper presents a detailed description of the proposed approaches and demonstrates their effectiveness in the context of the NAS Parallel Benchmarks and the Unbalanced Tree Search (UTS). Experimental results indicate that the hierarchical model not only provides greater expressive power but also enhances performance: all three benchmarks exceed the performance of the standard UPC implementations after being incrementally enhanced with hierarchical parallelism.

The architecture of today's processors is very complex, comprising several computational cores and an intricate hierarchy of cache memories. The latter, in particular, differ considerably between the many processors currently available in the market, resulting in a wide variety of configurations. Application development is typically oblivious of this complexity and diversity, taking into consideration only the number of available execution cores. This oblivion prevents such applications from fully harnessing the computing power of the underlying architecture. This problem has been recognized by the community, which has proposed languages and models to express and tune applications according to the underlying machine's hierarchy. These, however, lack the desired abstraction level, forcing the programmer to have deep knowledge of computer architecture and parallel programming in order to ensure performance portability across a wide range of architectures. Realizing these limitations, the goal of this thesis is to delegate these hierarchy-aware concerns to the runtime system. Accordingly, the programmer's responsibilities are confined to the definition of procedures for decomposing an application's domain into an arbitrary number of partitions. With this, the programmer has only to reason about the application's data representation and manipulation. We prototyped our proposal on top of a Java parallel programming framework and evaluated it, from a performance perspective, against cache-neglectful domain decompositions.
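The thesis abstract's division of labour — the programmer writes only a domain-decomposition procedure, while a hierarchy-aware runtime decides where each partition runs — can be sketched as follows. This is a minimal illustration, not the framework's actual API: the names `DomainDecomposer` and `BlockDecomposer` are assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the programming model described above. The
// interface and class names are assumptions, not the thesis' real API.
// The programmer only states HOW to split the domain; a hierarchy-aware
// runtime would decide where each partition executes.
interface DomainDecomposer<T> {
    List<List<T>> decompose(List<T> domain, int partitions);
}

class BlockDecomposer implements DomainDecomposer<Integer> {
    // Block decomposition: contiguous chunks preserve spatial locality,
    // giving a cache-aware runtime something to exploit.
    public List<List<Integer>> decompose(List<Integer> domain, int partitions) {
        List<List<Integer>> parts = new ArrayList<>();
        int chunk = (domain.size() + partitions - 1) / partitions; // ceiling division
        for (int i = 0; i < domain.size(); i += chunk) {
            parts.add(new ArrayList<>(domain.subList(i, Math.min(i + chunk, domain.size()))));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> domain = new ArrayList<>();
        for (int i = 0; i < 10; i++) domain.add(i);
        List<List<Integer>> parts = new BlockDecomposer().decompose(domain, 3);
        System.out.println(parts); // [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    }
}
```

A cache-neglectful decomposition, by contrast, might scatter non-contiguous elements into the same partition, which is what the evaluation compares against.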
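The UPC paper's second approach — SPMD-style thread groups augmented with nested, shared-memory multi-threading — can be illustrated, in spirit only, with standard Java concurrency rather than UPC: an outer pool of "groups" each of which exploits inner shared-memory parallelism. The class name, group counts, and the reduction itself are assumptions chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

// Conceptual sketch (plain Java, NOT UPC): two-level parallelism in the
// style of the paper's nested approach. Outer level: a fixed set of
// "groups" (akin to UPC thread groups). Inner level: each group sums its
// slice with nested shared-memory parallelism (a parallel stream).
class NestedGroups {
    static long hierarchicalSum(long n, int groups) throws Exception {
        ExecutorService outer = Executors.newFixedThreadPool(groups);
        try {
            long chunk = (n + groups - 1) / groups;
            List<Future<Long>> parts = new ArrayList<>();
            for (int g = 0; g < groups; g++) {
                final long lo = g * chunk;
                final long hi = Math.min(n, lo + chunk);
                Callable<Long> task = () -> LongStream.range(lo, hi).parallel().sum();
                parts.add(outer.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // reduce across groups
            return total;
        } finally {
            outer.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Sum 0..n-1 across 4 groups; equals n*(n-1)/2.
        System.out.println(hierarchicalSum(1_000_000, 4)); // 499999500000
    }
}
```

In real UPC the outer level would be the fixed set of SPMD threads and the inner level the proposed nested multi-threading extension; the point here is only the shape of the two-level decomposition.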