Description
Parallelism is the key to achieving high performance in computing. However, writing efficient and scalable parallel programs is notoriously difficult, and often requires significant expertise. To address this challenge, it is crucial to provide programmers with high-level tools to enable them to develop solutions easily, and at the same time emphasize the theoretical and practical aspects of algorithm design to allow the solutions developed to run efficiently under many different settings. This thesis addresses this challenge using a three-pronged approach consisting of the design of shared-memory programming techniques, frameworks, and algorithms for important problems in computing. The thesis provides evidence that with appropriate programming techniques, frameworks, and algorithms, shared-memory programs can be simple, fast, and scalable, both in theory and in practice. The results developed in this thesis serve to ease the transition into the multicore era.
The first part of this thesis introduces tools and techniques for deterministic parallel programming, including means for encapsulating nondeterminism via powerful commutative building blocks, as well as a novel framework for executing sequential iterative loops in parallel, which lead to deterministic parallel algorithms that are efficient both in theory and in practice. The second part of this thesis introduces Ligra, the first high-level shared memory framework for parallel graph traversal algorithms. The framework allows programmers to express graph traversal algorithms using very short and concise code, delivers performance competitive with that of highly-optimized code, and is up to orders of magnitude faster than existing systems designed for distributed memory. This part of the thesis also introduces Ligra+, which extends Ligra with graph compression techniques to reduce space usage and improve parallel performance at the same time, and is also the first graph processing system to support in-memory graph compression.
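As a rough illustration of the frontier-based abstraction the description above refers to, here is a minimal Python sketch of a breadth-first search written against an edgeMap-style interface. The function names and signatures are simplified stand-ins for exposition, not Ligra's actual C++ API.

```python
# Illustrative sketch of a Ligra-style frontier abstraction (not
# Ligra's real interface). A graph is an adjacency list; edge_map
# applies `update` to each edge leaving the current frontier and
# returns the set of target vertices for which `update` succeeded.

def edge_map(graph, frontier, update, cond):
    next_frontier = set()
    for u in frontier:
        for v in graph[u]:
            if cond(v) and update(u, v):
                next_frontier.add(v)
    return next_frontier

def bfs(graph, source):
    """Breadth-first search expressed with the frontier abstraction."""
    parent = {source: source}

    def update(u, v):
        if v not in parent:      # first visitor claims the vertex
            parent[v] = u
            return True
        return False

    frontier = {source}
    while frontier:
        frontier = edge_map(graph, frontier, update,
                            lambda v: v not in parent)
    return parent

graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs(graph, 0))   # parent pointers for each reachable vertex
```

The point the description makes is that the traversal logic fits in a few lines: the framework owns the frontier bookkeeping, and the application supplies only the per-edge `update` and per-vertex `cond` functions.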
The third and fourth parts of this thesis bridge the gap between theory and practice in parallel algorithm design by introducing the first algorithms for a variety of important problems on graphs and strings that are efficient both in theory and in practice. For example, the thesis develops the first linear-work and polylogarithmic-depth algorithms for suffix tree construction and graph connectivity that are also practical, as well as a work-efficient, polylogarithmic-depth, and cache-efficient shared-memory algorithm for triangle computations that achieves a 2–5x speedup over the best existing algorithms on 40 cores.
This is a revised version of the thesis that won the 2015 ACM Doctoral Dissertation Award.
Information
Published by | Association for Computing Machinery and Morgan & Claypool Publishers |
Publication date | June 1, 2017 |
EAN13 | 9781970001907 |
Language | English |
File size | 1 MB |
Excerpt
Shared-Memory Parallelism Can Be Simple, Fast, and Scalable
ACM Books
Editor in Chief
M. Tamer Özsu, University of Waterloo
ACM Books is a new series of high-quality books for the computer science community, published by ACM in collaboration with Morgan & Claypool Publishers. ACM Books publications are widely distributed in both print and digital formats through booksellers, to libraries (and library consortia), and to individual ACM members via the ACM Digital Library platform.
Shared-Memory Parallelism Can Be Simple, Fast, and Scalable
Julian Shun, University of California, Berkeley
2017
Computational Prediction of Protein Complexes from Protein Interaction Networks
Sriganesh Srihari, The University of Queensland Institute for Molecular Bioscience
Chern Han Yong, Duke-National University of Singapore Medical School
Limsoon Wong, National University of Singapore
2017
The Handbook of Multimodal-Multisensor Interfaces, Volume 1: Foundations, User Modeling, and Common Modality Combinations
Editors: Sharon Oviatt, Incaa Designs
Björn Schuller, University of Passau and Imperial College London
Philip R. Cohen, Voicebox Technologies
Daniel Sonntag, German Research Center for Artificial Intelligence (DFKI)
Gerasimos Potamianos, University of Thessaly
Antonio Krüger, German Research Center for Artificial Intelligence (DFKI)
2017
Communities of Computing: Computer Science and Society in the ACM
Thomas J. Misa, Editor, University of Minnesota
2017
Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining
ChengXiang Zhai, University of Illinois at Urbana-Champaign
Sean Massung, University of Illinois at Urbana-Champaign
2016
An Architecture for Fast and General Data Processing on Large Clusters
Matei Zaharia, Stanford University
2016
Reactive Internet Programming: State Chart XML in Action
Franck Barbier, University of Pau, France
2016
Verified Functional Programming in Agda
Aaron Stump, The University of Iowa
2016
The VR Book: Human-Centered Design for Virtual Reality
Jason Jerald, NextGen Interactions
2016
Ada's Legacy: Cultures of Computing from the Victorian to the Digital Age
Robin Hammerman, Stevens Institute of Technology
Andrew L. Russell, Stevens Institute of Technology
2016
Edmund Berkeley and the Social Responsibility of Computer Professionals
Bernadette Longo, New Jersey Institute of Technology
2015
Candidate Multilinear Maps
Sanjam Garg, University of California, Berkeley
2015
Smarter Than Their Machines: Oral Histories of Pioneers in Interactive Computing
John Cullinane, Northeastern University; Mossavar-Rahmani Center for Business and Government, John F. Kennedy School of Government, Harvard University
2015
A Framework for Scientific Discovery through Video Games
Seth Cooper, University of Washington
2014
Trust Extension as a Mechanism for Secure Code Execution on Commodity Computers
Bryan Jeffrey Parno, Microsoft Research
2014
Embracing Interference in Wireless Systems
Shyamnath Gollakota, University of Washington
2014
Shared-Memory Parallelism Can Be Simple, Fast, and Scalable
Julian Shun
University of California, Berkeley
ACM Books 15
Copyright © 2017 by the Association for Computing Machinery and Morgan & Claypool Publishers
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other), except for brief quotations in printed reviews, without the prior permission of the publisher.
Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan & Claypool is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.
Shared-Memory Parallelism Can Be Simple, Fast, and Scalable
Julian Shun
books.acm.org
www.morganclaypoolpublishers.com
ISBN: 978-1-97000-191-4 hardcover
ISBN: 978-1-97000-188-4 paperback
ISBN: 978-1-97000-189-1 ebook
ISBN: 978-1-97000-190-7 ePub
Series ISSN: 2374-6769 (print); 2374-6777 (electronic)
DOIs:
10.1145/3018787 Book
10.1145/3018787.3018788 Preface
10.1145/3018787.3018789 Chapter 1
10.1145/3018787.3018790 Chapter 2
10.1145/3018787.3018791 Chapter 3
10.1145/3018787.3018792 Chapter 4
10.1145/3018787.3018793 Chapter 5
10.1145/3018787.3018794 Chapter 6
10.1145/3018787.3018795 Chapter 7
10.1145/3018787.3018796 Chapter 8
10.1145/3018787.3018797 Chapter 9
10.1145/3018787.3018798 Chapter 10
10.1145/3018787.3018799 Chapter 11
10.1145/3018787.3018800 Chapter 12
10.1145/3018787.3018801 Chapter 13
10.1145/3018787.3018802 Chapter 14
10.1145/3018787.3018803 Chapter 15
10.1145/3018787.3018804 References
A publication in the ACM Books series, 15
Editor in Chief: M. Tamer Özsu, University of Waterloo
First Edition
10 9 8 7 6 5 4 3 2 1
To my family
Contents
Preface
Chapter 1
Introduction
1.1 Shared-Memory Programming
1.2 Shared-Memory Algorithm Design
1.3 Shared-Memory Performance
1.4 The Problem Based Benchmark Suite
1.5 Contributions of this Book
Chapter 2
Preliminaries and Notation
2.1 Parallel Programming Model
2.2 Algorithmic Complexity Model
2.3 Parallel Primitives
2.4 Graphs
2.5 Strings
2.6 Problem Definitions
2.7 Experimental Environment
PART I
PROGRAMMING TECHNIQUES FOR DETERMINISTIC PARALLELISM
Chapter 3
Internally Deterministic Parallelism: Techniques and Algorithms
3.1 Introduction
3.2 Programming Model
3.3 Commutative Building Blocks
3.4 Internally Deterministic Parallel Algorithms
3.5 Experimental Results
Chapter 4
Deterministic Parallelism in Sequential Iterative Algorithms
4.1 Introduction
4.2 Analysis Tools
4.3 Algorithmic Design Techniques
4.4 Maximal Independent Set
4.5 Maximal Matching
4.6 Random Permutation
4.7 List Contraction
4.8 Tree Contraction
4.9 Limited Randomness
4.10 Experiments
Chapter 5
A Deterministic Phase-Concurrent Parallel Hash Table
5.1 Introduction
5.2 Related Work
5.3 Preliminaries
5.4 Deterministic Phase-Concurrent Hash Table
5.5 Applications
5.6 Experiments
Chapter 6
Priority Updates: A Contention-Reducing Primitive for Deterministic Programming
6.1 Introduction
6.2 Priority Updates
6.3 Contention in Shared Memory Operations
6.4 Applications of Priority Update
6.5 Experiment Study: Applications
PART II
LARGE-SCALE SHARED-MEMORY GRAPH ANALYTICS
Chapter 7
Ligra: A Lightweight Graph Processing Framework for Shared Memory
7.1 Introduction
7.2 Related Work
7.3 Framework
7.4 Applications
7.5 Experiments
Chapter 8
Ligra+: Adding Compression to Ligra
8.1 Introduction
8.2 Previous Work
8.3 Ligra+ Implementation
8.4 Experiments
PART III
PARALLEL GRAPH ALGORITHMS
Chapter 9
Linear-Work Parallel Graph Connectivity
9.1 Introduction
9.2 Linear-Work Low-Diameter Decomposition
9.3 Linear-Work Connectivity
9.4 Implementation Details
9.5 Experiments
Chapter 10
Parallel and Cache-Oblivious Triangle Computations
10.1 Introduction
10.2 Preliminaries
10.3 Triangle Counting
10.4 Exact Triangle Counting
10.5 Approximate Triangle Counting
10.6 Extensions
10.7 Evaluation
10.8 Parallelization of the Pagh-Silvestri Algorithm
10.9 Prior and Related Work
PART IV
PARALLEL STRING ALGORITHMS
Chapter 11
Parallel Cartesian Tree and Suffix Tree Construction
11.1 Introduction
11.2 Preliminaries
11.3 Parallel Cartesian Trees
11.4 Cartesian Trees and the ANSV Problem
11.5 Experiments
Chapter 12
Parallel Computation of Longest Common Prefixes
12.1 Introduction
12.2 Preliminaries
12.3 Algorithms and Analysis
12.4 Experiments
Chapter 13
Parallel Lempel-Ziv Factorization
13.1 Introduction
13.2 Preliminaries
13.3 Parallel Lempel-Ziv Factorization Algorithm
13.4 Implementations
13.5 Experiments
Chapter 14
Parallel Wavelet Tree Construction
14.1 Introduction
14.2 Preliminaries
14.3 Related Work
14.4 Parallel Wavelet Tree Construction
14.5 Experiments
14.6 Parallel Construction of Rank/Select Structures
14.7 Extensions
Chapter 15
Conclusion and Future Work
15.1 Summary
15.2 Future Work
References
Index
Author's Biography
Preface
Parallelism is the key to achieving high performance in computing. However, writing efficient and scalable parallel programs is notoriously difficult, and often requires significant expertise. To address this challenge, it is crucial to provide programmers with high-level tools to enable them to develop solutions easily, and at the same time emphasize the theoretical and practical aspects of algorithm design to allow the solutions developed to run efficiently under many different settings. This book addresses this challenge using a three-pronged approach consisting of the design of shared-memory programming techniques, frameworks, and algorithms for important problems in computing. The book provides evidence that with appropriate programming techniques, frameworks, and algorithms, shared-memory programs can be simple, fast, and scalable, both in theory and in practice. The results developed in this book serve to ease the transition into the multicore era.
The first part of this book introduces tools and techniques for deterministic parallel programming, including means for encapsulating nondeterminism via powerful commutative building blocks, as well as a novel framework for executing sequential iterative loops in parallel, which lead to deterministic parallel algorithms that are efficient both in theory and practice.
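As a rough illustration of the loop-parallelization idea described above, the following Python sketch simulates the prefix-based pattern on a sequential greedy loop, here maximal independent set (MIS) in vertex-index order. The round structure and names are illustrative assumptions, not the book's implementation: in each round, every undecided vertex in a prefix commits once all of its lower-indexed neighbors are decided, so the output is identical to the sequential loop's.

```python
# Illustrative sketch (not the book's code) of running a sequential
# greedy loop in parallel via prefix-based rounds. Example loop:
# greedy maximal independent set in vertex-index order.

def greedy_mis_sequential(graph, n):
    """The sequential loop: v joins the MIS iff no earlier neighbor did."""
    in_mis = [False] * n
    for v in range(n):
        in_mis[v] = not any(in_mis[u] for u in graph[v] if u < v)
    return in_mis

def greedy_mis_parallel_sim(graph, n, prefix_size=4):
    """Simulate the parallel rounds; each round is embarrassingly parallel."""
    decided = [False] * n
    in_mis = [False] * n
    remaining = list(range(n))          # kept in increasing index order
    while remaining:
        active = remaining[:prefix_size]
        committed = []
        # "Parallel" phase: each active vertex reads only already-decided
        # state, so the iteration order within a round does not matter.
        for v in active:
            earlier = [u for u in graph[v] if u < v]
            if all(decided[u] for u in earlier):
                in_mis[v] = not any(in_mis[u] for u in earlier)
                committed.append(v)
        # Commit phase: decisions become visible only between rounds.
        for v in committed:
            decided[v] = True
        remaining = [v for v in remaining if not decided[v]]
    return in_mis
```

The smallest-indexed remaining vertex can always commit, so the loop makes progress every round; because commits become visible only between rounds, the result is deterministic and matches `greedy_mis_sequential` exactly.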
The secon