A Sophomoric∗ Introduction to
Shared-Memory Parallelism and Concurrency
Dan Grossman
Version of February 23, 2012
Contents
1 Meta-Introduction: An Instructor’s View of These Notes
1.1 Where This Material Fits in a Changing Curriculum
1.2 Six Theses On A Successful Approach to this Material
1.3 How to Use These Notes—And Improve Them
1.4 Acknowledgments
2 Introduction
2.1 More Than One Thing At Once
2.2 Parallelism vs. Concurrency
2.3 Basic Threads and Shared Memory
2.4 Other Models
3 Basic Fork-Join Parallelism
3.1 A Simple Example: Okay Idea, Inferior Style
3.2 Why Not To Use One Thread Per Processor
3.3 Divide-And-Conquer Parallelism
3.4 The Java ForkJoin Framework
3.5 Reductions and Maps
3.6 Data Structures Besides Arrays
4 Analyzing Fork-Join Algorithms
4.1 Work and Span
4.2 Amdahl’s Law
4.3 Comparing Amdahl’s Law and Moore’s Law
5 Fancier Fork-Join Algorithms: Prefix, Pack, Sort
5.1 Parallel-Prefix Sum
5.2 Pack
5.3 Parallel Quicksort
5.4 Parallel Mergesort
∗As in intended for second-year students, not as in immaturely pretentious
6 Basic Shared-Memory Concurrency
6.1 The Programming Model
6.2 The Need for Synchronization
6.3 Locks
6.4 Locks in Java
7 Race Conditions: Bad Interleavings and Data Races
7.1 Bad Interleavings: An Example with Stacks
7.2 Data Races: Wrong Even When They Look Right
8 Concurrency Programming Guidelines
8.1 Conceptually Splitting Memory in Three Parts
8.2 Approaches to Synchronization
9 Deadlock
10 Additional Synchronization Primitives
10.1 Reader/Writer Locks
10.2 Condition Variables
10.3 Other
1 Meta-Introduction: An Instructor’s View of These Notes
1.1 Where This Material Fits in a Changing Curriculum
These notes teach parallelism and concurrency as part of an advanced sophomore-level data-structures course – the course that covers asymptotic complexity, balanced trees, hash tables, graph algorithms, sorting, etc.
Why parallelism and concurrency should be taught early:
Parallelism and concurrency are increasingly important topics in computer science and engineering. Traditionally, most undergraduates learned rather little about these topics and did so rather late in the curriculum: Senior-level operating-systems courses cover threads, scheduling, and synchronization. Early hardware courses have circuits and functional units with parallel parts. Advanced architecture courses discuss cache coherence. Electives might cover parallel algorithms or use distributed computing to solve embarrassingly parallel tasks.
Little of this scattered material emphasizes the essential concepts of parallelism and concurrency, and certainly not in a central place such that subsequent courses can rely on it. These days, most desktop and laptop computers have multiple cores. Modern high-level languages have threads built into them, and standard libraries use threads (e.g., Java’s Swing library for GUIs). It no longer seems reasonable to bring up threads “as needed” or delay them until an operating-systems course that should be focusing on operating systems. There is no reason to introduce threads first in C or assembly when all the usual conveniences of high-level languages for introducing core concepts apply.
Why parallelism and concurrency should not be taught too early:
Conversely, it is tempting to introduce threads “from day one” in introductory programming courses, before students learn “sequential habits.” I suspect this approach is infeasible in most curricula. For example, it may make little sense in programs where most students in introductory courses do not end up majoring in computer science. “Messing with intro” is a high-risk endeavor, and introductory courses are already packed with essential concepts like variables, functions, arrays, linked structures, etc. There is probably no room.
So put it in the data-structures course:
There may be multiple natural places to introduce parallelism and concurrency, but I claim “sophomore-level” data structures (after CS2 and discrete math, but before “senior-level” algorithms) works very well. Here are some reasons:
• There is room: We made room by removing three weeks on skew heaps, leftist heaps, binomial queues, splay trees, disjoint-sets, and network flow. Some of the trimming was painful and will be compensated for in senior-level algorithms, but all this material seems relatively less important. There was still plenty of room for essential data structures and related concepts such as asymptotic analysis, (simple) amortization, graphs, and sorting.
• Fork-join parallel algorithms are amenable to asymptotic analysis in terms of “work” and “span” over dags, all concepts that fit very naturally in the course. Amdahl’s Law is fundamentally an asymptotic argument too.
• Ideas from sequential algorithms already in the course reappear with parallelism. For example, just as constant factors compel efficient quicksort implementations to switch to an O(n²) sort for small n, constant factors compel efficient parallel algorithms to switch to sequential algorithms for small problem sizes. Parallel sorting algorithms are also good examples of non-trivial parallelization.
• Many canonical examples for concurrency involve basic data structures: bounded buffers for condition variables, dictionaries for reader/writer locks, parallel unstructured graph traversals, etc. Making a data structure “thread-safe” is an ideal way to think about what it means to be “thread-safe.”
• We already used Java in the course. Java 7’s ForkJoin framework is excellent for teaching fork-join parallelism. Java’s built-in support for threads, locks, and condition variables is sufficient for teaching concurrency.
On the other hand, instructors wishing to introduce message passing or distributed computation will have to consider whether it makes sense in this course. I focus on shared memory, only mentioning other models. I do not claim shared memory is “better” (that’s an endless argument on which I have no firm opinion), only that it is an important model and a good one to start with pedagogically. I also do not emphasize asynchrony and masking I/O latency. Such topics are probably better covered in a course on systems programming, though one could add them to these notes.
1.2 Six Theses On A Successful Approach to this Material
In summary, these notes rest on several theses for how to teach this material:
1. Integrate it into a data-structures course.
2. Distinguish parallelism (using extra computational units to do more work per unit time) from concurrency (managing access to shared resources). Teach parallelism first because it is easier and helps establish a non-sequential mindset.
3. Teach in a high-level language, using a library for fork-join parallelism. Teach how to use parallelism, threads, locks, etc. Do not teach how to implement them.
4. Conversely, do not teach in terms of higher-order parallel patterns like maps and reduces. Mention these, but have students actually do the divide-and-conquer underlying these patterns.
5. Assume shared memory since one programming model is hard enough.
6. Given the limited time and student background, do not focus on memory-hierarchy issues (e.g., caching), much like these issues are mentioned (e.g., with B-trees) but rarely central in data-structures courses. (Adding a discussion should prove straightforward.)
If you strongly disagree with these theses, you will probably not like these notes, but you might still try them.
1.3 How to Use These Notes—And Improve Them
These notes were originally written for CSE332 at the University of Washington (www.cs.washington.edu/education/courses/cse332). They account for 3 weeks of a required 10-week course (the University uses a quarter system). Alongside the notes are PowerPoint slides, homework assignments, and a programming project. In fact, the notes were the last aspect to be written; the first edition of the course went great without them, and students reported parallelism to be their favorite aspect of the course.
Surely these notes have errors, and explanations could be improved. Please let me know of any problems you find. I am a perfectionist: if you find a typo, I would like to know. Ideally, you would first check that the most recent version of the notes does not already have the problem fixed.
As for permission, do with these notes whatever you like. Seriously: “steal these notes.” The LaTeX sources are available so you can modify them however you like. I would like to know if you are using the notes and how; it motivates me to improve them and, frankly, it’s not bad for my ego. Constructive criticism is also welcome. That said, I don’t expect any thoughtful instructor to agree completely with me on what to cover and how to cover it.
Contact me at the email djg and then the at-sign and then cs.washington.edu. The current home for the notes and related materials is www.cs.washington.edu/homes/djg/teachingMaterials. Students are more than welcome to contact me: who better to let me know where the notes could be improved.
1.4 Acknowledgments
I deserve no credit for the material in these notes. If anything, my role was simply to distill decades of wisdom from others down to three weeks of core concepts and integrate the result into a data-structures course. When in doubt, I stuck with the basic and simplest topics and examples.
I was particularly influenced by Guy Blelloch and Charles Leiserson in terms of teaching parallelism before concurrency and emphasizing divide-and-conquer algorithms that do not consider the number of processors. Doug Lea and other developers of Java’s ForkJoin framework provided a wonderful library that, with some hand-holding, is usable by sophomores. Larry Snyder was also an excellent resource for parallel algorithms.
The treatment of shared-memory synchronization is heavily influenced by decades of operating-systems courses, but with the distinction of ignoring all issues of scheduling and synchronization implementation. Moreover, the emphasis on the need to avoid data races in high-level languages is frustratingly under-appreciated despite the noble work of memory-model experts such as Sarita Adve, Hans Boehm, and Bill Pugh.
Feedback from Ruth Anderson, Kim Bruce, Kristian Lieberg, Tyler Robison, Cody Schroeder, and Martin Tompa helped improve explanations and remove typos. Tyler and Martin deserve particular mention for using these notes when they were very new.
I have had enlightening and enjoyable discussions on “how to teach this stuff” with too many researchers and educators over the last few years to list them all, but I am grateful.
This work was funded in part via grants from the National Science Foundation and generous support, financial and otherwise, from Intel.
2 Introduction
2.1 More Than One Thing At Once
In sequential programming, one thing happens at a time. Sequential programming is what most people learn first and how most programs are written. Probably every program you have written in Java is sequential: execution starts at the beginning of main and proceeds one assignment/call/return/arithmetic operation at a time.
Removing the one-thing-at-a-time assumption complicates writing software. The multiple threads of execution (things performing computations) will somehow need to coordinate so that they can work together to complete a task, or at least not get in each other’s way while they are doing separate things. These notes cover basic concepts related to multithreaded programming, i.e., programs where there are multiple threads of execution. We will cover:
• How to create multiple threads
• How to write and analyze divide-and-conquer algorithms that use threads to produce results more quickly
• How to coordinate access to shared objects so that multiple threads using the same data do not produce the wrong answer
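As a tiny preview of the first bullet, here is a minimal sketch (an illustrative example with made-up class names, not code taken from later sections) that creates and starts one extra thread in Java by subclassing java.lang.Thread; Section 2.3 introduces threads and shared memory properly.

class HelloThread extends Thread {
  // run() contains the code the new thread executes once it is started
  public void run() {
    System.out.println("hello from a second thread");
  }
}

class Preview {
  public static void main(String[] args) throws InterruptedException {
    HelloThread t = new HelloThread();
    t.start();  // start() creates the new thread, which then calls run()
    System.out.println("hello from the main thread");
    t.join();   // wait for the other thread to finish before exiting
  }
}

The two println calls can print in either order, which is exactly the sort of nondeterminism the rest of these notes address.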
A useful analogy is with cooking. A sequential program is like having one cook who does each step of a recipe in order, finishing one step before starting the next. Often there are multiple steps that could be done at the same time – if you had more cooks. But having more cooks requires extra coordination. One cook may have to wait for another cook to finish something. And there are limited resources: if you have only one oven, two cooks won’t be able to bake casseroles at different temperatures at the same time. In short, multiple cooks present efficiency opportunities, but they also significantly complicate the process of producing a meal.
Because multithreaded programming is so much more difficult, it is best to avoid it if you can. For most of computing’s history, most programmers wrote only sequential programs. Notable exceptions were:
• Programmers writing programs to solve such computationally large problems that it would take years or centuries for one computer to finish. So they would use multiple computers together.
• Programmers writing systems like an operating system where a key point of the system is to handle multiple things happening at once. For example, you can have more than one program running at a time. If you have only one processor, only one program can actually run at a time, but the operating system still uses threads to keep track of all the running programs and let them take turns. If the taking of turns happens fast enough (e.g., every 10 milliseconds), humans fall for the illusion of simultaneous execution. This is called time-slicing.
Sequential programmers were lucky: since every 2 years or so computers got roughly twice as fast, most programs would get exponentially faster over time without any extra effort.
Around 2005, computers stopped getting twice as fast every 2 years. Understanding why requires a course in computer architecture. In brief, increasing the clock rate (very roughly and technically inaccurately speaking, how quickly instructions execute) became infeasible without generating too much heat. Also, the relative cost of memory access can become too high for faster processors to help.
Nonetheless, chip manufacturers still plan to make exponentially more powerful chips. Instead of one processor running faster, they will have more processors. The next computer you buy will likely have 4 processors (also called cores) on the same chip, and the number of available cores will likely double every few years.
What would 256 cores be good for? Well, you can run multiple programs at once – for real, not just with time-slicing. But for an individual program to run any faster than with one core, it will need to do more than one thing at once. This is the reason that multithreaded programming is becoming more important. To be clear, multithreaded programming is not new. It has existed for decades, and all the key concepts these notes cover are just as old. Before there were multiple cores on one chip, you could use multiple chips and/or use time-slicing on one chip – and both remain important techniques today. The move to multiple cores on one chip is “just” having the effect of making multithreading something that more and more software wants to do.
2.2 Parallelism vs. Concurrency
These notes are organized around a fundamental distinction between parallelism and concurrency. Unfortunately, the way we define these terms is not entirely standard, so you should not assume that everyone uses them as we will. Nonetheless, most computer scientists agree that this distinction is important.
Parallel programming is about using additional computational resources to produce an answer faster.
As a canonical example, consider the trivial problem of summing up all the numbers in an array. We know no sequential algorithm can do better than Θ(n) time. Suppose instead we had 4 processors. Then hopefully we could produce the result roughly 4 times faster by having each processor add 1/4 of the elements, and then we could just add the 4 partial results together with 3 more additions. Θ(n/4) is still Θ(n), but constant factors can matter. Moreover, when designing and analyzing a parallel algorithm, we should leave the number of processors as a variable, call it P. Perhaps we can sum the elements of an array in time O(n/P) given P processors. As we will see, the best bound under the assumptions we will make is in fact O(log n + n/P).
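To make the 4-processor idea concrete, here is a minimal sketch (an illustrative example with made-up class names) that sums an array using 4 Java threads, one per quarter of the array. Section 3.1 develops a similar example and explains why this style, with its fixed number of threads, is an okay idea but inferior style.

class SumThread extends Thread {
  int[] arr; int lo, hi;  // this thread sums the elements arr[lo] through arr[hi-1]
  long ans = 0;           // partial result, read after the thread finishes
  SumThread(int[] a, int l, int h) { arr = a; lo = l; hi = h; }
  public void run() {
    for (int i = lo; i < hi; i++)
      ans += arr[i];
  }
}

class FourThreadSum {
  static long sum(int[] arr) throws InterruptedException {
    SumThread[] ts = new SumThread[4];
    int len = arr.length;
    for (int i = 0; i < 4; i++) {            // create and start one thread per quarter
      ts[i] = new SumThread(arr, (i * len) / 4, ((i + 1) * len) / 4);
      ts[i].start();
    }
    long total = 0;
    for (int i = 0; i < 4; i++) {            // wait for each thread, then combine partial results
      ts[i].join();
      total += ts[i].ans;
    }
    return total;
  }
}

With 4 processors available, the 4 calls to run can execute at the same time, which is where the hoped-for factor-of-4 speedup would come from.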
In terms of our cooking analogy, parallelism is about using extra cooks (or utensils or pans or whatever) to get a large meal finished in less time. If you have a huge number of potatoes to slice, having more knives and people is really helpful, but at some point adding more people stops helping because of all the communicating and coordinating