1.2 Race Conditions

This section builds on the simple fork-join and SPMD patterns and introduces the notion of a race condition. By the end of this section, you should be able to define what a race condition is, identify race conditions in sample code, and describe some strategies for fixing them.

1.2.1 Estimating the Area Under a Curve

As our next example, let’s look at the problem of estimating the area under a curve. If you have taken a calculus course, you may recognize this problem as a Riemann sum or the trapezoidal rule, which approximates the area under a curve (i.e., the integral) by splitting it into a series of trapezoids.
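As a refresher, for a function \(f\), an interval \([a, b]\), and \(n\) equal subdivisions of width \(h = (b-a)/n\), the trapezoidal rule approximates the integral as

\[\int_a^b f(x)\,dx \approx h\left[\frac{f(a) + f(b)}{2} + \sum_{i=1}^{n-1} f(a + ih)\right]\]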

In the following programs, we attempt to use the trapezoid rule to approximate the integral

\[\int_0^{\pi} \sin(x)\,dx\]

using \(2^{20}\) equal subdivisions. The answer from this computation should be 2.0. The following video shows how a single thread would solve this problem:

In this example, the single thread serially computes the area of each trapezoid and adds the areas of all the trapezoids together into a single value.

A C implementation of this program may look like the following:
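Here is a sketch of one such implementation (the function name trap and the variable names are illustrative assumptions, not the book's exact listing):

    #include <math.h>
    #include <stdio.h>

    /* serial trapezoidal rule: approximate the integral of sin(x) on [a, b] */
    double trap(double a, double b, long n) {
        double h = (b - a) / n;                  /* width of each trapezoid */
        double integral = (sin(a) + sin(b)) / 2.0;
        long i;

        for (i = 1; i < n; i++) {
            integral += sin(a + i * h);          /* height of the curve at each interior point */
        }
        return integral * h;
    }

    int main(void) {
        long n = 1 << 20;                        /* 2^20 equal subdivisions */
        double pi = acos(-1.0);
        printf("estimated area: %f\n", trap(0.0, pi, n));
        return 0;
    }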

Let’s now consider how we can use multiple threads to approximate the area under the curve in parallel. One strategy would be to assign each thread a subset of the total set of subdivisions, so that each thread separately computes its assigned set of trapezoids.

The following video illustrates how 4 threads would work together to approximate the area under the curve:

1.2.2 Parallel Integration - First Attempt

One of the advantages of OpenMP is the ability to incrementally add parallelism to a program. Using what we learned about the fork-join pattern and the available pragmas in the last section, let’s update the serial version of the integration program with OpenMP pragmas:
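Here is a sketch of what this first attempt might look like, building on the serial sketch above (as before, the specific names are assumptions):

    #include <math.h>
    #include <stdio.h>
    #include <omp.h>                             /* addition 1: the OpenMP header */

    double trap(double a, double b, long n) {
        double h = (b - a) / n;
        double integral = (sin(a) + sin(b)) / 2.0;
        long i;

        /* addition 2: the omp parallel for pragma.
           private(i): each thread gets its own copy of i.
           shared(a, n, h, integral): one copy of each, accessible to all threads. */
        #pragma omp parallel for private(i) shared(a, n, h, integral)
        for (i = 1; i < n; i++) {
            integral += sin(a + i * h);          /* every thread updates the shared integral */
        }
        return integral * h;
    }

    int main(void) {
        long n = 1 << 20;
        double pi = acos(-1.0);
        printf("estimated area: %f\n", trap(0.0, pi, n));
        return 0;
    }

A program like this is typically compiled with OpenMP support enabled, for example with gcc’s -fopenmp flag (and -lm for the math library).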

Our parallel implementation adds just two lines to the serial code. First, we include the header file <omp.h> in order to access all the functions available to us in the OpenMP library. The second is the omp parallel for pragma placed immediately before the for loop.

Recall that the omp parallel for pragma combines the functionality of the omp parallel and omp for pragmas we covered in the last section; a sketch of the equivalent two-pragma form appears after the following list. Specifically, the omp parallel for pragma:

  • creates a team of threads

  • assigns each thread a subset of iterations of the for loop

  • joins the threads back into a single threaded process at the end of the for loop.
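For reference, here is a sketch of a roughly equivalent form of the loop above written with the two separate pragmas (using the same assumed variables as the sketch above):

    #pragma omp parallel private(i) shared(a, n, h, integral)
    {                                    /* fork: create a team of threads */
        #pragma omp for                  /* divide the loop iterations among the threads */
        for (i = 1; i < n; i++) {
            integral += sin(a + i * h);
        }
    }                                    /* join: back to a single-threaded process */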

For a machine running 4 threads, each thread receives n/4 (in this case 262,144) iterations of the for loop, with each thread running its subset of the for loop in parallel. The omp parallel for pragma has some additional clauses. The private(i) clause states that the variable i is private to each thread. In other words, each thread has its own copy of variable i. In contrast, the shared(a, n, h, integral) clause specifies that the variables a, n, h, and integral are shared amongst the threads. In other words, there is exactly one copy of the a, n, h, and integral variables, and all the threads have equal access to them.

If our program is parallelized correctly, the program should estimate the area under the curve as 2.00, which would be identical to the output of the serial program.

1.2.3 Returning to the Array Example

While some may find it tempting to point the finger at task parallelism for our incorrect result, the truth is that we can get an incorrect result even when employing the SPMD pattern. Suppose we modify the array population problem into one of array addition. In other words, instead of simply populating an array with random values, we now also add up the values in the array in parallel.

Here is a modified code snippet with the omp parallel for pragma placed in the correct position around the summing loop, but commented out. Since the sum of the integers from \(1\) to \(n\) is \(\frac{n(n+1)}{2}\), we know that when n is 20 million the sum should be exactly 200,000,010,000,000.
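Here is a sketch of what such a program might look like (the parallel population of the array and the variable names are assumptions based on the surrounding text):

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void) {
        long n = 20000000;                       /* 20 million elements */
        long long sum = 0;
        long i;
        int *array = malloc(n * sizeof(int));

        /* populate the array with the values 1 .. n */
        #pragma omp parallel for private(i) shared(array, n)
        for (i = 0; i < n; i++) {
            array[i] = i + 1;
        }

        /* add up the values in the array */
        /* #pragma omp parallel for private(i) shared(array, n, sum) */
        for (i = 0; i < n; i++) {
            sum += array[i];
        }

        printf("sum: %lld\n", sum);              /* expect 200000010000000 */
        free(array);
        return 0;
    }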

Running the program with the pragma around the sum commented out should confirm that the sum matches the value shown above. If you uncomment that pragma and run the program again, however, you will likely get a sum that is both incorrect and different from run to run.

1.2.4 Race Conditions and Critical Sections

To understand what is going on, let’s use an analogy to describe the process of adding an array in parallel.

To understand what is going on, we need to define a few new terms: race condition, critical section, and lock. Watch the following video to learn what these terms mean:

Let’s watch another video that explains how these mechanisms can help fix the issue in our program:

The omp critical pragma allows a programmer to define a critical section. At an initial glance, it is tempting to place the omp critical pragma directly around the statement sum += array[i] inside the summing loop. However, since the time spent in a critical section should be as small as possible, it is necessary to separate the summing of the array elements (which each thread can do independently in a private local sum) from the update to the shared sum variable.

The following program does just that:
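Here is a sketch of one way this might look, assuming each thread accumulates into a private variable named local_sum:

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void) {
        long n = 20000000;
        long long sum = 0;
        long i;
        int *array = malloc(n * sizeof(int));

        #pragma omp parallel for private(i) shared(array, n)
        for (i = 0; i < n; i++) {
            array[i] = i + 1;                    /* populate with 1 .. n */
        }

        #pragma omp parallel private(i) shared(array, n, sum)
        {
            long long local_sum = 0;             /* each thread's private running total */

            #pragma omp for
            for (i = 0; i < n; i++) {
                local_sum += array[i];           /* no race: local_sum is private */
            }

            #pragma omp critical                 /* only one thread at a time executes this */
            sum += local_sum;                    /* the small read-modify-write of sum */
        }

        printf("sum: %lld\n", sum);              /* expect 200000010000000 */
        free(array);
        return 0;
    }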

If you are having trouble understanding how the sum += local_sum statement translates to a read-modify-write pattern, recall that a single line of code often translates to multiple instructions in assembly. The line can also be written as sum = sum + local_sum. As the sketch after the following list shows, this single line of code involves:

  • reading the sum variable (and local_sum variable)

  • adding the values of sum and local_sum together, and

  • writing the total to the sum variable.
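Here is a minimal standalone sketch of that decomposition (the register_value variable stands in for a CPU register, and the starting values are purely illustrative):

    #include <stdio.h>

    long long sum = 10;        /* the shared total (example value) */
    long long local_sum = 5;   /* one thread's local total */

    int main(void) {
        long long register_value;                    /* stands in for a CPU register */

        register_value = sum;                        /* 1. read sum */
        register_value = register_value + local_sum; /* 2. add local_sum */
        sum = register_value;                        /* 3. write the total back to sum */

        /* If another thread performed the same three steps on sum at the
           same time, one of the two updates could be lost. */
        printf("sum: %lld\n", sum);
        return 0;
    }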

Lastly, observe that fixing the race condition added several lines to our program. One way to shorten the program again is to use a new pragma called omp atomic. The omp atomic pragma forces the program to treat the statement that immediately follows it as atomic, or something that is executed without interruption.

The following program illustrates how the omp atomic pragma can be used to shorten the program:
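Here is a sketch of one possible shortened version, assuming the atomic update is applied directly to sum inside the parallel loop so that the separate local_sum variable is no longer needed:

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void) {
        long n = 20000000;
        long long sum = 0;
        long i;
        int *array = malloc(n * sizeof(int));

        #pragma omp parallel for private(i) shared(array, n)
        for (i = 0; i < n; i++) {
            array[i] = i + 1;                    /* populate with 1 .. n */
        }

        #pragma omp parallel for private(i) shared(array, n, sum)
        for (i = 0; i < n; i++) {
            #pragma omp atomic                   /* each update to sum executes without interruption */
            sum += array[i];
        }

        printf("sum: %lld\n", sum);              /* expect 200000010000000 */
        free(array);
        return 0;
    }

Keep in mind that performing an atomic update on every iteration can be slower than the local-sum approach, since the threads still contend for the shared variable.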

In the next section, we cover a technique called reduction that offers another alternative.
