# 27-1 Implementing parallel loops using nested parallelism

Consider the following multithreaded algorithm for performing pairwise addition on $n$-element arrays $A[1..n]$ and $B[1..n]$, storing the sums in $C[1..n]$:

SUM-ARRAYS(A, B, C)
parallel for i = 1 to A.length
C[i] = A[i] + B[i]


a. Rewrite the parallel loop in $\text{SUM-ARRAYS}$ using nested parallelism (spawn and sync) in the manner of $\text{MAT-VEC-MAIN-LOOP}$. Analyze the parallelism of your implementation.

Consider the following alternative implementation of the parallel loop, which contains a value $grain\text-size$ to be specified:

SUM-ARRAYS'(A, B, C)
n = A.length
grain-size = ?      // to be determined
r = ceil(n / grain-size)
for k = 0 to r - 1
spawn ADD-SUBARRAY(A, B, C, k * grain-size + 1, min((k + 1) * grain-size, n))
sync

ADD-SUBARRAY(A, B, C, i, j)
for k = i to j
C[k] = A[k] + B[k]


b. Suppose that we set $grain\text -size = 1$. What is the parallelism of this implementation?

c. Give a formula for the span of $\text{SUM-ARRAYS}'$ in terms of $n$ and $grain\text-size$. Derive the best value for grain-size to maximize parallelism.

a. See the algorithm $\text{SUM-ARRAYS}(A, B, C)$. The parallelism is $O(n)$ since it's work is $n\lg n$ and the span is $\lg n$.

b. If grainsize is $1$, this means that each call of $\text{ADD-SUBARRAY}$ just sums a single pair of numbers. This means that since the for loop on line 4 will run $n$ times, both the span and work will be $O(n)$. So, the parallelism is just $O(1)$.

SUM-ARRAYS(A, B, C)
n = floor(A.length / 2)
if n == 0
C = A + B
else
spawn SUM-ARRAYS(A[1..n], B[1..n], C[1..n])
SUM-ARRAYS(A[n + 1..A.length], B[n + 1..A..length], C[n + 1..A.length])


c. Let $g$ be the grainsize. The runtime of the function that spawns all the other functions is $\left\lceil \frac{n}{g} \right\rceil$. The runtime of any particular spawned task is $g$. So, we want to minimize

$$\frac{n}{g} + g.$$

To do this we pull out our freshman calculus hat and take a derivative, we have

$$0 = 1 − \frac{n}{g^2}.$$

To solve this, we set $g = \sqrt n$. This minimizes the quantity and makes the span $O(n / g + g) = O(\sqrt n)$. Resulting in a parallelism of $O(\sqrt n)$.