UBCO COSC370, BSPlib project: Week 2: Finding the largest number (in parallel!)
In this exercise you will write a BSPlib program, for finding the maximum in an input array of numbers.
1 Finding the maximum
The goal of this week's tasks is to write a BSPlib programs that finds the largest integer in a distributed array of size \(p*M\). In practice, the input data would have to be distributed: typically one processor reads the data from disk and distributes it to the other processors using a broadcast. For the sake of this example, we will generate the distributed array of integers locally using the rand-function.
The algorithm to be implemented consists of 3 supersteps. The first superstep does some setup. In the second superstep, processes finds their the maximum of their part of the distributed array and sends it to process 0. In the last superstep, process 0 finds the global maximum element by selecting the largest number in the array of received local maxima.
In more details, the algorithm should be implemented like this:
- (1: Setup)
- Each process allocates and fills an array
Inputwith \(M\) randomly chosen numbers. - Each process allocates an array
Maxsof length \(p\) (as returned bybsp_nprocs()(man)).- Since we do not know the value of
bsp_nprocs()before executing the program, we must dynamically allocate this array usingcalloc(man). Here is an example:
- Since we do not know the value of
- Each process allocates and fills an array
int *Maxs = calloc(bsp_nprocs(), sizeof(int));
- The call to
callocwill allocatebsp_nprocs()elements, each one of sizesizeof(int). That is, an array ofbsp_nprocs()integers.callocthen returns the adress of this allocation which is stored in the pointerMaxs. - The array
Maxswill be used to communicate local maximas into, and must thus be registered usingbsp_push_reg(man). SinceMaxsis a pointer, you should write something like the following:
bsp_push_reg(Maxs, bsp_nprocs() * sizeof(int));
meaning: register the address stored in the pointer Maxs and
the following bsp_nprocs() * sizeof(int) bytes.
- The superstep is terminated using
bsp_sync. - (2: Finding local maxima)
- Each process iterates over the array
Inputand selects the largest numbermax. - Each process sends the largest number
maxinto indexbsp_pid()(man) of the arrayMaxson process zero. - At this point, the registration of
Maxsis no longer needed, and can be removed usingbsp_pop_reg()(man). - The superstep is terminated using
bsp_sync.
- Each process iterates over the array
- (3: Finding global maximum)
Here is an illustration of a run of the algorithm with 4 processors and \(M=3\), i.e. 3 elements per processors:
For each of the tasks that you complete, please make one separate source-file (i.e. keep each separate solution). Don't worry if you haven't got time to finish all the tasks.
Task 1: Implement the program as described above. For inspiration, you can study the implementation of inner product in BSPedupack, which has a very similar structure. Verify that the algorithm works by having each process print its local array.
Task 2: As described above, each process allocates the array Maxs
of length \(p\). However, only processor 0 will actually store something
in this array. In this case little memory is lost, but other contexts
it might be interesting to try to save some space. Change the program
so that the size of Maxs is zero if bsp_pid() \(\neq 0\) and verify
that the program still works.
Task 3: (Ambitious). In the above description, the input data is generated locally which is not very realistic. Change the program so that the input array of \(p*M\) elements is stored in a text file, with numbers separated by spaces.
To read a file into a string, study fopen (man) and fgets (man). Once the file
has been read into a string, each separate number can be read using
strtok (man) and scanf (man).
Then in the first superstep, processor 0 reads this file into an array of size \(p*M\) and distributes it to the other processors. Here is the pseudo-code for performing the distribution:
// assuming the array Input has been registered in the previous superstep.
// assuming the array Read contains the p*M elements read from the input file
for i in 0 .. p*M - 1 :
// using bsp_put, send the one integer Read[i] to processor i/M,
into Input at offset i % M:
bsp_put( i / bsp_nprocs() , &Read[i] , Input, i % m, sizeof(int));
bsp_sync();
Now the array Read has been distributed, and each local part is
contained in the array Input. Then algorithm can then continue as
before.
–
Back to overview.