MPI Examples

These exercises provide a set of sample programs that demonstrate the key features of the Message Passing Interface (MPI) and introduce the basic ideas of MPI parallel programming.


Contents

List of Examples


C               Fortran         C++                 Description

Simple MPI Codes:

hello.c         hello.f         hello.cpp           MPI Hello world!
hello_from.c    hello_from.f    hello_from.cpp      Hello From..! MPI_COMM_RANK

Point-to-Point Communications:

sendrecv.c      sendrecv.f      sendrecv.cpp        Passing a message: MPI_Send; MPI_Recv
deadlock.c      deadlock.f      deadlock.cpp        Deadlock Situation
deadlock_fix.c  deadlock_fix.f  deadlock_fix.cpp    Deadlock Situation – Fixed
ring_bl.c       ring_bl.f       Not yet available   Ring (Blocking Communication)
ring_nb.c       ring_nb.f       Not yet available   Ring (Non-blocking Communication)
array.c         array.f         Not yet available   Simple Array Assignment
timing.c        timing.f        Not yet available   MPI Communication Timing Test
pical.c         pical.f         Not yet available   Pi Calculation
matmul.c        matmul.f        Not yet available   Matrix Multiplication

Collective Communications:

collectives.c   collectives.f   Not yet available   MPI Collective Communication Functions


Hello World 1 – MPI Hello world!



  1. The objective of this exercise is to demonstrate the most fundamental MPI calls.
  2. Examine the Hello World! program hello.c/hello.f/hello.cpp.
    Notice that every process prints Hello World! and that the Hello World! program:


    1. Includes a header,

    2. Initializes MPI,

    3. Prints a Hello World! message, and

    4. Finalizes MPI


  3. Running this program on 8 processors assigns a rank to each process.
    The program then prints 8 lines, one per process; the text of each line
    depends on the rank of the process that prints it:

    Hello World (from masternode)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)



Hello World 2 – Hello From..! MPI_COMM_RANK



  1. This is another simple example that shows how ranks are assigned and how
    they can be addressed later in the code.
  2. Running this program on 8 processors assigns a rank to each process.
    The program then prints 8 lines, one per rank; the order of the lines
    can vary from run to run:

    Hello from 5.
    Hello from 1.
    Hello from 0.
    Hello from 4.
    Hello from 2.
    Hello from 7.
    Hello from 3.
    Hello from 6. 



Passing a message: MPI_Send; MPI_Recv



  1. This example performs communication between two processors.

  2. Processor 0 sends the message Hello, World to processor 1 using the
    blocking send MPI_Send; processor 1 receives it using the blocking
    receive MPI_Recv.
  3. Running this program on any number of processors produces one line of
    output similar to the one below:

    Node 1 : Hello, World



Deadlock Situation



  1. This example, run on two nodes, shows how improper use of blocking calls
    results in deadlock.

  2. Every task waits for an event that no task has initiated, so none of
    them can proceed.



Deadlock Situation – Fixed



  1. This program is the solution: it uses a non-blocking send to eliminate
    the deadlock.

  2. Running this program on 2 processors produces output similar to the following:

     Task             1  has sent the message
     Task             0  has sent the message
     Task            1  has received the message
     Task            0  has received the message 



Ring (Blocking Communication)



  1. This program allows a processor to communicate its rank around a ring.

  2. The sum of all ranks is then accumulated and printed out by each processor.

  3. Consider a set of processes arranged in a ring, as shown below. Use a
    token-passing method to compute the sum of the ranks of the processes.

           1
         /   \
        0     2
         \   /
           3

    Figure 1: Four processors arranged in a ring. Messages are sent
    from 0 to 1 to 2 to 3 and back to 0; the sum of the ranks is 6.

  4. Each processor stores its rank in MPI_COMM_WORLD as an integer
    and sends this value to the processor on its right. It then receives an
    integer from its left neighbor and keeps a running sum of all the integers
    it receives. The processors continue passing on the values they receive
    until they get their own rank back. Each process finishes by printing
    the sum of the values.

  5. Use synchronous sends, either MPI_Ssend() (blocking) or MPI_Issend()
    (non-blocking), for this program. Watch out for deadlock situations. If
    you use non-blocking sends, make sure that you do not overwrite
    information that is still being sent. Synchronous message passing is
    required here because the standard send can be either buffered or
    synchronous, and you should learn to program for either possibility.

  6. This version uses blocking communication.

  7. Running this program on 8 processors produces output similar to the following:

    Proc 2 sum = 28
    Proc 1 sum = 28
    Proc 3 sum = 28
    Proc 7 sum = 28
    Proc 4 sum = 28
    Proc 5 sum = 28
    Proc 6 sum = 28
    Proc 0 sum = 28



Ring (Non-blocking Communication)



  1. This program allows a processor to communicate its rank around a ring.

  2. The sum of all ranks is then accumulated and printed out by each processor.

  3. Consider a set of processes arranged in a ring, as shown below. Use a
    token-passing method to compute the sum of the ranks of the processes.

           1
         /   \
        0     2
         \   /
           3

    Figure 1: Four processors arranged in a ring. Messages are sent
    from 0 to 1 to 2 to 3 and back to 0; the sum of the ranks is 6.

  4. Each processor stores its rank in MPI_COMM_WORLD as an integer
    and sends this value to the processor on its right. It then receives an
    integer from its left neighbor and keeps a running sum of all the integers
    it receives. The processors continue passing on the values they receive
    until they get their own rank back. Each process finishes by printing
    the sum of the values.

  5. Use synchronous sends, either MPI_Ssend() (blocking) or MPI_Issend()
    (non-blocking), for this program. Watch out for deadlock situations. If
    you use non-blocking sends, make sure that you do not overwrite
    information that is still being sent. Synchronous message passing is
    required here because the standard send can be either buffered or
    synchronous, and you should learn to program for either possibility.

  6. This version uses non-blocking communication.

  7. Running this program on 8 processors produces output similar to the following:

    Proc 6 sum = 28
    Proc 2 sum = 28
    Proc 3 sum = 28
    Proc 0 sum = 28
    Proc 4 sum = 28
    Proc 5 sum = 28
    Proc 1 sum = 28
    Proc 7 sum = 28



Simple Array Assignment



  1. This is a simple array assignment used to demonstrate the distribution
    of data among multiple tasks and the communications required to accomplish
    that distribution.

  2. The master distributes an equal portion of
    the array to each worker. Each worker receives its portion of the array
    and performs a simple value assignment to each of its elements. Each worker
    then sends its portion of the array back to the master. As the master receives
    a portion of the array from each worker, selected elements are displayed.

  3. Note: For this example, set the number of processes to an odd number
    (for example, aprun -n 7) to ensure even distribution of the array among
    the numtasks-1 worker tasks.

  4. Running this program on 7 processors produces output similar to the following:

    *********** Starting MPI Example 1 ************
    MASTER: number of worker tasks will be= 6
    Sending to worker task 1
    Sending to worker task 2
    Sending to worker task 3
    Sending to worker task 4
    Sending to worker task 5
    Sending to worker task 6
    ---------------------------------------------------
    MASTER: Sample results from worker task = 1
       result[0]=1.000000
       result[100]=101.000000
       result[1000]=1001.000000
    
    ---------------------------------------------------
    MASTER: Sample results from worker task = 2
       result[10000]=10001.000000
       result[10100]=10101.000000
       result[11000]=11001.000000
    
    ---------------------------------------------------
    MASTER: Sample results from worker task = 3
       result[20000]=20001.000000
       result[20100]=20101.000000
       result[21000]=21001.000000
    
    ---------------------------------------------------
    MASTER: Sample results from worker task = 4
       result[30000]=30001.000000
       result[30100]=30101.000000
       result[31000]=31001.000000
    
    ---------------------------------------------------
    MASTER: Sample results from worker task = 5
       result[40000]=40001.000000
       result[40100]=40101.000000
       result[41000]=41001.000000
    
    ---------------------------------------------------
    MASTER: Sample results from worker task = 6
       result[50000]=50001.000000
       result[50100]=50101.000000
       result[51000]=51001.000000
    
    MASTER: All Done! 



MPI Communication Timing Test



  1. The objective of this exercise is to measure the time required to pass
    messages between two processes.

  2. In this exercise, messages of different sizes are sent back and forth
    between two processes a number of times. A timestamp is taken before
    each message is sent and after it has been received, and the difference
    gives the communication time. Finally, the average communication time
    and the bandwidth are calculated and printed to the screen.

  3. For example, one can run this code on two nodes (one process on each node),
    passing messages of length 1, 100, 10,000, and 1,000,000, and record the
    results in a table like the one below:

    Length       Communication Time   Bandwidth
                 (sec)                (Megabit/sec)
    1            0.000001                65.440140
    100          0.000002              2936.930591
    10,000       0.000052             12321.465896
    1,000,000    0.005133             12468.521884



Pi Calculation



  1. This program approximates π by numerically integrating
    f(x) = 4/(1+x^2) over the interval [0, 1].

  2. The area under the curve is divided into rectangles, and the rectangles
    are distributed among the processors.

  3. Running this program on 8 processors produces output similar to the following:

    Process 2 of 8 on nid03631
    Process 3 of 8 on nid03631
    Process 0 of 8 on nid03631
    Process 1 of 8 on nid03631
    pi is approximately 3.1415926544231247, Error is 0.0000000008333316
    wall clock time = 0.000421
    Process 5 of 8 on nid03632
    Process 4 of 8 on nid03632
    Process 7 of 8 on nid03632
    Process 6 of 8 on nid03632
    



Matrix Multiplication



  1. This example is a simple matrix multiplication program that computes
    A x B = C.

  2. Matrix A is copied to every processor. Matrix B is divided into blocks
    and distributed among the processors.

  3. The workers perform the actual multiplication on their smaller blocks
    and send their results back to the master.

  4. Running this program on 8 processors produces output similar to the following:

    Number of worker tasks = 7
       sending 15 rows to task 1
       sending 15 rows to task 2
       sending 14 rows to task 3
       sending 14 rows to task 4
       sending 14 rows to task 5
       sending 14 rows to task 6
       sending 14 rows to task 7
    Here are the first 30 rows of the result (C) matrix
    
        0.00   1015.00   2030.00   3045.00   4060.00   5075.00   6090.00
        0.00   1120.00   2240.00   3360.00   4480.00   5600.00   6720.00
        0.00   1225.00   2450.00   3675.00   4900.00   6125.00   7350.00
        0.00   1330.00   2660.00   3990.00   5320.00   6650.00   7980.00
        0.00   1435.00   2870.00   4305.00   5740.00   7175.00   8610.00
        0.00   1540.00   3080.00   4620.00   6160.00   7700.00   9240.00
        0.00   1645.00   3290.00   4935.00   6580.00   8225.00   9870.00
        0.00   1750.00   3500.00   5250.00   7000.00   8750.00  10500.00
        0.00   1855.00   3710.00   5565.00   7420.00   9275.00  11130.00
        0.00   1960.00   3920.00   5880.00   7840.00   9800.00  11760.00
        0.00   2065.00   4130.00   6195.00   8260.00  10325.00  12390.00
        0.00   2170.00   4340.00   6510.00   8680.00  10850.00  13020.00
        0.00   2275.00   4550.00   6825.00   9100.00  11375.00  13650.00
        0.00   2380.00   4760.00   7140.00   9520.00  11900.00  14280.00
        0.00   2485.00   4970.00   7455.00   9940.00  12425.00  14910.00
        0.00   2590.00   5180.00   7770.00  10360.00  12950.00  15540.00
        0.00   2695.00   5390.00   8085.00  10780.00  13475.00  16170.00
        0.00   2800.00   5600.00   8400.00  11200.00  14000.00  16800.00
        0.00   2905.00   5810.00   8715.00  11620.00  14525.00  17430.00
        0.00   3010.00   6020.00   9030.00  12040.00  15050.00  18060.00
        0.00   3115.00   6230.00   9345.00  12460.00  15575.00  18690.00
        0.00   3220.00   6440.00   9660.00  12880.00  16100.00  19320.00
        0.00   3325.00   6650.00   9975.00  13300.00  16625.00  19950.00
        0.00   3430.00   6860.00  10290.00  13720.00  17150.00  20580.00
        0.00   3535.00   7070.00  10605.00  14140.00  17675.00  21210.00
        0.00   3640.00   7280.00  10920.00  14560.00  18200.00  21840.00
        0.00   3745.00   7490.00  11235.00  14980.00  18725.00  22470.00
        0.00   3850.00   7700.00  11550.00  15400.00  19250.00  23100.00
        0.00   3955.00   7910.00  11865.00  15820.00  19775.00  23730.00
        0.00   4060.00   8120.00  12180.00  16240.00  20300.00  24360.00
    



MPI Collective Communication Functions



  1. This program illustrates the functionality of five basic MPI collective
    communication routines:

    MPI_Gather
    MPI_Allgather
    MPI_Scatter
    MPI_Alltoall
    MPI_Bcast

  2. Running this program on 4 processors produces output similar to the following:

     * This program demonstrates the use of collective MPI functions
     * Four processors are to be used for the demo
     * Process 1 (of 0,1,2,3) is the designated root
    
       Function      Proc      Sendbuf         Recvbuf
       --------      ----      -------         -------
    MPI_Gather:         0   a
    MPI_Gather:         2   c
    MPI_Gather:         1   b                 a   b   c   d
    MPI_Gather:         3   d
    MPI_Allgather:      1   b                 a   b   c   d
    MPI_Allgather:      0   a                 a   b   c   d
    MPI_Allgather:      2   c                 a   b   c   d
    MPI_Allgather:      3   d                 a   b   c   d
    MPI_scatter:        1   e   f   g   h     f
    MPI_scatter:        0   a   b   c   d     e
    MPI_scatter:        2   i   j   k   l     g
    MPI_scatter:        3   m   n   o   p     h
    MPI_Alltoall:       1   e   f   g   h     b   f   j   n
    MPI_Alltoall:       2   i   j   k   l     c   g   k   o
    MPI_Alltoall:       0   a   b   c   d     a   e   i   m
    MPI_Alltoall:       3   m   n   o   p     d   h   l   p
    MPI_Bcast:          1   b                 b
    MPI_Bcast:          2                     b
    MPI_Bcast:          0                     b
    MPI_Bcast:          3                     b
    



Documentation for MPI and Additional Resources



  1. Man pages are available for MPI and should be installed in your MANPATH.
    The following man pages have some introductory information about MPI:

    % man MPI
    % man cc
    % man ftn
    % man qsub
    % man MPI_Init
    % man MPI_Finalize

  2. MPI man pages are also available online:
    http://www.mcs.anl.gov/mpi/www/

  3. Main MPI web page at Argonne National Laboratory:
    http://www-unix.mcs.anl.gov/mpi

  4. Set of guided exercises:
    http://www-unix.mcs.anl.gov/mpi/tutorial/mpiexmpl

  5. MPI Forum home page, which contains the official copies of the MPI standard:
    http://www.mpi-forum.org/

  6. Books on and about MPI



Acknowledgments



    The original MPI training materials for workstations were developed under the Joint Information Systems Committee (JISC) New Technologies Initiative by the Training and Education Centre at Edinburgh Parallel Computing Centre (EPCC-TEC), University of Edinburgh, United Kingdom.


    NCCS and NICS staff at UTK/ORNL and their resources


    All contributors to the art and science of parallel computing




National Center for Computational Sciences