MPI Examples
These exercises provide a set of sample programs that demonstrate the key features of the Message Passing Interface (MPI) and introduce the basic ideas of MPI parallel programming.
Contents
- List of Examples
- Simple MPI Codes
- Point-to-Point Communications
- Passing a message: MPI_Send; MPI_Recv
- Deadlock Situation
- Deadlock Situation – Fixed
- Ring (Blocking Communication)
- Ring (Non-blocking Communication)
- Simple Array Assignment
- Timing an MPI Code
- Pi Calculation
- Matrix Multiplication
- Collective Communications
- Documentation for MPI and Additional Resources
- Acknowledgments
List of Examples
| C | Fortran | C++ | Description |
|---|---|---|---|
| Simple MPI Codes: | | | |
| hello.c | hello.f | hello.cpp | MPI Hello world! |
| hello_from.c | hello_from.f | hello_from.cpp | Hello From..! MPI_COMM_RANK |
| Point-to-Point Communications: | | | |
| sendrecv.c | sendrecv.f | sendrecv.cpp | Passing a message: MPI_Send; MPI_Recv |
| deadlock.c | deadlock.f | deadlock.cpp | Deadlock Situation |
| deadlock_fix.c | deadlock_fix.f | deadlock_fix.cpp | Deadlock Situation – Fixed |
| ring_bl.c | ring_bl.f | Not yet available | Ring (Blocking Communication) |
| ring_nb.c | ring_nb.f | Not yet available | Ring (Non-blocking Communication) |
| array.c | array.f | Not yet available | Simple Array Assignment |
| timing.c | timing.f | Not yet available | MPI Communication Timing Test |
| pical.c | pical.f | Not yet available | Pi Calculation |
| matmul.c | matmul.f | Not yet available | Matrix Multiplication |
| Collective Communications: | | | |
| collectives.c | collectives.f | Not yet available | MPI Collective Communication Functions |
Hello World 1 – MPI Hello world!
- The objective of this exercise is to demonstrate the most fundamental MPI calls.
- Examine the Hello World! program hello.c/hello.f/hello.cpp.
Notice that every process prints Hello World! and that the Hello World! program:
  - Includes a header,
  - Initializes MPI,
  - Prints a Hello World! message, and
  - Finalizes MPI.
- Running this program on 8 processors will assign a rank to each of them.
Then the program will output 8 lines depending on the rank of the process.
Hello World (from masternode)
Hello WORLD!!! (from worker node)
Hello WORLD!!! (from worker node)
Hello WORLD!!! (from worker node)
Hello WORLD!!! (from worker node)
Hello WORLD!!! (from worker node)
Hello WORLD!!! (from worker node)
Hello WORLD!!! (from worker node)
Hello World 2 – Hello From..! MPI_COMM_RANK
- This is just another simple example of how the ranks are assigned and can be
addressed later on in the code.
- Running this program on 8 processors will assign a rank to each of them.
Then the program will output 8 lines depending on the rank of the process.
Hello from 5.
Hello from 1.
Hello from 0.
Hello from 4.
Hello from 2.
Hello from 7.
Hello from 3.
Hello from 6.
Passing a message: MPI_Send; MPI_Recv
- This example performs communication between two processors.
- Processor 0 sends the message Hello, World to processor 1 using the blocking MPI_Send;
processor 1 receives this message using the blocking receive MPI_Recv.
- Running this program on any number of processors (at least two) will output one line similar
to the one below
Node 1 : Hello, World
Deadlock Situation
- This example shows improper use of blocking calls, resulting in deadlock when run
on two nodes.
- All tasks are simply waiting for events that haven’t been initiated.
Deadlock Situation – Fixed
- This program is the solution, showing the use of a non-blocking send to
eliminate the deadlock.
- Running this program on 2 processors will have an output similar to the following
Task 1 has sent the message
Task 0 has sent the message
Task 1 has received the message
Task 0 has received the message
Ring (Blocking Communication)
- This program allows a processor to communicate its rank around a ring.
- The sum of all ranks is then accumulated and printed out by each processor.
- Consider a set of processes arranged in a ring as shown below. Use a
token passing method to compute the sum of the ranks of the processes.
    1
   / \
  0   2
   \ /
    3
Figure 1: Four processors arranged in a ring. Messages are sent
from 0 to 1 to 2 to 3 to 0 again; the sum of the ranks is 6.
- Each processor stores its rank in MPI_COMM_WORLD as an integer
and sends this value to the processor on its right. It then receives an
integer from its left neighbor. It keeps track of the sum of all the integers
received. The processors continue passing on the values they receive until
they get their own rank back. Each process should finish by printing out
the sum of the values. Use synchronous sends MPI_Ssend() (blocking)
or MPI_Issend() (non-blocking) for this program. Watch out for
deadlock situations. If you use non-blocking sends, make sure that you
do not overwrite information. You are asked to use synchronous message
passing because the standard send can be either buffered or synchronous,
and you should learn to program for either possibility.
- Blocking communication is used.
- Running this program on 8 processors will have an output similar to the following
Proc 2 sum = 28
Proc 1 sum = 28
Proc 3 sum = 28
Proc 7 sum = 28
Proc 4 sum = 28
Proc 5 sum = 28
Proc 6 sum = 28
Proc 0 sum = 28
Ring (Non-blocking Communication)
- This program allows a processor to communicate its rank around a ring.
- The sum of all ranks is then accumulated and printed out by each processor.
- Consider a set of processes arranged in a ring as shown below. Use a
token passing method to compute the sum of the ranks of the processes.
    1
   / \
  0   2
   \ /
    3
Figure 1: Four processors arranged in a ring. Messages are sent
from 0 to 1 to 2 to 3 to 0 again; the sum of the ranks is 6.
- Each processor stores its rank in MPI_COMM_WORLD as an integer
and sends this value to the processor on its right. It then receives an
integer from its left neighbor. It keeps track of the sum of all the integers
received. The processors continue passing on the values they receive until
they get their own rank back. Each process should finish by printing out
the sum of the values. Use synchronous sends MPI_Ssend() (blocking)
or MPI_Issend() (non-blocking) for this program. Watch out for
deadlock situations. If you use non-blocking sends, make sure that you
do not overwrite information. You are asked to use synchronous message
passing because the standard send can be either buffered or synchronous,
and you should learn to program for either possibility.
- Non-blocking communication is used.
- Running this program on 8 processors will have an output similar to the following
Proc 6 sum = 28
Proc 2 sum = 28
Proc 3 sum = 28
Proc 0 sum = 28
Proc 4 sum = 28
Proc 5 sum = 28
Proc 1 sum = 28
Proc 7 sum = 28
Simple Array Assignment
- This is a simple array assignment used to demonstrate the distribution
of data among multiple tasks and the communications required to accomplish
that distribution.
- The master distributes an equal portion of
the array to each worker. Each worker receives its portion of the array
and performs a simple value assignment to each of its elements. Each worker
then sends its portion of the array back to the master. As the master receives
a portion of the array from each worker, selected elements are displayed.
- Note: For this example, the number of processes should
be set to an odd number (aprun -n 7), to ensure even distribution of the
array to numtasks-1 worker tasks.
- Running this program on 7 processors will have an output similar to the following
*********** Starting MPI Example 1 ************
MASTER: number of worker tasks will be= 6
Sending to worker task 1
Sending to worker task 2
Sending to worker task 3
Sending to worker task 4
Sending to worker task 5
Sending to worker task 6
---------------------------------------------------
MASTER: Sample results from worker task = 1
result[0]=1.000000
result[100]=101.000000
result[1000]=1001.000000
---------------------------------------------------
MASTER: Sample results from worker task = 2
result[10000]=10001.000000
result[10100]=10101.000000
result[11000]=11001.000000
---------------------------------------------------
MASTER: Sample results from worker task = 3
result[20000]=20001.000000
result[20100]=20101.000000
result[21000]=21001.000000
---------------------------------------------------
MASTER: Sample results from worker task = 4
result[30000]=30001.000000
result[30100]=30101.000000
result[31000]=31001.000000
---------------------------------------------------
MASTER: Sample results from worker task = 5
result[40000]=40001.000000
result[40100]=40101.000000
result[41000]=41001.000000
---------------------------------------------------
MASTER: Sample results from worker task = 6
result[50000]=50001.000000
result[50100]=50101.000000
result[51000]=51001.000000
MASTER: All Done!
MPI Communication Timing Test
- The objective of this exercise is to investigate the amount of time required
for message passing between two processes, i.e. an MPI communication timing test
is performed.
- In this exercise different size messages are sent back and forth
between two processes a number of times. The time is recorded before each
message is sent and after it has been received. The difference
is computed to obtain the actual communication time. Finally, the average
communication time and the bandwidth are calculated and output to the
screen.
- For example, one can run this code on two nodes (one process on each node)
passing messages of length 1, 100, 10,000, and 1,000,000 and record the results
in a table like the one below:
| Length | Communication Time (sec) | Communication Bandwidth (Megabit/sec) |
|---|---|---|
| 1 | 0.000001 | 65.440140 |
| 100 | 0.000002 | 2936.930591 |
| 10,000 | 0.000052 | 12321.465896 |
| 1,000,000 | 0.005133 | 12468.521884 |
Pi Calculation
- This program approximates π by numerically integrating f(x) = 4/(1+x^2) from 0 to 1.
- The area under the curve is divided into rectangles and the rectangles are
distributed to the processors.
- Running this program on 8 processors will have an output similar to the following
Process 2 of 8 on nid03631
Process 3 of 8 on nid03631
Process 0 of 8 on nid03631
Process 1 of 8 on nid03631
pi is approximately 3.1415926544231247, Error is 0.0000000008333316
wall clock time = 0.000421
Process 5 of 8 on nid03632
Process 4 of 8 on nid03632
Process 7 of 8 on nid03632
Process 6 of 8 on nid03632
Matrix Multiplication
- This example is a simple matrix multiplication program: A x B = C.
- Matrix A is copied to every processor. Matrix B is divided into blocks and
distributed among processors.
- The data is distributed among the workers, who
perform the actual multiplication in smaller blocks and send back their
results to the master.
- Running this program on 8 processors will have an output similar to the following
Number of worker tasks = 7
sending 15 rows to task 1
sending 15 rows to task 2
sending 14 rows to task 3
sending 14 rows to task 4
sending 14 rows to task 5
sending 14 rows to task 6
sending 14 rows to task 7
Here are the first 30 rows of the result (C) matrix
0.00 1015.00 2030.00 3045.00 4060.00 5075.00 6090.00
0.00 1120.00 2240.00 3360.00 4480.00 5600.00 6720.00
0.00 1225.00 2450.00 3675.00 4900.00 6125.00 7350.00
0.00 1330.00 2660.00 3990.00 5320.00 6650.00 7980.00
0.00 1435.00 2870.00 4305.00 5740.00 7175.00 8610.00
0.00 1540.00 3080.00 4620.00 6160.00 7700.00 9240.00
0.00 1645.00 3290.00 4935.00 6580.00 8225.00 9870.00
0.00 1750.00 3500.00 5250.00 7000.00 8750.00 10500.00
0.00 1855.00 3710.00 5565.00 7420.00 9275.00 11130.00
0.00 1960.00 3920.00 5880.00 7840.00 9800.00 11760.00
0.00 2065.00 4130.00 6195.00 8260.00 10325.00 12390.00
0.00 2170.00 4340.00 6510.00 8680.00 10850.00 13020.00
0.00 2275.00 4550.00 6825.00 9100.00 11375.00 13650.00
0.00 2380.00 4760.00 7140.00 9520.00 11900.00 14280.00
0.00 2485.00 4970.00 7455.00 9940.00 12425.00 14910.00
0.00 2590.00 5180.00 7770.00 10360.00 12950.00 15540.00
0.00 2695.00 5390.00 8085.00 10780.00 13475.00 16170.00
0.00 2800.00 5600.00 8400.00 11200.00 14000.00 16800.00
0.00 2905.00 5810.00 8715.00 11620.00 14525.00 17430.00
0.00 3010.00 6020.00 9030.00 12040.00 15050.00 18060.00
0.00 3115.00 6230.00 9345.00 12460.00 15575.00 18690.00
0.00 3220.00 6440.00 9660.00 12880.00 16100.00 19320.00
0.00 3325.00 6650.00 9975.00 13300.00 16625.00 19950.00
0.00 3430.00 6860.00 10290.00 13720.00 17150.00 20580.00
0.00 3535.00 7070.00 10605.00 14140.00 17675.00 21210.00
0.00 3640.00 7280.00 10920.00 14560.00 18200.00 21840.00
0.00 3745.00 7490.00 11235.00 14980.00 18725.00 22470.00
0.00 3850.00 7700.00 11550.00 15400.00 19250.00 23100.00
0.00 3955.00 7910.00 11865.00 15820.00 19775.00 23730.00
0.00 4060.00 8120.00 12180.00 16240.00 20300.00 24360.00
MPI Collective Communication Functions
- This program illustrates the functionality of the five basic MPI collective communication routines:
  - MPI_Gather
  - MPI_Allgather
  - MPI_Scatter
  - MPI_Alltoall
  - MPI_Bcast
- Running this program on 4 processors will have an output similar to the following
* This program demonstrates the use of collective MPI functions
* Four processors are to be used for the demo
* Process 1 (of 0,1,2,3) is the designated root
Function Proc Sendbuf Recvbuf
-------- ---- ------- -------
MPI_Gather: 0 a
MPI_Gather: 2 c
MPI_Gather: 1 b a b c d
MPI_Gather: 3 d
MPI_Allgather: 1 b a b c d
MPI_Allgather: 0 a a b c d
MPI_Allgather: 2 c a b c d
MPI_Allgather: 3 d a b c d
MPI_scatter: 1 e f g h f
MPI_scatter: 0 a b c d e
MPI_scatter: 2 i j k l g
MPI_scatter: 3 m n o p h
MPI_Alltoall: 1 e f g h b f j n
MPI_Alltoall: 2 i j k l c g k o
MPI_Alltoall: 0 a b c d a e i m
MPI_Alltoall: 3 m n o p d h l p
MPI_Bcast: 1 b b
MPI_Bcast: 2 b
MPI_Bcast: 0 b
MPI_Bcast: 3 b
Documentation for MPI and Additional Resources
- There are man pages available for MPI which should be
installed in your MANPATH. The following man pages have some introductory
information about MPI.
% man MPI
% man cc
% man ftn
% man qsub
% man MPI_Init
% man MPI_Finalize
- MPI man pages are also available online.
http://www.mcs.anl.gov/mpi/www/
- Main MPI web page at Argonne National Laboratory
http://www-unix.mcs.anl.gov/mpi
- Set of guided exercises
http://www-unix.mcs.anl.gov/mpi/tutorial/mpiexmpl
- MPI Forum home page contains the official copies of the MPI standard.
http://www.mpi-forum.org/
- Books on and about MPI
  - Using MPI, 2nd Edition, by William Gropp, Ewing Lusk, and Anthony Skjellum, published by MIT Press, ISBN 0-262-57132-3.
  The example programs from this book are available at ftp://ftp.mcs.anl.gov/pub/mpi/using/UsingMPI.tar.gz.
  The Table of Contents is also available.
  An errata for the book is available.
  Information on the first edition of Using MPI is also available, including the errata.
  Also of interest may be The LAM companion to “Using MPI…” by Zdzislaw Meglicki (gustav@arp.anu.edu.au).
  - Designing and Building Parallel Programs is Ian Foster’s online book that includes a chapter on MPI. It provides a succinct introduction to an MPI subset. (ISBN 0-201-57594-9; published by Addison-Wesley)
  - MPI: The Complete Reference, by Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra, The MIT Press.
  - MPI: The Complete Reference – 2nd Edition: Volume 2 – The MPI-2 Extensions, by William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir, The MIT Press.
  - Parallel Programming With MPI, by Peter S. Pacheco, published by Morgan Kaufmann.
  - RS/6000 SP: Practical MPI Programming, by Yukiya Aoyama and Jun Nakano (IBM Japan), and available as an IBM Redbook.
  - Supercomputing Simplified: The Bare Necessities for Parallel C Programming with MPI,
  by William B. Levy and Andrew G. Howe, ISBN: 978-0-9802-4210-2. See the website for more information.
Acknowledgments
The original MPI training materials for workstations were developed under the Joint Information Systems Committee (JISC) New Technologies Initiative by the Training and Education Centre at Edinburgh Parallel Computing Centre (EPCC-TEC), University of Edinburgh, United Kingdom.
NCCS and NICS staff at UTK/ORNL and their resources
All contributors to the art and science of parallel computing
National Center for Computational Sciences
