September 2011
Michael
Creel
Universitat Autònoma de Barcelona
You can check
for more recent versions of this document at
http://pelicanhpc.org/Tutorial/PelicanTutorial.html
PelicanHPC is a rapid (~5 minutes, when you know what you're doing) means of setting up a "high performance computing" (HPC) cluster for parallel computing using MPI. This tutorial gives a basic description of what PelicanHPC does, addresses how to use the released CD images to set up a HPC cluster, and gives some basic examples of usage.
PelicanHPC is a distribution of GNU/Linux that runs as a "live
CD" or bootable USB image (it can also be booted from a
hard disk partition, or it can be used
as a virtualized OS). If the ISO image file is put on a CD or USB, it
can then be used to boot a computer. The computer on
which PelicanHPC is booted is referred to as the "frontend
node". It is the computer with which the user interacts. Once
PelicanHPC is running, a script - "pelican_setup" -
may be run. This script configures the frontend node as a netboot
server. After this has been done, other computers can boot copies of
PelicanHPC over the network. These other computers are referred to as
"compute nodes". PelicanHPC configures the cluster made up
of the frontend node and the compute nodes so that MPI-based parallel
computing may be done.
A "live CD" such as
PelicanHPC by default does not use the hard disks of any of the nodes
(except
Linux swap space, if it exists), so it will not destroy or alter your
installed operating system. When the PelicanHPC cluster is shut down,
all of the computers are in their original state, and will boot back
into whatever operating systems are installed on them. PelicanHPC can optionally be made to use hard
disk storage, so that its state can be preserved across boots. It can
be configured to boot without user intervention, with access possible
by ssh. There is
also the possibility of making the compute nodes boot using
wake-on-LAN. With these more advanced optional features, PelicanHPC can
be used to run a headless permanent cluster.
PelicanHPC
is made using Debian GNU/Linux
as its base, through the Debian
Live system. It is made by running a single script using the
command "sh make_pelican-v*".
Customized versions of PelicanHPC, for example, containing additional
packages, can easily be made by modifying the make_pelican script. The
make_pelican script and the needed packages are provided on PelicanHPC,
so you can build a custom image using the provided images. You can also
run make_pelican from any GNU/Linux distro if you install live-build
and a few other packages.
PelicanHPC is a CD image made by running a script (see below). The
script is licensed GPL v3. The resulting CD image contains software
from the Debian distribution of
GNU/Linux, and several other sources, which is subject to the licenses
chosen by the authors of
that software.
This released PelicanHPC CD images are
distributed in the hope that they will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.
The two main commands for administration of the cluster are
"pelican_setup", to configure the frontend as a server, NFS
export /home, etc., and "pelican_restarthpc", which is used
to add/remove nodes after the initial setup. The rest of this
explains how this works.
The frontend and all compute nodes
must be networked together. IMPORTANT: the frontend node will act as
a DHCP server, so be sure to isolate the network used for the cluster
from other networks, to avoid conflicts with other DHCP servers. If
you start handing out IP addresses to your co-workers' computers,
they may become annoyed. If the frontend node has multiple
network interfaces, you can use one to connect to the cluster and
another to connect to the Internet.
Put the CD in the computer
that will be the frontend, and turn it on. Make sure the BIOS setup
lets you boot from CD. When you boot up, you'll see something like
the following. Here, if you press <Tab>, you have the opportunity
to enter options to set
keyboard mappings, or special "cheatcodes" to make the CD
boot on problematic hardware. For example, I can get a Spanish keyboard
by pressing <Tab> and then adding "keyb=es" to the default
configuration. Either explore the options by highlighting the Help line
and pressing <Enter>, or edit options by pressing <Tab>, or
just press
<Enter> to boot up using default settings.

Once you boot up, eventually you see:

This
screen gives you the opportunity to use a permanent storage device
for the /home directory of the PelicanHPC user. By default, if you
just press <Enter>, hard disks are not used, and PelicanHPC
does not alter any of the computers used in the cluster. This is safe
and easy, but it has the disadvantage that any work you do disappears
when you shut down the cluster. To use permanent storage, you can
type in the name of a device (hard disk partition, USB drive, etc.)
that has a formatted ext2 or ext3 partition, which will be mounted at
/home. For example, you could replace "ram1" with "sda2"
or "hdb5" (no quotes). If you do this, a directory "user"
will be created at the root of the specified device, and will be used
as the home directory of the cluster user (username "user").
If you shut down the cluster, the directory will not be removed, and
it can be re-used when you restart PelicanHPC. If you have any doubts
about this, just press <Enter>. For casual experimentation, you
do not need this feature. This feature is provided as a convenience
for advanced users. It is impossible to test this feature on all
possible hardware configurations, so NO GUARANTEES ARE MADE THAT IT
WILL NOT DESTROY YOUR HARD DISK. Back up your data before trying
anything but the default.
IMPORTANT NOTE: there is another way to use permanent storage that
is quite flexible. This is documented in the file
/home/user/pelican_config, which you can see if you boot using the
default. If this is your first experience with PelicanHPC, I recommend
doing a default boot, study pelican_config, and then choose the option
for permanent storage that you find most appropriate.
Next, you will see

You will probably want to choose "yes", unless you are
re-using work you saved in a previous session.
Next, you are
prompted to change the default password:

You
should backspace to remove the default and then type in a new
password. This will be the password for user "user" on the
frontend node and on all of the compute nodes, too.
Finally, you are all booted up and the login prompt appears:

Enter the username "user" and then the password that you set a moment
ago.
Now you are logged in:


After
you choose the net device, services need to be started. Please read the
warning in the following screenshot. Setting up a PelicanHPC dhcp
server will get you in trouble with your network administrators if you
do this on an open network. You should make sure that the network
device used for the cluster is isolated from all networks except the
cluster. When you see
the following screen, choose "yes".

Next
you will see

Press
enter, and go turn on the compute nodes.
When a compute node
starts to netboot, you'll see this whiz by:

When
a compute node is done booting, you'll see this, supposing that it
has a monitor:

Back
on the frontend node, you see the following:

Once
a node has booted up, the count goes up:

Keep
hitting "no" until all of your compute nodes have booted
up. Once you click yes, you'll see something like the following,
depending on how many nodes you have.

Finally, a quick test of the cluster is run. You should see something
like the following:

OK,
that's it, the cluster is ready to use. Some other tips:
you can add software to the frontend node using "apt-get install whatever", supposing that the frontend has a second net card that you have configured to enable Internet access. This software is not available on the compute nodes. To add software so that it is available to all the nodes, it should be installed somewhere in /home/user.
the default MPI setup is in the file /home/user/tmp/bhosts. This assigns ranks to hosts in a round robin fashion. If your hosts have different speeds, numbers of cores, etc., you should modify this file. If the frontend node is virtual but the compute nodes are real, you should probably remove the frontend node from the calculations.
ksysguard is available, and a small amount of effort will turn it into a nice cluster monitor. See this post for general information on how to do it.
if you need other packages, then you can make your own version pretty easily using the make_pelican script that is available on the PelicanHPC homepage. This is explained (somewhat) below.
You can resize the cluster (add or remove compute nodes) whenever you like, by running "pelican_restarthpc".
IMPORTANT: In the
/home/user directory is the file pelican_config. This file
contains switches for advanced options that allow features such as use
of permanent storage, booting without user intervention, changing the
network of the cluster, wake-on-LAN, etc. Casual users do not need to
explore this, but people who want a permanent cluster should look at
it. It is self-documented.
PelicanHPC has the Linpack
HPL benchmark, and some extensive examples from the field
of econometrics that use GNU
Octave.
Econometrics is a field of study that applies statistical
methods to economic models. The software is in the Econometrics
directory:

There
is a document "econometrics.pdf" that has a lot of
information, including some about parallel computing:

Open
a terminal, type "octave" and then "kernel_example"
(please note that underscore back there, ... music please, maestro):

et
viola! some nice pictures:

That last picture screenshot shows the output of kernel_example.m if it
is run serially, on a single core. To see how to run it in parallel,
see the next shot. NOTE: the kernel
routines do no computations on rank 0 (it is used to gather the
results), so you must specify at least 2 MPI ranks.

Other
things to try are "bfgsmin_example",
"mle_example", "gmm_example", "mc_example" and a few others I'm
forgetting about. To
find where the code is, type "help mc_example", for
example, while in Octave. Then, go edit the relevant file to
learn
more about what it does. Or, while in Octave, type "edit
bfgsmin_example" (or edit whatever you like) and the file found in
Octave's path will open up in the vim editor.
Return to
contents
By default, PelicanHPC images put /home/user on a ramdisk which
disappears when you shut down. You need to save your work between
sessions, if you want to re-use it. There are many options, such as
mounting a hard disk, using a USB device, etc. If you have an
Internet connection configured, you can email it to yourself, as is
illustrated in the next shot:

If
you use PelicanHPC for serious work, it is very convenient to mount a
storage device to use as /home, so that your work will be saved
between sessions without taking any special steps. When you boot up
the frontend node, you have the option to select a storage device to
use. This is a feature for advanced users, and I strongly advise that
you dedicate a hard disk partition for use with
PelicanHPC. If you use a partition with other data on it, you should
make sure to back it up before using it with PelicanHPC! Only ext2 and
ext3 formats are known to work. This
feature has been tested using a very limited set of hardware, so use it
with caution. There is also the option to automatically mount a volume
that has a special name. See pelican_config in /home/user. This is the
best solution for users who want to use PelicanHPC on a long term basis.
Return
to contents
The distributed ISO images provide a bare bones cluster setup system, plus some packages that I use in my research and teaching. There are a few examples taken from my work, which may be of interest to those learning the basics of MPI, or to people interested in econometrics. However, many users will find that Pelican does not contain packages that they need. If one uses pelican_config properly, it is possible to give all nodes of the cluster internet access through the connection of the frontend node, so packages can be simply added using "apt-get". Nevertheless, some users will prefer to have a custom version of the CD image. PelicanHPC is made by running a single script "make_pelican" (with a version number appended), which is available on the download page, and also on the released images. If you have the prerequisites for running the script, it is very easy to make a customized version of Pelican. The prerequisites are installed on PelicanHPC, so you can build a custom version using the released version. The prerequisites are:
an installed version of GNU/Linux.
This can be a minimal installation in
a chroot jail or a virtual machine running under Linux, Windows or
MacOS. If you use a virtual machine, make sure to allocate several GB
of disk space.
the live-build package, version 2.x. Use the version available in Debian Stable: get it at http://packages.debian.org/squeeze/all/live-build/download. It is available as a .deb, and also as source code for use with other distributions. You also need the debootstrap, wget and rsync packages.