SPDCP


SPDCP is a parallel, Lustre-aware copy tool for copying large datasets between Lustre filesystems.

General-purpose utilities such as cp and mv are not Lustre-aware and do not maintain Lustre attributes.

Because spdcp is Lustre-aware, it can transfer directory trees while preserving the Lustre attributes of files and directories.

Availability


SPDCP is currently available through the spdcp module (module load spdcp) on the XT4 and XT5 partitions of the NCCS Cray XT, Jaguar.


Note: The tool should not be used to transfer data from non-Lustre filesystems. Transfers between non-Lustre filesystems will fail.
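
Since spdcp is intended only for Lustre filesystems, it can help to check the filesystem type of the paths involved before starting a copy. The following is a minimal sketch and not part of spdcp. It assumes GNU coreutils stat is available and reports Lustre mounts as "lustre"; the function names fs_type_of, is_lustre_type, and check_lustre_paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of spdcp).
# Assumption: GNU coreutils `stat -f -c %T` prints "lustre" for Lustre mounts.

# Print the filesystem type name of the filesystem containing path $1.
fs_type_of() {
    stat -f -c %T "$1"
}

# Succeed only if the given type name is "lustre".
is_lustre_type() {
    [ "$1" = "lustre" ]
}

# Succeed only if every path given lives on a Lustre filesystem.
check_lustre_paths() {
    for path in "$@"; do
        if ! is_lustre_type "$(fs_type_of "$path")"; then
            echo "not on Lustre: $path" >&2
            return 1
        fi
    done
    return 0
}
```

For example, check_lustre_paths srcdir destparent (both placeholder paths that must already exist) returns non-zero if either path is outside Lustre, so it can guard an spdcp invocation with &&.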


Use


spdcp <options> source destination

spdcp uses the system’s compute processors to copy a dataset in parallel. It can obtain compute processors in any of the following ways:

Command Line
spdcp can be executed from the command line of a login node. In this case, spdcp submits a batch script to allocate the required number of compute nodes. See the Command Line example under Examples.

Batch Script
spdcp can be executed from within a batch script. In this case, the tool uses the job’s allocated resources to copy the dataset. See the Batch Script example under Examples.

Interactive Batch Job
spdcp can be executed from within an interactive batch job. In this case, the tool uses the job’s allocated resources to copy the dataset. See the Interactive Batch Job example under Examples.




Common Options



Option          Description
-A <accountID>  If submitting a batch job, charge the job time to <accountID>. The account string XXXYYY is typically composed of three letters followed by three digits, optionally followed by a subproject identifier. The utility showproj can be used to list your valid assigned project ID(s). This option is required by all jobs.
-k              If submitting a batch job, store the batch job’s standard output and error in the .spdcp directory in the submitting user’s home directory.
-q <queue>      If submitting a batch job, submit the job to the given queue.
-R              Recursive copy.
-s <clients>    Number of parallel clients to use for the copy. If submitting a batch job, request the specified number of cores. Under production load, we recommend using one core per OST.
-v              Add verbosity.
-w <seconds>    If submitting a batch job, request the given number of seconds as the maximum wall-clock time. Note: Specifying a walltime outside the batch limits will result in a segmentation fault.

More information can be found in the spdcp man page (man spdcp).
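
The recommendation of one core per OST means a sensible value for -s depends on how many OSTs back the filesystem in question. As a rough sketch, and assuming the Lustre lfs df command is available and tags each OST line of its output with [OST:n], the count could be obtained as follows. The function name count_osts and the path /lustre/scratch are placeholders, not part of spdcp.

```shell
#!/bin/sh
# Hypothetical helper (not part of spdcp).
# Assumption: each OST line printed by `lfs df` carries a tag like "[OST:12]".

# Read `lfs df` output on stdin and print the number of OST lines.
count_osts() {
    grep -c '\[OST:[0-9][0-9]*\]'
}

# Example use (placeholder path):
#   clients=$(lfs df /lustre/scratch | count_osts)
#   spdcp -R -s "$clients" srcdir destdir
```

Reading from standard input keeps the helper independent of where the lfs df output comes from, so the count can be checked by eye before it is passed to -s.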


Examples


Command Line

The following will transfer srcdir to destdir using 8 cores.

  • A batch script will be submitted to the debug queue requesting 8 cores with a maximum walltime of 15 minutes against project ABC123.
  • With the -v option, spdcp prints the batch submission command line and polls the batch system, returning once the job exits the queue.
  • The -k option sends the batch job’s standard output and error to the .spdcp directory in the submitting user’s home directory.
> spdcp -v -k -R -s 8 -w 900 -A ABC123 -q debug srcdir destdir
qsub -V -Nstaging -joe -lwalltime=900 -lsize=8 -A ABC123 -q debug /tmp/pbs_scriptOmuk2o
Polling:  648646.nid03588
...
Polling:  648646.nid03588
Finished: 648646.nid03588
>
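
The -w value in the command above is the 15-minute limit expressed in seconds (15 × 60 = 900). For other limits, a small helper can do the conversion. This is an illustrative sketch only; hms_to_seconds is a hypothetical name and not an spdcp feature.

```shell
#!/bin/sh
# Hypothetical helper (not part of spdcp): convert H:M:S to seconds for -w.

hms_to_seconds() {
    # awk treats fields such as "08" as decimal, so leading zeros are safe.
    printf '%s\n' "$1" | awk -F: '
        NF != 3 { exit 1 }                       # require exactly H:M:S
        { print $1 * 3600 + $2 * 60 + $3 }'
}

# Example use:
#   hms_to_seconds 00:15:00    # prints 900
#   spdcp -R -s 8 -w "$(hms_to_seconds 00:15:00)" -A ABC123 -q debug srcdir destdir
```

The result still has to fall within the batch limits of the chosen queue, since an out-of-range walltime causes spdcp to fail as noted under Common Options.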


Batch Script

The following batch script will transfer srcdir to destdir using 64 cores:

#!/bin/csh
#PBS -A abc123
#PBS -N SPDCPtst
#PBS -j oe
#PBS -l walltime=1:00:00,size=64

source $MODULESHOME/init/csh
module load spdcp

spdcp -R -s64 srcdir destdir


Interactive Batch Job

The following example launches an interactive batch job and transfers srcdir to destdir using the allocated 128 cores:

>  qsub -I -lsize=128 -Aabc123 -lwalltime=01:00:00
qsub: waiting for job 648618.nid03588 to start
qsub: job 648618.nid03588 ready

> module load spdcp
> spdcp -R -s128 srcdir destdir
