Redwood User Guide
Redwood is an HPC cluster of the New CHPC Protected Environment, designed and built for researchers whose data contains Protected Health Information (PHI) and other sensitive data. This cluster is considered HIPAA Compliant.
Redwood general cluster hardware overview
- 4 Intel XeonSP (Skylake) nodes each with 32 cores and 192GB RAM (128 total cores)
- 9 Intel Broadwell nodes each with 28 cores and 128GB RAM (252 total cores)
- 2 AMD Epyc (Rome) nodes each with 64 cores and 512 GB RAM (128 total cores)
- 1 Intel Xeon (X7560) node with 32 cores and 1 TB RAM
- 2 GPU nodes, each with:
  - 32 cores (Intel XeonSP), 192GB RAM
  - 4 x GTX1080Ti GPUs
- Mellanox EDR InfiniBand interconnect (with the pre-Skylake generation nodes connected at FDR)
- Gigabit Ethernet interconnect for management
- 300TB General Scratch server (/scratch/general/pe-nfs1)
- 2 interactive nodes
In addition to the general resources, there are owner nodes. Owner nodes can be accessed by all users of Redwood in a preemptable guest manner using the partition redwood-guest and account owner-guest.
Redwood-shared-short nodes
Added February 2022: Two of the Intel Broadwell general nodes were repurposed as redwood-shared-short nodes. Each node has 28 physical cores and 128 GB of memory.
These nodes are available to all users, regardless of whether they have access to a general allocation; use of these nodes does not count against any allocation. To use them, set both the partition and the account to redwood-shared-short. Because node sharing is in use, users MUST specify the number of cores and the amount of memory; see https://www.chpc.utah.edu/documentation/software/node-sharing.php for additional details.
In order to maximize throughput of short jobs, and provide access to all users, they have been placed in a separate partition, with node sharing enabled. Use of these nodes is limited:
- Maximum wall time is 8 hours
- Maximum running jobs per user is 2
- Maximum cores per user is 8
- Maximum memory per user is 32 GB
- Maximum cores per job is 8
- Maximum memory per job is 32 GB
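As a minimal sketch of a job that stays within these limits, the script below requests 4 cores and 16 GB of memory for 2 hours. The --ntasks and --mem options are standard Slurm options; the specific core count, memory, and walltime values are illustrative assumptions, and my_program is a hypothetical placeholder for your own executable.

```shell
#!/bin/bash
#SBATCH --partition=redwood-shared-short   # partition and account use the same name
#SBATCH --account=redwood-shared-short
#SBATCH --time=02:00:00                    # must not exceed the 8 hour limit
#SBATCH --ntasks=4                         # cores requested (limit: 8)
#SBATCH --mem=16G                          # memory requested (limit: 32 GB)

# Slurm sets SLURM_NTASKS inside a job; the fallback of 4 applies only when
# this script is run by hand outside of Slurm.
NTASKS=${SLURM_NTASKS:-4}
echo "Running with ${NTASKS} task(s)"

# Replace the line below with your own program (hypothetical name).
# ./my_program
```

Because both the core and memory requests are explicit, Slurm can place other jobs on the remaining cores of the same node.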
Important Differences between Redwood and Other CHPC Clusters
The design of the protected environment is fundamentally different from that of the other CHPC clusters. The general clusters (ember, kingspeak, lonepeak etc.) were designed to be open (within reason, taking a balanced approach on security). Redwood was designed primarily to mitigate risk and protect data. If you have used the general clusters, the first thing you will notice when you log in is that your Redwood home directory is on a completely separate file system, whereas your other CHPC home directory is shared across all of the general, more open clusters.
FAQ section
Please refer to the Protected Environment Frequently Asked Questions (FAQ).
Redwood Cluster Usage
CHPC resources are available to qualified faculty, students (under faculty supervision), and researchers from any Utah institution of higher education. Users can request accounts for CHPC computer systems by filling out the account request form.
As Redwood is part of the CHPC Protected Environment, users must also have permission to access it. See the Protected Environment Frequently Asked Questions (FAQ) for how to apply and qualify for access.
Redwood will be using allocations; see the Allocation section on our Protected Environment page for details.
Redwood Cluster Access and Environment
As part of the CHPC protected environment, redwood requires that you undergo authentication more rigorous than is needed for the other CHPC clusters. See the Protected Environment Frequently Asked Questions (FAQ).
Once you are able to connect to the Protected Environment, the Redwood cluster can be accessed via ssh (secure shell) at the following address:
- redwood.chpc.utah.edu (general PE users; round robins between redwood1.chpc.utah.edu and redwood2.chpc.utah.edu)
Redwood does not mount the same account directory as the unprotected clusters do. If you have files on your regular CHPC account that you wish to use on redwood, you must copy them using a secure protocol such as scp.
You will have access to a project directory on the mammoth file system. This is the location for storage of data common to a project. All PE users who are vetted by the Institutional Review Board or other relevant authority as having rights to access it will be able to work with the data in this directory. Project members may create files there or modify them, as well.
Redwood compute nodes mount the following scratch file systems:
- /scratch/general/pe-nfs1
- /scratch/ucgd/lustre (group specific access)
As a reminder, the non-restricted scratch file systems are automatically scrubbed of files that have not been accessed for 60 days.
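One way to see which of your scratch files are approaching that limit is find with its -atime test. The sketch below is illustrative only: list_stale_files is a hypothetical helper, the 50 day default is an arbitrary early-warning threshold, and the per-user directory layout is an assumption taken from the batch script example in this guide.

```shell
#!/bin/bash
# List regular files under a directory whose last access time is more than
# a given number of days in the past (default 50, chosen here as an early
# warning ahead of the 60 day scrub).
# Usage: list_stale_files <directory> [days]
list_stale_files() {
    local dir=$1 days=${2:-50}
    find "$dir" -type f -atime +"$days" -print
}

# Example call, assuming your scratch files live under a directory named
# after your user ID:
# list_stale_files "/scratch/general/pe-nfs1/$USER"
```

find -atime counts whole 24-hour periods, so treat the result as approximate rather than as the exact list the scrubber will remove.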
At the present time, the CHPC supports two types of shells: tcsh and bash. Tcsh shell users need to select the .tcshrc login script. Users whose shell is bash need the .bashrc file to log in.
Your environment is set up through the use of modules. Please see the User Environment section of the General Cluster Information page for details on setting up your environment for batch and other applications.
Using the Batch System on Redwood
The batch implementation on all CHPC systems is Slurm.
The creation of a batch script on the redwood cluster
A shell script is a bundle of shell commands which are fed one after another to a shell (bash, tcsh, ...). As soon as the first command has successfully finished, the second command is executed. This process continues until either an error occurs or the whole list of individual shell commands has been executed. A batch script is a shell script which defines the tasks a particular job has to execute on a cluster.
A batch script example for running in Slurm on the Redwood cluster is shown further below. The lines at the top of the file begin with #SBATCH; the shell treats them as comments, but Slurm reads them as job options.
For jobs with heavy I/O it is recommended to move input files to the scratch directory (whether /scratch/local/ or /scratch/general/pe-nfs1) and then cd to that directory, run the job, and then move output back to your project space. That way all of the I/O of the program - both the reading of the input data and the output of any files, whether they are temporary or your results - is done on the scratch file system.
Note that the top level /scratch/local is not writeable by users. Instead, a directory /scratch/local/$USER/$SLURM_JOB_ID is created in the prolog of the Slurm job. This directory is removed in the Slurm epilog, after the job exits the node.
Example Slurm Script for Redwood that does this:
#!/bin/csh
#SBATCH --time=1:00:00 # walltime, abbreviated by -t
#SBATCH --nodes=2 # number of cluster nodes, abbreviated by -N
#SBATCH -o slurm-%j.out-%N # name of the stdout, using the job number (%j) and the first node (%N)
#SBATCH --ntasks=16 # number of MPI tasks, abbreviated by -n
# additional information for allocated clusters
#SBATCH --account=baggins # account - abbreviated by -A
#SBATCH --partition=redwood # partition, abbreviated by -p
#
# set data and working directories
setenv WORKDIR <projectspace>
setenv SCRDIR /scratch/general/pe-nfs1/$USER/$SLURM_JOB_ID
# make SCRDIR and copy over input files
mkdir -p $SCRDIR
cp -r $WORKDIR/* $SCRDIR
cd $SCRDIR
# load appropriate modules, in this case Intel compilers, MPICH2
module load intel mpich2
# for MPICH2 over Ethernet, set communication method to TCP - for general lonepeak nodes
# see above for network interface selection options for other MPI distributions
setenv MPICH_NEMESIS_NETMOD tcp
# run the program
# see above for other MPI distributions
mpirun -np $SLURM_NTASKS my_mpi_program > my_program.out
# copy output files back to working directory/project space
cp $SCRDIR/outputfile $WORKDIR/.
cd $WORKDIR
rm -rf $SCRDIR
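For bash users, the same stage-in, run, stage-out pattern can be sketched as follows. This is an illustrative sketch rather than a CHPC-provided script: run_in_scratch is a hypothetical helper, WORKDIR must be set to your own project space, my_program is a placeholder executable, and the account and partition names are copied from the example above. For simplicity it runs a single task and copies the whole scratch directory back, inputs included.

```shell
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --account=baggins
#SBATCH --partition=redwood

# Copy inputs to scratch, run a command there, copy results back, clean up.
# Usage: run_in_scratch <work dir> <scratch dir> <command> [args...]
run_in_scratch() {
    local workdir=$1 scrdir=$2
    shift 2
    mkdir -p "$scrdir" || return 1
    cp -r "$workdir"/. "$scrdir"/ || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$scrdir" && "$@" )
    local status=$?
    # Copy everything back to the work directory, then remove the scratch copy.
    cp -r "$scrdir"/. "$workdir"/
    rm -rf "$scrdir"
    return "$status"
}

# Only stage and run when executing inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    # WORKDIR is your project space; my_program is a hypothetical executable.
    run_in_scratch "${WORKDIR:?set WORKDIR to your project space}" \
        "/scratch/general/pe-nfs1/$USER/$SLURM_JOB_ID" ./my_program
fi
```

Returning the command's exit status from the helper means a failed run still reports as a failed job, even though the copy-back and cleanup steps succeed.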
For more details and example scripts, please see our Slurm documentation. For help with specifying your job and the instructions in your Slurm script, please also review CHPC Policy 2.3.1 Redwood Job Scheduling Policy.
Job Submission on Redwood
In order to submit a job on Redwood, you must first log in to an interactive node (see above).
To submit a script named slurmjob, just type:
sbatch slurmjob
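Checking the status of your job in Slurm
To check the status of your job, use the "squeue" command (the "sinfo" command reports the state of partitions and nodes rather than of individual jobs):
squeue -u $USER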
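The submit and check steps can be combined in a small bash helper. This is a sketch under stated assumptions: submit_and_show is a hypothetical function name, while sbatch --parsable (print only the job ID, followed by ";" and the cluster name on multi-cluster setups) and squeue -j (show a specific job) are standard Slurm options.

```shell
#!/bin/bash
# Submit a batch script, report the job ID, and show that job in the queue.
# Usage: submit_and_show <batch script>
submit_and_show() {
    local script=$1 jobid
    jobid=$(sbatch --parsable "$script") || return 1
    # Strip the optional ";cluster" suffix that --parsable may append.
    jobid=${jobid%%;*}
    echo "Submitted job $jobid"
    squeue -j "$jobid"
}

# Example call on an interactive node, using the script name from above:
# submit_and_show slurmjob
```

Capturing the job ID at submission time avoids having to search the full queue listing for the job afterwards.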
For information on compiling on the clusters at CHPC, please see our Programming Guide.