Accessing the Compute Nodes
Pantarhei uses the Slurm Workload Manager for job scheduling and management. With Slurm, a user requests resources and submits a job to a queue. The system then pulls jobs from the queues, allocates the required compute nodes, and runs the jobs. Users typically reach the Slurm scheduler by SSH-ing to a Pantarhei login node, but compute-intensive work should be submitted through Slurm as a job rather than run directly on a login node. Because all users share the login nodes, running anything beyond minimal test jobs there degrades everyone's ability to use Pantarhei resources.
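As a minimal sketch of that workflow, the commands below write a small batch script and show, in comments, how it could be submitted from a login node. The queue name `normal`, the job name, and the file name `hello_job.sh` are placeholders chosen for illustration, not names taken from Pantarhei's configuration; substitute the queue and limits that apply to your allocation.

```shell
# Write a minimal batch script. "normal" is a placeholder queue name
# (an assumption); replace it with a real Pantarhei queue.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --output=%x-%j.out

# The work itself runs on the allocated compute node, not the login node.
srun hostname
EOF

# After editing the placeholders, submit and monitor from a login node:
#   sbatch hello_job.sh
#   squeue -u "$USER"
```

The `#SBATCH` lines carry the resource request (nodes, tasks, walltime), so the scheduler can place the job without any compute running on the login node. The `%x` and `%j` patterns in the output file name expand to the job name and job ID.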
Pantarhei is designed to meet the moderate-scale computational and data needs of most CIROH users. Users with allocations can submit jobs to a variety of queues, each with its own job size and walltime limits. Separate sets of queues serve the CPU, GPU, and FPGA nodes; queues with shorter walltime and smaller job size limits typically offer faster turnaround. A few further points about Pantarhei queues:
- Pantarhei supports shared jobs, in which multiple jobs can run on a single node. Sharing nodes improves job throughput, raises overall system utilization, and gives more users access to Pantarhei resources.
- Pantarhei supports long-running jobs: run times can be extended up to seven days for jobs that use up to 6 full nodes.
- The maximum job size on Pantarhei is 240 cores. To run a larger job, open a consulting ticket to discuss it with Pantarhei support staff.
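The limits in the list above can be checked before submitting. The sketch below is a hypothetical helper, `check_request`, not a Pantarhei or Slurm tool. It uses only the figures quoted above (240 cores, 6 nodes, seven days) and assumes the 6-node figure applies to every job, which the list does not state outright. Individual queues may impose tighter limits.

```shell
# Hypothetical pre-submission check against the limits quoted above.
# Arguments: total cores, node count, walltime in hours.
check_request() {
  local cores=$1 nodes=$2 hours=$3
  local max_cores=240 max_nodes=6 max_hours=168   # 168 h = 7 days

  if [ "$cores" -gt "$max_cores" ]; then
    echo "too large: $cores cores exceeds $max_cores; open a consulting ticket"
    return 1
  fi
  # Assumption: the node and walltime ceilings apply together to all jobs.
  if [ "$nodes" -gt "$max_nodes" ] || [ "$hours" -gt "$max_hours" ]; then
    echo "outside limits: at most $max_nodes nodes and $max_hours hours"
    return 1
  fi
  echo "ok: $cores cores on $nodes node(s) for $hours h"
}
```

For example, `check_request 40 1 24` passes, while `check_request 241 6 24` fails and points to the consulting ticket route.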