How to Manage Users and Queues on an HPC Cluster

Russell Smith

Efficient management of users and queues is critical for maximizing HPC cluster utilization and distributing resources fairly. The steps below assume a Linux cluster that uses Slurm as its job scheduler:

Step 1: Managing Users

  • Add Users:
  • sudo useradd -m username
  • sudo passwd username
  • Assign Permissions and Groups:
  • sudo usermod -aG hpcusers username
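
The two commands above can be wrapped in a small helper so that onboarding is repeatable. This is a minimal sketch, not a definitive implementation: the function names onboard_user and run and the DRY_RUN variable are hypothetical, and by default the script only prints the commands it would run. Setting the password with passwd is left as a manual step because it is interactive.

```shell
#!/bin/sh
# Sketch: create a user and add them to the cluster group.
# DRY_RUN=1 (the default here) only prints commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: onboard_user USERNAME [GROUP]
onboard_user() {
    user="$1"
    group="${2:-hpcusers}"   # group name taken from the step above
    if [ -z "$user" ]; then
        echo "usage: onboard_user USERNAME [GROUP]" >&2
        return 1
    fi
    # Create the account only if it does not already exist.
    if id "$user" >/dev/null 2>&1; then
        echo "user $user already exists, skipping useradd" >&2
    else
        run sudo useradd -m "$user"
    fi
    # -a appends; without it, -G would replace the user's other supplementary groups.
    run sudo usermod -aG "$group" "$user"
}

# Example: onboard_user alice
```

Reviewing the dry-run output before setting DRY_RUN=0 helps catch typos in usernames before any account is created.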

Step 2: Setting Up User Quotas

  • Configure disk quotas on shared file systems (block limits are counted in 1 KiB blocks by default, and a limit of 0 means no limit):
  • sudo setquota -u username block_soft block_hard inode_soft inode_hard /filesystem
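
Because setquota counts block limits in 1 KiB blocks by default, it helps to compute the numbers rather than type them by hand. The sketch below assumes the file system uses standard Linux quotas; the helper name gib_to_blocks and the limits (50 GiB soft, 55 GiB hard, 1,000,000 and 1,100,000 inodes) are illustrative assumptions, not recommendations.

```shell
#!/bin/sh
# Hypothetical helper: convert GiB to the 1 KiB blocks that setquota uses by default.
gib_to_blocks() {
    echo $(( $1 * 1024 * 1024 ))
}

# Illustrative limits only.
soft=$(gib_to_blocks 50)
hard=$(gib_to_blocks 55)

# The command is printed rather than executed so it can be reviewed first.
echo "sudo setquota -u username $soft $hard 1000000 1100000 /filesystem"
# prints: sudo setquota -u username 52428800 57671680 1000000 1100000 /filesystem
```

After applying a quota, sudo quota -u username or sudo repquota -s /filesystem shows the limits and current usage.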

Step 3: Job Scheduler Queues

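  • Create Queues (Partitions in Slurm). Partitions are created with scontrol or defined in slurm.conf, not with sacctmgr. To make a partition permanent, add a matching PartitionName=... line to slurm.conf:
  • sudo scontrol create PartitionName=partition_name Nodes=node_list MaxTime=24:00:00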
  • List Existing Queues:
  • scontrol show partitions
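
The full scontrol output is verbose, so a short filter can reduce it to the fields of interest. The sketch below assumes the one-record-per-line output of scontrol -o, in which each record is a series of Key=Value pairs; the function name partition_limits is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read one-line partition records on stdin and
# print "name max_time" for each partition.
partition_limits() {
    awk '{
        name = ""; maxtime = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^PartitionName=/)  name = substr($i, 15)     # skip "PartitionName="
            else if ($i ~ /^MaxTime=/)   maxtime = substr($i, 9)   # skip "MaxTime="
        }
        if (name != "") print name, maxtime
    }'
}

# Only query the scheduler when scontrol is installed on this host.
if command -v scontrol >/dev/null 2>&1; then
    scontrol -o show partition | partition_limits
fi
```

For a quick overview without any parsing, sinfo -o "%P %a %l %D %N" lists each partition with its availability, time limit, node count, and node names.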

Step 4: Assigning Users to Queues

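  • Limit queue access. Access is controlled by the partition's AllowGroups or AllowAccounts setting, for example restricting a partition to the hpcusers group:
  • sudo scontrol update PartitionName=partition_name AllowGroups=hpcusers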
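
When partition access is restricted by Unix group (Slurm's AllowGroups partition setting), it is useful to check whether a given user would qualify before they submit a job. The sketch below is one way to do that under stated assumptions: the function name groups_allow is hypothetical, the first argument is a space-separated group list such as the output of id -nG username, and the second is a comma-separated allow list.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if any of the user's groups is in the allow list.
groups_allow() {
    user_groups="$1"   # space-separated, e.g. the output of: id -nG username
    allow_list="$2"    # comma-separated, e.g. the partition's AllowGroups value
    # A value of ALL means every group is permitted.
    [ "$allow_list" = "ALL" ] && return 0
    for g in $user_groups; do
        # Wrap both sides in commas so only whole group names match.
        case ",$allow_list," in
            *",$g,"*) return 0 ;;
        esac
    done
    return 1
}

# Example:
#   if groups_allow "$(id -nG username)" "hpcusers"; then echo "access allowed"; fi
```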

Step 5: Queue Management and Priority

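  • Set queue priorities. Priority is a partition attribute and is changed with scontrol, not sacctmgr; jobs in a partition with a higher PriorityTier are considered for scheduling first:
  • sudo scontrol update PartitionName=partition_name PriorityTier=100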
  • Adjust the priority of an individual pending job (raising a priority requires Slurm administrator privileges):
  • scontrol update JobId=<jobid> Priority=2000
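
Before changing a job's priority, it helps to see where it currently ranks among pending jobs. The sketch below sorts squeue output by the integer priority field (%Q); the function name top_pending is hypothetical, and it expects lines of the form "PRIORITY JOBID USER PARTITION" on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print the N highest-priority lines (default 10)
# from "PRIORITY JOBID USER PARTITION" input.
top_pending() {
    sort -k1,1nr | head -n "${1:-10}"
}

# On a cluster, feed it pending jobs from squeue:
#   squeue -h -t PENDING -o "%Q %i %u %P" | top_pending 5
```

The sprio command gives a per-factor breakdown of how each pending job's priority was computed when the multifactor priority plugin is in use.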

Step 6: Monitoring Users and Queue Usage

  • Monitor queue activity and job status:
  • squeue
  • squeue -u username
  • Check user usage and resource allocation:
  • sacct -u username
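
The raw squeue listing grows long on a busy cluster, so a per-user summary is often easier to read. The sketch below counts jobs by user and state; the function name jobs_by_user_state is hypothetical, and it expects "USER STATE" lines on stdin, which squeue produces with the %u and %T format fields.

```shell
#!/bin/sh
# Hypothetical helper: count jobs per (user, state) pair from "USER STATE" lines.
jobs_by_user_state() {
    awk '{ count[$1 " " $2]++ }
         END { for (k in count) print k, count[k] }' | sort
}

# On a cluster:
#   squeue -h -o "%u %T" | jobs_by_user_state
```

For historical usage rather than the live queue, sacct accepts a start time and a field list, for example sacct -u username -S 2024-01-01 --format=JobID,JobName,Partition,State,Elapsed (the date is illustrative).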

Step 7: Removing or Disabling Users

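  • Disable user accounts (-L locks the password only; -e 1 also expires the account so that SSH key logins are refused):
  • sudo usermod -L -e 1 username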
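  • Delete user accounts (-r also removes the user's home directory and mail spool, so archive any data that must be kept first):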
  • sudo userdel -r username
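
Disabling an account does not stop jobs the user has already submitted, so offboarding usually pairs the account lock with a job cancellation. The following is a minimal sketch under that assumption: the names disable_user and run and the DRY_RUN variable are hypothetical, and the script prints the commands by default instead of running them.

```shell
#!/bin/sh
# Sketch: cancel a user's jobs and lock their account.
# DRY_RUN=1 (the default here) only prints commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: disable_user USERNAME
disable_user() {
    user="$1"
    if [ -z "$user" ]; then
        echo "usage: disable_user USERNAME" >&2
        return 1
    fi
    run sudo scancel -u "$user"        # cancel the user's pending and running jobs
    run sudo usermod -L -e 1 "$user"   # lock the password and expire the account
}

# Example: disable_user olduser
```

Deleting the account with userdel -r can then follow later, once the user's data has been archived or handed over.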

Step 8: Documentation and Policies

  • Clearly document queue policies, user guidelines, and best practices.
  • Provide regular training or resources to help users effectively utilize the cluster.

By following these steps, you'll ensure smooth, efficient, and fair management of your HPC cluster's users and resource queues.

 
