Efficient management of users and queues is critical for maximizing HPC cluster utilization and ensuring equitable resource distribution. Here's how to manage them effectively:
Step 1: Managing Users
- Add Users:
- sudo useradd -m username
- sudo passwd username
- Assign Permissions and Groups:
- sudo usermod -aG hpcusers username
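The onboarding commands above can be wrapped in a small dry-run helper that skips account creation when the account already exists. This is a minimal sketch: the function names `user_exists` and `plan_onboarding` are hypothetical, and the script only prints commands for review rather than running them.

```shell
#!/bin/sh
# Sketch: print the onboarding commands for one user.
# Nothing is executed; the commands are printed so they can be reviewed first.

# Succeeds (exit status 0) if the named account exists on this host.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Print the commands needed to onboard user $1 into group $2.
plan_onboarding() {
    if user_exists "$1"; then
        echo "# $1 already exists, skipping useradd"
    else
        echo "sudo useradd -m $1"
        echo "sudo passwd $1"
    fi
    echo "sudo usermod -aG $2 $1"
}

# "username" and "hpcusers" are the placeholders used in this article.
plan_onboarding username hpcusers
```

Printing the plan first makes it easy to check a batch of new accounts before anything touches the system.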
Step 2: Setting Up User Quotas
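- Configure disk quotas on shared file systems (quotas must be enabled on the file system; block limits are given in 1 KiB blocks, and a limit of 0 means no limit):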
- sudo setquota -u username block_soft block_hard inode_soft inode_hard /filesystem
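Because setquota takes block limits in 1 KiB blocks by default, it helps to convert from GiB before building the command. The sketch below does that conversion and prints the resulting command without running it. The helper names `gib_to_kib_blocks` and `plan_quota` and the 50/60 GiB limits are illustrative assumptions.

```shell
#!/bin/sh
# Convert whole GiB to 1 KiB blocks (1 GiB = 1024 * 1024 KiB).
gib_to_kib_blocks() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print (do not run) a setquota command for user $1 with a soft limit of
# $2 GiB and a hard limit of $3 GiB on file system $4.
# The inode limits are set to 0, which means "no inode limit".
plan_quota() {
    echo "sudo setquota -u $1 $(gib_to_kib_blocks "$2") $(gib_to_kib_blocks "$3") 0 0 $4"
}

# Example: 50 GiB soft limit, 60 GiB hard limit.
plan_quota username 50 60 /filesystem
```

The current limits for a user can then be checked with `sudo quota -u username`.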
Step 3: Job Scheduler Queues
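- Create Queues (Partitions in Slurm). Partitions are not managed through sacctmgr; define them in slurm.conf, or create them at runtime with scontrol and add them to slurm.conf as well so they persist:
- sudo scontrol create PartitionName=partition_name Nodes=node_list MaxTime=24:00:00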
- List Existing Queues:
- scontrol show partitions
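For a persistent definition, a partition is usually declared as a `PartitionName=...` line in slurm.conf. The sketch below only generates such a line so it can be reviewed and pasted into the file. The helper name `partition_conf_line` is hypothetical, and `State=UP` is an assumed choice.

```shell
#!/bin/sh
# Print a slurm.conf partition line.
#   $1 = partition name, $2 = node list, $3 = time limit
partition_conf_line() {
    printf 'PartitionName=%s Nodes=%s MaxTime=%s State=UP\n' "$1" "$2" "$3"
}

# Placeholders match the ones used in this article.
partition_conf_line partition_name node_list 24:00:00
```

After slurm.conf has been updated on the cluster, `sudo scontrol reconfigure` asks the Slurm daemons to re-read it.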
Step 4: Assigning Users to Queues
- Limit queue access by restricting a partition to specific groups (or to accounts with AllowAccounts):
- sudo scontrol update PartitionName=partition_name AllowGroups=hpcusers
Step 5: Queue Management and Priority
- Set queue priorities (partitions with a higher PriorityTier are scheduled first):
- sudo scontrol update PartitionName=partition_name PriorityTier=100
- Adjust user/job priorities dynamically using Slurm commands:
- scontrol update JobId=<jobid> Priority=2000
Step 6: Monitoring Users and Queue Usage
- Monitor queue activity and job status:
- squeue
- squeue -u username
- Check user usage and resource allocation:
- sacct -u username
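To see at a glance who is occupying the queues, the squeue output can be summarized per user. This is a minimal sketch: `count_jobs_per_user` is a hypothetical helper, and the sample user names stand in for real squeue output.

```shell
#!/bin/sh
# Count jobs per user from a stream of user names (one per line),
# printing "count user" with the busiest user first.
count_jobs_per_user() {
    awk 'NF { n[$1]++ } END { for (u in n) print n[u], u }' | sort -k1,1nr -k2,2
}

# On a cluster, feed it from squeue
# (-h drops the header, -o "%u" prints only the user name of each job):
#   squeue -h -o "%u" | count_jobs_per_user

# Here a fixed sample stands in for squeue output.
printf 'alice\nbob\nalice\ncarol\nalice\nbob\n' | count_jobs_per_user
```

Keeping the counting logic separate from the squeue call means the same helper can be reused on saved output.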
Step 7: Removing or Disabling Users
- Disable user accounts (locking the password alone does not block SSH key logins, so expire the account as well):
- sudo usermod -L -e 1 username
- Delete user accounts:
- sudo userdel -r username
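When removing a user, it is usually worth cancelling their Slurm jobs before deleting the account. The sketch below prints one possible sequence. The ordering and the helper name `plan_offboarding` are assumptions, and nothing is executed.

```shell
#!/bin/sh
# Print (do not run) an offboarding sequence for user $1.
plan_offboarding() {
    # 1. Lock the password and expire the account to block new logins.
    echo "sudo usermod -L -e 1 $1"
    # 2. Cancel the user's pending and running Slurm jobs.
    echo "sudo scancel -u $1"
    # 3. Delete the account together with its home directory.
    echo "sudo userdel -r $1"
}

plan_offboarding username
```

Since `userdel -r` removes the home directory, archive any data that must be kept before running the last step.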
Step 8: Documentation and Policies
- Clearly document queue policies, user guidelines, and best practices.
- Provide regular training or resources to help users effectively utilize the cluster.
By following these steps, you'll ensure smooth, efficient, and fair management of your HPC cluster's users and resource queues.