Olaf
01
Olaf HW Information
| System | Olaf-g | Olaf-c | Olaf-cu | Jepyc | HQmem | |
|---|---|---|---|---|---|---|
| Partitions | AIP, AIP_long, mig-3g.40gb, mig-1g.10gb, mig-1g.10gb_long | normal_c, long_c, large_c, express_cpu, core_s, core_m, core_l | normal, long, express | jepyc | HQmem | |
| Model | Lenovo SR675 V3 | Lenovo SD630 V2 | HPE Apollo 6500 Gen10 | ASUS ESC8000 G4 | HP ProLiant DL360 Gen9 | |
| Num of Nodes | 12 | 194 | 16 | 5 | 1 | 4 |
| CPU Type | AMD EPYC 9334 (2.7GHz, 32 cores) | Intel Xeon Platinum 8360Y (2.6GHz, 36 cores) | Intel Xeon Gold 6230R (2.1GHz, 26 cores) | Intel Xeon Gold 6126 (2.6GHz, 12 cores) | Intel Xeon E5-2650 v3 (2.3GHz, 10 cores) | |
| CPUs per Node | 2 | 2 | 2 | 2 | | |
| GPU Type | NVIDIA H100 SXM5 80GB | - | Tesla V100 32GB SXM2 | GeForce RTX 2080 Ti | - | |
| GPUs per Node | 4 | - | 8 | 8 | - | |
| Memory per Node (GB) | 1,024 | 256 | 768 | 48 | 256 | |
02
Partition Information
| Partition Name | Node | Walltime (hours) | Priority | Max Mem per Job | Remark |
|---|---|---|---|---|---|
| AIP | olaf-g[001-006] | 72 | - | - | |
| AIP_long | olaf-g[001-006] | 336 | - | - | Long jobs |
| mig-3g.40gb | olaf-g007 | 72 | - | - | MIG partition |
| mig-1g.10gb | olaf-g[009-010] | 72 | - | - | MIG partition |
| mig-1g.10gb_long | olaf-g[009-010] | 168 | - | - | MIG partition / Long jobs |
| core_s | olaf-c[001-041] | 2 | - | - | Short jobs |
| core_m | olaf-c[001-041] | 72 | - | - | |
| core_l | olaf-c[001-041] | 336 | - | - | Long jobs |
| normal_cpu | olaf-c[042-091] | 72 | - | - | |
| long_cpu | olaf-c[042-091] | 336 | - | - | Long jobs |
| large_cpu | olaf-c[092-210] | 72 | - | - | |
| normal | olaf-cu[1-5] | 72 | - | - | Only for GPU jobs |
| long | olaf-cu[1-5] | 336 | - | - | Only for GPU jobs / Long jobs |
| jepyc | jepyc[01-20] | - | - | - | Only for GPU jobs |
| HQmem | HQmem[01-04] | - | - | - | |
* Scheduled to be removed when new equipment is introduced
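As a quick illustration, a job can be directed to one of the partitions above with `sbatch` (a sketch; `job_script.sh` is a hypothetical script name, and the partition names and walltime limits are taken from the table above):

```shell
# Submit to the long-walltime CPU partition core_l (up to 336 hours)
sbatch -p core_l --time=200:00:00 ./job_script.sh

# Submit to the AIP GPU partition (up to 72 hours), requesting one GPU
sbatch -p AIP --time=48:00:00 --gres=gpu:1 ./job_script.sh
```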
03
Olaf SW Information
Compiler and Library Modules
| Category | Items (name/version) | | | |
|---|---|---|---|---|
| OS | CentOS 7.8 | Rocky 8.6 | | |
| Compilers | gcc/7.5.0 gcc/9.3.0 gcc/11.2.0 intel/19.5.281 pgi/20.9 | gcc/8.4.0 gcc/10.2.0 intel/18.5.274 intel/20.4.304 pgi/23.5 | gcc/8.5.0 gcc/11.2.0 intel/2021.2.0 intel/2022.0.2 pgi/23.5 | gcc/9.3.0 go/1.22.0 intel/2021.3.0 intel/2022.2.1 |
| MPI | impi/18.5.274 impi/19.5.281 openmpi/3.1.4 openmpi/4.0.5 openmpi/4.1.4 | impi/18.5.275 impi/20.4.304 openmpi/3.1.6 openmpi/4.1.1 | impi/2021.1.1 impi/2021.3.0 impi/2021.7.1 openmpi/4.1.4 | impi/2021.2.0 impi/2021.5.1 openmpi/4.1.1 |
| Libraries | blas/3.8.0 cuDNN/8.4.0 fftw/3.3.8 fitk/1.3.5 geos/3.10.3 cudatoolkit/10.2 cudatoolkit/11.1 cudatoolkit/11.7 cudatoolkit/8.0 imageJ/1.15 jasper/2.0.22 phenix/1.19.2 parallel/20210082 utils/default wxWidgets/3.1.4 | boost/1.77.0 fftw/2.1.5 fitk/1.3.3 gdal/3.5.0 cudatoolkit/10.0 cudatoolkit/11.0 cudatoolkit/11.3 cudatoolkit/11.8 hdf5/1.14.3 gsl/2.5 petsc/3.15.0 proj/8.2.1 sqlite/3.39.0 wxWidgets/3.0.2 | blas/3.11.0 boost/1.81.0 cudatoolkit/11.8 fftw/3.3.8 gsl/2.7.1 hdf5/4.2.15 jasper/3.0.6 lapack/3.11.0 libxml2/2.11.4 mpfr/4.1.0 petsc/3.18.2 readline/8.2 trilinos/13.4.1 zlib/1.2.11 | boost/1.65.1 cudatoolkit/11.7 cudatoolkit/12.2 fftw/3.3.10 hdf4/4.2.14 hdf5/1.12.1 jpeg/9e libtirpc/1.3.3 npc/13.1 pcre2/10.42 netcdf/1.12.3 sqlite/3.43.0 ucx/1.13.1 |
| Software | anaconda3/2020.11 chimera/1.15 clang/6.0.1 coot/0.9.6.2 eman2/2.9 gaussview/gv61 ghostscript/9.50 git-lfs/3.4.1 MotionCOR/1.4.0 orca/4.2.0 pbs/default python/3.6.10 qchem/6.0.1 root/6.18.04 singularity/3.8.2 vmd/1.9.3 | Aretomo/1.3.3 chimerax/1.1 cmake/3.18.4 dynamo/1.1.532 gaussian/g16.c02 gantomatch/0.53 git/2.38.0 julia/1.6.0 netcdf/4.4.1.1 orca/5.0.3 python/2.7.17 python/3.7.2 R/4.0.5 singularity/3.8.0 spack/0.20.0 vmd/1.9.4a | anaconda/23.09.0 bison/3.8.2 charm/7.0.0 cmake/3.28.1 curl/7.88.1 dlf6/2.8.1 gaussian/g16.c01 git-lfs/3.5.1 go/1.22.0 miniconda/23.1.0 ncview/2.1.10 openssl/1.1.1g python/3.7.2 python/3.9.16 qchem/6.0.1 R/4.0.5 relion-gpu/4.0.0 singularity/4.1.1 wgrib2/3.1.1 | baqel/1.2.2 bzip2/1.0.8 cmake/3.18.4 curl/7.61.1 difx/2.6.3 flex/2.6.4 ghostscript/10.02.1 gmp/6.2.1 isl/0.24 ncurses/6.4 netcdf/4.8.1 openfoam/v2006 python/3.6.10 python/3.8.16 protein-miniconda/23.1.0 qmcpack/3.16.0 relion/4.0.0 root/6.26.10 wgrib/1.8.2 xz/5.4.2 |
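Module tables like the one above are typically driven by Environment Modules or Lmod. A minimal sketch of day-to-day usage (module names and versions taken from the table; which combinations are valid on each login node is an assumption):

```shell
# List the modules available on this node
module avail

# Load a compiler together with a matching MPI stack
module load gcc/9.3.0 openmpi/4.0.5

# Show which modules are currently loaded
module list

# Unload everything before switching toolchains
module purge
```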
04
Job Scheduler Information
| Command | Description |
|---|---|
| $ sbatch [options…] script | Submit a job |
| $ scancel jobID | Cancel a job |
| $ squeue | Check job status |
| $ sinfo [options] | Check node information |
Query Slurm node and partition information:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up infinite 5 mix olaf-cu[1-5]
cryoem* up infinite 5 mix olaf-cu[1-5]
$ sbatch ./job_script.sh
[Job script example]
#!/bin/sh
#SBATCH -J test           # job name
#SBATCH -p cryoem         # partition name
#SBATCH -N 2              # total number of compute nodes required
#SBATCH -n 2              # total number of processes required
#SBATCH -o %x.o%j         # stdout file name ({job name}.o{job ID})
#SBATCH -e %x.e%j         # stderr file name ({job name}.e{job ID})
#SBATCH --time=00:30:00   # maximum wall time
#SBATCH --gres=gpu:2      # request GPUs
srun ./run.x              # command line to execute
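For quick tests, the same resources can be requested interactively with `srun` instead of a batch script (a sketch; the `cryoem` partition name is taken from the example above):

```shell
# Open an interactive shell on one node with 1 GPU for 30 minutes
srun -p cryoem -N 1 -n 1 --gres=gpu:1 --time=00:30:00 --pty bash
```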
Commands to list submitted jobs and view their information:
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1327 debug Relion_c crayadmi R 19:30:41 1 olaf-cu4
1328 debug Relion_c crayadmi R 19:28:06 1 olaf-cu3
1329 debug Relion_c crayadmi R 19:25:47 1 olaf-cu1
1330 debug Relion_c crayadmi R 19:25:47 1 olaf-cu2
1344 debug Relion_c crayadmi R 17:15:17 1 olaf-cu5
1358 cryoem cryospar ibsuser R 14:25:46 1 olaf-cu5
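The listing above can be narrowed with standard `squeue` filters (a sketch; the partition and job ID values are taken from the sample output):

```shell
# Show only your own jobs
squeue -u $USER

# Show jobs in a specific partition
squeue -p cryoem

# Show a single job by ID
squeue -j 1327
```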
[Detailed view of a submitted job]
The scontrol command shows the detailed record of a submitted job.
$ scontrol show job [job ID]
$ scontrol show job 1327
JobId=1327 JobName=Relion_case6_olaf-cu4
UserId=… GroupId=… MCS_label=N/A
Priority=4294901439 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=19:48:18 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2020-12-21T12:49:40 EligibleTime=2020-12-21T12:49:40
AccrueTime=2020-12-21T12:49:40
StartTime=2020-12-21T12:49:40 EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-21T12:49:40
Partition=debug AllocNode:Sid=olaf1:72181
ReqNodeList=olaf-cu4 ExcNodeList=(null)
NodeList=olaf-cu4
BatchHost=olaf-cu4
NumNodes=1 NumCPUs=52 NumTasks=52 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=52,node=1,billing=52
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=./run_case6_4GPU.sh
WorkDir=…
StdErr=…
StdIn=…
StdOut=…
Power=
TresPerJob=gpu:4
TresPerNode=gpu:4
MailUser=(null) MailType=NONE
Cancel execution of a submitted job:
$ scancel [작업 ID]
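Beyond a single job ID, `scancel` also accepts filters (a sketch; the `cryoem` partition name is taken from the examples above):

```shell
# Cancel all of your own jobs
scancel -u $USER

# Cancel jobs by job name
scancel -n test

# Cancel all of your jobs in a given partition
scancel -u $USER -p cryoem
```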