Olaf
01
Olaf HW Information
System | Olaf-g | Olaf-c | Olaf-cu | Jepyc | Jepyc-rtx | HQ2 | HQ | HQmem |
---|---|---|---|---|---|---|---|---|
Partitions | AIP, mig-1g.10gb, mig-1g.20gb, mig-3g.40gb | normal_c, long_c, olaf_astro, olaf_c_core | normal, long, express | jepyc | jepyc-rtx | HQ2comp | HQcomp | HQmem |
Model | Lenovo SR675 V3 | Lenovo SD630 V2 | HPE Apollo 6500 Gen10 | SuperMicro AS-4023S-TRT | ASUS ESC8000 G4 | HP ProLiant DL360 Gen9 / Dell PowerEdge R630 | HP ProLiant DL360 Gen9 | HP ProLiant DL360 Gen9 |
Num of Nodes | 12 | 210 (194 + 16) | 5 | 20 | 1 | 28 (10 + 18) | 28 | 4 |
CPU Type | AMD EPYC 9334 (2.7GHz, 32 cores) | Intel Xeon Platinum 8360Y (2.6GHz, 36 cores) | Intel Xeon Gold 6230R (2.1GHz, 26 cores) | AMD EPYC 7401 (2.0GHz, 24 cores) | Intel Xeon Gold 6126 (2.6GHz, 12 cores) | Intel Xeon E5-2690 v4 (2.6GHz, 14 cores) / Intel Xeon E5-2690 v3 (2.6GHz, 12 cores) | Intel Xeon E5-2650 v3 (2.3GHz, 10 cores) | Intel Xeon E5-2650 v3 (2.3GHz, 10 cores) |
CPUs per Node | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
GPU Type | NVIDIA H100 SXM5 80GB | - | Tesla V100 SXM2 32GB | GeForce GTX 1080 Ti | GeForce RTX 2080 Ti | - | - | - |
GPUs per Node | 4 | - | 8 | 2 | 8 | - | - | - |
Memory per Node (GB) | 1,024 | 256 | 768 | 64 | 48 | 64 / 128 | 64 | 256 |
02
Partition Information
Partition Name | Nodes | Walltime (hours) | Priority | Max Mem per Job | Remark |
---|---|---|---|---|---|
AIP | olaf_g[001-004] | 72 | 20 | - | |
mig-1g.10gb | olaf_g012 | 72 | 20 | - | |
mig-1g.20gb | olaf_g[007-008,011] | 72 | 20 | - | |
mig-3g.40gb | olaf_g[005-006,009-010] | 72 | 20 | - | |
normal_cpu | olaf_c[001-194] | 72 | 20 | - | |
long_cpu | olaf_c[001-194] | 336 | 2 | - | |
express_cpu | olaf_c[001-194] | 336 | 220 | - | |
olaf_astro | olaf_c[001-194] | 336 | 20 | - | |
olaf_c_core | olaf_c[195-210] | 336 | 20 | - | |
normal | olaf_cu[1-5] | 72 | 20 | - | Only for GPU jobs |
long | olaf_cu[1-5] | 336 | 2 | - | Only for GPU jobs |
express | olaf_cu[1-5] | 336 | 22 | - | Only for GPU jobs |
jepyc | jepyc[01-20] | - | 2 | - | Only for GPU jobs |
jepyc-rtx | jepyc50 | - | 2 | - | Only for GPU jobs |
HQ2comp | HQ2comp[01-28] | - | 2 | - | |
HQcomp | HQcomp[01-28] | - | - | - | |
HQmem | HQmem[01-04] | - | 2 | - | |
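For reference, a batch script targeting one of the partitions above simply names it with -p and keeps the requested walltime within that partition's limit. Below is a minimal sketch for a CPU-only job on normal_cpu; the job name, process count, and executable are placeholders:

#!/bin/sh
#SBATCH -J cpu_test        # Hypothetical job name
#SBATCH -p normal_cpu      # CPU partition (72-hour walltime limit)
#SBATCH -N 1               # One compute node
#SBATCH -n 36              # Number of processes (placeholder)
#SBATCH -o %x.o%j          # stdout file name
#SBATCH -e %x.e%j          # stderr file name
#SBATCH --time 48:00:00    # Stays within the 72-hour partition limit
srun ./run.x               # Placeholder executable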
03
Olaf SW Information
Compiler and Library Modules
Category | CentOS 7.8 | Rocky 8.6 |
---|---|---|
Compilers | gcc/7.5.0, gcc/8.4.0, gcc/9.3.0, gcc/10.2.0, gcc/11.2.0, intel/18.5.274, intel/19.5.281, intel/20.4.304, pgi/20.9, pgi/23.5 | gcc/8.5.0, gcc/9.3.0, gcc/11.2.0, go/1.22.0, intel/2021.2.0, intel/2021.3.0, intel/2022.0.2, intel/2022.2.1, pgi/23.5 |
MPI | impi/18.5.274, impi/18.5.275, impi/19.5.281, impi/20.4.304, openmpi/3.1.4, openmpi/3.1.6, openmpi/4.0.5, openmpi/4.1.1, openmpi/4.1.4 | impi/2021.1.1, impi/2021.2.0, impi/2021.3.0, impi/2021.5.1, impi/2021.7.1, openmpi/4.1.1, openmpi/4.1.4 |
Libraries | blas/3.8.0, boost/1.77.0, cuDNN/8.4.0, fftw/2.1.5, fftw/3.3.8, fltk/1.3.3, fltk/1.3.5, gdal/3.5.0, geos/3.10.3, cudatoolkit/8.0, cudatoolkit/10.0, cudatoolkit/10.2, cudatoolkit/11.0, cudatoolkit/11.1, cudatoolkit/11.3, cudatoolkit/11.7, cudatoolkit/11.8, hdf5/1.14.3, imod/4.11.5, gsl/2.5, jasper/2.0.22, petsc/3.15.0, phenix/1.19.2, proj/8.2.1, parallel/2021.082, sqlite/3.39.0, utils/default, wxWidgets/3.0.2, wxWidgets/3.1.4 | blas/3.11.0, boost/1.65.1, boost/1.81.0, cudatoolkit/11.7, cudatoolkit/11.8, cudatoolkit/12.2, fftw/3.3.8, fftw/3.3.10, gsl/2.7.1, hdf4/4.2.14, hdf4/4.2.15, hdf5/1.12.1, jasper/3.0.6, jpeg/9e, lapack/3.11.0, libtirpc/1.3.3, libxml2/2.11.4, mpc/1.3.1, mpfr/4.1.1, pcre2/10.42, petsc/3.18.2, pnetcdf/1.12.3, readline/8.2, sqlite/3.43.0, trilinos/13.4.1, ucx/1.13.1, zlib/1.2.11 |
Software | anaconda3/2020.11, Aretomo/1.3.3, chimera/1.15, chimerax/1.1, clang/6.0.1, cmake/3.18.4, coot/0.9.6.2, dynamo/1.1.532, eman2/2.9, gaussian/g16.c02, gaussview/gv61, gautomatch/0.53, ghostscript/9.50, git/2.38.0, git-lfs/3.4.1, julia/1.6.0, MotionCOR/1.4.0, netcdf/4.4.1.1, orca/4.2.0, orca/5.0.3, pbs/default, python/2.7.17, python/3.6.10, python/3.7.2, qchem/6.0.1, R/4.0.5, root/6.18.04, singularity/3.8.0, singularity/3.8.2, spack/0.20.0, vmd/1.9.3, vmd/1.9.4a | anaconda/23.09.0, bagel/1.2.2, bison/3.8.2, bzip2/1.0.8, charm/7.0.0, cmake/3.18.4, cmake/3.28.1, curl/7.61.1, curl/7.88.1, difx/2.6.3, difx/2.8.1, flex/2.6.4, gaussian/g16.c01, ghostscript/10.02.1, git-lfs/3.5.1, gmp/6.2.1, go/1.22.0, isl/0.24, miniconda/23.1.0, ncurses/6.4, ncview/2.1.10, netcdf/4.8.1, netcdf-fortran/4.5.4, openfoam/v2006, openssl/1.1.1g, python/3.6.10, python/3.7.2, python/3.8.16, python/3.9.16, protein-miniconda/23.1.0, qchem/6.0.1, qmcpack/3.16.0, R/4.0.5, relion/4.0.0, relion-gpu/4.0.0, root/6.26.10, singularity/4.1.1, wgrib/1.8.2, wgrib2/3.1.1, xz/5.4.2 |
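The compilers and libraries above are provided as environment modules and are typically activated with the standard module commands before building or running a job. A minimal sketch, using module names taken from the table above:

$ module avail                            # List all available modules
$ module load gcc/11.2.0 openmpi/4.1.4    # Load a compiler and a matching MPI
$ module list                             # Confirm the loaded modules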
04
Job Scheduler Information
Command | Description |
---|---|
$ sbatch [options…] script | Submit a job |
$ scancel job_ID | Cancel a job |
$ squeue | Check job status |
$ sinfo [options] | Check node information |
Query Slurm node and partition information:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up infinite 5 mix olaf-cu[1-5]
cryoem* up infinite 5 mix olaf-cu[1-5]
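sinfo also accepts standard Slurm filter options, e.g. to restrict output to a single partition or to print per-node detail. A brief sketch:

$ sinfo -p cryoem    # Show only the cryoem partition
$ sinfo -N -l        # One line per node, with detailed state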
$ sbatch ./job_script.sh
[Job Script Example]
#!/bin/sh
#SBATCH -J test            # Job name
#SBATCH -p cryoem          # Partition name
#SBATCH -N 2               # Total number of compute nodes required
#SBATCH -n 2               # Total number of processes required
#SBATCH -o %x.o%j          # stdout file name ({job name}.o{job ID})
#SBATCH -e %x.e%j          # stderr file name ({job name}.e{job ID})
#SBATCH --time 00:30:00    # Maximum walltime
#SBATCH --gres=gpu:2       # Option for using GPUs (2 GPUs per node)
srun ./run.x               # Command line actually executed
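Submitting the script prints the assigned job ID, which can then be used with squeue and scancel (the ID below is illustrative):

$ sbatch ./job_script.sh
Submitted batch job 1359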
Command to list submitted jobs and their information:
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1327 debug Relion_c crayadmi R 19:30:41 1 olaf-cu4
1328 debug Relion_c crayadmi R 19:28:06 1 olaf-cu3
1329 debug Relion_c crayadmi R 19:25:47 1 olaf-cu1
1330 debug Relion_c crayadmi R 19:25:47 1 olaf-cu2
1344 debug Relion_c crayadmi R 17:15:17 1 olaf-cu5
1358 cryoem cryospar ibsuser R 14:25:46 1 olaf-cu5
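squeue can also be narrowed with standard Slurm options, for example to a single user or job ID (the job ID here is taken from the listing above):

$ squeue -u $USER    # Jobs belonging to the current user only
$ squeue -j 1327     # A specific job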
[Detailed Query of a Submitted Job]
The scontrol command can be used to view the details of a submitted job.
$ scontrol show job [job ID]
$ scontrol show job 1327
JobId=1327 JobName=Relion_case6_olaf-cu4
UserId=… GroupId=… MCS_label=N/A
Priority=4294901439 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=19:48:18 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2020-12-21T12:49:40 EligibleTime=2020-12-21T12:49:40
AccrueTime=2020-12-21T12:49:40
StartTime=2020-12-21T12:49:40 EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-21T12:49:40
Partition=debug AllocNode:Sid=olaf1:72181
ReqNodeList=olaf-cu4 ExcNodeList=(null)
NodeList=olaf-cu4
BatchHost=olaf-cu4
NumNodes=1 NumCPUs=52 NumTasks=52 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=52,node=1,billing=52
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=./run_case6_4GPU.sh
WorkDir=…
StdErr=…
StdIn=…
StdOut=…
Power=
TresPerJob=gpu:4
TresPerNode=gpu:4
MailUser=(null) MailType=NONE
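Besides showing job details, scontrol can hold and release a user's own jobs. A brief sketch (the job ID is illustrative, and holding only applies to jobs that are still pending):

$ scontrol hold 1327       # Prevent a pending job from starting
$ scontrol release 1327    # Allow it to be scheduled again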
Cancel a submitted job:
$ scancel [job ID]
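scancel also accepts standard Slurm filters (the job ID below is illustrative):

$ scancel 1327       # Cancel a single job by ID
$ scancel -u $USER   # Cancel all jobs belonging to the current user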