IBS (Institute for Basic Science) HPC Resources

Olaf

01
Olaf HW Information
System: Olaf-g
  Partitions: AIP, mig-1g.10gb, mig-1g.20gb, mig-3g.40gb
  Model: Lenovo SR675 V3
  Nodes: 12
  CPU: AMD EPYC 9334 (2.7GHz, 32 cores), 2 per node
  GPU: Nvidia H100 SXM5 80GB, 4 per node
  Memory per node: 1,024 GB

System: Olaf-c
  Partitions: normal_c, long_c, olaf_astro (194 nodes); olaf_c_core (16 nodes)
  Model: Lenovo SD630 V2
  Nodes: 194 + 16
  CPU: Intel Xeon Platinum 8360Y (2.6GHz, 36 cores), 2 per node
  GPU: none
  Memory per node: 256 GB

System: Olaf-cu
  Partitions: normal, long, express
  Model: HPE Apollo 6500 Gen10
  Nodes: 5
  CPU: Intel Xeon Gold 6230R (2.1GHz, 26 cores), 2 per node
  GPU: Tesla V100 32GB SXM2, 8 per node
  Memory per node: 768 GB

System: Jepyc
  Partitions: jepyc
  Model: SuperMicro AS-4023S-TRT
  Nodes: 20
  CPU: AMD EPYC 7401 (2.0GHz, 24 cores), 2 per node
  GPU: GeForce 1080 Ti, 2 per node
  Memory per node: 64 GB

System: Jepyc-rtx
  Partitions: jepyc-rtx
  Model: ASUS ESC8000 G4
  Nodes: 1
  CPU: Intel Xeon Gold 6126 (2.6GHz, 12 cores), 2 per node
  GPU: GeForce 2080 Ti, 8 per node
  Memory per node: 48 GB

System: HQ2
  Partitions: HQ2comp
  Model: HP ProLiant DL360 Gen9 / Dell PowerEdge R630
  Nodes: 28 (10 + 18)
  CPU: Intel Xeon E5-2690 v4 (2.6GHz, 14 cores) / Intel Xeon E5-2690 v3 (2.6GHz, 12 cores), 2 per node
  GPU: none
  Memory per node: 64 / 128 GB

System: HQ
  Partitions: HQcomp
  Model: HP ProLiant DL360 Gen9
  Nodes: 28
  CPU: Intel Xeon E5-2650 v3 (2.3GHz, 10 cores), 2 per node
  GPU: none
  Memory per node: 64 GB

System: HQmem
  Partitions: HQmem
  Model: HP ProLiant DL360 Gen9
  Nodes: 4
  CPU: Intel Xeon E5-2650 v3 (2.3GHz, 10 cores), 2 per node
  GPU: none
  Memory per node: 256 GB
02
Partition Information
| Partition Name | Nodes | Walltime (hours) | Priority | Max Mem per Job | Remark |
|---|---|---|---|---|---|
| AIP | olaf_g[001-004] | 72 | 20 | - | - |
| mig-1g.10gb | olaf_g012 | 72 | 20 | - | - |
| mig-1g.20gb | olaf_g[007-008,011] | 72 | 20 | - | - |
| mig-3g.40gb | olaf_g[005-006,009-010] | 72 | 20 | - | - |
| normal_cpu | olaf_c[001-194] | 72 | 20 | - | - |
| long_cpu | olaf_c[001-194] | 336 | 2 | - | - |
| express_cpu | olaf_c[001-194] | 336 | 220 | - | - |
| olaf_astro | olaf_c[001-194] | 336 | 20 | - | - |
| olaf_c_core | olaf_c[195-210] | 336 | 20 | - | - |
| normal | olaf_cu[1-5] | 72 | 20 | - | Only for GPU jobs |
| long | olaf_cu[1-5] | 336 | 2 | - | Only for GPU jobs |
| express | olaf_cu[1-5] | 336 | 22 | - | Only for GPU jobs |
| jepyc | jepyc[01-20] | - | 2 | - | Only for GPU jobs |
| jepyc-rtx | jepyc50 | - | 2 | - | Only for GPU jobs |
| HQ2comp | HQ2comp[01-28] | - | 2 | - | - |
| HQcomp | HQcomp[01-28] | - | - | - | - |
| HQmem | HQmem[01-04] | - | 2 | - | - |
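A job is routed to one of the partitions above with the sbatch `-p` option. A minimal sketch (the job name, task count, and executable path are placeholders, not site-mandated values):

```shell
#!/bin/sh
#SBATCH -J astro_test        # job name (placeholder)
#SBATCH -p olaf_astro        # partition name from the table above
#SBATCH -N 1                 # one node
#SBATCH -n 32                # 32 tasks
#SBATCH --time=24:00:00      # must fit within the partition's 336-hour walltime
srun ./my_program            # placeholder executable
```

Note that a job requesting more than the partition's walltime limit will be rejected at submission, so pick the partition (normal vs. long vs. express) to match the expected runtime.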
03
Olaf SW Information

Compiler and Library Modules

Category: item (name/version), listed per OS image.

OS
  CentOS 7.8 / Rocky 8.6

Compilers
  CentOS 7.8: gcc/7.5.0, gcc/8.4.0, gcc/9.3.0, gcc/10.2.0, gcc/11.2.0, intel/18.5.274, intel/19.5.281, intel/20.4.304, pgi/20.9, pgi/23.5
  Rocky 8.6: gcc/8.5.0, gcc/9.3.0, gcc/11.2.0, go/1.22.0, intel/2021.2.0, intel/2021.3.0, intel/2022.0.2, intel/2022.2.1, pgi/23.5

MPI
  CentOS 7.8: impi/18.5.274, impi/18.5.275, impi/19.5.281, impi/20.4.304, openmpi/3.1.4, openmpi/3.1.6, openmpi/4.0.5, openmpi/4.1.1, openmpi/4.1.4
  Rocky 8.6: impi/2021.1.1, impi/2021.2.0, impi/2021.3.0, impi/2021.5.1, impi/2021.7.1, openmpi/4.1.1, openmpi/4.1.4

Libraries
  CentOS 7.8: blas/3.8.0, boost/1.77.0, cuDNN/8.4.0, cudatoolkit/8.0, cudatoolkit/10.0, cudatoolkit/10.2, cudatoolkit/11.0, cudatoolkit/11.1, cudatoolkit/11.3, cudatoolkit/11.7, cudatoolkit/11.8, fftw/2.1.5, fftw/3.3.8, fltk/1.3.3, fltk/1.3.5, gdal/3.5.0, geos/3.10.3, gsl/2.5, hdf5/1.14.3, imod/4.11.5, jasper/2.0.22, parallel/2021.082, petsc/3.15.0, phenix/1.19.2, proj/8.2.1, sqlite/3.39.0, utils/default, wxWidgets/3.0.2, wxWidgets/3.1.4
  Rocky 8.6: blas/3.11.0, boost/1.65.1, boost/1.81.0, cudatoolkit/11.7, cudatoolkit/11.8, cudatoolkit/12.2, fftw/3.3.8, fftw/3.3.10, gsl/2.7.1, hdf4/4.2.14, hdf4/4.2.15, hdf5/1.12.1, jasper/3.0.6, jpeg/9e, lapack/3.11.0, libtirpc/1.3.3, libxml2/2.11.4, mpc/1.3.1, mpfr/4.1.1, pcre2/10.42, petsc/3.18.2, pnetcdf/1.12.3, readline/8.2, sqlite/3.43.0, trilinos/13.4.1, ucx/1.13.1, zlib/1.2.11

Software
  CentOS 7.8: anaconda3/2020.11, Aretomo/1.3.3, chimera/1.15, chimerax/1.1, clang/6.0.1, cmake/3.18.4, coot/0.9.6.2, dynamo/1.1.532, eman2/2.9, gaussian/g16.c02, gaussview/gv61, gautomatch/0.53, ghostscript/9.50, git/2.38.0, git-lfs/3.4.1, julia/1.6.0, MotionCOR/1.4.0, netcdf/4.4.1.1, orca/4.2.0, orca/5.0.3, pbs/default, python/2.7.17, python/3.6.10, python/3.7.2, qchem/6.0.1, R/4.0.5, root/6.18.04, singularity/3.8.0, singularity/3.8.2, spack/0.20.0, vmd/1.9.3, vmd/1.9.4a
  Rocky 8.6: anaconda/23.09.0, bagel/1.2.2, bison/3.8.2, bzip2/1.0.8, charm/7.0.0, cmake/3.18.4, cmake/3.28.1, curl/7.61.1, curl/7.88.1, difx/2.6.3, difx/2.8.1, flex/2.6.4, gaussian/g16.c01, ghostscript/10.02.1, git-lfs/3.5.1, gmp/6.2.1, go/1.22.0, isl/0.24, miniconda/23.1.0, ncurses/6.4, ncview/2.1.10, netcdf/4.8.1, netcdf-fortran/4.5.4, openfoam/v2006, openssl/1.1.1g, python/3.6.10, python/3.7.2, python/3.8.16, python/3.9.16, protein-miniconda/23.1.0, qchem/6.0.1, qmcpack/3.16.0, R/4.0.5, relion/4.0.0, relion-gpu/4.0.0, root/6.26.10, singularity/4.1.1, wgrib/1.8.2, wgrib2/3.1.1, xz/5.4.2
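Module names of the form name/version usually indicate an Environment Modules (or Lmod) setup; assuming that is the case here, a typical session using versions from the lists above looks like:

```shell
module avail                          # list all modules available on this node
module load gcc/11.2.0 openmpi/4.1.4  # load a compiler and a matching MPI stack
module list                           # confirm what is currently loaded
module purge                          # unload everything before switching toolchains
```

These `module` lines can also be placed in a job script before `srun`, so the batch environment matches the one used at compile time.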
04
Job Scheduler Information

01 Basic Command Summary

Command                        Description
$ sbatch [options…] script     Submit a job
$ scancel <jobID>              Cancel a job
$ squeue                       Check job status
$ sinfo [options]              Check node information

02 Sinfo

Queries Slurm node and partition information.
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up infinite 5 mix olaf-cu[1-5]
cryoem* up infinite 5 mix olaf-cu[1-5]
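Two standard sinfo variations are often useful (the partition name below is taken from the table above):

```shell
sinfo -N -l      # node-oriented long listing: one line per node with CPU/memory/state
sinfo -p normal  # restrict output to a single partition
```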

03 Sbatch

$ sbatch ./job_script.sh

[Job script example]

#!/bin/sh
#SBATCH -J test              # Job name
#SBATCH -p cryoem            # Partition name
#SBATCH -N 2                 # Total number of compute nodes required
#SBATCH -n 2                 # Total number of processes required
#SBATCH -o %x.o%j            # stdout file name ({job name}.o{job ID})
#SBATCH -e %x.e%j            # stderr file name ({job name}.e{job ID})
#SBATCH --time=00:30:00      # Maximum walltime
#SBATCH --gres=gpu:2         # Option for GPU use
srun ./run.x                 # Command line actually executed

04 Squeue

Command to list submitted jobs and query their status.
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1327 debug Relion_c crayadmi R 19:30:41 1 olaf-cu4
1328 debug Relion_c crayadmi R 19:28:06 1 olaf-cu3
1329 debug Relion_c crayadmi R 19:25:47 1 olaf-cu1
1330 debug Relion_c crayadmi R 19:25:47 1 olaf-cu2
1344 debug Relion_c crayadmi R 17:15:17 1 olaf-cu5
1358 cryoem cryospar ibsuser R 14:25:46 1 olaf-cu5
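squeue also takes standard Slurm filters; for example (the job ID and partition name below come from the sample output above):

```shell
squeue -u $USER    # only the current user's jobs
squeue -p cryoem   # only jobs in one partition
squeue -j 1327     # a single job by ID
squeue -l          # long format, including time limits
```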

[Detailed query of a submitted job]

The scontrol command shows the full details of a submitted job.
$ scontrol show job [job ID]

$ scontrol show job 1327
JobId=1327 JobName=Relion_case6_olaf-cu4
UserId=… GroupId=… MCS_label=N/A
Priority=4294901439 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=19:48:18 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2020-12-21T12:49:40 EligibleTime=2020-12-21T12:49:40
AccrueTime=2020-12-21T12:49:40
StartTime=2020-12-21T12:49:40 EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-21T12:49:40
Partition=debug AllocNode:Sid=olaf1:72181
ReqNodeList=olaf-cu4 ExcNodeList=(null)
NodeList=olaf-cu4
BatchHost=olaf-cu4
NumNodes=1 NumCPUs=52 NumTasks=52 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=52,node=1,billing=52
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=./run_case6_4GPU.sh
WorkDir=…
StdErr=…
StdIn=…
StdOut=…
Power=
TresPerJob=gpu:4
TresPerNode=gpu:4
MailUser=(null) MailType=NONE

05 Scancel

Cancels a submitted job.
$ scancel [job ID]
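Besides a single job ID, scancel accepts the usual Slurm selectors (the ID below is the one from the squeue example above):

```shell
scancel 1327        # cancel one job by ID
scancel -u $USER    # cancel all of the current user's jobs
scancel -n test     # cancel jobs by job name (--name)
```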