IBS Institute for Basic Science

Olaf

01
Olaf HW Information
System: Olaf-g
  Partitions: AIP, AIP_long, mig-3g.40gb, mig-1g.10gb, mig-1g.10gb_long
  Model: Lenovo SR675 V3
  Number of nodes: 12
  CPU: AMD EPYC 9334 (2.7 GHz, 32 cores), 2 per node
  GPU: Nvidia H100 SXM5 80GB, 4 per node
  Memory per node: 1,024 GB

System: Olaf-c
  Partitions: normal_c, long_c, large_c, express_cpu, core_s, core_m, core_l
  Model: Lenovo SD630 V2
  Number of nodes: 194
  CPU: Intel Xeon Platinum 8360Y (2.6 GHz, 36 cores), 2 per node
  GPU: -
  Memory per node: 256 GB

System: Olaf-cu
  Partitions: normal, long, express
  Model: HPE Apollo 6500 Gen10
  Number of nodes: 16
  CPU: Intel Xeon Gold 6230R (2.1 GHz, 26 cores), 2 per node
  GPU: Tesla V100 32GB SXM2, 8 per node
  Memory per node: 768 GB

System: Jepyc
  Partitions: jepyc
  Model: ASUS ESC8000 G4
  Number of nodes: 5
  CPU: Intel Xeon Gold 6126 (2.6 GHz, 12 cores), 2 per node
  GPU: GeForce 2080 Ti, 8 per node
  Memory per node: 48 GB

System: HQmem
  Partitions: HQmem
  Model: HP ProLiant DL360 Gen9
  Number of nodes: 4
  CPU: Intel Xeon E5-2650 v3 (2.3 GHz, 10 cores), 2 per node
  GPU: -
  Memory per node: 256 GB
02
Partition Information
Partition Name    | Node            | Walltime (hours) | Priority | Max Mem per Job | Remark
AIP               | olaf-g[001-006] | 72               | -        | -               |
AIP_long          | olaf-g[001-006] | 336              | -        | -               | Long jobs
mig-3g.40gb       | olaf-g007       | 72               | -        | -               | MIG partition
mig-1g.10gb       | olaf-g[009-010] | 72               | -        | -               | MIG partition
mig-1g.10gb_long  | olaf-g[009-010] | 168              | -        | -               | MIG partition / Long jobs
core_s            | olaf-c[001-041] | 2                | -        | -               | Short jobs
core_m            | olaf-c[001-041] | 72               | -        | -               |
core_l            | olaf-c[001-041] | 336              | -        | -               | Long jobs
normal_cpu        | olaf-c[042-091] | 72               | -        | -               |
long_cpu          | olaf-c[042-091] | 336              | -        | -               | Long jobs
large_cpu         | olaf-c[092-210] | 72               | -        | -               |
normal            | olaf-cu[1-5]    | 72               | -        | -               | Only for GPU jobs
long              | olaf-cu[1-5]    | 336              | -        | -               | Only for GPU jobs / Long jobs
jepyc             | jepyc[01-20]    | -                | -        | -               | Only for GPU jobs
HQmem             | HQmem[01-04]    | -                | -        | -               |

* Scheduled to be retired when new equipment is introduced.
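
The Walltime column is the maximum run time a job may request in each partition. As a minimal sketch, a submission to the AIP partition that stays within its 72-hour limit could look like the script below; the job name, resource counts, and the executable run.x are illustrative only.

#!/bin/sh
#SBATCH -J aip_example       # illustrative job name
#SBATCH -p AIP               # partition from the table above
#SBATCH -N 1                 # one olaf-g node
#SBATCH --gres=gpu:1         # one GPU
#SBATCH --time=72:00:00      # must not exceed the 72-hour walltime of AIP
srun ./run.x                 # placeholder executable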

03
Olaf SW Information

Compiler and Library Modules (a module-loading example follows the lists below)

Category: Items (name/version)

OS: CentOS 7.8, Rocky Linux 8.6

Compilers:
gcc/7.5.0
gcc/9.3.0
gcc/11.2.0
intel/19.5.281
pgi/20.9
gcc/8.4.0
gcc/10.2.0
intel/18.5.274
intel/20.4.304
pgi/23.5
gcc/8.5.0
gcc/11.2.0
intel/2021.2.0
intel/2022.0.2
pgi/23.5
gcc/9.3.0
go/1.22.0
intel/2021.3.0
intel/2022.2.1

MPI:
impi/18.5.274
impi/19.5.281
openmpi/3.1.4
openmpi/4.0.5
openmpi/4.1.4
impi/18.5.275
impi/20.4.304
openmpi/3.1.6
openmpi/4.1.1
impi/2021.1.1
impi/2021.3.0
impi/2021.7.1
openmpi/4.1.4
impi/2021.2.0
impi/2021.5.1
openmpi/4.1.1

Libraries:
blas/3.8.0
cuDNN/8.4.0
fftw/3.3.8
fitk/1.3.5
geos/3.10.3
cudatoolkit/10.2
cudatoolkit/11.1
cudatoolkit/11.7
cudatoolkit/8.0
imageJ/1.15
jasper/2.0.22
phenix/1.19.2
parallel/20210082
utils/default
wxWidgets/3.1.4
boost/1.77.0
fftw/2.1.5
fitk/1.3.3
gdal/3.5.0
cudatoolkit/10.0
cudatoolkit/11.0
cudatoolkit/11.3
cudatoolkit/11.8
hdf5/1.14.3
gsl/2.5
petsc/3.15.0
proj/8.2.1
sqlite/3.39.0
wxWidgets/3.0.2
blas/3.11.0
boost/1.81.0
cudatoolkit/11.8
fftw/3.3.8
gsl/2.7.1
hdf5/4.2.15
jasper/3.0.6
lapack/3.11.0
libxml2/2.11.4
mpfr/4.1.0
petsc/3.18.2
readline/8.2
trilinos/13.4.1
zlib/1.2.11
boost/1.65.1
cudatoolkit/11.7
cudatoolkit/12.2
fftw/3.3.10
hdf4/4.2.14
hdf5/1.12.1
jpeg/9e
libtirpc/1.3.3
npc/13.1
pcre2/10.42
netcdf/1.12.3
sqlite/3.43.0
ucx/1.13.1

Software:
anaconda3/2020.11
chimera/1.15
clang/6.0.1
coot/0.9.6.2
eman2/2.9
gaussview/gv61
ghostscript/9.50
git-lfs/3.4.1
MotionCOR/1.4.0
orca/4.2.0
pbs/default
python/3.6.10
qchem/6.0.1
root/6.18.04
singularity/3.8.2
vmd/1.9.3
Aretomo/1.3.3
chimerax/1.1
cmake/3.18.4
dynamo/1.1.532
gaussian/g16.c02
gantomatch/0.53
git/2.38.0
julia/1.6.0
netcdf/4.4.1.1
orca/5.0.3
python/2.7.17
python/3.7.2
R/4.0.5
singularity/3.8.0
spack/0.20.0
vmd/1.9.4a
anaconda/23.09.0
bison/3.8.2
charm/7.0.0
cmake/3.28.1
curl/7.88.1
dlf6/2.8.1
gaussian/g16.c01
git-lfs/3.5.1
go/1.22.0
miniconda/23.1.0
ncview/2.1.10
openssl/1.1.1g
python/3.7.2
python/3.9.16
qchem/6.0.1
R/4.0.5
relion-gpu/4.0.0
singularity/4.1.1
wgrib2/3.1.1
baqel/1.2.2
bzip2/1.0.8
cmake/3.18.4
curl/7.61.1
difx/2.6.3
flex/2.6.4
ghostscript/10.02.1
gmp/6.2.1
isl/0.24
ncurses/6.4
netcdf/4.8.1
openfoam/v2006
python/3.6.10
python/3.8.16
protein-miniconda/23.1.0
qmcpack/3.16.0
relion/4.0.0
root/6.26.10
wgrib/1.8.2
xz/5.4.2
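
The modules above are loaded through the environment-modules commands. A minimal sketch, assuming the versions shown above are installed on the system you log in to:

$ module avail                           # list the modules available on this system
$ module load gcc/11.2.0 openmpi/4.1.4   # load a compiler and an MPI stack from the lists above
$ module list                            # show the currently loaded modules
$ module purge                           # unload all modules
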
04
Job Scheduler Information

01 Basic Command Summary

Command | Description
$ sbatch [options…] script | Submit a job
$ scancel JobID | Cancel a job
$ squeue | Check job status
$ sinfo [options] | Check node information
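
A typical sequence combining these commands looks like the sketch below; the script name job_script.sh and the job ID 1327 are illustrative.

$ sbatch ./job_script.sh   # submit the job; Slurm replies with "Submitted batch job <JobID>"
$ squeue                   # check whether the job is pending or running
$ scancel 1327             # cancel the job using the ID reported by sbatch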

02 Sinfo

Queries Slurm node and partition information.
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up infinite 5 mix olaf-cu[1-5]
cryoem* up infinite 5 mix olaf-cu[1-5]
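
sinfo also accepts filters; for example, a node-oriented long listing for a single partition (the partition name is taken from the table in section 02):

$ sinfo -p AIP -N -l   # long listing of the nodes in the AIP partition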

03 Sbatch

$ sbatch ./job_script.sh

[Job script example]

#!/bin/sh
#SBATCH -J test            # job name
#SBATCH -p cryoem          # partition name
#SBATCH -N 2               # total number of compute nodes required
#SBATCH -n 2               # total number of processes required
#SBATCH -o %x.o%j          # stdout file name ({job name}.o{job ID})
#SBATCH -e %x.e%j          # stderr file name ({job name}.e{job ID})
#SBATCH --time=00:30:00    # maximum wall time
#SBATCH --gres=gpu:2       # request GPUs (2 per node)
srun ./run.x               # command that is actually executed
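
For a GPU job the partition and the --gres request are adjusted accordingly. The sketch below targets the GPU-only partition normal from section 02 and loads a CUDA module from the list in section 03; the module version and the executable gpu_app.x are illustrative.

#!/bin/sh
#SBATCH -J gpu_test
#SBATCH -p normal              # GPU-only partition (see section 02)
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --gres=gpu:4           # request 4 GPUs on the node
#SBATCH --time=24:00:00
module load cudatoolkit/11.7   # CUDA module from section 03 (version illustrative)
srun ./gpu_app.x               # placeholder executable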

04 Squeue

Command for listing submitted jobs and their status.
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1327 debug Relion_c crayadmi R 19:30:41 1 olaf-cu4
1328 debug Relion_c crayadmi R 19:28:06 1 olaf-cu3
1329 debug Relion_c crayadmi R 19:25:47 1 olaf-cu1
1330 debug Relion_c crayadmi R 19:25:47 1 olaf-cu2
1344 debug Relion_c crayadmi R 17:15:17 1 olaf-cu5
1358 cryoem cryospar ibsuser R 14:25:46 1 olaf-cu5
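
squeue can also be narrowed to your own jobs, a single job, or one partition; the job ID and partition name below come from the listing above.

$ squeue -u $USER    # only jobs belonging to the current user
$ squeue -j 1358     # a single job, by ID
$ squeue -p cryoem   # jobs in one partition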

[Detailed information on a submitted job]

The scontrol command shows the details of a submitted job.
$ scontrol show job [Job ID]

$ scontrol show job 1327
JobId=1327 JobName=Relion_case6_olaf-cu4
UserId=… GroupId=… MCS_label=N/A
Priority=4294901439 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=19:48:18 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2020-12-21T12:49:40 EligibleTime=2020-12-21T12:49:40
AccrueTime=2020-12-21T12:49:40
StartTime=2020-12-21T12:49:40 EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-21T12:49:40
Partition=debug AllocNode:Sid=olaf1:72181
ReqNodeList=olaf-cu4 ExcNodeList=(null)
NodeList=olaf-cu4
BatchHost=olaf-cu4
NumNodes=1 NumCPUs=52 NumTasks=52 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=52,node=1,billing=52
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=./run_case6_4GPU.sh
WorkDir=…
StdErr=…
StdIn=…
StdOut=…
Power=
TresPerJob=gpu:4
TresPerNode=gpu:4
MailUser=(null) MailType=NONE

05 Scancel

Cancels a submitted job.
$ scancel [Job ID]
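
Besides a single job ID, scancel accepts several standard filters; the job name below is illustrative.

$ scancel 1327 1328   # cancel several jobs at once
$ scancel -u $USER    # cancel all of your own jobs
$ scancel -n test     # cancel jobs with the given job name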