Data management through distributed storage for scientific data (dCache)
Data handling
The NorduGrid ARC middleware supports various access protocols, such as ftp, gsiftp, http, https, httpg, dav, davs, ldap, srm, root, rucio, and s3. It caches input data and can optimize transfers, for example by downloading a dataset only once when several jobs use it.
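For illustration, a minimal job-description sketch (with a placeholder storage host and path) shows how remote input and output files can be declared so that ARC stages the input, caches it for reuse by other jobs, and uploads the result when the job finishes:
# sketch only: <dcache_host> and <vo_path> are placeholders, not actual endpoints
cat > job.xrsl <<'EOF'
&(executable="run.sh")
 (inputFiles=("input.dat" "davs://<dcache_host>/<vo_path>/input.dat"))
 (outputFiles=("result.dat" "davs://<dcache_host>/<vo_path>/result.dat"))
EOF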
Storing data on remote dCache
- Available to SLING users, members of gen.vo.sling.si and other supported VOs.
- Suitable as temporary storage for job input and output data (ARC), with a limited quota.
- The default setup is not appropriate for confidential unencrypted data: members of the same VO can read each other's data.
- Short- and long-term storage on the dCache server and pools within SLING.
- No backup!
- More details on HPC Vega data storage solutions are available at the link.
The ARC client provides commands for direct data handling; documentation and useful commands can be found here.
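For example (a sketch with a placeholder host and path, assuming a valid proxy certificate), remote data can be listed, copied, and removed directly:
# list a remote directory (placeholders <dcache_host> and <vo_path> are illustrative)
arcls davs://<dcache_host>/<vo_path>/
# upload a local file to remote storage
arccp input.dat davs://<dcache_host>/<vo_path>/input.dat
# download it back
arccp davs://<dcache_host>/<vo_path>/input.dat ./input-copy.dat
# remove the remote copy
arcrm davs://<dcache_host>/<vo_path>/input.dat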
S3 Object Storage
HPC Vega offers S3 object storage. To obtain credentials, the OpenStack client is needed. For data management, any S3 client should work, e.g. s5cmd, libs3, or boto3. HPC Vega users can use the client on the login nodes. The initial user quota is set to 100 GB.
Obtaining the key and secret for accessing a project in the S3 object storage:
openstack --os-auth-url https://keystone.sling.si:5000/v3 --os-project-domain-name sling --os-user-domain-name sling --os-project-name <project_name> --os-username <user_name> ec2 credentials create
Alternatively, export the following environment variables:
export OS_AUTH_URL=https://keystone.sling.si:5000/v3
export OS_PROJECT_NAME=<project_name>
export OS_PROJECT_DOMAIN_NAME=sling
export OS_USER_DOMAIN_NAME=sling
export OS_IDENTITY_API_VERSION=3
export OS_URL=https://keystone.sling.si:5000/v3
export OS_USERNAME=<user_name>
The key and secret can then be obtained with:
openstack ec2 credentials create
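Credentials that were created earlier can be shown again with:
openstack ec2 credentials list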
s5cmd Client
Data can be transferred with the s5cmd client. The access credentials aws_access_key_id and aws_secret_access_key are stored in the ~/.aws/credentials file:
mkdir ~/.aws
chmod 700 ~/.aws
touch ~/.aws/credentials
chmod 600 ~/.aws/credentials
cat >~/.aws/credentials <<EOF
[default]
aws_access_key_id = <access>
aws_secret_access_key = <secret>
EOF
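s5cmd reads only the keys from this file, so the endpoint still has to be passed with --endpoint-url; an optional shell alias avoids retyping it (the examples below keep the flag explicit):
# optional convenience: always pass the Vega endpoint
alias s5cmd='s5cmd --endpoint-url https://ceph-s3.vega.izum.si'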
List the buckets:
s5cmd --endpoint-url https://ceph-s3.vega.izum.si ls
Create bucket:
s5cmd --endpoint-url https://ceph-s3.vega.izum.si mb s3://mybucket01
Check that the bucket was created:
s5cmd --endpoint-url https://ceph-s3.vega.izum.si head s3://mybucket01/
Copy file into bucket:
s5cmd --endpoint-url https://ceph-s3.vega.izum.si cp <data> s3://mybucket01/
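s5cmd also accepts wildcards, so several files can be uploaded at once (the local path and prefix below are only illustrative):
s5cmd --endpoint-url https://ceph-s3.vega.izum.si cp 'results/*.tar.gz' s3://mybucket01/results/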
Download file(s) from bucket:
s5cmd --endpoint-url https://ceph-s3.vega.izum.si cp s3://mybucket01/data01.tar.gz .
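A whole bucket or prefix can likewise be downloaded with a wildcard (the destination directory is illustrative):
s5cmd --endpoint-url https://ceph-s3.vega.izum.si cp 's3://mybucket01/*' ./downloads/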
Remove file(s) from bucket:
s5cmd --endpoint-url https://ceph-s3.vega.izum.si rm s3://mybucket01/data01.tar.gz
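To remove a bucket entirely, delete its objects first and then remove the empty bucket with rb (a sketch for the example bucket above):
s5cmd --endpoint-url https://ceph-s3.vega.izum.si rm 's3://mybucket01/*'
s5cmd --endpoint-url https://ceph-s3.vega.izum.si rb s3://mybucket01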
More commands are described in the s5cmd documentation.