Transferring files between the Marvin cluster and other cloud systems

 

rclone is a command-line tool for cloud storage management: it syncs files and directories to and from cloud storage systems such as Google Drive, Amazon Drive, S3, B2, etc. (For transfers between ordinary filesystems, rsync serves a similar role.) The rclone command can be invoked in one of three main modes:

  • Copy: copies only new or changed files to the destination.

  • Sync: makes the destination identical to the source; it works in one direction only.

  • Check: checks files for hash equality.

 

This command (rclone) is available on the Marvin cluster; the module is called rclone.

 

 

Configuration for transferring files to/from the UPF Google Drive storage  

A few notes about this storage:

Through a partnership between UPF and Google for Education, UPF faculty, staff and students have access to unlimited storage in Google Drive.

For more secure storage and transfer options, particularly for sensitive data and personal information, consult this link.

 

Step 1:

 

Login to Marvin:

 

$ ssh [username]@marvin.s.upf.edu

 

Step 2:

 

Get an interactive session:

 

$ interactive

 

Step 3:

 

Load rclone module:

 

$ module load rclone

 

Step 4:

 

Configure rclone and set up remote access to your Google Drive with the command:

 

$ rclone config

 

You can select one of the options (here we show how to set up a new remote):

 

2017/04/24 10:21:00 Config file "/homes/users/test/.rclone.conf" not found - using defaults

No remotes found - make a new one

n) New remote

s) Set configuration password

q) Quit config

n/s/q> n

 

Enter n for a new remote connection and give it a name (any name you like).

 

name> upf_drive

 

Then choose the type of storage for which you are setting up the remote (here we show the method for setting up a remote for Google Drive, which is option 11).

 

Type of storage to configure.

Choose a number from below, or type in your own value
 1 / Alias for a existing remote
   \ "alias"
 2 / Amazon Drive
   \ "amazon cloud drive"
 3 / Amazon S3 (also Dreamhost, Ceph, Minio, IBM COS)
   \ "s3"
 4 / Backblaze B2
   \ "b2"
 5 / Box
   \ "box"
 6 / Cache a remote
   \ "cache"
 7 / Dropbox
   \ "dropbox"
 8 / Encrypt/Decrypt a remote
   \ "crypt"
 9 / FTP Connection
   \ "ftp"
10 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
11 / Google Drive
   \ "drive"
12 / Hubic
   \ "hubic"
13 / Local Disk
   \ "local"
14 / Microsoft Azure Blob Storage
   \ "azureblob"
15 / Microsoft OneDrive
   \ "onedrive"
16 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
17 / Pcloud
   \ "pcloud"
18 / QingCloud Object Storage
   \ "qingstor"
19 / SSH/SFTP Connection
   \ "sftp"
20 / Webdav
   \ "webdav"
21 / Yandex Disk
   \ "yandex"
22 / http Connection
   \ "http"

Storage> 11

 

Then you see a few messages like the ones below:

 

Google Application Client Id - leave blank normally.

client_id> (just press the enter key here)

 

Google Application Client Secret - leave blank normally.

client_secret> (just press the enter key here)

 

Now choose the scope that rclone should use when requesting access to Drive.

 

 

Scope that rclone should use when requesting access from drive.
Choose a number from below, or type in your own value
 1 / Full access all files, excluding Application Data Folder.
   \ "drive"
 2 / Read-only access to file metadata and file contents.
   \ "drive.readonly"
   / Access to files created by rclone only.
 3 | These are visible in the drive website.
   | File authorization is revoked when the user deauthorizes the app.
   \ "drive.file"
   / Allows read and write access to the Application Data folder.
 4 | This is not visible in the drive website.
   \ "drive.appfolder"
   / Allows read-only access to file metadata but
 5 | does not allow any access to read or download file content.
   \ "drive.metadata.readonly"

scope>

 

Then you see a few messages like the ones below:

 

 

ID of the root folder - leave blank normally.  Fill in to access "Computers" folders. (see docs).
root_folder_id> (just press the enter key here)


Service Account Credentials JSON file path  - leave blank normally.
Needed only if you want use SA instead of interactive login.
service_account_file> (just press the enter key here)

 

Since you are accessing the cluster remotely, you have to select remote config, i.e. answer n:

Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n> n

 

You will see a message similar to the one below:

 

If your browser doesn't open automatically go to the following link: https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id=202264815644.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.readonly&state=f0a4c9e22a7707333de83f36c387ba00
Log in and authorize rclone for access
Enter verification code>

 

Open this URL in the browser on your workstation and authorize access to your Google Drive. Once that is done, a page will display a verification code.

Copy this code from the browser and paste it into the terminal. Once the terminal accepts the verification code, it will display the options below; choose one:

 

Configure this as a team drive?
y) Yes
n) No
y/n>

 

  • If you select yes:

Fetching team drive list...
Choose a number from below, or type in your own value
 1 / SIT
   \ "0ADI4d_YZU-TuUk9PVA"
Enter a Team Drive ID> 1
--------------------
[upf_drive]
type = drive
client_id =
client_secret =
scope = drive.readonly
root_folder_id =
service_account_file =
token = {"access_token":"ya29.GlspBmbrC3QiaSDTC9YeZumBpXc9qbMDHrgV6i3WgvFVBH1l-HKDlLoXU4jCtFWzcXfsZeSoAFcm5sts-X_2aOi5tpY4RtUIIK3icLn_Hl9FdZ8YyqJktqwyUNYu","token_type":"Bearer","refresh_token":"1/GjNBeQid1EVm4cx4yRoVfqu7ospo_IVwnm96vs1Q0tM","expiry":"2018-10-01T12:03:08.805662852+02:00"}
team_drive = 0ADI4d_YZU-TuUk9PVA
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d>

 

  • If you select no:

[upf_drive]
type = drive
client_id =
client_secret =
scope = drive.readonly
root_folder_id =
service_account_file =
token = {"access_token":"ya29.GlspBts8IvCgbt4IPAWvGpRrDnkQ1XLP4BwyVowCoFrhB3P_CbO53VQuC9eizuVFfwGhEX61L_NfKDI3nuVafvVWWsLsxliAqDhA_ncstM9RcBl4_dn7eWDS2eh","token_type":"Bearer","refresh_token":"1/6sr5ggELtnj7mVOMGDJl7_WaiLHT-mA8vgVtHA0hEzs","expiry":"2018-10-01T12:12:06.184548815+02:00"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d>

 

Select y if everything looks correct, or choose e to edit the remote.

You can also view the existing remotes:

 

Current remotes:

Name                 Type
====                 ====
upf_drive            drive

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>
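For reference, rclone stores each remote in its configuration file (shown above as /homes/users/test/.rclone.conf; newer rclone versions use ~/.config/rclone/rclone.conf instead). A redacted sketch of the entry created above, with placeholder token values instead of real credentials:

```ini
[upf_drive]
type = drive
scope = drive.readonly
token = {"access_token":"<redacted>","token_type":"Bearer","refresh_token":"<redacted>","expiry":"2018-10-01T12:03:08+02:00"}
team_drive = 0ADI4d_YZU-TuUk9PVA
```

Because this file contains access tokens, keep it readable only by you (e.g. chmod 600 on the file).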

 

Configuration for transferring files to/from Dropbox

 

A few notes about this storage system:

For security information, including guidance on storing sensitive data and personal information, consult this link.

 

Step 1:

 

Login to Marvin:

 

$ ssh [username]@marvin.s.upf.edu

 

Step 2:

 

Get an interactive session:

 

$ interactive

 

Step 3:

 

Load rclone module:

 

$ module load rclone

 

Step 4:

 

Configure rclone and set up remote access to your Dropbox with the command:

 

$ rclone config

 

You can select one of the options (here we show how to set up a new remote).

 

2017/04/24 10:21:00 Config file "/homes/users/test/.rclone.conf" not found - using defaults

No remotes found - make a new one

n) New remote

s) Set configuration password

q) Quit config

n/s/q> n

 

Enter n for a new remote connection and give it a name (any name you like).

 

name> my_dropbox

 

Then choose the type of storage for which you are setting up the remote (here we show the method for setting up a remote for Dropbox, which is option 7).

 

Type of storage to configure.

Choose a number from below, or type in your own value
 1 / Alias for a existing remote
   \ "alias"
 2 / Amazon Drive
   \ "amazon cloud drive"
 3 / Amazon S3 (also Dreamhost, Ceph, Minio, IBM COS)
   \ "s3"
 4 / Backblaze B2
   \ "b2"
 5 / Box
   \ "box"
 6 / Cache a remote
   \ "cache"
 7 / Dropbox
   \ "dropbox"
 8 / Encrypt/Decrypt a remote
   \ "crypt"
 9 / FTP Connection
   \ "ftp"
10 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
11 / Google Drive
   \ "drive"
12 / Hubic
   \ "hubic"
13 / Local Disk
   \ "local"
14 / Microsoft Azure Blob Storage
   \ "azureblob"
15 / Microsoft OneDrive
   \ "onedrive"
16 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
17 / Pcloud
   \ "pcloud"
18 / QingCloud Object Storage
   \ "qingstor"
19 / SSH/SFTP Connection
   \ "sftp"
20 / Webdav
   \ "webdav"
21 / Yandex Disk
   \ "yandex"
22 / http Connection
   \ "http"

Storage> 7

 

The command line will then show the following prompts:

 

Dropbox App Key - leave blank normally.

app_key> (just press the enter key here)

Dropbox App Secret - leave blank normally.

app_secret> (just press the enter key here)

Remote config

Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes
n) No
y/n> n
For this to work, you will need rclone available on a machine that
has a web browser available.
Execute the following on your machine:
    rclone authorize "dropbox"
Then paste the result below:
result>

{"access_token":"UokS4PXp5QAAAAAAAAAAC5A6MIh1gpDOc8nwoNtXFLaHSBS7c5KecIbI6itK84","token_type":"bearer","expiry":"0001-01-01T00:00:00Z"}

 

 

 

To obtain the result, you need rclone installed on your local machine. Open a new terminal window on your workstation and change to a temporary directory:

$ cd /tmp

Then open the following page and copy the download link for the build matching your workstation's platform: https://rclone.org/downloads/

 

 

To install rclone, first download the archive using wget and then unzip it:

$ wget [linkpath]

$ unzip [filename]

 

Now change into the folder created by unzip. From there, execute the following command:

$ ./rclone authorize "dropbox"

 

The command prints a URL; open it in the browser on your workstation and authorize access to your Dropbox. Once that is done, a verification token will be displayed in your local terminal.

Copy this token and paste it into the Marvin terminal at the result> prompt. Once the Marvin terminal accepts the verification token, it will display the options below; choose one:

 

--------------------
[dropbox_1]
type = dropbox
client_id =
client_secret =
token = {"access_token":"UokS4PXp5QAAAAAAAAAAC5A6MIh1gpDOc8nwoNtXFLaHSBS7-c5KecIbI6-itK84","token_type":"bearer","expiry":"0001-01-01T00:00:00Z"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y

 

 

Select y if everything looks correct, or choose e to edit the remote.

You can also view the existing remotes.

 

 

Examples of transferring files using rclone and Google Drive

 

The following commands are useful for transferring files using rclone with Google Drive (seen here as upf_drive), although these commands also work with other cloud storage systems.

 

1) List the directories on the drive:

 

rclone lsd upf_drive:[path]

 

2) Copy files from Marvin to the drive:

 

rclone copy [marvin path] upf_drive:[path drive]

 

3) Copy files from the drive to Marvin:

 

rclone copy upf_drive:[path drive] [marvin path]

 

4) Back up the Marvin home directory to the drive:

 

DATE=$(date +'%d-%m-%Y_%H:%M:%S')

rclone copy ~ upf_drive:backups/marvin/home/$DATE

 

5) Synchronize the home directory with a copy of it on Google Drive:

 

rclone sync ~ upf_drive:syncs/marvin/home

 

6) Synchronize the home directory with a copy of it on Google Drive, but with a 10 MB/s bandwidth limit:

 

rclone sync ~ upf_drive:syncs/marvin/home --verbose=1 --bwlimit 10M

 

7) Synchronize the home directory with a copy of it on Google Drive, but with bandwidth limits in certain time slots: from 8:00 to 10:00 the limit is 10 MB/s, from 10:00 to 18:00 it is 5 MB/s, and outside those times it is unlimited:

 

rclone sync ~ upf_drive:syncs/marvin/home --verbose=1 --bwlimit "08:00,10M 10:00,5M 18:00,off"

It is recommended to always set these bandwidth limits.
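Putting the pieces together, here is a small helper sketch for timestamped home-directory backups, assuming the remote name upf_drive and the paths used above (the function name backup_home and the exact bandwidth schedule are illustrative choices, not fixed conventions):

```shell
# Illustrative helper: copy the home directory to a timestamped
# folder on the remote, throttled during working hours.
# "upf_drive" is the remote name configured earlier in this guide.
backup_home() {
    # Timestamp in the same format used above, e.g. 01-10-2018_12:03:08
    local stamp
    stamp=$(date +'%d-%m-%Y_%H:%M:%S')
    rclone copy "$HOME" "upf_drive:backups/marvin/home/$stamp" \
        --verbose --bwlimit "08:00,10M 18:00,off"
}
```

Calling backup_home from an interactive session (or a scheduled job) then creates one dated copy per run, capped at 10 MB/s between 08:00 and 18:00 and unthrottled otherwise.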

 

More info:

 

https://rclone.org/