In the cloud

Jose Manuel Martí edited this page May 31, 2024 · 3 revisions

Running GENTANGLE in the cloud

Rationale

As indicated in the requirements, GENTANGLE should only be installed on a Linux machine. While there are ways to install the software on Windows or macOS machines, these approaches are neither recommended nor supported, because they depend heavily on third-party virtual machine software that must be installed first.

Understanding that some labs may only have immediate access to Windows or macOS machines, we identified cloud computing as a viable option for those users. Many cloud computing services offer free trial access to Linux computing resources, which allows GENTANGLE to be confirmed and evaluated at no cost. Beyond the fairly extensive free trial periods, cloud computing also works as a longer-term, low-cost option should Windows/macOS labs choose to run the GENTANGLE workflow.

To confirm ready, straightforward access, we used Google’s cloud computing, which offers a $300 credit for first-time users. The compute node we procured costs less than 50 cents per day and allows users to run extensive evaluations at no cost for up to 3 months. Many other cloud services, including Microsoft and AWS, provide equivalent resources, and our intent is not to direct users to one specific commercial provider. We reconfirmed that the GENTANGLE software can be downloaded and run using cloud services, and we provide additional details on the steps below.

Procedure for Google Cloud

As with any containerization technology, administrative privileges are required for the installation (but not for the execution, which is rootless). A Linux virtual machine (VM) in the cloud offers an easy way to fulfill that basic requirement.

Create the Linux VM instance

For Google Cloud, start by creating a Linux VM instance. You can begin with the instructions here: https://cloud.google.com/compute/docs/create-linux-vm-instance

The console screenshot below shows an example configuration in Google Cloud after clicking the “CREATE INSTANCE” button. Select a standard compute node (E2) and use the default Linux option, Debian GNU/Linux 12 (bookworm):

Console in Google Cloud
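
For readers who prefer the command line over the console, the same kind of instance can probably be created with the gcloud CLI. The following is only a sketch: the instance name, zone, and machine type are assumptions that do not come from this page, and only the Debian 12 image matches the configuration described above. The script prints the command and runs it only when RUN=1 is set.

```shell
#!/bin/sh
# Assumed values (not from this wiki): adjust them to your own project.
INSTANCE="gentangle-vm"     # hypothetical instance name
ZONE="us-central1-a"        # hypothetical zone
MACHINE_TYPE="e2-medium"    # hypothetical E2 machine type

# Print a command; execute it only when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Create a Debian GNU/Linux 12 (bookworm) VM.
run gcloud compute instances create "$INSTANCE" \
    --zone="$ZONE" \
    --machine-type="$MACHINE_TYPE" \
    --image-family=debian-12 \
    --image-project=debian-cloud
```

Running the script as is only shows what would be executed; setting RUN=1 requires an installed and authenticated gcloud CLI.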

Format new 500GB storage device

We named the device gendisk2. The steps are the following:

sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/disk/by-id/google-gendisk2
sudo mkdir -p /mnt/disks/gendisk2
sudo mount -o discard,defaults /dev/disk/by-id/google-gendisk2 /mnt/disks/gendisk2
sudo chmod a+w /mnt/disks/gendisk2
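
After these steps, it may help to confirm that the disk is actually mounted before writing data to it. The sketch below assumes the mount point used above; the helper `is_mounted` is a hypothetical name introduced here for illustration, and it simply looks for the directory in the kernel's mount table.

```shell
#!/bin/sh
# is_mounted DIR [MOUNTS_FILE]: succeed if DIR is listed as a mount point.
# MOUNTS_FILE defaults to /proc/mounts (Linux only).
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

MOUNT_DIR="/mnt/disks/gendisk2"
if is_mounted "$MOUNT_DIR"; then
    # Show the size and free space of the mounted disk.
    df -h "$MOUNT_DIR"
else
    echo "$MOUNT_DIR is not mounted; re-check the mount command" >&2
fi
```

Note that a mount made this way does not persist across reboots unless it is also added to /etc/fstab.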

The next screenshot shows the persistent 500GB storage device being added to the VM to store data: Adding a new disk
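
The disk can likely also be created and attached from the gcloud CLI instead of the console. This is a sketch under assumptions: the instance name and zone are hypothetical, while the disk name gendisk2 and the 500GB size come from this page. Passing gendisk2 as the device name is what makes the disk appear as /dev/disk/by-id/google-gendisk2 inside the VM. Commands are printed and only executed when RUN=1 is set.

```shell
#!/bin/sh
# Assumed values (not from this wiki): adjust them to your own project.
INSTANCE="gentangle-vm"   # hypothetical instance name
ZONE="us-central1-a"      # hypothetical zone
DISK="gendisk2"           # disk name used on this page

# Print a command; execute it only when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Create the 500GB persistent disk in the same zone as the VM.
run gcloud compute disks create "$DISK" --size=500GB --zone="$ZONE"

# Attach it; the device name determines the /dev/disk/by-id/google-* path.
run gcloud compute instances attach-disk "$INSTANCE" \
    --disk="$DISK" --device-name="$DISK" --zone="$ZONE"
```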

Check the cost estimate for running the instance

Monthly estimate
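
As a rough cross-check of the estimate, the figures quoted in the rationale can be combined with simple arithmetic. The sketch below uses only those figures (a $300 credit, a compute node costing less than 50 cents per day, and about 3 months of use, taken here as 90 days). It covers the compute node only; the 500GB disk and any other charges are not included, so the console estimate remains the number to trust.

```shell
#!/bin/sh
CREDIT_CENTS=30000       # $300 first-time credit
COST_PER_DAY_CENTS=50    # upper bound: the node costs < 50 cents per day
DAYS=90                  # about 3 months (approximation)

spent=$(( COST_PER_DAY_CENTS * DAYS ))
left=$(( CREDIT_CENTS - spent ))

printf 'Compute cost over %d days: at most $%d.%02d\n' \
    "$DAYS" $(( spent / 100 )) $(( spent % 100 ))
printf 'Credit remaining: at least $%d.%02d\n' \
    $(( left / 100 )) $(( left % 100 ))
# Prints:
#   Compute cost over 90 days: at most $45.00
#   Credit remaining: at least $255.00
```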

Install Singularity and obtain GENTANGLE

The steps are the following:

wget https://github.com/sylabs/singularity/releases/download/v4.1.3/singularity-ce_4.1.3-focal_amd64.deb
sudo apt install ./singularity-ce_4.1.3-focal_amd64.deb
singularity pull gentangle.sif library://khyox/gentangle/gentangle.sif
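
Once these commands finish, a quick sanity check can confirm that Singularity is on the PATH and that the image was downloaded. This is a minimal sketch that assumes it is run from the directory containing gentangle.sif; `have_cmd` is a hypothetical helper name.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME is an available command.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd singularity; then
    # Report the installed version.
    singularity --version
    if [ -f gentangle.sif ]; then
        # Show the metadata stored in the pulled image.
        singularity inspect gentangle.sif
    else
        echo "gentangle.sif not found in the current directory" >&2
    fi
else
    echo "singularity not found on PATH" >&2
fi
```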

Set up Git and pull DATANGLE

The steps are the following:

sudo apt install git
sudo apt-get install git-lfs
git clone https://github.com/BiosecSFA/datangle.git
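
Since git-lfs is installed above, the repository presumably stores some large files with Git LFS. If those files turn out to be small text stubs after cloning, the following sketch may help. It assumes it is run from the directory that contains the datangle clone; `is_lfs_pointer` is a hypothetical helper that recognizes the first line of a Git LFS pointer file.

```shell
#!/bin/sh
# is_lfs_pointer FILE: succeed if FILE is still a Git LFS pointer stub
# rather than the real file contents.
is_lfs_pointer() {
    [ "$(head -n 1 "$1" 2>/dev/null)" = "version https://git-lfs.github.com/spec/v1" ]
}

if [ -d datangle ]; then
    git -C datangle lfs install --local   # enable LFS hooks for this clone
    git -C datangle lfs pull              # download the real file contents

    # Report small files that are still pointer stubs, if any.
    find datangle -type f -size -2 ! -path '*/.git/*' | while read -r f; do
        if is_lfs_pointer "$f"; then
            echo "still a pointer: $f"
        fi
    done
fi
```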

Next steps

From here, you can follow the tutorial to run GENTANGLE and experiment with the individual software modules, starting at: Running the pipeline
