World Community Grid - View Thread - Run Agent on Cloud Linux VM?

World Community Grid Forums

Category: Support

Forum: BOINC Agent Support

Thread: Run Agent on Cloud Linux VM?


Thread Status: Active
Total posts in this thread: 49


Author

This topic has been viewed 7683 times and has 48 replies

zolople
Cruncher
Spain
Joined: Apr 25, 2020
Post Count: 8
Status: Offline
Project Badges:

20 year badge for Mapping Cancer Markers

180 day badge for Microbiome Immunity Project

1 year badge for Africa Rainfall Project

5 year badge for OpenPandemics - COVID-19


Re: crunching in Google Cloud or IBM cloud?

You can use a GPU (enable it in the menu Edit / Notebook settings):

008: 02-Aug-2020 13:08:04 [---] OpenCL: NVIDIA GPU 0: Tesla T4 (driver version 418.67, device version OpenCL 1.2 CUDA, 15080MB, 3968MB available, 16282 GFLOPS peak)
007: 02-Aug-2020 13:08:04 [---] CUDA: NVIDIA GPU 0: Tesla T4 (driver version 418.67, CUDA version 10.1, compute capability 7.5, 4096MB, 3968MB available, 16282 GFLOPS peak)

----------------------------------------

[Aug 2, 2020 5:59:29 PM]

Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3295
Status: Offline
Project Badges:

14 day badge for Human Proteome Folding - Phase 2

14 day badge for Nutritious Rice for the World

180 day badge for Help Fight Childhood Cancer

90 day badge for Help Cure Muscular Dystrophy - Phase 2

90 day badge for Computing for Clean Water

90 day badge for Drug Search for Leishmaniasis

90 day badge for GO Fight Against Malaria

2 year badge for Uncovering Genome Mysteries

2 year badge for Outsmart Ebola Together

1 year badge for FightAIDS@Home - Phase 2

5 year badge for Microbiome Immunity Project

14 day badge for Africa Rainfall Project


Re: crunching in Google Cloud or IBM cloud?

Got it:

008: 02-Aug-2020 19:01:12 [---] OpenCL: NVIDIA GPU 0: Tesla K80 (driver version 418.67, device version OpenCL 1.2 CUDA, 11441MB, 4007MB available, 4111 GFLOPS peak)
007: 02-Aug-2020 19:01:12 [---] CUDA: NVIDIA GPU 0: Tesla K80 (driver version 418.67, CUDA version 10.1, compute capability 3.7, 4096MB, 4007MB available, 4111 GFLOPS peak)

Also got this one:

008: 02-Aug-2020 19:10:51 [---] OpenCL: NVIDIA GPU 0: Tesla T4 (driver version 418.67, device version OpenCL 1.2 CUDA, 15080MB, 3968MB available, 16282 GFLOPS peak)
007: 02-Aug-2020 19:10:51 [---] CUDA: NVIDIA GPU 0: Tesla T4 (driver version 418.67, CUDA version 10.1, compute capability 7.5, 4096MB, 3968MB available, 16282 GFLOPS peak)

Thanks
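To compare the GPUs Colab hands out (K80 vs T4 above), a minimal Python sketch that pulls the model, compute capability and peak GFLOPS out of BOINC's startup `CUDA:` log lines. The regex is an assumption written against the exact lines quoted in this thread; other BOINC client versions may format them differently, and `parse_cuda_line`/`GpuInfo` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Matches BOINC's "CUDA: NVIDIA GPU n: ..." startup line. Written against the
# lines quoted in this thread; other client versions may format it differently.
CUDA_LINE = re.compile(
    r"CUDA: NVIDIA GPU (?P<index>\d+): (?P<model>.+?) \("
    r"driver version (?P<driver>[\d.]+), "
    r"CUDA version (?P<cuda>[\d.]+), "
    r"compute capability (?P<cc>[\d.]+), "
    r"(?P<mem_mb>\d+)MB, (?P<avail_mb>\d+)MB available, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


class GpuInfo(NamedTuple):
    index: int
    model: str
    compute_capability: str
    available_mb: int
    peak_gflops: int


def parse_cuda_line(line: str) -> Optional[GpuInfo]:
    """Extract GPU details from one BOINC log line; None if it is not a CUDA line."""
    m = CUDA_LINE.search(line)
    if m is None:
        return None
    return GpuInfo(
        index=int(m["index"]),
        model=m["model"],
        compute_capability=m["cc"],
        available_mb=int(m["avail_mb"]),
        peak_gflops=int(m["gflops"]),
    )
```

Fed the two `007:` lines above, it reports the K80 at compute capability 3.7 and 4111 GFLOPS peak and the T4 at 7.5 and 16282 GFLOPS peak; the `OpenCL:` lines return None.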

----------------------------------------

AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
AMD Ryzen 7 7730U 8C/16T 3.0 GHz

----------------------------------------
[Edit 1 times, last edit by Mosqueteiro at Aug 2, 2020 7:19:44 PM]

[Aug 2, 2020 7:03:45 PM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2156
Status: Recently Active
Project Badges:

5 year badge for Human Proteome Folding - Phase 2

90 day badge for Nutritious Rice for the World

2 year badge for Help Fight Childhood Cancer

2 year badge for Help Cure Muscular Dystrophy - Phase 2

14 day badge for Discovering Dengue Drugs - Together - Phase 2

180 day badge for The Clean Energy Project - Phase 2

1 year badge for Computing for Clean Water

1 year badge for Drug Search for Leishmaniasis

1 year badge for GO Fight Against Malaria

45 day badge for Computing for Sustainable Water

100 year badge for Mapping Cancer Markers

1 year badge for Uncovering Genome Mysteries

20 year badge for Outsmart Ebola Together

2 year badge for FightAIDS@Home - Phase 2

20 year badge for Smash Childhood Cancer

10 year badge for Africa Rainfall Project

50 year badge for OpenPandemics - COVID-19


Re: crunching in Google Cloud or IBM cloud?

Have you tried Google Colab?

I have the following question.

After the VM session times out, you have to restart it, preferably with the same BOINC data files that are stored in the Boinc directory on Google Drive, so that running tasks don't get lost and can continue from their last checkpoint. How do you restart the session correctly, step by step, so that WCG doesn't create a new device for your VM?

[Aug 4, 2020 12:20:59 PM]

zolople
Cruncher
Spain
Joined: Apr 25, 2020
Post Count: 8
Status: Offline
Project Badges:


Re: crunching in Google Cloud or IBM cloud?

We can't force Google to assign the same machine, and in fact there are several different models, both CPU and GPU (that's why BOINC runs a benchmark every time it starts).
When you start again, the data already stored on Google Drive is used, so these tasks are not lost; even those that had already started will continue from their last checkpoint.
Tasks that had already started cause no problems and finish correctly.
The only problem is that many devices appear in our account ... I have 150 device installations listed.
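Why the tasks survive a new VM: the BOINC client keeps its state (task list, slot directories, checkpoints) in its data directory, so pointing it at the Boinc folder on Drive lets a fresh VM pick up where the old one stopped. A minimal Python sketch, assuming Drive is mounted at `/content/drive` with the folder under "My Drive", and using the client's `--dir` option; the thread's actual launch script isn't shown, and `client_command`/`start_client` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Assumption: Drive is mounted at /content/drive and the thread's "Boinc"
# folder sits under "My Drive" (newer Colab mounts may call it "MyDrive").
BOINC_DATA_DIR = Path("/content/drive/My Drive/Boinc")


def client_command(data_dir: Path) -> List[str]:
    """Command line that starts the BOINC client with its data directory
    (client state, task slots, checkpoints) on Google Drive."""
    return ["boinc", "--dir", str(data_dir)]


def start_client(data_dir: Path = BOINC_DATA_DIR) -> subprocess.Popen:
    """Launch the client in the background so it keeps running after the cell."""
    if shutil.which("boinc") is None:
        raise RuntimeError("BOINC client is not installed on this VM")
    if not data_dir.is_dir():
        raise RuntimeError(f"{data_dir} not found; is Google Drive mounted?")
    return subprocess.Popen(
        client_command(data_dir),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the notebook cell's process group
    )
```

Failing loudly when the folder is missing matters here: starting the client on an empty local directory would look like a brand-new host with no tasks.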

----------------------------------------

[Aug 4, 2020 9:43:13 PM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2156
Status: Recently Active
Project Badges:


Re: crunching in Google Cloud or IBM cloud?

We can't force Google to assign the same machine, and in fact there are several different models, both CPU and GPU (that's why BOINC runs a benchmark every time it starts).

I think I have disabled the GPU, because I'm sticking with WCG, so to speak.

When you start again, the data already stored on Google Drive is used, so these tasks are not lost; even those that had already started will continue from their last checkpoint.

This is what I do:
When the VM ends, I click Reconnect, then 'Connect to hosted runtime'. It says 'Allocating...', 'Reconnect...', and then RAM and Disk icons appear; hovering the mouse over RAM/Disk shows: "Waiting for Python 3 backend to finish its current execution." The VM then seems to restart, because the bottom command line (0-STOP, ..., 2-Get State, ...) appears, but I can't run any of its commands, and clicking 'Runtime' in the top menu doesn't work either. Right next to RAM/Disk there is a small down-pointing triangle; when I click it I see 'Manage sessions', but clicking that does nothing. I'm completely at a loss. I'm stuck.

OK, let's try this again. I'm opening a new browser window for Colab.

Now I can go to Runtime in the Colab menu and click 'Interrupt execution'! Then I can click 'Run all' from Runtime. The VM seems to have restarted: I can enter a command such as '2' at the bottom command line, I'm seeing tasks that continue from their latest checkpoints, and there is this message at the bottom of the Colab screen: "Automatic saving failed. This file was updated remotely or in another tab. Show diff".

I guess I'm making a mess ... So here's my question. What steps do I need to take to continue tasks from their latest checkpoints after the VM ends?
(Do I need to click Reconnect? Do I need to click 'Connect to hosted runtime'?)
So the VM is running and tasks are executing; then I enter '1' (COMMAND LINE) at the bottom command line, just trying something out, and I don't know how to get out of it (I seem to have landed in the COMMAND LINE). Wait, RAM/Disk in the upper right corner has disappeared; now there is 'Reconnect'. So I click Reconnect and then Run all (from Runtime). And... a new VM starts. And all my previous tasks are gone.

The only problem is that many devices appear in our account ... I have 150 device installations listed.

Yes, I'm seeing that, too. Many new devices with tasks that will end in 'No Reply'.

----------------------------------------
[Edit 3 times, last edit by adriverhoef at Aug 4, 2020 11:19:21 PM]

[Aug 4, 2020 11:02:00 PM]

zolople
Cruncher
Spain
Joined: Apr 25, 2020
Post Count: 8
Status: Offline
Project Badges:


Re: crunching in Google Cloud or IBM cloud?

I think I have disabled the GPU, because I'm sticking with WCG, so to speak.

You only need to enable the GPU if a project uses it. At WCG it is currently NOT needed. If you want to add other projects that use it (GPUGrid, Einstein ...), enable it in the menu Edit / Notebook settings.

Yes, these are communication errors between Colab and the browser.
When the command form disappears, the solution is to stop execution of THE CELL. Just click the cell's own icon (left-click it directly, or right-click and choose the option to interrupt execution), then start it again. If that works, it should reconnect and the tasks appear without reinstalling the environment.
Only if everything goes really badly (the runtime stops and gives an error every time I try to start it) do I use the menu option to reset the runtime to its factory state.
As for the save error ... just press Ctrl+S to save manually and the error goes away.

I guess I'm making a mess ... So here's my question. What steps do I need to take to continue tasks from their latest checkpoints after the VM ends?
(Do I need to click Reconnect? Do I need to click 'Connect to hosted runtime'?)
So the VM is running and tasks are executing; then I enter '1' (COMMAND LINE) at the bottom command line, just trying something out, and I don't know how to get out of it (I seem to have landed in the COMMAND LINE). Wait, RAM/Disk in the upper right corner has disappeared; now there is 'Reconnect'. So I click Reconnect and then Run all (from Runtime). And... a new VM starts. And all my previous tasks are gone.

For a permanent connection to Google Drive, you must activate Drive before running the notebook, by clicking this icon:

You must verify that you are correctly connected to Google Drive. If you don't see a "Drive" folder, with a "My Drive" folder inside it and a "Boinc" folder inside that, something is not working:
1) Stop execution
2) Disconnect Drive
3) Wait for it to disconnect completely
4) Reconnect it
5) Wait for it to connect completely
6) Run again
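The Drive check and the remount can also be done from a notebook cell. A sketch assuming the standard Colab mount point `/content/drive`: `drive.mount(..., force_remount=True)` from `google.colab` roughly corresponds to steps 2 to 5 (disconnect and reconnect). `find_boinc_folder` and `mount_drive` are hypothetical helpers; the folder check tries both "My Drive" and "MyDrive" because the name may differ between Colab versions.

```python
from pathlib import Path
from typing import Optional

MOUNT_POINT = Path("/content/drive")
# The Drive root has been exposed as "My Drive" or "MyDrive"; check both.
CANDIDATES = ("My Drive/Boinc", "MyDrive/Boinc")


def find_boinc_folder(mount_point: Path = MOUNT_POINT) -> Optional[Path]:
    """Return the Boinc folder inside the mounted Drive, or None if it is missing."""
    for rel in CANDIDATES:
        path = mount_point / rel
        if path.is_dir():
            return path
    return None


def mount_drive(mount_point: Path = MOUNT_POINT, remount: bool = False) -> Path:
    """Mount Google Drive (Colab only) and confirm the Boinc folder is visible."""
    from google.colab import drive  # this module only exists inside Colab

    drive.mount(str(mount_point), force_remount=remount)
    folder = find_boinc_folder(mount_point)
    if folder is None:
        raise RuntimeError("Drive mounted, but no Boinc folder found; stop and remount")
    return folder
```

Calling `mount_drive(remount=True)` before "Run all" makes a missing or half-connected Drive show up as an error instead of a fresh, empty BOINC installation.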

Yes, I'm seeing that, too. Many new devices with tasks that will end in 'No Reply'.

There's no fix for the extra devices, but ... those missed tasks shouldn't happen! Are you sure Google Drive is properly connected to the Colab notebook?

----------------------------------------

[Aug 5, 2020 7:00:08 AM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2156
Status: Recently Active
Project Badges:


Re: crunching in Google Cloud or IBM cloud?

For a permanent connection to Google Drive, you must activate Drive before running the notebook, by clicking this icon:

I don't see those three icons:

Uhm, let's try something ... Clicking the 'Files' icon ...
Ha! There's the row of three icons ('Upload to session storage', 'Refresh', 'Mount Drive').

Clicking the 3rd one: "Mounting Google Drive". Succeeded.

Clicking Runtime→Run all ..
... It's executing.

Let's see how this session goes.

Question:
Do you need to keep the browser open and connected to Colab at all times to prevent the notebook from halting?

[Aug 5, 2020 11:34:27 AM]

zolople
Cruncher
Spain
Joined: Apr 25, 2020
Post Count: 8
Status: Offline
Project Badges:


Re: crunching in Google Cloud or IBM cloud?

Question: Do you need to keep the browser open and connected to Colab at all times to prevent the notebook from halting?

YES! You need to keep the browser open! If you close it, Google will close the connection within 1 hour at most.

I'll take this opportunity to answer a question that was still pending:

The script runs BOINC as a daemon, so:
- The "0-stop" option stops BOINC completely. If you start the cell again, it will reinstall everything and start BOINC.
- The "1-command line" option does NOT stop BOINC. It only frees the notebook so you can run code in other cells. If you run the script's cell again, it immediately picks up where it left off, since BOINC kept running as a daemon and the script only has to read the state and display it.
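The notebook script itself isn't posted in this thread, so this is only a hypothetical Python sketch of how such a menu can sit on top of a BOINC client running as a daemon: option 0 sends `boinccmd --quit` (the client exits), option 2 runs `boinccmd --get_state`, and option 1 simply returns, leaving the daemon untouched. The menu layout and function names are illustrative assumptions.

```python
import subprocess
from typing import List, Optional

# Hypothetical mapping of menu choices to boinccmd calls.
MENU = {
    "0": ["boinccmd", "--quit"],       # tell the client to exit (full stop)
    "1": None,                         # leave the menu; the daemon keeps running
    "2": ["boinccmd", "--get_state"],  # print the client's current state
}


def command_for(choice: str) -> Optional[List[str]]:
    """Map a menu choice to a boinccmd argument list (None means 'just return')."""
    if choice not in MENU:
        raise ValueError(f"unknown menu option: {choice!r}")
    return MENU[choice]


def menu_loop() -> None:
    while True:
        choice = input("0-STOP, 1-COMMAND LINE, 2-Get State: ").strip()
        try:
            cmd = command_for(choice)
        except ValueError as err:
            print(err)
            continue
        if cmd is None:
            return  # frees the cell; BOINC itself is unaffected
        subprocess.run(cmd)
        if choice == "0":
            return  # the client is gone, nothing left to show
```

Because option 1 never touches the client, re-running the cell only has to query the still-running daemon, which matches the behaviour described above.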

There are quite a few things you can do from the command line (in ANOTHER cell). For example, to reset WCG on this machine (a BOINC project reset deletes the project's current tasks and files on this host and then fetches new work):
! boinccmd --project "http://www.worldcommunitygrid.org" reset
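The same `boinccmd --project URL operation` call can be wrapped in Python, with a check on the operation name so a typo like "restart" fails before anything is sent. A sketch; the operation set is the commonly documented subset, and `project_command`/`run_project_op` are hypothetical helpers. Note that `reset` discards this host's WCG tasks, while `update` or `nomorework` are gentler.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

WCG_URL = "http://www.worldcommunitygrid.org"

# Operations accepted by `boinccmd --project URL <op>` (common subset).
PROJECT_OPS = {"reset", "detach", "update", "suspend", "resume",
               "nomorework", "allowmorework"}


def project_command(op: str, url: str = WCG_URL) -> List[str]:
    """Build the boinccmd argument list, rejecting unknown operations early."""
    if op not in PROJECT_OPS:
        raise ValueError(f"unsupported project operation: {op!r}")
    return ["boinccmd", "--project", url, op]


def run_project_op(op: str, url: str = WCG_URL,
                   cwd: Optional[Path] = None) -> int:
    """Run the operation and return boinccmd's exit status.

    Without --passwd, boinccmd may look for gui_rpc_auth.cfg in its working
    directory, so pass the BOINC data directory as cwd if authorization fails.
    """
    return subprocess.run(project_command(op, url), cwd=cwd).returncode
```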

----------------------------------------

[Aug 5, 2020 1:50:04 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: crunching in Google Cloud or IBM cloud?

Every web browser I know, if I keep it open long enough, bloats and bloats and bloats some more until it eats gigabytes of memory in the VM. The last one I caught, a Chrome-based browser, was taking a measly 3.2 GB. Everything went ultra slow, no wonder with all that swapping. I suppose you'd need one browser dedicated exclusively to this 'keep Google Drive open' purpose, without extensions or anything, to minimize memory-leak build-up.

[Aug 5, 2020 2:23:21 PM]

zolople
Cruncher
Spain
Joined: Apr 25, 2020
Post Count: 8
Status: Offline
Project Badges:


Re: crunching in Google Cloud or IBM cloud?

From the browser's side it isn't the VM itself, just a Google session (or two ... or three ... the limit is ten).
I use a portable Firefox (from PortableApps) without add-ons, and I use it only for this. Right now it has been open for about 13 hours with 20 tabs and is using 2846 MB.

----------------------------------------

[Aug 5, 2020 5:49:47 PM]
