World Community Grid Forums

Thread Status: Active | Total posts in this thread: 16
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
I am currently running two physical machines and get short stretches of time (up to 12 hours each) on six virtual machines that are re-created each run. Those virtual machines are downloading, completing, and uploading results, but at any time only one of them shows up in my Results Status list. Exactly which of the six varies, but it's only ever the two physical machines plus one virtual, and all of the work done by the other rigs, including pending and valid work, just vanishes.
This is very odd. Each of the six machines is configured identically (copy and paste configs) and there are no errors showing up in the system logs; they just don't all show up in the stats.
----------------------------------------
Currently being moderated under false pretences
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
"Each of the six machines is configured identically (copy and paste configs)"
You need to give each VM its own BOINC configuration, or you risk conflicts and can see things like detaches, lost results, and more. (Each client uses a connect counter to ensure that Joe really is Joe. If there are two identical-twin Joes and the second connects with a counter out of step with the first's, the assigned work gets dismissed.) Best is to suppress the network name in cc_config with the <suppress_net_info>1</suppress_net_info> option, so you can easily identify who's who on the Result Status pages, where the device ID is then listed instead of the host name.
[Edit 1 times, last edit by Former Member at Aug 3, 2020 11:16:16 AM]
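A minimal cc_config.xml sketch with that option, assuming the file sits in the BOINC data directory as usual (restart the client, or use the Manager's "Read config files", for it to take effect):

    <cc_config>
      <options>
        <!-- Don't send the host's domain name / IP address to project
             servers; the Result Status page then shows the device ID
             instead of the host name. -->
        <suppress_net_info>1</suppress_net_info>
      </options>
    </cc_config>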
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
Each VM installs BOINC from scratch at each run. The VMs themselves have identical configs EXCEPT for a different host name on each one.
I'll try changing that and letting it go back to automatic host names.
----------------------------------------
Currently being moderated under false pretences
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
Looking at what's happening, it appears the system is treating all six VMs as the same machine and just keeps changing the name whenever one contacts the server. The same six work units are listed, but each time a different VM connects, the name of the machine they're assigned to changes.
edit: I'll just reset the lot and let them expire, then start fresh tomorrow and see what happens.
----------------------------------------
Currently being moderated under false pretences
[Edit 1 times, last edit by Dark Angel at Aug 3, 2020 11:33:41 AM]
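That symptom fits all six VMs presenting the same host identity to the server: the client stores its server-assigned host ID and connect counter per project in client_state.xml, so VMs cloned from one image after first contact all carry the same values. A trimmed, illustrative excerpt (the element names are BOINC's own; the values are invented):

    <project>
        <master_url>https://www.worldcommunitygrid.org/</master_url>
        <project_name>World Community Grid</project_name>
        <!-- server-assigned host ID: identical across clones -->
        <hostid>5551234</hostid>
        <!-- the "connect counter": a clone whose counter is out of step
             with the server's gets its assigned work dismissed -->
        <rpc_seqno>42</rpc_seqno>
    </project>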
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Not how I would do it, but at the least, if not already done, set it to report results immediately and abort any unfinished tasks, then communicate one more time to clear the queue before dismounting.
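For the reporting part, a cc_config.xml sketch using the client's report_results_immediately option (the abort and the final project update can then be done from the Manager before the VM is torn down):

    <cc_config>
      <options>
        <!-- Upload and report each result as soon as it finishes,
             instead of batching reports; suits short-lived VMs. -->
        <report_results_immediately>1</report_results_immediately>
      </options>
    </cc_config>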
Somewhere there's a description of Deep Freeze as a methodology, and I've certainly read of people restoring backups of their images so as to continue where they left off whenever one or another resource becomes available again to resume crunching. I saw someone who does that with his rendering farm: each image pegged to a specific device, quasi-lossless, apart from maybe the odd task that goes overdue when the crunching breaks run too long. Running bufferless is probably best to reduce that risk to a minimum.
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
I've changed the setting to priority zero, i.e. it only downloads one unit per core to crunch at a time, with no buffer, rather than running even a small cache.
The reason I find this current behaviour weird is that it was working fine for a while.
----------------------------------------
Currently being moderated under false pretences
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I presume by 'priority zero' you mean the Resource Share, aka Project Weight, which defaults to 100. Leaving it at that and setting "Store at least 0 days of work" and "Store up to an additional 0 days of work" would achieve the same thing, with the advantage that BOINC fetches a new task several minutes before the most imminent one is about to finish, i.e. a little anticipation. With a zero resource share you'd see a little idling of one thread, as there is a delay in fetching a new task.
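Locally, those two settings map onto the work-buffer fields of global_prefs_override.xml in the BOINC data directory; a sketch of a zero-size buffer (the element names are BOINC's own, and a local override file takes precedence over the web preferences):

    <global_preferences>
        <!-- "Store at least X days of work" -->
        <work_buf_min_days>0.0</work_buf_min_days>
        <!-- "Store up to an additional X days of work" -->
        <work_buf_additional_days>0.0</work_buf_additional_days>
    </global_preferences>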
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
Yes, that's what I meant. Regardless, the WCG system is still considering all six VMs to be the same machine with an ever-changing name. It wasn't doing this before, and I haven't had this issue on any other project I've used these VMs on, so I can only assume it's something in the WCG database back end, possibly a hard limit on the number of machine IDs allowed per member. I've been on this project off and on for fifteen years and have built and rebuilt many rigs in that time.
----------------------------------------
Currently being moderated under false pretences
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
Never mind, I'll just put them to work on a GPU project and leave my physical rigs running here.
----------------------------------------
Currently being moderated under false pretences
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I dare say not: IBM themselves run many thousands of devices under one member ID. I'm quite sure your method of setup is creating a conflict that makes it impossible for the servers to see the difference between the various installs. I think the techs had best look at this from the back end. The network-suppression line in cc_config would certainly make it easier to see who's who and who has what.
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 3, 2020 10:34:32 PM]