World Community Grid Forums

Thread Status: Active | Total posts in this thread: 16
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
I am currently running two physical machines and get short stretches of time (up to 12 hours each) on six virtual machines that are re-created each run. Those virtual machines are downloading, completing, and uploading results, but at any time only one of them shows up in my Results Status list. Exactly which of the six varies, but it's only ever the two physical machines plus one virtual, and all of the work done by the other rigs, including pending and valid work, just vanishes.
This is very odd. Each of the six machines is configured identically (copy and paste configs) and there are no errors showing up in the system logs; they just don't all show up in the stats.
----------------------------------------
Currently being moderated under false pretences
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
"Each of the six machines is configured identically (copy and paste configs)"
You need to give each VM its own BOINC configuration, or you risk conflicts and can see things like detaches, lost results, and more. (Each client uses a connect counter to ensure that Joe really is Joe. If there are two identical-twin Joes and the second connects with a counter out of step with the first's, the assigned work gets dismissed.) Best is to suppress the network name in cc_config with the <suppress_net_info>1</suppress_net_info> option, so you can easily identify who's who on the Result Status pages, where the device ID is then listed instead of the host name.
[Edit 1 times, last edit by Former Member at Aug 3, 2020 11:16:16 AM]
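A minimal cc_config.xml sketch with that option, assuming the file sits in the BOINC data directory as usual (restart the client, or use the Manager's "Read config files", for it to take effect):

    <cc_config>
      <options>
        <!-- Don't send the host's domain name / IP address to project
             servers; the Result Status page then shows the device ID
             instead of the host name. -->
        <suppress_net_info>1</suppress_net_info>
      </options>
    </cc_config>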
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
Each VM installs BOINC from scratch at each run. The VMs themselves have identical configs EXCEPT for a different host name on each one.
I'll try changing that and letting it go back to automatic host names.
----------------------------------------
Currently being moderated under false pretences
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
Looking at what's happening, it appears the system is treating all six VMs as the same machine and just keeps changing the name whenever one contacts the server. The same six work units are listed, but each time a different VM connects, the name of the machine they're assigned to changes.
edit: I'll just reset the lot and let them expire, then start fresh tomorrow and see what happens.
----------------------------------------
Currently being moderated under false pretences
[Edit 1 times, last edit by Dark Angel at Aug 3, 2020 11:33:41 AM]
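That symptom fits all six VMs presenting the same host identity to the server: the client stores its server-assigned host ID and connect counter per project in client_state.xml, so VMs cloned from one image after first contact all carry the same values. A trimmed, illustrative excerpt (the element names are BOINC's own; the values are invented):

    <project>
        <master_url>https://www.worldcommunitygrid.org/</master_url>
        <project_name>World Community Grid</project_name>
        <!-- server-assigned host ID: identical across clones -->
        <hostid>5551234</hostid>
        <!-- the "connect counter": a clone whose counter is out of step
             with the server's gets its assigned work dismissed -->
        <rpc_seqno>42</rpc_seqno>
    </project>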
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Not how I would do it, but at the least, if not already done, set it to report results immediately and abort any unfinished tasks, then communicate one more time to clear the queue before dismounting.
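For the reporting part, a cc_config.xml sketch using the client's report_results_immediately option (the abort and the final project update can then be done from the Manager before the VM is torn down):

    <cc_config>
      <options>
        <!-- Upload and report each result as soon as it finishes,
             instead of batching reports; suits short-lived VMs. -->
        <report_results_immediately>1</report_results_immediately>
      </options>
    </cc_config>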
Somewhere there's a description of Deep Freeze as a methodology, and I've certainly read of people restoring backups of their images so as to continue where they left off whenever one or another resource becomes available again to resume crunching. I saw someone who does that with his rendering farm: each image pegged to a specific device, quasi-lossless, apart from maybe the odd task that goes overdue when the crunching breaks run too long. Running bufferless is probably best to reduce that risk to a minimum.
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
I've changed the setting to priority zero, i.e. it only downloads one unit per core to crunch at a time, with no buffer, rather than running even a small cache.
The reason I find this current behaviour weird is that it was working fine for a while.
----------------------------------------
Currently being moderated under false pretences
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I presume by 'priority zero' you mean the Resource Share, aka Project Weight, which defaults to 100. Leaving it at that and setting "Store at least 0 days of work" and "Store up to an additional 0 days of work" would achieve the same thing, with the advantage that BOINC fetches a new task several minutes before the most imminent one is about to finish, i.e. a little anticipation. With a zero resource share you'd see a little idling of one thread, as there is a delay in fetching a new task.
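Locally, those two settings map onto the work-buffer fields of global_prefs_override.xml in the BOINC data directory; a sketch of a zero-size buffer (the element names are BOINC's own, and a local override file takes precedence over the web preferences):

    <global_preferences>
        <!-- "Store at least X days of work" -->
        <work_buf_min_days>0.0</work_buf_min_days>
        <!-- "Store up to an additional X days of work" -->
        <work_buf_additional_days>0.0</work_buf_additional_days>
    </global_preferences>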
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
Yes, that's what I meant. Regardless, the WCG system is still considering all six VMs to be the same machine with an ever-changing name. It wasn't doing this before, and I haven't had this issue on any other project I've used these VMs on, so I can only assume it's something in the WCG database back end, possibly a hard limit on the number of machine IDs allowed per member. I've been on this project off and on for fifteen years and have built and rebuilt many rigs in that time.
----------------------------------------
Currently being moderated under false pretences
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
Never mind, I'll just put them to work on a GPU project and leave my physical rigs running here.
----------------------------------------
Currently being moderated under false pretences
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I dare say not: IBM themselves run many thousands of devices under one member ID. I'm quite sure your method of setup is creating a conflict that makes it impossible for the servers to see the difference between the various installs. I think the techs had best look at this from the back end. The network-suppression line in cc_config would certainly make it easier to see who's who and who has what.
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 3, 2020 10:34:32 PM]