World Community Grid Forums
Thread Status: Active. Total posts in this thread: 10
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hi there,
I'm new to the WCG community, but I have about 9 months of BOINC crunching experience. Unfortunately, today I realized that the version of the BOINC server software used by WCG is prone to the already known "client detached" bug. As a result, I've just lost several WUs crunched by one of my machines.
Briefly, here is what happens. The "client detached" bug occurs when you crunch on machines without network access and have to download and upload WUs for several crunching clients through one networked machine. Sometimes the BOINC server then mixes up the different clients. For instance, take 2 computers, one with Internet access and one without. We proceed as follows:
1) Client A on Machine A downloads WUs
2) Client A goes to Machine B (with no network) and starts crunching
3) Client B on Machine B downloads WUs
Normally that would be all, but because of the bug we get:
4) The BOINC server sees clients A and B as one and the same. It doesn't see client A's WUs, so it says "client detached" and kills the WUs that are already being crunched.
Because of this bug, crunching on computers without Internet access is quite limited. In my experience the bug does not happen every time, but I don't know why it happens on some occasions and not on others. According to one of the BOINC project managers I contacted, the whole mess is caused by a bug in some versions of the BOINC server software. As far as I know, several BOINC projects suffer from the same problem (e.g. QMC), while others (like POEM, Enigma) seem perfectly stable.
My question is: could you use a better version of the software or force the BOINC developers to fix this bug? It is pretty sad to see perfectly fine WUs killed because of a server bug. What is more, it effectively limits the crunching power available to WCG (few people want to take the risk and then watch all their tasks to see whether they errored out or not...).
Thanks for your co-operation and take care!
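The thread never shows any server code, so the following is only a toy model, not BOINC's or WCG's actual scheduler logic. It assumes, as the corrected step 3 in the follow-up post (Client B running on Machine A) and the later "feature, not a bug" reply suggest, that the server tracks in-progress tasks per host identity and writes off any task a host no longer reports. The class `ToyScheduler`, its `contact` method and the host ID strings are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical server model: in-progress tasks are tracked per host ID only.

    This is NOT BOINC's scheduler code; it only mimics the symptom described
    in the thread, where two clients end up sharing one host identity.
    """

    in_progress: dict[str, set[str]] = field(default_factory=dict)

    def contact(self, host_id: str, reported: set[str], new_work: set[str]) -> set[str]:
        """Handle one scheduler request and return the tasks written off as 'detached'."""
        expected = self.in_progress.get(host_id, set())
        # Tasks the server thinks this host holds, but which the client did not report.
        detached = expected - reported
        # The server now believes the host holds only what it reported plus the new work.
        self.in_progress[host_id] = reported | new_work
        return detached


server = ToyScheduler()

# 1) Client A on Machine A downloads WUs; the server files them under Machine A's host ID.
server.contact("machine-A", reported=set(), new_work={"wu1", "wu2"})

# 2) Client A is carried to offline Machine B and crunches wu1 and wu2 there.

# 3) Client B, running on Machine A, is identified as the same host and asks for work.
lost = server.contact("machine-A", reported=set(), new_work={"wu3"})
print(sorted(lost))  # ['wu1', 'wu2']: client A's tasks are written off

# With a distinct host ID per client, nothing is written off in this model.
server2 = ToyScheduler()
server2.contact("client-A", reported=set(), new_work={"wu1", "wu2"})
print(sorted(server2.contact("client-B", reported=set(), new_work={"wu3"})))  # []
```

In this model the damage comes entirely from two clients sharing one host identity, which is why giving each client its own identity (as in Sekerob's separate installs later in the thread) avoids it. Whether WCG's real scheduler behaves exactly this way is an assumption.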
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Quote:
1) Client A on Machine A downloads WUs
2) Client A goes to Machine B (with no network) and starts crunching
3) Client B on Machine B downloads WUs
Of course, step 3 should read:
3) Client B on Machine A downloads WUs
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline
Quote:
My question is: could you use a better version of the software or force the BOINC developers to fix this bug?
There was a change 10 days ago that should hopefully fix this, but I'm not sure whether any projects have updated yet. WCG hasn't upgraded for many months, so it doesn't include this fix yet.
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
I've seen 4 identical or related cases that had little to do with the server software version:
1. The client was cloned, so no unique identity can be found and 2 or more clients compete.
2. The client was copied for sneakernet use but managed to connect to the servers anyway.
3. Sneakernet, but the host administration was not done properly.
4. An account manager.
For sneakernetting I use separate installs on the machine, each with its own data directory of course, so I can flip-flop user and client without any trouble. Of course WCG then sees unique host IDs when it is contacted. Mind you, I've never tried this except with WCG. As far as I know, updating the server-side software is planned or in progress.
Edit: I use a suppressed network ID in the config, which may be a critical difference!
WCG
Please help to make the Forums an enjoyable experience for All!
[Edit 2 times, last edit by Sekerob at Aug 7, 2009 2:50:36 PM]
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline
Quote:
I've seen 4 of the same or related cases which had little to do with the server software version:
1. Client was cloned so no unique identity is found, 2 or more client competing.
2. The client was copied to sneakernet, but managed to connect to servers anyhow.
3. Sneaker net but the host admin is not properly executed.
4. Account manager.
You've forgotten 5: the user restored a backup copy of BOINC. It's unclear whether the scheduler change will have any effect on these clients...
Quote:
Far as I know updating the server side software is on the program or in progress.
I've seen it mentioned, but I don't remember any timeline for this...
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
Yes, 5 was an afterthought, but then I've never done that, except by going offline before the backup and not going back online until the restore was complete ;>)
Maybe knreed reads this and drops in a revealing word.
Edit: (and then regrets later, with heightened expectations, that he'd said anything ;>)
WCG
Please help to make the Forums an enjoyable experience for All!
[Edit 1 times, last edit by Sekerob at Aug 7, 2009 4:19:36 PM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Quote:
1) Client A on Machine A downloads WUs
2) Client A goes to Machine B (with no network) and starts crunching
3) Client B on Machine B downloads WUs
Of course there should be:
3) Client B on Machine A downloads WUs
If I am reading this correctly (and I believe I am), then you are the victim not of a bug but of a feature. World Community Grid is intelligently identifying client A and client B as a single device, namely machine A. For most purposes this is useful. Sadly, it does not support the scenario you are attempting. So Ingleside's "BOINC fix" is irrelevant, since this behaviour is by design. There may be a workaround, but generally speaking, sneakernet setups only work by accident. BOINC certainly isn't designed with sneakernet in mind.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Allow me to resurrect this very old topic, hopefully to find some new information.
I do the sneakernet thing with Einstein and BOINC on one machine-pair setup. Essentially, you transfer the 'data' directory from the offline machine to the networked machine to upload results, download new work, etc., then transfer it back to the offline machine to work on, then restore the networked machine's own directory so it can continue working on that machine. With WCG this does not seem to work: whether you transfer the directory with just the data or the entire BOINC install directory with the data, WCG kills the WUs as detached or duplicated.
Does anyone know of a way I can run WCG on a machine that isn't connected to the network and successfully transfer the data up to the server without it killing or duplicating the data on the transferring machine? I'm thinking it has something to do with the machine ID, but I don't understand the process well enough to make an educated guess. I'd appreciate any advice/info I can get, as I have several machines I'd love to run WCG on that cannot connect to the network.
Aaron
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1410 Status: Offline
Special for Aaron:
For you as a young investigator there should be something left to do, so I'll give you some stepping stones on the way to heaven. Don't step in between.
You should make a separate USB stick for every combination of OS and x86/x64. I tested it with Windows 64-bit.
Start on the machine with Internet access, where you have stopped BOINC (preferably after finishing all the work) and made a backup of the BOINC data directory.
Install BOINC version boinc_5.10.45_windows_x86_64 onto a USB stick with the command:
boinc_5.10.45_windows_x86_64 /a
This creates 2 folders: program files and windows. Delete the windows folder (it's just for the screensaver). Copy all the other files and the locale folder to a directory on the USB stick, e.g. G:\BOINC
The cc_config.xml should contain at least the option <data_dir>/BOINC</data_dir> (this keeps program and data together).
Start BOINC from that directory with the command:
boinc.exe --detach
You could use the BOINC Manager or, better, BoincTasks to attach to WCG. I used an almost empty "account_www.worldcommunitygrid.org.xml" file:
<account>
  <master_url>http://www.worldcommunitygrid.org/</master_url>
  <authenticator>your secret hex number</authenticator>
  <project_name>World Community Grid</project_name>
</account>
Now the most important part: WCG created a new host for me. If it doesn't for you (technicians, don't read this, or give us the tool to merge/delete old hosts): after stopping BOINC with the command "boinccmd --quit", edit the client_state.xml file and replace the WCG host_id with an old, unused host ID of your own or a fake one. After you start BOINC again, a new host-cpid will be generated.
I let 4 c4cw tasks run to the first checkpoint, then moved the USB stick from a Vista64 machine to a Win7-64 machine. There I copied the BOINC directory to an E: partition, because writing the checkpoints to the hard disk is a bit faster. After the tasks finish, you can copy the whole directory to your Internet machine. Stop the BOINC version running there, start BOINC from the USB stick and report results/ask for work.
Good luck and don't forget your sneakers!
An older investigator
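The host-ID edit in the post above can also be scripted. This is a minimal Python sketch, not a WCG or BOINC tool: it assumes client_state.xml is well-formed XML in which each attached project is a `<project>` element with `<master_url>` and `<hostid>` children (the post calls this value host_id; check the actual tag name in your own file before relying on it). The function name `replace_host_id` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Master URL as it appears in the account file quoted in the post.
WCG_URL = "http://www.worldcommunitygrid.org/"


def replace_host_id(state_xml: str, master_url: str, new_host_id: int) -> str:
    """Return client_state.xml text with one project's <hostid> replaced.

    Assumption: each attached project is a <project> element with
    <master_url> and <hostid> children. Check this against your own file.
    """
    root = ET.fromstring(state_xml)
    for project in root.iter("project"):
        if project.findtext("master_url", default="").strip() == master_url:
            hostid = project.find("hostid")
            if hostid is None:
                raise ValueError("project entry has no <hostid> element")
            hostid.text = str(new_host_id)
            return ET.tostring(root, encoding="unicode")
    raise ValueError(f"no project with master_url {master_url!r}")


if __name__ == "__main__":
    # Tiny stand-in for a real client_state.xml, for illustration only.
    sample = """<client_state>
  <project>
    <master_url>http://www.worldcommunitygrid.org/</master_url>
    <hostid>123456</hostid>
  </project>
</client_state>"""
    print(replace_host_id(sample, WCG_URL, 999))
```

As in the post, stop BOINC first (boinccmd --quit) and keep a backup of client_state.xml. Re-serialising through ElementTree can change whitespace and drops XML comments, so editing the value by hand, as the post describes, remains the more conservative route.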
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Thank you, Crystal. I did something similar to that... almost... and it invalidated all the tasks on one of the machines, and when the other one finished, it invalidated them too.
I'll definitely give your instructions a try and hope it works. Thanks again for taking the time to help me.
Aaron