World Community Grid Forums
Category: Completed Research Forum: Help Cure Muscular Dystrophy - Phase 2 Forum Thread: very short tasks |
Thread Status: Active Total posts in this thread: 43
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: |
"By reading your post, I came to the idea that the log file could be the cause of the problem. Could a very big log file prevent BOINC from managing it correctly?"
This is probably the easiest culprit to eliminate: the size of the stdoutdae files is more or less constant, and BOINC simply pushes old message lines out as needed to stay within this size. So the difference between the log file of a very busy client (like our multicore ones with HCMD2 currently) and that of a very quiet one, like a very slow machine, is only the number of days of activity which are kept. Currently the stdoutdae.txt of my quad contains about 15 hours of activity (without checkpoint logging), while that of my eeePC (which I removed from crunching in September 2011) contains messages covering 15 days with checkpoint logging!
No, I do think that the problem has to do with communication between the client and the server, i.e. when exchanging the client_state.xml files between both after an update request. If the latest changes on the server side improve the ability of the server to deal with the heavy update activity currently generated by the small grandchildren HCMD2 WUs, these BOINC client failures might be over. Otherwise we will have to wait for the end of the HCMD2 final cleaning, or for a better-designed version of the BOINC client which would simply drop/ignore the returned client_state.xml when it is corrupted or incomplete and retry the update request from the beginning. |
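The "drop the corrupted reply and retry" idea can be sketched as follows. This is only an illustration, not how the BOINC client works: `fetch_reply` is a hypothetical callable standing in for one update request, and "corrupted or incomplete" is assumed here to mean "does not parse as well-formed XML".

```python
import xml.etree.ElementTree as ET


def is_complete_xml(text):
    """Return True if text parses as one well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def update_with_retry(fetch_reply, max_attempts=3):
    """Repeat the update request until the reply is well-formed XML.

    fetch_reply is a hypothetical stand-in for one update request.
    Returns the accepted reply text, or None if every attempt was rejected.
    """
    for _ in range(max_attempts):
        reply = fetch_reply()
        if is_complete_xml(reply):
            return reply
        # Corrupted or truncated reply: ignore it and start the request again.
    return None
```

With this shape, a truncated reply never reaches the code that updates the local state; the caller only has to handle the case where all attempts fail.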
||
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1671 Status: Offline Project Badges: |
It sounds reasonable.
For myself, because of a lack of available time, I am not so familiar with the BOINC internal mechanisms. Just for info, within 8 hours, BOINC generated around 12'000 entries in the event file. Because of the ring buffer behaviour, only the last 2'000 entries remain available.
Yves |
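To make the ring buffer figures concrete, here is a small sketch using the numbers from the post above (12'000 entries in 8 hours, 2'000 kept). The `keep_last` helper is hypothetical and only mimics the behaviour described; it is not taken from BOINC.

```python
from collections import deque


def keep_last(entries, capacity):
    """Feed entries through a fixed-size ring buffer and return what survives."""
    buffer = deque(maxlen=capacity)  # oldest entries fall out automatically
    for entry in entries:
        buffer.append(entry)
    return list(buffer)


# Number the 12'000 entries 1..12000 and keep only the last 2'000.
kept = keep_last(range(1, 12001), 2000)

# At this rate, how much wall-clock time do the surviving entries cover?
entries_per_hour = 12000 / 8
hours_covered = 2000 / entries_per_hour
```

At that message rate the surviving entries cover only about 80 minutes, so a failure that happened a few hours earlier is already gone from the event list.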
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: |
Bad guess on my part: it just happened again, and this time there was no update request in progress, only a mere upload of an HCMD2 result:
Started upload of CMD2_2175-1NZW_A.clusters.....
Can't open client_state_next.xml: fopen() failed
Couldn't write state file: fopen() failed; giving up
No user activity either, and the number of tasks in the cache was the lowest of all these last days: only a little more than 500, with about 30 WUs ready to report. |
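The messages suggest that the state is first written to client_state_next.xml and that the client gives up when that file cannot be opened. A minimal sketch of this write-then-rename pattern with a few retries is shown below. It is an assumption about the general technique, not BOINC's actual code; `write_state_file` and its parameters are hypothetical.

```python
import os
import time


def write_state_file(state_path, next_path, data, attempts=5, delay=0.1):
    """Write data to next_path, then rename it over state_path.

    Retries when the file cannot be opened or written, and returns False
    instead of raising once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            with open(next_path, "w", encoding="utf-8") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # make sure the bytes are on disk
            os.replace(next_path, state_path)  # swap in the new file
            return True
        except OSError:
            if attempt + 1 < attempts:
                time.sleep(delay)  # short pause before trying again
    return False
```

Because the old state file is only replaced after the new one has been written completely, a failed open leaves the previous state untouched and the write can simply be attempted again later.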
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline Project Badges: |
No more WUs? Both of my HCMD2 only machines ran dry last night. Message tab says no work available.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
|
||
|
Thargor
Veteran Cruncher UK Joined: Feb 3, 2012 Post Count: 1291 Status: Offline Project Badges: |
Same here, getting no more work units on one of my HCMD2-only machines. The other downloaded a huge chunk of other WCG units before I set it to HCMD2-only.
Also showing the project as down for maintenance, with uploads disabled. |
||
|
Pete Broad
Senior Cruncher Wales Joined: Jan 3, 2007 Post Count: 167 Status: Offline Project Badges: |
I'm still picking the odd one up; I've had 20 or so in the last few hours.
Pete |
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges: |
We are still catching up on various backend tasks due to our problems earlier in the week. HCMD2 is loading up work now and since we have resolved our issues, you should have a steady stream moving forward from here.
|
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline Project Badges: |
"We are still catching up on various backend tasks due to our problems earlier in the week. HCMD2 is loading up work now and since we have resolved our issues, you should have a steady stream moving forward from here."
I'm still getting "no work available for CMD2" messages in my logs on 2 machines.
EDIT: For some unknown reason it was asking for GPU tasks. It seems to have straightened itself out and I'm now getting CPU tasks.
----------------------------------------
[Edit 1 times, last edit by nanoprobe at Apr 6, 2012 8:49:05 PM] |
||
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1671 Status: Offline Project Badges: |
Because of several daily hang-ups on each Linux-based host, I deselected HCMD2 for the next 7 days. I have to travel on business and will not be able to babysit the hosts.
The hang-up issue seems to appear rarely on Windows-based hosts but too often (around every 6 to 8 hours) on Linux-based hosts. I am a little bit frustrated since I need around 200 crunching days to reach 30 years on HCMD2.
Cheers, Yves |
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: |
A new BOINC client failure last night, which gives me the opportunity to report another sequence of messages that I had already seen some days ago, before reporting these failures here:
Computation for task CMD2_2175-1NZW_A.clustersOccur-2PKD_B.clustersOccur_0_7296_9119_7846_8028_1 finished
Signature verification error for wcg_hcmd2_maxdo_6.40_i686-pc-linux-gnu
Can't open client_state_next.xml: fopen() failed
Couldn't write state file: fopen() failed; giving up |
||
|
|