World Community Grid Forums
Category: Retired Forums Forum: Member-to-Member Support [Read Only] Thread: Unattended machine appears to fail workunits -- no messages |
Thread Status: Active Total posts in this thread: 12
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have an old Aptiva with an 800 MHz P3 back in the bedroom on an 802.11b network, which I keep around for my sons to use when they visit me. When I first installed WCG on it, I kept getting connection failure messages. After reading about similar problems that others had corrected by reducing the storage from 10 GB, I reset the profile for that system to 5 GB. That seemed to do the trick, and it completed two work units, one on 26 Nov and the second on 27 Nov.

Since that time, whenever I have looked at it, I have found it grinding away on work units. I recall looking at it yesterday and seeing that it was up to 30 hours of CPU time on the one it was working on. When I look again later, I find that it has reset itself back to zero, presumably on a different work unit. I have been watching the statistics pages and do not see any completed work units since 27 Nov, although I know I have seen this cycle repeat itself several times since then.

There are no messages to indicate any problems and, of course, with no log, there is no way to review what happened. I have now reset the storage down to 1 GB in the hope that it might do something. Short of sitting back there in front of that PC in a very uncomfortable chair in a cold room to watch for the failure, can anybody think of something else I could do to resolve the problem?
[Edit 1 times, last edit by Former Member at Dec 4, 2004 4:52:24 AM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Dave,
More information is needed. Have things really gone wrong? First check Member Area – Device Manager – Device Statistics for the results returned by your second device. How do they compare with what the Agent on that computer shows for Points, CPU Time, etc.?
Lawrence
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Lawrence --
Hi again -- I am not sure how the local Agent statistics and those reported on the device manager relate to one another. The unattended system reports 8300 points and appears to have started its present work unit about 8 hours ago (which would be after the latest statistics were updated for the device). The "My Statistics" page reports 8044, and this system is reporting 8491 and has been on its present work unit for just over two hours. If one assumes that the Agent obtains the accumulated points for a user and adds to that what it determines the most recently completed work unit will be awarded, it might be that my reduction of storage to 1 GB has corrected the problem. I guess I should be patient and see what the device statistics for that system look like after tomorrow's update. I think we can let this ride and assume that it is fixed. I will update this thread tomorrow either way. Thanks.
[Edit 1 times, last edit by Former Member at Dec 4, 2004 4:52:48 AM]
||
|
David Autumns
Ace Cruncher UK Joined: Nov 16, 2004 Post Count: 11062 Status: Offline Project Badges: |
Hi Dave
Just to clear up any confusion regarding the points on the agent and those reflected on the website. I may be teaching Grandma to suck eggs, and if so please forgive me, but it's useful info to have in the forum anyway.

The website stats show results that were returned from 00:00 GMT to 23:59 GMT the previous day. The stats page begins its update at 6am GMT, and your results and points returned up to 23:59 the previous day will be visible when this process has completed, around 7:20am GMT at the moment. The more members we have, the longer it will take to produce the results.

If your WU was posted to WCG at 00:01 GMT, it has missed the boat for the stats update at 6am GMT (I was going to say "in the morning", but that's only true for those of us sitting on the Greenwich Meridian; I'm in the UK) and will appear in the following day's website update.

Thanks for putting up with my ramblings
Regards
Dave
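The cut-off rule described in this post can be sketched in a few lines of Python. This is only an illustration of the schedule as stated above (a 00:00 to 23:59 GMT reporting day, with the update visible around 07:20 GMT the next day); the function name `stats_visibility` and the constant `UPDATE_DONE` are made up for the example and are not part of any WCG tool, and the 07:20 figure is the approximate time quoted in the post, not a guaranteed one.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption taken from the post: the update is visible around 07:20 GMT.
UPDATE_DONE = time(7, 20)


def stats_visibility(returned_at):
    """Return (stats_date, visible_at) for a result returned at `returned_at`.

    `returned_at` must be a timezone-aware datetime. The result is counted
    in the GMT calendar day on which it was returned, and that day's totals
    become visible after the following morning's update.
    """
    if returned_at.tzinfo is None:
        raise ValueError("returned_at must be timezone-aware")
    # Work in GMT/UTC, since the reporting day is defined in GMT.
    returned_utc = returned_at.astimezone(timezone.utc)
    stats_date = returned_utc.date()
    visible_at = datetime.combine(
        stats_date + timedelta(days=1), UPDATE_DONE, tzinfo=timezone.utc
    )
    return stats_date, visible_at
```

Under these assumptions, a result returned at 23:59 GMT on 2 December is counted under 2 December and shows up around 07:20 GMT on 3 December, while one returned two minutes later at 00:01 GMT on 3 December is counted under 3 December and does not show up until around 07:20 GMT on 4 December.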
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi Dave,
So many ways to get to places in the site. An easy way (not the only way) to check on a device is to go to your Member Area. Then click on Device Installations. This will take you to the Device Manager – Device Statistics page, which lists all your devices. If you click on a particular Device Name, it takes you to the Device Statistics History page for that device. This has a table with 4 columns: Statistics Date, Total CPU Time, Points Generated, and Results Returned. Any day without a result returned will be omitted. This shows the CPU time spent on the results returned and the points awarded for results returned that day. This lets you see how that computer is doing. I assume that this list is created at 0600 UTC each day and will not include any results returned after that time. I personally suspect it might not be fully up-to-date before then, but you have already heard my ideas on that.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Lawrence --
Yep -- that was what got me started on this in the first place. There are only two entries on this page, one from 26 Nov and the other from 27 Nov. Since then, I have seen that system show as much as 30 hours of CPU time on a work unit, perhaps 40% complete, and then found that it had cycled back down to low values with no indication on the statistics pages that it had completed a work unit. As I said, let's see if the most recent might not appear with the 2 December statistics tomorrow.
[Edit 1 times, last edit by Former Member at Dec 4, 2004 4:53:08 AM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I had this problem a few tasks ago. It took over 30 hours to do it, and the progress bar reset itself a few times, but I just left it be and it seemed to finish. I've done quite a few tasks since then anyway, and it's still plodding along.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
No luck -- device statistics still shows that my unattended system has only completed two work units, so it appears that yesterday was yet again another failure. Looks like the only answer is to uninstall WCG from that system and power it off. With no log and no diagnostics to point to the problem, I don't think it is solvable.
[Edit 1 times, last edit by Former Member at Dec 4, 2004 4:53:29 AM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, I decided to have another try at it. I removed WCG from the system and re-installed it. After the install, it went through several iterations of downloading and then starting to execute every 15 to 20 seconds. I then restarted the machine. That seems to have stopped the rapid cycle of downloads. I watched it for five minutes and saw no more instances of that. Guess I will have to wait and see now. During that period in which it kept downloading every few seconds, there were no messages to indicate what the problem was. Kind of user-surly of the application to abandon a work unit and not indicate why with some kind of message. If there are no messages, I guess a message log wouldn't even be of benefit.
[Edit 1 times, last edit by Former Member at Dec 4, 2004 4:53:49 AM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, it does not sound like the (assumed) router problem that caused TaoWarrior's slow downloads. For my first few results, I had the Agent maximized on my screen so that I could follow the messages as it returned the result and downloaded a new Work Unit. You might want to time the presumed progress of your new Work Unit and use Exit on the Agent to make sure that it does not try to return a result until you are ready to watch. I remember being slightly startled after a few results when one suddenly flashed Backing Off, then waited a minute before going through the familiar routine. Good luck!! This is a really puzzling situation.
|
||
|
|