| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 7
|
|
| Author |
|
|
abennett
Cruncher Joined: Dec 11, 2005 Post Count: 10 Status: Offline Project Badges:
|
I'm still occasionally having problems on one of my machines that is running BOINC. Every now and then, when I come back to the machine after a period of time, the monitor goes to a sleep state, and the keyboard does not respond. I can only get the machine back by doing a hard restart.
I'm running 5.10.22 on a dual-core AMD X2 3800 running Windows 2000 Pro. No screensaver is enabled, but BOINC is using 100% of CPU time. I have 2 GB of memory. This machine mostly sits idle, with occasional web browser use. I've never seen it use most of its physical RAM.

I do not think I have a heat problem:
- this machine previously ran distributed.net's client for months without issues
- temperature checks upon a hang detection are in range

This morning, when looking at the client, I noticed a status of "Computational Error" for a Dengue unit. Poking around, these messages look relevant:

11/3/2007 8:14:07 AM|World Community Grid|Restarting task dddt0201b0112_ZINC05460445-0000_00_0 using dddt version 510
11/3/2007 8:14:16 AM|World Community Grid|Deferring communication for 1 min 0 sec
11/3/2007 8:14:16 AM|World Community Grid|Reason: Unrecoverable error for result dddt0201b0112_ZINC05460445-0000_00_0 ( - exit code -1 (0xffffffff))
11/3/2007 8:14:16 AM|World Community Grid|Computation for task dddt0201b0112_ZINC05460445-0000_00_0 finished

Granted, these messages are from the restart, and not from the incident leading to the hang.

Checking stdoutdae.txt, I have rows of numbers that precede the status messages from the reboot/restart:

0. 0.001 -0.001 -0.001 -0.002 -0.005 -0.007 -0.011 -0.022 -0.032 -0.042 -0.058 -0.082 -0.11
To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK
2007-11-03 08:13:58 [---] Starting BOINC client version 5.10.22 for windows_intelx86
2007-11-03 08:13:58 [---] log flags: task, file_xfer, sched_ops
2007-11-03 08:13:58 [---] Libraries: libcurl/7.16.1 OpenSSL/0.9.8e zlib/1.2.3
2007-11-03 08:13:58 [---] Data directory: D:\BOINC

Literally, pages and pages of random numbers. stdoutdae.txt is over 1 MB in size, and I'd guess that 98% of it is numbers. Where does that 'To pause/resume tasks' line come from? Why is that in a log that I opened in Notepad? Is that the hang symptom?
My issue is very similar to http://www.worldcommunitygrid.org/forums/wcg/...?thread=9381&offset=0 with the dual cores and the computation errors, although I'm not using the BOINC screen saver. Help! Babysitting my machine is getting frustrating, not to mention the lost time and potentially lost units. [Edit 2 times, last edit by abennett at Nov 3, 2007 3:59:47 PM] |
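The "98% of it is numbers" estimate above can be checked rather than guessed. The following is a minimal sketch, assuming a line counts as numeric when every whitespace-separated token looks like a plain decimal number such as `0.`, `0.001` or `-0.11`. That rule and the helper names `is_numeric_line` and `numeric_fraction` are illustrative assumptions, not anything defined by BOINC.

```python
import re

# Assumed rule: a token is "numeric" if it looks like 0., 0.001, -0.11 or .5.
# This is an illustrative pattern, not a format documented by BOINC.
NUMBER = re.compile(r"^-?(\d+\.?\d*|\.\d+)$")


def is_numeric_line(line: str) -> bool:
    """True if the line has at least one token and every token is a number."""
    tokens = line.split()
    return bool(tokens) and all(NUMBER.match(t) for t in tokens)


def numeric_fraction(lines) -> float:
    """Return the share of non-blank lines that consist only of numbers."""
    non_blank = [ln for ln in lines if ln.strip()]
    if not non_blank:
        return 0.0
    numeric = sum(1 for ln in non_blank if is_numeric_line(ln))
    return numeric / len(non_blank)


# Sample lines copied from the post above (the number row is split in two here).
sample = [
    "0. 0.001 -0.001 -0.001 -0.002",
    "-0.005 -0.007 -0.011 -0.022",
    "To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK",
    "2007-11-03 08:13:58 [---] Starting BOINC client version 5.10.22 for windows_intelx86",
]
print(f"{numeric_fraction(sample):.0%} of sample lines are numeric")
```

To try it on the real log, pass an open file handle instead of `sample`, for example the stdoutdae.txt in the data directory the log reports (D:\BOINC). The result is a share of lines, not of bytes, so it will only roughly track the file-size impression.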
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Computer freezing isn't caused directly by BOINC. However, sometimes the additional stress on your computer can cause such problems. You have ruled out overheating, but faulty or failing power supplies have also been linked to this problem.
Such hangs and resets have probably been the cause of any BOINC errors you see, not the other way around. I'm not sure what the numbers are doing in your log. |
||
|
|
abennett
Cruncher Joined: Dec 11, 2005 Post Count: 10 Status: Offline Project Badges:
|
So why does my PC behave as expected when running distributed.net's client for their crunching projects, but not with BOINC?
|
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
This week I've been experimenting with pulling in jobs from different DC projects, and I can assure you that there are hardware workouts and then there are hardware workouts. Projects differ in the amount of RAM used, the amount of disk space, the disk I/O, and which parts of the CPU they exercise, including the L1/L2 caches. That puts real thermal pressure on the hardware over the course of a job or a combination of jobs, at times speeding the fans up to 90% of max to stay below the CPU's desired temperature ceiling.
If you suspect a bad job, check the Result Status page, Work Unit quorum detail. If only a single error occurs, it is usually closer to being a client-side error than a program error. As for your 'fault', I found only one in the whole of the BOINC projects realm with the identical error code and hex. For now, I'd suggest a thorough test with e.g. memtest86 to see if any failures occur and whether they are intermittent or repeating.
WCG
Please help to make the Forums an enjoyable experience for All! |
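On the "identical error code and hex" mentioned above: the log line reads "exit code -1 (0xffffffff)", and those are the same 32-bit value shown two ways, once as a signed integer and once as unsigned hex. A small sketch of the conversion follows; the helper names `as_unsigned32` and `as_signed32` are made up for illustration, and the example says nothing about what caused the exit code.

```python
def as_unsigned32(code: int) -> int:
    """Reinterpret a signed exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


# Print both views of the exit code reported in the log.
print(f"{as_unsigned32(-1):#010x}")
print(as_signed32(0xFFFFFFFF))
```

This is handy when searching other forums for the same failure, since some reports quote the signed number and others the hex form.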
||
|
|
Diana G.
Master Cruncher Joined: Apr 6, 2005 Post Count: 3003 Status: Offline Project Badges:
|
abennett <snip>.. Checking stdoutdae.txt, I have rows of numbers that precede the status messages from the reboot/restart:

0. 0.001 -0.001 -0.001 -0.002 -0.005 -0.007 -0.011 -0.022 -0.032 -0.042 -0.058 -0.082 -0.11
To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK
2007-11-03 08:13:58 [---] Starting BOINC client version 5.10.22 for windows_intelx86
2007-11-03 08:13:58 [---] log flags: task, file_xfer, sched_ops
2007-11-03 08:13:58 [---] Libraries: libcurl/7.16.1 OpenSSL/0.9.8e zlib/1.2.3
2007-11-03 08:13:58 [---] Data directory: D:\BOINC

Literally, pages and pages of random numbers. stdoutdae.txt is over 1 MB in size, and I'd guess that 98% of it is numbers. Where does that 'To pause/resume tasks' line come from? Why is that in a log that I opened in Notepad? Is that the hang symptom? <snip>

Just my thought on this... I think that line simply appears whenever you reboot your PC. I have seen it exactly like this right after each and every time I reboot. I don't think it means anything other than that. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yes. The stdoutdae file contains everything written to stdout. This includes the messages, of course, but it can also include other things. The pause/resume tasks line is a perfect example of this. It is shown on stdout for the benefit of people running BOINC in a console window.
The numbers, though - I don't recall seeing them before, as I said. If I had to guess, I would say that the science application was writing to stdout instead of stderr, and since stdout isn't redirected, it is going to the parent process stdout and getting logged in the wrong place. Does anyone want to try to determine which project is doing this? Diana, which project(s) are you running? |
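The guess above (a child process whose stdout is not redirected writes into the parent's stdout) can be reproduced in a few lines. This is only a sketch of the general mechanism, assuming a stand-in child script; it does not show how the BOINC client actually launches science applications, and the `CHILD` snippet and `run_child` helper are hypothetical.

```python
import subprocess
import sys

# Stand-in for a science application: numbers go to stdout, a note to stderr.
CHILD = (
    "import sys;"
    "print('0.001 -0.001 -0.002');"
    "print('diagnostic note', file=sys.stderr)"
)


def run_child(redirect_stdout: bool) -> subprocess.CompletedProcess:
    """Run the child; capture its stdout only when redirect_stdout is True."""
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE if redirect_stdout else None,
        stderr=subprocess.PIPE,
        text=True,
    )


# Not redirected: the child inherits this process's stdout, so its numbers
# end up wherever the parent's own output goes (for example, a log file).
print("parent: starting child", flush=True)
inherited = run_child(redirect_stdout=False)

# Redirected: the parent receives the numbers separately and can file them
# wherever it likes instead of mixing them into its own output.
captured = run_child(redirect_stdout=True)
print("parent: captured", repr(captured.stdout))
```

If the parent's stdout is being written to a file, the first run is the case that would fill that file with the child's numbers.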
||
|
|
Diana G.
Master Cruncher Joined: Apr 6, 2005 Post Count: 3003 Status: Offline Project Badges:
|
Didactylos, I run all the projects but haven't received the AC@H or the HCC yet.
I don't see any numbers in my stdoutdae, and it shows that each time I reboot, it writes the pause/resume tasks line. Everything is normal. |
||
|
|
|