| World Community Grid Forums
Thread Status: Active Total posts in this thread: 13
|
|
| Author |
|
|
wplachy
Senior Cruncher Joined: Sep 4, 2007 Post Count: 423 Status: Offline |
I am encountering a problem where the BOINC client does not update the values in the Accessible View - Tasks Panel. It started yesterday on a single machine. What I am seeing is the Task Panel data (CPU Time, Progress, To Completion, etc.) doesn't change over time and there is a message at the bottom of the BOINC Client panel: "Retrieving System State: Please Wait".
Environment:
BOINC Client: 6.2.28
Processor: Intel I7 P920 - Not OC
OS: Vista Ultimate 64 SP 2
I do not use the Screen Saver.
Client is not installed as a service.
I do not use the "Scheduled Job" procedure to start it, as documented in an old post about problems with BOINC under Vista.

Don't know if the following means anything, but it started yesterday after I received a "bunch" of DDD WUs that are high priority. I got about 10 of them downloaded and then hit this problem. The task panel listed a number of WUs, with the last one showing only the Project and no other values being updated.

It appears that only the Task Panel is not being updated: if I shut down BOINC and restart with Run as Administrator, a number of the WUs that were showing as in progress are now Ready to Report and, so far at least, appear to be valid. It is a recurring problem; it has happened at least 5 times in the last 24 hours.

The problem is only occurring on one machine. I have another with the exact same hardware/software running Vista Business 32 that is not having this problem.

Anyone have any thoughts/experience with this type of problem?

Edit.... After posting this I checked the message log, and it appears that the Client stops writing anything to the display. The log shows that the WUs ended and were uploaded, but the Messages panel was also frozen. So it looks like the Client just stops outputting display data. FYI, the WUs are showing as Valid in the results.

Thanks in advance for any assistance!
Bill P
----------------------------------------
[Edit 2 times, last edit by wplachy at Jul 18, 2009 5:56:11 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
BOINC doesn't have bugs.
But if it did have bugs, you would find a list of them here: http://boinc.berkeley.edu/trac/query
And be able to report them here: http://boinc.berkeley.edu/dev/
Good luck! |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
wplachy, could we have one of the Result Logs (Result Status page)? Then reboot the machine. Whenever a problem like this comes out of the blue, the system likely needs a complete refresh. Nothing at all in the message log? You will find the messages in the stdoutdae.txt file located in the data_dir (the address shows in the startup log).

If rebooting does not help, look at your AV program and make sure to exclude the data_dir from scanning.

And indeed good luck. The heartbeat issue, I'm just reading in trac, was changed from "fix required by 6.6" to "undetermined". It's disappointing, for it's one thing long overdue to be replaced... and I encountered it again recently on a 6.6 test client.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The ticket was opened 2 years ago, by David Anderson. He marked it "Critical".
|
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Speaking of the devil, last night:

Result Log
Result Name: CMD2_0016-1433GA.clustersOccur-ZPR1.clustersOccur_10_15067_15201_1--
<core_client_version>6.6.36</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
No heartbeat from core client for 30 sec - exiting
called boinc_finish
</stderr_txt>
]]>

and the stdoutdae.txt file showing:

Task CMD2_0016-1433GA.clustersOccur-ZPR1.clustersOccur_10_15067_15201_1 exited with zero status but no 'finished' file
[World Community Grid] If this happens repeatedly you may need to reset the project.

Fortunately it happened only once, so the task finished and validated normally.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7849 Status: Offline |
"exited with zero status but no 'finished' file
[World Community Grid] If this happens repeatedly you may need to reset the project."

I have seen this message repeatedly on the Rice project, but since the units are valid, I have just ignored it.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Jobs will abort if this happens too frequently, but after digging, yet again no trace of what set this off. Possibly it was the remote monitor that checks whether the client is still running.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline |
"Jobs will abort if too frequent, but after digging, yet again no trace what set this off. Possibly the remote monitor that checks the client continues to run."

A task that restarts 100 times from the same checkpoint should be aborted by the BOINC client. If there is a long time between checkpoints, and someone runs with "suspend if active" and "don't leave applications in memory", it's possible to have so many restarts without any progress that the task gets aborted.

For "no heartbeat", you normally won't get 100 in a row at the same checkpoint, unless there's something seriously wrong with the WU, that is...
----------------------------------------
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
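The restart rule described in this post can be pictured with a small sketch. This is only an illustration of the idea (count restarts that resume from the same checkpoint, reset the count when a newer checkpoint appears, abort at 100); it is not the BOINC client's actual code. The class name, field names, and the use of checkpoint CPU time as the progress marker are all assumptions made for the example.

```python
from dataclasses import dataclass

# Threshold taken from the post above; the real client's constant and logic may differ.
MAX_RESTARTS_WITHOUT_PROGRESS = 100


@dataclass
class TaskRestartTracker:
    """Hypothetical helper: counts restarts that resume from the same checkpoint."""

    last_checkpoint_cpu_time: float = 0.0
    restarts_at_checkpoint: int = 0

    def record_restart(self, checkpoint_cpu_time: float) -> bool:
        """Record one restart; return True if the task should be aborted."""
        if checkpoint_cpu_time > self.last_checkpoint_cpu_time:
            # A newer checkpoint exists, so the task made progress:
            # this is the first restart from the new checkpoint.
            self.last_checkpoint_cpu_time = checkpoint_cpu_time
            self.restarts_at_checkpoint = 1
        else:
            # Same checkpoint as before: no progress since the last restart.
            self.restarts_at_checkpoint += 1
        return self.restarts_at_checkpoint >= MAX_RESTARTS_WITHOUT_PROGRESS
```

With long gaps between checkpoints and frequent suspends, every restart hits the `else` branch, which is how a healthy task could still reach the limit.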
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Thanks for that. I could not find it mentioned, e.g. here http://boinc-wiki.info/Result_%27%28result%29...ut_no_%27finished%27_file and here http://boinc-wiki.info/No_heartbeat_from_core_client_-_exiting
The official wiki has nothing that I could find either.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
wplachy
Senior Cruncher Joined: Sep 4, 2007 Post Count: 423 Status: Offline |
All, thank you for the responses but I don't believe the problem I'm having has anything to do with a "missing heartbeat". I went back thru the stdoutdae.txt and stdoutdae.old files and there are no heartbeat messages in either (logs go back to Jun 24, 2009).
What is happening is that the BOINC Accessible View panels are not being written. If I switch to Simple View I see the stats for 4 of the 8 running tasks, and what I'm seeing in the stdoutdae files is that BOINC appears to be running normally. In the Accessible View nothing is posted; all tabs but Messages are empty. Messages shows only the startup messages. It does not show the "restarting task" messages that are being written to stdoutdae.txt.

Since the WUs appear to be running and completing OK, I'm going to set BOINC to not get any new work, let the queue dry up, and try an uninstall/install to see what happens.

Again, thanks for the responses!
Bill P |
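For anyone wanting to repeat the log check described in this post, here is a minimal sketch that searches stdoutdae.txt and stdoutdae.old for heartbeat messages. The function name `find_log_lines` is made up for the example, and the script assumes it is run from inside the BOINC data directory; adjust `data_dir` if yours is elsewhere.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def find_log_lines(paths: Iterable[Path], needle: str = "heartbeat") -> List[Tuple[str, int, str]]:
    """Return (file name, line number, line) for each line containing needle (case-insensitive)."""
    hits = []
    for path in paths:
        if not path.is_file():
            continue  # e.g. stdoutdae.old may not exist yet
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle.lower() in line.lower():
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumption: the current directory is the BOINC data directory.
    data_dir = Path(".")
    logs = [data_dir / "stdoutdae.txt", data_dir / "stdoutdae.old"]
    for name, number, line in find_log_lines(logs):
        print(f"{name}:{number}: {line}")
```

An empty result matches what is reported here: no heartbeat messages in either file. Passing `needle="restarting task"` would instead list the restart messages that reach the log but not the Messages tab.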
||
|
|
|