Thread Status: Active
Total posts in this thread: 27
This topic has been viewed 4492 times and has 26 replies
TonyEllis
Senior Cruncher
Australia
Joined: Jul 9, 2008
Post Count: 286
Status: Offline
Re: CPU failure ? - Maybe solved ?

Interesting...

In the last two weeks have experienced three machines cease boinc processing. There was work in the queue on all these machines - but crunching just stopped. Restarting boinc as well as rebooting failed to solve the situation... No prior issue with invalids.

Stopped boinc, blew away the boinc work directories retaining only account_www.worldcommunitygrid.org.xml, global_prefs_override.xml and gui_rpc_auth.cfg. Restarted boinc. Immediately new work was downloaded and processing began. Not missed a beat since. All three machines had "hand me down" boinc work directories from previous OS upgrades.
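
For anyone wanting to script the cleanup described above, here is a minimal Python sketch. It is an illustration, not the exact commands used in this post: the function name `clean_boinc_dir` and the `dry_run` switch are made up for the example, and the data directory path in the usage note is an assumption that varies by distribution. Only the three retained file names come from the post. Stop the boinc client before running anything like this.

```python
import shutil
from pathlib import Path

# The three files the post says were retained; everything else is removed.
KEEP = {
    "account_www.worldcommunitygrid.org.xml",
    "global_prefs_override.xml",
    "gui_rpc_auth.cfg",
}


def clean_boinc_dir(data_dir, keep=KEEP, dry_run=True):
    """Remove every top-level entry in data_dir except the names in keep.

    Returns the sorted names that were (or, in a dry run, would be) removed.
    """
    removed = []
    for entry in sorted(Path(data_dir).iterdir()):
        if entry.name in keep:
            continue
        removed.append(entry.name)
        if dry_run:
            # Report only; leave the entry in place.
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return removed
```

A call such as `clean_boinc_dir("/var/lib/boinc")` (path assumed) only lists what would go; passing `dry_run=False` actually deletes. Defaulting to a dry run is a deliberate choice, since the deletion cannot be undone.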
----------------------------------------
[Jul 28, 2019 10:25:05 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: CPU failure ? - Maybe solved ?

Hi Tony,
what project, what OS?
I have experienced such a case several times over the past several months with MCM on a Windows 7 Pro x64 machine.
Only a reboot, including power off, helps. It seems that the network stack is killed, which disturbs boinc as well.
Yves
----------------------------------------
[Jul 28, 2019 4:04:43 PM]
TonyEllis
Senior Cruncher
Australia
Joined: Jul 9, 2008
Post Count: 286
Status: Offline
Re: CPU failure ? - Maybe solved ?


what project, what OS?

Linux ClearOS 7.6 (derived from CentOS 7.6) on one, the other two Fedora Version 30. At the time one was running both SCC and HST - the others SCC only. All have run every available project in the past including Betas. (In the recent past was running MIP and changed to SCC/HST after reaching 20 years for MIP just before SCC restarted.) You mentioned Zika in your initial post. Last run of Zika WUs was September 2017.
The binaries were not re-installed. Fedora is running 7.14.2-17.fc30.x86_64 and ClearOS running 7.14.2-17.el7.x86_64.
Edit: Probably should mention that 6 other Linux machines are OK - 1 running Fedora 30 and the others ClearOS 7.6, all with similar boinc work directories which have seen OS upgrades and running the same mix of projects. The one Windows machine is running 8.1 - but has done so since new.
----------------------------------------
[Edit 3 times, last edit by TonyEllis at Jul 28, 2019 5:42:21 PM]
[Jul 28, 2019 5:31:42 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: CPU failure ? - Maybe solved ?

Just guessing here, but it seems the problem is not OS related. That would leave some sort of intermittent (?) hardware problem. These can be extremely hard to track down. It could be as simple as a malfunctioning capacitor on the motherboard, but I have no idea how one would determine that.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Jul 28, 2019 6:16:20 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: CPU failure ? - SOLVED

Update #3
---
Very good news :-)
Since the last update everything is running well, no invalid WU.
Compared to a couple of months ago, the machine is performing much better. Prior to resetting the boinc data directory, the machine was about 20% to 30% slower than its twin. Now both machines work fine and fast (about 420 Zika WUs / day, 24'000 boinc points / day).
I should have performed the reset much earlier.
However, I still cannot really understand what was causing the trouble in the boinc data directory.

@TechTeam
Maybe for the future, it would be good to add advice to the FAQ regarding system updates and what to do in case of unexplainable invalid results.

Yves
----------------------------------------
[Jul 31, 2019 7:10:48 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: CPU failure ? - SOLVED

A reset has NEVER cleared off all project related gunk built up over time. Only a project DETACH and re-add does.
[Jul 31, 2019 3:36:01 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: CPU failure ? - SOLVED

Hi Lavaflow,
I did it as described above.
1/ Ran dry (do not accept new tasks).
2/ Project reset
3/ Project detach
4/ Project reset
5/ Re-attach to WCG
6/ Allow new tasks
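
For reference, the six steps above map roughly onto `boinccmd` operations. The Python sketch below only builds the command lines so they can be reviewed and run one at a time. The project URL, the placeholder account key and the helper names `reattach_commands` and `run_step` are assumptions for illustration, not something taken from this thread's machines. Step 4 is kept as listed, although the client may have nothing to reset once the project is detached.

```python
import subprocess

# Assumed project URL; check the one shown by your own client.
WCG_URL = "https://www.worldcommunitygrid.org/"


def reattach_commands(url=WCG_URL, account_key="YOUR_ACCOUNT_KEY"):
    """Return boinccmd invocations mirroring steps 1 to 6, in order."""
    return [
        ["boinccmd", "--project", url, "nomorework"],         # 1/ run dry
        ["boinccmd", "--project", url, "reset"],              # 2/ project reset
        ["boinccmd", "--project", url, "detach"],             # 3/ project detach
        ["boinccmd", "--project", url, "reset"],              # 4/ reset again, as listed
        ["boinccmd", "--project_attach", url, account_key],   # 5/ re-attach to WCG
        ["boinccmd", "--project", url, "allowmorework"],      # 6/ allow new tasks
    ]


def run_step(cmd):
    """Run one command and return its exit code without raising."""
    return subprocess.run(cmd, check=False).returncode
```

The steps are not meant to be fired in one go: after step 1 the queue has to finish and report before step 2, which is why the sketch returns a list instead of running everything in a loop.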

Yves
----------------------------------------
[Aug 2, 2019 7:27:37 AM]