Thread Status: Active
Total posts in this thread: 51
Posts: 51   Pages: 6   [ Previous Page | 1 2 3 4 5 6 | Next Page ]
This topic has been viewed 225691 times and has 50 replies.
Thanassos
Cruncher
Joined: Jun 21, 2013
Post Count: 24
Status: Offline
Re: Tired and frustrated - time for leaving?

FWIW:

I'm seeing quite a few failures on the VINA cores as well. In fact, the only invalid results I have at all are VINA results. Some are from my Galaxy S4, which I'm not fussed about; it's not mature technology.

My bigger concern is my 48-thread machine: it has had 100% success on all other units but is generating INVALID VINA results all over the shop.

I'm not going to spend time changing hardware and swapping components when everything else runs fine except VINA. However, as already stated, my two Intel 3930K systems have no INVALID results for any VINA unit. I doubt this is a coincidence: 48 threads versus 24 threads is quite a difference in the amount of work being done, and a week is quite a decent sample size.

I've just disabled FAAH on my 48 thread machine and called it a day.
----------------------------------------

[Aug 5, 2013 2:59:06 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Tired and frustrated - time for leaving?

"I am still thinking that the trouble is caused by an over-sensitivity of the validator; maybe caused by cumulative rounding errors? ..."

Along this line: if a quorum has been reached, and the tolerances were widened just to get a validation, then please tell me which of the two non-matching results should be passed to the master database?

If a failure to reach quorum, specifically for these sciences, in effect means "not reproducible", then in my view there is but one option: ask for a third opinion, then decide the quorum by simple majority and dismiss whichever of the three does not agree. Also, as was explained in the past, if not enough devices of a particular type [OS/CPU combination] are contributing, there is no room to create a homogeneity class to accommodate them. Wrong place, wrong time; sadly there is no escape clause, but as I pointed out before, there are many more projects to pick from.
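The majority rule described above (ask for a third result, keep whatever at least two results agree on, dismiss the outlier) can be sketched in Python. This is a hypothetical illustration, not the actual WCG or BOINC validator: it assumes each result reduces to a single floating-point score, and the names `agree` and `majority_quorum` and the tolerance defaults are invented for the example.

```python
import math
from typing import List, Optional, Tuple


def agree(a: float, b: float, rel_tol: float, abs_tol: float) -> bool:
    """Hypothetical equivalence test: two scores agree if they are within tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


def majority_quorum(results: List[float], rel_tol: float = 1e-6,
                    abs_tol: float = 1e-9) -> Tuple[Optional[float], List[int]]:
    """Return (canonical result, indices of dismissed results).

    A result is accepted if more than half of all results (itself included)
    agree with it; every result that does not agree with it is dismissed.
    If no result has a strict majority, there is no quorum and nothing is accepted.
    """
    n = len(results)
    for i in range(n):
        # Indices of all results that agree with result i, including i itself.
        supporters = [j for j in range(n)
                      if agree(results[i], results[j], rel_tol, abs_tol)]
        if len(supporters) * 2 > n:
            dismissed = [j for j in range(n) if j not in supporters]
            return results[i], dismissed
    # No strict majority: report every result as unconfirmed.
    return None, list(range(n))


# Three returned results: two agree within tolerance, the third is the outlier.
print(majority_quorum([-7.421, -7.421000001, -7.35]))
# Two results that disagree: no majority is possible.
print(majority_quorum([1.0, 2.0]))
```

Tolerance-based agreement is not transitive (A may match B and B match C while A does not match C), so a real validator has to define which result becomes canonical; this sketch simply takes the first one that has a strict majority of supporters.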
[Aug 6, 2013 12:11:42 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Tired and frustrated - time for leaving?

We have been continuing to investigate and have found some infrequent situations where two results diverge slightly, which causes them to be marked invalid. As a result, we are going through recent results that have been marked invalid to identify examples that we can share with the researchers, and to see whether we can modify the validation logic to accept a wider range of answers as the "same". This process will take some time, but when we have an answer we will update this thread.
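Accepting "a wider range of answers as the same" amounts to comparing results with a tolerance instead of requiring exact equality. A minimal Python sketch, assuming each result is a list of floating-point values compared element by element; `results_match` and the tolerance values are hypothetical and are not the actual WCG validation logic. It also shows how the order of floating-point operations alone can make two honest results differ in the last bit, the kind of rounding effect speculated about earlier in the thread.

```python
import math
from typing import Sequence


def results_match(a: Sequence[float], b: Sequence[float],
                  rel_tol: float, abs_tol: float = 0.0) -> bool:
    """Hypothetical validator check: two result vectors are the "same" if they
    have equal length and every pair of corresponding values is within tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


# The same ten terms summed two ways: naive left-to-right addition accumulates
# rounding error, while math.fsum returns the correctly rounded sum.
terms = [0.1] * 10
host_a = [sum(terms)]        # 0.9999999999999999
host_b = [math.fsum(terms)]  # 1.0

print(host_a == host_b)                               # False: exact comparison
print(results_match(host_a, host_b, rel_tol=1e-17))  # False: tolerance too tight
print(results_match(host_a, host_b, rel_tol=1e-12))  # True: wider tolerance
```

Widening the tolerance accepts harmless rounding differences, but set too wide it would also accept results that genuinely disagree, which is why the thresholds have to be chosen together with the researchers.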
[Aug 8, 2013 1:39:33 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Tired and frustrated - time for leaving?

Hi knreed,
I am sure that every one of us really appreciates your faithful support.
Based on the analysis I made last week, I reallocated the hosts according to the following scheme:
  • Phenom II x6, 1090T, Ubuntu 10.04, x64: SN2S and CEP2
  • Phenom II x6, 1055T, Ubuntu 10.04, x64: SN2S and CEP2
  • Athlon II x4, 640, Ubuntu 10.04, x64: CEP2 only
  • Athlon II x4, 640, WinXP Pro x86: FA@H, SN2S
  • Intel Q9450, WinXP Pro x86: FA@H, SN2S
  • Intel Q9600, WinXP Pro x86: FA@H, SN2S

In the end, I have had fewer than 5 invalid results in 7 days, for a daily average of 500 computed WUs (since 2013-08-05; before that date it was only around 150 computed WUs daily).

I am amending my initial statement, because AMD-based systems (at least the Phenom II) no longer experience a higher rate of invalid results since the release of the new WUs for SN2S.
I am still being careful with the Athlon II x4. For this reason, this host stays in a "CEP2 only" configuration.

Cheers,
Yves
----------------------------------------
----------------------------------------
[Edit 2 times, last edit by KerSamson at Aug 10, 2013 9:08:54 AM]
[Aug 10, 2013 9:04:35 AM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: Tired and frustrated - time for leaving?

KerSamson, you are one of the solid contributors for Switzerland. I hope you stay even if times are difficult. In the past I also had some frustrations with projects that could have a very high error rate, 30% or more. The WUs errored out within the first seconds, but it still caused disruption because I would hit the daily WU limit. Very strangely, even though my machines are more or less identical, some would generate a lot of errors and others not a single one. I had this with various projects.
Despite a lot of effort, I was unable to find out why some machines had high error rates on some projects and others did not, and this was with Windows-based systems. So I gave up trying to find the cause.
My solution was to have all machines run a new project and then select only the suitable ones for that project. Now, with very few projects and few machines, I can understand that this may become an issue.
For a solid contributor like you, it would be fully understandable to rest your machines for a few months and come back when conditions seem more suitable, as I am doing at the moment. It will also have a favourable impact on your wallet. wink
----------------------------------------

----------------------------------------
[Edit 1 times, last edit by Hypernova at Aug 10, 2013 2:12:44 PM]
[Aug 10, 2013 2:10:29 PM]
branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892
Status: Offline
Re: Tired and frustrated - time for leaving?

KerSamson: +53
Hypernova: welcome back and +5

Cheers peace
----------------------------------------

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006

[Aug 10, 2013 6:34:02 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Tired and frustrated - time for leaving?

Hi guys,
September and October were fine, with only a few problems.
Since the end of October, some of my Linux hosts have again accumulated a lot (>300) of invalid WUs, in particular with FA@H.
The problem I mentioned during the summer is still present from time to time.
The high number of invalid WUs (i.e. successfully computed but judged invalid by the validator) is annoying, frustrating, and ultimately not acceptable.
I was considering adding or replacing some hosts, but in the end I do not know what I should do, since this problem systematically affects Linux-based AMD systems, and I would like to stay on this particular platform.
On 2013-08-08, knreed promised to investigate the problem.
I have heard nothing since then.

I also noticed during the last 3 weeks that Linux-based hosts working on FA@H stopped computing for unexplained reasons. This problem occurred several times during the summer and then "disappeared". Since the beginning of November, it has occurred again several times.
I have collected the error files.

I would very much appreciate receiving some well-founded answers. The WCG tech team can contact me directly for more detailed information (e.g. error and statistics files).

In my professional work, I have to investigate until root causes are identified. Given the impact of the problems above, I think it is time to look for their root causes.

Cheers,
Yves
---
@knreed: JM Boullier has my direct contact details.
----------------------------------------
[Nov 24, 2013 9:57:22 AM]
poppageek
Advanced Cruncher
Joined: Nov 16, 2004
Post Count: 99
Status: Offline
Re: Tired and frustrated - time for leaving?

I have 26 AMD cores, all running Ubuntu 12.04 or higher. I have never had problems with VINA. If I do have errors, they are with CEP2, and not that often. They were all running BOINC 7.0.27, upgraded to 7.0.65.

But my Windows machines are getting MCM errors, on 3 different machines. No idea why.

Opteron 8356: 4 CPUs x 4 cores = 16 cores, Ubuntu Server 64-bit 12.04, BOINC 7.0.65
Opteron 1354: 4 cores, Ubuntu Desktop 13.04, BOINC 7.0.65
AMD 960T: 6 cores (4 + 2 unlocked), Mint 15, BOINC 7.2.28

So: enterprise and consumer CPUs, desktop and server Linux, all 64-bit.

I don't suppose I am helping much, but I wonder whether Linux on AMD is causing your problems. thinking

Hope ya get it figured out! smile
----------------------------------------
[Edit 2 times, last edit by PoppaGeek at Nov 24, 2013 11:57:42 PM]
[Nov 24, 2013 11:56:12 PM]
Bearcat
Master Cruncher
USA
Joined: Jan 6, 2007
Post Count: 2803
Status: Offline
Re: Tired and frustrated - time for leaving?

Maybe try a different Linux distro.
----------------------------------------
Crunching for humanity since 2007!

[Nov 25, 2013 3:12:14 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Tired and frustrated - time for leaving?

This is getting more and more interesting.
My AMD Linux hosts usually do not have problems with CEP2, except for an occasional failing WU.
My AMD Windows hosts do not have problems with VINA.

So far, the Linux hosts are still using BOINC 6.10.58 with an up-to-date Ubuntu 10.04 LTS.

Thank you for the feedback.
Yves
----------------------------------------
[Nov 25, 2013 3:18:03 AM]