| World Community Grid Forums
Thread Status: Active Total posts in this thread: 51
Thanassos
Cruncher Joined: Jun 21, 2013 Post Count: 24 Status: Offline Project Badges:
|
FWIW:
I'm seeing quite a few failures on the VINA cores as well. In fact, the only invalid results I have at all are from VINA cores. Some are from my Galaxy S4, which I'm not fussed about; it's not mature technology. My bigger concern is my 48-thread machine: it has had 100% success on all other units but is generating INVALID VINA results all over the shop. I'm not going to spend time changing hardware and swapping components when everything else runs fine except VINA. However, as already stated, my two Intel 3930K systems have no INVALID results for any VINA unit. I'm not sure this is coincidence, as 48 threads vs 24 threads is quite a bit of work being done, and over a week that is quite a decent sample size. I've just disabled FAAH on my 48-thread machine and called it a day. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"I am still thinking that the trouble is caused by an over-sensibility of the validator; maybe caused by cumulative rounding failures? ..."
On this line: if a quorum has been reached, and the tolerances were set wider just to get a validation, then please tell me which of the two non-matching results should be passed to the master database? If a failed quorum, specifically for these sciences, in effect means "not reproducible", then in my view there is only one option: ask for a third opinion, then decide the quorum by simple majority and dismiss whichever of the three does not agree. Meanwhile, and this was explained in the past, if not enough devices of a particular type [OS/CPU combo] are contributing, there is no room to create a homogeneity class to accommodate them. Wrong place, wrong time; sadly there was no escape clause, but as I pointed out before, there are many more projects to pick from. |
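For illustration, the "third opinion, then simple majority" idea in this post can be sketched in Python. This is only a minimal sketch and not WCG's or BOINC's actual validator: the names `results_agree` and `majority_quorum`, the representation of a result as a list of floats, the sample values, and the tolerance are all assumptions made for the example.

```python
import math
from typing import Optional, Sequence


def results_agree(a: Sequence[float], b: Sequence[float], rel_tol: float = 1e-6) -> bool:
    """Hypothetical comparison: same length and every value within rel_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def majority_quorum(results: Sequence[Sequence[float]], rel_tol: float = 1e-6) -> Optional[int]:
    """Return the index of a result that agrees with a strict majority
    of all submitted results (itself included), or None if there is none."""
    needed = len(results) // 2 + 1
    for i, candidate in enumerate(results):
        votes = sum(results_agree(candidate, other, rel_tol) for other in results)
        if votes >= needed:
            return i
    return None


# Made-up values, purely for illustration.
host_a = [-7.31, -6.98]
host_b = [-7.31, -6.90]   # disagrees in the second value
host_c = [-7.31, -6.98]

print(majority_quorum([host_a, host_b]))          # two non-matching results: no majority
print(majority_quorum([host_a, host_b, host_c]))  # a third result breaks the tie
```

With only two non-matching results there is nothing to choose between them, which is the point made above; a third result lets the odd one out be dismissed without widening the tolerance.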
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
We have been continuing to investigate and have found some infrequent situations where two results diverge slightly, which gets them marked invalid. We are therefore going through recent results that have been marked invalid to identify examples that we can share with the researchers, and to see if we can modify the validation logic to accept a wider range of answers as the "same". This process will take some time, but when we have an answer we will update this thread.
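As a rough illustration of how two correct hosts can diverge slightly, and of what accepting a wider range of answers as the "same" could look like, here is a small Python sketch. It is not the actual validation logic, which is not shown in this thread; the helper `same_within` and the tolerance values are assumptions.

```python
import math


def same_within(a: float, b: float, rel_tol: float) -> bool:
    """Hypothetical check: treat two values as the 'same' if they differ
    by no more than rel_tol relative to the larger magnitude."""
    return math.isclose(a, b, rel_tol=rel_tol)


# The same three terms, summed in a different order, as could happen
# with different compilers, CPUs, or math libraries.
x = (0.1 + 0.2) + 0.3
y = 0.1 + (0.2 + 0.3)

print(x == y)                   # exact comparison fails
print(same_within(x, y, 0.0))   # zero tolerance is the same as exact comparison
print(same_within(x, y, 1e-9))  # a small tolerance accepts both
```

The trade-off is that a tolerance wide enough to absorb rounding differences must still be narrow enough to reject results that are genuinely wrong, which is presumably why the researchers need to be involved.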
|
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Hi knreed,
I am sure that every one of us really appreciates your faithful support. Based on the analysis I made last week, I reallocated the hosts according to the following scheme:
Finally, I have fewer than 5 invalid results within 7 days for a daily average of 500 computed WUs (since 2013-08-05; before this date it was only around 150 computed WUs daily). I am amending my initial statement because AMD-based systems (at least Phenom II) no longer experience a higher rate of invalid results since the release of the new WUs for SN2S. I am still being careful with the Athlon II x4. For this reason, this host stays in a "CEP2 only" configuration. Cheers, Yves ---------------------------------------- [Edit 2 times, last edit by KerSamson at Aug 10, 2013 9:08:54 AM] |
||
|
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges:
|
KerSamson, you are one of the solid contributors for Switzerland. I hope you stay even if times are difficult. I also had some frustrations in the past with projects that could have a very high error rate, 30% or more. They errored in the first seconds, but it still caused disruption because I would hit the limit for the daily number of WUs. But very strangely, even though my machines are more or less identical, some would generate a lot of errors and others not a single one. I had this with various projects.
After a lot of effort I was unable to find out why some machines error heavily on some projects and others do not. And this was with Windows-based systems. So I gave up trying to find the cause. My solution was to have all machines run a new project and then just select the suitable ones for a given project. Now, with very few projects and few machines, I can understand it may become an issue. For a solid contributor like you it would be fully understandable to rest your machines for a few months and come back when conditions seem more suitable, as I am doing at the moment. And it will also have a favourable impact on your wallet. ---------------------------------------- [Edit 1 times, last edit by Hypernova at Aug 10, 2013 2:12:44 PM] |
||
|
|
branjo
Master Cruncher Slovakia Joined: Jun 29, 2012 Post Count: 1892 Status: Offline Project Badges:
|
KerSamson: +53
Hypernova: welcome back and +5 Cheers ---------------------------------------- Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006 |
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Hi guys,
September and October were fine, with only a few troubles. Since the end of October, some of my Linux hosts have again earned a lot (>300) of invalid WUs, in particular with FA@H. The problem I mentioned during the summer is still present from time to time. The high number of invalid WUs (i.e. successfully computed but considered invalid by the validator) is annoying, frustrating, and ultimately not acceptable. I was considering adding or replacing some hosts, but I do not know what I should do, since the problem systematically affects Linux-based AMD systems, and I would like to remain on this particular platform. On 2013-08-08, knreed promised to investigate the problem. I have not heard anything since that date. I also noticed during the last 3 weeks that Linux-based hosts working on FA@H stopped computing for unexpected reasons. This problem occurred several times during the summer and finally "disappeared". Since the beginning of November, this trouble has occurred again several times. I collected the error files. I would very much appreciate receiving some well-founded answers. The WCG tech team can contact me directly for more detailed information (e.g. error and statistics files). In my professional work, I have to investigate until root causes are identified. Because of the impact of the above troubles, I think it is time to look for the particular root causes. Cheers, Yves --- @knreed: JM Boullier has my direct contact data. |
||
|
|
poppageek
Advanced Cruncher Joined: Nov 16, 2004 Post Count: 99 Status: Offline Project Badges:
|
I have 26 AMD cores all running Ubuntu 12.04 or higher. Have never had problems with VINA. If I do have errors it is with CEP2 and not that often. They were all running Boinc 7.0.27, upgraded to 7.0.65.
But my Windows machines are getting MCM errors, 3 different machines. No idea why.
Opteron 8356 x 4 x 4 = 16 cores, Ubuntu Server 64 12.04, Boinc 7.0.65
Opteron 1354 x 4 cores, Ubuntu Desktop 13.04, Boinc 7.0.65
AMD 960T, 6 cores (4 + 2 unlocked), Mint 15, Boinc 7.2.28
So enterprise and consumer CPUs, desktop and server Linux, all 64 bit.
Don't suppose I am helping much, but I wonder if Linux on AMD is causing your problems. Hope ya get it figured out!
----------------------------------------
[Edit 2 times, last edit by PoppaGeek at Nov 24, 2013 11:57:42 PM] |
||
|
|
Bearcat
Master Cruncher USA Joined: Jan 6, 2007 Post Count: 2803 Status: Offline Project Badges:
|
Maybe try a different Linux distro.
----------------------------------------
Crunching for humanity since 2007!
|
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
It becomes more and more interesting.
My AMD-Linux hosts do not usually experience troubles with CEP2, except for WUs that fail from time to time. The AMD-Windows hosts do not experience problems with VINA. Until now, the Linux hosts are still using BOINC 6.10.58 with an up-to-date version of Ubuntu 10.04 LTS. Thank you for the feedback. Yves |
||
|
|
|