Thread Status: Active
Total posts in this thread: 51
This topic has been viewed 225690 times and has 50 replies
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: Tired and frustrated - time for leaving?

Well, based on your numbers, some of the conclusions seem to be:
1: For Vina, error-rate AMD >> error-rate Intel.
2: For all projects, error-rate AMD >> error-rate Intel.

The error-rate being higher for AMD is as you expected, but I'm not sure all of the following results are as expected:
3: Total error-rate for Vina is the same as total error-rate for all projects (1.88% and 1.89% are effectively equal).
4: Error-rate Linux >> error-rate Windows. This is especially apparent for Vina.
5: Total error-rate during summer is much higher than total error-rate during winter. Winter is as mentioned 0.44% while summer is 4.58%.

Whether the higher temperature is significant, or whether it's just an unlucky coincidence with the batches of work being run during summer, is difficult to know.
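
Whether a gap like 0.44% versus 4.58% is larger than chance alone would explain depends on how many results stand behind each percentage. A minimal sketch of a two-proportion z-test is below. The counts are hypothetical, picked only so that they reproduce the two quoted percentages; the real counts would have to come from the result logs. The function name `two_proportion_z` is likewise just an illustration.

```python
import math

def two_proportion_z(err_a, n_a, err_b, n_b):
    """Return (z, two_sided_p) for the hypothesis that both groups share one error rate."""
    p_a = err_a / n_a
    p_b = err_b / n_b
    # Pooled error rate under the assumption that there is no real difference
    pooled = (err_a + err_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No errors at all (or only errors): nothing to distinguish
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal approximation
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# HYPOTHETICAL counts, chosen only to match the quoted rates
winter_errors, winter_total = 11, 2500   # 11 / 2500 = 0.44%
summer_errors, summer_total = 55, 1200   # 55 / 1200 = 4.58%

z, p = two_proportion_z(winter_errors, winter_total, summer_errors, summer_total)
print(f"z = {z:.2f}, two-sided p = {p:.3g}")
```

Even a very small p-value here would only say that the two periods differ. It cannot separate heat from the change in project mix between winter and summer.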
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Aug 3, 2013 12:58:35 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Tired and frustrated - time for leaving?

Hi Ingleside,
I would like to provide a more detailed explanation of the Winter - Summer comparison.

During the Winter, the machines were mainly focused on HCMD2 and HFCC. One machine (Q9600 with GPU) was devoted to HCC only.
The VINA projects were systematically avoided on the AMD-based systems because of bad experiences in the past.

During June 2013, we came close to a VINA-only configuration, so I tried again to compute VINA-based projects on the AMD systems.

Both Athlon II x4 systems are identical and sit side by side in my office. One host runs Win XP Pro SP3, the other Ubuntu 10.04 x64. While the Windows host very rarely generates invalid results, the rate is significantly higher for the Ubuntu host.

Additionally, for VINA-based projects, it is worth noting that the rate of invalid results is lower on the Phenom II x6 based systems (both running Ubuntu 10.04 x64) than on the Athlon II x4 system, even though on those systems too the rate is significantly higher for VINA-based projects than for non-VINA projects.

Again, I don't think the problem is caused by temperature. Rather, I think the validator is oversensitive for VINA-based projects.

To be honest, I should point out that we are currently talking about an average rate of invalid and errored results between 1% and 3.5%.

Unfortunately, I cannot perform a similar analysis for the "pre-VINA" period. Nevertheless, at that time there were no significant differences between Intel-based and AMD-based systems. The same systems were able to compute for months without any trouble, except for some bad batches.

I would appreciate it if Knreed could share his view of this problem with us.

Cheers,
Yves
----------------------------------------
[Aug 3, 2013 8:00:06 PM]
ryan222h
Senior Cruncher
Joined: Sep 4, 2006
Post Count: 425
Status: Offline
Re: Tired and frustrated - time for leaving?

For what it's worth, I have 0 invalids and 0 errors in my returned results for as far back as the history goes, which I would imagine is thousands of results. I run at least 4 systems 24/7: 2 Intel i7 and 2 AMD Opteron, 40 cores in total.

It's interesting how some are getting errors and some are not.
----------------------------------------

----------------------------------------
[Edit 1 times, last edit by ryan222h at Aug 3, 2013 11:35:59 PM]
[Aug 3, 2013 11:34:51 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: Tired and frustrated - time for leaving?

I am going to take a shot in the dark here and suggest that the AMD and Linux combination is pretty much responsible for the invalids. An experiment would be to change the operating system to Windows on the AMD machines and see if they still throw invalids at the same rate. Ingleside does point to the fact that invalids are more prevalent during the summer than the winter, which hints at a possibly transient heat problem. If Linux is really needed for the AMD machines, maybe try a different distro and see if that makes any difference.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Aug 4, 2013 12:27:43 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Tired and frustrated - time for leaving?

Hi,
I appreciate the various feedback.
However, the initial question is:
Why does a system based on the AMD (Athlon II or Phenom II) architecture and running Ubuntu 10.04 x64, which has been reliable for years, generate more invalid results for VINA-based projects than for non-VINA-based projects?

If the system generated only invalid or error results, it would probably be easier to understand the root cause of the problem.
In my specific case (and it seems that I am not the only one), we have to understand what happens in 3.5% of the cases.
I still think that the trouble is caused by an oversensitive validator, maybe triggered by cumulative rounding errors? ...
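
To make the rounding idea concrete, here is a minimal sketch of how the same arithmetic done in a different order can end in slightly different floating-point values, and how a strict comparison would then reject a result that a tolerant one accepts. This is not how the WCG validator works (its logic is not described in this thread); `results_match` is a hypothetical helper used only to illustrate the hypothesis.

```python
import math

def results_match(a, b, rel_tol):
    """Hypothetical validator check: values count as equal within a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)

# The same three numbers, added in two different orders
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# A bit-for-bit comparison sees two different results
bitwise_equal = left_to_right == right_to_left

# A comparison with a small tolerance treats them as the same result
tolerant_equal = results_match(left_to_right, right_to_left, rel_tol=1e-9)

print(repr(left_to_right), repr(right_to_left))
print("bitwise equal:", bitwise_equal, "| tolerant equal:", tolerant_equal)
```

Different compilers, libraries or instruction sets may reorder or fuse operations like this, so two healthy hosts can disagree in the last digits. Whether that is what happens with the VINA-based projects is an open question here.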

I am not willing to switch the affected systems to Windows, for at least the following reasons:
  • License cost
  • Computational efficiency
  • Memory management efficiency
  • Maintenance work


Based on the various feedback, it seems that AMD Bulldozer CPUs do not experience similar troubles, and neither do Opterons.

Have a nice Sunday,
Yves
----------------------------------------
[Aug 4, 2013 5:48:49 AM]
Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3315
Status: Offline
Re: Tired and frustrated - time for leaving?

I think I already told you this in another thread, but are you sure you don't want to update your distro? Or at least update/install the GCC package (the latest version is 4.8.1) from the Synaptic manager? I have an AMD Sempron which is unlocked to an X2 and overclocked, and it has never produced any invalids on SN2S, DSFL or GFAM while running 64-bit Linux (mostly Linux Mint 14 and Ubuntu 12.10).
----------------------------------------


- AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
- AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
- AMD Ryzen 7 7730U 8C/16T 3.0 GHz
----------------------------------------
[Edit 1 times, last edit by Falconet at Aug 4, 2013 10:17:08 AM]
[Aug 4, 2013 10:10:27 AM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: Tired and frustrated - time for leaving?

KerSamson wrote: "Why does a system based on the AMD (Athlon II or Phenom II) architecture and running Ubuntu 10.04 x64, which has been reliable for years, generate more invalid results for VINA-based projects than for non-VINA-based projects?"

It isn't uncommon to see messages along the lines of: a "stable" overclock with project ABC, but nothing but errors with project XYZ until the overclock is reduced or removed. This just shows that different applications use the CPU, cache, memory and hard drive differently, and every computer has a weakest spot. Even if you're not overclocking, it's possible Vina stresses your computer's weakest spot more often, which can more easily push it over the edge so that an error happens.

As for why Linux has more errors:
If I haven't overlooked anything, you're running 32-bit XP and 64-bit Linux. For one thing, this means Linux can use CPU registers that are never used under 32-bit Windows. For another, the 64-bit version is AFAIK faster and will therefore likely increase the heat, possibly on the "wrong" spot for your system, and push it over the edge.

Even if you were also running 64-bit Windows, the Linux and Windows applications aren't the same, and it's possible the Linux one is still faster and uses the "wrong" part of your system.


Another possibility is, as you've indicated, a buggy validator, with nothing really wrong with any of your systems.

Now, I'm not sure how the FAAH-Vina results behave. The possibilities are:
a: Running the same wu on the same computer always gives the same result, bit-by-bit.
b: Running the same wu on the same computer always gives the same result except for time-stamps, but the results are saved in "human-readable" form, so these differing time-stamps can easily be filtered out.
c: Stopping and continuing a wu from a checkpoint can give different results from running it without a restart from a checkpoint.
d: A random seed is chosen at start or during the run, and this can give completely different results.
e: Running the same wu on the same computer always gives the same result, but it includes a time-stamp and the result is encoded, so a different time-stamp gives a completely different result on disk.

If FAAH-VINA uses d or e, there's no method for you as a user to verify whether your computer crunches correctly or is occasionally generating garbage. With a or c, on the other hand, it's fairly easy to verify for yourself whether it's the computer making an error or something else. b is more time-consuming to handle, but is also possible to verify. So, assuming a, one method to verify is something like:

1: Download a couple days of VINA-work on a computer.
2: Disable network in BOINC, and stop BOINC.
3: Make backup-copy of your BOINC data-directory.
4: Re-start BOINC, and crunch through your work, but without uploading anything.
5: Stop BOINC again and make a new backup of your BOINC data-directory. Do not overwrite the other backup.
6: Re-start BOINC, upload & report the results.
7: Check if any of the results fails validation. If no errors, repeat from #1.

8: When you do get one or more invalid results, stop BOINC and make a new backup of your current data-directory.
9: Restore your backup from #3.
10: Select to run only the tasks you had invalid results for.
11: After the tasks that gave invalid results are done, stop BOINC again.
12: Compare the new result-files with the ones from the backup in #5.

Assuming FAAH-VINA follows method a, a difference in step #12 indicates a problem with your computer: either the hardware, the OS or possibly a buggy application. If, on the other hand, you get the exact same result, a problem with the validator, or with work-distribution pairing incompatible systems, is more likely.

Re-running some of the tasks that were validated, to check how those results compare, can also be an advantage. If all results are different, this indicates FAAH-VINA probably uses either d or e, and in that case there's really no way for you to verify with FAAH-VINA whether you've got a computer problem or not.
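
The comparison in step #12 can be sketched as a small script, assuming method a (bit-by-bit identical results). It hashes every file under two directories and reports which files differ. The function names and the example paths are hypothetical. Which files in the BOINC data-directory actually hold the results is not stated here, so in practice you would point it at copies of only the result files from each run.

```python
import hashlib
from pathlib import Path

def file_digests(root):
    """Map each file path (relative to root) to the SHA-256 digest of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests

def compare_runs(first_run, second_run):
    """Return (differing, only_in_first, only_in_second) as sorted lists of file names."""
    first = file_digests(first_run)
    second = file_digests(second_run)
    common = first.keys() & second.keys()
    differing = sorted(name for name in common if first[name] != second[name])
    only_in_first = sorted(first.keys() - second.keys())
    only_in_second = sorted(second.keys() - first.keys())
    return differing, only_in_first, only_in_second

# Example call with HYPOTHETICAL paths (result files copied out of each backup):
# differing, only_first, only_second = compare_runs("results_step5", "results_step11")
# print("Files that differ between the two runs:", differing)
```

An empty `differing` list would point towards the validator or the work-distribution, a non-empty one towards the computer. If every file differs, including those from tasks that validated, that suggests d or e rather than a.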
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
----------------------------------------
[Edit 1 times, last edit by Ingleside at Aug 4, 2013 1:02:34 PM]
[Aug 4, 2013 12:59:55 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Tired and frustrated - time for leaving?

Hi Ingleside,
Many thanks for the detailed explanations.
Since I am not on vacation at the moment, I will have some trouble carrying out all the listed tests.
Anyway, maybe I should follow Falconet's advice and indeed plan to update the Ubuntu version in use. Frankly, I was happy with 10.04 and I don't really look forward to moving to a newer Ubuntu version! Perhaps it's time to switch to something "different"? ... Debian Wheezy? ...

As soon as I have a couple of free hours, I will try to move/update at least the Athlon II x4 host.

In any case, I am really interested in Knreed's view and answers based on your detailed scenarios and assumptions.

I do not overclock any of my systems. However, the mentioned CPU weakness can be one of the possible causes.

I am not leaving WCG yet, since the community has given me some strength again during the last 7 days :)

Greetings from Yves
---
PS: I apologise for my poor Norwegian.
----------------------------------------
[Aug 4, 2013 8:29:14 PM]
OldChap
Veteran Cruncher
UK
Joined: Jun 5, 2009
Post Count: 978
Status: Offline
Re: Tired and frustrated - time for leaving?

Using Mint 14 Cinnamon here and am happy with it, but then again I don't own any AMD just now... and Mint is just another Ubuntu clone.

I tried Mint 15, but that was during my set-up phase with the new rig. It seemed to run OK, but I was getting lower benchmarks for some reason.

Tried playing with 12.04 LTS a while back, but the CLI is tough for me. Maybe it is that thing about old dogs and new tricks. I certainly feel like an old dog.

Maybe you should just look for whatever is using the latest kernel, in the hope that some tweak or other fixes your issue?

EDIT: Not to miss anything, turn off "Cool'n'Quiet" in the BIOS, by the way.
----------------------------------------

----------------------------------------
[Edit 1 times, last edit by OldChap at Aug 4, 2013 9:07:37 PM]
[Aug 4, 2013 8:58:57 PM]
AgrFan
Senior Cruncher
USA
Joined: Apr 17, 2008
Post Count: 396
Status: Offline
Re: Tired and frustrated - time for leaving?

KerSamson wrote: "Why does a system based on the AMD (Athlon II or Phenom II) architecture and running Ubuntu 10.04 x64, which has been reliable for years, generate more invalid results for VINA-based projects than for non-VINA-based projects?"
I have an Athlon II x4 640 running Ubuntu 10.04 LTS (64-bit) and an Athlon II x4 630 in an OEM HP box running Windows 7 Home Premium (64-bit).

Both machines produce 100% valid results for VINA-based projects.

I highly doubt the high rate of invalids is a WCG issue. It's possible you have flaky hardware or a corrupted OS install. What kind of hardware is mated to the Athlon II x4 having the problems? Hard drive, memory and motherboard specs might help others diagnose the issue. I have an ASUS motherboard, 2x2GB sticks of Corsair RAM and a rather old WD2500KS hard drive in the Ubuntu box. It runs for weeks without any problems. It will run 4 CEP2 units at the same time perfectly.

If you are happy with Ubuntu 10.04, try reloading it from scratch. If the problem doesn't go away then focus on the hardware (swap memory sticks, hard drives, etc). Make sure the dust bunnies are cleaned out even if you're seeing OK temps. The temp sensors may not be reading correctly and your temps may be too high even though you're seeing good numbers.

Do you have the LAIM setting turned on in your profile?
----------------------------------------

  • i5-10400 (Comet Lake, 6C/12T) @ 2.9 GHz
  • i5-7400 (Kaby Lake, 4C/4T) @ 3.0 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3330 (Ivy Bridge, 4C/4T) @ 3.0 GHz

----------------------------------------
[Edit 3 times, last edit by AgrFan at Aug 5, 2013 1:56:31 AM]
[Aug 5, 2013 1:34:31 AM]