Thread Status: Active
Total posts in this thread: 9
This topic has been viewed 5660 times and has 8 replies
se29592
Cruncher
Joined: Jun 4, 2009
Post Count: 8
Status: Offline
Computational errors in various projects [AMD/Linux x86_64]

For some weeks now I have been getting a very high error rate in my computations: both Errors and Invalids in a number of projects. This was not the case earlier with the same system and software. I have now disabled all projects where I get errors:

Drug Search for Leishmaniasis
The Clean Energy Project - Phase 2
Help Cure Muscular Dystrophy - Phase 2
Human Proteome Folding - Phase 2
FightAIDS@Home

This started sometime in mid-September. Before Sep 19 I was averaging 20k points per day, whereas since then I have been getting about 5k points of valid results per day.

AMD Phenom(tm) II X6 1090T Processor
Fedora 14 - x86_64

I've tried reducing the clock speed (although it was not overclocked before either) with no change.
This is not related to the SELinux problems others have reported, since I have resolved those with SELinux exceptions.

I think this requires someone with full database access to investigate what kinds of patterns are associated with these failures. It could of course be this particular system, but I doubt it. Could changes have been made that let the MSWin x86 baseline drift slightly, so that my previously OK system now loses the votes? Or what else can it be?

I can see no suspicious code update in this period.
$ rpm --last -qa boinc\*
boinc-client-doc-6.10.58-3.r22930svn.fc14 Tue 22 Feb 2011 02:13:50 PM CET
boinc-client-static-6.10.58-3.r22930svn.fc14 Tue 22 Feb 2011 02:13:48 PM CET
boinc-client-devel-6.10.58-3.r22930svn.fc14 Tue 22 Feb 2011 02:13:42 PM CET
boinc-manager-6.10.58-3.r22930svn.fc14 Tue 22 Feb 2011 02:13:29 PM CET
boinc-client-6.10.58-3.r22930svn.fc14 Tue 22 Feb 2011 02:13:03 PM CET

$ rpm --last -qa 
system-config-printer-1.2.8-2.fc14 Fri 30 Sep 2011 11:41:37 AM CEST
libdhash-0.4.3-5.fc14 Fri 30 Sep 2011 11:41:36 AM CEST
libini_config-0.6.2-5.fc14 Fri 30 Sep 2011 11:41:35 AM CEST
libcollection-0.6.1-5.fc14 Fri 30 Sep 2011 11:41:34 AM CEST
libpath_utils-0.2.1-5.fc14 Fri 30 Sep 2011 11:41:33 AM CEST
libref_array-0.1.2-5.fc14 Fri 30 Sep 2011 11:41:31 AM CEST
system-config-printer-libs-1.2.8-2.fc14 Fri 30 Sep 2011 11:41:29 AM CEST
firefox5-5.0-1.fc14 Wed 28 Sep 2011 05:32:42 PM CEST
xulrunner5-5.0-1.fc14 Wed 28 Sep 2011 05:32:38 PM CEST
PackageKit-gtk-module-0.6.12-4.fc14 Wed 28 Sep 2011 02:30:28 PM CEST
dbus-glib-0.86-4.fc14 Wed 28 Sep 2011 02:30:26 PM CEST
clearlooks-compact-gnome-theme-1.5-3.fc12 Wed 28 Sep 2011 02:25:43 PM CEST
fpaste-0.3.7-1.fc14 Tue 27 Sep 2011 08:26:48 AM CEST
k3b-libs-2.0.2-5.fc14 Tue 27 Sep 2011 08:26:46 AM CEST
k3b-2.0.2-5.fc14 Tue 27 Sep 2011 08:26:45 AM CEST
k3b-common-2.0.2-5.fc14 Tue 27 Sep 2011 08:26:44 AM CEST
nss-devel-3.12.10-4.fc14 Mon 26 Sep 2011 09:26:49 PM CEST
openldap-devel-2.4.23-10.fc14 Mon 26 Sep 2011 09:26:47 PM CEST
libcurl-devel-7.21.0-10.fc14 Mon 26 Sep 2011 09:26:44 PM CEST
libsoup-devel-2.32.2-2.fc14 Mon 26 Sep 2011 09:26:42 PM CEST
alsa-plugins-pulseaudio-1.0.24-2.fc14 Mon 26 Sep 2011 09:26:41 PM CEST
qt-webkit-4.7.4-2.fc14 Mon 26 Sep 2011 09:26:39 PM CEST
libcurl-7.21.0-10.fc14 Mon 26 Sep 2011 09:26:36 PM CEST
openldap-2.4.23-10.fc14 Mon 26 Sep 2011 09:26:34 PM CEST
qt-x11-4.7.4-2.fc14 Mon 26 Sep 2011 09:26:32 PM CEST
qt-4.7.4-2.fc14 Mon 26 Sep 2011 09:26:27 PM CEST
nss-3.12.10-4.fc14 Mon 26 Sep 2011 09:26:25 PM CEST
pcre-8.10-2.fc14 Mon 26 Sep 2011 09:26:21 PM CEST
nss-tools-3.12.10-4.fc14 Mon 26 Sep 2011 09:26:20 PM CEST
gnupg2-2.0.18-1.fc14 Mon 26 Sep 2011 09:26:18 PM CEST
curl-7.21.0-10.fc14 Mon 26 Sep 2011 09:26:16 PM CEST
qt-webkit-4.7.4-2.fc14 Mon 26 Sep 2011 09:26:15 PM CEST
foomatic-4.0.8-3.fc14 Mon 26 Sep 2011 09:26:12 PM CEST
libsoup-2.32.2-2.fc14 Mon 26 Sep 2011 09:26:10 PM CEST
foomatic-filters-4.0.8-3.fc14 Mon 26 Sep 2011 09:26:09 PM CEST
qt-x11-4.7.4-2.fc14 Mon 26 Sep 2011 09:26:06 PM CEST
qt-4.7.4-2.fc14 Mon 26 Sep 2011 09:26:01 PM CEST
libcurl-7.21.0-10.fc14 Mon 26 Sep 2011 09:25:59 PM CEST
openldap-2.4.23-10.fc14 Mon 26 Sep 2011 09:25:57 PM CEST
nss-3.12.10-4.fc14 Mon 26 Sep 2011 09:25:55 PM CEST
nss-sysinit-3.12.10-4.fc14 Mon 26 Sep 2011 09:25:54 PM CEST
ntfsprogs-2011.4.12-5.fc14 Fri 23 Sep 2011 02:28:58 PM CEST
ntfs-3g-2011.4.12-5.fc14 Fri 23 Sep 2011 02:28:56 PM CEST
xorg-x11-drv-savage-2.3.2-3.fc14 Tue 20 Sep 2011 10:15:33 PM CEST
unique-1.1.6-3.fc14 Tue 20 Sep 2011 10:15:33 PM CEST
librsvg2-2.32.0-4.fc14 Tue 20 Sep 2011 10:15:32 PM CEST
ql2400-firmware-5.06.01-1.fc14 Fri 16 Sep 2011 05:02:59 PM CEST
setup-2.8.28-2.fc14 Fri 16 Sep 2011 05:02:56 PM CEST
ql2500-firmware-5.06.01-1.fc14 Fri 16 Sep 2011 05:02:54 PM CEST
rsyslog-4.6.3-3.fc14 Fri 16 Sep 2011 05:02:50 PM CEST
python-boto-2.0-1.fc14 Wed 14 Sep 2011 11:07:23 PM CEST
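
To narrow a listing like the one above to the window around Sep 19, the install time can be queried as a number and filtered. This is only a sketch: `installed_since` is a hypothetical helper name, not an rpm feature, and the example cutoff date is an assumption. Adding `%{ARCH}` to the query format would probably also explain why some packages appear twice above (likely 32-bit and 64-bit builds of the same package).

```shell
#!/bin/sh
# Sketch: keep only packages installed at or after a cutoff time.
# installed_since is a hypothetical helper, not part of rpm.
# It reads "EPOCH PACKAGE" lines on stdin and prints the PACKAGE field
# of every line whose EPOCH is >= the cutoff given as the first argument.
installed_since() {
  cutoff="$1"
  awk -v c="$cutoff" '$1 + 0 >= c + 0 { print $2 }'
}

# Possible use on the real system (GNU date assumed for -d):
#   rpm -qa --qf '%{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' \
#     | installed_since "$(date -d '2011-09-14' +%s)"
```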

[Oct 24, 2011 10:16:27 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computational errors in various projects [AMD/Linux x86_64]

Have you run a virus scan as well as several system checks? This could be due to HDD errors or errors in the RAM, which is why you should run some system checks. Any bluescreens or power failures?

You may also want to try reinstalling WCG.
[Oct 24, 2011 2:14:09 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computational errors in various projects [AMD/Linux x86_64]

Hello se29592,
Going from 20,000 points to 5,000 points a day is indeed astonishing. There are always some errors caused by malformed work units (such as Batch 40 in DSFL a few days ago) and by input data that the algorithm cannot handle correctly. But that does not explain a problem that reduces your output by three quarters. Nobody else has reported such a drastic change.

So let us try a slow, thoughtful problem-solving technique. Reduce the data that needs to be analysed by setting BOINC to run only 1 process per computer (not per core) and eliminating the extra cache. In fact, cut the cache to just 0.1 days, which will mean no more than 1 work unit waiting to run. Then allow all projects to run. This should allow you to run at 100% speed without worrying about temperature.
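
One way to apply those settings locally is a preferences override file in the BOINC data directory. This is a minimal sketch, not a tested recipe: the `BOINC_DIR` default below is a scratch directory chosen for illustration (the Fedora package probably uses `/var/lib/boinc`, but check your install), and 16% is my assumption for "one process" on a six-core CPU, since it comes to less than one core and the client should then fall back to a single task.

```shell
#!/bin/sh
# Sketch: write a BOINC preferences override for "one task, 0.1-day cache".
# BOINC_DIR is an assumption; point it at the real BOINC data directory.
BOINC_DIR="${BOINC_DIR:-./boinc-demo}"
mkdir -p "$BOINC_DIR"

# max_ncpus_pct:            share of cores BOINC may use (16% of 6 cores is
#                           below one core; assumed to mean a single task)
# work_buf_min_days:        keep roughly 0.1 days of work on hand
# work_buf_additional_days: no extra cache on top of that
# cpu_usage_limit:          run the one task at 100% CPU time
cat > "$BOINC_DIR/global_prefs_override.xml" <<'EOF'
<global_preferences>
   <max_ncpus_pct>16</max_ncpus_pct>
   <work_buf_min_days>0.1</work_buf_min_days>
   <work_buf_additional_days>0</work_buf_additional_days>
   <cpu_usage_limit>100</cpu_usage_limit>
</global_preferences>
EOF

echo "wrote $BOINC_DIR/global_prefs_override.xml"
# A running client can then be asked to reload it:
#   boinccmd --read_global_prefs_override
```

The same limits can also be set in the BOINC Manager preferences dialog or in the device profile on the website; the override file is simply easy to remove again once the test is over.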

This should allow you to build up a picture of just where things are going wrong without overloading you with data. We ordinarily run BOINC as fast as possible with good intentions, but when problems occur it can be like an auto accident where things go wrong more quickly than we can process information.

I look forward to a report.

Lawrence
[Oct 24, 2011 11:05:39 PM]
se29592
Cruncher
Joined: Jun 4, 2009
Post Count: 8
Status: Offline
Re: Computational errors in various projects [AMD/Linux x86_64]

Hi Lawrence,

I will try to change one thing at a time and see what comes out of it. My first action is to continue to run the system as before, but selecting only the projects where I have not seen any problems, to try to confirm my theory that the problems are connected to some specific projects and not e.g. memory or CPU problems (which are more likely to hit all projects, but not guaranteed to do so).

I have not noticed any instabilities in the system, but I'm not stressing it very much when I am at the console. Your confirmation that this is an isolated anomaly makes me more confident in continuing to look for an error at the system level.

Ironically, I discovered the drastic change when I found out that I had been running at reduced (power-saving) speed continuously.

I'll update the thread when there is more information to share. It would be interesting to be able to query the full result status back in time. The limited searches available on the Result Status page do not provide (at least not easily) enough information on result history.

/Nils
[Oct 25, 2011 7:50:50 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computational errors in various projects [AMD/Linux x86_64]

Why don't you post an actual Result Log [scan multiple if there are variations in the fail codes] and what's printed in the message/event log of BOINC when these tasks fail?

Signal 11? SIGSEGV? Fedora's firewall, or any other IP/port scanning or guarding software, needs to let IP 127.0.0.1 (localhost) and port 31416 through. If that is continually scanned or obstructed, your tasks will fail: randomly, frequently, or always. Try crunching with BOINC network activity suspended as well. Intermittent Wi-Fi is known to upset BOINC too.
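
A quick check of whether anything is listening on that port can be sketched like this. `listens_on_port` is a hypothetical helper that only parses `ss -ltn` style output (state in column 1, local address:port in column 4); it shows whether the listener exists, not whether a firewall rule is interfering with it.

```shell
#!/bin/sh
# Sketch: succeed if the `ss -ltn` output on stdin shows a LISTEN socket
# whose local address ends in ":PORT". listens_on_port is a hypothetical
# helper name, not a system command.
listens_on_port() {
  port="$1"
  awk -v p="$port" '
    $1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
    END { exit !found }
  '
}

# Possible use on the real system:
#   ss -ltn | listens_on_port 31416 && echo "something is listening on 31416"
```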

All of this of course does not explain why it is not happening when you run e.g. HCC or Clean Water (both, I think, are integer-intensive computations), so maybe the FPU is intermittently failing, but then HFCC would have to be failing too, and that is the same program (science engine) as FAAH.

Can you define "reduced power"? Lower CPU clock, lower % CPU time, or the default 60% (known to cause DSFL to fail for some)? Maybe a power-save profile affects the clock of the CPU itself while BOINC runs, which is why, at least on Ubuntu, I have locked it to the maximum clock. (I would expect a clock-down to respond with some delay.)

--//--
[Oct 25, 2011 8:48:49 AM]
se29592
Cruncher
Joined: Jun 4, 2009
Post Count: 8
Status: Offline
Re: Computational errors in various projects [AMD/Linux x86_64]

Well, the log files are in the database for those interested, and the errors vary. I see some SIGSEGVs, for example.
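
For anyone who saves a few result logs locally, a rough tally of the signal names they mention could look like the sketch below. It assumes the logs are plain text files and that signals appear by name (e.g. SIGSEGV); `tally_signals` is a hypothetical helper, and it makes no assumption about the rest of the log format.

```shell
#!/bin/sh
# Sketch: count how often each signal name (SIGSEGV, SIGBUS, ...) appears
# in the given text files, most frequent first.
# tally_signals is a hypothetical helper name.
tally_signals() {
  grep -ohE 'SIG[A-Z]{3,}' "$@" | sort | uniq -c | sort -rn
}

# Possible use, with saved result logs in the current directory:
#   tally_signals result_*.txt
```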

Reduced power in my book means using a different frequency governor. This should not normally be visible to the application, so I would not expect it to have any effect on application stability, but reducing the internal clock frequency could potentially increase system stability.
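
For reference, the governor currently in effect can be read per core from sysfs. A minimal sketch, assuming the kernel exposes the usual cpufreq files under `/sys/devices/system/cpu`; `show_governors` is a hypothetical helper, and it prints nothing if cpufreq is not available.

```shell
#!/bin/sh
# Sketch: print "cpuN governor" for every core that exposes cpufreq.
# show_governors is a hypothetical helper; the optional argument lets
# you point it at a different root directory.
show_governors() {
  root="${1:-/sys/devices/system/cpu}"
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue        # skip cores without cpufreq support
    cpu=${f#"$root"/}              # strip the root prefix
    cpu=${cpu%%/*}                 # keep only the "cpuN" component
    printf '%s %s\n' "$cpu" "$(cat "$f")"
  done
}

show_governors
```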

I will wait to post more information until I have something useful to share.
[Oct 29, 2011 1:32:01 PM]
se29592
Cruncher
Joined: Jun 4, 2009
Post Count: 8
Status: Offline
Re: Computational errors in various projects [AMD/Linux x86_64]

During my investigations I started seeing indications of an SELinux problem with my set-up. During one of my reboots a system-wide SELinux relabel took place, and the problems appear to be solved.
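
For others chasing the same symptom, a rough way to see whether SELinux is denying anything BOINC-related is to count matching AVC lines in the audit log. This is a sketch only: `count_boinc_denials` is a hypothetical helper, the log path in the usage comment is the usual auditd default and may differ, and matching on the string "boinc" is my assumption about how the client shows up in the records.

```shell
#!/bin/sh
# Sketch: count AVC denial records on stdin that mention "boinc".
# count_boinc_denials is a hypothetical helper name.
count_boinc_denials() {
  # grep -c prints 0 and exits non-zero when nothing matches; keep status 0
  grep -Eic 'avc: +denied.*boinc' || true
}

# Possible use (root is normally needed to read the audit log):
#   count_boinc_denials < /var/log/audit/audit.log
#
# If denials show up, file labels can be restored for one tree with
#   restorecon -R -v /var/lib/boinc      # path is an assumption
# or a full relabel can be requested for the next boot with
#   touch /.autorelabel
```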

Cheers,

/Nils
[Nov 2, 2011 7:22:36 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computational errors in various projects [AMD/Linux x86_64]

Hello se29592,
I hope that solves the problem. I have been interested in hearing how SELinux works for PC users for more than half a decade now.

Lawrence
[Nov 2, 2011 8:47:03 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Computational errors in various projects [AMD/Linux x86_64]

From time to time, I notice that after some Linux updates (Ubuntu 10.04 LTS) the error rate can increase. For this reason, even if Ubuntu does not ask for it, I reboot the system after some specific updates (e.g. lib, pam, ...).
I don't have a formal rationale for the reboot criterion; it is more or less based on experience (and feeling).
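
One possible way to make that criterion less of a feeling: after a library update, processes that still map the old, now deleted file show "(deleted)" in their memory map. The sketch below lists such process IDs; `procs_with_deleted_libs` is a hypothetical helper, and reading other users' maps generally needs root.

```shell
#!/bin/sh
# Sketch: print the PIDs of processes that still map a shared library
# file which has since been deleted (typically replaced by an update).
# procs_with_deleted_libs is a hypothetical helper; the optional
# argument lets you point it at a different proc root.
procs_with_deleted_libs() {
  proc="${1:-/proc}"
  for m in "$proc"/[0-9]*/maps; do
    [ -r "$m" ] || continue
    if grep -q '\.so.*(deleted)' "$m" 2>/dev/null; then
      pid=${m%/maps}               # drop the trailing "/maps"
      pid=${pid##*/}               # keep only the PID component
      echo "$pid"
    fi
  done
}

# Possible use: if BOINC or a science application is in the list,
# restarting it (or rebooting) picks up the updated libraries.
#   procs_with_deleted_libs
```
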
Yves
[Nov 2, 2011 10:59:51 AM]