Thread Status: Active
Total posts in this thread: 31
This topic has been viewed 7723 times and has 30 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Repeating errors on 64-bit server

I have recently launched c4cw calculations on a 64-bit server. All c4cw tasks have returned errors, while tasks from other projects return valid results:

Result name                   Device  Status              Sent              Returned          CPU time (h)  Claimed / granted credit
c4cw_target04_001803385_0     elk     Error               5/31/11 03:39:40  5/31/11 05:40:03  0.24          4.6 / 0.0
c4cw_target04_001798196_0     elk     Error               5/31/11 03:39:29  5/31/11 05:40:03  0.32          6.0 / 0.0
c4cw_target04_001802864_0     elk     Error               5/31/11 03:38:54  5/31/11 05:37:47  0.30          5.7 / 0.0
X0000109510585200906161328_0  elk     Pending Validation  5/31/11 03:38:05  5/31/11 08:13:44  1.69          32.5 / 0.0
X0000109510478200906161330_0  elk     Pending Validation  5/31/11 03:37:18  5/31/11 08:13:44  1.52          29.1 / 0.0
c4cw_target04_001793109_0     elk     Error               5/31/11 03:36:48  5/31/11 05:14:07  0.00          0.0 / 0.0
c4cw_target04_001797411_0     elk     Error               5/31/11 03:36:27  5/31/11 05:14:07  0.08          1.5 / 0.0
X0000109510177200906161334_1  elk     Valid               5/31/11 03:35:01  5/31/11 08:13:44  1.63          31.2 / 22.7
X0000109510011200906161336_0  elk     Valid               5/31/11 03:33:30  5/31/11 08:13:44  1.45          27.8 / 19.6
X0000109510005200906161336_1  elk     Valid               5/31/11 03:33:30  5/31/11 08:13:44  1.49          28.7 / 14.2
c4cw_target04_001799010_0     elk     Error               5/31/11 03:33:30  5/31/11 05:14:07  0.00          0.0 / 0.0
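
To double-check the pattern in the task list (every c4cw result errored, every other result validated or is pending validation), a small tally helps. This is a hypothetical Python sketch with the rows typed in by hand; `tally_by_app` is an illustrative helper, not part of BOINC or WCG.

```python
from collections import Counter

# (result name, status) rows transcribed by hand from the task list above.
RESULTS = [
    ("c4cw_target04_001803385_0", "Error"),
    ("c4cw_target04_001798196_0", "Error"),
    ("c4cw_target04_001802864_0", "Error"),
    ("X0000109510585200906161328_0", "Pending Validation"),
    ("X0000109510478200906161330_0", "Pending Validation"),
    ("c4cw_target04_001793109_0", "Error"),
    ("c4cw_target04_001797411_0", "Error"),
    ("X0000109510177200906161334_1", "Valid"),
    ("X0000109510011200906161336_0", "Valid"),
    ("X0000109510005200906161336_1", "Valid"),
    ("c4cw_target04_001799010_0", "Error"),
]


def tally_by_app(results):
    """Count statuses separately for c4cw results and everything else."""
    counts = {}
    for name, status in results:
        app = "c4cw" if name.startswith("c4cw_") else "other"
        counts.setdefault(app, Counter())[status] += 1
    return counts


for app, statuses in tally_by_app(RESULTS).items():
    print(app, dict(statuses))
```

The split by name prefix is the whole point: if the failures were random hardware trouble, you would expect some errors among the non-c4cw results too.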


The reported error is:

Result Name: c4cw_target04_001803385_0
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
Commandline = ../../projects/www.worldcommunitygrid.org/wcg_c4cw_lmps_6.41_x86_64-pc-linux-gnu -screen none -in in.wcg.acc -var wcgsteps1 10000 -var wcgsteps2 10000 -var loop 0 -var restart 0 -var rinterval 100 -var ifile in.wcg.acc -var wcgseed 1803385
[09:20:58] Percent complete = 0.499975
[09:22:30] Percent complete = 0.999950
[09:24:11] Percent complete = 1.499925
[09:25:46] Percent complete = 1.999900
[09:27:23] Percent complete = 2.499875
[09:29:28] Percent complete = 2.999850
[09:31:33] Percent complete = 3.499825
[09:34:30] Percent complete = 3.999800
[09:36:38] Percent complete = 4.499775

</stderr_txt>
]]>
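
One way to read this report: the `Percent complete` lines show how far the task got before it died, and signal 11 is SIGSEGV (segmentation fault) on Linux. The Python sketch below parses the stderr excerpt above; the regex assumes exactly this `[HH:MM:SS] Percent complete = N` line format and no rollover past midnight, and `last_progress` is a hypothetical helper.

```python
import re
import signal
from datetime import datetime

# The "Percent complete" lines from the stderr_txt above.
STDERR = """\
[09:20:58] Percent complete = 0.499975
[09:22:30] Percent complete = 0.999950
[09:24:11] Percent complete = 1.499925
[09:25:46] Percent complete = 1.999900
[09:27:23] Percent complete = 2.499875
[09:29:28] Percent complete = 2.999850
[09:31:33] Percent complete = 3.499825
[09:34:30] Percent complete = 3.999800
[09:36:38] Percent complete = 4.499775
"""

PROGRESS_LINE = re.compile(r"\[(\d\d:\d\d:\d\d)\] Percent complete = ([\d.]+)")


def last_progress(stderr):
    """Return (last percent reported, seconds between first and last report)."""
    points = [
        (datetime.strptime(stamp, "%H:%M:%S"), float(pct))
        for stamp, pct in PROGRESS_LINE.findall(stderr)
    ]
    (first_time, _), (last_time, last_pct) = points[0], points[-1]
    return last_pct, (last_time - first_time).total_seconds()


pct, seconds = last_progress(STDERR)
print(f"last report at {pct:.2f}% ({seconds:.0f} s after the first report)")
print("signal 11 is", signal.Signals(11).name)  # SIGSEGV on Linux
```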


The server is an HP device with two Xeon E5540 CPUs.
$ uname -a
Linux elk 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux


I had to pause the Clean Water tasks. Is there anything I can do to fix the errors?
[May 31, 2011 9:54:51 AM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: Repeating errors on 64-bit server

Try a restart. Amongst other things, signal 11 could be related to memory (errors; do a memory test), the page file (disk write errors; do a disk check) or problems with the tasks themselves (help the scientists by reporting them).
I have personally seen runaway task failures many times, for many projects, usually failing immediately. I found that a restart was a quick fix, even if the problem recurred later. Be sure to keep crunching other tasks!

http://en.wikipedia.org/wiki/SIGSEGV
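
For context on "process got signal 11": on Linux a parent process can tell that its child was killed by a signal rather than exiting normally. Below is a minimal Python sketch of that mechanism using `subprocess`, which reports a child killed by signal N as return code -N. It only mimics the wording of the BOINC message; it is not how the BOINC client is implemented, and `run_and_report` is a hypothetical helper.

```python
import signal
import subprocess
import sys

# Child program: turn off core dumps, then send itself SIGSEGV (signal 11),
# so it terminates with the same status a real segmentation fault produces.
CHILD = (
    "import os, resource, signal\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "os.kill(os.getpid(), signal.SIGSEGV)\n"
)


def run_and_report(argv):
    """Run argv and describe how it ended, in the style of the BOINC message."""
    returncode = subprocess.run(argv).returncode
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        signum = -returncode
        return f"process got signal {signum} ({signal.Signals(signum).name})"
    return f"process exited with code {returncode}"


print(run_and_report([sys.executable, "-c", CHILD]))
```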
----------------------------------------
[Edit 1 times, last edit by skgiven at May 31, 2011 10:22:04 AM]
[May 31, 2011 10:20:57 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Repeating errors on 64-bit server

minaev,

It's curious that the 32-bit sciences succeed and the one 64-bit science that WCG distributes fails.
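
To confirm which science applications on the host are actually 64-bit, you can read the ELF class byte at offset 4 of each executable (1 means 32-bit, 2 means 64-bit). A minimal Python sketch; the helper names are hypothetical, and the path in the usage note assumes the Ubuntu package's data directory, /var/lib/boinc-client.

```python
import sys

ELFCLASS32 = 1
ELFCLASS64 = 2


def elf_bits(header):
    """Return 32 or 64 from the first bytes of an ELF file (EI_CLASS at offset 4)."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]
    if ei_class == ELFCLASS32:
        return 32
    if ei_class == ELFCLASS64:
        return 64
    raise ValueError(f"unknown ELF class {ei_class}")


def elf_bits_of_file(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return elf_bits(f.read(5))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, elf_bits_of_file(path), "bit")
```

Usage (script name hypothetical): `python3 elf_bits.py /var/lib/boinc-client/projects/www.worldcommunitygrid.org/wcg_c4cw_lmps_6.41_x86_64-pc-linux-gnu`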

Whilst it may not be the cause, the 6.10.17 client you are running is rather old. I suggest upgrading to 6.10.58, in the 64-bit version, though the 32-bit 6.10 client handles 64-bit sciences too:

http://boinc.berkeley.edu/dl/boinc_6.10.58_x86_64-pc-linux-gnu.sh
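
When checking whether a client is older than a suggested version, compare the numeric components rather than the raw strings: as text, "6.10.9" sorts after "6.10.17". A minimal Python sketch; the helper names are illustrative, and the sample line follows the format of the client's "Starting BOINC client version" startup message.

```python
import re


def parse_version(text):
    """Turn '6.10.17' into (6, 10, 17) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def client_version(log_line):
    """Extract the version from a 'Starting BOINC client version ...' line."""
    match = re.search(r"Starting BOINC client version (\d+(?:\.\d+)*)", log_line)
    if match is None:
        raise ValueError("no BOINC client version in line")
    return parse_version(match.group(1))


installed = client_version(
    "31-May-2011 15:15:24 [---] Starting BOINC client version 6.10.17 for x86_64-pc-linux-gnu"
)
suggested = parse_version("6.10.58")
print("upgrade suggested:", installed < suggested)
# Plain string comparison gets this case wrong: "6.10.9" > "6.10.17" as text.
print("6.10.9 older than 6.10.17:", parse_version("6.10.9") < parse_version("6.10.17"))
```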

Could you please post a copy of your client startup log so we can read the setup? Is there nothing logged in the client messages right around the time these Clean Water tasks fail?

ttyl

PS: here is a list of errors from the BOINC FAQ Service; signal 11 is among the possibilities. http://boincfaq.mundayweb.com/index.php?view=165
[May 31, 2011 10:34:55 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Repeating errors on 64-bit server

I've just restarted boinc-client and will check the new results. I don't really believe in the magic of turning things off and then on again, though :) Updating boinc-client sounds more promising. I'll try to find an up-to-date deb package for Ubuntu if restarting doesn't help.

I can't see anything worrying in the logs. But then again, I'm only a newbie when it comes to BOINC and grid computing in general :)

31-May-2011 15:15:24 [---] Starting BOINC client version 6.10.17 for x86_64-pc-linux-gnu
31-May-2011 15:15:24 [---] log flags: file_xfer, sched_ops, task
31-May-2011 15:15:24 [---] Libraries: libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
31-May-2011 15:15:24 [---] Data directory: /var/lib/boinc-client
31-May-2011 15:15:24 [---] Processor: 16 GenuineIntel Intel(R) Xeon(R) CPU E5540 @ 2.53GHz [Family 6 Model 26 Stepping 5]
31-May-2011 15:15:24 [---] Processor: 8.00 MB cache
31-May-2011 15:15:24 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vm
31-May-2011 15:15:24 [---] OS: Linux: 2.6.32-31-server
31-May-2011 15:15:24 [---] Memory: 11.75 GB physical, 34.43 GB virtual
31-May-2011 15:15:24 [---] Disk: 3.55 TB total, 610.50 GB free
31-May-2011 15:15:24 [---] Local time is UTC +4 hours
31-May-2011 15:15:24 [---] No usable GPUs found
31-May-2011 15:15:24 [---] Not using a proxy
31-May-2011 15:15:24 [World Community Grid] URL http://www.worldcommunitygrid.org/; Computer ID 1607390; resource share 100
31-May-2011 15:15:24 [climateprediction.net] URL http://climateprediction.net/; Computer ID 1151942; resource share 100
31-May-2011 15:15:24 [climateprediction.net] General prefs: from climateprediction.net (last modified 30-May-2011 17:25:05)
31-May-2011 15:15:24 [climateprediction.net] Host location: none
31-May-2011 15:15:24 [climateprediction.net] General prefs: using your defaults
31-May-2011 15:15:24 [---] Reading preferences override file
31-May-2011 15:15:24 [---] Preferences limit memory usage when active to 6017.18MB
31-May-2011 15:15:24 [---] Preferences limit memory usage when idle to 10830.92MB
31-May-2011 15:15:24 [---] Preferences limit disk usage to 50.00GB
BOINC initialization completed, beginning process execution...
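
The memory limits in this log can be sanity-checked against physical RAM. A minimal Python sketch, assuming the client's "GB" means 1024 MB: with 11.75 GB physical, the two limits work out to roughly 50% (in use) and 90% (idle). The regexes assume exactly the line formats shown above, and `summarize` is a hypothetical helper.

```python
import re

# Excerpt of the startup log above.
STARTUP_LOG = """\
31-May-2011 15:15:24 [---] Processor: 16 GenuineIntel Intel(R) Xeon(R) CPU E5540 @ 2.53GHz [Family 6 Model 26 Stepping 5]
31-May-2011 15:15:24 [---] Processor: 8.00 MB cache
31-May-2011 15:15:24 [---] Memory: 11.75 GB physical, 34.43 GB virtual
31-May-2011 15:15:24 [---] Preferences limit memory usage when active to 6017.18MB
31-May-2011 15:15:24 [---] Preferences limit memory usage when idle to 10830.92MB
"""


def field(pattern, log):
    """Return the first capture group of pattern, or raise if it is missing."""
    match = re.search(pattern, log)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return match.group(1)


def summarize(log):
    cpus = int(field(r"Processor: (\d+) ", log))
    # Assumption: the client's "GB" is 1024 MB.
    physical_mb = float(field(r"Memory: ([\d.]+) GB physical", log)) * 1024
    active_mb = float(field(r"when active to ([\d.]+)MB", log))
    idle_mb = float(field(r"when idle to ([\d.]+)MB", log))
    return {
        "cpus": cpus,
        "physical_mb": physical_mb,
        "active_pct": round(100 * active_mb / physical_mb),
        "idle_pct": round(100 * idle_mb / physical_mb),
    }


print(summarize(STARTUP_LOG))
```

Since the physical figure is itself rounded to two decimals, treat the percentages as approximate.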


Nor is there anything unusual in the log entries for the c4cw tasks:
31-May-2011 07:37:41 [World Community Grid] Computation for task c4cw_target04_001676121_0 finished
31-May-2011 07:37:41 [World Community Grid] Starting c4cw_target04_001688382_0
31-May-2011 07:37:41 [World Community Grid] Starting task c4cw_target04_001688382_0 using c4cw version 641
31-May-2011 07:37:42 [World Community Grid] Computation for task c4cw_target04_001672908_0 finished
31-May-2011 07:37:43 [World Community Grid] Started upload of c4cw_target04_001676121_0_0
31-May-2011 07:37:43 [World Community Grid] Computation for task c4cw_target04_001673135_0 finished


Will be back with more info later. Thanks!
[May 31, 2011 11:42:25 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Repeating errors on 64-bit server

Kernel 2.6.32 suggests Ubuntu Lucid Lynx 10.04 LTS. I'm not sure Synaptic has 6.10.58 or .59 for it. There is, though, a .deb of that version in circulation in 32- and 64-bit builds, e.g. here: http://pkgs.org/package/boinc-client

--//--
[May 31, 2011 12:12:25 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Repeating errors on 64-bit server

AFAIK, the official Lucid repository only has 6.10.17: https://launchpad.net/ubuntu/lucid/amd64/boinc-client

I have downloaded 6.10.58 from getdeb.net. Still weighing advantages and downsides of having third-party packages :)
[May 31, 2011 1:04:11 PM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: Repeating errors on 64-bit server

quote
I don't really believe in the magic...

Believe it; it's not really hocus-pocus.
[May 31, 2011 1:47:20 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Repeating errors on 64-bit server

Don't want to disappoint anyone, but the magic works as expected :), that is, the client keeps segfaulting after restart. I would not totally dismiss hardware problems, but it doesn't seem likely. Until last week, MySQL was working on this server without a single failure.

I've just installed a newer version of boinc-client, 6.10.58.
[Jun 1, 2011 5:33:17 AM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: Repeating errors on 64-bit server

After the restart, did the C4CW tasks work for a while and then start to fail again?
[Jun 1, 2011 8:54:28 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Repeating errors on 64-bit server

quote
Don't want to disappoint anyone, but the magic works as expected :), that is, the client keeps segfaulting after restart. I would not totally dismiss hardware problems, but it doesn't seem likely. Until last week, MySQL was working on this server without a single failure.

I've just installed a newer version of boinc-client, 6.10.58.

It could be one library dependency or another. I'm running 64-bit Linux (Natty) with the mentioned 6.10.59, and under Maverick, when C4CW launched in 64-bit, I had no issues either... no one seems to. I think the techs test on 10.04 LTS, one of several flavors of Ubuntu they've got around.

BTW, have you tried allowing just a single Clean Water task and running other WCG sciences on the rest? Just curious, with 16 logical CPUs in your device.

quote
Until last week, MySQL was working on this server without a single failure.

Are you implying that MySQL is now failing too when C4CW runs concurrently, or just that you haven't run it since last week?

You could try a test: SIMAP has a 64-bit version of their app, and that also ran fine on mine under Lucid LTS. They have work this week. They are one of a few backup projects I keep in case WCG goes off the air for longer and caches run dry... can't have an idle cruncher :D

Let us know

--//--
[Jun 1, 2011 9:25:49 AM]