Thread Status: Active
Total posts in this thread: 32
Posts: 32   Pages: 4   [ Previous Page | 1 2 3 4 | Next Page ]
This topic has been viewed 5975 times and has 31 replies
katoda
Senior Cruncher
Poland
Joined: Apr 28, 2007
Post Count: 170
Status: Offline
Re: MIP units error on Linux

I will try to do that after finishing the work currently in progress.
[Aug 23, 2017 7:50:15 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2084
Status: Recently Active
Re: MIP units error on Linux

Four days ago, I received 32 BETA-WUs on one device at the same time (8/19/17 01:08:40) of which one errored out with "process got signal 11". One other WU errored out with "finish file present too long". The other 30 BETA-WUs all went Valid.
My wingmen - for the WUs that went into Error - were successful afterwards: they got Valids.
[Aug 23, 2017 9:15:29 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1931
Status: Offline
Re: MIP units error on Linux

So far all WUs returned are without error, on Windows (32/64bit), Linux (Mint 18/64). Got a few on a remote Mac with OS X 10.6.8, but those haven't been returned yet...

Seems to me to be running better than the recent Beta...

Ralf
[Aug 23, 2017 11:07:19 PM]
widdershins
Veteran Cruncher
Scotland
Joined: Apr 30, 2007
Post Count: 674
Status: Offline
Re: MIP units error on Linux

Ironically, I'm having the opposite issue: the ancient BOINC app running on my clapped-out old Ubuntu 10 VM, which hasn't seen a patch in years and runs inside a BSD box, is turning in 100% OK results, as is normal for it. The much more modern, fully patched Win 7 Pro box is turning in a 100% error rate, but only on these work units.

However, excluding MIP, all boxes show 99.9% valids (1 Scc unit threw an error last week). A single error in any science is rare for my machines; a 100% error rate is unheard of.

So I think the problem lies mainly with the science application...
[Aug 23, 2017 11:47:36 PM]
guhsoftware
Cruncher
Germany
Joined: Nov 23, 2005
Post Count: 4
Status: Offline
Re: MIP units error on Linux

As in the beta, I see signal 11 on my two RHEL 6 machines. No overclocking; these systems have been running rock solid for many months now and have returned quite a few valid results for other projects.

Result Name: MIP1_ 00000076_ 0224_ 0--


<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
[2017- 8-24 0: 2: 2:] :: BOINC:: Initializing ... ok.
[2017- 8-24 0: 2: 2:] :: BOINC :: boinc_init()
INFO: result number = 0
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: ../../projects/www.worldcommunitygrid.org/wcgrid_mip1_rosetta_7.11_x86_64-pc-linux-gnu -in::file::zip MIP1_databasev2.zip @./MIP1_00000076.flags -out::file::silent result_silent.out -run:jran 955047679 -nstruct 26 -out::level 100 -run::no_scorefile true
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/www.worldcommunitygrid.org/mip1.MIP1_databasev2.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
set_shared_memory_fully_initialized ...
abrelax ...
abrelax.run
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Sequence Length = 40
Starting work on structure: _0001

</stderr_txt>
]]>
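
When comparing several failed results like the one above, it can help to pull out just the exit message and the last line the science app wrote before it died. The following is a minimal sketch, assuming the report text has been copied from the result page into a string; the function name `summarize_result` and the shortened sample are illustrative only and not part of BOINC or WCG.

```python
import re

def summarize_result(report: str) -> dict:
    """Extract the exit message and the last non-empty stderr line from a result report."""
    message = re.search(r"<message>\s*(.*?)\s*</message>", report, re.DOTALL)
    stderr = re.search(r"<stderr_txt>\s*(.*?)\s*</stderr_txt>", report, re.DOTALL)
    body = stderr.group(1) if stderr else ""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    return {
        "message": message.group(1) if message else None,
        "last_stderr_line": lines[-1] if lines else None,
        "stderr_line_count": len(lines),
    }

# Shortened copy of the report above (hypothetical excerpt for the demo).
sample = """<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
BOINC:: Worker startup.
Sequence Length = 40
Starting work on structure: _0001

</stderr_txt>
]]>"""

summary = summarize_result(sample)
print(summary["message"])
print(summary["last_stderr_line"])
```

If every failing result stops at the same last line, that points to a common crash location rather than random hardware faults.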
[Aug 24, 2017 9:43:42 AM]
guhsoftware
Cruncher
Germany
Joined: Nov 23, 2005
Post Count: 4
Status: Offline
Re: MIP units error on Linux

I let the machine run dry, reset the project, and resumed work.
If I go into
projects/www.worldcommunitygrid.org
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08
cat stderrgfx.txt
07:17:25 (20310): Can't open init data file - running in standalone mode
SIGSEGV: segmentation violation
Stack trace (12 frames):
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08(boinc_catch_signal+0x4d)[0x49859d]
/lib64/libpthread.so.0(+0xf5e0)[0x7f505b37d5e0]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x4393b4]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x4935ca]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x4938e8]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x493a39]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x4affa6]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x4b0825]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x49384d]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x43ba57]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f505afccc05]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x4381a9]

Exiting...
This might be expected when running without parameters, but it looks dubious to me.

I can reproduce this on my two RHEL 6 hosts and a CentOS 6 host.

Let me know if I can help troubleshoot this further.
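
For anyone who wants to dig into a trace like the one above, here is a rough sketch that splits each frame into module, symbol, and address, and lists the frames that have no symbol name. It assumes the `module(symbol+offset)[address]` line format seen in this trace; `parse_frames` and `unresolved_addresses` are made-up helper names. The resulting addresses could in principle be passed to `addr2line -e <binary>`, but that only yields useful output if the binary carries debug information, which a distributed build may not.

```python
import re

# One frame looks like: module(symbol+offset)[0xaddr]  or  module[0xaddr]
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)(?:\((?P<symbol>[^)]*)\))?\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)

def parse_frames(trace: str):
    """Return (module, symbol, address) tuples for every line that looks like a frame."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group("module"), m.group("symbol") or "", int(m.group("addr"), 16)))
    return frames

def unresolved_addresses(frames, module: str):
    """Hex addresses of frames in `module` that carry no symbol name."""
    return [hex(addr) for mod, sym, addr in frames if mod == module and not sym]

# First four frames of the trace above.
trace = """Stack trace (12 frames):
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08(boinc_catch_signal+0x4d)[0x49859d]
/lib64/libpthread.so.0(+0xf5e0)[0x7f505b37d5e0]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x4393b4]
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08[0x4935ca]
"""

frames = parse_frames(trace)
addresses = unresolved_addresses(frames, "./wcgrid_mdds_gfx_prod_linux_64.x86.7.08")
print(addresses)
```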
[Aug 25, 2017 5:22:33 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: MIP units error on Linux

Very hmmm. That's the graphics app part. Years ago it was split out from the main app in BOINC to prevent tasks from crashing if the candy part goes down; a graphics failure then gets logged to the stderrgfx.txt file in the main data directory, not the job slot. I haven't got such a file anywhere on the machines running MIP1 (W10 and Ubuntu 16.04 LTS), though I've never tried viewing the graphics on the Ubuntu system. Will try that tonight. To repeat: graphics failing is not supposed to crash the main science app / task.
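
To check whether a host has any stderrgfx.txt at all, a small recursive search is enough. The sketch below assumes nothing about where BOINC keeps its data: the directory is passed in as an argument, and the demo runs against a throwaway tree rather than a real installation. `find_gfx_logs` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

def find_gfx_logs(data_dir):
    """Return every stderrgfx.txt found at or below data_dir, sorted by path."""
    root = Path(data_dir)
    return sorted(p for p in root.rglob("stderrgfx.txt") if p.is_file())

# Demo against a temporary tree that mimics a data directory with one job slot.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "slots" / "0").mkdir(parents=True)
    (base / "slots" / "0" / "stderr.txt").write_text("science app output\n")
    (base / "stderrgfx.txt").write_text("SIGSEGV: segmentation violation\n")
    found = [p.relative_to(base).as_posix() for p in find_gfx_logs(base)]

print(found)
```

On a real host, pass the BOINC data directory instead of the temporary one. Its location varies by platform and package, so treat any particular path as an assumption to verify locally.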
[Aug 25, 2017 6:27:08 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: MIP units error on Linux

If I go into
projects/www.worldcommunitygrid.org
./wcgrid_mdds_gfx_prod_linux_64.x86.7.08
@guhsoftware mdds isn't mip1 (it was a long-ago beta)
[Aug 25, 2017 7:21:01 AM]
guhsoftware
Cruncher
Germany
Joined: Nov 23, 2005
Post Count: 4
Status: Offline
Re: MIP units error on Linux

Additional information for the hosts:
These are virtual machines running on VMware vSphere 6.5. Only the command-line "boinc" client is running, in an xterm.
[Aug 25, 2017 7:22:28 AM]
katoda
Senior Cruncher
Poland
Joined: Apr 28, 2007
Post Count: 170
Status: Offline
Re: MIP units error on Linux

I tried to run the graphics part of MIP1 (wcgrid_mip1_gfx_7.11_x86_64-pc-linux-gnu) and got the following error:

./wcgrid_mip1_gfx_7.11_x86_64-pc-linux-gnu: error while loading shared libraries: libglut.so.3: cannot open shared object file: No such file or directory

so apparently there is a problem with the "candy" part of the science application. I'm wondering, despite SekeRob's statement that it should not impact the main science application, whether our problem is somehow linked to it.
EDIT: and, just like @guhsoftware, I run BOINC in a terminal.
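
A quick way to see whether the dynamic loader can find libglut.so.3 on a given host, without launching the graphics app, is to try opening it from Python. This is only a sketch: `can_load` is a made-up helper, and the check uses the loader search path and architecture of the Python interpreter itself, so it matches the 64-bit graphics binary only when run from a 64-bit Python.

```python
import ctypes

def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True

if __name__ == "__main__":
    name = "libglut.so.3"
    print(name, "loadable" if can_load(name) else "NOT loadable")
```

If the library turns out to be missing, it is usually provided by the distribution's GLUT or freeglut package; the exact package name differs between distributions and is not confirmed anywhere in this thread.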
[Edit 1 times, last edit by katoda at Aug 25, 2017 7:42:11 AM]
[Aug 25, 2017 7:41:09 AM]