World Community Grid Forums
Category: Completed Research Forum: Microbiome Immunity Project Thread: MIP units error on Linux |
Thread Status: Active Total posts in this thread: 32
katoda
Senior Cruncher Poland Joined: Apr 28, 2007 Post Count: 170 Status: Offline Project Badges: |
Hello, is anybody crunching MIP on Linux (RHEL7) here?
I received two MIP workunits on my Linux machine and both finished with an error shortly after starting. For both workunits the error is the same: Output file absent
The two workunits concerned are MIP1_ 00000128_ 0407_ 0-- and MIP1_ 00000118_ 0150_ 0--
I'm wondering if it's only me or if it's a wider problem. Let me know if I should post more information.
EDIT: MIP1_ 00000135_ 0520_ errored just a second ago. I think I'll move my Linux box to a profile with MIP deselected, or abort any MIP task on this machine until a solution is found.
[Edit 1 times, last edit by katoda at Aug 23, 2017 11:06:50 AM] |
||
|
Falconet
Master Cruncher Portugal Joined: Mar 9, 2009 Post Count: 3294 Status: Offline Project Badges: |
I have two 00000134 tasks running on 64-bit Linux. After 25 minutes, they are running fine. Intel N2807, Mint 18.2.
----------------------------------------
AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
AMD Ryzen 7 7730U 8C/16T 3.0 GHz |
||
|
katoda
Senior Cruncher Poland Joined: Apr 28, 2007 Post Count: 170 Status: Offline Project Badges: |
Then the problem is probably on my side. I'm wondering what the cause could be, as the box (running RHEL7 64-bit Workstation) has crunched almost 3k tasks from other WCG projects without any problems.
If I paste the error log, would it help to at least point to the nature of the problem? |
||
|
Sid2
Senior Cruncher USA Joined: Jun 12, 2007 Post Count: 259 Status: Offline Project Badges: |
Haven't received any MIP on my linux box, but have had 5/5 error out on Windoze. . . .
---------------------------------------- |
||
|
PecosRiverM
Veteran Cruncher The Great State of Texas Joined: Apr 27, 2007 Post Count: 1053 Status: Offline Project Badges: |
Have you tried the basics:
Reboot system
Check anti-virus settings
Sometimes those fix it, for me at least.
PS No problems on any of my Linux systems (Ubuntu 14.04) nor Win (XP, Vista, and 7) |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7545 Status: Offline Project Badges: |
I have had several error out on one of my Linux machines, but I think I have narrowed down the cause to a problem on my end. I have some machines on a wired network and some on a wireless link with a range extender. None of the machines on the wireless link have been able to process either the betas or the actual units for this project. They seem to choke on the big file that needs to be downloaded; the download apparently never completes. If I get real ambitious I will disconnect one of the machines on the wireless link and hook it up to the wired part of the network. I don't like to move any of them because they are all rack mounted servers just stacked with air spaces between them. And they are heavy.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
katoda
Senior Cruncher Poland Joined: Apr 28, 2007 Post Count: 170 Status: Offline Project Badges: |
My Linux box is connected to a wired network and the big MIP files were downloaded quickly and without any problem.
I dug through the beta test forum and this error was reported several times while crunching beta workunits of MIP. But I did not find any follow-up to these posts from WCG techs.
EDIT: and just for info, here is the error log of one of the workunits; many similar situations were reported for beta.
Result Name: MIP1_ 00000154_ 0144_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
[2017- 8-23 15:26:19:] :: BOINC:: Initializing ... ok.
[2017- 8-23 15:26:19:] :: BOINC :: boinc_init()
INFO: result number = 0
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: ../../projects/www.worldcommunitygrid.org/wcgrid_mip1_rosetta_7.11_x86_64-pc-linux-gnu -in::file::zip MIP1_databasev2.zip @./MIP1_00000154.flags -out::file::silent result_silent.out -run:jran 173625263 -nstruct 25 -out::level 100 -run::no_scorefile true
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/www.worldcommunitygrid.org/mip1.MIP1_databasev2.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
set_shared_memory_fully_initialized ...
abrelax ...
abrelax.run
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Sequence Length = 43
Starting work on structure: _0001
</stderr_txt>
]]>
[Edit 1 times, last edit by katoda at Aug 23, 2017 3:06:57 PM] |
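Editor's note: the key line in the log above is "process got signal 11". On Linux, signal 11 is SIGSEGV (a segmentation fault), which means the science application crashed on an invalid memory access rather than failing to write its output. The following is only a minimal sketch of how that line could be pulled out of a pasted result log and translated to a signal name; the function name `describe_exit_signal` is hypothetical and is not part of BOINC or WCG tooling.

```python
import re
import signal
from typing import Optional


def describe_exit_signal(result_log: str) -> Optional[str]:
    """Return the symbolic name of the signal mentioned in a result log.

    Looks for the "process got signal N" message seen in the log above.
    Returns None when the log contains no such message.
    """
    match = re.search(r"process got signal (\d+)", result_log)
    if match is None:
        return None
    number = int(match.group(1))
    try:
        # signal.Signals maps a signal number to its enum member.
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known on this platform.
        return f"unknown signal {number}"


# Excerpt copied from the log posted above.
log_excerpt = "<message> process got signal 11 </message>"
print(describe_exit_signal(log_excerpt))
```

Knowing that the task died from a segmentation fault narrows the search to things like faulty memory, a corrupted download, or an incompatibility between the application binary and the host system.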
||
|
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline Project Badges: |
katoda,
I had a look at the results we have received back so far and the overall error rate is quite low so there are no widespread application or workunit issues. Do you overclock? Also you may want to run a hardware test if you haven't done this recently. Thanks, armstrdj |
||
|
katoda
Senior Cruncher Poland Joined: Apr 28, 2007 Post Count: 170 Status: Offline Project Badges: |
The box is not overclocked; it is a normal HP machine with an i7-6700 processor, 32GB RAM and a 512GB SSD, running on stock settings.
I can do a hardware test, but first I must find what software I can use on Linux. Concerning the error rate, I'm not surprised, as there are other users running this project on Linux machines without any problems. I think that this must be somehow related to my setup (and the setup of the other users reporting this issue in the beta test thread), but I have no idea where to look - missing/incompatible libraries, for example? |
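Editor's note: one way to follow up on the missing-libraries idea is to run `ldd` on the application binary and look for entries marked "not found". The sketch below only parses text that has already been captured from `ldd`; the helper name `missing_libraries` and the sample output (including `libexample.so.1` and the addresses) are made up for illustration and do not come from katoda's machine.

```python
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the names of shared libraries that ldd could not resolve.

    ldd prints unresolved dependencies as "<name> => not found".
    """
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and line.rstrip().endswith("not found"):
            # The library name is everything before the arrow.
            missing.append(line.split("=>")[0].strip())
    return missing


# Hypothetical ldd output, for illustration only.
sample_output = (
    "\tlibm.so.6 => /lib64/libm.so.6 (0x00007f0000000000)\n"
    "\tlibexample.so.1 => not found\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000200000)\n"
)
print(missing_libraries(sample_output))
```

An empty result would suggest that missing libraries are not the cause, although it would not rule out an incompatible library version.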
||
|
PecosRiverM
Veteran Cruncher The Great State of Texas Joined: Apr 27, 2007 Post Count: 1053 Status: Offline Project Badges: |
After looking around (I googled "signal 11"), you might try finishing all your WUs, then resetting the project (not detaching), rebooting, and seeing if that fixes it.
Just a thought. |
||
|
|