Thread Status: Active
Total posts in this thread: 13
This topic has been viewed 3535 times and has 12 replies
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1406
Status: Recently Active
Error in batch 5493 with T350's?

Three tasks ended in an error shortly after starting:

HST1_ 005493_ 000027_ KC0010_ T350_ F00026_ S00007_ 0-- rekendoos1 Error 7/14/16 07:34:20 7/14/16 08:37:57 0.00 / 0.01 0.3 / 0.0
HST1_ 005493_ 000030_ KC0010_ T350_ F00029_ S00007_ 1-- rekendoos1 Error 7/14/16 07:34:20 7/14/16 08:12:00 0.00 / 0.02 0.5 / 0.0
HST1_ 005493_ 000015_ KC0010_ T350_ F00009_ S00007_ 1-- rekendoos1 Error 7/14/16 07:34:20 7/14/16 08:12:00 0.00 / 0.01 0.2 / 0.0

All three failed with the same error: "The extended attributes are inconsistent."

Example:
Result Name: HST1_ 005493_ 000027_ KC0010_ T350_ F00026_ S00007_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
The extended attributes are inconsistent.
(0xff) - exit code 255 (0xff)
</message>
<stderr_txt>
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[10:12:48] INFO: Running initial simulation
</stderr_txt>

[Jul 14, 2016 8:46:51 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Error in batch 5493 with T350's?

Which application and OS is this: Linux, 32-bit or 64-bit? Positive error codes relate to system issues rather than the application.
[Jul 14, 2016 10:01:52 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1406
Status: Recently Active
Re: Error in batch 5493 with T350's?

Windows 7 x64

A T400 from the same batch 5493 started OK and is running fine on that machine, as are T000s and T001s from batch 5494.
[Jul 14, 2016 10:40:20 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1406
Status: Recently Active
Re: Error in batch 5493 with T350's?

Three of a kind:
HST1_ 005493_ 000015_ KC0010_ T350_ F00009_ S00007_ 4-- Microsoft Windows 10 Professional x64 Edition, (10.00.14388.00) - In Progress 7/14/16 15:44:17 7/18/16 03:44:17 0.00 0.0 / 0.0
HST1_ 005493_ 000015_ KC0010_ T350_ F00009_ S00007_ 3-- Microsoft Windows 10 Professional x64 Edition, (10.00.10586.00) 721 Error 7/14/16 15:40:51 7/14/16 15:42:56 0.00 0.0 / 0.0
HST1_ 005493_ 000015_ KC0010_ T350_ F00009_ S00007_ 2-- Microsoft x64 Edition, (06.02.9200.00) 721 Error 7/14/16 08:12:51 7/14/16 15:40:30 0.00 0.1 / 0.0
HST1_ 005493_ 000015_ KC0010_ T350_ F00009_ S00007_ 1-- Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) 721 Error 7/14/16 07:34:20 7/14/16 08:12:00 0.00 0.2 / 0.0
HST1_ 005493_ 000015_ KC0010_ T350_ F00009_ S00007_ 0-- Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) - In Progress 7/14/16 07:33:24 7/24/16 07:33:24 0.00 0.0 / 0.0

and

HST1_ 005493_ 000027_ KC0010_ T350_ F00026_ S00007_ 4-- Microsoft Windows 10 Professional x64 Edition, (10.00.10586.00) - In Progress 7/14/16 10:56:11 7/17/16 22:56:11 0.00 0.0 / 0.0
HST1_ 005493_ 000027_ KC0010_ T350_ F00026_ S00007_ 3-- Microsoft Windows 7 Home Premium x64 Edition, Service Pack 1, (06.01.7601.00) 721 Error 7/14/16 10:03:29 7/14/16 10:52:35 0.01 0.1 / 0.0
HST1_ 005493_ 000027_ KC0010_ T350_ F00026_ S00007_ 2-- Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) 721 Error 7/14/16 08:38:08 7/14/16 09:57:44 0.00 0.1 / 0.0
HST1_ 005493_ 000027_ KC0010_ T350_ F00026_ S00007_ 0-- Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) 721 Error 7/14/16 07:34:20 7/14/16 08:37:57 0.00 0.3 / 0.0
HST1_ 005493_ 000027_ KC0010_ T350_ F00026_ S00007_ 1-- Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) - In Progress 7/14/16 07:33:24 7/24/16 07:33:24 0.00 0.0 / 0.0
[Jul 14, 2016 5:40:09 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: Error in batch 5493 with T350's?

I also got immediate "computation errors" on a few different systems, all 64-bit Windows, with at least one each on Windows 7, Windows 8.1 and Windows 10...

HST1_ 005553_ 000027_ AC0002_ T325_ F00006_ S00007_ 3-- (Windows 10)
HST1_ 005558_ 000007_ AT0005_ T300_ F00017_ S00006_ 3-- (Windows 7)
HST1_ 005487_ 000056_ AC0001_ T350_ F00628_ S00007_ 3-- (Windows 10)
HST1_ 005493_ 000029_ KC0010_ T350_ F00028_ S00007_ 1-- (Windows 8.1)

Ralf
[Jul 15, 2016 7:21:08 AM]
Simba123
Cruncher
Joined: Nov 28, 2011
Post Count: 15
Status: Offline
Re: Error in batch 5493 with T350's?

Windows 7 64-bit. I have never had a failed task in 5 years, but 22 failed in the last couple of days, all HSTB.
I have noticed that I often get a 'waiting for memory' notice (though I have 8 GB of memory with plenty free and this project is only supposed to use 250 MB).
Once a task starts, memory usage spikes to 100%, then the task fails within 10 seconds.

<edit> Forgot to mention that I have also had several BSODs when running this particular project. I have now stopped this project until these issues are sorted.
----------------------------------------
[Edit 1 times, last edit by Simba123 at Jul 16, 2016 12:15:28 AM]
[Jul 15, 2016 10:37:00 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Error in batch 5493 with T350's?

On the Linux side (up-to-date Ubuntu 14.04 LTS x64), I had two reboots today because of computation errors at WU start:
- HST1_005624_000006_KT0010_T350_F00027_S00003
- HST1_005583_000061_KT0013_T300_F00061_S00003
- HST1_005587_000071_KT0014_T350_F00099_S00003
with the following failure: "SIGSEGV: segmentation violation"

Normally Linux hosts are very robust, but the computation error locked up the system so badly that a reboot seemed the safer way to restore it to a correct and controlled state.
After numerous long-running WUs with ridiculous credit, we are now experiencing severe system disturbances with HST1. I think it is time to seriously reconsider the science design and implementation.
I have tried to support HST1 as well as possible (262 days since project launch). However, given these latest problems, I will probably have to opt out of it, since I am not always near the hosts to babysit them.
Cheers,
Yves
----------------------------------------
[Jul 15, 2016 11:49:43 PM]
andgra
Senior Cruncher
Sweden
Joined: Mar 15, 2014
Post Count: 195
Status: Offline
Re: Error in batch 5493 with T350's?

I also had 16 job errors this morning (HST T325, T350 and T400).
The errors occur directly at task start and the error reason is unknown.
All hosts are Win10 (64-bit and 32-bit), both Intel and AMD.
None of the WUs is OK at the wingmen either; they are all in error or in progress. So I would say it is a central problem.
----------------------------------------
/andgra



[Jul 16, 2016 6:37:33 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Error in batch 5493 with T350's?

I had a 5492 sitting in the queue and decided to start it manually... it bombed instantly with app 7.16, i.e. the old one.

Result Name: HST1_ 005492_ 000023_ KC0010_ T350_ F00008_ S00007_ 2--
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073740940 (0xc0000374)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[09:34:47] INFO: Running initial simulation

</stderr_txt>
]]>
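
Editorial aside, not part of the original post: the negative decimal and the hex value in the message above are the same 32-bit number, once read as signed and once as unsigned. A minimal Python sketch of the conversion follows. The helper names `to_unsigned32` and `describe` are made up for illustration, and the two symbolic names in the lookup table are, to the best of my knowledge, the standard Windows identifiers for these values; treat them as an assumption and check them against the Microsoft documentation.

```python
def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF

# Illustrative lookup covering only the two codes quoted in this thread.
# The symbolic names are an assumption to verify against Microsoft's docs.
KNOWN_CODES = {
    0xC0000374: "STATUS_HEAP_CORRUPTION (NTSTATUS)",
    0x000000FF: "ERROR_EA_LIST_INCONSISTENT (Win32 error 255)",
}

def describe(exit_code: int) -> str:
    """Format an exit code as 8-digit hex plus a symbolic name if known."""
    value = to_unsigned32(exit_code)
    return f"0x{value:08x} {KNOWN_CODES.get(value, 'unknown')}"

print(describe(-1073740940))  # the code from the result above
print(describe(255))          # the code from the first post in the thread
```

If the lookup is right, the text "The extended attributes are inconsistent." in the first post is just the Windows message for error number 255, so it may say nothing about extended attributes as such.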

Both previous copies crashed too, but in different ways:

Result Name: HST1_ 005492_ 000023_ KC0010_ T350_ F00008_ S00007_ 0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
couldn't start app: Can't get shared memory segment name: shmget() failed
</message>
]]>

Result Name: HST1_ 005492_ 000023_ KC0010_ T350_ F00008_ S00007_ 1--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code -1073740940 (0xc0000374)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[16:04:06] INFO: Running initial simulation
INFO: No state to restore. Start from the beginning.
[16:07:19] INFO: Running initial simulation

Back Off! I just backed up md.log to ./#md.log.1#
INFO: No state to restore. Start from the beginning.
[16:10:29] INFO: Running initial simulation

Back Off! I just backed up md.log to ./#md.log.2#
INFO: No state to restore. Start from the beginning.
[16:14:27] INFO: Running initial simulation

Back Off! I just backed up md.log to ./#md.log.3#
INFO: No state to restore. Start from the beginning.
[16:17:34] INFO: Running initial simulation

Back Off! I just backed up md.log to ./#md.log.4#
INFO: No state to restore. Start from the beginning.
[16:21:27] INFO: Running initial simulation

Back Off! I just backed up md.log to ./#md.log.5#
INFO: No state to restore. Start from the beginning.
[16:27:50] INFO: Running initial simulation

Back Off! I just backed up md.log to ./#md.log.6#

</stderr_txt>
]]>

Edit: A manually started 5628 with app 7.16 also crashed instantly. The next one, a 5373 with app 7.21, started fine, as did a 5473 on 7.21. This is a 64-bit machine, W10-14372-rs1.
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Jul 16, 2016 7:46:55 AM]
[Jul 16, 2016 7:42:26 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Error in batch 5493 with T350's?

The plot thickens [and gets even foggier]: I had a 5591 with 7.16 on W7-64 and it started fine. Checked in the Task Manager: 7.16 appears to be the 32-bit version (slaps forehead). A manual start of a 5409 on v7.21, which is the 64-bit app, runs fine too.

Edit: So here the 7.16 32-bit app does not run on W10-64-14372 but does on W7-64. The 7.21 app runs fine on both W10-64 14372 and W7-64.

Edit2: 30 minutes later, all that started fine on W10-64+W7-64 still progressing fine, several checkpoints logged.

Edit: On rereading, I see a lot of "fine" and "not so fine" in this post, LoL, but apart from 7.16 on W10 all remains ... fine :D
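
Editorial aside, not part of the original post: the four observations above form a small app-version by OS matrix, and a short Python sketch makes the conclusion explicit. The labels `W10-64` and `W7-64` are shorthand for the hosts described in the post, and `always_fails` is a hypothetical helper written for this illustration.

```python
# Outcomes as reported in the post above: (app version, OS) -> result.
observations = {
    ("7.16", "W10-64"): "crash",
    ("7.16", "W7-64"): "ok",
    ("7.21", "W10-64"): "ok",
    ("7.21", "W7-64"): "ok",
}

def always_fails(obs, index, value):
    """True if every observation whose key has `value` at `index` crashed."""
    outcomes = [result for key, result in obs.items() if key[index] == value]
    return bool(outcomes) and all(result == "crash" for result in outcomes)

# Neither factor explains the crashes on its own.
print(always_fails(observations, 0, "7.16"))    # app version alone
print(always_fails(observations, 1, "W10-64"))  # OS alone

# Only the specific combination fails.
failing = [key for key, result in observations.items() if result == "crash"]
print(failing)
```

On these four data points the crash shows up only when the 32-bit 7.16 app meets the W10 build, which matches the summary in the edit above. It is a sample of one host per OS, so it does not rule out other factors.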
----------------------------------------
[Edit 3 times, last edit by SekeRob* at Jul 16, 2016 5:02:39 PM]
[Jul 16, 2016 7:56:33 AM]