Thread Status: Active
Total posts in this thread: 26
This topic has been viewed 127801 times and has 25 replies
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Re: Investigating Increased Error Rate in Help Stop TB

armstrdj wrote: "We have identified the issue and are working on finding and correcting the cause."

I'm curious what the issue actually is. The last Beta batch from July 1st worked just fine, even on 32-bit Windows Server 2003/Windows XP hosts, but there seem to have been a lot of (random?) memory issues with regular WUs since right after the Beta run ended...

Ralf
[Jul 21, 2016 9:27:12 PM]
Gurra
Cruncher
Joined: Sep 11, 2006
Post Count: 33
Re: Investigating Increased Error Rate in Help Stop TB

One Windows machine just got 5 WUs, all from different batches but all with sequence #3. They all errored out straight away with (unknown error) - exit code -1073741819 (0xc0000005).
Another Windows machine got 5 WUs, mostly from different batches and with sequences #3 and #4. All bar one errored out with the above error; the last one failed with "inconsistent extended attributes".
A third Windows machine got a single WU, batch 4975 sequence 4, that is sitting in pending verification on two Darwin machines and one Windows machine. It appears to be processing normally.
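
For anyone wondering how the two numbers in that error line relate: the negative value is the same 32-bit exit code read as a signed integer, and the hex value is its unsigned form. 0xC0000005 is the Windows status code for an access violation (STATUS_ACCESS_VIOLATION), which fits the memory errors reported in this thread. A minimal Python sketch of the conversion, where the function name is just an illustrative choice:

```python
def exit_code_to_hex(code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which turns the
    # negative two's-complement value into its unsigned equivalent.
    return f"0x{code & 0xFFFFFFFF:08x}"


print(exit_code_to_hex(-1073741819))  # 0xc0000005
```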
[Edit 1 times, last edit by Gurra at Jul 22, 2016 3:16:53 PM]
[Jul 22, 2016 3:07:29 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Re: Investigating Increased Error Rate in Help Stop TB

We will begin releasing new work on Help Stop TB again shortly. We have tracked the issue down to the latest update to the 64-bit Windows build of the application. We are reverting to the previous 64-bit Windows version, version 7.14.

Thanks,
armstrdj
[Jul 22, 2016 6:24:47 PM]
widdershins
Veteran Cruncher
Scotland
Joined: Apr 30, 2007
Post Count: 677
Re: Investigating Increased Error Rate in Help Stop TB

I don't understand how that could be the root issue. Others and I were seeing these memory errors on Linux. Would Linux hosts be affected by the Windows build?
[Jul 22, 2016 6:49:38 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Re: Investigating Increased Error Rate in Help Stop TB

For Help Stop TB, each workunit is a time slice of a simulation, and it relies on the output of the previous slice to run. The Windows build was producing bad output, and when that output was fed back into the system, the runs that consumed it saw the increased memory usage. The next slice could run on any platform regardless of where the previous slice ran, which is why you saw the errors on Linux.
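
To make the dependency concrete, here is a rough Python sketch of a chain of slices. It is only an illustration and not the actual World Community Grid scheduler: the names `SliceResult` and `run_chain` are made up, and it assumes, as a simplification, that a slice fed corrupt input can never produce sound output.

```python
from dataclasses import dataclass


@dataclass
class SliceResult:
    sequence: int
    platform: str
    input_ok: bool   # was the previous slice's output sound?
    output_ok: bool  # is this slice's own output sound?


def run_chain(platforms, bad_platform):
    """Run slices in order; each slice consumes the previous slice's output."""
    results = []
    previous_output_ok = True  # the first slice starts from clean input
    for sequence, platform in enumerate(platforms):
        input_ok = previous_output_ok
        # Assumption: the bad build corrupts its output, and corrupt input
        # cannot yield sound output on any platform.
        output_ok = input_ok and platform != bad_platform
        results.append(SliceResult(sequence, platform, input_ok, output_ok))
        previous_output_ok = output_ok
    return results


chain = run_chain(["linux", "win64", "linux", "darwin"], bad_platform="win64")
for r in chain:
    print(r.sequence, r.platform, "input ok" if r.input_ok else "input corrupt")
```

In this toy run the Linux and Darwin slices that follow the bad Windows slice receive corrupt input even though their own builds are fine.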

Thanks,
armstrdj
----------------------------------------
[Edit 1 times, last edit by armstrdj at Jul 22, 2016 8:03:25 PM]
[Jul 22, 2016 7:12:08 PM]
widdershins
Veteran Cruncher
Scotland
Joined: Apr 30, 2007
Post Count: 677
Re: Investigating Increased Error Rate in Help Stop TB

Ah, I understand now, thanks for the explanation.

Will you be able to wind back the affected simulations to the last known good output slice just before the Windows runs corrupted things rather than start each simulation from the beginning?
[Jul 22, 2016 8:02:25 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Re: Investigating Increased Error Rate in Help Stop TB

Yes, the affected simulations will be restarted at the last good input.
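
As a rough picture of what restarting at the last good input means, the Python sketch below walks a simulation's slice history and finds the last slice before the first bad output. It is a hypothetical illustration only: the function name and the list of per-slice flags are assumptions, not the project's actual tooling.

```python
def last_good_sequence(output_ok_by_sequence):
    """Return the index of the last slice before the first bad output, or None."""
    last_good = None
    for sequence, ok in enumerate(output_ok_by_sequence):
        if not ok:
            break  # everything from here on is built on bad data
        last_good = sequence
    return last_good


# Hypothetical history: slices 0-2 were sound, slice 3 came from the bad build.
history = [True, True, True, False, False]
restart_from = last_good_sequence(history)
print(restart_from)  # 2
```

The simulation would then resume with the slice after `restart_from`, using that slice's output as input, rather than starting again from slice 0.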

Thanks,
armstrdj
[Jul 22, 2016 8:05:14 PM]
tmedve
Senior Cruncher
USA
Joined: Nov 16, 2004
Post Count: 191
Re: Investigating Increased Error Rate in Help Stop TB

I've got 8 tasks with 2 running from this fixed batch (I think). 20 minutes in and all seems well.
[Jul 22, 2016 10:39:48 PM]
widdershins
Veteran Cruncher
Scotland
Joined: Apr 30, 2007
Post Count: 677
Re: Investigating Increased Error Rate in Help Stop TB

armstrdj wrote: "Yes the simulations affected will be restarted at the last good input."


That's great news. It's good to know that all the previous work on these simulations won't be in vain.
[Jul 23, 2016 12:29:32 AM]
Sandvika
Advanced Cruncher
United Kingdom
Joined: Apr 27, 2007
Post Count: 112
Re: Investigating Increased Error Rate in Help Stop TB

I'm delighted to hear that HST will soon be back in business. I've reverted to OET and am 6 real days away from completing 20 crunching years, so will stick with it until it is done and hope HST is ready again by then.
[Jul 23, 2016 1:01:23 AM]