Thread Status: Active
Total posts in this thread: 9
This topic has been viewed 2984 times and has 8 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Berserk WU

I've got a WU whose estimated completion time keeps increasing. There is a file called "fort.98" in its data area with a huge and ever-increasing number of lines, all of which say "**** OUT OF BOUNDS *********".

Should I abort it, or is there any information that would be useful to obtain first as to why it appears to be failing?
[Nov 22, 2007 3:50:24 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Berserk WU

I'm back from Thanksgiving at my sister's home. You have probably decided already, but if not - go ahead and abort it.

Lawrence
[Nov 23, 2007 1:44:19 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Berserk WU

Yeah, I killed it. Someone might want to look at that WU. It's been about 3 days and only 4 of 10 clients have completed it. It could just be a coincidence, but most 6-10 hour WUs (which is what the completed ones have taken) would be done by all clients in under 2 days. With 60% incomplete, I suspect that others might be thrashing away making huge error logs like mine was. (ach1_ 8_30 is the WU)

(Even more so, as what sounds like the same problem was reported in this thread.)
[Nov 24, 2007 3:11:00 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Berserk WU

Another machine has errored out after 67.89 hours, having blown a credit of 1,298.2 for nothing. My bet is that all the others will run out of time, having wasted several days each.
[Nov 26, 2007 4:23:03 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Berserk WU

Ughh!
Well, this is the sort of problem that we are supposed to locate with our initial trial runs. The project scientists have some local computers to debug with once we locate the problems.

sigh. . . .
Lawrence
[Nov 26, 2007 11:37:06 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Berserk WU

That's why I was trying to draw someone's attention to this. As I expected, the others have returned Error at 2-3 days CPU time or been marked No Reply, whereupon a new batch of copies has been sent out to more machines.

Edit: Now it's sending out copies with a 1-day execution requirement and getting mostly "No Reply", so it's sending out more.
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 30, 2007 8:51:13 AM]
[Nov 26, 2007 9:07:42 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Berserk WU

After burning through around 1000 CPU hours to get a quorum, the result was inconclusive, so another 5 copies have been spawned. Isn't it about time someone did something to stop this mess?
[Dec 2, 2007 7:58:43 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: Berserk WU

I downloaded the work unit in question and I'm investigating why it is causing so many copies to be sent out. Hopefully I will have something for you in the next few days.

-Uplinger
[Dec 4, 2007 5:51:25 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Berserk WU

With over 1885 CPU hours consumed across 33 machines, was this the biggest rogue WU ever?

What ended up being the problem?
[Feb 1, 2008 3:37:54 AM]