Thread Status: Active
Total posts in this thread: 15
This topic has been viewed 4441 times and has 14 replies
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Computation Errors

KLIK

I have 3GB RAM and only 2 cores, so that is 1.5GB per core, which is enough to run CEP2 on both. However, it seems to have been a one-off, because only the one has errored since the reboot. Perhaps it was because it was the first after the reboot?

Mike
[Feb 2, 2015 6:03:49 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation Errors

UGM1, MCM1, CEP2... I'm confused as this is the UGM1 forum. Not aware of a boot and fail correlation on CEP2, just the large progress loss unless a boot is planned.

Klik did post a link to the Memtest86 app I've mentioned, though oddly the website name is memtest. It's good for both 32 and 64 bit platforms.
[Feb 2, 2015 6:14:07 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation Errors

Got a WU that failed with:
ERROR: Out of Memory Error on compression
ERROR: Output Compression failed

I rebooted my machine after this and that seems to have given it a good sweep out, but I wanted to express my surprise at this particular error binning all that processing. It seems highly likely that this WU finished correctly, so I'm surprised that it was considered "cheaper" to reprocess the WU than to send the result back in an uncompressed state.
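
To make the "send it back uncompressed" idea concrete, here is a minimal Python sketch of compressing output in small chunks and falling back to the raw bytes if memory runs out. It is only an illustration under stated assumptions: the function name and chunk size are hypothetical, it uses Python's standard zlib module, and it is not how the UGM1 application is actually written. Whether the project would accept an uncompressed result is a separate question.

```python
import zlib


def compress_or_passthrough(data: bytes, chunk_size: int = 64 * 1024):
    """Hypothetical helper: compress `data` chunk by chunk.

    Returns (payload, compressed_flag). If a MemoryError is raised while
    compressing, the original bytes are returned with the flag set to False.
    """
    try:
        compressor = zlib.compressobj(level=6)
        parts = []
        # Feed the compressor fixed-size slices instead of one large buffer.
        for offset in range(0, len(data), chunk_size):
            parts.append(compressor.compress(data[offset:offset + chunk_size]))
        parts.append(compressor.flush())
        return b"".join(parts), True
    except MemoryError:
        # Fall back to the uncompressed bytes rather than discarding them.
        return data, False
```

The flag lets the receiving side know whether it needs to decompress the payload before validating it.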

Just my take.
[Jun 18, 2015 11:52:14 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation Errors

Surprise and incredulity are on the rise over the smallest thing not going the way we want it. My surprise was that this message has never been seen or reported before in the 10.7 years of WCG's existence; even more surprising, this particular error is not documented anywhere Google can find. So I'd really like to see the error log from the Result Status page, if it contains anything. I'd certainly like to see the event log sequence from the stdoutdae.txt file, before and after the failure, including an actual error code, plus anything new that was written to stderrdae.txt at the time of the incident. Both files are in the BOINC data directory, whose path is printed at the top of the Event Log view after a client restart.

If a task fails during any phase without the ability to regress to a previous checkpoint, there's no recovery and no trusting the output, so yes, reprocessing is the cheapest way. If it fails the same way for a wingman, there's a reproducible problem; if not, it was the rarest of flukes [IMO].

Beyond that, maybe a tech can look up what the message entails within the application code. It may be a home-made UGM message.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 19, 2015 1:19:20 AM]
[Jun 19, 2015 1:14:48 AM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: Computation Errors

Yes, this compression error is unique to UGM1, and I agree that it must be a rare fluke, since this is the first time we have seen it reported. UGM1 produces a fairly large amount of data per unit of CPU time. To help minimize the overall storage needed for results, we removed anything from the output that could be calculated cheaply from the other output data, and we added compression beyond what the BOINC client provides. It is this compression in the application that failed. If we see more reports of this, we will investigate further.

Thanks,
armstrdj
[Jun 29, 2015 8:03:48 PM]