| World Community Grid Forums
Thread Status: Active Total posts in this thread: 15
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
KLIK
I have 3GB RAM and only 2 cores, so that is 1.5GB per core, which is enough to run CEP2 on both. However, it seems to have been a one-off, because only the one has errored since the re-boot. Perhaps it was because it was the first after the re-boot?
Mike
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
UGM1, MCM1, CEP2... I'm confused, as this is the UGM1 forum. I'm not aware of a boot-and-fail correlation on CEP2, just the large progress loss unless a boot is planned.
Klik did post a link to the Memtest86 app I've mentioned, though oddly the website name is memtest. It's good for both 32-bit and 64-bit platforms.
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Got a WU that failed with:
ERROR: Out of Memory Error on compression
ERROR: Output Compression failed
I rebooted my machine after this and that seems to have given it a good sweep out, but I wanted to express my surprise at this particular error binning all that processing. It seems highly likely that this WU finished correctly, so I'm surprised that it was considered "cheaper" to reprocess the WU than to send the result back in an uncompressed state. Just my take.
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Surprise and incredulity are on the rise over the smallest thing not going the way we want it. My surprise was that this message had never been seen or reported before in the 10.7 years of computing in WCG's existence, and even more that this particular error was never documented on Google's internet, so I'd really like to see the error log from the Result Status page, if it contains anything. I'd certainly like to see the event log sequence from the stdoutdae.txt file, before and after the failure, with an actual error code, plus anything new that got written to stderrdae.txt at the time of the incident. Both files are in the BOINC data directory; its path is printed at the top of the Event Log view after a client restart.
----------------------------------------
If a task fails during any phase without the ability to regress to a previous checkpoint, there's no recovery and no trusting the output, so yes, reprocessing is the cheapest way. If it fails the same way for a wingman, there's a reproducible problem; if not, it was the rarest of flukes [IMO]. Beyond that, maybe a tech can look up what the message entails within the application code. It may be a UGM home-made message.
[Edit 1 times, last edit by Former Member at Jun 19, 2015 1:19:20 AM]
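For anyone wanting to pull the relevant lines out of stdoutdae.txt or stderrdae.txt, here is a minimal Python sketch. It is not a BOINC tool; the function name `find_error_context`, the default search string, and the size of the context window are all made up for illustration, and the path to the log file has to be supplied by you, since the data directory location differs per install.

```python
from pathlib import Path


def find_error_context(log_path, needle="ERROR", context=2):
    """Return (line_number, surrounding_lines) for each log line containing needle.

    Hypothetical helper: log_path is whatever path your own client reports
    for its data directory, plus stdoutdae.txt or stderrdae.txt.
    """
    # errors="replace" keeps the scan going if the log holds odd bytes.
    lines = Path(log_path).read_text(errors="replace").splitlines()
    hits = []
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            # Line numbers are reported 1-based, as an editor would show them.
            hits.append((i + 1, lines[start:end]))
    return hits
```

Calling `find_error_context(path_to_log, needle="Compression")` would return the matching lines with a couple of lines before and after each, which is the "before and after" sequence asked for above.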
||
|
|
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline Project Badges:
|
Yes, this compression error is unique to UGM1, and I agree that it must be a rare fluke, since this is the first time we have seen it reported. UGM1 produces a fairly large amount of data per unit of CPU time. To help minimize the overall storage needed for results, we removed anything in the output that could be calculated cheaply from the other output data, and we added compression on top of what the BOINC client provides. It is this compression in the application that failed. If we see more reports of this, we will investigate further.
Thanks,
armstrdj
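As a rough illustration of how an in-application compression step can turn a memory shortage into a failed result, here is a minimal Python sketch. It is an assumption-laden stand-in, not the UGM1 code: the thread does not say which compression library or language the application uses, so `zlib` and the function name `compress_result` are purely hypothetical.

```python
import zlib


def compress_result(data: bytes, level: int = 6) -> bytes:
    """Compress result bytes before upload (hypothetical stand-in for UGM1's step)."""
    try:
        return zlib.compress(data, level)
    except MemoryError as exc:
        # Assumption: the step reports failure rather than shipping raw output,
        # which matches the behaviour described in this thread.
        raise RuntimeError("Output Compression failed") from exc


# Round trip on a small, repetitive payload.
payload = b"0.0 " * 1000
packed = compress_result(payload)
restored = zlib.decompress(packed)
```

In this sketch the compressor needs working memory of its own on top of the finished output, so a machine that is short on RAM at that moment can fail here even though the science calculation itself completed.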