This topic has been viewed 2417 times and has 10 replies.
Sasha Leykatze
Cruncher
United Kingdom
Joined: Feb 25, 2019
Post Count: 9
(unknown error) - exit code 194 (0xc2) - Question regarding stability

Hello,

I have many machines running World Community Grid 24/7 with all current projects enabled. However, one of my machines (the one I use as my primary computer day to day) is generating errors that make me uncomfortable.

It has also produced an invalid result for the Africa Rainfall Project, while none of my other machines show these errors or get any invalid results.

I recently put it down to misconfigured RAM, so I have since lowered the memory from 3200 MHz (and then 3000 MHz) to 2933 MHz with looser timings. I'm running system stability tests in various tools (AIDA64, Prime95, etc.) and getting no errors. That said, can anyone shed any light on this error:

"(unknown error) - exit code 194 (0xc2) "

Some WUs end with it: they complete, or almost complete, and then exit with this error (only on the machine in question).

Here are some examples:

MCM:
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=299360271

OPN:
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=295895854

I have looked up the issue on:
https://boinc.mundayweb.com/wiki/index.php?title=Project_application_errors
and found a reference to exit code 194 as being associated with "aborted" tasks or tasks past their deadline, but the "(unknown error)" in the message makes me anxious, as this error is not listed in the "(Unknown Error)" section of that page.

Can anyone offer advice on what might be happening here? Are there log files I can provide to help pin down the cause, or am I just being too anxious? (If so, I apologise.)

Specifications of the machine in question:
Ryzen Threadripper 2950X 16-core (stock)
ASUS X399 Zenith Extreme
64GB (8x8GB) DDR4-2933 16-16-16-36*
860W Platinum PSU

*This 64GB is made up of two kits, one rated for 3200 c14 and one for 3600 c16. I have tried running the 3200 c14 kit on its own, and that also generated the errors. I also tried both kits at 3000 c14, but I have now changed the settings to those listed above.

Sorry for the long and potentially silly post. My anxiety runs high whenever any of my machines start throwing errors or invalids.

Thank you for your time,

Ashley
[Sep 24, 2020 8:40:18 AM]
William Albert
Cruncher
Joined: Apr 5, 2020
Post Count: 39

I have dozens of machines (including a Ryzen-powered Windows PC) that have been crunching for OPN for months, and have yet to see a single error.

Especially given that other machines successfully process the WUs that are failing on yours, I can't think of any explanation other than unstable hardware.
[Sep 24, 2020 9:35:51 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Note that negative error codes in the 0 to -240 range are client errors, and positive codes (1 to nnn) are science app/task errors. I suggest you diagnose your hardware: BOINC crunching is the toughest test of hardware stability there is. Remove dust bunnies from the box frequently.
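A minimal Python sketch of that rule of thumb, which also shows that 194 and 0xc2 are the same number in decimal and hex (12 × 16 + 2 = 194). The helper name and the example codes are hypothetical, and this is not an official BOINC mapping: the wiki page linked in the first post ties exit code 194 to aborted tasks, so treat the category as a first hint only.

```python
def classify_exit_code(code: int) -> str:
    """Rough classification of a BOINC task exit code, following the rule of
    thumb in the post above (hypothetical helper, not an official BOINC table)."""
    if code == 0:
        return "success"
    if -240 <= code < 0:
        # Negative codes down to about -240: errors reported by the BOINC client.
        return "client error"
    if code < 0:
        # Below -240 the rule of thumb says nothing.
        return "unclassified"
    # Positive codes: errors from the science application / task.
    return "science app/task error"


# Arbitrary example codes; 194 is the one from this thread.
for code in (0, -100, 194):
    print(code, classify_exit_code(code))

# Decimal and hex are the same value: prints "194 0xc2".
print(194, format(194, "#x"))
```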
[Sep 24, 2020 10:13:14 AM]
Sasha Leykatze
Cruncher
United Kingdom
Joined: Feb 25, 2019
Post Count: 9

Thank you for the replies. I suspect the same: it is likely unstable hardware.

I have suspended all processing on this machine until I have fixed the issue. I am going to run 24 hours of CPU/FPU/RAM-intensive stress testing with Prime95 (large FFTs) and AIDA64 at the current settings and see whether it passes, before even considering resuming. All of the tasks have deadlines that can still be met after 24 hours of downtime, so at least there is that.

This more or less confirms my worst nightmare, because this machine is mostly second-hand from a friend who bought the components on eBay. Still, I think the RAM is the culprit, so I will run those tests.
[Sep 24, 2020 10:28:42 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1672

You should look at:
- Temperature: clean the CPU cooler, filters, and fans
- Power supply: the PSU may be slightly too weak and fail to deliver sufficient voltage from time to time
- RAM: clean the contacts, run a RAM check
- HDD/SSD failure: run a disk check and a file system check
- Electromagnetic interference
- ...
Cheers,
Yves
[Sep 24, 2020 1:38:13 PM]
Sasha Leykatze
Cruncher
United Kingdom
Joined: Feb 25, 2019
Post Count: 9

Thanks everyone for your input.

Last night the machine passed 5 hours of Memtest64 without error, so I've tentatively resumed crunching, as I suspect it was indeed a RAM issue. The machine has now been crunching for 8 hours without any of these errors or further invalids.

This is new to me, as in the past RAM instability usually gave me invalid results. Now I am taking no chances with RAM settings, especially with Ryzen, and will run the memory controller's rated speed of 2933 MHz (although in this case the rated speed for 2DPC, i.e. two DIMMs per channel, is lower, at 2133 or 2400 MHz, but it seems to be stable).

I have huge anxiety about disrupting the science and the work that WCG projects do by having a machine produce bad results. I sincerely apologise for the RAM issue in my workstation; it was my fault for thinking it was safe to trade reliability for arbitrary speed.

Crunching 24/7 over the past year has taught me a valuable lesson: stability and power efficiency are king, above any form of overclocking or clock speed increase, at least for me.

I will update the post here if I get any more errors / invalids.
[Sep 25, 2020 10:18:22 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Don't worry: if a device goes really bad, it will eventually receive only 1 work unit a day for the affected sciences, and this lasts until the problem is resolved. After that, the daily allowance doubles with every valid result until you've reached ~10 consecutive valid results, at which point the allowance becomes unlimited again.
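A simplified Python model of that recovery schedule as described in the post: 1 WU a day while throttled, doubling with each valid result, unlimited after about 10 in a row. The exact cutoff of 10 and the function name are assumptions, the model ignores what happens when a new error arrives, and it is not WCG's actual server logic.

```python
from typing import Optional

# Assumed cutoff: the post says "~10" consecutive valid results.
UNLIMITED_AFTER = 10


def daily_allowance(consecutive_valid: int) -> Optional[int]:
    """Daily WU allowance for a throttled device (hypothetical model).

    Returns None once the allowance is unlimited again.
    """
    if consecutive_valid < 0:
        raise ValueError("consecutive_valid must be >= 0")
    if consecutive_valid >= UNLIMITED_AFTER:
        return None
    # Start at 1 WU/day and double with every valid result.
    return 2 ** consecutive_valid


for n in range(UNLIMITED_AFTER + 1):
    allowance = daily_allowance(n)
    print(n, "unlimited" if allowance is None else allowance)
```

Under this model, nine valid results in a row already restore an allowance of 512 WUs a day, so a repaired machine is not held back for long.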
[Sep 25, 2020 10:44:37 AM]
Sasha Leykatze
Cruncher
United Kingdom
Joined: Feb 25, 2019
Post Count: 9

Bad news: I have another error. I am now suspending this machine from WCG entirely.

Same issue with the same code.

https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=300706788

Can someone help me look into this? The system is passing memory stability tests without issue. Could this be a software issue?

:(
[Sep 25, 2020 1:32:13 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Does memtest use all CPU threads or just 1?

I'd let BOINC run on just half the threads and see if you still get an error. If you do, halve the number of threads again; if not, go to 3/4 of the threads, and so on at each phase to home in. Monitor temperatures at each stage. I would certainly set everything to stock speeds; I don't overclock or undervolt (once burned, twice shy). If the problem is intermittent, though, try if you can to swap RAM between machines and see whether the problem moves. Reseat the RAM sticks too.
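The halve-or-go-to-3/4 procedure above is a binary search for the largest thread count that runs clean. A minimal Python sketch, assuming errors only get more likely as more threads are loaded (so if N threads are stable, fewer are too); `max_clean_threads` and `fake_run` are hypothetical names, and `fake_run` stands in for what is really hours of crunching per step. 32 is the thread count of a 2950X (16 cores with SMT).

```python
from typing import Callable, List


def max_clean_threads(total_threads: int, runs_clean: Callable[[int], bool]) -> int:
    """Binary-search the largest thread count that crunches without errors.

    Assumes monotonic behaviour: if N threads run clean, so does any count
    below N. Returns 0 if even a single thread fails.
    """
    lo, hi = 0, total_threads  # the answer lies in [lo, hi]
    while lo < hi:
        # First probe is half the threads; then roughly 3/4 after a pass
        # or 1/4 after a failure, narrowing each time.
        mid = (lo + hi + 1) // 2
        if runs_clean(mid):
            lo = mid        # no error: try more threads
        else:
            hi = mid - 1    # error: try fewer threads
    return lo


# Hypothetical stand-in for "run BOINC on N threads for a while and watch for errors".
tested: List[int] = []


def fake_run(n: int) -> bool:
    tested.append(n)
    return n <= 20  # pretend errors appear above 20 threads


print(max_clean_threads(32, fake_run), tested)
```

In practice each probe would be set through the CPU-percentage setting in BOINC's computing preferences, and because the errors here are intermittent, a clean run at a given count is evidence rather than proof; a longer run at the final value is worthwhile.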
[Sep 25, 2020 2:07:36 PM]
William Albert
Cruncher
Joined: Apr 5, 2020
Post Count: 39

When I overclocked my old computer, I found that the OCCT Linpack test would often detect stability issues that other programs would miss.

That being said...

64GB (8x8GB) DDR4-2933 16-16-16-36*

While the Threadripper 2000 series is capable of running at DDR4-2933 with a simple single-rank config, it's unlikely to be able to operate at that frequency with all 8 slots filled with memory. WikiChip lists the Threadripper 2950X as maxing out at DDR4-2133 with eight single-rank DIMMs installed.

I recommend removing your overclock and reverting to the manufacturer's stock specifications (including disabling XMP/DOCP memory auto-overclocking), and seeing whether that resolves the stability issue.
[Sep 25, 2020 2:10:33 PM]