| World Community Grid Forums
Thread Status: Active Total posts in this thread: 21
|
|
| Author |
|
|
erich56
Senior Cruncher Austria Joined: Feb 24, 2007 Post Count: 300 Status: Offline Project Badges:
|
this morning, I had a WU which ended with ERROR after some 15 hours:
Result Name: ARP1_ 0034214_ 083_ 1--
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[17:03:30] INFO: Checkpoint taken at 2018-12-14_06:00:00
[22:15:36] INFO: Checkpoint taken at 2018-12-14_12:00:00
[03:41:26] INFO: Checkpoint taken at 2018-12-14_18:00:00
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0245475A read attempt to address 0x1B18EE14
Engaging BOINC Windows Runtime Debugger...

No idea what the problem is. I have had such problems neither with other WCG subprojects nor with other projects (like LHC). For some reason, ARP does not run properly on this machine. So I might change back to other WCG subprojects. |
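The decimal exit code and the hexadecimal value in the log above are the same number. A minimal sketch of the conversion follows, assuming only the one status value quoted in the log; the helper name `describe_exit_code` and the one-entry lookup table are hypothetical and are not part of BOINC.

```python
# One-entry lookup table: 0xC0000005 is the Windows NTSTATUS value
# for an access violation, as reported in the log above.
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def describe_exit_code(code: int) -> str:
    """Render a process exit code as 32-bit hex plus a known status name."""
    # Mask to 32 bits so a signed representation maps to the same value.
    unsigned = code & 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(unsigned, "unknown")
    return f"0x{unsigned:08x} ({name})"


print(describe_exit_code(3221225477))  # 0xc0000005 (STATUS_ACCESS_VIOLATION)
```

The mask also covers tools that print the same code as a signed 32-bit integer (-1073741819).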
||
|
|
maeax
Advanced Cruncher Joined: May 2, 2007 Post Count: 144 Status: Offline Project Badges:
|
erich56 wrote: "For some reason, ARP does not run properly on this machine. So I might change back to other WCG subprojects."
The system requirements for ARP are reaching the limit of your PC. To run another project of WCG is a good idea!
AMD Ryzen Threadripper PRO 3995WX 64-Cores/ AMD Radeon (TM) Pro W6600. OS Win11pro
|
||
|
|
erich56
Senior Cruncher Austria Joined: Feb 24, 2007 Post Count: 300 Status: Offline Project Badges:
|
maeax wrote: "To run another project of WCG is a good idea!"
I have now changed to MCM. Runs without any problems, as before :-) |
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
erich56
How many threads does your machine have and how many ARP were you running? Mike |
||
|
|
erich56
Senior Cruncher Austria Joined: Feb 24, 2007 Post Count: 300 Status: Offline Project Badges:
|
Mike.Gibson wrote: "erich56 How many threads does your machine have and how many ARP were you running? Mike"
The CPU has 4 cores/4 threads, and I had 2 ARP running concurrently. |
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
That shouldn't be a problem. Maybe you haven't enough RAM to run 2.
Mike |
||
|
|
erich56
Senior Cruncher Austria Joined: Feb 24, 2007 Post Count: 300 Status: Offline Project Badges:
|
Mike.Gibson wrote: "That shouldn't be a problem. Maybe you haven't enough RAM to run 2. Mike"
Windows Task Manager as well as MemInfo show a usage of about 300-400 MB RAM per WU. The few other running apps are minor, so out of the total system RAM of 8 GB, more than half was unused all the time. I suspect the system components are too old to cope with ARP. |
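As a rough check of the figures in this post, the share of system RAM taken by the ARP tasks can be worked out directly. This is only a sketch using the upper figure quoted (400 MB per WU, 2 WUs, 8 GB total); the function name `arp_memory_share` is hypothetical, and other applications are ignored.

```python
def arp_memory_share(tasks: int, mb_per_task: float, total_gb: float) -> float:
    """Fraction of total RAM used by the given number of tasks."""
    total_mb = total_gb * 1024  # 1 GB taken as 1024 MB
    return tasks * mb_per_task / total_mb


# Two tasks at the upper estimate of 400 MB each on an 8 GB machine.
print(f"{arp_memory_share(2, 400, 8):.1%}")  # 9.8%
```

At under 10% of installed memory, the two tasks alone are far from exhausting 8 GB, which is consistent with the observation that more than half of the RAM stayed unused.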
||
|
|
sam6861
Advanced Cruncher Joined: Mar 31, 2020 Post Count: 107 Status: Offline Project Badges:
|
Quote: "RAM: 8 GB, DDR3, non-ECC, has undergone Memtest recently for a different reason. Test was okay."
This 8 GB is more than enough memory to run 2 ARP1, so something else is going wrong. My Linux Debian 11, Intel Atom N270, 2 GB RAM, is able to run 1 ARP1 just fine and got a Valid, but the CPU is too slow: 7.1 hours CPU time.
Quote: "The mainboard is an old Fujitsu D3041, Chipset Intel G41. Processor is an old Intel Core2 Quad Q9550 @ 2.83GHz, no overclocking."
I am still kind of guessing it is some sort of hardware problem causing random errors or invalids. Possibly the CPU, possibly the motherboard if it has blown capacitors, or possibly RAM that is mismatched, overheating, or just failing. Non-ECC RAM can cause silent memory errors at any time, undetected and not logged, and these can cause random crashes or invalids. You can pull out half of the memory to check whether the computer runs with better stability, then try the other half to see which is more stable. I have had random reboots and WHEA errors with an Asus B550-E, Ryzen 3900x, 2x16GB unbuffered ECC 3200MT/s NEMIX memory, caused by my own mistake: the wrong DDR4 voltage (use 1.2, not 1.35 volts). It is working much better now. |
||
|
|
erich56
Senior Cruncher Austria Joined: Feb 24, 2007 Post Count: 300 Status: Offline Project Badges:
|
... this is a fairly old PC. So I am not too surprised that it cannot meet the challenges which ARP is posing to a system.
I am still kind of guessing it is some sort of hardware problems causing random errors or invalids. Possibly CPU, possibly motherboard if it has blown capacitors, or Possibly RAM mis-matched, overheating, or just failing. ... |
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
Just one last thought. ARP is very intensive at checkpoints and upload. Did your 2 units happen to do that at about the same time? If so, you only need to keep them apart by occasionally suspending one until their progress shows a 6% difference, say. That wouldn't need doing very often.
Mike |
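The staggering idea in this post can be sketched as a small decision rule. This is an illustration only, not a BOINC feature: the function name `task_to_suspend` is hypothetical, progress is taken as a percentage read off the BOINC Manager by hand, and the 6% gap is the example figure from the post.

```python
from typing import Optional


def task_to_suspend(progress_a: float, progress_b: float,
                    min_gap: float = 6.0) -> Optional[str]:
    """Return "a" or "b" for the task to pause, or None if the gap is wide enough.

    Progress values are percentages (0-100). The task that is behind is
    paused so the leading task can pull ahead and widen the gap.
    """
    if abs(progress_a - progress_b) >= min_gap:
        return None  # already far enough apart, nothing to do
    return "b" if progress_a >= progress_b else "a"


print(task_to_suspend(50.0, 52.0))  # a
print(task_to_suspend(50.0, 60.0))  # None
```

Once the gap reaches the chosen threshold, the paused task would be resumed by hand, so the two units hit their checkpoints at different times.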
||
|
|
|