| World Community Grid Forums
Thread Status: Active Total posts in this thread: 16
|
|
| Author |
|
|
duanebong
Advanced Cruncher Singapore Joined: Apr 25, 2009 Post Count: 134
|
I just built a new Ryzen 5800X with 32GB DDR4 on Windows 11 Pro.
The machine downloaded 8 ARP work units and showed a computation error within 2-3 seconds for all of them. Very strange, as the machine is rock solid running Open Pandemics and also passed over an hour of various stress tests in Prime95. Any ideas what the problem could be? |
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594
|
Posting one of the error messages would help.
Mike |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317
|
Definitely!! Also, how many of the 16 "CPUs" does BOINC have access to, how many of those ARP1 tasks did it try to run at the same time, and is it also running work for projects other than WCG at the same time?
It's harder to give help here than on a lot of other BOINC-based projects because we can't see computer details or workunit details for other users. Debugging with too much guesswork makes for longer threads and slower solutions! So the more information in the original message (within reason!), the better... :-)
Cheers - Al |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846
|
Just to see if the problem is too many ARP units at once, try downloading just one ARP unit with either nothing else running or a minimal amount of other stuff, and see if it still chokes. I also agree with the other posters: please post the first 50 or so log messages. Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594
|
One further question. You said it downloaded 8 ARP units. At what point did the error occur? If they all started running at the same time, they might have been trying to checkpoint at the same time, and that could cause "indigestion" due to the high workload at checkpoint.
I try to keep checkpoints and finishes at least 2 minutes apart by briefly suspending the second task. It doesn't matter which checkpoint we are talking about; they are all a problem capacity-wise. They occur at every 12.5% of progress. Mike |
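The staggering idea in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: progress is treated as linear in time (real ARP tasks only approximate that), the last checkpoint is assumed to coincide with the finish, and the task labels, start offsets and runtimes are hypothetical inputs, not values read from BOINC.

```python
from itertools import combinations

CHECKPOINT_STEP = 0.125  # from the post: a checkpoint at every 12.5% of progress
MIN_GAP_SECONDS = 120    # from the post: keep checkpoints at least 2 minutes apart


def checkpoint_times(start_offset_s, runtime_s):
    """Seconds (relative to the first task's start) at which a task checkpoints.

    Assumption: progress is linear in time, so checkpoints are evenly spaced.
    """
    steps = round(1 / CHECKPOINT_STEP)  # 8 checkpoints, the last one at 100%
    return [start_offset_s + runtime_s * CHECKPOINT_STEP * k
            for k in range(1, steps + 1)]


def clashes(tasks, min_gap_s=MIN_GAP_SECONDS):
    """List (task_a, task_b, time_a, time_b) for checkpoints closer than min_gap_s.

    tasks maps a hypothetical task label to (start_offset_s, runtime_s).
    """
    found = []
    for (name_a, a), (name_b, b) in combinations(tasks.items(), 2):
        for t_a in checkpoint_times(*a):
            for t_b in checkpoint_times(*b):
                if abs(t_a - t_b) < min_gap_s:
                    found.append((name_a, name_b, t_a, t_b))
    return found
```

Calling `clashes({"a": (0, 28800), "b": (0, 28800)})` for two 8-hour tasks started together reports a clash at every checkpoint, while delaying the second task by 5 minutes, `{"a": (0, 28800), "b": (300, 28800)}`, reports none under the same assumptions.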
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346
|
It would probably be best if you show us the URL, or post the contents of the Result Log, of at least one task that errored out.
You can find the result of each task by going to Results, then clicking the magnifying glass and typing: ARP (or just A). This will most probably show your results that errored out with Status "Error".
If there are no results showing "Error" and you have a lot of other ARP1 results, click on the funnel icon (Filter Panel), tick the 'Error' box there, then click 'Apply filters'.
Next, click on one of the Result names of a task that errored out, then post the URL of that workunit result here. After that, if we click on the link "Error" there, we can see the contents of your Result Log.
Examples:
- the URL of a workunit, with output looking like this: Result name | OS type | Status | Sent time | Due / Return time | CPUtime/Elapsed | Claimed/Granted
- the Result Log of a task, with output looking like this: <core_client_version>7.16.11</core_client_version>
[Edit 1 times, last edit by adriverhoef at Dec 24, 2021 11:41:44 AM] |
||
|
|
duanebong
Advanced Cruncher Singapore Joined: Apr 25, 2009 Post Count: 134
|
Thanks for the many replies. The PC was running only 1 Open Pandemics work unit and 1 to 2 ARP units at the same time. The other ARP units were still in the midst of being downloaded. Here's the error log from one of them:
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
An unexpected network error occurred. (0x3b) - exit code 59 (0x3b)</message>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
forrtl: severe (59): list-directed I/O syntax error, unit 28, file C:\ProgramData\BOINC\slots\1\ozone_lat.formatted
Image              PC                Routine  Line     Source
wcgrid_arp1_wrf_7  00007FF6255FDB98  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF62563758A  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF6256359C9  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF6264C35FE  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF6257FDAEF  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF625802768  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF6250C35FB  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF624F8EB96  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF624EB4AD0  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF624309D5A  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF6243087B4  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF624234372  Unknown  Unknown  Unknown
wcgrid_arp1_wrf_7  00007FF6267F9BF4  Unknown  Unknown  Unknown
KERNEL32.DLL       00007FF80F6E54E0  Unknown  Unknown  Unknown
ntdll.dll          00007FF81096485B  Unknown  Unknown  Unknown
</stderr_txt>
]]>
|
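For anyone reading a log like the one above, the useful parts are the exit code and the `forrtl` line. The "unexpected network error" wording appears to be just the operating system's generic description of code 59, whereas the Fortran runtime line names the actual failure and the file involved. A minimal Python sketch for pulling those fields out follows; the function name and the regular expressions are illustrative assumptions based only on this one log, not a BOINC or WCG tool.

```python
import re

# Sample text copied from the log above (raw strings keep the backslashes).
SAMPLE = (
    r"<message> An unexpected network error occurred. (0x3b) - exit code 59 (0x3b)</message> "
    r"forrtl: severe (59): list-directed I/O syntax error, unit 28, "
    r"file C:\ProgramData\BOINC\slots\1\ozone_lat.formatted Image PC Routine Line Source"
)


def summarize_result_log(log_text):
    """Extract the exit code and the Fortran runtime error details, if present.

    Hypothetical helper: the patterns are modelled on the single log shown above.
    """
    summary = {}
    exit_match = re.search(r"exit code (\d+) \((0x[0-9a-fA-F]+)\)", log_text)
    if exit_match:
        summary["exit_code"] = int(exit_match.group(1))
        summary["exit_code_hex"] = exit_match.group(2)
    fortran_match = re.search(
        r"forrtl: (\w+) \((\d+)\): (.+?), unit (\d+), file (\S+)", log_text
    )
    if fortran_match:
        summary["severity"] = fortran_match.group(1)
        summary["fortran_code"] = int(fortran_match.group(2))
        summary["message"] = fortran_match.group(3)
        summary["unit"] = int(fortran_match.group(4))
        summary["file"] = fortran_match.group(5)
    return summary
```

Running `summarize_result_log(SAMPLE)` yields the exit code 59, the message "list-directed I/O syntax error", unit 28, and the path ending in `ozone_lat.formatted`, which points at an input file rather than at the network.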
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846
|
How about the Event Log under Tools? See if you can find the relevant entries there; the first 50 or so lines would do.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317
|
That error log suggests there might be a corrupt data file. Check the results pages again to see whether any wingmen are failing or not; by what you said in your original post, they'd probably fail pretty quickly!
If they are failing, there's probably something wrong with the original workunits (and there's nothing much we, as users, can do about that!) -- if they aren't reporting errors then the problem lies elsewhere and someone may have some more ideas as to why yours fails when others don't. Cheers - Al. |
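The wingman check described above boils down to a small decision rule, sketched here in Python. The status strings ("Error", "In Progress", "Valid") are assumed labels for illustration, and the function name is hypothetical; the statuses have to be read off the results pages by hand.

```python
def likely_cause(wingman_statuses):
    """Guess where the fault lies from the statuses of other hosts' results.

    Assumption: "Error" marks a failed result and "In Progress" an unfinished one.
    """
    finished = [s for s in wingman_statuses if s != "In Progress"]
    if not finished:
        return "unknown"   # nobody else has reported yet
    failed = [s for s in finished if s == "Error"]
    if len(failed) == len(finished):
        return "workunit"  # everyone fails: suspect the workunit's data
    if not failed:
        return "local"     # others succeed: look at your own machine
    return "mixed"         # some fail, some do not: no clear conclusion
```

For example, `likely_cause(["Error", "Error"])` returns `"workunit"`, while `likely_cause(["Valid", "In Progress"])` returns `"local"`.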
||
|
|
duanebong
Advanced Cruncher Singapore Joined: Apr 25, 2009 Post Count: 134
|
Thanks guys. Since it is likely to be a batch of corrupted data files, I will download some more ARP work units and try again later. At the moment it seems there are no ARP units available. |
||
|
|
|