Thread Status: Active
Total posts in this thread: 3593
Posts: 3593   Pages: 360   [ Previous Page | 345 346 347 348 349 350 351 352 353 354 | Next Page ]
This topic has been viewed 5744004 times and has 3592 replies
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442
Status: Offline
Re: Work Available

I have a feeling there is a problem with WU:

ARP1_0012850_148

One other person and I both reported the identical access violation error.
The other two WUs are still "In Progress" at this time.
(The error messages below are from my system.)

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
(unknown error) (317) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[04:22:55] INFO: Checkpoint taken at 2019-04-23_06:00:00
[07:22:14] INFO: Checkpoint taken at 2019-04-23_12:00:00
[10:27:33] INFO: Checkpoint taken at 2019-04-23_18:00:00
[12:19:23] INFO: Checkpoint taken at 2019-04-24_00:00:00
[14:13:06] INFO: Checkpoint taken at 2019-04-24_06:00:00
[17:10:39] INFO: Checkpoint taken at 2019-04-24_12:00:00
[20:11:45] INFO: Checkpoint taken at 2019-04-24_18:00:00


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF6AB6148E7 read attempt to address 0xAE3410A0

Engaging BOINC Windows Runtime Debugger...
<snip>
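
For anyone puzzled by the two numbers in the message line of the log above: they are the same value, once in decimal and once in hex. A minimal sketch of the conversion follows. The lookup table is a small illustrative subset of Windows NTSTATUS codes, and `describe_exit_code` is a hypothetical helper name, not part of BOINC.

```python
# Illustrative subset of well-known Windows NTSTATUS codes (not exhaustive).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
}


def describe_exit_code(code: int) -> str:
    """Render an exit code in decimal and hex, with a name if we know one.

    Masking to 32 bits also copes with tools that print the same code
    as a negative signed number.
    """
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"{unsigned} = 0x{unsigned:08x} ({name})"


# Exit code copied from the log above.
print(describe_exit_code(3221225477))
```

So the decimal exit code and the 0xc0000005 in the exception record describe one and the same access violation.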
[Jul 6, 2025 5:59:56 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Status: Recently Active
Re: Work Available

bfmorse - can you supply the workunit number (e.g. by a link to the WU or your result) so that folks can have a decent look at all the result logs before they disappear?

Also, can we assume from your "identical access violation error" that both failures had the same number of successful checkpoints (some folks would only be referring to the error, not the total report, if they said that)?

It'll be interesting to see whether this is a case of another grid cell having a problem with time step size or some other data aspect, or whether (as happens sometimes) one or two systems have problems but others don't.

Thanks in advance - Al.
----------------------------------------
[Edit 2 times, last edit by alanb1951 at Jul 6, 2025 11:51:28 AM]
[Jul 6, 2025 11:42:40 AM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Work Available

Sunday Report

This is an 8 day report to balance up last week's 6 day.

1,965 units in generations up to and including 145 remain stuck and have probably been joined by 13 from generation 146.

There are 30 units in generation 147 and 22,363 in generation 148, which is the current generation.

We are now 52% of the way through generation 148.

There are now 11,238 units held in generation 149.

15,068 units have validated in the week, but there are 1,258,569 units to go.

Based on the last 5 weeks, we would complete ARP1 on 22 December 2026, but we are getting close to where the stuck units will hold the completion up.
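
A rough way to sanity-check a projection like this is to divide the units remaining by a daily validation rate. The five-week average itself is not quoted in this report, so the sketch below works in both directions: it derives the rate implied by the stated completion date, and it shows what this week's 8-day count alone would give. The function names are hypothetical, and this is only an assumed straight-line model, not necessarily how the estimate above was produced.

```python
import math
from datetime import date, timedelta


def implied_daily_rate(report_date: date, remaining: int, target: date) -> float:
    """Units per day needed to finish `remaining` units by `target`."""
    return remaining / (target - report_date).days


def projected_completion(report_date: date, remaining: int, per_day: float) -> date:
    """Straight-line projection: the date on which `remaining` reaches zero."""
    return report_date + timedelta(days=math.ceil(remaining / per_day))


report = date(2025, 7, 6)      # date of this report
remaining = 1_258_569          # units to go, from the report
stated = date(2026, 12, 22)    # completion date given in the report

rate = implied_daily_rate(report, remaining, stated)
print(f"implied rate: {rate:.0f} units/day, about {rate * 7:.0f} per 7-day week")

# This week's figure on its own: 15,068 units validated over 8 days.
this_week_rate = 15_068 / 8
print("at this week's rate:", projected_completion(report, remaining, this_week_rate))
```

On these numbers, this week's rate by itself points to a later finish than the five-week average does, which fits the caveat about stuck units holding up completion.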

Mike
[Jul 6, 2025 12:50:08 PM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442
Status: Offline
Re: Work Available

I THINK that work unit ID is: 738550711

I am uncertain because I have not been shown where and how to obtain that info before.
(At work atm & using my cellphone - kind of awkward for this task)
[Jul 6, 2025 6:47:00 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Work Available

That is the correct number.

I obtain it by clicking on Result Status, then the Result Name; the number then appears in the page's internet link.
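
For anyone who wants to pull the number out of a copied link rather than read it off by eye, here is a minimal sketch. It assumes the link has the shape `.../contribution/workunit/<number>`, as in a workunit link posted elsewhere in this thread. The helper name is hypothetical.

```python
from urllib.parse import urlparse


def workunit_id_from_url(url: str) -> int:
    """Return the numeric workunit ID from a workunit page link.

    Assumes the path ends in /workunit/<digits>; raises ValueError otherwise.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[-2] != "workunit" or not parts[-1].isdigit():
        raise ValueError(f"not a workunit link: {url!r}")
    return int(parts[-1])


print(workunit_id_from_url(
    "https://www.worldcommunitygrid.org/contribution/workunit/736955953"
))
```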

Mike
[Jul 6, 2025 8:35:25 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Status: Recently Active
Re: Work Available

Thanks for the WU ID...

It's looking ominous for this WU, as both the failed tasks did indeed make the same number of checkpoints and (after allowing for things being loaded into memory in different places) the error location and call stacks look similar enough to suggest a crash at the same point in the code (and possibly at a similar stage of execution)... [Someone more used to MS diagnostics may be able to clarify (or deny) that!]
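
The checkpoint part of that comparison is easy to script if you have saved the stderr text of two results. This is a minimal sketch: the first sample reuses two checkpoint lines from the log earlier in the thread, while the second sample is made up purely for illustration (different wall-clock times, same model times) and is not the other volunteer's actual log. The function names are hypothetical.

```python
import re

# Matches lines such as: [04:22:55] INFO: Checkpoint taken at 2019-04-23_06:00:00
CHECKPOINT_RE = re.compile(r"INFO: Checkpoint taken at (\S+)")


def checkpoints(stderr_text: str) -> list[str]:
    """Return the model timestamps of all checkpoints, in log order."""
    return CHECKPOINT_RE.findall(stderr_text)


def same_progress(log_a: str, log_b: str) -> bool:
    """True if both tasks reached exactly the same checkpoints before stopping."""
    return checkpoints(log_a) == checkpoints(log_b)


# Two lines taken from the log posted earlier in this thread.
log_a = """Starting WRFMain
[04:22:55] INFO: Checkpoint taken at 2019-04-23_06:00:00
[07:22:14] INFO: Checkpoint taken at 2019-04-23_12:00:00
"""

# HYPOTHETICAL second log, invented for this example only.
log_b = """Starting WRFMain
[03:10:02] INFO: Checkpoint taken at 2019-04-23_06:00:00
[06:01:47] INFO: Checkpoint taken at 2019-04-23_12:00:00
"""

print(len(checkpoints(log_a)), same_progress(log_a, log_b))
```

Comparing the model timestamps rather than the wall-clock prefixes is deliberate, since machines of different speeds reach the same checkpoint at different times.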

Given that, the other tasks may well fail the same way at about the same stage. So now we wait and see whether this is a candidate for a shorter time step or something else...

By the way, I note that the two failed tasks are an initial wingman and a retry (and on different Windows releases); the second retry is actually scheduled to return before the other initial wingman because of reduced deadlines. I just hope we don't see any No Reply tasks, as if this is going to be a dead unit it needs to be killed off sooner rather than later :-)

Once again, thanks (and yes, using a mobile phone screen for WCG access isn't much fun...)

Cheers - Al.

P.S. I don't have a record of any cell near that one having had problems in the past; that said, my list of 132 cells (and 320 different cell+generation combinations) won't be anywhere near a complete record of problem tasks (failing or otherwise)...
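
For keeping a cell+generation list like that, the names can be split mechanically. This sketch assumes, going only by how names are used in this thread, that a name such as ARP1_0012850_148 reads as project, cell, generation, and that a task name adds a trailing replica number (as in the "_3" retry). The layout, the `ArpName` type and `parse_arp_name` are my assumptions, not anything documented by WCG.

```python
import re
from typing import NamedTuple, Optional


class ArpName(NamedTuple):
    cell: int                 # grid cell number (assumed meaning of field 2)
    generation: int           # generation number (assumed meaning of field 3)
    replica: Optional[int]    # trailing task suffix, None for a bare WU name


NAME_RE = re.compile(r"^ARP1_(\d+)_(\d+)(?:_(\d+))?$")


def parse_arp_name(name: str) -> ArpName:
    """Split an ARP1 workunit or task name into its assumed components."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised ARP1 name: {name!r}")
    cell, generation, replica = match.groups()
    return ArpName(int(cell), int(generation),
                   int(replica) if replica is not None else None)


print(parse_arp_name("ARP1_0012850_148"))
```

Keying a dictionary on `(cell, generation)` would then give a simple record of which combinations have shown problems.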
[Jul 6, 2025 9:34:38 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Work Available

Al

I have belatedly started to record newly issued units with their generations. It will slowly narrow down the stuck units, but I have only narrowed it down to 31k units so far in 4 generations because I am getting very few units these days.

However, I am also using Adri's current task lists. Perhaps your data might speed up my process?

Mike
[Jul 7, 2025 1:34:48 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1293
Status: Offline
Re: Work Available

no ARPs are going out at the moment... not even resends....

Here is a link to one of mine "waiting to be sent": https://www.worldcommunitygrid.org/contribution/workunit/736955953
[Jul 7, 2025 5:52:01 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Status: Recently Active
Re: Work Available

Regarding bfmorse's problem WU, SIGSEGV and Waiting to be sent...

The _3 retry has also failed (looking extremely similar, again!) so it definitely looks as if this WU is not going to complete. The next retry for this WU is also stuck at "Waiting to be sent" so Unixchick is not alone in this...

And on checking what I returned yesterday I discovered that I handled a retry for a Linux wingman that went SIGSEGV -- that WU has validated! Just a reminder that not all such errors indicate a doomed WU :-)

Cheers - Al.
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Jul 7, 2025 3:53:04 PM]
[Jul 7, 2025 3:51:53 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7844
Status: Offline
Re: Work Available

Nothing from ARP since July 5.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Jul 7, 2025 7:33:03 PM]