Thread Status: Active
Total posts in this thread: 21
Posts: 21   Pages: 3   [ 1 2 3 | Next Page ]
This topic has been viewed 3323 times and has 20 replies
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 294
Status: Offline
unclear Invalids

Recently, two of my finished and uploaded WUs were classified as invalid.
See here:

which is too bad after 35 and 28 hours of CPU time.
Unfortunately, stderr does not say what the problem was.
Does anyone have an idea how I can find out?

I hate to waste that many hours of CPU time :-(
----------------------------------------
[Edit 1 times, last edit by erich56 at Aug 19, 2021 12:55:57 PM]
[Aug 19, 2021 12:55:27 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1317
Status: Offline
Re: unclear Invalids

I had a few unexpected errors with no clear reason. I suppose they have to do with a memory access violation.

The only error still on my results list is: https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1855097844

If I remember correctly, BOINC had just started a 4-core ATLAS task from LHC@home at that time . . . coincidence?
----------------------------------------

[Aug 19, 2021 1:37:55 PM]
sam6861
Advanced Cruncher
Joined: Mar 31, 2020
Post Count: 107
Status: Offline
Re: unclear Invalids

Check whether your D drive has enough space.
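
If you would rather check from a script than from Explorer, something like the following rough sketch should work. It uses Python's standard `shutil.disk_usage`; the drive letter `D:\` and the 5 GiB threshold are placeholders chosen for illustration, not requirements published by WCG.

```python
import shutil


def free_gib(path):
    """Return the free space, in GiB, on the drive that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 2**30


def has_enough_space(path, min_free_gib):
    """True if at least `min_free_gib` GiB are free on the drive holding `path`."""
    return free_gib(path) >= min_free_gib


if __name__ == "__main__":
    # Placeholder drive and threshold; adjust both for your own machine.
    drive = "D:\\"
    try:
        print(f"{drive} has {free_gib(drive):.1f} GiB free, "
              f"enough: {has_enough_space(drive, 5)}")
    except OSError as exc:
        print(f"could not check {drive}: {exc}")
```

Point it at whichever drive actually holds the BOINC data directory; that is not necessarily D.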

For most ARP1 invalids and some errors, there may be something wrong with your RAM; run memtest to check. If the CPU is overclocked or undervolted, it may have been pushed too far; back the clocks off a little.

Some invalids I had were caused by faulty non-ECC RAM. I got new ECC UDIMM DDR3 for my AMD FX 4100, Asus M5A97 R2.0, Debian 11. Much better, with no more invalids. Uptime is 70 days with 1 corrected memory error logged so far. Note: the CPU, motherboard, and RAM must all support ECC to use ECC.
[Aug 19, 2021 5:33:20 PM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 294
Status: Offline
Re: unclear Invalids

SSD has enough space.
RAM: 8 GB, DDR3, non-ECC, has undergone Memtest recently for a different reason. Test was okay.
The mainboard is an old Fujitsu D3041, Chipset Intel G41
Processor is an old Intel Core2 Quad Q9550 @ 2.83GHz, no overclocking.

So maybe this old system is not the best fit for ARP?
[Aug 19, 2021 6:18:51 PM]
Acibant
Advanced Cruncher
USA
Joined: Apr 15, 2020
Post Count: 126
Status: Offline
Re: unclear Invalids

We can't see your devices and work units even with that link. Can you click on the "Error" link next to one of the work units in question and copy and paste the content in this thread so we can see the exact error messages?
----------------------------------------

[Aug 19, 2021 7:09:53 PM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 294
Status: Offline
Re: unclear Invalids

Oh, sorry, I was not aware that the page I linked to cannot be seen by others.
Anyway, the Error link shows the following:

Result Name: ARP1_ 0027300_ 086_ 0--
<core_client_version>7.14.3</core_client_version>
<![CDATA[
<message>
couldn't start app: Can't get shared memory segment name: shmget() failed</message>
]]>

What makes me wonder, though, is that it says "couldn't start app ..."
and still the task ran for 35 hours and the other one for 28 hours.
To me, this does not fit together, does it?
[Aug 19, 2021 8:38:52 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12160
Status: Offline
Re: unclear Invalids

erich56

You have given us some specs, but what units were you actually running at the time?

ARP can be very intensive.

Mike
[Aug 19, 2021 9:08:03 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 884
Status: Recently Active
Re: unclear Invalids

erich56

If those tasks really used that much CPU time doing something constructive, that's a very strange "log file." It looks like the sort of report one gets when the BOINC run-time needs to report an issue during set-up (no startup information, et cetera)...

I presume you're running some flavour of Windows, as that seems to be the place that throws up errors of that type (and not just for ARP - my Linux boxes keep getting retries for all projects where the single original task was on Windows and failed!)

There have been various discussions about that issue. I haven't followed them closely (no Windows systems!) but I seem to recall that it was a mixture of client version and the number of shared memory segments already allocated to processes - users with plenty of memory spare were getting the error (which implies a table size limit somewhere was being hit.)

Given that, Mike's "what else was it doing at the time" query is likely to be pertinent.

But I'll return to my original remark - if it had managed to start the ARP1 application properly I'd've expected to see at least the initial INFO lines and the "Starting WRFMain" line. Does the BOINC log on your system indicate that the workunit(s) in question ever checkpointed? In fact, checking said log for all lines containing the relevant work unit names might be revealing!
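
For what it's worth, pulling those lines out of a long log is easy to script. Below is a minimal sketch in Python; the sample log lines are invented purely to show the filtering, and the task name is my guess at the name from the error report with the spaces removed. On a default install the event log is usually written to `stdoutdae.txt` in the BOINC data directory, but treat that location as an assumption and check your own setup.

```python
def lines_for_task(log_lines, task_name):
    """Return every log line that mentions task_name (plain substring match)."""
    return [line.rstrip("\r\n") for line in log_lines if task_name in line]


def read_log(path):
    """Read a log file into a list of lines, tolerating odd characters."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.readlines()


# Invented sample lines, only to show the filtering; real entries will differ.
SAMPLE_LOG = [
    "19-Aug-2021 10:00:00 [World Community Grid] Starting task ARP1_0027300_086_0\n",
    "19-Aug-2021 10:00:05 [LHC@home] Starting task some_other_task_0\n",
    "19-Aug-2021 12:30:00 [World Community Grid] Computation for task ARP1_0027300_086_0 finished\n",
]

if __name__ == "__main__":
    for entry in lines_for_task(SAMPLE_LOG, "ARP1_0027300_086"):
        print(entry)
    # For a real log, pass read_log(<path to your log file>) instead of SAMPLE_LOG.
```

If nothing at all comes back for a task that supposedly ran for 35 hours, that in itself would be worth reporting.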

Good luck troubleshooting - hopefully all will become clear at some point.

Cheers - Al.
[Aug 19, 2021 11:08:17 PM]
sam6861
Advanced Cruncher
Joined: Mar 31, 2020
Post Count: 107
Status: Offline
Re: unclear Invalids

7.14.3 is old, can update BOINC client. If this computer's BOINC version is already 7.16.11 or newer, then I guess you are either looking at someone else's status or at a different computer of yours, or this may be a very old result from an old version of BOINC. Check the date and time.
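
As a small aside, if you compare version strings in a script, compare them number by number rather than as text. A minimal sketch, assuming plain dotted versions like the two quoted above (the helper names are made up for this example):

```python
def parse_version(text):
    """Turn a dotted version string such as "7.14.3" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older(reported, minimum):
    """True if `reported` is a lower version than `minimum`."""
    return parse_version(reported) < parse_version(minimum)


if __name__ == "__main__":
    # The version from the error report versus the newer one mentioned above.
    print(is_older("7.14.3", "7.16.11"))
```

Tuples compare element by element, so 14 is correctly treated as lower than 16, and a hypothetical 7.9.0 would count as older than 7.16.11, which a plain string comparison would get wrong.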

The storage drive may be corrupted and may require filesystem checks on all drive letters.

Consider running a multi-core memtest86+ and/or a Prime95 stress test. Run it for longer, several hours. Some memory errors only show up once the system has heated up and run for a long time. Check the HDD/SSD health with S.M.A.R.T. tools. But in the end, the old computer could just be failing.
[Aug 19, 2021 11:38:44 PM]
Acibant
Advanced Cruncher
USA
Joined: Apr 15, 2020
Post Count: 126
Status: Offline
Re: unclear Invalids

7.14.3 is old, can update BOINC client.
Unfortunately, WCG's branded client offered on this site (at least for Windows) is still on that version. They can configure the update-check URL so that it does not report a newer version until they give the green light on their own servers. Fortunately, erich56, you can download a newer version here and install it right over the old one. The work units in progress won't be lost, though they will revert to the last checkpoint (the last point where progress was saved).
----------------------------------------

[Aug 20, 2021 12:33:30 AM]