Thread Status: Active
Total posts in this thread: 11
This topic has been viewed 4291 times and has 10 replies
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Too many errors on all devices

All my devices run Windows 7 64-bit, and the BOINC client version is 6.2.28 on all of them.
Yesterday I launched HPFP2 on four devices. Three of them have a Core i7 950 CPU and one has a Core i7 975.
I got more than 20 WUs with errors. Their CPU time is around 0.02 to 0.04 hours, which shows there is a problem.

In the same timeframe I have about 40 WUs that are Waiting for Validation, with CPU times around 4:30 hrs,
20 WUs with errors,
7 WUs that are valid.

I will leave the machines crunching today and check again at the end of the day.

If the error ratio stays very high I will pull out of this project.
----------------------------------------

[Dec 14, 2009 8:12:34 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Too many errors on all devices

A search of this project forum for the word combination "too AND many" finds 30 posts/threads. Nothing can be done about it, as the failures are a very small percentage and the root cause cannot be determined. Currently the hyperthreaded i7 PCs seem to have an above-average share of those failures, so the best we can offer is to deselect the project in a separate profile and assign the problem device to that profile.

The state "Waiting for Validation" is not known to me. It suggests the quorum has reached its minimum. "Pending Validation", particularly over the weekend, typically increases, then falls again when crunchers that were switched off on Friday night are started again when the office reopens on Monday.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Dec 14, 2009 8:24:39 AM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: Too many errors on all devices

Sekerob, you are right. It is "Pending Validation" and not "Waiting for Validation".

I checked again ten minutes ago and the number of errors continues to pile up.

I have 83 WUs In Progress at the moment.

Out of a total of 85 returned WUs (for HPFP2) over the last 12 hours or so, we have:

Error 31
Dev VENUS 26
Dev JUPITER 3
Dev SATURN 2

Pending Validation 44
Dev VENUS 19
Dev JUPITER 13
Dev SATURN 12

Valid 10
Dev VENUS 4
Dev JUPITER 6
Dev SATURN 0

The fourth device, URANUS, has not returned results yet, so I do not mention it.
VENUS was the first device to crunch HPFP2, so its numbers are higher. The others had HFCC WUs in the queue and started later.
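
The tallies above can be turned into per-device error rates. The following is only a small Python sketch using the counts quoted in this post; the names `counts` and `error_rate` are made up for illustration and are not part of BOINC or WCG.

```python
# Counts copied from the post above: (error, pending validation, valid) per device.
counts = {
    "VENUS": (26, 19, 4),
    "JUPITER": (3, 13, 6),
    "SATURN": (2, 12, 0),
}


def error_rate(errors, pending, valid):
    """Share of returned work units that ended in Error."""
    returned = errors + pending + valid
    return errors / returned if returned else 0.0


# Per-device error rate, plus the overall rate across all three devices.
rates = {dev: error_rate(*c) for dev, c in counts.items()}
total_errors = sum(c[0] for c in counts.values())
total_returned = sum(sum(c) for c in counts.values())
overall_rate = total_errors / total_returned

for dev, rate in rates.items():
    print(f"{dev:8s} {rate:6.1%}")
print(f"{'ALL':8s} {overall_rate:6.1%}")
```

On these figures VENUS accounts for most of the errors (26 of its 49 returned WUs), while JUPITER and SATURN are each at roughly 14%.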

All WUs labelled Error have an extremely short CPU time.
In fact the WUs are processed in about 4.5 hrs, or are displayed as such in the device's task list. It looks like when they are sent to the WCG servers they become WUs with 0.02 or so in CPU time and are labelled as errors.
But I have not yet tracked each single Error WU to see if this is really so. I will have to do that, and maybe it could be the real problem.

Could a change in the CPU time data between the local unit and the WCG server cause otherwise perfectly valid WUs to be rejected and given Error status? Is this possible?
----------------------------------------

[Dec 14, 2009 10:03:10 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Too many errors on all devices

For now, IF you're still willing to test this, verify that all your security software has full exceptions for the running BOINC components and that the BOINC data directory is excluded from scanning. It's safe, as BOINC itself checks the integrity of those files.

Also, if you are willing, uninstall whatever client you have, go back to 5.10.45 and install it for all users, as a regular app, non-service. That will let BOINC run only when you have an active session, which you can lock. When you get security pop-ups such as UAC, grant with "remember my choice" so it won't ask again.

Links to get 5.10.45 and other older formal releases are found in this FAQ!: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=16642

Edit: I do not quite follow your last line, but no, a change of system dates should not cause errors.

PS: Originally, when I had the issue, the fewer tasks ran concurrently, the more chance there was that they'd finish, so do a project mix. An additional test would be to run a program like Process-Lasso to force the HPF2 affinity onto the physical cores, if that is possible on an i7. BOINC will then take care of relocating the other sciences to free cores, i.e. such that no two run off the same core thread. You can also set core affinity manually in the Task Manager.
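
Affinity settings on Windows are commonly expressed as a bitmask with one bit per logical processor. As a rough illustration of the physical-core idea above, this Python sketch computes such a mask. It assumes that sibling hyperthreads are numbered adjacently (logical CPUs 0 and 1 on core 0, 2 and 3 on core 1, and so on), which should be verified on the actual machine; the function name `physical_core_mask` is hypothetical.

```python
def physical_core_mask(logical_cpus, threads_per_core=2):
    """Bitmask selecting the first logical CPU of each physical core.

    ASSUMPTION: sibling hyperthreads are numbered adjacently
    (0 and 1 on core 0, 2 and 3 on core 1, ...). Check the real
    layout before relying on this.
    """
    mask = 0
    for cpu in range(0, logical_cpus, threads_per_core):
        mask |= 1 << cpu  # set the bit for this core's first thread
    return mask


# A 4-core, 8-thread i7 under the numbering assumption above.
mask = physical_core_mask(8)
print(f"{mask:#x} = {mask:08b}")
```

Under that assumption the mask picks logical CPUs 0, 2, 4 and 6, which are the same boxes one would tick by hand in the Task Manager affinity dialog.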
----------------------------------------
[Edit 2 times, last edit by Sekerob at Dec 14, 2009 10:31:58 AM]
[Dec 14, 2009 10:25:55 AM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: Too many errors on all devices

Sekerob,

I tracked about 20 WUs to see how they behave. All those that complete in about 4.5 hrs go into Pending Validation status. None of those I tracked became Valid. Some others I did not track became Valid, but statistically speaking very few became Valid, and in fact they pile up in the PV queue.
At the moment 72 WUs are PV and more than 100 are In Progress; they will quickly pile up, as I have about 32 cores on HPFP2.

The WUs that finish in a few seconds become Error and are immediately replaced, so I have no real loss of CPU time. Anyway, I switched the most error-prone device to HCMDP2. All the other devices will stay on HPFP2 and we will see what happens in a few days.

Regarding your comment on mixing projects, I personally prefer to devote all the power to one project at a time. I find it more rewarding as you move quicker. My objective is also to reach collective badge targets,
like all at bronze level, or all at gold, etc. At the moment my target is to reach gold for HPFP2 first and HCMDP2 just after.

A last comment: I really do not understand why there are 19 copies with a quorum of 15. Even if high precision is justified, with today's processors, which are nearly all capable of 64-bit double precision, I can understand a few copies, but 19 sounds like a waste of processing power.
----------------------------------------

[Dec 14, 2009 11:13:59 PM]
Bearcat
Master Cruncher
USA
Joined: Jan 6, 2007
Post Count: 2803
Status: Offline
Re: Too many errors on all devices

I don't think it's the hyperthreading. My 8-core (dual quad-core) PC had a lot of errors, so I changed projects. I think it has to do with Windows 7 64-bit, which I am running.
----------------------------------------
Crunching for humanity since 2007!

[Dec 15, 2009 3:42:26 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Too many errors on all devices


A last comment: I really do not understand why there are 19 copies with a quorum of 15. Even if high precision is justified, with today's processors, which are nearly all capable of 64-bit double precision, I can understand a few copies, but 19 sounds like a waste of processing power.

Nope, not a waste at all:
http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=6105#102290
[Dec 15, 2009 5:13:20 AM]
Somervillejudson@netscape.net
Veteran Cruncher
USA
Joined: May 16, 2008
Post Count: 1065
Status: Offline
Re: Too many errors on all devices

Perhaps because of concern that some may "abort" due to errors?
[Dec 15, 2009 2:31:01 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Too many errors on all devices

Aborts due to concerns? WCG will send out extra repair jobs for those.
----------------------------------------
[Dec 15, 2009 2:35:59 PM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: Too many errors on all devices

Thanks Moonian. A very interesting thread. Now I understand why. I feel much better.
I will continue crunching straight on, at least up to the gold badge. Then we'll see.
----------------------------------------

[Dec 15, 2009 3:48:00 PM]