| World Community Grid Forums
Thread Status: Active Total posts in this thread: 11
|
|
| Author |
|
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges:
|
All my devices run W7 64 bit, and all of them use BOINC client version 6.2.28.
Yesterday I launched HPFP2 on four devices. Three of them have a Core i7 950 and one has a Core i7 975. I got more than 20 WUs with errors, each with a CPU time of around 0.02 to 0.04 hours, which shows there is a problem. In the same timeframe about 40 WUs are Waiting for Validation with CPU times around 4:30 hrs, 20 WUs have errors, and 7 WUs are valid. I will leave the machines crunching today and check again at the end of the day. If the error ratio stays very high I will pull out of this project. |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Found 30 posts / threads in this project forum on the word combination "too AND many". Nothing can be done about it, as it affects a very small percentage of results and the root cause cannot be determined. Currently the hyperthreaded I7 PCs seem to have an above-average share of those failures, so the best we can offer is to deselect the project on a separate profile and assign the problem device to that profile.
The state "Waiting for Validation" is not known to me. It suggests the quorum has reached its minimum. "Pending Validation" typically increases over the weekend in particular, then falls again when crunchers that were switched off on Friday night are started again as the office re-opens on Monday.
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges:
|
Sekerob, you are right. It is "Pending for Validation" and not "Waiting for Validation".
I checked again ten minutes ago and the errors continue to pile up. I have 83 WUs In Progress at the moment. Of a total of 85 WUs returned (for HPFP2) over the last 12 hours or so, we have:
Error: 31 (VENUS 26, JUPITER 3, SATURN 2)
Pending Validation: 44 (VENUS 19, JUPITER 13, SATURN 12)
Valid: 10 (VENUS 4, JUPITER 6, SATURN 0)
The fourth device, URANUS, has not returned results yet, so I do not mention it. VENUS was the first device that crunched on HPFP2, so its numbers are higher. The others had HFCC WUs in the queue and started later. All WUs labelled Error have an extremely short CPU time. Yet the WUs are processed in about 4.5 hrs, or at least displayed as such in the task list of the device. It looks as if, once sent to the WCG servers, they become WUs with 0.02 or so in CPU time and are labelled as errors. I have not yet tracked each single Error WU to see if that is really so. I will have to do that, and it could be the real problem: a change in the CPU time data between the local unit and the WCG server that would cause rejection and Error status on otherwise perfectly valid WUs. Is this possible? |
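The per-device counts quoted in the post above can be turned into error rates with a few lines of Python. This is only an illustrative sketch: the counts are copied from the post, while the dictionary layout and the `error_rate` helper are assumptions made for the example, not anything WCG or BOINC provides.

```python
# Counts copied from the post above (HPFP2 results returned in roughly 12 hours).
results = {
    "VENUS":   {"error": 26, "pending": 19, "valid": 4},
    "JUPITER": {"error": 3,  "pending": 13, "valid": 6},
    "SATURN":  {"error": 2,  "pending": 12, "valid": 0},
}

def error_rate(counts):
    """Share of returned work units that ended in Error (hypothetical helper)."""
    total = sum(counts.values())
    return counts["error"] / total if total else 0.0

# Error rate per device, and for all three devices combined.
rates = {device: error_rate(c) for device, c in results.items()}
overall = error_rate({
    status: sum(c[status] for c in results.values())
    for status in ("error", "pending", "valid")
})

for device, rate in rates.items():
    print(f"{device}: {rate:.0%} errors")
print(f"overall: {overall:.0%} errors")
```

On these numbers VENUS stands out at roughly 53% errors, against about 14% each for JUPITER and SATURN and about 36% overall.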
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
For now, IF you're still willing to test this, verify that all your security software gives full exceptions to the running BOINC components and is set not to scan the BOINC data directory. That is safe, as BOINC itself checks the integrity of those files.
Also, if willing, uninstall whatever client you have, go back to 5.10.45, and install it for all users as a regular application, not as a service. That lets BOINC run only while you have an active session, which you can lock. When you get security pop-ups such as UAC, grant them with "remember my choice" so it won't ask again. Links to get 5.10.45 and other older formal releases are found in this FAQ: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=16642
Edit: I do not quite follow your last line, but no, a change of system dates should not cause errors.
PS: Originally when I had the issue, the fewer tasks ran concurrently, the more chance there was that they'd finish, so do a project mix. An additional test would be to run a program like Process-Lasso to force the HPF2 tasks to run on the physical cores, if that is possible on the I7. BOINC will then take care of relocating the other sciences to free cores, i.e. such that no 2 run off the same core thread. Manually you can set core affinity in the Task Manager.
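As a rough illustration of the affinity idea in the post above, the following Python sketch computes an affinity bitmask that selects one logical CPU per physical core. It assumes that sibling hyperthreads are numbered consecutively (logical CPUs 0 and 1 share the first core, 2 and 3 the second, and so on), which should be verified on the actual machine. The function name `physical_core_mask` is hypothetical and not part of BOINC or Process-Lasso.

```python
def physical_core_mask(physical_cores, threads_per_core=2):
    """Bitmask selecting the first logical CPU of each physical core.

    ASSUMPTION: sibling hyperthreads are numbered consecutively
    (0 and 1 share core 0, 2 and 3 share core 1, ...).
    """
    mask = 0
    for core in range(physical_cores):
        # Set the bit for the first hardware thread of this core.
        mask |= 1 << (core * threads_per_core)
    return mask

# A quad-core CPU with Hyper-Threading exposes 8 logical CPUs.
mask = physical_core_mask(4)
selected = [cpu for cpu in range(8) if (mask >> cpu) & 1]

print(hex(mask))   # 0x55
print(selected)    # [0, 2, 4, 6]
```

The selected logical CPUs are the boxes one would tick by hand in the Task Manager affinity dialog, under the numbering assumption stated above.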
WCG
Please help to make the Forums an enjoyable experience for All!
[Edit 2 times, last edit by Sekerob at Dec 14, 2009 10:31:58 AM] |
||
|
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges:
|
Sekerob,
I tracked about 20 WUs to see how they behave. All those that complete in about 4.5 hrs go into Pending Validation status. None of those I tracked became Valid. Some others I did not track became Valid, but statistically speaking very few became Valid, and in fact they pile up in the PV queue. At the moment 72 WUs are PV and more than 100 are In Progress; these will quickly pile up as I have about 32 cores on HPFP2. The WUs that finish in a few seconds become Error and are immediately replaced, so I have no real loss of CPU time. Anyway, I switched the most error-producing device to HCMDP2. All the other devices will stay on HPFP2 and we will see what happens in a few days.
Regarding your comment on mixing projects, I personally prefer to devote all the power to one project at a time. I find it more rewarding as you move quicker. My objective is also to reach collective badge targets, like all at bronze level, or all at gold, etc. At the moment my target is to reach gold for HPFP2 first and HCMDP2 just after.
A last comment: I really do not understand why there are 19 copies with 15 as a quorum. Even if high precision is justified, with today's processors, which are nearly all 64-bit double-precision capable, I can understand a few copies, but 19 sounds like a waste of processing power. |
||
|
|
Bearcat
Master Cruncher USA Joined: Jan 6, 2007 Post Count: 2803 Status: Offline Project Badges:
|
I don't think it's the hyperthreading. My 8-core (dual 4-core) PC had a lot of errors, so I changed projects. I think it has to do with Win 7 64 bit, which I am running.
----------------------------------------
Crunching for humanity since 2007!
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Quote: "A last comment, I really do not understand why 19 copies and 15 as a quorum. Even if high precision is justified, with today processors that are nearly all 64 bit double precision capable, I can understand a few copies but 19 sounds like some waste of processing power."
Nope, not a waste at all: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=6105#102290 |
||
|
|
Somervillejudson@netscape.net
Veteran Cruncher USA Joined: May 16, 2008 Post Count: 1065 Status: Offline Project Badges:
|
Perhaps because of concern some may "abort" due to errors?
|
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Aborts due to concerns? WCG will send out extra repair jobs for those.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges:
|
Thanks Moonian. A very interesting thread. Now I understand why. I feel much better.
Will continue crunching straight at least up to the gold badge. Then we'll see. |
||
|
|
|