| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 26
|
|
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Not so, RickH. Due to the potential for inconsistencies, WCG use a different work unit pool for each platform: Windows, Linux and Mac.
So, any Invalid results that you get now for new work units are interesting to the tech team. This project really does seem to be a baptism of fire for Rosetta. As far as I know, never before have so many high resolution proteins been folded. A few issues are inevitable. The quantity we have been experiencing is unfortunate, but it should settle down soon. I hope. |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
RickH, my HT machine is nothing extreme, and it alternates between FAAH and HPF2, whatever WCG sends. My log shows 29 HPF2 results completed, of which 2 were invalid (5.06, from the early days). I have not seen a single one bombing or hanging permanently. Sometimes one sticks at a percentage for a while, but the CPU counter keeps running uninterrupted until 100%. Sometimes it will just sit there for a longer time at the end, barely using CPU time, probably doing some indexing of the result and prepping it for transmittal. In the last few days I even figured out how to optimise for HT, running UD+BOINC simultaneously... just happy as a clam, so I guess I'm fortunate.
On your high-probability observation about the XP platform: I believe I have read that the WUs sent and the results returned for different platforms are not mixed, i.e. Macs go to Macs only, Linux to Linux, Win to Win. I stand to be corrected on that. *** The coming BOINC version has CPU instruction set recognition, so it will allow even more precise distribution and result matching. Errata: as I was writing, I now see Didactylos was faster to hit the send button... at least it's not a factor in the inconclusive mix.
WCG
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 2 times, last edit by Sekerob at Jul 8, 2006 9:29:52 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"So, any Invalid results that you get now for new work units are interesting to the tech team."
Whoa, so I should be reporting every Invalid result that was generated with 5.07 (say, all WUs first issued in July, to be safe)? That's an awful lot of results; about a third of my results end up Invalid, and out of the ones that end up Valid, I see something like half have an Invalid or two from someone else logged. Instead of my reporting all those, you may as well just watch for my Host ID and check every unit I crunch, since most of them are apparently "interesting" (in the Chinese curse sense). Blecch.
Is this just me, then? I thought everyone was still seeing a lot of Invalids. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
One of the techs did mention that some issues were caused by problems on the client, not with the work unit. You should be prepared to learn that it's something you have to fix yourself.
Might not be, though. We're not likely to get much further until after the weekend. And you're right - the techs can easily mine the database for invalid results. What they are most interested in are stalled or abortive work units, and what happened when they died. |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Let me qualify... I'm seeing invalids from others on WUs I've crunched, but not many... and as said, I have only been marked with 2, which were certainly 5.06 when I crunched them.
Yesterday I got 2 sent that had 3 done, marked inconclusive... put them ahead in my queue and had immediate validation, so it looks to me like things are stabilising. PS: only WCG techs can monitor your Host ID... meantime, I don't know if you run more machines, but I'd be worried about your machine. PS: Didactylos, found where you come from... it's not from 'behind' a Stargate μηνιν άειδε θεά Πηληϊάδεω άχιληος ουλομένην, η μυρί' άχαιοις άλγε'
[Edit 2 times, last edit by Sekerob at Jul 8, 2006 9:46:49 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, standard defensive "it can't be ME" reaction aside, this machine is rock solid. Prime95 stable for 24+ hours, never got Invalids or Errors from HPF1, FAAH or other DC work.
I would argue that HPF2 should be expected to work on any system which can crunch other BOINC and similar DC projects for weeks or months without errors. If HPF2 has managed to find some obscure thing that affects only it, on systems that work flawlessly otherwise, then pragmatically speaking, it's still going to end up as HPF2's problem to cope with. If it turns out that on AMD X2 systems with socket 939 and PC4000 RAM the floating point FUBAR instruction (used only in HPF2) gives results which are wrong in the 15th decimal place, HPF2 will have to find a way to work around it.
Of course, if it turns out that only my system has such an error, then I'll just have to switch back to FAAH or something. There's no way I can find or fix such an obscure thing, when the machine runs perfectly otherwise and I have no other clues. [Edit 1 times, last edit by Former Member at Jul 9, 2006 1:18:02 AM] |
||
|
|
olympic
Senior Cruncher Joined: Jun 12, 2005 Post Count: 156 Status: Offline |
I have 2 dual-core AMD Opteron 939 machines crunching with BOINC and they continued to throw out invalids with Rosetta 5.07. I'm guessing about 1/3 of all results returned turned out invalid. They are both overclocked but passed all the standard stability tests and never had a problem with HPF1, FAAH, etc. So what I have done is switch to FAAH only for a while, until all the inconclusives grind their way through the system. At that point I'll crunch some more HPF2s (maybe 10-20 WUs) and see what happens. Maybe by then the bugs will be found and squashed. ;)
|
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Three Socket 939 AMD machines is not much of a statistical sample, but if WCG have the capability to analyse which processors cause the large fallout, they will... I think WCG knows in reasonable detail what CPU it sends to or receives from. Soon it will optimise further, as it will also be looking at the instruction sets of CPUs (BOINC only?).
My P4 2.53 HT has now done 30-odd HPF2, of which 2 were invalid on 5.06 and the rest valid, all 5.07x; 1 is pending validation. Not many, but it confirms my platform's stability to me, i.e. I got 100% on HPF2 5.07x. Since my last re-image I reboot only when required by software updates... that's now well over 7x24 ago, maybe again come 'critical patch' Tuesday. I used to have a BSOD every 3rd day, due to an intermittent fault at a very high memory address. PS: FUBAR, I'm not too familiar with American acronyms but recollect something that sounded like foobar
[Edit 1 times, last edit by Sekerob at Jul 9, 2006 12:46:23 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well done, Sekerob.
It is of course the first few words of the Iliad (lacking a couple of pothooks). The reference is to our beloved leader, J. D. "Illiad" Frazer - creator of UF. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I got one of those errors recently and result just returned to WCG as follows:
Device Name: BlackBart00
Team ID: Boinc 10296
Acct Nr: 205644
Project: WCG
Applic: hpf2 5.07
Name: za110_00549_0
Report deadline: 7/15/2006 10:
Status: Computation Error
7/9/2006 4:59:13AM Unrecoverable error for result za110_00549_0 (The environment is incorrect. (0xa)-exit code 10 (0xa))
hope this helps
PaulT |
||
|
|
|