| World Community Grid Forums
Thread Status: Active Total posts in this thread: 5
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges:
|
I would like to come back to an issue that remains frustrating: the completely random behaviour of identical machines towards a given project. I exclude beta WUs, as these are de facto unreliable, and if we crunch them it is logical to accept all the issues that come with them.

But for fully operational projects there are two categories: those that run on all rigs without a hitch (one error in thousands of WUs), and those that run without a hitch on some machines while on others the WUs end up in error.

In my humble experience:
"No problem" category: FAAH, HCC, HFCC, HCMD2, NRW
"Problematic" category: C4CW, HPF2

I cannot comment on DDDT2 and CEP2, as I have not been able to really test them. DDDT2 comes in such small numbers, and at random on only some machines, that I cannot judge. CEP2 is not yet Windows ready.

My question is the following: does the computational model of the "no problem" projects differ fundamentally from that of the "problematic" ones? The idea is to find clues that would explain the difference in behaviour.

What is frustrating is that with the "problematic" category I have to scan and test each machine to check whether it can run the project or not. As the number of machines increases, this costs a lot of time, and I have no way to know in advance which machines will be OK. So far I have singled out 5 machines out of 17 that will crunch HPF2 without a problem and 5 that are OK with C4CW. A subset of three is the same for C4CW and HPF2.

I standardized the machines as far as possible, but it did not solve the issue. All machines use the same OS, processor, HDD, firewall and antivirus software. The 17 machines have three types of Asus mainboards, all socket 1366. Even when I look at the mainboard groups, there appears to be no logical pattern.

I wonder if I am the only one experiencing this type of problem, or whether other crunchers with multiple machines have had similar experiences. My forum signature shows the small number of years for C4CW and HPF2 as a result of these problems.

[Edit 1 times, last edit by Hypernova at Oct 11, 2010 9:27:02 AM] |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
I remember one or more of your devices having an issue with C4CW, but error reports on this science are extremely few, and for the life of me I have no idea how 95 MB RAM and 115 MB VM operational Peak-Use could cause this when all cores are running it simultaneously.

The root may be in the memory and how solid and attuned it is to the CPU... is there a fubar function in certain series of CPUs? Very occasionally I have a random HPF2 failing, and without exception it's always when 1 or more HFCCs are running on other cores (and it likely would when FAAH does) and only on W7 (Vista and XP same).

Now, having been on Linux since May 1, it is without a shadow of doubt a more stable crunching platform. HPF2 has yet to have its first fail there, and C4CW, well, it has neither crashed nor given invalid results on Windows or Linux. So where does one look when:

C4CW: uses LAMMPS as science engine
CEP2: uses Q-Chem
HPF2: uses Rosetta
FAAH/HFCC: uses AutoDock 4.x
DDDT2: uses CHARMM
HCMD2: uses MAXdo
HCC: uses CrystalVision

All of these engines do different things in different ways; it is up to the scientists to decide what gives a result faster at the required resolution. Surprisingly, the repair jobs I receive are predominantly HFCC/HPF2, occasionally C4CW, sometimes CEP2, and of course quite a few HCMD2 due to No Reply... don't ask me why.

"I wonder if I am the only one experiencing this type of problem, or whether other crunchers with multiple machines have had similar experiences."

The answer to that is no, you're not alone. The exhaustive discussions on the HPF2 /401 error, and to a lesser extent the /711 error, bear witness to that. The sooner WCG implements tracking of error rates at device/science level, so that it stops sending a science to a device with an above-normal fail rate, the better... mostly because the "If there is no work..." option presently sends anything if the main choice is not available.
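The device/science error-rate tracking suggested above could look roughly like the following Python sketch. It is only an illustration and not how WCG's scheduler works: the class name `ErrorRateTracker`, the 5% threshold, the 20-result minimum and the device name `rig-07` are all assumptions made up for the example.

```python
from collections import defaultdict


class ErrorRateTracker:
    """Hypothetical per-(device, science) tracker; not WCG's actual scheduler."""

    def __init__(self, max_error_rate=0.05, min_results=20):
        # Both limits are assumed values chosen only for illustration.
        self.max_error_rate = max_error_rate
        self.min_results = min_results
        # (device, science) -> [error_count, total_count]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, device, science, ok):
        """Store the outcome of one returned work unit."""
        entry = self.counts[(device, science)]
        entry[1] += 1
        if not ok:
            entry[0] += 1

    def error_rate(self, device, science):
        """Fraction of failed results, or 0.0 if nothing was returned yet."""
        errors, total = self.counts.get((device, science), (0, 0))
        return errors / total if total else 0.0

    def may_send(self, device, science):
        """Allow a science until enough results show an above-normal fail rate."""
        errors, total = self.counts.get((device, science), (0, 0))
        if total < self.min_results:
            return True  # too little history to judge this device
        return errors / total <= self.max_error_rate

    def eligible(self, device, sciences):
        """Filter a list of sciences down to those still allowed on a device."""
        return [s for s in sciences if self.may_send(device, s)]


# Example: one rig that fails a third of its HPF2 work but none of its FAAH work.
tracker = ErrorRateTracker(max_error_rate=0.05, min_results=20)
for i in range(30):
    tracker.record("rig-07", "HPF2", ok=(i % 3 != 0))  # 10 of 30 fail
    tracker.record("rig-07", "FAAH", ok=True)

print(tracker.eligible("rig-07", ["FAAH", "HPF2", "C4CW"]))  # ['FAAH', 'C4CW']
```

C4CW stays eligible here simply because the rig has no history for it yet. A real scheduler would also need some way to retry a blocked science later, since the cause of the errors may get fixed.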
WCG
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Oct 11, 2010 11:57:01 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I've already returned 1000 results for C4CW and I don't remember having any errors. To me it's in the "no problem" category. When I crunch on my i7 laptop with 6GB RAM, I have no problems running 8 C4CW WUs at the same time.
P.S.: Hypernova, your forum signature says C4WC instead of C4CW |
||
|
|
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline Project Badges:
|
My experience is tiny compared to the likes of Hypernova and bono_vox, but I have had no trouble with C4CW or HPF2. Two of my three machines have been running mostly C4CW lately (often 4 C4CW WUs on 4 simultaneous threads) with no problems.
All my machines run under Ubuntu Linux; I have gone from 9.10 to 10.04 since starting crunching. |
||
|
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges:
|
"P.S.: Hypernova, your forum signature says C4WC instead of C4CW"
Thanks, bono vox. I missed that one. It makes for a nice misunderstanding. |
||
|
|
|