| World Community Grid Forums
Thread Status: Active Total posts in this thread: 24
Papa3
Senior Cruncher Joined: Apr 23, 2006 Post Count: 360 Status: Offline Project Badges:
|
I'm getting some insane "To Completion" figures on faah3180_indazoleOH_benzylSO3H_MIN_xMut tasks. I currently have five completed, with actual CPU times from 04:07 to 04:13 (way too short IMO; FAAH jobs should run at least 12 hours!), while the rest are estimated at 10:17 or 23:47 (roughly 2.5 to 6 times the known actual run time).
Has the BOINC 5.10.30 scheduler lost its mind, or is the FAAH server feeding the BOINC client a big load of nonsense? |
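As a quick sanity check on the figures above, here is a minimal sketch that compares the two estimates with the observed run times. It assumes the midpoint of the quoted 04:07 to 04:13 range is representative of all five completed tasks, since only the range is given.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' run time into whole minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

# Figures quoted in the post above.
actual_runs = ["04:07", "04:13"]   # shortest and longest completed task
estimates = ["10:17", "23:47"]     # "To Completion" values shown for the rest

# Assumption: the midpoint of the quoted range stands in for the typical run.
mean_actual = sum(to_minutes(t) for t in actual_runs) / len(actual_runs)
ratios = {e: to_minutes(e) / mean_actual for e in estimates}

for estimate, ratio in ratios.items():
    print(f"{estimate} is {ratio:.1f}x the observed run time")
```

So the estimates come out at about 2.5 and 5.7 times the observed run time.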
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Papa3,
My 2 FAAH work units look completely normal. A few days ago a server was reset and created a batch of work units whose flop estimates used default values that were completely wrong. This caused a problem with schedulers. Back then, knreed said that he would create some batches with his own flop estimates and then return things to normal at the end of the week. One member posted a query because he had just received an HCC unit and a FAAH unit with identical flop estimates, which he regarded as suspicious.
I don't know where we are right now with the scheduler changing estimated duration correction factors and flop estimates, but this storm in a teacup will eventually settle down. At least, I think it will.
Lawrence |
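For readers wondering why a wrong flop estimate distorts the "To Completion" figure, here is a rough sketch. It assumes the client derives its estimate as the work unit's flop estimate divided by the host's benchmarked speed, scaled by the project's duration correction factor. That is my understanding of BOINC clients of this era, not something confirmed in this thread, and every number below is hypothetical.

```python
def estimated_runtime_hours(fpops_est: float, host_flops: float, dcf: float = 1.0) -> float:
    """Estimated run time in hours: (flop estimate / host speed) * correction factor."""
    return fpops_est / host_flops * dcf / 3600.0

# Hypothetical values, chosen only to illustrate the effect.
HOST_FLOPS = 2.0e9          # benchmarked floating-point speed of one core
REALISTIC_FPOPS = 3.0e13    # flop estimate that matches the real work
DEFAULT_FPOPS = 1.8e14      # a wrong default value stamped on the work unit

print(f"realistic estimate: {estimated_runtime_hours(REALISTIC_FPOPS, HOST_FLOPS):.1f} h")
print(f"wrong default:      {estimated_runtime_hours(DEFAULT_FPOPS, HOST_FLOPS):.1f} h")

# The correction factor scales every estimate for the project, so one badly
# estimated batch can also skew the estimates of correctly labelled work.
print(f"realistic, dcf=2:   {estimated_runtime_hours(REALISTIC_FPOPS, HOST_FLOPS, dcf=2.0):.1f} h")
```

Under these assumptions the actual run time is unchanged; only the displayed estimate moves.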
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Hi everybody,
I experienced the following problem yesterday, and I suspect that, unfortunately, one of my best hosts (at a remote location) is in the same situation. I would appreciate it if a watchdog function could be added to avoid such situations (a waste of energy)! |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Hi Yves,
What OS is this on? It is obviously still eating CPU time. Have there been any updates to the security software, and what brand is it? FAAH 5.42 is the most proven science app we have. WCG's armstdj reported that it may need a new compile for Mac using a newer BOINC API. Runaway science apps have a built-in timeout. Considering that, statistically, HPF2 for example can still finish after 6x the predicted time, a task that has run 16 hours at zero percent yet shows only 7.36 hours to completion may have a stuck percentage. That's no good, so I will alert the techs. In the meantime I suggest *Suspending* WCG entirely in the Projects tab and Resuming after a minute or two to see if it gets unstuck. If not, abort, but first check the Result Status page to see whether the quorum partner has returned the unit successfully. ttyl
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Hi Sekerob,
Thanks for your answer. I went straight to the remote location where my "powerful" host is and found a non-responding BOINC Manager. Logging off and back on did not help. I had to restart the system, and after the WUs were reported I have around 3 pages of WUs in error. Some WUs turned BOINC upside down; I don't know why or which ones! I have no idea whether I should suspect HCC or FAAH! In the end, only DDDT WUs have been processed correctly on this particular host during the last 2 or 3 days. The reporting to WCG stopped, and it seems that BOINC Manager crashed once no computable WUs (in this case DDDT) were available. It is the first time that such an event has occurred (since mid-August 2007).
It is a dual Quad-Xeon system with 4 GB RAM running WinXP 64. The system is currently devoted only to WCG. Perhaps you could find something? If necessary, you can contact me directly; Jean now has my direct contact details! Cheers, |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have the same problem, for 2 days now. I used to run at least 1.5-2 hours per job, with 2 jobs at the same time, and receive at least 30+ points for each (x2). Now it's 8-9 hours per job with only about 69 points, and if I restart the system it comes back with something like 12+ hours as an error! WTF is that? Should I uninstall it and go back to gaming on this machine, or will the PhDs start fixing it?
Thanks |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
If you did 2 hours for 30 points per job, that is 15 per hour per core, so it must be a pretty fast machine. If they now run 8-9 hours, which is correct given that the new ones are 4-5 times longer, then the claim should still be 15 per core per hour. If you post the quorum detail of several old and new work units, we can compare. For that, go to the Result Status page, click on the offending work unit names and post the detail here. From the same machine, please.
And did you check whether, by any chance, your benchmark was run in between and had a bad hair day? cheers
Added: This is an example of what we'd like to see:
dddt0401k0625_ ZINC01392641-0000_ 00_ 0-- Valid 03/02/2008 11:42:20 03/04/2008 07:11:01 1.18 25.2 / 22.1
dddt0401k0625_ ZINC01392641-0000_ 00_ 1-- Valid 03/02/2008 11:32:41 03/06/2008 13:50:07 1.79 19.0 / 22.1
WCG
Please help to make the Forums an enjoyable experience for All!
[Edit 1 times, last edit by Sekerob at Mar 7, 2008 10:27:44 AM] |
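The arithmetic behind the "15 per core per hour" figure can be sketched as follows. It takes the round numbers quoted in this exchange (30 points for 2 hours before, about 69 points for 8-9 hours now) at face value; they are the poster's approximations, not values read from the Result Status page.

```python
# Round figures quoted in the thread, taken at face value.
old_points, old_hours = 30.0, 2.0
new_hours = (8.0, 9.0)
reported_points = 69.0

# Claim rate implied by the old, short jobs.
rate = old_points / old_hours

# If the rate were unchanged, the longer jobs should claim this much.
expected = tuple(rate * h for h in new_hours)

# Rate implied by the points actually reported for the longer jobs.
implied = tuple(reported_points / h for h in new_hours)

print(f"old rate: {rate:.1f} points/hour")
print(f"expected for 8-9 h: {expected[0]:.0f} to {expected[1]:.0f} points")
print(f"implied by 69 points: {implied[1]:.1f} to {implied[0]:.1f} points/hour")
```

On those figures the expected claim would be 120 to 135 points, which is why the per-result detail is needed to see where the gap comes from.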
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
faah3213_ ZINC02900792_ xMut_ md00110_ 01_ 1-- M5 Error 03/06/2008 07:57:24 03/06/2008 23:19:50 3.85 64.2 / 0.0
faah3157_ Acetasemide2_ MIN_ xMut_ md06910_ 07_ 0-- M5 Valid 02/25/2008 03:34:09 02/25/2008 17:15:05 0.89 13.8 / 14.7
faah3219_ ZINC01718486_ xMut_ md00170_ 01_ 0-- (.........8:42:34)
faah3219_ ZINC01628183_ xMut_ md00170_ 00_ 0-- (.........8:42:34)
faah3216_ ZINC03954652_ xMut_ md00140_ 01_ 1-- M5 Pending Validation 03/06/2008 20:01:12 03/07/2008 09:59:20 4.74 79.0 / 0.0
faah3216_ ZINC05707101_ xMut_ md00140_ 00_ 0-- M5 Pending Validation 03/06/2008 19:40:00 03/07/2008 09:59:20 4.84 80.8 / 0.0
faah3215_ ZINC04994532_ xMut_ md00130_ 01_ 1-- M5 Valid 03/06/2008 15:48:05 03/07/2008 07:46:31 2.84 47.4 / 44.0 |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Please, as per my request, go one level deeper by clicking on the Work Unit Name and post the detail. Also please post the detail of that 8-9 hour job where the credit is about 69.
What I see here is short jobs and long jobs claiming the same per hour. In some cases the actual credit is a little more or a little less, but that's how the points system works. ttyl
WCG
Please help to make the Forums an enjoyable experience for All! |
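The per-hour comparison can be reproduced from the rows posted earlier in the thread. This is a minimal sketch that assumes the last three numbers of each complete row are CPU hours, claimed credit and granted credit, in that order; the two truncated rows are left out because their figures are not shown.

```python
# (result name, CPU hours, claimed credit) copied from the complete rows above.
rows = [
    ("faah3213_ ZINC02900792_ xMut_ md00110_ 01_ 1", 3.85, 64.2),
    ("faah3157_ Acetasemide2_ MIN_ xMut_ md06910_ 07_ 0", 0.89, 13.8),
    ("faah3216_ ZINC03954652_ xMut_ md00140_ 01_ 1", 4.74, 79.0),
    ("faah3216_ ZINC05707101_ xMut_ md00140_ 00_ 0", 4.84, 80.8),
    ("faah3215_ ZINC04994532_ xMut_ md00130_ 01_ 1", 2.84, 47.4),
]

# Claimed credit per CPU hour for each result.
per_hour = [claimed / hours for _, hours, claimed in rows]

for (name, hours, claimed), rate in zip(rows, per_hour):
    print(f"{name}: {hours:.2f} h, claimed {claimed:.1f}, {rate:.1f} per hour")
```

The short 0.89 hour job claims about 15.5 per hour and the longer jobs about 16.7, so the claim rate is much the same regardless of job length.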
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The point I'm making is that the less the system works, the more it gets; the more it works, the less it gets, and it returns with errors. Also, when the system is restarted the job comes back as an error even if I exited properly.
I will send you the details when the 6+/8+ hour jobs go for an update. |
||
|
|
|