World Community Grid Forums
Thread Status: Active Total posts in this thread: 20
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline |
Jean: Yes, your wording was a bit imprecise.

[Edit]: Copy _3 of my error-29 WU ts05_b001_ps0000, described in The Bad Type A WUs Thread, has completed and is PV. Thus we have an example of a WU where some copies complete while other copies error out at a frequency well beyond the average WCG device failure rate.
[Edit 1 times, last edit by Rickjb at Apr 18, 2010 9:27:08 AM] |
||
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline |
[quote]I submitted the 3rd error for ts05_a256_ps0000 this morning about 2:15, yet at 3:24 another copy of that WU was sent out. How are the 3 errors being counted?[/quote]

The standard BOINC server code checks for "larger than the limit" when it comes to the max "success" tasks and max "error" tasks for a WU. So if the limit is set to 3 errors, the WU won't error out before the 4th error has been reported.

There's also a limit on max "total" tasks, where outdated server code also checks for larger than the limit. With more recent code, on the other hand, this limit won't be exceeded.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
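The counting behaviour described above can be sketched as follows. This is only an illustration of a "larger than" check versus an inclusive one, not BOINC's actual server code; the function names and the `limit` parameter are hypothetical.

```python
def errors_out_strict(error_count: int, limit: int) -> bool:
    """'Larger than the limit' check, as described in the post."""
    return error_count > limit


def errors_out_inclusive(error_count: int, limit: int) -> bool:
    """Alternative check that fires as soon as the limit is reached."""
    return error_count >= limit


def first_error_that_trips(check, limit: int) -> int:
    """Return the number of the first reported error at which `check` fires."""
    count = 1
    while not check(count, limit):
        count += 1
    return count


if __name__ == "__main__":
    for limit in (3, 5):
        print(limit,
              first_error_that_trips(errors_out_strict, limit),
              first_error_that_trips(errors_out_inclusive, limit))
```

With a limit of 3, the strict check only fires on the 4th error, which is consistent with another copy being sent out after the 3rd error was reported.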
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
[quote]The standard BOINC server code checks for "larger than the limit" when it comes to the max "success" tasks and max "error" tasks for a WU. So if the limit is set to 3 errors, the WU won't error out before the 4th error has been reported.

There's also a limit on max "total" tasks, where outdated server code also checks for larger than the limit. With more recent code, on the other hand, this limit won't be exceeded.[/quote]

OK, yes - that makes sense... because they were also getting more than 5 errors when the limit was set to 5. Forgot about that. Thanks. |
||
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline |
(Edited my last post, which responded to JMBoullier.)
New (?) Information: I have attempted to answer my own question (above): Does CHARMM use Monte Carlo methods ...?

I searched the DDDT2 program file, wcg_dddt2_charmm_6.17_windows_intelx86, looking for the ASCII strings "onte" and "MONTE", using the strings program in (the excellent freeware) Readypak Unix-like-utilities-for-Windows. We get:

> QMDEFN> Some nuclei will be treated quantum mechanically.
> The number of QM path integral atoms =
> The number of quasi-particles per atom =
> The number of Monte Carlo moves (av) =
> The number of Monte Carlo moves (eq) =
...
> MONTE CARLO : Sampling from Boltzmann Distribution
...
> GA_Evolve: Monte Carlo: Starting temerature :
> GA_Evolve: Monte Carlo: Final temerature :
> GA_Evolve: Monte Carlo: Temperature increment;frequency :
...
> GA_E
> MONTE CARLO energies per structure
...
> HB MONTE CARLO : GENERATION NUMBER
...
> SITE atoms using
> Monte Carlo points in
...
> MONTE CARLO :
> MONTE CARLO : GENERATION # TEMPERATURE $
> ANAL: BOND>

If this code is activated in DDDT2, it might explain what we are seeing. Furthermore, I suspect that using results where some copies gave an error while others ran to completion would introduce bias and render the WU scientifically invalid. This might also apply to the forced restarts that mweisensee has performed - see exited with code 29 (0x1d, -227).

That's all up to the scientists to decide, of course.
HTH - Rick
[Edit 2 times, last edit by Rickjb at Apr 18, 2010 11:01:09 AM] |
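For anyone without a strings utility to hand, the search described above can be approximated in a few lines of Python. This is a rough sketch, not a reimplementation of the strings program: it only looks for runs of printable ASCII, and the helper names and the default minimum length of 4 are assumptions made for illustration.

```python
import re

# Runs of printable ASCII bytes (space through tilde).
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]+")


def ascii_strings(data: bytes, min_len: int = 4):
    """Yield printable ASCII runs of at least min_len characters."""
    for match in _PRINTABLE_RUN.finditer(data):
        run = match.group()
        if len(run) >= min_len:
            yield run.decode("ascii")


def find_strings(data: bytes, needles=("onte", "MONTE"), min_len: int = 4):
    """Return the extracted strings that contain any of the needles."""
    return [s for s in ascii_strings(data, min_len)
            if any(n in s for n in needles)]


def find_strings_in_file(path: str, needles=("onte", "MONTE")):
    """Read a binary file (path supplied by the caller) and search it."""
    with open(path, "rb") as handle:
        return find_strings(handle.read(), needles)
```

Calling `find_strings_in_file` with the path to the program file would list the matching strings. As with the original search, a hit only shows that the text is present in the binary, not that the corresponding code path is used.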
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline |
[quote]Jean: Yes, your wording was a bit imprecise[/quote]

Sure, it's possible for the same WU to have both copies in error and valid copies. After all, that is the purpose of distributing repair copies.

But, still, I don't see how I could have found such a quorum "the most consistent one".

Anyway, back to the point: I'll make sure that Uplinger does not miss these strange cases when he comes back tomorrow. They might signal something that the techs and the scientists have not noticed yet.

Cheers. Jean. |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Monte Carlo principles in DDDT are mentioned in a 2009 paper: http://www.utmb.edu/discoveringdenguedrugs-to...s/Watowich-IDDT-Jun09.pdf
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
mclaver
Veteran Cruncher Joined: Dec 19, 2005 Post Count: 566 Status: Offline |
It looks like I got some errors I had not noticed.

Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
ts05_ b436_ ps0000_ 0-- | msi-920 | Error | 4/14/10 20:50:54 | 4/16/10 16:19:10 | 38.47 | 627.4 / 0.0
ts05_ a175_ ps0000_ 0-- | WS-USSP-77417 | Error | 4/14/10 19:40:02 | 4/16/10 07:00:57 | 28.80 | 542.0 / 0.0

I was not the only one who got an error. Multiple people are getting errors, and both WUs have been resent and are in progress. I will be curious to see whether anyone can complete these WUs.

One ran for 28 hours and one ran for 38 hours. That is a pretty long time to run and get no credit. All the other errors I had on DDDT2 had no CPU time. Any thought of giving credit to those who processed for a long time before they got an error?

Also, if everyone is getting errors, why do they keep going out? On the first one, I got an error after 38 hours, my wingman got an error after 27 hours, it went out to someone else, who got an error after 28 hours, and then it went out to two other people who are now in progress. |
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline |
Mitch,

If your WUs in error have something like
The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)
in their Result Log, then this case has been abundantly covered in several threads of the DDDT2 forum. In short, it is a "normal" error and your WUs will be credited when their respective quora are complete. |
||
|
mclaver
Veteran Cruncher Joined: Dec 19, 2005 Post Count: 566 Status: Offline |
[quote]Mitch,

If your WUs in error have something like
The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)
in their Result Log, then this case has been abundantly covered in several threads of the DDDT2 forum. In short, it is a "normal" error and your WUs will be credited when their respective quora are complete.[/quote]

I will wait for the credit. One of my errors was that, but the other was:

Result Log
Result Name: ts05_ b436_ ps0000_ 0--
<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 29 (0x1d, -227)
</message>
<stderr_txt>
pctComplete = 0.405600 wcgStepsDone = 1500 wcgSteps1 = 5000 wcgCyclesDone = 20 wcgCycles = 50
pctComplete = 0.406000 wcgStepsDone = 1600 wcgSteps1 = 5000 wcgCyclesDone = 20 wcgCycles = 50
pctComplete = 0.406400 wcgStepsDone = 1700 wcgSteps1 = 5000 wcgCyclesDone = 20 wcgCycles = 50
pctComplete = 0.406800 wcgStepsDone = 1800 wcgSteps1 = 5000 wcgCyclesDone = 20 wcgCycles = 50
pctComplete = 0.407200 wcgStepsDone = 1900 wcgSteps1 = 5000 wcgCyclesDone = 20 wcgCycles = 50
pctComplete = 0.407600 wcgStepsDone = 2000 wcgSteps1 = 5000 wcgCyclesDone

Is this the same?
- Mitch |
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline |
It is the error code which matters, so both WUs are in the right category and will be credited.
The format of the error message may vary depending on the OS and/or the BOINC version. |
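Since the message wording differs but the code is what matters, a Result Log can be checked by pulling out the number rather than matching the whole message. The sketch below handles only the two message shapes quoted in this thread; other shapes may exist, and the function name and patterns are assumptions for illustration.

```python
import re

# Decimal form, e.g. "process exited with code 29" or "exit code 29".
_DECIMAL_CODE = re.compile(r"(?:exited with|exit) code (\d+)")
# Fallback: hexadecimal form in parentheses, e.g. "(0x1d".
_HEX_CODE = re.compile(r"\(0x([0-9a-fA-F]+)")


def extract_exit_code(result_log: str):
    """Return the exit code found in a Result Log message, or None."""
    match = _DECIMAL_CODE.search(result_log)
    if match:
        return int(match.group(1))
    match = _HEX_CODE.search(result_log)
    if match:
        return int(match.group(1), 16)
    return None


if __name__ == "__main__":
    samples = [
        "process exited with code 29 (0x1d, -227)",
        "The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)",
    ]
    for text in samples:
        print(extract_exit_code(text))
```

Both messages quoted in the thread yield the same code, 29 (0x1d), which is why they fall into the same category.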
||
|