World Community Grid Forums
Thread Status: Active Total posts in this thread: 19
Author |
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
As mentioned in another thread, every single WU on my XP box errors out with something like this:
Result Log
Result Name: ZIKA_000000055_x4wtg_HCV_NS5B_5muts_wRNA_0060_0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 38694.40

Apparently this is not happening to others? Yes, the rig is old, but it has been crunching WCG for years, and this year with no problems until this project. I started the thread for visibility because I'm quite curious to know whether I'm in some solitary twilight zone on this. The WU elapsed time is something over 10 hours before it fails. I'm aborting the rest of the WUs and switching the box to some other project unless/until there's some indication it can be of use on this one. |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7675 Status: Offline |
Please post a log of at least one of the units which exceeded the time limit and the specs of the machine. Thanks.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
i007008
Cruncher Joined: Sep 16, 2005 Post Count: 21 Status: Offline |
Hi Sgt. Joe and others,
I'm on an i3 laptop running Win 8.1. Although it's fairly slow, it crunches all WUs effectively. I've crunched about 20 Zika WUs so far, and 2 of those have errored out with a time exceeded message. On my machine the Zika WUs take approximately 8 hours to complete, and it becomes clear reasonably quickly that a WU is going to error out, because the estimated remaining time starts to indicate 2 or 3 days to complete. I've attached an error file.
HTH
Chris

Result Log
Result Name: ZIKA_000000047_x4wtg_HCV_NS5B_5muts_wRNA_0198_0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 47738.56 (107904.24G/2.26G)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[15:28:17] Number of tasks = 27
[15:28:17] Running task 0,CPU time at start of task 0 was 0.000000
[15:28:17] ./ZINC12774643.pdbqt size = 33 6
../../projects/www.worldcommunitygrid.org/zika.x4wtg_HCV_NS5B_5muts_wRNA.pdbqt size = 5279 0
[16:02:11] Finished task #0 cpu time used 1227.140625
[16:02:11] Running task 1,CPU time at start of task 1 was 1227.140625
[16:02:11] ./ZINC12775844.pdbqt size = 23 8
../../projects/www.worldcommunitygrid.org/zika.x4wtg_HCV_NS5B_5muts_wRNA.pdbqt size = 5279 0
[16:20:52] Finished task #1 cpu time used 695.140625
[16:20:52] Running task 2,CPU time at start of task 2 was 1922.281250
[16:20:52] ./ZINC12775848.pdbqt size = 23 8
../../projects/www.worldcommunitygrid.org/zika.x4wtg_HCV_NS5B_5muts_wRNA.pdbqt size = 5279 0
[16:42:33] Finished task #2 cpu time used 723.484375
[16:42:33] Running task 3,CPU time at start of task 3 was 2645.765625
[16:42:33] ./ZINC12775852.pdbqt size = 23 8
../../projects/www.worldcommunitygrid.org/zika.x4wtg_HCV_NS5B_5muts_wRNA.pdbqt size = 5279 0
[17:02:34] Finished task #3 cpu time used 711.640625
[17:02:34] Running task 4,CPU time at start of task 4 was 3357.406250
[17:02:34] ./ZINC12775855.pdbqt size = 23 8
../../projects/www.worldcommunitygrid.org/zika.x4wtg_HCV_NS5B_5muts_wRNA.pdbqt size = 5279 0 [17:30:02] Finished task #4 cpu time used 732.515625 [17:30:02] Running task 5,CPU time at start of task 5 was 4089.921875 [17:30:02] ./ZINC12776030.pdbqt size = 32 5 ../../projects/www.worldcommunitygrid.org/zika.x4wtg_HCV_NS5B_5muts_wRNA.pdbqt size = 5279 0 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x00007FFD9701E002 Engaging BOINC Windows Runtime Debugger... ******************** BOINC Windows Runtime Debugger Version 7.5.0 Dump Timestamp : 05/20/16 08:38:39 Install Directory : C:\Program Files\BOINC\ Data Directory : C:\ProgramData\BOINC Project Symstore : LoadLibraryA( C:\Program Files\BOINC\\dbghelp.dll ): GetLastError = 126 Loaded Library : dbghelp.dll LoadLibraryA( C:\Program Files\BOINC\\symsrv.dll ): GetLastError = 126 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( C:\Program Files\BOINC\\srcsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 LoadLibraryA( C:\Program Files\BOINC\\version.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:\ProgramData\BOINC\slots\4;C:\ProgramData\BOINC\projects\www.worldcommunitygrid.org ModLoad: 000000009db50000 0000000000207000 C:\ProgramData\BOINC\projects\www.worldcommunitygrid.org\wcgrid_zika_7.05_windows_x86_64 (-exported- Symbols Loaded) Linked PDB Filename : C:\Projects\workspace\scienceApps\DS4L\boinc\vina\x64\Release MDDS\wcgrid_mdds_vina_prod_64.pdb ModLoad: 0000000099d10000 00000000001ac000 C:\windows\SYSTEM32\ntdll.dll (6.3.9600.17936) (-exported- Symbols Loaded) Linked PDB Filename : ntdll.pdb File Version : 6.3.9600.17031 (winblue_gdr.140221-1952) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.17031 ModLoad: 0000000099150000 000000000013e000 C:\windows\system32\KERNEL32.DLL 
(6.3.9600.17415) (-exported- Symbols Loaded) Linked PDB Filename : kernel32.pdb File Version : 6.3.9600.17031 (winblue_gdr.140221-1952) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.17031 ModLoad: 0000000096f40000 0000000000115000 C:\windows\system32\KERNELBASE.dll (6.3.9600.17415) (-exported- Symbols Loaded) Linked PDB Filename : kernelbase.pdb File Version : 6.3.9600.17031 (winblue_gdr.140221-1952) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.17031 ModLoad: 00000000938b0000 000000000000a000 C:\windows\SYSTEM32\VERSION.dll (6.3.9600.17415) (-exported- Symbols Loaded) Linked PDB Filename : version.pdb File Version : 6.3.9600.17415 (winblue_r4.141028-1500) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.17415 ModLoad: 00000000979b0000 00000000000aa000 C:\windows\system32\ADVAPI32.dll (6.3.9600.17415) (-exported- Symbols Loaded) Linked PDB Filename : advapi32.pdb File Version : 6.3.9600.16384 (winblue_rtm.130821-1623) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.16384 ModLoad: 0000000097a60000 0000000001519000 C:\windows\system32\SHELL32.dll (6.3.9600.17824) (-exported- Symbols Loaded) Linked PDB Filename : shell32.pdb File Version : 6.3.9600.17031 (winblue_gdr.140221-1952) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.17031 ModLoad: 0000000099520000 0000000000177000 C:\windows\system32\USER32.dll (6.3.9600.17415) (-exported- Symbols Loaded) Linked PDB Filename : user32.pdb File Version : 6.3.9600.16384 (winblue_rtm.130821-1623) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.16384 ModLoad: 00000000975e0000 00000000000aa000 C:\windows\system32\msvcrt.dll 
(7.0.9600.17415) (-exported- Symbols Loaded) Linked PDB Filename : msvcrt.pdb File Version : 7.0.9600.17415 (winblue_r4.141028-1500) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 7.0.9600.17415 ModLoad: 0000000099290000 0000000000059000 C:\windows\SYSTEM32\sechost.dll (6.3.9600.17734) (-exported- Symbols Loaded) Linked PDB Filename : sechost.pdb File Version : 6.3.9600.16384 (winblue_rtm.130821-1623) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.16384 ModLoad: 0000000099700000 0000000000141000 C:\windows\system32\RPCRT4.dll (6.3.9600.17919) (-exported- Symbols Loaded) Linked PDB Filename : rpcrt4.pdb File Version : 6.3.9600.16384 (winblue_rtm.130821-1623) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.16384 ModLoad: 0000000099a80000 0000000000211000 C:\windows\SYSTEM32\combase.dll (6.3.9600.17415) (-exported- Symbols Loaded) Linked PDB Filename : combase.pdb File Version : 6.3.9600.16384 (winblue_rtm.130821-1623) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.16384 ModLoad: 00000000994b0000 0000000000054000 C:\windows\system32\SHLWAPI.dll (6.3.9600.17415) (-exported- Symbols Loaded) Linked PDB Filename : shlwapi.pdb File Version : 6.3.9600.16384 (winblue_rtm.130821-1623) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.16384 ModLoad: 0000000097840000 000000000014f000 C:\windows\system32\GDI32.dll (6.3.9600.17925) (-exported- Symbols Loaded) Linked PDB Filename : gdi32.pdb File Version : 6.3.9600.17925 (winblue_ltsb.150703-0600) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.17925 ModLoad: 0000000099470000 0000000000036000 C:\windows\system32\IMM32.DLL (6.3.9600.17415) 
(-exported- Symbols Loaded) Linked PDB Filename : imm32.pdb File Version : 6.3.9600.17415 (winblue_r4.141028-1500) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.17415 ModLoad: 0000000099920000 0000000000152000 C:\windows\system32\MSCTF.dll (6.3.9600.17706) (-exported- Symbols Loaded) Linked PDB Filename : msctf.pdb File Version : 6.3.9600.16384 (winblue_rtm.130821-1623) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.16384 ModLoad: 00000000959e0000 0000000000021000 c:\PROGRA~1\BULLGU~1\BULLGU~1\BgAgent.dll (2.0.0.56) (-exported- Symbols Loaded) Linked PDB Filename : w:\bg14\_utils\bgagent\objfre_wnet_amd64\amd64\BgAgent.pdb File Version : 2.0.0.56 Company Name : BullGuard Ltd. Product Name : BullGuard Product Version : 1.0.0.0 ModLoad: 0000000092010000 0000000000032000 C:\windows\SYSTEM32\ntmarta.dll (6.3.9600.17415) (-exported- Symbols Loaded) Linked PDB Filename : ntmarta.pdb File Version : 6.3.9600.16384 (winblue_rtm.130821-1623) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.16384 ModLoad: 000000008c030000 0000000000189000 C:\windows\SYSTEM32\dbghelp.dll (6.3.9600.17787) (-exported- Symbols Loaded) Linked PDB Filename : dbghelp.pdb File Version : 6.3.9600.17787 (winblue_r10.150331-1500) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.3.9600.17787 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 12807, Write: 5418, Other 145565 - I/O Transfers Counters - Read: 41825356, Write: 14742106, Other 2508036 - Paged Pool Usage - QuotaPagedPoolUsage: 139232, QuotaPeakPagedPoolUsage: 139392 QuotaNonPagedPoolUsage: 8048, QuotaPeakNonPagedPoolUsage: 9168 - Virtual Memory Usage - VirtualSize: 82579456, PeakVirtualSize: 156938240 - Pagefile Usage - PagefileUsage: 82579456, 
PeakPagefileUsage: 82579456 - Working Set Size - WorkingSetSize: 6512640, PeakWorkingSetSize: 85016576, PageFaultCount: 224715933 *** Dump of thread ID 6640 (state: Initialized): *** - Information - Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000 - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x00007FFD9701E002 - Registers - rax=0000000000000000 rbx=000000009dbd8a50 rcx=000000009dc9b970 rdx=000000009dc9b968 rsi=0000000000000000 rdi=0000000000000000 r8=0000000002d5f0d0 r9=00000000650c3770 r10=000000009dc9b960 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 rip=000000009701e002 rsp=0000000002d5f098 rbp=0000000000000000 cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246 - Callstack - ChildEBP RetAddr Args to Child 02d5f090 9dbd9953 9dc9b970 9dc9b968 02d5f0d0 650c3770 KERNELBASE!DebugBreak+0x0 02d5f4e0 9dbd8b8f 00000000 00000000 00000000 00000000 wcgrid_zika_7!boost::archive::detail::iserializer<boost::archive::binary_iarchive,boost::math::quaternion<double> >::load_object_data+0x0 02d5f740 9dbd8a70 9dbd8a50 00000000 00000000 00000000 wcgrid_zika_7!boost::archive::detail::iserializer<boost::archive::binary_iarchive,boost::math::quaternion<double> >::load_object_data+0x0 02d5f770 991513d2 00000000 00000000 00000000 00000000 wcgrid_zika_7!boost::archive::detail::iserializer<boost::archive::binary_iarchive,boost::math::quaternion<double> >::load_object_data+0x0 02d5f7a0 99d25454 991513b0 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 02d5f7f0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 *** Dump of thread ID 32765 (state: Initialized): *** - Information - Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 5.000000, User Time: 156250.000000, Wait Time: 2942289920.000000 - Registers - rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000 r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000 cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000 - Callstack - ChildEBP RetAddr Args to Child (-nosymbols- PC == 0) 00000000 00000000 00000000 00000000 00000000 00000000 !ct_data+0x0 *** Dump of thread ID 30519787 (state: Unknown): *** - Information - Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 17179869184.000000, User Time: 21477048320.000000, Wait Time: 7343750.000000 - Registers - rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000 r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000 cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000 - Callstack - ChildEBP RetAddr Args to Child (-nosymbols- PC == 0) 00000000 00000000 00000000 00000000 00000000 00000000 !ct_data+0x0 *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> [Edit 1 times, last edit by i007008 at May 21, 2016 7:49:07 AM] |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
The source of this lies in the very short initial runtime estimates at launch. The overrun limit for a task is 'usually' set at 10x the 'expected' runtime, so if in your case the initial fpops value translated to 4773 seconds [1:19 hours] but the task content really needed more than 10x that value, the result is broken off because it is then deemed broken [looping in vain]. This has nothing to do with the speed of the device: the slower it is, the more seconds the fpops value converts to (as a function of the client benchmark).
IIRC one other past project had the multiplier cap set at something like 20 times, due to enormous variability in runtimes, but I don't remember which one [HCMD1 or 2 maybe]. At any rate, the project work generator learns from returned results, and new work gets a higher fpops cap in the header to allow longer runtimes.
P.S. Rerun the client benchmark [somewhere in the menu]. If it's too high, you get fewer seconds. For instance, if I benchmark my client when the CPU is set at 3.7 GHz but then set the CPU to run at 2.5 GHz [summertime heat reasons], the allowed time is really 2.5/3.7*10 = about 6.76 times the expected runtime. So the higher your benchmark, the less time you get when the task runs way longer than originally computed, past the safety-net capping value. [Edit 2 times, last edit by SekeRob* at May 21, 2016 8:22:25 AM] |
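The benchmark effect described here can be put into numbers. This is only a minimal sketch, assuming the limit is the task's fpops bound divided by the device speed the client has on record (the two figures in parentheses in the error message posted earlier are consistent with that reading). The function names are hypothetical and not part of BOINC.

```python
def elapsed_time_limit(fpops_bound: float, recorded_flops: float) -> float:
    """Seconds a task may run before it is aborted (assumed: bound / recorded speed)."""
    return fpops_bound / recorded_flops


def effective_multiplier(nominal: float, benchmark_ghz: float, actual_ghz: float) -> float:
    """Headroom left when the CPU runs slower than the clock it was benchmarked at."""
    return nominal * actual_ghz / benchmark_ghz


# Figures from the error message in i007008's log: (107904.24G/2.26G).
# Both "G" values are rounded in the log, so this only approximates
# the reported limit of 47738.56 seconds.
limit = elapsed_time_limit(107904.24e9, 2.26e9)
print(f"approximate limit: {limit:.0f} s")

# The example from the post: benchmarked at 3.7 GHz, run at 2.5 GHz, nominal 10x cap.
print(f"effective multiplier: {effective_multiplier(10, 3.7, 2.5):.2f}x")
```

The small gap between the computed and the reported limit is what one would expect from the rounding of the speed to two decimals in the message.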
||
|
i007008
Cruncher Joined: Sep 16, 2005 Post Count: 21 Status: Offline |
Many thanks for the explanation SekeRob* - much appreciated.
|
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Hmmm, just looked in the control files for a running Zika task:
<rsc_fpops_est>9367187844033.000000</rsc_fpops_est>
<rsc_fpops_bound>374687513761320.000000</rsc_fpops_bound>
The fpops_bound multiplier is actually set at 40x, wow, proving that the techs were already extremely cautious about this risk being real. [Edit 1 times, last edit by SekeRob* at May 21, 2016 8:26:04 AM] |
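For anyone who wants to check the multiplier on their own tasks, here is a rough sketch that reads the two values and divides them. The `<workunit>` wrapper is an assumption added only so the fragment parses as one XML document; where the client actually keeps these elements on disk is not covered here.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the post; the <workunit> wrapper is added here only
# so the snippet parses as a single XML document.
fragment = """
<workunit>
  <rsc_fpops_est>9367187844033.000000</rsc_fpops_est>
  <rsc_fpops_bound>374687513761320.000000</rsc_fpops_bound>
</workunit>
"""

root = ET.fromstring(fragment)
fpops_est = float(root.findtext("rsc_fpops_est"))
fpops_bound = float(root.findtext("rsc_fpops_bound"))

# Ratio of the abort bound to the runtime estimate.
multiplier = fpops_bound / fpops_est
print(f"bound / estimate = {multiplier:.1f}x")
```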
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7675 Status: Offline |
With all due respect to Sekerob* for his explanation, I do note that the CPU time is only about 50% of the elapsed time. This raises the question of whether the machine is throttled in some fashion, or what else might be using the CPU cycles. I would think an i3 should be able to handle the Zika units in way under 10 hours.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
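The CPU-to-elapsed ratio can be checked directly from the timestamps in i007008's log. A small sketch, assuming all stamps fall on the same day and that the "cpu time used" values are in seconds; the helper name `wall_seconds` is made up for this example.

```python
from datetime import datetime

# (start, finish, cpu_seconds) for tasks 0-4, copied from the stderr log.
tasks = [
    ("15:28:17", "16:02:11", 1227.140625),
    ("16:02:11", "16:20:52", 695.140625),
    ("16:20:52", "16:42:33", 723.484375),
    ("16:42:33", "17:02:34", 711.640625),
    ("17:02:34", "17:30:02", 732.515625),
]


def wall_seconds(start: str, finish: str) -> float:
    """Wall-clock seconds between two same-day HH:MM:SS stamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


total_wall = sum(wall_seconds(s, f) for s, f, _ in tasks)
total_cpu = sum(cpu for _, _, cpu in tasks)

for i, (s, f, cpu) in enumerate(tasks):
    print(f"task {i}: {cpu / wall_seconds(s, f):.0%} CPU/elapsed")
print(f"overall: {total_cpu / total_wall:.0%}")
```

The summed CPU time matches the "CPU time at start of task 5" figure in the log, and the overall ratio comes out a little over half, in line with the observation above.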
||
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 325 Status: Offline |
I am running Zika using Windows 10 on an i7 6700. I am seeing a 97 to 99% ratio of CPU to elapsed time.
|
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
If BOINC were throttled at 50%, the elapsed time would be essentially double the recorded CPU time, against which the fpops are measured; 'elapsed time' seemingly is an incorrect label in the log.
Taking the fpops bound on my machine, 374687513761320.000000, with benchmarks of
<p_fpops>2588593553.718619</p_fpops>
<p_iops>7945375340.226294</p_iops>
total 10533968893.944913
my machine would have been allowed 35569 seconds on the task. The tasks have been taking 4:40 hours or so, or 16800 seconds, so the multiplier of 40 is still running it close, and the standard time for the assignment is 889 seconds (fpops est 9367187844033 / benchmark 10533968893).
NB: I'm not sure whether BOINC sums the integer and flops benchmarks for computing fpops, but it certainly does for calculating credit. If not, the calculation would become 9367187844033 / 2588593553.718619 = 3618 seconds as standard runtime and 144720 as cap (40x standard). That is a whole lot too much. ... I would have to research how the converter works [I think Crystal Pellet once posted the calculation, so I could go and find that].
Edit: argh, I did not look closely, but I think BOINC originally showed 1:00:18 as TTC at RtS state... that's 3618 seconds. [Edit 1 times, last edit by SekeRob* at May 21, 2016 1:13:03 PM] |
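The two readings in this post can be laid side by side. A sketch using only the figures quoted above; whether the client divides by the summed benchmark or by the floating-point benchmark alone is exactly the open question, so neither branch is claimed to be what BOINC does. The helper `seconds` is a made-up name.

```python
FPOPS_EST = 9367187844033.0
FPOPS_BOUND = 374687513761320.0
P_FPOPS = 2588593553.718619  # floating-point benchmark from the post
P_IOPS = 7945375340.226294   # integer benchmark from the post


def seconds(fpops: float, speed: float) -> float:
    """Convert an fpops figure to seconds at the given speed (ops per second)."""
    return fpops / speed


readings = (
    ("fpops + iops", P_FPOPS + P_IOPS),
    ("fpops only", P_FPOPS),
)

for label, speed in readings:
    est = seconds(FPOPS_EST, speed)
    cap = seconds(FPOPS_BOUND, speed)
    print(f"{label:>12}: estimate {est:8.0f} s, cap {cap:8.0f} s")
```

The fpops-only reading gives an estimate of about 3618 seconds, the same as the 1:00:18 initial TTC mentioned in the edit, which hints at (but does not prove) that only the floating-point benchmark enters the conversion.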
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Quoting ca05065: "I am running Zika using windows 10 on an i7 6700. I am seeing 97 to 99% ratio of cpu / elapsed time."
ATM the log does not explain why it hit the 47K-second time limit when the time recorded up to the point of exit was 5279 seconds. A BOINC throttle at 50 or 60% would show as it does: wall-clock times at left, annotation of the progress in CPU time at right. My Windows desktop runs at > 99% efficiency and my Linux desktop at almost exactly 99%, but at a 100% CPU time setting. If the problem device is not throttled [though the WCG default standard profile setting is something like 50 or 60%], then, as Sgt.Joe notes, something else could be eating the time. That still does not explain the exceeded limit. |
||
|