World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: New Beta Test starting Nov 4, 2013 [ Issues Thread ] Version 7.21 |
Thread Status: Active Total posts in this thread: 152
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
No credit for tasks which failed due to elapsed time error? It looks as if you're correct for now:
BETA_ BETA_ 9999981_ 0695_ 3-- 721 Valid 05/11/13 03:08:55 05/11/13 05:11:12 1.99 53.3 / 64.0
BETA_ BETA_ 9999981_ 0695_ 2-- 721 Valid 04/11/13 21:28:34 05/11/13 04:24:54 2.75 74.7 / 64.0
BETA_ BETA_ 9999981_ 0695_ 0-- 721 Error 04/11/13 18:00:38 05/11/13 02:58:33 0.82 28.0 / 0.0 << Maximum elapsed time exceeded
BETA_ BETA_ 9999981_ 0695_ 1-- 721 User Aborted 04/11/13 18:00:36 04/11/13 21:28:32 0.00 29.4 / 0.0
You'll have to hope for a manual adjustment after this initial credit process. |
||
|
branjo
Master Cruncher Slovakia Joined: Jun 29, 2012 Post Count: 1892 Status: Offline Project Badges: |
4 already errored out:
- 3 of them "exceeded elapsed time limit 2934.55 (1749949.51G/570.75G)" after 48:54 mins elapsed time (0.81 h CPU time)
- 1 of them the same error after 43:48 mins elapsed time (0.73 h CPU time)
- 1 PVal after 1.18/1.19 h
All of them on my Win7 SP1 64b, i7-3770, 7.2.26, 8GB RAM, 60GB SSD rig running only OS, BOINC and MS Security Essentials. 7 CEP2 WU's left in memory.
All 4 I received on my Mac OS X 10.9 (Mavericks), i5-2500S, 7.0.65, 12GB RAM, 1TB HDD, small SSD with OS only - my main PC with ESET CyberSecurity - are still running. 1 FAAH WU left in memory.
On both Win and Mac I am shrubbing PrimeGrid PPS SV GPGPU OpenCL AMD/ATI WU's (will not continue with them during this Beta).
ETA1:
- running 8 tasks concurrently on Win and 4 concurrently on Mac
- RAM per task on Mac: 165 - 217MB
- RAM per task on Win: 52 - 110MB
- LAIM on for both PC and rig
ETA2: got a few more errors as mentioned above (one of them claims huuuuuuge credits: 378,4 for 0.99 h CPU time / 1.16 elapsed time), but after techs' intervention, everything seems to run OK. Some WU's went Valid (OK, one of them ATM), some of them are PVal. CPU time: from 0.02 - 2.67
ETA3: everything seems to be under control, so I am going to bed. Cheers
ETA4: on top of the "exceeded elapsed time limit" errors, I got one with a "Maximum disk usage exceeded" error on my Win rig. This WU BETA_BETA_9999987_0690 (the one left from the 1st batch) was already aborted by 3 crunchers, all running 6.10.xx (2x 58, 1x 60) BOINC (if this info is important for techs).
ETA5: for all WU's (I have gotten 42) the CPU time - Elapsed time ratio is the same as for other sub-projects, except for 6 of them (all on my Win rig):
1. BETA_BETA_9999981_0706 (already Valid), 2.53 / 3.81
2. BETA_BETA_9999981_0701 (in PVal), 2.52 / 3.56
3. BETA_BETA_9999981_0844 (Valid), 2.58 / 3.07
4. BETA_BETA_9999987_0317a (PVal), 0.21 / 2.53
5. BETA_BETA_9999987_0727a (PVal), 0.26 / 3.81
6. BETA_BETA_9999987_0237a (PVal), 0.31 / 2.52
Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006
[Edit 2 times, last edit by branjo at Nov 5, 2013 9:08:15 AM] |
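The CPU time / elapsed time pairs in the post above can be turned into a single efficiency percentage, which makes the outliers easier to spot. This is a minimal sketch, assuming both values are in hours as shown on the result pages; the function name `efficiency_percent` and the `tasks` dictionary are illustrative only and not something BOINC or World Community Grid provides:

```python
def efficiency_percent(cpu_hours: float, elapsed_hours: float) -> float:
    """Return CPU time as a percentage of elapsed (wall clock) time."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed time must be positive")
    return 100.0 * cpu_hours / elapsed_hours


# (CPU hours, elapsed hours) for the six tasks listed in the post above.
tasks = {
    "BETA_BETA_9999981_0706": (2.53, 3.81),
    "BETA_BETA_9999981_0701": (2.52, 3.56),
    "BETA_BETA_9999981_0844": (2.58, 3.07),
    "BETA_BETA_9999987_0317a": (0.21, 2.53),
    "BETA_BETA_9999987_0727a": (0.26, 3.81),
    "BETA_BETA_9999987_0237a": (0.31, 2.52),
}

for name, (cpu, elapsed) in tasks.items():
    print(f"{name}: {efficiency_percent(cpu, elapsed):.1f}%")
```

On these numbers the first three tasks come out at roughly 66 to 84 percent, and the last three at under 13 percent.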
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
tony205,
Not sure what the CPU-related elapsed time of your _3 has to do with the time span to return _1, which was cut off at 1.09 CPU hours. The limit stems from the number of FPOPS a task has in the header (it's a factor of FPOPS, something like factor 5 or 10). For your device that will have been 3.70 to hit the mark.
BTW, you just revealed a quorum page bug. It only shows the CPU time, not also the elapsed time as the header implies, which makes situations where CPU time goes missing harder to detect... you can't in fact see how efficiently the wingman processed.
FAHV_ x4GW6bINfbA_ 0348104_ 0552_ 1-- 706 Valid 11/4/13 09:21:49 11/5/13 08:28:50 1.29 41.8 / 38.4
FAHV_ x4GW6bINfbA_ 0348104_ 0552_ 0-- 706 Valid 11/4/13 03:43:17 11/4/13 09:21:35 0.76 35.1 / 38.4
This is opposed to the Result Status page where both are displayed, but for the user only:
FAHV_ x4GW6bINfbA_ 0348104_ 0552_ 1-- 2524499 Valid 11/4/13 09:21:49 11/5/13 08:28:50 1.29 / 1.31 41.8 / 38.4
'Maximum elapsed time exceeded' is a little bit misleading. It is a generic term covering both CPU and GPU task limits, but in the case of CPU tasks it really means that the maximum FPOPS done in CPU time was exceeded.
Edit2: BTW, my summary of Beta test v7.21:
5 on Linux 64 quad, completing to PV state with 99.8+% efficiency. No issues.
2 on W7-32 duo completing to PV state, one having an over 4 hour gap in CPU - Elapsed time. Hands off, this device runs 97%+ efficient, it has been for the full test.
BETA_ BETA_ 9999983_ 0728_ 0-- 95711 Pending Validation 11/4/13 18:36:25 11/5/13 04:27:29 5.28 / 9.84 111.6 / 0.0
The log is absolutely picture perfect, no interruptions, with the local Wallclock times being logged for each pass.
The start and end bit:
Result Log
Result Name: BETA_ BETA_ 9999983_ 0728_ 0--
<core_client_version>7.2.18</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.21_windows_intelx86 -SettingsFile BETA_9999983_0728.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 100000
Running
[19:42:39]: Computing pass 0
[19:42:54]: Computing pass 1
[19:43:10]: Computing pass 2
....
[05:26:29]: Computing pass 1846
[05:26:47]: Computing pass 1847
Result.out = 3389269.000000
Run complete, CPU time: 19245.984375
05:27:10 (1020): called boinc_finish
</stderr_txt>
]]>
Wallclock is sure from 19:42 to 5:26 next day (another member asked for actual start time to be printed in log, seconding that).
21 on W7-64, 8 core. 15 ran to normal completion, of which several have huge CPU-Elapsed gaps. Others run 99% efficient.
BETA_ BETA_ 9999982_ 0985_ 2-- 1854592 Valid 11/4/13 19:37:33 11/5/13 03:17:36 2.07 / 6.49 175.8 / 126.6
Where's that time going? The CPU cores in Task Manager are not seen to idle. Up to the point of completion the Elapsed/CPU times seem close. [Need to ask Fred of BOINCTasks if a feature can be added to highlight if efficiency drops below X percent, similar to the highlight if the checkpoint interval gets too high.]
A list of the non-erred tasks with last column showing the efficiencies. Mostly 99%, but a few show 32, 35 and 40%.
World Community Grid 7.21 beta17 BETA_BETA_9999982_0007_2 03:29:41 (03:29:15) 05-11-2013 07:47 05-11-2013 07:48 99.793
World Community Grid 7.21 beta17 BETA_BETA_9999981_0984_3 06:30:47 (06:29:41) 05-11-2013 07:36 05-11-2013 07:36 99.719
World Community Grid 7.21 beta17 BETA_BETA_9999986_0883a_2 04:43:16 (04:42:28) 05-11-2013 05:19 05-11-2013 05:20 99.718
World Community Grid 7.21 beta17 BETA_BETA_9999982_0985_2 06:29:15 (02:04:01) 05-11-2013 04:17 05-11-2013 04:17 31.860
World Community Grid 7.21 beta17 BETA_BETA_9999987_0480a_0 05:17:02 (01:49:49) 05-11-2013 03:19 05-11-2013 03:20 34.639
World Community Grid 7.21 beta17 BETA_BETA_9999983_0076_2 02:23:49 (00:58:12) 05-11-2013 01:18 05-11-2013 01:21 40.468
World Community Grid 7.21 beta17 BETA_BETA_9999979_0167_0 04:15:48 (03:48:46) 04-11-2013 22:39 04-11-2013 22:39 89.432
World Community Grid 7.21 beta17 BETA_BETA_9999979_0176_0 03:50:55 (03:48:24) 04-11-2013 22:14 04-11-2013 22:15 98.910
World Community Grid 7.21 beta17 BETA_BETA_9999986_0331a_1 01:22:49 (01:22:43) 04-11-2013 22:05 04-11-2013 22:05 99.879
World Community Grid 7.21 beta17 BETA_BETA_9999986_0794a_1 01:08:57 (01:08:47) 04-11-2013 21:51 04-11-2013 21:52 99.758
World Community Grid 7.21 beta17 BETA_BETA_9999980_0080_0 02:50:11 (02:48:38) 04-11-2013 21:16 04-11-2013 21:16 99.089
World Community Grid 7.21 beta17 BETA_BETA_9999980_0226_1 00:02:51 (00:02:47) 04-11-2013 18:55 04-11-2013 18:56 97.661
World Community Grid 7.21 beta17 BETA_BETA_9999980_0225_1 00:02:53 (00:02:51) 04-11-2013 18:55 04-11-2013 18:56 98.844
To continue for same device:
5 had the time exceed, for which the cause was known.
2 Froze on CPU time, Elapsed continuing. Were suspended with LAIM off so they would unload from memory, which they did...
not stuck there, and upon resume crashed almost immediately with this error:
Result Log
Result Name: BETA_ BETA_ 9999988_ 0907a_ 2--
<core_client_version>7.2.23</core_client_version>
<![CDATA[
<message> (unknown error) - exit code -529697949 (0xe06d7363) </message>
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.21_windows_x86_64 -SettingsFile BETA_9999988_0907a.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 100000
Running
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.21_windows_x86_64 -SettingsFile BETA_9999988_0907a.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 100000
Running
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x000007FEFD8E940D
Engaging BOINC Windows Runtime Debugger...
********************
... followed by a large log dump.
One of the above 2 lost all time, the other returned to pass 812 and went south with same error and big dump.
edit3: On the (unknown error) - exit code -529697949 (0xe06d7363), found at Berkeley something relating to the Vbox feature, but not running a Vbox version of BOINC. Did find in roundabout this post by developer: http://permalink.gmane.org/gmane.comp.distributed.boinc.user/1626
If you are seeing this error on both 32-bit and 64-bit OSs, it is more than likely a memory allocation error where new or malloc is being passed a negative value. On 32-bit OSs that request would be 2^16 or more and on 64-bit OSs it would be 2^32 or more, in both cases those values would exceed the amount of virtual memory allowed for a user-mode process.
[Edit 3 times, last edit by Former Member at Nov 5, 2013 11:50:19 AM] |
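The quoted explanation hinges on a negative size being reinterpreted as a very large unsigned request. Below is a minimal sketch of that general mechanism, using Python's `ctypes.c_size_t` to mimic how a C `size_t` parameter would see a negative integer. It only illustrates the idea and is not taken from the beta application; the helper name `as_size_t` is made up for the example:

```python
import ctypes


def as_size_t(n: int) -> int:
    """Reinterpret a Python int as the platform's unsigned size_t."""
    # ctypes integer types do no overflow checking, so a negative
    # value wraps around modulo 2**bits, as it would in C.
    return ctypes.c_size_t(n).value


bits = 8 * ctypes.sizeof(ctypes.c_size_t)
print(f"size_t is {bits} bits wide on this platform")
print(f"-1 seen as a size_t request: {as_size_t(-1)} bytes")
```

A request of that size is far beyond what a user-mode process can be given, which would fit the "Out Of Memory (C++ Exception)" reason shown in the log above.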
||
|
JSYKES
Senior Cruncher Joined: Apr 28, 2007 Post Count: 200 Status: Offline Project Badges: |
I have another 6 Betas on my work PC this morning and none are going anywhere - all on multiple restarts - two had 12 hr ** run times and 4 just 50 mins but none got further than 3 mins (and 0%) before restart. I've aborted them too.
WU's were 9999986_0633a_1**; 0452a_1; 0699a_1; 9999987_0121a_1; 0031a_1 and 9999988_0882a_1**
I've had 8 or 9 Betas now and none have progressed!!! |
||
|
CandymanWCG
Senior Cruncher Romania Joined: Dec 20, 2010 Post Count: 421 Status: Offline Project Badges: |
I don't even know where to begin. 3 different machines, 10 cores in total, different symptoms.
1. Main PC completed a couple of WUs successfully (actually paused and restarted a WU to see if I can mess with it and it still completed - as far as I can tell), before the whole Boinc client just disappeared into thin air. Had to start Boinc again. Now I still have 3 WUs, 2 are older and have over 2 hours processing time registered with just 45 seconds left on the clock, yet they are at 0.5%.
2. Home laptop also completed a WU successfully, but I have several "task xxx exited with zero status but no finished file" messages in the log. Also, the actual computing time is far less than what is registered in time spent (1h10 for 3h20) and the estimate says 6 minutes left even though the actual progress is 22%.
3. Work laptop has all 4 cores fully loaded. Hasn't completed any WU and I also see a lot of those "you may need to reset the project" messages.
Looks like we may need a third batch...
Edit: forgot to give some specs:
- all machines run Win 7 64 bit (different flavors though: PC and home laptop run Ultimate and work runs Professional, of course)
- all run Boinc 7.0.64 x64
- PC runs on AMD CPU while the other 2 on Intel
Edit2: one of the WUs on the home laptop decided to report as completed all of a sudden, so left with only the troublesome one I decided to restart Boinc. The time elapsed went down to the actual time spent and now it seems to be moving along just fine. Guess that did the trick.
Edit3: oh, wait, just noticed that the estimated time is still off - only 18 minutes even though I am at 28% after 1h35.
I got 2 WUs on my other work laptop (Win XP SP3, 7.0.64 x86, Intel, running at 100% both cores) and after a lot of "you may need to reset the project" messages in the log and a reboot they looked like they were on the good path (over 1h of processing time), then one by one they errored out with a C++ runtime message plastered on my screen.
This Beta is definitely not the last one...It can't be. Knowledge is limited. Imagination encircles the world! - Albert Einstein [Edit 1 times, last edit by CandymanWCG at Nov 5, 2013 10:28:49 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
All Betas are done now:
2 on Linux-Lubuntu64, concurrent: valid
1 on Win7-32: errored out in the beginning with exit code -529697949 (I posted the complete log somewhere earlier in this thread)
4 on same Win7-32, concurrent, +1 GPU running: Got stuck very often, but finally turned valid. I noticed they got stuck when I was doing something, like hiding Boinc Mgr, editing local prefs, opening Firefox. By stuck I mean full core usage, no progress, no CPU time progress, didn't stop running when suspended+laim on. I always got them running again with suspend+laim off+unsuspend. Eventually I let them run completely hands off, not using the computer at all.
4 on same Win7-32, VM Linux-Mint-32 vbox, concurrent (native Win client was running only GPU tasks while running these): Didn't get stuck at all (issue Windows only?). Three had quickly (1/sec) updating progress information, the other updated only every 15 seconds, but in larger steps. One of the quick group errored (log attached*) at ~90% progress. The task has two valid wingmen WUs now. The other 3 are valid.
* -- Result Name: BETA_ BETA_ 9999981_ 0281_ 1--
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message> process exited with code 193 (0xc1, -63) </message>
<stderr_txt>
Commandline = ../../projects/www.worldcommunitygrid.org/wcgrid_beta17_7.21_i686-pc-linux-gnu -SettingsFile BETA_9999981_0281.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
Commandline = ../../projects/www.worldcommunitygrid.org/wcgrid_beta17_7.21_i686-pc-linux-gnu -SettingsFile BETA_9999981_0281.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 500000
Running
*** glibc detected *** ../../projects/www.worldcommunitygrid.org/wcgrid_beta17_7.21_i686-pc-linux-gnu: malloc(): smallbin double linked list corrupted: 0x0c5d6a00 ***
======= Backtrace: =========
[0x81798fe] [0x817cf09] [0x817d799] [0x813ec6a] [0x80b2e99] [0x80aa021] [0x80a9ce5] [0x80acb52] [0x80a4c09] [0x80a6670] [0x80a6e81] [0x80831f1] [0x8085776] [0x8085857] [0x8067e41] [0x814d9c2] [0x8048201]
======= Memory map: ========
00311000-00312000 r-xp 00000000 00:00 0 [vdso]
08048000-0824f000 r-xp 00000000 08:01 423459 /var/lib/boinc-client/projects/www.worldcommunitygrid.org/wcgrid_beta17_7.21_i686-pc-linux-gnu
0824f000-08251000 rw-p 00207000 08:01 423459 /var/lib/boinc-client/projects/www.worldcommunitygrid.org/wcgrid_beta17_7.21_i686-pc-linux-gnu
08251000-08276000 rw-p 00000000 00:00 0
09353000-11987000 rw-p 00000000 00:00 0 [heap]
b76f2000-b77d8000 rw-p 00000000 00:00 0
b77d9000-b77da000 rw-s 00000000 08:01 429564 /var/lib/boinc-client/slots/3/boinc_mcm1_3
b77da000-b77db000 ---p 00000000 00:00 0
b77db000-b77e2000 rw-p 00000000 00:00 0
b77e2000-b77e4000 rw-s 00000000 08:01 429551 /var/lib/boinc-client/slots/3/boinc_mmap_file
bff06000-bff27000 rw-p 00000000 00:00 0 [stack]
SIGABRT: abort called
Stack trace (22 frames):
[0x80d84cd] [0x311400] [0x311416] [0x81563e3] [0x8173f1f] [0x81798fe] [0x817cf09] [0x817d799] [0x813ec6a] [0x80b2e99] [0x80aa021] [0x80a9ce5] [0x80acb52] [0x80a4c09] [0x80a6670] [0x80a6e81] [0x80831f1] [0x8085776] [0x8085857] [0x8067e41] [0x814d9c2] [0x8048201]
Exiting...
</stderr_txt>
]]> |
||
|
nittany85
Cruncher United States of America Joined: Apr 29, 2007 Post Count: 17 Status: Offline Project Badges: |
Downloaded two Beta Test tasks. Both seemed to be stuck in a loop with the clock resetting to zero after about two minutes of CPU time.
The reference result names for the tasks are:
BETA_BETA_9999984_0955a_2--
BETA_BETA_9999984_0081a_2--
Here is information on the computer running the tasks: Intel(R) Core(tm) i5-2520M CPU @ 2.50GHz 1(4) Windows 7
[Edit 1 times, last edit by nittany85 at Nov 5, 2013 12:55:50 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I received 2 WUs on one of my machines; unfortunately both of them show the multiple restart problem: after 1-2 minutes the task is restarted without any information in the "Messages" window. Windows 7 x64 SP1, CPU i7.
BETA_ BETA_ 9999983_ 0502_ 2-- HSX-BUC-0002 In Progress 11/5/13 08:57:12 11/7/13 08:57:12 0.00 / 0.00 0.0 / 0.0
BETA_ BETA_ 9999984_ 0833a_ 1-- HSX-BUC-0002 In Progress 11/4/13 18:40:57 11/6/13 18:40:57 0.00 / 0.00 0.0 / 0.0 |
||
|
ashrader330
Advanced Cruncher Joined: Jan 6, 2008 Post Count: 97 Status: Offline Project Badges: |
My machines have completed 9 work units. 6 were done on OpenSUSE 12.3 64 bit and all were valid. I thought one was running long but the wingman ran longer than I did so I guess there were just some long ones. That long one also had the progress bar refresh issue that I mentioned in a previous post. But other than that, they ran well.
3 were run on Windows 7 Pro 64 bit and only 1 was valid. The first failed with "Maximum elapsed time exceeded": it ran less than an hour of wall time, but the CPU time was only a few minutes (0.07 / 0.96). The other one failed with "exit code -529697949 (0xe06d7363)". It seems to have run a normal amount of time with no real big difference between CPU and wall time.
Run time: 4.2y HPF2, 6.9y FAAH, 7.9y HFCC, 20.8y HCC, 26.0y CEP2, 26.0y MCM, 2.1y UGM, 2.0y OET
WU: 4.8k HPF2, 12.3k FAAH, 12.4k HFCC, 135k HCC, 34.3k CEP2, 43.8k MCM, 4.2k UGM, 19.7k OET |
||
|
ngsmith
Cruncher Joined: Jul 21, 2005 Post Count: 48 Status: Offline Project Badges: |
Received 3 beta tasks, with one currently running.
11/5/2013 9:15:11 AM | World Community Grid | Restarting task BETA_BETA_9999984_0994a_2 using beta17 version 721 in slot 6
11/5/2013 9:18:16 AM | World Community Grid | Task BETA_BETA_9999984_0994a_2 exited with zero status but no 'finished' file
11/5/2013 9:18:16 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/5/2013 9:18:16 AM | World Community Grid | Restarting task BETA_BETA_9999984_0994a_2 using beta17 version 721 in slot 6
11/5/2013 9:21:21 AM | World Community Grid | Task BETA_BETA_9999984_0994a_2 exited with zero status but no 'finished' file
11/5/2013 9:21:21 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/5/2013 9:21:21 AM | World Community Grid | Restarting task BETA_BETA_9999984_0994a_2 using beta17 version 721 in slot 6
Same restart issue that occurred on previous beta tasks. Win 7, 64bit, 12 GB, 7.0.64 |
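For anyone wanting to quantify the restart loop from a saved message log, the repeated "exited with zero status but no 'finished' file" lines can be counted per task. This is a minimal sketch, assuming the log has been copied out as plain text lines in the format shown in the post above; the names `UNFINISHED_EXIT` and `count_unfinished_exits` are made up for this example and are not part of BOINC:

```python
import re
from collections import Counter

# Matches the client message shown in the post above and captures the task name.
UNFINISHED_EXIT = re.compile(
    r"Task (\S+) exited with zero status but no 'finished' file"
)


def count_unfinished_exits(log_lines):
    """Count, per task, how often it exited without a 'finished' file."""
    counts = Counter()
    for line in log_lines:
        match = UNFINISHED_EXIT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Lines copied from the post above.
sample_log = [
    "11/5/2013 9:15:11 AM | World Community Grid | Restarting task BETA_BETA_9999984_0994a_2 using beta17 version 721 in slot 6",
    "11/5/2013 9:18:16 AM | World Community Grid | Task BETA_BETA_9999984_0994a_2 exited with zero status but no 'finished' file",
    "11/5/2013 9:18:16 AM | World Community Grid | If this happens repeatedly you may need to reset the project.",
    "11/5/2013 9:21:21 AM | World Community Grid | Task BETA_BETA_9999984_0994a_2 exited with zero status but no 'finished' file",
]

counts = count_unfinished_exits(sample_log)
for task, n in counts.items():
    print(f"{task}: {n} exits without a 'finished' file")
```

A task whose count keeps climbing while its progress stays near 0% is likely showing the same restart issue described in this thread.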
||
|
|