| World Community Grid Forums
Thread Status: Active Total posts in this thread: 264
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline Project Badges:
|
TimAndHedy,
Look at this thread to see info to supply for a stuck workunit http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,35845 if you still have it running. Thanks, armstrdj |
||
|
|
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline Project Badges:
|
breathesgelatin,
Check the website for the results that are erroring out and post the result log from one of them. If you have not done this before, navigate to My Contribution -> Result Status and filter on result status = error. Click on the status to show the result log. Thanks, armstrdj |
||
|
|
BobCat13
Senior Cruncher Joined: Oct 29, 2005 Post Count: 295 Status: Offline Project Badges:
|
Linux Mint 15 64-bit
Result Name: MCM1_ 0000176_ 2595_ 0--
<core_client_version>7.2.28</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Commandline = ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.26_x86_64-pc-linux-gnu -SettingsFile MCM1_0000176_2595.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Initializing
wcg_learn_limit = 500000
Running
[14:03:04]: Computing pass 0
*** glibc detected *** ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.26_x86_64-pc-linux-gnu: munmap_chunk(): invalid pointer: 0x000000000474bee0 ***
======= Backtrace: =========
[0x5434c2] [0x483ccb] [0x483c37] [0x482b73] [0x42fbe9] [0x44294d] [0x442fc3] [0x443080] [0x42585c] [0x51712b] [0x400449]
======= Memory map: ========
00400000-00648000 r-xp 00000000 08:03 1310877 /boinc/data/projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.26_x86_64-pc-linux-gnu
00848000-0084b000 rw-p 00248000 08:03 1310877 /boinc/data/projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.26_x86_64-pc-linux-gnu
0084b000-00883000 rw-p 00000000 00:00 0
0114e000-047ad000 rw-p 00000000 00:00 0 [heap]
7f00710ae000-7f00710af000 rw-p 00000000 00:00 0
7f00710af000-7f00710b0000 rw-s 00000000 08:03 1310984 /boinc/data/slots/3/boinc_mcm1_3
7f00710b0000-7f00710b1000 ---p 00000000 00:00 0
7f00710b1000-7f00710b8000 rw-p 00000000 00:00 0 [stack:12021]
7f00710b8000-7f00710ba000 rw-s 00000000 08:03 1310937 /boinc/data/slots/3/boinc_mmap_file
7fff2dd93000-7fff2ddb4000 rw-p 00000000 00:00 0 [stack]
7fff2ddfe000-7fff2de00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
SIGABRT: abort called
Stack trace (16 frames):
[0x4987cd] [0x47fc80] [0x47fb4b] [0x51fc75] [0x53d9a7] [0x5434c2] [0x483ccb] [0x483c37] [0x482b73] [0x42fbe9] [0x44294d] [0x442fc3] [0x443080] [0x42585c] [0x51712b] [0x400449]
Exiting...
</stderr_txt>
]]> |
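For anyone puzzled by the three numbers in "process exited with code 193 (0xc1, -63)": they appear to be the same byte written three ways (decimal, hexadecimal, and signed 8-bit). A minimal sketch of that arithmetic follows; the helper name `as_signed_byte` is made up for illustration and is not part of BOINC.

```python
def as_signed_byte(value: int) -> int:
    """Reinterpret an unsigned 8-bit value (0..255) as a signed 8-bit value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value in 0..255")
    # Values with the top bit set wrap around to the negative range.
    return value - 0x100 if value >= 0x80 else value


exit_code = 193  # taken from the <message> line of the result log
print(exit_code, hex(exit_code), as_signed_byte(exit_code))
```

This only shows how the three figures relate to each other; it says nothing about why the process aborted.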
||
|
|
TimAndHedy
Senior Cruncher Joined: Jan 27, 2009 Post Count: 267 Status: Offline Project Badges:
|
TimAndHedy, Look at this thread to see info to supply for a stuck workunit http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,35845 if you still have it running. Thanks, armstrdj

I rebooted and it stayed at 100% complete but restarted the processing time back to 0. I aborted the unit. It would be nice to have this purged from the system. It is wasting a lot of processing time: something like 53 hours in my case, and who knows how much on the systems that had it for 10 days.
[Edit 1 times, last edit by TimAndHedy at Nov 22, 2013 4:26:27 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi TimAndHedy,
Yes, I would also like an occasional explanation of these problems. They are most common at the start of a project. As the project scientists discover what causes these long-running errors, the numbers are usually reduced, but without any real mention on the board. Lawrence |
||
|
|
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges:
|
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.26_windows_intelx86 -SettingsFile MCM1_0000112_5309.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Initializing
wcg_learn_limit = 500000
Running
Result.out = 3992966.000000
Run complete, CPU time: 19527.046875
20:19:49 (4808): called boinc_finish
Cheers |
||
|
|
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges:
|
I am very suspicious that the WUs are running far longer than is being reported, perhaps as much as 10 times longer. It is hard to be sure since the start time is not logged in stderr.txt. I've got some jobs that I think ran for over 48 hours, but have only logged a few. Is the CPU time being truncated? I've got 8 cores running 24x7, but I am not seeing 8 days of work being reported.
----------------------------------------
11/21/2013   0:003:11:09:03   10,264   20
11/20/2013   0:007:01:33:09   21,184   35
11/19/2013   0:008:16:35:48   23,457   61
11/18/2013   0:008:17:02:00   19,716   73
11/17/2013   0:005:23:28:57   14,527   40
11/16/2013   0:004:16:03:17   11,983   28
11/15/2013   0:008:23:41:41   23,589   53
11/14/2013   0:008:17:19:52   23,448   41
11/13/2013   0:006:10:40:09   17,362   30
11/12/2013   0:006:20:12:14   17,122   33
11/11/2013   0:008:04:21:37   19,228   38
11/10/2013   0:006:00:55:42   16,790   31
11/09/2013   0:005:02:35:24   12,500   36
11/08/2013   0:006:06:58:28   18,588   45
Cheers
[Edit - highlight time]
[Edit 1 times, last edit by NixChix at Nov 22, 2013 5:36:40 PM] |
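To check the "8 cores running 24x7" expectation against the daily figures in the post above, the run-time strings can be converted to hours and compared with 8 x 24 = 192 core-hours per day. This is only a rough sketch: it assumes the second column is formatted as years:days:hours:minutes:seconds and that a year counts as 365 days; the function name and the sample of four rows are chosen for illustration.

```python
def runtime_to_hours(stamp: str) -> float:
    """Convert a 'y:ddd:hh:mm:ss' run-time string to hours.

    Assumption: the fields are years, days, hours, minutes, seconds,
    and a year is treated as 365 days.
    """
    years, days, hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return (years * 365 + days) * 24 + hours + minutes / 60 + seconds / 3600


CORES = 8                    # from the post: 8 cores running 24x7
FULL_DAY_HOURS = CORES * 24  # 192 core-hours available per calendar day

# A few rows copied from the post (date, reported run time).
reported = {
    "11/21/2013": "0:003:11:09:03",
    "11/19/2013": "0:008:16:35:48",
    "11/16/2013": "0:004:16:03:17",
    "11/15/2013": "0:008:23:41:41",
}

for date, stamp in reported.items():
    hours = runtime_to_hours(stamp)
    share = hours / FULL_DAY_HOURS
    print(f"{date}: {hours:7.2f} h reported = {share:.0%} of {FULL_DAY_HOURS} core-hours")
```

Running the same conversion over all fourteen rows gives a quick picture of which days fall short of a full 192 core-hours.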
||
|
|
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges:
|
Are any of you that *know* that you have had jobs running for more than 24 hours seeing that reflected in the server results?
----------------------------------------
Also, would someone who has had one please post your results log? I would like to compare it to some of mine that I think have been running long. Cheers |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I haven't seen this described before. I have two WUs running where the estimated time to completion is increasing significantly. The previous four WUs on this machine completed normally.
Properties of task MCM1_0000215_5002_0
Application: Mapping Cancer Markers 7.26
Workunit name: MCM1_0000215_5002
State: Running
Received: Fri 22 Nov 2013 10:53:43 AM CST
Report deadline: Mon 02 Dec 2013 10:53:41 AM CST
Estimated computation size: 39067 GFLOPS
CPU time at last checkpoint: 05:50:22
CPU time: 06:00:02
Elapsed time: 06:00:22
Estimated time remaining: 10:04:43 <---- This was 06:46:34 when it started.
Fraction done: 27.560 %
Virtual memory size: 82.72 MB
Working set size: 41.02 MB
Directory: slots/1
Process ID: 10709

The other job report is similar. I have two more queued up. So far I have suspended both jobs and restarted them. The estimated time to completion is still going up. |
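As a sanity check on the numbers in the post above, a plain linear extrapolation from "Elapsed time" and "Fraction done" gives a rough idea of where the estimate may be heading. This is a sketch only: it assumes progress is linear in time, which may not hold for this application, and it is not necessarily how the BOINC client computes "Estimated time remaining". The helper names are made up for illustration.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert 'hh:mm:ss' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: float) -> str:
    """Format a number of seconds as 'hh:mm:ss' (fractions of a second dropped)."""
    whole = int(total)
    return f"{whole // 3600:02d}:{whole % 3600 // 60:02d}:{whole % 60:02d}"


elapsed = hms_to_seconds("06:00:22")  # "Elapsed time" from the task properties
fraction_done = 27.560 / 100          # "Fraction done" from the task properties

# Linear assumption: if 27.56% took `elapsed` seconds, 100% takes elapsed / fraction.
projected_total = elapsed / fraction_done
remaining = projected_total - elapsed

print("projected total:    ", seconds_to_hms(projected_total))
print("projected remaining:", seconds_to_hms(remaining))
```

If the projected remaining time comes out well above the 10:04:43 shown by the client, that would be consistent with the displayed estimate continuing to climb as the task runs.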
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
There is something I have not seen before:
----------------------------------------
MCM1_ 0000122_ 9493_ 4--   726   Too Late   11/22/13 10:29:37   11/23/13 10:46:42    5.22   111.7 / 0.0
MCM1_ 0000122_ 9493_ 3--   726   Too Late   11/21/13 02:09:56   11/22/13 10:29:09   18.94   109.9 / 0.0
MCM1_ 0000122_ 9493_ 2--   726   Too Late   11/20/13 05:37:02   11/21/13 00:43:48    4.48    85.3 / 0.0   <Mine
MCM1_ 0000122_ 9493_ 1--   -     Detached   11/20/13 03:47:02   11/20/13 05:36:35    0.00     0.0 / 0.0
MCM1_ 0000122_ 9493_ 0--   726   Too Late   11/20/13 02:34:50   11/20/13 08:25:39    3.04   111.7 / 0.0

All of the items except the detached item are marked "Too Late." Will this unit be reissued or is it just a total dud? I actually have two of these units so this is not the only one like this. Cheers
Sgt. Joe
----------------------------------------
*Minnesota Crunchers*
[Edit 1 times, last edit by Sgt.Joe at Nov 23, 2013 5:03:07 PM] |
||
|
|
|