Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
can't suspend workunit!

Just downloaded my first AC@H units. Gah! I want to finish the workunits my computer had almost finished, then start on these new ones. I hit suspend for the six AC@H units I just got, and THEY WON'T SUSPEND. Even though they all show "task suspended", two of them are still crunching away. (To add insult to injury, 6 of the units I had downloaded for other projects but HAD NOT STARTED yet errored out with "Can't create shared memory: system shmat" as soon as I hit suspend on the AC@H units.)
ach1_2_34_4 and ach1_2_32_1 are the two that will not suspend.
And, oh yeah, the CPU time count is not changing, but the % complete and time to completion are.
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 20, 2007 10:22:05 PM]
[Sep 20, 2007 10:19:05 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: can't suspend workunit!

Hi sedgecom,
Better start out by telling us which version of BOINC you are running and on which OS and system, then tell us which projects you are running and your preferences, such as queue size.
Lawrence
[Sep 20, 2007 10:27:21 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: can't suspend workunit!

I only run WCG, no other projects. Right now I have 17 workunits in my queue: 6 AC@H, 7 FA@H, and 4 dddt. I have my queue size set to keep three days of workunits, so what I have in there is about right. (The AC@H units have an estimated finish of 10 hours each.) Mac OS X for my OS. Here are the messages from the last time I booted; they have the rest of the stuff you need, I hope.
Fri Sep 14 21:26:22 2007||Starting BOINC client version 5.8.16 for i686-apple-darwin
Fri Sep 14 21:26:22 2007||Libraries: libcurl/7.15.5 OpenSSL/0.9.7l zlib/1.2.3
Fri Sep 14 21:26:22 2007||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU T7600 @ 2.33GHz [x86 Family 6 Model 15 Stepping 6]
Fri Sep 14 21:26:22 2007||Memory: 2.00 GB physical, 10.97 GB virtual
Fri Sep 14 21:26:22 2007||Disk: 185.99 GB total, 10.73 GB free
Fri Sep 14 21:26:22 2007|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 173443; location: home; project prefs: home

Anything else you need?
[Sep 20, 2007 10:41:26 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: can't suspend workunit!

To add to Lawrence's info request regarding the "... system shmat" error: can you post the actual messages by copy/pasting them from BOINC, plus about 10 lines before and after the shared memory error?

If AC@H has a deadline of 5 days, then with 6 of them at an estimated 10-20 hours run time each, on top of other jobs in the queue with deadlines of around 9 or 11 days, BOINC will often work off those with the earliest deadlines first, particularly if it thinks it is in a state of panic.
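
The "earliest deadline first" idea can be sketched in a few lines of Python. This is only an illustration of the ordering principle, not BOINC's actual scheduler; the `Task` class, the function names, and the hour figures in the usage example are assumptions made up for the sketch (the 5-day and 9-to-11-day deadlines are the hypothetical figures from the paragraph above).

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours_left: float      # estimated remaining run time, in hours
    deadline_hours: float  # hours from now until the report deadline


def edf_order(tasks):
    """Return the tasks sorted so the earliest deadline comes first."""
    return sorted(tasks, key=lambda t: t.deadline_hours)


def late_tasks(tasks, cpus):
    """Crude simulation: run tasks in EDF order on `cpus` cores.

    Each task goes to whichever core frees up first. Returns the names
    of tasks whose estimated finish time falls after their deadline.
    """
    free_at = [0.0] * cpus
    late = []
    for task in edf_order(tasks):
        core = free_at.index(min(free_at))
        free_at[core] += task.hours_left
        if free_at[core] > task.deadline_hours:
            late.append(task.name)
    return late


if __name__ == "__main__":
    # Hypothetical queue: six AC@H units due in 5 days, other work due later.
    queue = [Task(f"ach_{i}", 10, 5 * 24) for i in range(6)]
    queue += [Task("faah_0", 5, 9 * 24), Task("dddt_0", 4, 11 * 24)]
    print([t.name for t in edf_order(queue)])
    print("late:", late_tasks(queue, cpus=2))
```

With an ordering like this, newly downloaded units with short deadlines jump ahead of older work with longer deadlines, which matches what sedgecom is seeing.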
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Sep 20, 2007 10:45:21 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: can't suspend workunit!

And I know that BOINC will preempt some jobs, but the two it preempted this time only had three minutes left each.
I threw in my benchmarks, since it ran them earlier today:
Thu Sep 20 07:31:05 2007||Running CPU benchmarks
Thu Sep 20 07:31:05 2007||Suspending computation - running CPU benchmarks
Thu Sep 20 07:32:06 2007||Benchmark results:
Thu Sep 20 07:32:06 2007|| Number of CPUs: 2
Thu Sep 20 07:32:06 2007|| 1887 floating point MIPS (Whetstone) per CPU
Thu Sep 20 07:32:06 2007|| 5414 integer MIPS (Dhrystone) per CPU
Thu Sep 20 07:32:07 2007||Resuming computation


Thu Sep 20 18:00:49 2007|World Community Grid|Starting faah2338_ZINC00388321_xmd01440_00_0
Thu Sep 20 18:00:49 2007|World Community Grid|Starting task faah2338_ZINC00388321_xmd01440_00_0 using faah version 541
Thu Sep 20 18:01:02 2007|World Community Grid|Starting dddt0101a0077_ZINC04382588-0000_00_1
Thu Sep 20 18:01:03 2007|World Community Grid|Starting task dddt0101a0077_ZINC04382588-0000_00_1 using dddt version 508
Thu Sep 20 18:01:09 2007|World Community Grid|Starting dddt0101a0078_ZINC02619305-0000_02_0
Thu Sep 20 18:01:09 2007|World Community Grid|Starting task dddt0101a0078_ZINC02619305-0000_02_0 using dddt version 508
Thu Sep 20 18:01:12 2007|World Community Grid|Starting dddt0101a0078_ZINC03501250-0001_02_1
Thu Sep 20 18:01:13 2007|World Community Grid|Starting task dddt0101a0078_ZINC03501250-0001_02_1 using dddt version 508
Thu Sep 20 18:01:26 2007|World Community Grid|Starting dddt0101a0080_ZINC02638537-0000_03_1
Thu Sep 20 18:01:27 2007|World Community Grid|Deferring communication for 1 min 0 sec
Thu Sep 20 18:01:27 2007|World Community Grid|Reason: Unrecoverable error for result dddt0101a0080_ZINC02638537-0000_03_1 (Can't create shared memory: system shmat)
Thu Sep 20 18:01:27 2007|World Community Grid|Starting dddt0101a0080_ZINC02871652-0001_00_1
Thu Sep 20 18:01:28 2007|World Community Grid|Deferring communication for 1 min 0 sec
Thu Sep 20 18:01:28 2007|World Community Grid|Reason: Unrecoverable error for result dddt0101a0080_ZINC02871652-0001_00_1 (Can't create shared memory: system shmat)
Thu Sep 20 18:01:28 2007|World Community Grid|Computation for task dddt0101a0080_ZINC02638537-0000_03_1 finished
Thu Sep 20 18:01:28 2007|World Community Grid|Output file dddt0101a0080_ZINC02638537-0000_03_1_0 for task dddt0101a0080_ZINC02638537-0000_03_1 absent
Thu Sep 20 18:01:28 2007|World Community Grid|Output file dddt0101a0080_ZINC02638537-0000_03_1_1 for task dddt0101a0080_ZINC02638537-0000_03_1 absent
Thu Sep 20 18:01:28 2007|World Community Grid|Starting dddt0101a0080_ZINC03216912-0000_01_1
Thu Sep 20 18:01:29 2007|World Community Grid|Deferring communication for 1 min 0 sec
Thu Sep 20 18:01:29 2007|World Community Grid|Reason: Unrecoverable error for result dddt0101a0080_ZINC03216912-0000_01_1 (Can't create shared memory: system shmat)
Thu Sep 20 18:01:29 2007|World Community Grid|Computation for task dddt0101a0080_ZINC02871652-0001_00_1 finished
Thu Sep 20 18:01:29 2007|World Community Grid|Output file dddt0101a0080_ZINC02871652-0001_00_1_0 for task dddt0101a0080_ZINC02871652-0001_00_1 absent
Thu Sep 20 18:01:29 2007|World Community Grid|Output file dddt0101a0080_ZINC02871652-0001_00_1_1 for task dddt0101a0080_ZINC02871652-0001_00_1 absent
Thu Sep 20 18:01:29 2007|World Community Grid|Starting dddt0101a0080_ZINC03216912-0000_02_0
Thu Sep 20 18:01:30 2007|World Community Grid|Deferring communication for 1 min 0 sec
Thu Sep 20 18:01:30 2007|World Community Grid|Reason: Unrecoverable error for result dddt0101a0080_ZINC03216912-0000_02_0 (Can't create shared memory: system shmat)
Thu Sep 20 18:01:30 2007|World Community Grid|Starting dddt0101a0080_ZINC03223725-0000_00_0
Thu Sep 20 18:01:31 2007|World Community Grid|Deferring communication for 1 min 9 sec
Thu Sep 20 18:01:31 2007|World Community Grid|Reason: Unrecoverable error for result dddt0101a0080_ZINC03223725-0000_00_0 (Can't create shared memory: system shmat)
Thu Sep 20 18:01:31 2007|World Community Grid|Computation for task dddt0101a0080_ZINC03216912-0000_01_1 finished
Thu Sep 20 18:01:31 2007|World Community Grid|Output file dddt0101a0080_ZINC03216912-0000_01_1_0 for task dddt0101a0080_ZINC03216912-0000_01_1 absent
Thu Sep 20 18:01:31 2007|World Community Grid|Output file dddt0101a0080_ZINC03216912-0000_01_1_1 for task dddt0101a0080_ZINC03216912-0000_01_1 absent
Thu Sep 20 18:01:31 2007|World Community Grid|Computation for task dddt0101a0080_ZINC03223725-0000_00_0 finished
Thu Sep 20 18:01:31 2007|World Community Grid|Output file dddt0101a0080_ZINC03223725-0000_00_0_0 for task dddt0101a0080_ZINC03223725-0000_00_0 absent
Thu Sep 20 18:01:31 2007|World Community Grid|Output file dddt0101a0080_ZINC03223725-0000_00_0_1 for task dddt0101a0080_ZINC03223725-0000_00_0 absent
Thu Sep 20 18:01:31 2007|World Community Grid|Starting faah2342_ZINC01695760_xmd01480_02_1
Thu Sep 20 18:01:31 2007|World Community Grid|Deferring communication for 4 min 29 sec
Thu Sep 20 18:01:31 2007|World Community Grid|Reason: Unrecoverable error for result faah2342_ZINC01695760_xmd01480_02_1 (Can't create shared memory: system shmat)
Thu Sep 20 18:01:32 2007|World Community Grid|Computation for task dddt0101a0080_ZINC03216912-0000_02_0 finished
Thu Sep 20 18:01:32 2007|World Community Grid|Output file dddt0101a0080_ZINC03216912-0000_02_0_0 for task dddt0101a0080_ZINC03216912-0000_02_0 absent
Thu Sep 20 18:01:32 2007|World Community Grid|Output file dddt0101a0080_ZINC03216912-0000_02_0_1 for task dddt0101a0080_ZINC03216912-0000_02_0 absent
Thu Sep 20 18:01:33 2007|World Community Grid|Computation for task faah2342_ZINC01695760_xmd01480_02_1 finished
Thu Sep 20 18:01:33 2007|World Community Grid|Output file faah2342_ZINC01695760_xmd01480_02_1_0 for task faah2342_ZINC01695760_xmd01480_02_1 absent
Thu Sep 20 18:01:33 2007|World Community Grid|Output file faah2342_ZINC01695760_xmd01480_02_1_1 for task faah2342_ZINC01695760_xmd01480_02_1 absent
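
For anyone sifting through a long message log like the one above, a small Python sketch can pull out just the failed results. It assumes the `timestamp|project|message` layout seen in the pasted lines; the function names are made up for this example and are not part of BOINC.

```python
import re
from collections import Counter

# Matches the "Reason:" lines in the pasted log, capturing result name and error text.
ERROR_RE = re.compile(r"Unrecoverable error for result (\S+) \((.+)\)\s*$")

# Three lines copied from the log above, used as sample input.
SAMPLE_LOG = [
    "Thu Sep 20 18:01:26 2007|World Community Grid|Starting dddt0101a0080_ZINC02638537-0000_03_1",
    "Thu Sep 20 18:01:27 2007|World Community Grid|Deferring communication for 1 min 0 sec",
    "Thu Sep 20 18:01:27 2007|World Community Grid|Reason: Unrecoverable error for result "
    "dddt0101a0080_ZINC02638537-0000_03_1 (Can't create shared memory: system shmat)",
]


def parse_line(line):
    """Split one log line into (timestamp, project, message), or None if malformed."""
    parts = line.rstrip("\n").split("|", 2)
    return tuple(parts) if len(parts) == 3 else None


def failed_results(lines):
    """Return (result_name, error_text) for every unrecoverable-error line."""
    failures = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        match = ERROR_RE.search(parsed[2])
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures


def count_by_error(lines):
    """Count how many results failed with each distinct error text."""
    return Counter(error for _, error in failed_results(lines))


if __name__ == "__main__":
    for name, error in failed_results(SAMPLE_LOG):
        print(name, "->", error)
```

Run over the full log, `count_by_error` would show whether every failure shares the same shared memory error or whether something else is mixed in.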
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 20, 2007 11:16:18 PM]
[Sep 20, 2007 11:04:11 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: can't suspend workunit!

"and I know that BOINC will preempt some jobs, but the two it preempted this time only had three minutes left each."

Yep, we know of that one... the client thinks it's finished, showing 99.9 or 100% complete in the tasks tab, when in fact there is still some mop-up housekeeping to do to prepare the result file for sending. In that situation BOINC preempts the task and goes to the next job until the time comes to finish up. Eventually they go... they really do.

Will look at your log shortly.


Okay, why it is called 'shmat' I don't know, but I've frequently been reading of Macs having shared memory issues with BOINC. Here's an article describing a permanent solution (I hope).

http://www.spy-hill.net/help/apple/SharedMemory.html

Added: Looked it up. shmat is the Linux/Unix system call that attaches a shared memory segment to a process ("shared memory attach").
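
Errors like this are often tied to the operating system's System V shared memory limits, which on Mac OS X show up as `kern.sysv.shm*` settings in `sysctl`. The Python sketch below only reads and parses those settings so they can be posted here; it does not change anything and does not reproduce the steps in the linked article. The function names are made up for this example, and the numbers in `SAMPLE_OUTPUT` are illustrative, not taken from sedgecom's machine.

```python
import re
import subprocess

# Illustrative text in the "name: value" form that sysctl prints on Mac OS X.
SAMPLE_OUTPUT = """\
kern.sysv.shmmax: 4194304
kern.sysv.shmmin: 1
kern.sysv.shmmni: 32
kern.sysv.shmseg: 8
kern.sysv.shmall: 1024
kern.sysv.semmni: 87381
"""

# Accept either "name: value" or "name=value" and keep only the shm* keys.
LIMIT_RE = re.compile(r"^(kern\.sysv\.shm\w+)\s*[:=]\s*(\d+)\s*$")


def parse_shm_limits(text):
    """Return {setting_name: integer_value} for the shared memory settings."""
    limits = {}
    for line in text.splitlines():
        match = LIMIT_RE.match(line.strip())
        if match:
            limits[match.group(1)] = int(match.group(2))
    return limits


def read_shm_limits():
    """Ask the running system for its limits (Mac OS X only; needs `sysctl`)."""
    result = subprocess.run(
        ["sysctl", "kern.sysv"], capture_output=True, text=True, check=True
    )
    return parse_shm_limits(result.stdout)


if __name__ == "__main__":
    for name, value in sorted(parse_shm_limits(SAMPLE_OUTPUT).items()):
        print(f"{name} = {value}")
```

Posting the real output of `read_shm_limits()` alongside the error messages would make it easier to tell whether the limits are the culprit.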
----------------------------------------
[Edit 3 times, last edit by Sekerob at Sep 21, 2007 7:16:18 AM]
[Sep 21, 2007 7:02:01 AM]