| World Community Grid Forums
Thread Status: Locked Total posts in this thread: 296
anhhai
Veteran Cruncher Joined: Mar 22, 2005 Post Count: 839 Status: Offline Project Badges:
|
I can confirm the suspend problem on a Windows XP system: suspending a single beta task worked earlier, but it is not working now. On the same XP laptop, I suspend one of the beta WUs and it doesn't suspend. From what I can see in Task Manager, the beta WU still crunches at roughly 95% CPU, while the other WU that BOINC says is running only gets CPU once in a while.
||
|
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline Project Badges:
|
The current batch of BETA_BETA_ace80* WUs seems to be running OK here (XP-64 SP2 + BOINC 6.2.19 x64 + Intel C2Q; XP-64 SP1 + BOINC 6.2.19 x64 + Intel C2Q; 7Pro-64 + BOINC 6.10.58 x64 + Intel Gulftown + HT). [Edit]: All Valid.
A quick peek at the messages file of the first of these machines (Q9650 @ 3.8GHz) shows checkpoint intervals of 11-23 min. This machine runs faster than the vast majority out there, so checkpoint intervals on most devices will be longer than this. My personal preference would be for checkpointing about every 10 min, depending on the amount of data to be saved. In my case, electricity comes at 3 different rates (peak/shoulder/off-peak), and if I am at home at peak start and end times (2-8pm M-F) I shut down my 2 least efficient machines, which are the 2 Q9650s. One will hibernate OK but the other requires a full shutdown, and catching a checkpoint of the usually single CEP2 WU is a PITA. Slightly more frequent checkpoints for the new project would be nice, and much more frequent checkpoints for CEP2 would be great (I think they're working on it).
---
I note the problems reported above with the %CPU time setting being ignored, causing overheating on laptops. Not an issue for me, as I run all machines @100%+. Have any of the Windows users checked whether throttling works using ThreadMaster or TThrottle instead of the BOINC setting? It would be unacceptable to have to use these in the production version, but they may get you thru the betas without meltdowns. (-HTH)
threadmastergui link: http://timwells.net/content/threadmaster-gui
---
[OT] Since I've strayed slightly off-topic ... I see in the linked description of AutoDock Vina that Vina is claimed to be much more productive than vanilla AutoDock. Is FAAH still using vanilla AutoDock, and if so, would Vina be more efficient for FAAH? If so, are there plans to migrate FAAH to Vina? [/OT]
[Edit 2 times, last edit by Rickjb at Jul 31, 2011 5:53:30 AM]
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Have ThreadMasterGUI running on my duo laptop and had trained it on the BETA_BETA to control the second process, but since temps from this process are relatively modest I set it to 100%. Now one is running full throttle and 73%, so it is a good time to track its efficiency on W7-32.
For Linux, got 6, listed below. Worst case is 31 minutes on 3:08 hours, while the system was not touched except to do what it's on for: crunching.
6.12 beta13 BETA_BETA_ace80_0000000_1737_0 02:20:32 (02:07:08) 30-07-2011 02:16 30-07-2011 02:17 13 minutes diff.
6.12 beta13 BETA_BETA_ace80_0000000_3643_0 03:08:07 (02:37:42) 30-07-2011 02:13 30-07-2011 02:13 31 minutes diff. (16.5%)
6.12 beta13 BETA_BETA_ace80_0000000_3301_1 02:16:16 (02:02:00) 30-07-2011 02:12 30-07-2011 02:12 14 minutes diff.
6.12 beta13 BETA_BETA_ace80_0000000_3591_0 01:19:20 (01:12:48) 30-07-2011 00:55 30-07-2011 00:55 7 minutes diff.
6.12 beta13 BETA_BETA_ace80_0000000_0589_0 02:05:10 (01:52:06) 29-07-2011 23:35 29-07-2011 23:37 13 minutes diff.
6.12 beta13 BETA_BETA_ace80_0000000_4515_1 01:56:16 (01:40:10) 29-07-2011 22:59 29-07-2011 23:00 16 minutes diff.
Summary reading of the posts so far:
1. The worker task does not always suspend when told to (OSes?)
2. The worker task does not adhere to throttle control (OSes?) - My W7-32 does adhere at the 60% setting, showing the typical jagged processor usage profile. Client 6.12.33
3. Part of the task's CPU time is being lost on Linux (see task list above)
4. When a task is suspended on Linux, it keeps running and an additional task is started. (On W7-32 the worker suspends properly)
5. On Linux, the checkpoint time is not recorded or shown to increment until the next checkpoint, and stops after a few seconds. Same for the CPU column, where CPU time does not increment but jumps from checkpoint to checkpoint; only observed on Linux.
6. Overall progress is not consistent. Seen it jump at the end from 89% to 100%. Computing time for 1 job on W7-32, indicated after 65%, was to finish in 6.5 hours, but it finished in 4.56.
Missed any? Copy the list and add / enhance an item.
All tasks did finish properly or run without noticing, and reported.
Linux: 6, of which 4 valid, 1 PV, 1 Inconclusive (no differences seen in logs)
W7-32: 2, of which 1 valid, 1 IP.
Sorry techs, if this causes you to burn some midnight oil.
--//--
edit: The throttle adherence was with the 6.12.33 test client.
edit2: The inconclusive from my Linux quad turned valid. Now showing Linux as 6 of 6 valid. A 7th task was netted and I will see if any of the reported problems can be replicated, but with just one, it might not. 20 minutes so far between Elapsed and CPU [at 70%], i.e. the loss does appear to occur randomly.
[Edit 3 times, last edit by Former Member at Jul 30, 2011 7:46:05 AM]
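For anyone who wants to tabulate the lost time from the task list above, here is a minimal Python sketch. It assumes the first time in each row is elapsed (wall-clock) time and the bracketed one is CPU time; `to_seconds` and `lost_time` are hypothetical helper names, not anything from BOINC.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def lost_time(elapsed: str, cpu: str) -> tuple[int, float]:
    """Return (seconds lost, percent of elapsed time) for one task row."""
    elapsed_s = to_seconds(elapsed)
    gap = elapsed_s - to_seconds(cpu)
    percent = 100.0 * gap / elapsed_s if elapsed_s else 0.0
    return gap, percent


# Worst case from the list above: 03:08:07 elapsed, 02:37:42 CPU.
gap, percent = lost_time("03:08:07", "02:37:42")
print(f"{gap // 60} min {gap % 60} s lost ({percent:.1f}% of elapsed)")
```

Computed to the second, the worst case comes out at 30 min 25 s, about 16.2% of elapsed time, which is slightly below the figures quoted above because those were rounded to whole minutes.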
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
[OT] Since I've strayed slightly off-topic ... I see in the linked description of AutoDock Vina, that Vina is claimed to be much more productive than vanilla AutoDock. Is FAAH still using vanilla AutoDock, and if so, would Vina be more efficient for FAAH? If so, are there plans to migrate FAAH to Vina? [/OT]
Yes, it's a stray, and yes, FAAH is scheduled for an upgrade [old news in the FAAH forum], after X and Y and Z :D[/OT]
--//--
edit: And yes, the 2 tools have no trouble reining in the worker task; TMG of course needs to be instructed which one to *manage*.
[Edit 1 times, last edit by Former Member at Jul 30, 2011 6:23:11 AM]
||
|
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1406 Status: Offline Project Badges:
|
Summary reading of the posts so far: 1. The worker task does not suspend at times when told so (OSses?)
Tested this on W7-64: Suspends directly.
Tested this on XP32: Suspends after about 2 minutes.
Will test it on Linux64 if I get a resend.
All machines (incl. 2 laptops) running 100% without heating issues (ambient temperature 24C).
[ot]Inefficiency: Noticed that every minute 59 slide* files are created in the WCG project directory. This looks to be independent of which WCG project is running and of the "Write to disk" setting. Not doing so could save a lot of disk I/O. I never use graphics for a screensaver or whatever. Most systems don't even have a screen connected.[/ot]
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Can replicate what Jean observed. Linux 64 11.04, kernel 2.6.38.10, bld 47, quad
1) BETA Vina worker running, 3 others taking near 100%
2) Suspend the Beta job via BOINC Manager (6.10.59): the control app shows suspended, but System Monitor shows the Vina worker continuing and a 5th job started.
3) [The positive part] Unsuspending the Vina job in BM and suspending the 5th running job regains BM control.
4) The properties screen of the Beta job in local BM shows 8 minutes difference after 3:04 hours, but the CPU value is static. Checked remotely in BM and BOINCTasks, which show the same.
Now to watch what gap happens at the end. There are 7 tasks showing completion in the stderr.txt file, and the 8th has started. The log allows a good reconstruction of where the gaps sit.
INFO: No state to restore. Start from the beginning.
[07:22:30] Number of tasks = 8
[07:22:30] Starting job 0,CPU time is 0.000000.
[07:22:30] ZINC04866126.pdbqt size = 26 7 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[07:44:34] Finished Job #0 cpu time used 1315.150000
[07:44:34] Starting job 1,CPU time is 1315.150000.
[07:44:34] ZINC04866126.pdbqt size = 26 7 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[08:06:45] Finished Job #1 cpu time used 1319.190000
[08:06:45] Starting job 2,CPU time is 2634.340000.
[08:06:45] ZINC04866126.pdbqt size = 26 7 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[08:28:40] Finished Job #2 cpu time used 1312.920000
[08:28:40] Starting job 3,CPU time is 3947.260000.
[08:28:40] ZINC04866126.pdbqt size = 26 7 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[08:50:33] Finished Job #3 cpu time used 1308.750000
[08:50:33] Starting job 4,CPU time is 5256.010000.
[08:50:33] ZINC04969997.pdbqt size = 30 8 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[09:20:22] Finished Job #4 cpu time used 1781.710000
[09:20:22] Starting job 5,CPU time is 7037.720000.
[09:20:22] ZINC04969997.pdbqt size = 30 8 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[09:50:11] Finished Job #5 cpu time used 1772.170000
[09:50:11] Starting job 6,CPU time is 8809.890000.
[09:50:11] ZINC04969997.pdbqt size = 30 8 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[10:20:07] Finished Job #6 cpu time used 1780.310000
[10:20:07] Starting job 7,CPU time is 10590.200000.
[10:20:07] ZINC04969997.pdbqt size = 30 8 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
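To reconstruct where the gaps sit, the log above can be run through a small script. This is only a sketch in Python, assuming the "Starting job" and "Finished Job" line formats are exactly as pasted above; `job_gaps` is a made-up helper name, and the timestamps are taken to be wall-clock times within a 24-hour window.

```python
import re

# Patterns for the two line types seen in the stderr log above.
START_RE = re.compile(r"\[(\d\d:\d\d:\d\d)\] Starting job (\d+)")
FINISH_RE = re.compile(
    r"\[(\d\d:\d\d:\d\d)\] Finished Job #(\d+) cpu time used (\d+\.\d+)"
)


def _seconds(hms):
    """Convert HH:MM:SS to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_gaps(log_text):
    """Return (job, wall_seconds, cpu_seconds, gap_seconds) per finished job."""
    starts = {int(job): _seconds(t) for t, job in START_RE.findall(log_text)}
    rows = []
    for t, job, cpu in FINISH_RE.findall(log_text):
        job = int(job)
        if job not in starts:
            continue  # finish line without a matching start line
        # Modulo handles a job that runs across midnight.
        wall = (_seconds(t) - starts[job]) % 86400
        rows.append((job, wall, float(cpu), wall - float(cpu)))
    return rows
```

Calling `job_gaps(open("stderr.txt").read())` in the task's slot directory would list, per job, how much wall-clock time was not accounted for as CPU time. For job 0 above that is 1324 s of wall time against 1315.15 s of CPU time.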
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yep, confirmed: these beta WUs are completely wild; once started they do what they want and ignore BOINC completely. I'm dealing with this at the OS (Debian) level (kill -STOP / kill -CONT), so as not to fry my CPU while seeing if the task finishes successfully. [Edit 1 times, last edit by Former Member at Jul 30, 2011 8:41:05 AM]
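The kill -STOP / kill -CONT workaround mentioned above can also be scripted. A minimal Python sketch for Linux and other POSIX systems follows; the function names are made up here, and the PID of the worker has to be looked up separately (for example with ps or top).

```python
import os
import signal


def pause_process(pid: int) -> None:
    """Freeze a process, equivalent to `kill -STOP <pid>`."""
    os.kill(pid, signal.SIGSTOP)


def resume_process(pid: int) -> None:
    """Let a frozen process continue, equivalent to `kill -CONT <pid>`."""
    os.kill(pid, signal.SIGCONT)
```

SIGSTOP cannot be caught or ignored by the target process, which is why this works even when the worker does not react to a suspend request from BOINC. The script needs to run as the same user that owns the worker process, or as root.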
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
While nobody will
With all that I've done to break the tasks, they all prove robust and finish properly... but for that black hole in the time-space continuum. So did the last task described in my previous post. There was a 60 second gap in the log at the start of task 8, but at the end, last job not logging the CPU time, the RS page shows 2:57 versus 3:21 for Elapsed. About 24 minutes variance, and it has gone inconclusive. Now that's a test to see if suspending/pre-empting/resuming control is causing the validator to fail it.
BETA_BETA_ace80_0000000_4583_0-- 1479931 Inconclusive 7/29/11 22:55:45 7/30/11 08:54:08 2.95 74.7 / 0.0
<core_client_version>6.10.59</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[07:22:30] Number of tasks = 8
[07:22:30] Starting job 0,CPU time is 0.000000.
....
[10:20:07] Finished Job #6 cpu time used 1780.310000
[10:20:07] Starting job 7,CPU time is 10590.200000.
[10:20:07] ZINC04969997.pdbqt size = 30 8 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[10:53:58] Finished Job #7 cpu time used 1807.580000
10:53:58 (10615): called boinc_finish
</stderr_txt>
]]>
The number looked for is 42.
--//--
edit: strike 'not' :)
[Edit 2 times, last edit by Former Member at Jul 30, 2011 9:04:33 AM]
||
|
|
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline Project Badges:
|
These betas don't seem to like AMD processors, at least not my Phenom II X4 910e. Of 7 completed: 3 invalid, 2 valid, 2 PV.
This machine has no issues with other sciences. [Edit 1 times, last edit by kateiacy at Jul 30, 2011 9:24:28 AM]
||
|
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1406 Status: Offline Project Badges:
|
Summary reading of the posts so far: 1. The worker task does not suspend at times when told so (OSses?) Tested this on W7-64: Suspends directly. Tested this on XP32: Suspends after about 2 minutes. Will test it on Linux64 if I get a resend.
I got that resend on my Linux64 laptop:
Suspended all Ready to start tasks, then suspended the BETA. The BETA goes on running, including writing to disk. E.g. the stdout.txt changed:
WARNING: The search space volume > 27000 Angstrom^3 (See FAQ)
In BOINC Manager the progress increments normally. At 12.5% the BETA process now really suspends, because it has reached its first checkpoint.
||
|
|
|