Thread Status: Locked
Total posts in this thread: 296
This topic has been viewed 941849 times and has 295 replies
anhhai
Veteran Cruncher
Joined: Mar 22, 2005
Post Count: 839
Status: Offline
Re: New Beta Starting 2011/07/22

I confirm that if you suspend one beta task on a Windows XP system, it does suspend properly



It was working earlier, but it is not working now. On the same XP laptop, I suspend one of the beta WUs and it doesn't suspend. From what I can see in Task Manager, the beta WU still crunches at roughly 95%, while the other WU that BOINC says is running only gets CPU once in a while.
----------------------------------------

[Jul 30, 2011 5:14:01 AM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: New Beta Starting 2011/07/22

The current batch of BETA_BETA_ace80* WUs seem to be running OK here (XP-64 SP2 + BOINC 6.2.19 x64 + Intel C2Q; XP-64 SP1 + BOINC 6.2.19 x64 + Intel C2Q; 7Pro-64 + BOINC 6.10.58 x64 + Intel Gulftown + HT). [Edit]: All Valid.

A quick peek at the messages file of the first of these machines (Q9650 @ 3.8GHz) shows checkpoint intervals from 11-23 min. This machine runs faster than the vast majority out there, so checkpoint intervals on most devices will be longer than this. My personal preference would be for checkpointing about every 10 min, depending on the amount of data to be saved. In my case, electricity comes at 3 different rates (peak/shoulder/off-peak) and if I am at home at peak start and end times (2-8pm M-F) I shut down my 2 least efficient machines, which are the 2 Q9650s. One will hibernate OK but the other requires a full shutdown, and catching a checkpoint of the usually single CEP2 WU is a PITA.
Slightly more frequent checkpoints for the new project would be nice, and much more frequent checkpoints for CEP2 would be great (I think they're working on it).
---
I note the problems reported above with the %CPU time setting being ignored, causing overheating on laptops. Not an issue for me as I run all machines @100%+. Have any of the Windows users checked whether throttling works using Threadmaster or TThrottle instead of the BOINC setting? It would be unacceptable to have to use these in the production version, but they may get you through the betas without meltdowns. (-HTH)
threadmastergui link: http://timwells.net/content/threadmaster-gui
---
[OT] Since I've strayed slightly off-topic ... I see in the linked description of AutoDock Vina that Vina is claimed to be much more productive than vanilla AutoDock. Is FAAH still using vanilla AutoDock, and if so, would Vina be more efficient for FAAH? If so, are there plans to migrate FAAH to Vina? [/OT]
----------------------------------------
[Edit 2 times, last edit by Rickjb at Jul 31, 2011 5:53:30 AM]
[Jul 30, 2011 5:37:59 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New Beta Starting 2011/07/22

I have ThreadMasterGUI running on my duo laptop and had trained it on the BETA_BETA task to control the second process, but since temperatures from this process are relatively modest I set it to 100%. One is now running at full throttle and is at 73%, so it's a good time to track its efficiency on W7-32.

For Linux, I got 6, listed below. Worst case: 31 minutes lost on 3:08 hours, while the system was not touched except to do what it's on for: crunching.

6.12 beta13 BETA_BETA_ace80_0000000_1737_0 02:20:32 (02:07:08) 30-07-2011 02:16 30-07-2011 02:17 13 minutes diff.
6.12 beta13 BETA_BETA_ace80_0000000_3643_0 03:08:07 (02:37:42) 30-07-2011 02:13 30-07-2011 02:13 31 minutes diff. (16.5%)
6.12 beta13 BETA_BETA_ace80_0000000_3301_1 02:16:16 (02:02:00) 30-07-2011 02:12 30-07-2011 02:12 14 minutes diff.
6.12 beta13 BETA_BETA_ace80_0000000_3591_0 01:19:20 (01:12:48) 30-07-2011 00:55 30-07-2011 00:55 7 minutes diff.
6.12 beta13 BETA_BETA_ace80_0000000_0589_0 02:05:10 (01:52:06) 29-07-2011 23:35 29-07-2011 23:37 13 minutes diff.
6.12 beta13 BETA_BETA_ace80_0000000_4515_1 01:56:16 (01:40:10) 29-07-2011 22:59 29-07-2011 23:00 16 minutes diff.
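
For anyone who wants to recompute the gaps from the task list above, here is a minimal Python sketch. It assumes the first time in each row is Elapsed and the bracketed time is CPU; the helper names (`parse_hms`, `cpu_gap`) are made up for illustration and are not part of BOINC or WCG.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Convert an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def cpu_gap(elapsed: str, cpu: str):
    """Return (gap, gap as a percentage of elapsed time)."""
    elapsed_td = parse_hms(elapsed)
    cpu_td = parse_hms(cpu)
    gap = elapsed_td - cpu_td
    percent = 100.0 * gap.total_seconds() / elapsed_td.total_seconds()
    return gap, percent


# (task suffix, elapsed, cpu) copied from the task list above.
# Assumption: the bracketed value is the CPU time.
TASKS = [
    ("1737_0", "02:20:32", "02:07:08"),
    ("3643_0", "03:08:07", "02:37:42"),
    ("3301_1", "02:16:16", "02:02:00"),
    ("3591_0", "01:19:20", "01:12:48"),
    ("0589_0", "02:05:10", "01:52:06"),
    ("4515_1", "01:56:16", "01:40:10"),
]

for suffix, elapsed, cpu in TASKS:
    gap, percent = cpu_gap(elapsed, cpu)
    print(f"{suffix}: gap {gap} ({percent:.1f}% of elapsed)")
```

The results can differ by a minute or so from the rounded figures quoted in the list.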

Summary reading of the posts so far:

1. The worker task does not suspend at times when told to (OSes?)
2. The worker task does not adhere to throttle control (OSes?) - My W7-32 does adhere at the 60% setting, showing the typical jagged processor usage profile. Client 6.12.33
3. Part of the task's CPU time is being lost on Linux (see task list above)
4. When a task is suspended, it keeps running and an additional task is started on Linux. (On W7-32 the worker suspends properly)
5. On Linux, the checkpoint time is not recorded or shown incrementing until the next checkpoint, and stops after a few seconds. The same goes for the CPU column, where CPU time does not increment but rather jumps from checkpoint to checkpoint; only observed on Linux.
6. Overall progress is not consistent. Seen it jump at the end from 89% to 100%. The computing time for 1 job on W7-32 indicated, after 65%, that it would finish in 6.5 hours, but it finished in 4.56.

Missed any? Copy list and add / enhance item.

All tasks finished properly, or ran without anything being noticed, and reported.

Linux: 6 of which, 4 valid, 1 PV, 1 Inconclusive (No differences seen in logs)
W7-32: 2 of which, 1 valid, 1 IP.

Sorry techs, if this causes you to burn some midnight oil.

--//--

edit: The throttle adherence was with the 6.12.33 test client.

edit2: The inconclusive from my Linux quad turned valid. Linux now shows 6 of 6 valid. A 7th task was netted, and I will see if any of the reported problems can be replicated, but with just one, it might not be possible. 20 minutes so far between Elapsed and CPU [at 70%], i.e. the loss does appear to occur randomly.
----------------------------------------
[Edit 3 times, last edit by Former Member at Jul 30, 2011 7:46:05 AM]
[Jul 30, 2011 6:08:57 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New Beta Starting 2011/07/22

[OT] Since I've strayed slightly off-topic ... I see in the linked description of AutoDock Vina, that Vina is claimed to be much more productive than vanilla AutoDock. Is FAAH still using vanilla AutoDock, and if so, would Vina be more efficient for FAAH? If so, are there plans to migrate FAAH to Vina? [/OT]

Yes, it's a stray and yes FAAH is scheduled for an upgrade [old news in the FAAH forum], after X and Y and Z :D[/OT]

--//--

edit: And yes, the 2 tools have no trouble reining in the worker task. TMG of course needs to be instructed which one to *manage*.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 30, 2011 6:23:11 AM]
[Jul 30, 2011 6:15:56 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1406
Status: Offline
Re: New Beta Starting 2011/07/22

Summary reading of the posts so far:

1. The worker task does not suspend at times when told so (OSses?)

Tested this on W7-64: Suspends directly.
Tested this on XP32: Suspends after about 2 minutes.
Will test it on Linux64 if I get a resend. :D

All machines (incl. 2 laptops) running 100% without heating issues (ambient temperature 24C).

[ot]Inefficiency: I noticed that every minute 59 slide*-files are created in the WCG project directory. This looks to be independent of which WCG project is running and of the "Write to disk" setting. A lot of disk I/O could be saved by not doing so. I never use graphics for a screensaver or anything else. Most systems don't even have a screen connected.[/ot]
[Jul 30, 2011 7:17:54 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New Beta Starting 2011/07/22

Can replicate what Jean observed. Linux 64 11.04, kernel 2.6.38.10, bld 47, quad

1) BETA Vina worker running, 3 others taking near 100%
2) Suspend the Beta job via BOINC Manager (6.10.59), control app shows suspended, but System Monitor shows the vina worker continuing and a 5th job started.
3) [The positive part], unsuspending the Vina job in BM and suspending the 5th running job, regains BM control.
4) The properties screen of the Beta job in Local BM shows 8 minutes difference after 3:04 hours, but the CPU value is static. Checked remote in BM and BOINCTasks showing same.

Now to watch what gap appears at the end. There are 7 tasks showing completion in the stderr.txt file, and the 8th has started. The log allows a good reconstruction of where the gaps occur.

INFO: No state to restore. Start from the beginning.
[07:22:30] Number of tasks = 8
[07:22:30] Starting job 0,CPU time is 0.000000.
[07:22:30] ZINC04866126.pdbqt size = 26 7 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[07:44:34] Finished Job #0 cpu time used 1315.150000
[07:44:34] Starting job 1,CPU time is 1315.150000.
[07:44:34] ZINC04866126.pdbqt size = 26 7 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[08:06:45] Finished Job #1 cpu time used 1319.190000
[08:06:45] Starting job 2,CPU time is 2634.340000.
[08:06:45] ZINC04866126.pdbqt size = 26 7 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[08:28:40] Finished Job #2 cpu time used 1312.920000
[08:28:40] Starting job 3,CPU time is 3947.260000.
[08:28:40] ZINC04866126.pdbqt size = 26 7 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[08:50:33] Finished Job #3 cpu time used 1308.750000
[08:50:33] Starting job 4,CPU time is 5256.010000.
[08:50:33] ZINC04969997.pdbqt size = 30 8 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[09:20:22] Finished Job #4 cpu time used 1781.710000
[09:20:22] Starting job 5,CPU time is 7037.720000.
[09:20:22] ZINC04969997.pdbqt size = 30 8 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[09:50:11] Finished Job #5 cpu time used 1772.170000
[09:50:11] Starting job 6,CPU time is 8809.890000.
[09:50:11] ZINC04969997.pdbqt size = 30 8 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[10:20:07] Finished Job #6 cpu time used 1780.310000
[10:20:07] Starting job 7,CPU time is 10590.200000.
[10:20:07] ZINC04969997.pdbqt size = 30 8 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
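
The per-job gaps can be pulled out of a stderr.txt log like the one above with a short script. This is only a sketch under assumptions: it relies on the exact "Starting job" / "Finished Job" line formats shown in this post, assumes a single job never spans more than one midnight, and uses a hypothetical helper name `job_gaps`.

```python
import re
from datetime import datetime

# Patterns match the line formats shown in the log above (an assumption
# about the format; other application versions may log differently).
START = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] Starting job (\d+),")
FINISH = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] Finished Job #(\d+) cpu time used ([\d.]+)"
)


def job_gaps(log_text: str):
    """Return a list of (job, wall_seconds, cpu_seconds, gap_seconds)."""
    starts = {}
    rows = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = START.match(line)
        if match:
            starts[int(match.group(2))] = datetime.strptime(
                match.group(1), "%H:%M:%S"
            )
            continue
        match = FINISH.match(line)
        if match and int(match.group(2)) in starts:
            job = int(match.group(2))
            end = datetime.strptime(match.group(1), "%H:%M:%S")
            wall = (end - starts[job]).total_seconds()
            if wall < 0:  # log timestamps carry no date; allow one midnight
                wall += 24 * 3600
            cpu = float(match.group(3))
            rows.append((job, wall, cpu, wall - cpu))
    return rows


SAMPLE = """\
[07:22:30] Starting job 0,CPU time is 0.000000.
[07:44:34] Finished Job #0 cpu time used 1315.150000
[07:44:34] Starting job 1,CPU time is 1315.150000.
[08:06:45] Finished Job #1 cpu time used 1319.190000
"""

for job, wall, cpu, gap in job_gaps(SAMPLE):
    print(f"job {job}: wall {wall:.0f}s, cpu {cpu:.2f}s, gap {gap:.2f}s")
```

Since the log only has one-second resolution, small gaps of a few seconds per job are within the noise; it is the larger jumps that are worth looking at.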
[Jul 30, 2011 8:36:51 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New Beta Starting 2011/07/22

Yep, confirmed: these beta WUs are completely wild; once started they do what they want and ignore BOINC completely. :(

I'm dealing with this from OS (debian) level (kill -STOP/kill -CONT), so as to not fry my CPU while seeing if the task finishes successfully.
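
The kill -STOP / kill -CONT approach can also be scripted. Below is a minimal Python sketch for POSIX systems using the standard `os.kill` call with `SIGSTOP` and `SIGCONT`. The function names are hypothetical, and the demo pauses a throwaway child process rather than a real worker, whose PID you would have to look up yourself.

```python
import os
import signal
import subprocess
import sys
import time


def pause_process(pid: int) -> None:
    """Equivalent of `kill -STOP <pid>`: freeze the process."""
    os.kill(pid, signal.SIGSTOP)


def resume_process(pid: int) -> None:
    """Equivalent of `kill -CONT <pid>`: let the process run again."""
    os.kill(pid, signal.SIGCONT)


def demo() -> bool:
    """Pause and resume a throwaway child; True if it survived."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        pause_process(child.pid)
        time.sleep(0.2)  # the child uses no CPU while stopped
        resume_process(child.pid)
        return child.poll() is None  # None means it is still running
    finally:
        child.kill()
        child.wait()


if __name__ == "__main__":
    print("child survived pause/resume:", demo())
```

This works purely at OS level; whether the BOINC client accounts correctly for the time a worker spends stopped is not something this sketch addresses.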
----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 30, 2011 8:41:05 AM]
[Jul 30, 2011 8:40:29 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New Beta Starting 2011/07/22

While nobody will not doubt that the rogueness will be fixed before production [picturing real live pre-empting at switch times... don't know how to force that state easily, if it does happen then], the key issue is the failure to abide by the standard throttling control of BM... functions like "while the processor usage is less than X %" would not work either.

With all that I've done to break the tasks, they all prove robust and finish properly... but for that black hole in the time-space continuum. So did the last task described in my previous post. There was a 60 second gap in the log at the start of task 8, but at the end, with the last job not logging the CPU time, the RS page shows 2:57 for CPU versus 3:21 for Elapsed. About 24 minutes variance, and it has gone inconclusive. Now that's a test to see whether suspending/pre-empting/resuming is causing the validator to fail it.

BETA_BETA_ace80_0000000_4583_0-- 1479931 Inconclusive 7/29/11 22:55:45 7/30/11 08:54:08 2.95 74.7 / 0.0

<core_client_version>6.10.59</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[07:22:30] Number of tasks = 8
[07:22:30] Starting job 0,CPU time is 0.000000.
....
[10:20:07] Finished Job #6 cpu time used 1780.310000
[10:20:07] Starting job 7,CPU time is 10590.200000.
[10:20:07] ZINC04969997.pdbqt size = 30 8 ../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[10:53:58] Finished Job #7 cpu time used 1807.580000
10:53:58 (10615): called boinc_finish

</stderr_txt>
]]>

The number looked for is 42.

--//--

edit: strike 'not' :)
----------------------------------------
[Edit 2 times, last edit by Former Member at Jul 30, 2011 9:04:33 AM]
[Jul 30, 2011 9:02:30 AM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Status: Offline
Re: New Beta Starting 2011/07/22

These betas don't seem to like AMD processors, at least not my Phenom II X4 910e. Of 7 completed, 3 invalid, 2 valid, 2 PV.

This machine has no issues with other sciences.
----------------------------------------
[Edit 1 times, last edit by kateiacy at Jul 30, 2011 9:24:28 AM]
[Jul 30, 2011 9:23:52 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1406
Status: Offline
Re: New Beta Starting 2011/07/22

Summary reading of the posts so far:

1. The worker task does not suspend at times when told so (OSses?)

Tested this on W7-64: Suspends directly.
Tested this on XP32: Suspends after about 2 minutes.
Will test it on Linux64 if I get a resend. :D

I got that resend on my Linux64 laptop:

Suspended all Ready to start tasks, then suspended the BETA.
The BETA goes on running, including writing to disk.
E.g.:
The stdout.txt changed:

WARNING: The search space volume > 27000 Angstrom^3 (See FAQ)
1697212402

0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
*********************************************

In BOINC Manager the progress increments normally.
At 12.5% the BETA process now really suspends, because it has reached its 1st checkpoint.
[Jul 30, 2011 9:30:47 AM]