World Community Grid Forums
Category: Beta Testing
Forum: Beta Test Support Forum
Thread: New Beta Starting 2011/07/22
Thread Status: Locked
Total posts in this thread: 296
yoro42
Ace Cruncher United States Joined: Feb 19, 2011 Post Count: 8976 Status: Offline
No problems encountered (other than the estimated time to completion being higher than the actual elapsed time) using the following:
OS: Windows 7 Ultimate, 64-bit
BOINC Manager: 6.10.58 (will upgrade to 6.12.33 after this post)
CPU: Dell XPS 9100, i7 X980 3.33 GHz
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1316 Status: Offline
Quote: Haven't you forgotten the CPU time of the last job (job #7 for an ace80 beta WU)? When I compare the CPU times shown in the Result Status page against what is reported in the Result Log of each WU, it matches the CPU time at the start of the last job, not that time plus the CPU time of the last job as it should.

Bonjour Jean,
If the difference between elapsed and CPU time were caused by one job not being counted in the total, I would not see a steadily increasing gap between elapsed and CPU time across all 8 jobs.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I was able to observe the situation after the first checkpoint of a WU which had been running at 100% when it shouldn't have been. Immediately upon continuing from the first checkpoint, everything was normal. The WU was only using the configured amount of CPU and was reporting current CPU usage constantly, as it should. I believe there was only one wcg_beta13_vina_6.12_i686-pc-linux-gnu process during that time. Then, it spawned 2 sub-processes, and all control/reporting went down the drain.
Quote: My laptop is also running 100% of the time now due to the beta, but I don't think you understand. If you just unplug the power (and let it run off the battery), the system will cool down. BOINC will still use 100% of the CPU, but the CPU speed will just slow down (the BIOS reduces power to the CPU). Unplug it for 5 minutes and you will see the difference in temperature.

anhhai, this is extremely unhelpful. I'm roughly 15000 km from my affected laptop at present. Even if I weren't, I'm not going to sit and unplug/plug in every 10 minutes. Totally unacceptable "solution" (even if it would work for my laptop, which it wouldn't anyhow). Forcing the process to stop and restart is the only way to deal with it safely, apart from aborting it.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Quote: I'm roughly 15000 km from my affected laptop at present.

The distance doesn't matter, I think. I am 150 cm away from my affected laptop, and have had to restart it 3 times now because it switched itself off to protect against overheating (luckily it does that). This may be acceptable for beta work units, but I do not think it's acceptable for production WUs. (Yes, I know that this laptop is at its limits, but it still protects itself; otherwise I wouldn't allow it to run beta units!)
Edit: on another laptop (IBM T43, Intel Pentium M 2.13 GHz, XP SP3) a beta WU has just started. This laptop is my internet computer (Firefox 5 and Opera 11 both running at the same time, but nothing else except BOINC). BOINC is set to use up to 100% CPU, and I am really impressed by the response (waiting) time... it can take up to 10 seconds before it reacts to typing or a mouse click. Never had that before (no thermal problems, though)! :-(
[Edit 1 times, last edit by Former Member at Jul 31, 2011 12:14:34 AM]
anhhai
Veteran Cruncher Joined: Mar 22, 2005 Post Count: 839 Status: Offline
Quote: anhhai, this is extremely unhelpful. I'm roughly 15000 km from my affected laptop at present. Even if I weren't, I'm not going to sit and unplug/plug in every 10 minutes. Totally unacceptable "solution" (even if it would work for my laptop, which it wouldn't anyhow). Forcing the process to stop and restart is the only way to deal with it safely, apart from aborting it.

I agree it is an unacceptable solution; I just wanted to give that information to people who are using their laptops and have betas running. Aborting a beta that has been running for a while is a waste of resources. Don't worry, I am sure uplinger and the others will fix the problem.
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline
Quote: Quote: anhhai, this is extremely unhelpful. I'm roughly 15000 km from my affected laptop at present. Even if I weren't, I'm not going to sit and unplug/plug in every 10 minutes. Totally unacceptable "solution" (even if it would work for my laptop, which it wouldn't anyhow). Forcing the process to stop and restart is the only way to deal with it safely, apart from aborting it.
I agree it is an unacceptable solution; I just wanted to give that information to people who are using their laptops and have betas running. Aborting a beta that has been running for a while is a waste of resources. Don't worry, I am sure uplinger and the others will fix the problem.

anhhai, the comment you quoted isn't from me, but I was the one to whom you first made the suggestion. I realized belatedly that you were giving me a way to save my betas (an alternative to aborting them). So thank you! I tried your idea on my one remaining laptop beta this evening (I had aborted the others, but kept one and left its machine shut off while I was out all day). Unfortunately, even running on battery power didn't reduce the CPU usage or the temperature. :( These betas really did "run wild" under Linux.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Apart from the control, CPU overheating and CPU timing issues under Linux, there seems (from my very small sample) to be a significantly high rate of inconclusive/invalid results. (Maybe a computational difference between machine types?)
Of the 12 ace80 betas I've received, 58% have involved 3 machines because the initial 2-machine result was inconclusive. (One of the invalid results was mine, from a machine that was simultaneously running another beta on another core; that other beta was marked valid. None of the logs have shown any sign of errors either.)
[edited to update stats on the increasing number of inconclusive results, including repair jobs]
[Edit 2 times, last edit by Former Member at Jul 31, 2011 8:04:36 PM]
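The escalation described here (the first two results disagree, the WU is marked inconclusive, and a third copy goes out) can be sketched as a simple quorum check. This is a hypothetical, simplified illustration, not WCG's or BOINC's actual validator: the single floating-point score per result, the relative tolerance, the quorum of 2, and the names `agree` and `validate` are all assumptions. Under those assumptions it also illustrates how a tight tolerance could turn small numerical differences between machine types into an inconclusive pair.

```python
# Hypothetical, simplified quorum validator for illustration only.
# Assumptions (not WCG's real validator): each result is reduced to one
# float score, results "match" within a relative tolerance, quorum is 2.
REL_TOL = 1e-3  # assumed relative tolerance


def agree(a, b, rel_tol=REL_TOL):
    """True if two scores match within the relative tolerance."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b), 1.0)


def validate(results, quorum=2):
    """Return ("valid", indices of matching results) once `quorum` results
    agree with each other, otherwise ("inconclusive", []) meaning another
    replica has to be sent out."""
    for r in results:
        matching = [j for j, s in enumerate(results) if agree(r, s)]
        if len(matching) >= quorum:
            return "valid", matching
    return "inconclusive", []


# Hypothetical scores: the first two replicas disagree slightly.
replicas = [-7.312, -7.450]
print(validate(replicas))

# A third machine returns a result close to the first one.
replicas.append(-7.3121)
status, matching = validate(replicas)
invalid = [i for i in range(len(replicas)) if i not in matching]
print(status, matching, "invalid:", invalid)
```

With these made-up numbers the first pair comes back inconclusive; once the third result arrives, results 0 and 2 form the quorum and result 1 is the one that would be marked invalid.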
darth_vader
Veteran Cruncher A galaxy far, far away... Joined: Jul 13, 2005 Post Count: 514 Status: Offline
The over-use of CPU is not limited to Linux. On my old creaky laptop, the beta is running at 100% CPU even though it was supposed to be restricted to 75%. This particular system is:
- 1.3 GHz Pentium M
- 1GB
- WinXP
- BOINC 6.2.28
It's also taking a very long time to run. 90% done at just over 24 hours. I'm curious to see if the result will be valid or not.
- D
Edit: It completed after 39+ hours and is valid:
BETA_BETA_ace80_0000000_2459_0-- 612 Valid 7/29/11 22:47:37 7/31/11 18:25:16 39.24 122.1 / 96.6 <-- Mine
BETA_BETA_ace80_0000000_2459_1-- 612 Valid 7/29/11 22:47:35 7/30/11 01:59:44 2.99 71.1 / 96.6
There does not seem to be anything unusual in the WU's messages; it just took a long time.
Result Name: BETA_BETA_ace80_0000000_2459_0--
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[15:48:34] Number of tasks = 8
[15:48:34] Starting job 0,CPU time is 0.000000.
[15:48:35] ZINC04801860.pdbqt size = 23 5
../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[19:35:41] Finished Job #0 cpu time used 12783.802205
[19:35:41] Starting job 1,CPU time is 12783.802205.
[19:35:41] ZINC04801860.pdbqt size = 23 5
../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[23:16:44] Finished Job #1 cpu time used 12750.894886
[23:16:44] Starting job 2,CPU time is 25534.697091.
[23:16:44] ZINC04801860.pdbqt size = 23 5
../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[02:56:01] Finished Job #2 cpu time used 12800.175749
[02:56:01] Starting job 3,CPU time is 38334.872840.
[02:56:01] ZINC04801860.pdbqt size = 23 5
../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[06:34:42] Finished Job #3 cpu time used 12702.264960
[06:34:42] Starting job 4,CPU time is 51037.137800.
[06:34:42] ZINC04760008.pdbqt size = 30 4
../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[12:59:13] Finished Job #4 cpu time used 22445.485022
[12:59:13] Starting job 5,CPU time is 73482.622822.
[12:59:14] ZINC04760008.pdbqt size = 30 4
../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[19:25:46] Finished Job #5 cpu time used 22570.494778
[19:25:46] Starting job 6,CPU time is 96053.117600.
[19:25:47] ZINC04760008.pdbqt size = 30 4
../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[01:46:38] Finished Job #6 cpu time used 22590.193102
[01:46:38] Starting job 7,CPU time is 118643.310702.
[01:46:39] ZINC04760008.pdbqt size = 30 4
../../projects/www.worldcommunitygrid.org/beta13.target_ace.pdbqt size = 4660 0
[08:13:33] Finished Job #7 cpu time used 22607.377813
08:13:33 (512): called boinc_finish
</stderr_txt>
]]>
[Edit 1 times, last edit by darth_vader at Jul 31, 2011 7:38:45 PM]
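darth_vader's log makes the "CPU time of the last job" question easy to check by hand. Each "Starting job N" line carries the cumulative CPU time so far, and each "Finished Job #N" line carries that job's own CPU time. The short Python sketch below is a hypothetical helper (not a WCG or BOINC tool; its regexes assume exactly the line format shown in this log). It parses the start/finish lines, checks that the cumulative figures add up, and prints both the CPU time at the start of the last job and the full total, so either can be compared against the Result Status page.

```python
import re

# Start/finish lines copied from darth_vader's stderr log (other lines omitted).
LOG = """\
[15:48:34] Starting job 0,CPU time is 0.000000.
[19:35:41] Finished Job #0 cpu time used 12783.802205
[19:35:41] Starting job 1,CPU time is 12783.802205.
[23:16:44] Finished Job #1 cpu time used 12750.894886
[23:16:44] Starting job 2,CPU time is 25534.697091.
[02:56:01] Finished Job #2 cpu time used 12800.175749
[02:56:01] Starting job 3,CPU time is 38334.872840.
[06:34:42] Finished Job #3 cpu time used 12702.264960
[06:34:42] Starting job 4,CPU time is 51037.137800.
[12:59:13] Finished Job #4 cpu time used 22445.485022
[12:59:13] Starting job 5,CPU time is 73482.622822.
[19:25:46] Finished Job #5 cpu time used 22570.494778
[19:25:46] Starting job 6,CPU time is 96053.117600.
[01:46:38] Finished Job #6 cpu time used 22590.193102
[01:46:38] Starting job 7,CPU time is 118643.310702.
[08:13:33] Finished Job #7 cpu time used 22607.377813
"""

START_RE = re.compile(r"Starting job (\d+),CPU time is (\d+\.\d+)")
FINISH_RE = re.compile(r"Finished Job #(\d+) cpu time used (\d+\.\d+)")


def parse(log):
    """Return ({job: cumulative CPU s at start}, {job: CPU s used by that job})."""
    starts, used = {}, {}
    for line in log.splitlines():
        if m := START_RE.search(line):
            starts[int(m.group(1))] = float(m.group(2))
        elif m := FINISH_RE.search(line):
            used[int(m.group(1))] = float(m.group(2))
    return starts, used


def summarize(starts, used):
    """Check the cumulative figures and return (start of last job, total) in hours."""
    for n, start in starts.items():
        # "Starting job N" should equal the CPU time used by jobs 0..N-1.
        expected = sum(used[k] for k in range(n))
        assert abs(start - expected) < 1e-3, (n, start, expected)
    last = max(used)
    total = starts[last] + used[last]
    return starts[last] / 3600, total / 3600


start_last_h, total_h = summarize(*parse(LOG))
print(f"CPU time at start of last job: {start_last_h:.2f} h")
print(f"CPU time including last job:   {total_h:.2f} h")
```

For this log it reports about 32.96 h at the start of job 7 and 39.24 h in total; the 39.24 shown in darth_vader's Result Status row matches the second figure.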
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline
Quote: Apart from the control, CPU overheating and CPU timing issues under Linux, there seems (from my very small sample) to be a significantly high rate of inconclusive/invalid results. (Maybe a computational difference between machine types?) Of the 10 ace80 betas I've received, 40% have involved 3 machines because the initial 2-machine result was inconclusive. (None of the invalid results have been mine, but the logs have shown no sign of errors either.)

My only-slightly-larger sample shows the same thing even more strongly. Of 16 WUs that I completed and returned, 9 are either currently inconclusive or have been invalid for me or a wingman. On 3 machines with Intel CPUs, I have 7 valids, 1 inconclusive, and 1 PV (pending validation). On 1 AMD Phenom II, I have 2 valids, 4 invalids, and 1 inconclusive. So in my case, the AMD is the outlier. (All of these machines run Ubuntu 10.04.) Does anyone know whether these betas were 64-bit or 32-bit, by the way?
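kateiacy's per-CPU figures are easier to compare side by side. The short sketch below only re-tallies the numbers from her post: "pending" stands for her PV result, and the tally covers results on her own machines only, because the post does not say which of her valid WUs had an invalid wingman. The dictionary layout and the `problem_share` name are illustrative choices, not anything from WCG.

```python
# Per-machine-type counts from kateiacy's post (all machines on Ubuntu 10.04).
# Only results on her own machines are counted; wingman invalids are not
# broken down per machine in the post, so they are left out here.
counts = {
    "Intel (3 machines)": {"valid": 7, "inconclusive": 1, "invalid": 0, "pending": 1},
    "AMD Phenom II (1 machine)": {"valid": 2, "inconclusive": 1, "invalid": 4, "pending": 0},
}


def problem_share(c):
    """Return (problems, total), where problems = inconclusive + invalid."""
    return c["inconclusive"] + c["invalid"], sum(c.values())


grand_total = 0
for cpu, c in counts.items():
    problems, total = problem_share(c)
    grand_total += total
    print(f"{cpu}: {problems}/{total} inconclusive or invalid ({problems / total:.0%})")
print("results counted:", grand_total)  # 16, matching the post
```

It gives 1 of 9 (about 11%) on the Intel machines versus 5 of 7 (about 71%) on the AMD Phenom II, which is the outlier she describes.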
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline
Quote: Quote: Haven't you forgotten the CPU time of the last job (job #7 for an ace80 beta WU)? When I compare the CPU times shown in the Result Status page against what is reported in the Result Log of each WU, it matches the CPU time at the start of the last job, not that time plus the CPU time of the last job as it should.
Bonjour Jean, If the difference between elapsed and CPU time were caused by one job not being counted in the total, I would not see a steadily increasing gap between elapsed and CPU time across all 8 jobs.

My comment was specifically for genes, who had said that CPU times were adding up correctly (or that's what I understood), which, from what I can observe, is true only until the beginning of the last job of a WU.
I don't know what things look like on your machine, but on mine the gap between elapsed and CPU times depends essentially on when I look at it between two checkpoints:
- if I use Properties one minute after a checkpoint, the gap is about one minute, minus the few seconds of CPU time which are reported immediately after a checkpoint before communication between the application and BOINC seems to be broken.
- if I use Properties 10 minutes later, the gap will be about 10 minutes, and so on until the next checkpoint is taken.
For me, the gap is minimal and reasonably small if I happen to look at it immediately after a checkpoint. And of course it is unfortunately not acceptable at the end of a WU, since the CPU time of the last job has not been tallied at all. Note that even BoincTasks is of little help for monitoring where you stand between checkpoints, because BOINC is also blind to how much CPU time has been used since the last checkpoint.
[Edit 1 times, last edit by JmBoullier at Aug 1, 2011 1:15:30 PM]
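JmBoullier's description amounts to a simple model: the task keeps a core busy, but BOINC only learns its CPU time at checkpoints, so the displayed gap is roughly the time since the last checkpoint; on his machines the last job's CPU time is also never added at the end. The sketch below encodes that model. The 3600 s checkpoint interval, the eight equal 12,000 s jobs, and the function names are hypothetical; it ignores the few seconds reported just after each checkpoint, and it is not code from the BOINC client or the application.

```python
# Hypothetical model of the behaviour JmBoullier describes; not BOINC code.
# Assumes the task uses a full core and that its CPU time reaches BOINC
# only when a checkpoint is taken (the few seconds reported right after a
# checkpoint are ignored).
CHECKPOINT_S = 3600  # hypothetical checkpoint interval, seconds


def displayed_cpu(elapsed_s, checkpoint_s=CHECKPOINT_S):
    """CPU seconds BOINC shows: the CPU time as of the last checkpoint."""
    return (elapsed_s // checkpoint_s) * checkpoint_s


def final_tally(job_cpu_s):
    """Final CPU time if the last job's CPU time is never added."""
    return sum(job_cpu_s[:-1])


for minutes in (1, 10, 30):
    t = 5 * CHECKPOINT_S + minutes * 60  # some time after the 5th checkpoint
    gap_min = (t - displayed_cpu(t)) / 60
    print(f"{minutes:>2} min after a checkpoint: gap = {gap_min:.0f} min")

jobs = [12_000.0] * 8  # hypothetical: eight jobs of 12,000 CPU seconds each
print(f"true total {sum(jobs) / 3600:.2f} h, tallied {final_tally(jobs) / 3600:.2f} h")
```

With these numbers the gap reads 1, 10 and 30 minutes, resetting at each checkpoint, and the final tally falls short by exactly one job (26.67 h versus 23.33 h).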