World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread] |
Thread Status: Active Total posts in this thread: 90
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: |
Had to cancel WU BETA_E236438_410_S.384.C35F2H10N6O5S4.MSQQQGDDEMBVSW-UHFFFAOYSA-N.1_s1_14 under Windows 8.1.
Reason: It ignores TThrottle's requests to slow down to keep the processor from overheating.
Consequence: The beta WU runs at full speed and the processor is 10 °C above the requested limit, even though the other threads have been reduced to almost no activity.
I am not 100% certain, but I think this is not new behaviour for CEP2, which does not normally run on this ultrabook. That is probably the reason it was excluded, but the decision to exclude this project was taken years ago, so I am not fully certain. |
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: |
BETA_E236439_155_S.428.C54H24O2S4.FWBINELOJIFGEF-UHFFFAOYSA-N.5_s1_14 cancelled too. Same reason, same punishment. Will I have to remove this machine from the beta testing pool? |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
On LINUX, Beta WUs seem to be running at higher priority than the other work units. OET units are being CPU starved. Other WUs on the same machine as the BETAs are running at 75% to 80% CPU utilization. Suspend the BETAs and the other WUs climb back to 99% to 100%. This never happened with the previous testing.
|
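One way to check the priority observation above on Linux is to compare the nice values of the running science applications. The following is only a minimal sketch, assuming a Linux `/proc` filesystem; the function names `parse_stat_line` and `list_nice_values` are hypothetical helpers written for illustration and are not part of BOINC or the World Community Grid software.

```python
import os

def parse_stat_line(stat_line):
    """Return (command name, nice value) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the first '(' and last ')'.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    comm = stat_line[open_idx + 1:close_idx]
    rest = stat_line[close_idx + 2:].split()
    # rest[0] is field 3 (state); nice is field 19, hence index 16.
    return comm, int(rest[16])

def list_nice_values():
    """Yield (pid, command, nice) for every readable process."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as fh:
                comm, nice = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat line was unreadable
        yield int(entry), comm, nice

if __name__ == "__main__":
    # Lowest nice value (highest priority) first.
    for pid, comm, nice in sorted(list_nice_values(), key=lambda r: r[2]):
        print(f"{pid:>7}  nice={nice:>3}  {comm}")
```

If a Beta task showed a lower nice value than the other work units on the same machine, that would be consistent with the starvation described; equal nice values would point towards another cause, such as I/O or extra threads.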
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I am cancelling the 35 BETAs due to their impact on other WUs. OET FW units that used to take 11 to 17 minutes are now taking 25 to 33 minutes. BETAs need to learn to play nice....
After cancelling the 35 BETAs, 9 OET work units got computation errors. All 9 received the same error:
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[18:53:54] Number of tasks = 1
[18:53:54] Running task 0,CPU time at start of task 0 was 0.000000
[18:53:54] ./ZINC07736473.pdbqt size = 20 3
../../projects/www.worldcommunitygrid.org/oet1.xZAGP-FW_rig.pdbqt size = 2296 0
[19:02:40] Finished task #0 cpu time used 449.640000
19:02:40 (6321): called boinc_finish
</stderr_txt>
<message>
finish file present too long
</message>
[Edit 2 times, last edit by Doneske at Mar 17, 2016 12:11:46 AM] |
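The stderr output above carries enough information to estimate how much CPU the OET task actually received: the bracketed timestamps give the wall-clock span and the finish line gives the CPU seconds used. The sketch below is only an illustration, assuming the timestamps are local wall-clock times with one-second resolution and that the task ran from start to finish in a single session (the log says there was no state to restore). The names `task_timing` and `cpu_efficiency` are hypothetical.

```python
import re
from datetime import datetime, timedelta

START_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] Running task")
FINISH_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\] Finished task #\d+ cpu time used ([\d.]+)"
)

def task_timing(stderr_text):
    """Return (elapsed wall-clock seconds, CPU seconds) from the log text."""
    start = START_RE.search(stderr_text)
    finish = FINISH_RE.search(stderr_text)
    if start is None or finish is None:
        raise ValueError("start or finish line not found")
    t0 = datetime.strptime(start.group(1), "%H:%M:%S")
    t1 = datetime.strptime(finish.group(1), "%H:%M:%S")
    if t1 < t0:
        # Assumption: a finish time earlier than the start means the
        # task crossed midnight once.
        t1 += timedelta(days=1)
    return (t1 - t0).total_seconds(), float(finish.group(2))

def cpu_efficiency(stderr_text):
    """Return CPU time divided by elapsed wall-clock time."""
    elapsed, cpu = task_timing(stderr_text)
    if elapsed == 0:
        raise ValueError("elapsed time is zero")
    return cpu / elapsed

# The relevant lines from the error report quoted in this post.
SAMPLE = """\
[18:53:54] Number of tasks = 1
[18:53:54] Running task 0,CPU time at start of task 0 was 0.000000
[19:02:40] Finished task #0 cpu time used 449.640000
19:02:40 (6321): called boinc_finish
"""

if __name__ == "__main__":
    elapsed, cpu = task_timing(SAMPLE)
    print(f"elapsed={elapsed:.0f}s cpu={cpu:.2f}s "
          f"efficiency={cpu_efficiency(SAMPLE):.1%}")
```

For this log the span is 526 seconds against 449.64 CPU seconds, or about 85%, which is below the 99% to 100% reported for work units running without the BETAs.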
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7581 Status: Recently Active Project Badges: |
"On LINUX, Beta WUs seem to be running at higher priority than the other work units. OET units are being CPU starved. Other WUs on same machine with BETAs are running at 75% to 80% CPU utilization. Suspend the BETAs and other WUs climb back to 99% to 100%. This never happened with the previous testing"
Now that is an interesting observation. Could it be that each Beta WU is utilizing more than one CPU? Or is it that the I/O pipelines are so full from the Beta WU that the other CPUs are data starved?
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Error after ~45 minutes:
BETA_E236440_951_S.456.C53H21N3O2S5.WLUYHOLBGNPRMQ-UHFFFAOYSA-N.1_s1_14_1--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[16:52:40] Number of jobs = 5
[16:52:40] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[17:41:01] Finished Job #0
17:41:04 (4084): called boinc_finish |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I got one WU, E236439. At 11% it has not checkpointed yet.
I was running the beta and 3 UGMs. The CPU temp began to increase. I suspended the 3 UGMs and the temp returned to normal. I have restarted one UGM and the temp has gone up about 3 deg F. Now running the beta, 2 UGMs and one OET. CPU temp is remaining under 150 deg F. So far so good. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"On LINUX, Beta WUs seem to be running at higher priority than the other work units. OET units are being CPU starved. Other WUs on same machine with BETAs are running at 75% to 80% CPU utilization. Suspend the BETAs and other WUs climb back to 99% to 100%. This never happened with the previous testing"
"Now that is an interesting observation. Could it be each Beta WU is utilizing more than one CPU ? Or is it that the I/O pipelines are so full from the Beta WU that the other CPU's are data starved ? Cheers"
My first thought was the I/O, so I let them run for several hours expecting it to eventually even out. When it didn't, I thought that maybe the BETAs were set to be multi-threaded, which means they were probably going to run that way for the length of the jobs. That could have caused big problems with the OET work, since I have about 14000 queued up. The I/O has never been a problem in the past, nor has the BETA work. Something was definitely different about this batch. |
||
|
Seoulpowergrid
Veteran Cruncher Joined: Apr 12, 2013 Post Count: 815 Status: Offline Project Badges: |
Got 1 in PV with CPU time/elapsed time of 7.01 / 7.11.
9 hours later I got 16 more, which are running or will run soon. |
||
|
slakin
Advanced Cruncher Joined: Jul 4, 2008 Post Count: 79 Status: Offline Project Badges: |
I have two running on my laptop, both just over 25% complete with over 4:40 in CPU time, and they still have not checkpointed. One on a home desktop did checkpoint after just over 2 hours, and I was able to successfully suspend/restart with LAIM off. It seems like a long time for the laptops to run without a checkpoint.
|
||
|
|