World Community Grid Forums
Thread Status: Active Total posts in this thread: 6
Spiderman
Advanced Cruncher United States Joined: Jul 13, 2020 Post Count: 117 Status: Offline
I'm uncertain whether to post here, or under the GPU forum, or OpenPandemics...
Q: Has anyone else noticed a few OPN GPU tasks that never end? It's not all of them, but enough to be troublesome.

I happened to notice one of my machines with a "No Reply" when I downloaded my Results into a spreadsheet yesterday morning. When I looked at that particular box, there was an OPNG WU (OPNG_0193257_00054_0) with over 2 days of runtime and still ticking. It was overdue and had another WU behind it that was within an hour of being overdue as well. This is one of my Windows machines and has the integrated AMD GPU enabled (no add-on card), set to run without interruption. There are no other processes on it, and even if there were, it's told to go ahead and compute on CPU/GPU no matter what. [Local Preferences]

I rebooted it and it went about its business, re-running from scratch (not sure where the checkpoint disappeared to?). This morning it was still going, so I aborted the task. It started running another OPNG that was due yesterday (which I will probably abort as well if it doesn't finish soon). Sadly, this box is now suspect to the WCG server and I'm seeing "This computer has finished a daily quota of 1 tasks" in the Event Log.

Another Windows machine has an integrated *Intel* GPU and I found it with the same issue, but instead of rebooting, I suspended the task and then told it to resume. It eventually finished the WU.

I'm not seeing anything useful in the Event Log. The only common denominators are:
1) Windows
2) Latest v7.24.1 BOINC Client
3) Whatever it is doing, it ignores the WCG due date and does not stop once the time limit has passed.
4) Doesn't matter if Intel or AMD GPU
5) v7.22.2 BOINC Client doesn't appear to do this (I only have one other Windows machine with a GPU; all others are CPU-only or Linux).
6) This only happens on select OPN GPU tasks (not all), but it holds up the queue since this is an integrated graphics processor.

Very odd, and nothing in the Event Log gives a hint as to what the issue is.

I could turn GPU off, but like most, I try to squeeze every bit of processing power out of these systems.

Before I start down the path of reporting to BOINC Support over at Berkeley, I wanted to see if anyone else had noticed this. Thanks...
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1951 Status: Offline
Never seen this with any OPNG WUs, and honestly I don't remember ever seeing it for OPN1 either.

I only see this occasionally with MCM1 or SCC1 WUs, where most of the time they run up to 99.xxx% and then fail to finish properly. When I check the properties of those WUs in the BOINC Manager in those cases, the CPU time usually shows just dashes. A couple of times recently, however, I saw some MCM1 or SCC1 tasks that ran to maybe 10-15% and then just kept running up the clock with no obvious progress. But those are 1 or 2 out of 5000-6000 WUs total in the Results list at that point, so nothing that has me worried so far. And WCG's reaction to any report, if I were even able to make one in time, would probably be to ignore it anyway...

Ralf
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12392 Status: Offline
Spiderman
I have an i7 3770 with an Intel GPU. Its current OPNG unit has not reached its deadline, but looking at its properties I see that it has a fraction done of 78.899%, an elapsed time of 2:52:41 and a CPU time of just 0:10:35. This indicates to me that it is only active part time. Is that what is happening with you?

Incidentally, I find that BOINC does not stop units that exceed their deadline. It is possible to earn the credit if a late unit reports its results before its replacement does. The only server aborts that I get are of re-sends, where the late-running unit reports before my unit reaches its first checkpoint.

Mike
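For anyone who wants to put a number on the "only active part time" observation, here is a minimal Python sketch that turns the two times quoted above into a CPU-time share. The helper names `hms_to_seconds` and `cpu_share` are made up for illustration and are not part of BOINC. Note that a low share on a GPU task may simply mean the GPU rather than the CPU is doing most of the work, so treat it as a hint rather than proof of a stall.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_share(cpu_time: str, elapsed_time: str) -> float:
    """Return CPU time as a fraction of elapsed (wall-clock) time."""
    elapsed = hms_to_seconds(elapsed_time)
    if elapsed == 0:
        raise ValueError("elapsed time must be greater than zero")
    return hms_to_seconds(cpu_time) / elapsed


# Values quoted in the post above: CPU time 0:10:35, elapsed time 2:52:41.
share = cpu_share("0:10:35", "2:52:41")
print(f"CPU time is {share:.1%} of elapsed time")
```

With the figures from the post, that is 635 seconds of CPU time against 10361 seconds elapsed, or roughly 6%.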
Spiderman
Advanced Cruncher United States Joined: Jul 13, 2020 Post Count: 117 Status: Offline
Mike,
Thanks -- perhaps I should've left it running just to see if it ever does stop. I'm unsure. However, I've aborted a total of (3) now that were 2-3 days overdue (and surely had new Wingmen assigned who reported back -- I know my first one did). The last one, which I stopped this morning, had run for over 20 hours -- normally they take 1.5 hours.

The machine it happens on most is an Intel i7. The other machine I noticed it on is an Intel i5. I double-checked, and Power & Sleep modes are turned off wherever possible on those systems.

If it continues, I'll see if I can dig up the associated logs and files and send them to BOINC-Berkeley so they can analyze them and determine whether there's a bug in this latest release of the Client.

Appreciate the reply!
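As a rough illustration of the "20 hours versus the usual 1.5" comparison, the Python sketch below flags tasks whose elapsed time is far beyond what is typical. Everything in it is an assumption for illustration: the `Task` record, the task names, and the factor of 5 are hypothetical, and the elapsed times would have to be copied by hand from the BOINC Manager or a Results export.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str             # hypothetical task label
    elapsed_hours: float  # wall-clock runtime so far, in hours


def find_overrunning(tasks, typical_hours=1.5, factor=5.0):
    """Return names of tasks running longer than factor * typical_hours."""
    limit = typical_hours * factor
    return [task.name for task in tasks if task.elapsed_hours > limit]


# Made-up sample: one task near the usual 1.5 hours, one at 20 hours.
tasks = [Task("task_a", 1.4), Task("task_b", 20.0)]
overrunning = find_overrunning(tasks)
print(overrunning)
```

The factor is a judgment call; integrated GPUs can vary a lot from task to task, so a generous multiple avoids flagging tasks that are merely slow.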
Paul Schlaffer
Senior Cruncher USA Joined: Jun 12, 2005 Post Count: 244 Status: Offline
I have one AMD APU that will stay on 99% until it times out. I just moved that machine into a profile which doesn't allow GPU work.
----------------------------------------
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
Spiderman
Advanced Cruncher United States Joined: Jul 13, 2020 Post Count: 117 Status: Offline
This Dell Intel i7-3770 machine has continually been hanging on OPNG GPU tasks -- most of the other Dells and HPs in the stack don't have that issue, but after aborting several OPNG WUs over the past 1-2 months I decided to turn GPU processing off for WCG on this single machine (there are no GPU issues with my backup project when WCG is down, which is what is so strange).

I don't use Profiles, so I added a 'cc_config.xml' file and told it not to allow GPU processing for WCG on this machine while still allowing it for other projects. A 'BoincCmd --read_cc_config' called from the command line re-read the configuration and allowed me to confirm GPU processing was off for WCG on this single machine.

    <cc_config>
      <options>
        <exclude_gpu>
          <url>http://www.worldcommunitygrid.org</url>
        </exclude_gpu>
      </options>
    </cc_config>

That fixed the issue.
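If you want to double-check a file like the one above before asking the client to re-read it, a small Python sketch such as the following can list which project URLs have GPU work excluded. It assumes the same layout as the snippet in this post (`cc_config` > `options` > `exclude_gpu` > `url`); the function name `excluded_gpu_urls` is made up for illustration, and the XML is embedded as a string here rather than read from the BOINC data directory.

```python
import xml.etree.ElementTree as ET

# Same content as the cc_config.xml quoted in the post above.
CC_CONFIG = """<cc_config>
  <options>
    <exclude_gpu>
      <url>http://www.worldcommunitygrid.org</url>
    </exclude_gpu>
  </options>
</cc_config>"""


def excluded_gpu_urls(xml_text: str) -> list[str]:
    """Return the project URLs listed under <exclude_gpu> entries."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    return [
        url.text.strip()
        for url in root.findall("./options/exclude_gpu/url")
        if url.text
    ]


urls = excluded_gpu_urls(CC_CONFIG)
print(urls)
```

This only checks that the file is well-formed and contains the expected entry; whether the client actually applies it is still best confirmed in the Event Log after the re-read.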