Thread Status: Active
Total posts in this thread: 161
This topic has been viewed 596241 times and has 160 replies
twilyth
Master Cruncher
US
Joined: Mar 30, 2007
Post Count: 2130
Status: Offline
Re: Low Work for Help Conquer Cancer ( Nov 20, 2012 )

I have a similar problem with one rig that is running a PCI 3 card in a PCI 2 slot. If the screen saver comes on, I get an error saying that the GPU is no longer working or is missing.

I also get the problem if I log in remotely and minimize the screen.

Finally, what I did was start BOINC on that machine as follows:

"c:\program files\boinc\boinc.exe" --allow_remote_gui_rpc

Then I went to the ProgramData directory and found the gui_rpc_auth.cfg file. The string in there is the BOINC password. You can change this to something else, I think. I just copied it.

Then, back on the remote host, I logged in from BOINC Manager with that password and everything is working fine. I do have to run 2 BMs to see everything, but that's fine.
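
For anyone who wants to script the "copy the password" step, here is a minimal Python sketch that reads the string out of gui_rpc_auth.cfg. The default path is an assumption (a typical Windows install keeps the BOINC data directory under C:\ProgramData\BOINC), and the function name read_rpc_password is made up for this example; adjust both to your own setup.

```python
from pathlib import Path

# Assumed default location of the BOINC data directory on Windows;
# your install may keep gui_rpc_auth.cfg somewhere else.
DEFAULT_AUTH_FILE = Path(r"C:\ProgramData\BOINC\gui_rpc_auth.cfg")


def read_rpc_password(auth_file=DEFAULT_AUTH_FILE):
    """Return the password string stored in gui_rpc_auth.cfg.

    The file holds a single string; surrounding whitespace and the
    trailing newline are stripped before returning it.
    """
    return Path(auth_file).read_text(encoding="utf-8").strip()
```

Calling read_rpc_password() on the target machine should return the same string you would otherwise copy by hand and enter in BOINC Manager on the remote host.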

I just had to set the power options on the target machine so that there was no screensaver and the monitor never powers down. That's ok since it's on a KVM anyway and switching back and forth on that doesn't seem to matter.
----------------------------------------


----------------------------------------
[Edit 1 times, last edit by twilyth at Nov 25, 2012 11:52:31 PM]
[Nov 25, 2012 11:50:51 PM]
pfm3136
Cruncher
Joined: Apr 11, 2010
Post Count: 13
Status: Offline
Re: Low Work for Help Conquer Cancer ( Nov 20, 2012 )

Uplinger, 2 machines running GPU tasks, both had the reg fix applied and rebooted as per your instructions.

The registry fix hasn't solved the problem unfortunately.

The machines have multiple GPUs configured with app_info.xml and stuck units aren't limited to any particular GPU. stderr doesn't appear to show anything out of the ordinary and if the units are suspended then resumed, they complete within a few minutes (compared to the hours they have been sitting idle). BOINCTasks sometimes shows high CPU activity which I presume means that image one is completing.

Is there anything else I can check? Also, do you have a link for the Microsoft fix to see if there are any more pointers?

Thanks again for your time.



I had the same problem. I don't know if it applies to you, but disable CrossFire/SLI and it should be OK.
That reg fix for the watchdog is for another problem (you would see the driver crashing and recovering, which would "kill" all the work units on the affected GPU). Unfortunately, it doesn't work on Windows 8.
----------------------------------------
[Nov 26, 2012 2:14:16 AM]
coolstream
Senior Cruncher
SCOTLAND
Joined: Nov 8, 2005
Post Count: 475
Status: Offline
Re: Low Work for Help Conquer Cancer ( Nov 20, 2012 )

Thanks for all the input, guys. Many of the power management suggestions were not relevant because they matched the settings I was already using. However, I have been testing them to see if any one in particular might make a noticeable difference.

I have now set both machines to use non-Aero and will check again later to see if I get any more stuck units.

@ pfm3136, although I have multiple GPUs in each machine, I'm not using CrossFire, so that one can be easily ruled out.

@ twilyth, interesting observations about remote connections. I had to stop using remote desktop on the GPU machines for this very reason. That was in the time of single image units, but I never got a situation like this until the advent of dual image units so I'm almost sure that it's not the culprit here.

Re gui_rpc_auth.cfg and monitoring: instead of using two BMs you might want to consider BoincTasks, which lets you monitor multiple machines in one interface. It is highly configurable and can make life so much easier. http://www.efmer.eu/boinc/boinc_tasks/

Thanks again to you all. It's comforting to know that I'm not the only one, even though it's so annoying because I never had these issues before the WUs changed from single units.
----------------------------------------

Crunching in memory of my Mum PEGGY, cousin ROPPA and Aunt AUDREY.
[Nov 26, 2012 4:09:53 AM]
rilian
Veteran Cruncher
Ukraine - we rule!
Joined: Jun 17, 2007
Post Count: 1460
Status: Offline
Re: Low Work for Help Conquer Cancer ( Nov 20, 2012 )

Hi - I see HCC now has 111 days before the end -- http://i137.photobucket.com/albums/q210/Sekerob/WCGYearsPi1Project.png

Is there any official announcement that more tasks have been added to the grid?

thanks!
----------------------------------------
[Nov 26, 2012 10:30:29 AM]
Zigfried
Senior Cruncher
Brazil
Joined: Dec 12, 2005
Post Count: 368
Status: Offline
Re: Low Work for Help Conquer Cancer ( Nov 20, 2012 )

This project ran out of WUs for 2 or 3 days.

That chart shows the time based on the average amount of work per day, and those days without work made a little mess of it. In a few days it will be OK. But I don't know why it is shown as PAUSED.
----------------------------------------

[Nov 26, 2012 2:39:21 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1404
Status: Offline
Re: Low Work for Help Conquer Cancer ( Nov 20, 2012 )

This project ran out of WUs for 2 or 3 days.

That chart shows the time based on the average amount of work per day, and those days without work made a little mess of it. In a few days it will be OK. But I don't know why it is shown as PAUSED.
Only a little mess!

If I remember Sekerob's explanation correctly:
The highest and lowest values of the last 4 weeks/30 days are discarded when calculating the estimated lifetime.
So one of the 'empty' days doesn't count.
But it will surely be less than 111 days.
[Nov 26, 2012 3:50:16 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Low Work for Help Conquer Cancer ( Nov 20, 2012 )

It's a mess because the double-image switch got a double whammy from the few days of no new work shortly afterwards. The "trimmed mean" works on a shorter period, so now it's 124 days ;P and "Paused" is now "Normal" [the old intertube cache view] :D It takes 10 days to get back to a "clean" average again. At the moment HCC is overweight [in total runtime] http://bit.ly/WCGQLK , so it might get a downward share adjustment [measured in results of total daily processed for WCG].

Trimmed Mean in MS Office works by specifying the percentage of high/low values you want to exclude from the average. If the object is to remove only the single highest and lowest of the last 30 days, the percent factor to dismiss is 2/30th, i.e. 0.067. Enter 0.134 and it will dismiss the two highest and two lowest, et cetera et cetera. At any rate, I'm not going to touch the algos, with the exception that from an arbitrary point I've assumed all results to be double image [to get back to the total completed images as a fraction of the overall estimated images that were guessed to be there at the last word of "so much added to..."]. Anyway, whenever Viktors has word on new estimates [he's the man doing the project duration planning], we'll surely hear. I still think this project will go intermittent at some point. There are always new sets of crystals being generated somewhere on the globe that can ** make use of the HCC system.
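
To make the percent factor concrete, here is a minimal Python sketch of a spreadsheet-style trimmed mean. It is an illustration under stated assumptions, not the actual chart calculation: the function name trimmed_mean and the daily figures are made up, and the rule that the excluded count is rounded down to an even number (so both ends lose the same number of points) is modeled on how the spreadsheet function behaves.

```python
import math


def trimmed_mean(values, percent):
    """Mean of values after dropping a fraction of points from both ends.

    `percent` is the total fraction of points to exclude. Half of that
    count, rounded down, is removed from the bottom and the same number
    from the top, so the total excluded is always even.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percent < 1:
        raise ValueError("percent must be in [0, 1)")
    drop_each_end = math.floor(len(values) * percent / 2)
    ordered = sorted(values)
    kept = ordered[drop_each_end:len(ordered) - drop_each_end]
    return sum(kept) / len(kept)


# Hypothetical 30 days of results: 28 normal days, one empty day, one spike.
daily = [100] * 28 + [0, 500]
print(trimmed_mean(daily, 0.067))  # 100.0 (the 0 and the 500 are discarded)
```

With two empty days in the window, a factor of 0.067 still discards only one of them, so the other drags the average down; 0.134 would discard both.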

** edit: and have been making use of the HCC gateway to the grid's processing power.
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 26, 2012 4:08:44 PM]
[Nov 26, 2012 4:07:32 PM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: Low Work for Help Conquer Cancer ( Nov 20, 2012 )

Hi - I see HCC now has 111 days before the end -- http://i137.photobucket.com/albums/q210/Sekerob/WCGYearsPi1Project.png

Is there any official announcement that more tasks have been added to the grid?

thanks!

Probably less than that if everybody starts to add more and more GPUs.
----------------------------------------

[Nov 26, 2012 5:59:50 PM]
twilyth
Master Cruncher
US
Joined: Mar 30, 2007
Post Count: 2130
Status: Offline
Re: Low Work for Help Conquer Cancer ( Nov 20, 2012 )

Coolstream: Thanks for the info on BoincTasks, but I don't like to use anything that isn't BOINC-supported. Lots of nice utilities come and go. What happens is you become dependent upon them, and then one day they don't work any more and no one wants to step up and support them.
----------------------------------------


[Nov 26, 2012 7:38:49 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: Low Work for Help Conquer Cancer ( Nov 20, 2012 )

Coolstream: Thanks for the info on BoincTasks, but I don't like to use anything that isn't BOINC-supported. Lots of nice utilities come and go. What happens is you become dependent upon them, and then one day they don't work any more and no one wants to step up and support them.

I doubt BoincTasks is going anywhere for a long time. The developer is always looking for feedback to make it better. What can you lose by giving it a try? It's a must if you run multiple machines, IMHO.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[Nov 26, 2012 8:27:20 PM]