Thread Status: Active
Total posts in this thread: 63
Posts: 63   Pages: 7   [ Previous Page | 1 2 3 4 5 6 7 | Next Page ]
This topic has been viewed 19234 times and has 62 replies
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: Beta Test for Help Conquer Cancer - GPU v6.51

@[HWU]Melkor:"What should I have to paste here?"
Generally, only post full details of an error if nobody has already described the same sort of error. In that case, describe what went wrong and you might copy and paste the abnormal parts of the Results Status log.
If your type of error has already been reported, just state the type of error, e.g. "hung at 99.415%".

Reading armstrdj's post again, in this thread we were asked to paste in the last few lines of the stderr.txt file in the error WU's Slot directory. Sorry I misled you by suggesting you look at the Results Status log. The slot directory is cleared when the WU ends, so your stderr.txt is lost. Only the log in your WCG Results Status remains.

You report having the default 50% throttle setting, so I think that your error happened because BOINC currently can't restart GPU WUs after they have paused during throttling. Your BOINC version etc. is useful info, because by examining a number of these situations the techs may discover another factor that needs to be present to trigger the error.
You have already given enough details, I think. :)
[Sep 23, 2012 1:32:44 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: Beta Test for Help Conquer Cancer - GPU v6.51

Another issue with these beta tasks has been reported in the GPU Support Forum:
BOINC gone crazy after completing BETA tasks

After running GPU tasks, BOINC can go into a mode where it seems to ignore your work cache setting and downloads new tasks until the server estimates that your machine can't complete them by the 10-day limit.

Someone might be working on a fix.
---
[Edit]: Afterthought - this may be a phenomenon rather than a bug.
Let's imagine that the WUs came out with their CPU time estimates the same as for normal CPU HCC tasks - say 1hr - and they ran in, say, 5 min, ie the estimates were out by a factor of 12. Your BOINC client would think your machine was on steroids, and if your work cache was set to 1 day and no more GPU tasks were available it would try to download 12 days' CPU work.
But there are lots of "if"s I don't know about.
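
The arithmetic in the afterthought above can be sketched as follows. This is only an illustration of the reasoning in the post, not BOINC's actual work-fetch logic; the function name and parameters are made up for the example.

```python
def days_of_work_fetched(cache_days, estimated_minutes, actual_minutes):
    """Days of work, measured at the original estimate, fetched to fill the cache.

    Hypothetical model: the client scales every estimate by how much faster
    the last tasks ran than predicted, then fills the cache with tasks whose
    scaled run times add up to cache_days.
    """
    # How much faster the client believes the machine is than the estimates imply.
    believed_speedup = estimated_minutes / actual_minutes
    # The unscaled total of fetched work grows by the same factor.
    return cache_days * believed_speedup


# Numbers from the post: 1 hr estimate, 5 min actual run time, 1-day cache.
print(days_of_work_fetched(cache_days=1, estimated_minutes=60, actual_minutes=5))  # 12.0
```

With a 1-day cache and estimates out by a factor of 12, this model asks for 12 days of CPU work, which is past the 10-day limit mentioned above.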
----------------------------------------
[Edit 2 times, last edit by Rickjb at Sep 24, 2012 4:05:22 AM]
[Sep 23, 2012 1:54:32 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Test for Help Conquer Cancer - GPU v6.51

Let's imagine that the WUs came out with their CPU time estimates the same as for normal CPU HCC tasks - say 1hr - and they ran in, say, 5 min, ie the estimates were out by a factor of 12. Your BOINC client would think your machine was on steroids, and if your work cache was set to 1 day and no more GPU tasks were available it would try to download 12 days' CPU work.
Looks like there needs to be a separate WU cache for GPU-based WUs, so that the cache-size calculations are independent of one another. That is, the CPU-based WUs interact only with the cache for CPU-based WUs, while GPU-based WUs interact only with the cache for GPU-based WUs.
;
[Sep 24, 2012 6:27:40 PM]
RetiredTech
Advanced Cruncher
Canada
Joined: Feb 2, 2012
Post Count: 91
Status: Offline
Re: Beta Test for Help Conquer Cancer - GPU v6.51

I recently received a WU for this BETA and it hung at 99.415%. Suspending and rebooting did nothing. I moved NVIDIA PhysX to CPU and rebooted. This time the WU completed.
My settings: use at most 80% CPU, and "Use always".
[Sep 24, 2012 6:43:18 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Test for Help Conquer Cancer - GPU v6.51

Another issue with these beta tasks has been reported in the GPU Support Forum :
BOINC gone crazy after completing BETA tasks

After running GPU tasks, BOINC can go into a mode where it seems to ignore your work cache setting and downloads new tasks until the server estimates that your machine can't complete them by the 10-day limit.

Someone might be working on a fix.
---
[Edit]: Afterthought - this may be a phenomenon rather than a bug.
Let's imagine that the WUs came out with their CPU time estimates the same as for normal CPU HCC tasks - say 1hr - and they ran in, say, 5 min, ie the estimates were out by a factor of 12. Your BOINC client would think your machine was on steroids, and if your work cache was set to 1 day and no more GPU tasks were available it would try to download 12 days' CPU work.
But there are lots of "if"s I don't know about.

Though there is as yet no separate buffer control for CPU and GPU [as far as I recall, no one has made a compelling case to the developers for such user-configurable settings], GPU and CPU have separate schedulers. Knreed posted that he'd fixed some bug, working with Berkeley to stop this occurring, but seemingly... At any rate, my latest 7 test client still has a single DCF per project [that is, WCG is a single project to BOINC], and the <dont_use_dcf/> entry is there but not operational. The choice a number of months ago *was* not to use that feature [which in effect disables the DCF]. Maybe they are related, maybe not. As long as the run time estimates are set right for the GPU Beta, this should not affect the CPU side. Moreover, it's still Beta, so there is no cross-feeding of run time averages between GPU and CPU, and I don't expect there to be, since it is a separate feed. All of this has to be learned on the go, as this is a first for WCG.
[Sep 24, 2012 9:30:38 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: Beta Test for Help Conquer Cancer - GPU v6.51

At any rate, my latest 7 test client still has a single DCF per project [that is, WCG is a single project to BOINC], and the <dont_use_dcf/> entry is there but not operational.

Well, funny how DCF for WCG has been stuck at 1.000000 for some time now if <dont_use_dcf/> isn't working...

Just looking at the tasks that are "ready to report", with CPU times between 14165.25 and 22219.69 seconds, it's strange that DCF hasn't been changed by any of them when all of them had an fpops_est of 54028909110941.
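
To put a number on the observation above: since every task carried the same fpops_est, every task had the same estimated run time, so the actual-to-estimated ratios must differ by the same factor as the CPU times, about 1.57 between the fastest and slowest task. A small sketch, assuming the estimate is simply fpops_est divided by a host speed; `ASSUMED_FLOPS` is a hypothetical value and cancels out of the final ratio.

```python
FPOPS_EST = 54028909110941   # fpops_est quoted in the post
CPU_TIME_MIN_S = 14165.25    # shortest CPU time quoted in the post
CPU_TIME_MAX_S = 22219.69    # longest CPU time quoted in the post
ASSUMED_FLOPS = 3.0e9        # hypothetical host speed, for illustration only


def actual_to_estimate_ratio(cpu_time_s, fpops_est, flops):
    """Ratio of the actual run time to the run time implied by fpops_est."""
    estimated_s = fpops_est / flops
    return cpu_time_s / estimated_s


low = actual_to_estimate_ratio(CPU_TIME_MIN_S, FPOPS_EST, ASSUMED_FLOPS)
high = actual_to_estimate_ratio(CPU_TIME_MAX_S, FPOPS_EST, ASSUMED_FLOPS)

# The spread does not depend on ASSUMED_FLOPS: it equals max / min CPU time.
spread = high / low
print(f"slowest task ran {spread:.2f}x longer than the fastest, with identical estimates")
```

Whatever the real host speed is, at most one of these tasks can have matched its estimate exactly, which is why a correction factor that stays at exactly 1.000000 suggests it is not being updated at all.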
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Sep 24, 2012 11:12:12 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Test for Help Conquer Cancer - GPU v6.51

SekeRob, you mentioned in your other posts that the DCF is now on the server. Is that a DCF for CPU-based WUs? We may need a separate DCF for GPU-based WUs, operating on a cache for GPU-based WUs. I can imagine further difficulties if GPUs are not given their own space, so to speak, and are instead treated as 'just another CPU'. The over-downloading of CPU-based WUs already appears to be caused by the servers misreading a completed GPU-based WU as reflective of the client's CPU performance. That may have triggered a server command to fill the client's cache based on a DCF that is supposedly for CPU-based WUs only, but which the GPU-based WUs' performance may have modified.
;
[Sep 25, 2012 12:07:07 AM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: Beta Test for Help Conquer Cancer - GPU v6.51

SekeRob, you mentioned in your other posts, that the DCF is now on the server. Now is that a DCF for a CPU-based-WU?

The server keeps track of various parameters for each individual combination of computer, app version and plan class. These parameters are used to give "correct" run-time estimates, but they're not used until there are 10 valid tasks for a particular combination of computer, app version and plan class.

Meaning, even if GPU tasks are originally estimated to take 1 hour but take only 5 minutes, this factor-of-12 discrepancy will not influence the estimates for CPU tasks for v7.0.28 and later clients.
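
A minimal sketch of the bookkeeping described above, to show why a bad GPU estimate stays out of the CPU estimates. It is not the actual BOINC server code: the class, the method names, the key labels and the use of a plain average are all assumptions made for illustration; only the per-combination tracking and the 10-valid-task threshold come from the post.

```python
from collections import defaultdict

MIN_VALID_TASKS = 10  # threshold stated in the post


class EstimateTracker:
    """Hypothetical per-(computer, app version, plan class) run-time statistics."""

    def __init__(self):
        # key: (host_id, app_version, plan_class) -> list of actual/estimated ratios
        self._ratios = defaultdict(list)

    def record_valid_task(self, key, estimated_s, actual_s):
        self._ratios[key].append(actual_s / estimated_s)

    def corrected_estimate(self, key, raw_estimate_s):
        ratios = self._ratios.get(key, [])
        # Statistics are ignored until enough valid tasks exist for this key.
        if len(ratios) < MIN_VALID_TASKS:
            return raw_estimate_s
        # Assumption: scale by the mean ratio seen so far for this key only.
        return raw_estimate_s * sum(ratios) / len(ratios)


tracker = EstimateTracker()
gpu_key = ("host-1", "hcc-beta", "gpu")  # hypothetical labels
cpu_key = ("host-1", "hcc", "cpu")       # hypothetical labels

# Ten GPU tasks estimated at 1 hour that actually took 5 minutes.
for _ in range(10):
    tracker.record_valid_task(gpu_key, estimated_s=3600, actual_s=300)

print(tracker.corrected_estimate(gpu_key, 3600))  # close to 300 seconds
print(tracker.corrected_estimate(cpu_key, 3600))  # still 3600: CPU key untouched
```

Because the statistics are keyed per combination, the factor-of-12 error only ever corrects the GPU estimates in this model.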

v6-clients on the other hand can be completely overcommitted with CPU-work due to the incorrect GPU-estimates.

Well, GPU work will still have some effect in v7. For example, if a quad-core has a 3-day cache and then gets a string of GPU work, it will "lose" one CPU core, which means the CPU work will now take 4 days.
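
The quad-core example above works out as follows. This is a simple sketch, assuming the cached CPU work is spread evenly over the cores and that each running GPU task ties up one whole core; the function name is made up for the example.

```python
def days_to_finish_cpu_cache(cache_days, total_cores, cores_feeding_gpu):
    """Days needed to clear a CPU cache when some cores are busy feeding the GPU."""
    # Work in the cache, in core-days, sized for all cores running CPU tasks.
    core_days = cache_days * total_cores
    # The same work now has to be done by fewer cores.
    return core_days / (total_cores - cores_feeding_gpu)


# Numbers from the post: quad-core, 3-day cache, one core "lost" to GPU work.
print(days_to_finish_cpu_cache(cache_days=3, total_cores=4, cores_feeding_gpu=1))  # 4.0
```

The 3-day cache holds 12 core-days of work, and three remaining cores need 4 days to get through it.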

There have also been various improvements and bug fixes to the BOINC client, among them one concerning estimates:
•client: fix error in runtime estimation for active tasks.
This change is in v7.0.34 and later, so while it's unclear whether the bug could cause an excess CPU cache, it's still a good idea to upgrade to v7.0.36.
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Sep 25, 2012 7:43:30 AM]
katoda
Senior Cruncher
Poland
Joined: Apr 28, 2007
Post Count: 170
Status: Offline
Re: Beta Test for Help Conquer Cancer - GPU v6.51

Looks like this wave of beta tasks is finished, so just my two cents: I got 4 beta tasks on one of my machines and the first of them was stuck at 99.415%. The machine is a notebook with Windows 7 64-bit and an nVidia 540M GPU, and to avoid overheating, throttle is on: use max 50% of CPU time.
I turned throttle off, restarted BOINC and all beta tasks finished successfully. I observed that during computation, both the CPU core and the GPU were used at 100%, regardless of the throttle setting.
I hope that this issue will be solved quickly. I think that a lot of people use this setting because of heat and noise; using 100% of a notebook CPU is not an option for me.
EDIT: just to avoid confusion, "issue" means hung work units, not 100% CPU utilisation :)
----------------------------------------

----------------------------------------
[Edit 1 times, last edit by katoda at Sep 27, 2012 8:21:50 AM]
[Sep 25, 2012 2:51:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Test for Help Conquer Cancer - GPU v6.51

The 4 betas I had were either completed with error or were stuck at 99.415% complete (so I aborted them). Hopefully the next betas will run successfully. I did not go to the task bar to see the percentage of CPU/GPU usage. Also, on that machine I am running Vista.
[Sep 25, 2012 3:27:09 PM]