Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Re: GPU Validation server issue??

The Results page is very slow, or sometimes even impossible to reach.
I have also noticed a big increase in the PV pile. I have about 1,200 WUs pending validation per rig with an HD 7970 GPU, and about half that (600 WUs) per GTX 580 GPU. At the moment that adds up to over 10,000 PV WUs in the pile across my machines. I have the feeling that the validators are drowning under the WU flow.
[Nov 19, 2012 11:21:33 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: GPU Validation server issue??

> Maybe while they're working on this, they can figure out how to exclude 6.10.58 clients from even getting any GPU WUs. My PVs are littered with 6.10.58 errors. Sometimes there are a few minutes between sent and returned, and sometimes it's days.

Same here. Been like that since the beta. Can't tell you how many times I've recommended that people update to some version of 7.
I have PVs that are 17 days old.
----------------------------------------
In 1969 I took an oath to defend and protect the U.S. Constitution against all enemies, both foreign and domestic. There was no expiration date.


[Nov 19, 2012 11:23:04 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: GPU Validation server issue??

There are a few things going on:

1) The homogeneous_app_version feature of the server code, which we use to ensure that nvidia results are compared with nvidia results, ati with ati, and cpu with cpu, did not support the 'reliable' mechanism until early Sunday morning (UTC). Old results are now getting cleared and pending validations are dropping.

2) When we released the new app version and the new workunits, we succeeded in reducing the number of results per day. However, we made a mistake that caused each row in the table to be larger than two rows were previously. This has resulted in a drop in database performance. We have corrected our error, so smaller rows are being created now, but it will take a couple of days for this change to work its way through the database and result in improved performance. We are also temporarily reducing the time that old results stay in the database in order to shrink the size of the table.

Item #2 is directly related to the slow/failed load times of the result status table.

As far as 6.10.58 vs 7.0 goes - based on the last time I ran the numbers, both saw similar rates of success running GPU work.
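
For anyone wondering what the homogeneous_app_version matching means in practice, here is a minimal Python sketch of the grouping idea only. It is not the actual BOINC/WCG validator code, and the field names (plan_class, output_hash) are invented for illustration.

from collections import defaultdict

# Illustrative sketch only -- not the real BOINC/WCG validator code.
# Assumed (invented) fields: "plan_class" is the app version class that
# produced the result (e.g. "cuda", "ati", "cpu"); "output_hash" stands in
# for whatever comparison the validator actually performs.

def group_by_app_class(results):
    """Group one workunit's returned results by app version class so that
    nvidia is only compared with nvidia, ati with ati, cpu with cpu."""
    groups = defaultdict(list)
    for r in results:
        groups[r["plan_class"]].append(r)
    return groups

def try_validate(results, quorum=2):
    """Return (class, matching results) for the first class that reaches
    quorum with identical outputs, or (None, []) if still pending."""
    for plan_class, members in group_by_app_class(results).items():
        matching = [m for m in members if m["output_hash"] == members[0]["output_hash"]]
        if len(matching) >= quorum:
            return plan_class, matching
    return None, []

# Example: two cuda copies agree, so the workunit validates within one class;
# the lone cpu copy is never compared against the cuda ones.
results = [
    {"id": 1, "plan_class": "cuda", "output_hash": "abc"},
    {"id": 2, "plan_class": "cpu",  "output_hash": "abc"},
    {"id": 3, "plan_class": "cuda", "output_hash": "abc"},
]
print(try_validate(results))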
[Nov 20, 2012 1:10:44 AM]
pirogue
Veteran Cruncher
USA
Joined: Dec 8, 2008
Post Count: 685
Re: GPU Validation server issue??


> As far as 6.10.58 vs 7.0 goes - based on the last time I ran the numbers, both saw similar rates of success running GPU work.

Interesting. From my small sample:
If there's a WU with a PV, an error, and an in-progress result, in around 99 out of 100 cases the error is from a 6.10.x machine.
[Nov 20, 2012 1:36:25 AM]
OldChap
Veteran Cruncher
UK
Joined: Jun 5, 2009
Post Count: 978
Re: GPU Validation server issue??

Thanks for disseminating this information. You always give us a clearer picture of what is behind the things we see.

Have you guys had any thoughts yet about how best to resolve the short cache period for work units?

I think most of us would appreciate a move to a full day's cache being made possible, but that may be on the order of >2,500 WUs per GPU for some.

We cannot know all the issues this may cause at your end, but from where I sit such a move would help us ride out the shorter outages that are inevitable with a young project.
[Nov 20, 2012 1:46:27 AM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Re: GPU Validation server issue??

I do not know exactly how things work on the WCG servers, but isn't it possible to dynamically set the daily task limit per GPU for a given host?
When a host (for example my MARS machine) returns crunched WUs to the WCG servers, the GPU/CPU time and elapsed time are known (they are identical on my rigs). Over, say, 3 days an average value can be computed. From that average the daily task crunching capability of MARS can be calculated, and the server would then allocate a daily limit for MARS.
In this way every host would have its own daily limit tuned to its average capability. When I set the cache size to, say, two days, the server then knows how many tasks to send (if available).
If that host gets upgraded, its average will increase and the limit would be adapted accordingly. Some hosts would have very high daily allocations, others very small ones. On average over the whole grid there should be no change, and there would be fewer hosts waiting for WUs. To avoid continuous adjustments we could set a threshold (for example +10%) that would trigger a change in the daily allocation.
Does this make sense, or is my reasoning flawed somewhere?
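
A toy Python sketch of what I mean (the function names, window, and threshold are just examples; nothing here reflects what the WCG servers actually do):

# Toy sketch of the per-host daily limit idea -- not actual WCG/BOINC behaviour.
# Assumptions: we know the elapsed time of each task the host returned recently,
# and the host asks for a cache of "cache_days" days of work.

SECONDS_PER_DAY = 86400

def tasks_per_day(elapsed_seconds):
    """Daily throughput implied by the host's average task runtime."""
    if not elapsed_seconds:
        return 0.0
    average_runtime = sum(elapsed_seconds) / len(elapsed_seconds)
    return SECONDS_PER_DAY / average_runtime

def adapt_allocation(current_limit, elapsed_seconds, cache_days=2.0, threshold=0.10):
    """Recompute the host's allocation, but only change it when the new
    estimate differs from the current limit by more than the threshold."""
    capacity = tasks_per_day(elapsed_seconds) * cache_days
    if current_limit <= 0 or abs(capacity - current_limit) / current_limit > threshold:
        return int(capacity)
    return current_limit

# Example: tasks averaging 72 seconds imply 1200 tasks/day, so a 2-day cache
# would be 2400 tasks; that is more than 10% above the old limit of 2000,
# so the allocation adapts.
recent_runtimes = [70.0, 72.0, 74.0] * 40
print(adapt_allocation(current_limit=2000, elapsed_seconds=recent_runtimes))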
[Nov 20, 2012 7:28:11 AM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: GPU Validation server issue??


> As far as 6.10.58 vs 7.0 goes - based on the last time I ran the numbers, both saw similar rates of success running GPU work.

> Interesting. From my small sample:
> If there's a WU with a PV, an error, and an in-progress result, in around 99 out of 100 cases the error is from a 6.10.x machine.

Same here.
----------------------------------------
In 1969 I took an oath to defend and protect the U.S. Constitution against all enemies, both foreign and domestic. There was no expiration date.


[Nov 20, 2012 10:52:36 AM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Re: GPU Validation server issue??

> In this way every host would have its own daily limit tuned to its average capability. When I set the cache size to, say, two days, the server then knows how many tasks to send (if available).

The daily quota shouldn't be a problem, since AFAIK it's already so high that unless you're generating errors you shouldn't hit it.

Giving you up to 7 days of cache, on the other hand, is not such a good idea. The problem is that a fast GPU can do 1,000+ tasks/day, meaning a full 7-day cache would be 7,000+ tasks. If you have 1,000 such computers, you've got 7 million tasks in the database, and with so many rows to look through, all database traffic slows down.

By limiting the cache to, for example, 500 GPU tasks at once, a computer can still crunch 1,000+ per day, and 1,000 such computers can still crunch 7 million in a week. But at any given time there will only be 0.5 million tasks in the database, assuming no grace period and immediate validation.

With a 1-day grace period and some random waiting for validation, it will probably be somewhere between 1.5 and 2 million tasks in the database at once. While still large, that is much less likely to run into database performance problems.
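
Spelling out that arithmetic in a few lines of Python (same assumed numbers as above, purely illustrative):

# Back-of-the-envelope numbers from the post above -- illustrative only.
hosts = 1000
tasks_per_day = 1000              # per fast GPU host
cache_cap = 500                   # proposed per-host in-progress limit

# 7-day cache: a week of work sits on every host, and in the database, at once.
rows_with_7_day_cache = hosts * tasks_per_day * 7        # 7,000,000

# Capped cache: weekly throughput is unchanged ...
weekly_throughput = hosts * tasks_per_day * 7             # still 7,000,000 per week
# ... but only the capped in-progress work is in the database at any moment.
rows_with_cap = hosts * cache_cap                          # 500,000

print(rows_with_7_day_cache, weekly_throughput, rows_with_cap)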
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Nov 20, 2012 3:32:02 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: GPU Validation server issue??

... and speaking of performance problems, the home of BOINC has now had to quadruple its task sizes [see the Quota? thread for discussion], and set a quota too.
[Nov 20, 2012 3:43:48 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: GPU Validation server issue??

> ... and speaking of performance problems, the home of BOINC has now had to quadruple its task sizes [see the Quota? thread for discussion], and set a quota too.

What and/or who was that in response to?
BOINC has no problem with a myriad of other projects being able to grant credits. If this is in fact an issue, HCC1 should find a way to increase the run time of their WUs. If HCC1 cannot for some reason differentiate CPU from GPU... their bad. Learn to!
If it's all about the science, let's let it be that way. GPUs are faster; get rid of CPU WUs. There's HFCC for them. What? The HCC CPU'ers don't care about children?

C'mon, this argument is getting stale ;-(
[Nov 20, 2012 7:51:40 PM]