robertmiles
Senior Cruncher
US
Joined: Apr 16, 2008
Post Count: 445
A reason for so many third and fourth copies of HCC GPU workunits

I've noticed that one of my computers always downloads HCC GPU workunits with an initial runtime estimate of 00:01:46 or 00:01:47, but each of those workunits actually takes closer to 12 minutes. Such an underestimate is likely to produce a large percentage of workunits that either run past their deadlines or have to be aborted by the user to prevent that, unless the user keeps the workunit queue very short.

Could the next version of this application do more to adjust its time estimates to agree with past actual run times on the same computer? I suspect it will need to maintain separate estimates of the CPU speed and the GPU speed to do this.
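
Something along these lines is what I have in mind, as a minimal sketch in Python (hypothetical, not actual BOINC code): keep an exponential moving average of actual-to-estimated runtime per resource, and scale future estimates by it.

```python
# Hypothetical sketch, not actual BOINC code: keep a per-resource
# correction factor as an exponential moving average of
# (actual runtime / estimated runtime) and scale future estimates by it.

class RuntimeCorrector:
    def __init__(self, smoothing=0.1):
        self.smoothing = smoothing
        # separate factors so CPU and GPU speeds are tracked independently
        self.factor = {"cpu": 1.0, "gpu": 1.0}

    def record(self, resource, estimated_s, actual_s):
        ratio = actual_s / estimated_s
        f = self.factor[resource]
        self.factor[resource] = (1 - self.smoothing) * f + self.smoothing * ratio

    def corrected_estimate(self, resource, estimated_s):
        return estimated_s * self.factor[resource]

# With the numbers from this post: 107 s estimated, ~12 min (720 s) actual.
c = RuntimeCorrector()
for _ in range(20):
    c.record("gpu", 107, 720)
print(round(c.corrected_estimate("gpu", 107)))  # ~645 s, converging on 720
```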
[Feb 26, 2013 3:22:43 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: A reason for so many third and fourth copies of HCC GPU workunits

On the title: AFAIU, the techs have set a very high fault allowance, meaning devices are permitted to return many errors **, so as a consequence more copies need to be circulated before a quorum is reached.

The time estimate [FLOPS-based] is built on the last few days' historical data, so it is in principle pretty accurate from that perspective.

The WCG server has sent an instruction to clients not to use the DCF (duration correction factor), so projections should in principle be pretty stable for Ready to Start work.
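
As I understand it (an assumption about the internals on my part, not verified against the BOINC source), the projection is essentially the workunit's claimed FLOP count divided by a historical throughput figure:

```python
# Assumed mechanics, not verified against the BOINC source: the estimate
# is the workunit's claimed FLOP count divided by the device throughput
# derived from recent history. Both numbers below are made up.

rsc_fpops_est = 1.2e13       # hypothetical FLOPs claimed for one HCC workunit
projected_flops = 1.12e11    # hypothetical GPU throughput from recent history

estimate_s = rsc_fpops_est / projected_flops
print(f"{estimate_s:.0f} s")  # ~107 s, i.e. the 1:46-1:47 the OP reports

# With the DCF disabled by the server, the client does not rescale this
# locally; the projection only moves when the server-side throughput
# figure is updated.
```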

Question: is this 1:46 from running a workunit exclusively by itself, or from running many concurrently, e.g. a setting of 12 concurrent on one GPU? Also, do you run a mix of CPU and GPU versions of HCC? Finally, which client version exactly?

edit: ** To clarify, the high fault allowance is something like 80%+ [or was it 85-90%?] before cards are blacklisted, which is understandable in a way given that a successful task runs about 18 times faster [when it has the GPU to itself]. I don't know for sure, though, whether that is any part of the motivation for being so tolerant of low-success-rate devices.
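
A back-of-the-envelope sketch of why a high fault allowance translates into extra copies, assuming a quorum of 2 and treating each copy's failure as independent (numbers purely illustrative):

```python
# Illustrative only: with a quorum of 2 valid results and per-copy
# failure probability p (independent failures assumed), the expected
# number of copies sent is 2 / (1 - p), the mean of a negative binomial.

def expected_copies(quorum=2, failure_prob=0.0):
    return quorum / (1.0 - failure_prob)

for p in (0.05, 0.25, 0.50):
    print(f"p={p:.2f}: ~{expected_copies(failure_prob=p):.1f} copies")
# p=0.05: ~2.1    p=0.25: ~2.7    p=0.50: ~4.0
```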
----------------------------------------
[Edited 1 time, last edit by Former Member at Feb 26, 2013 5:30:35 PM]
[Feb 26, 2013 4:37:51 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: A reason for so many third and fourth copies of HCC GPU workunits

I see this occasionally, running multiple concurrent GPU WUs, HCC1 only.

*Sometimes* the time estimates get all goofy in the BOINC Manager and, as robert posted, show very short estimates .... but .... within a minute or three the estimates return to normal. It looks very odd when it happens, and the PC asks for lots and lots of work. Only once did this cause the machine to ask for more work than it could possibly execute (the slow POS deserves to be replaced :O), but that's it ... once in about 6 months ... not too bad, I think.
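
For illustration only, with a made-up buffer size, the over-ask follows straight from the arithmetic:

```python
# Hypothetical numbers, just to show the arithmetic of the over-ask:
# work fetch is roughly (buffer seconds) / (estimated seconds per WU),
# so a transiently short estimate inflates the request proportionally.

buffer_s = 0.5 * 24 * 3600   # assumed 0.5-day work buffer
actual_s = 720               # ~12 min real runtime per WU

for est_s in (720, 107):     # normal estimate vs. the goofy short one
    fetched = buffer_s / est_s
    real_hours = fetched * actual_s / 3600
    print(f"estimate {est_s:>3} s -> ~{fetched:.0f} WUs (~{real_hours:.0f} h of real work)")
# estimate 720 s -> ~60 WUs (~12 h); estimate 107 s -> ~404 WUs (~81 h)
```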
[Feb 26, 2013 5:25:39 PM]