Thread Status: Active
Total posts in this thread: 53
This topic has been viewed 10533 times and has 52 replies
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1403
Status: Offline
Re: Outsmart Ebola Together - v7.19 - Released

... Other sciences have a 5-10 times, sometimes 40 times, overrun built in, as in estimated FPOPS times whatever project factor applies.

For OET1 the maximum runtime allowed is 40 times the estimated elapsed time,
so to error out for that reason the estimate should be more than 40x too short.
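
As a minimal sketch of that rule, assuming the limit is simply 40 times the estimated elapsed time (the function name and the sample durations are illustrative, not taken from the project's code):

```python
# Hypothetical helper: the 40x factor is the figure quoted in this post.
MAX_RUNTIME_FACTOR = 40


def exceeds_runtime_limit(estimated_hours: float, actual_hours: float,
                          factor: float = MAX_RUNTIME_FACTOR) -> bool:
    """Return True if a task would run past factor * its estimated time."""
    return actual_hours > factor * estimated_hours


# Illustrative numbers: a task estimated at 1 hour that needs 22 hours
# is still far inside the limit of 40 hours.
print(exceeds_runtime_limit(1.0, 22.0))

# The same task would only error out if it needed more than 40 hours.
print(exceeds_runtime_limit(1.0, 41.0))
```

In other words, a task running 20 or 30 times longer than estimated is still inside the allowance.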
[Feb 10, 2015 6:59:30 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Outsmart Ebola Together - v7.19 - Released

Short times? After returning several in the 2-3 hour range (old hardware), I now have 4 that have been running for 7-11 hours and are 52-82% done...


ETA one is now in PV at 13+ hours.
----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 10, 2015 10:50:03 PM]
[Feb 10, 2015 7:48:14 PM]
tmedve
Senior Cruncher
USA
Joined: Nov 16, 2004
Post Count: 191
Status: Offline
Re: Outsmart Ebola Together - v7.19 - Released

Has OET1 dried up for the day? I have gotten "no tasks available for OET" and "tasks are committed to other platforms" for the past several hours.

HA! Never mind. Just as soon as I sent off this post, I started getting new jobs.
----------------------------------------
[Edit 2 times, last edit by tmedve at Feb 10, 2015 8:16:55 PM]
[Feb 10, 2015 8:13:10 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: Outsmart Ebola Together - v7.19 - Released

Has OET1 dried up for the day? I have gotten "no tasks available for OET" and "tasks are committed to other platforms" for the past several hours.

HA! Never mind. Just as soon as I sent off this post, I started getting new jobs.
Yeah, I know that feeling... :-(

"Complain and you shall receive..." >:-)

Ralf ;-)
[Feb 10, 2015 11:21:16 PM]
Sandvika
Advanced Cruncher
United Kingdom
Joined: Apr 27, 2007
Post Count: 112
Status: Offline
Re: Outsmart Ebola Together - v7.19 - Released

Is it just me or are some of these units REALLY short? I just returned one in four minutes without any apparent error, on a 2GHz Sossaman rig (2006 vintage). The task properties suggest there was only one job in it, but still ... four minutes?? The rig in question doesn't normally throw errors or invalids, either.


We've seen some of these batches in beta. Some of the *ZAGP* WUs, such as those in batch 320, are really short; the *MBGP* WUs, such as those in batch 309, are really long, comparable to CEP2 in duration. However, there's a lot of variability and I've got a *ZAGP* WU from batch 313 that's only reached 25% after 4 hours. I sincerely hope that there isn't an 18 hour limit per WU as with CEP2, because that led to me losing >10% of my work and bailing out of the project.

The WU duration estimates across my devices vary hugely, probably as an artifact of getting a wild mix of durations with the first WUs crunched. This governs how much work BOINC requests, so I have one device maxed out with 25 WUs in progress per thread because it decided they only take 40 minutes. In fact the WUs currently crunching will take 4x longer than that, so I guess it will recalibrate in due course and stop asking for more work. Meanwhile, those devices that started with big WUs are still constrained by the quota system and are not getting more work either. Fingers crossed that it will all settle down quite quickly.
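
To put rough numbers on that over-fetch, here is a small sketch of the arithmetic only. It assumes nothing about how BOINC actually sizes its work requests; the function name is made up, and the figures (25 WUs, 40 minutes, 4x) are the ones from the paragraph above.

```python
def queued_hours(tasks: int, minutes_per_task: float) -> float:
    """Hours of work one thread has queued, given a per-task duration."""
    return tasks * minutes_per_task / 60


# What the client believes it has queued per thread: 25 WUs at 40 minutes.
believed = queued_hours(25, 40)

# What is really queued if each WU takes 4x longer than estimated.
actual = queued_hours(25, 4 * 40)

print(f"believed: {believed:.1f} h, actual: {actual:.1f} h per thread")
```

So a queue the client thinks is well under a day per thread is closer to three days of real work.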

good luck

Assumptions are the pitfall.

If you have a 7.x client, your last paragraph could do with a major rewrite. The factors involved are <dont_use_dcf/>, the device benchmark constant, and the server-determined average FPOPS on work returned by the pool.

And no, CEP2 has a purposeful cut-off in hard hours. Other sciences have a 5-10 times, sometimes 40 times, overrun built in, as in estimated FPOPS times whatever project factor applies. The actual average runtime on CEP2 never exceeded 9 hours in the last 11 months, with a one-time peak at 13.5, see http://bit.ly/WCGART , meaning 18-hour devices are the exception and should probably not be opted in. Conversely, even completing the first of 8 jobs in a task before the cut-off would have been a contribution. Doing jobs #1-3 is better and #4-7 is cranberry juice on top of the icing.


OK, 4 identical devices, show remaining times on "ready to start" OET1 WUs of 01:17:29 (was 00:47:20), 01:45:10 (hasn't changed), 04:07:02 (was 01:43:37) and 03:28:57 (was 02:43:09). The first started off with tiddlers, the last with larger ones. The only variable is the run time of the WUs. Yesterday all 4 were crunching FAAH and had almost identical estimates. So what's your explanation since you are suggesting it's not related at all to the work actually being processed?

18+ hours was a consequence of hyperthreading being enabled. The CPUs are perfectly capable of crunching a big CEP2 WU in 9-10 hours without hyperthreading but it's a server that gives spare cycles to WCG, not a WCG system that doubles up as a server. The onus is on CEP2 to behave appropriately in awareness of its execution environment, but unfortunately, it doesn't. Killing a job in this instance is like stopping at a green traffic light in anticipation of it turning red at some point in the future. I'll leave it at that because it's off topic in this discussion and was done to death in the CEP2 forum. If OET1 behaves as it did in beta with some 30+ hours WUs completing fine then that's all that matters.
----------------------------------------
[Edit 1 times, last edit by Sandvika at Feb 10, 2015 11:49:30 PM]
[Feb 10, 2015 11:48:15 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Outsmart Ebola Together - v7.19 - Released

Short times? After returning several in the 2-3 hour range (old hardware), I now have 4 that have been running for 7-11 hours and are 52-82% done...

I've had some of those. Mine started with OET1_ 0000307_ xEBGP-OM_ rig...

Has OET1 dried up for the day? I have gotten "no tasks available for OET" and "tasks are committed to other platforms" for the past several hours.

I get those too, along with "Message from server: This computer has reached a limit on tasks in progress". (Intermittently. Other times, the server gives out work, as with the example above.)
[Feb 11, 2015 2:27:37 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: Outsmart Ebola Together - v7.19 - Released

Short times? After returning several in the 2-3 hour range (old hardware), I now have 4 that have been running for 7-11 hours and are 52-82% done...

I've had some of those. Mine started with OET1_ 0000307_ xEBGP-OM_ rig...
Kremmen,
I have got one of those "0000307" WUs which is close to complete with more than 22 hours CPU on a machine which needs between 1 and 2 hours for the short ones...

So, nothing wrong, be patient. :-)
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Feb 11, 2015 4:51:01 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Outsmart Ebola Together - v7.19 - Released

I have got one of those "0000307" WUs which is close to complete with more than 22 hours CPU on a machine which needs between 1 and 2 hours for the short ones...

I've got a FW_rig WU running now showing 622 secs and 60% done.
Another OM_rig WU on the other core of the same CPU is at 21418 secs and 20% done.

Looks like that monster will be almost exactly 100 times as long.
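
For anyone wanting to check the arithmetic, a quick sketch of that projection, assuming progress is linear in elapsed time (which it may well not be; the function name is just for illustration):

```python
def projected_total_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Linear extrapolation of total runtime from elapsed time and progress."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds / fraction_done


# Figures from the two WUs above.
short_wu = projected_total_seconds(622, 0.60)    # FW_rig WU, 60% done
long_wu = projected_total_seconds(21418, 0.20)   # OM_rig WU, 20% done

ratio = long_wu / short_wu
print(f"short: {short_wu:.0f} s, long: {long_wu:.0f} s, ratio: {ratio:.1f}x")
```

That puts the short one at a bit over 17 minutes and the long one at nearly 30 hours, a ratio of roughly 103.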
[Feb 11, 2015 3:21:31 PM]
3rkko
Advanced Cruncher
Finland
Joined: Aug 2, 2008
Post Count: 105
Status: Offline
Re: Outsmart Ebola Together - v7.19 - Released

There seems to be very high variability in WU completion times. Some complete in 25 minutes, others take as much as 17 hours!

OET1_ 0000324_ xMBGP-FA_ rig_ 11471_ 1-- oxygen Pending Validation 10.2.2015 14:00:46 11.2.2015 20:11:14 16.58 / 16.80 534.6 / 0.0

I have one WU that is currently at 6:40 and 38.3% done, so it should take a little over 17 hours in total.
[Feb 11, 2015 9:57:43 PM]
KWSN-A Shrubbery
Senior Cruncher
Joined: Jan 8, 2006
Post Count: 476
Status: Offline
Re: Outsmart Ebola Together - v7.19 - Released

I'll take your variability and raise you 4 minutes and 21 hours on the same machine at the same time.

Makes the buffer size appear somewhat erratic.
[Feb 12, 2015 12:33:55 AM]