| World Community Grid Forums
Thread Status: Active Total posts in this thread: 21
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi, all. The screwy completion time estimates in BOINC have been annoying me lately, so I took a crack at improving them. If they bug you too, you're welcome to help beta test my changes. It's a drop-in replacement for the standard boinc.exe version 5.4.9; Windows .exe and changed source files included.
Download from: boinc_client_5.4.9_timefix.zip

Changes (from readme.txt):

1. Switches to extrapolated remaining time as soon as the unit has been processed long enough to build up a useful average rate. In this version, when the unit is at 0% progress, the old GFlops-based estimate is shown. Once it reaches 5% progress, the extrapolated time based on used CPU time and progress percentage is shown instead. Between those two percentages, the estimate is a weighted average which smooths out the transition to avoid wild swings.

2. The extrapolated time to completion is only recomputed when the progress percentage changes, since this is the only point where the program has accurate values to estimate with. In between progress percentage updates, the estimate will simply tick down one second per CPU second used, accounting for the progress which is assumed to be happening between the updates we receive.

These two changes make the time estimates behave in a much more natural feeling manner. The wacky estimates which most work units start with will quickly converge down to something at least approximately correct once the unit starts running, and the timer will tick downwards as intuitively expected, with occasional jitters one way or the other to recalibrate after each progress update.

Let me know how you like it, and any problems you see. Questions and comments can go here or to my email (included in readme to keep it off the web). And if anyone knows of a place to host this permanently, let me know; I don't really feel like making a whole permanent web site just for this.

[Edit 1 times, last edit by Former Member at Jun 6, 2006 8:14:42 PM]
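For readers who want to see the two readme changes in concrete form, here is a minimal Python sketch of the idea. It is not the patched client code. The function and class names are hypothetical, and the linear weighting between 0% and 5% is an assumption, since the readme only says "weighted average".

```python
def blended_estimate(old_estimate, cpu_used, fraction_done, switch_at=0.05):
    """Remaining CPU seconds, blending the old estimate with an extrapolation.

    old_estimate  -- the original (GFlops-based) remaining-time estimate, seconds
    cpu_used      -- CPU seconds spent on the unit so far
    fraction_done -- progress as a fraction between 0.0 and 1.0
    switch_at     -- progress at which the extrapolation takes over fully (5%)
    """
    if fraction_done <= 0.0:
        # No progress reported yet: nothing to extrapolate from.
        return old_estimate
    # Linear extrapolation: assume the rest of the unit runs at the same rate.
    extrapolated = cpu_used * (1.0 - fraction_done) / fraction_done
    # Assumed weighting: ramps linearly from 0 at 0% to 1 at switch_at.
    weight = min(fraction_done / switch_at, 1.0)
    return (1.0 - weight) * old_estimate + weight * extrapolated


class RemainingTimeTracker:
    """Recompute only when progress changes; otherwise tick down with CPU time."""

    def __init__(self, old_estimate):
        self.old_estimate = old_estimate
        self.last_fraction = 0.0
        self.last_cpu = 0.0
        self.estimate_at_update = old_estimate

    def remaining(self, cpu_used, fraction_done):
        if fraction_done != self.last_fraction:
            # A fresh progress report: recalibrate the estimate.
            self.estimate_at_update = blended_estimate(
                self.old_estimate, cpu_used, fraction_done)
            self.last_fraction = fraction_done
            self.last_cpu = cpu_used
        # Between reports, count down one second per CPU second used.
        elapsed = cpu_used - self.last_cpu
        return max(0.0, self.estimate_at_update - elapsed)
```

With an initial estimate of 25 hours, a unit that reaches 10% after one CPU hour would switch to roughly 9 hours remaining, then count down second by second until the next progress report.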
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Hi Rick, by coincidence I have written about this issue several times, even just this morning in a thread about WUs missing from the local BOINC screen but shown on WCG. I'll give it a whirl and see if it comes close to the spreadsheet algorithm. The biggest issue (I think) everyone has is that it starts with some off-the-wall number of 20 to 25 hours, and then 2 hours and 1/4 into the calc it still shows 12 hours... fathom that.
I'll let you know if good or bad... it's 6.6.6 so anything might come out. Oh, and by the way, your calc may work for WCG, but would it work for the other 20 Grid projects running on BOINC?
WCG
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 3 times, last edit by Sekerob at Jun 6, 2006 8:03:41 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"oh and by the way, your calc may work for WCG, but would it work for the other 20 Grid projects running on BOINC?"

Well, it's just a simple linear extrapolation based on the cpu used for the progress so far, so it mostly depends on how steady the rate of progress is for the unit. It should give good results on any WU that doesn't speed up and/or slow down a lot as it runs, regardless of the project. And for the ones that aren't steady, well, it probably won't be any worse than the current method, which gives some pretty lousy estimates.
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Hmmm, with your linear CPU calculation comment you put me on the wrong foot... is the amendment going to give a wall clock time to complete or a CPU time to complete? On FAAH the % changes per second of CPU time; on Rosetta it changes in varying jumps of 0.3 to 0.9%. On FAAH that means one could see erratic jumps up and down... I'll shut up now and watch. I'll kick your beta off once the current WU is finished... don't want to ruin good hours ;>)

Okay, 35 minutes and 6.5% into it, it gives a darn close CPU time to complete... not wall clock time, which is the intent of the field. On wall clock it's now 6 minutes light and 40 minutes till completion.

Update: 65% into the job, still within seconds of what my spreadsheet produces. If you could get it to read a parameter from some ini file where one can enter an average CPU allocation percent (mine works out at 93.5%), it could take the residual time divided by that... so 1 hour on the clock works out as 56.1 minutes of CPU. Of course, I'm not stopping you from going overboard and making it read a number of Task Manager % allocations over a fixed time span, 5 minutes for example, and applying the average of those to the CPU time to complete... it would be pretty, pretty close.
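The conversion suggested here (remaining CPU time divided by the average CPU allocation) can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the post. The function names are hypothetical, and the sample values in the usage note are made up apart from the 93.5% figure quoted above.

```python
def cpu_to_wall_seconds(cpu_seconds_remaining, cpu_allocation):
    """Convert remaining CPU time to estimated wall clock time.

    cpu_allocation is the fraction of wall clock time the task actually
    gets on the CPU, for example 0.935 for 93.5%.
    """
    if not 0.0 < cpu_allocation <= 1.0:
        raise ValueError("cpu_allocation must be in the range (0, 1]")
    return cpu_seconds_remaining / cpu_allocation


def average_allocation(samples):
    """Average a list of sampled allocation fractions (e.g. over 5 minutes)."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(samples) / len(samples)
```

For example, 56.1 minutes of CPU time (3366 seconds) at an allocation of 0.935 comes out at 3600 seconds, which is the 1 hour of wall clock time mentioned in the post.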
[Edit 2 times, last edit by Sekerob at Jun 6, 2006 2:42:30 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Actually, the CPU allocation percent for converting between CPU time and wall time is computed right there in the client program, so it's already available to me. It's saved as <time_stats/cpu_efficiency> in the client_state.xml file.
But... according to the source code, the "To Completion" column in the manager is supposed to display CPU time remaining, not wall clock time. The function which I'm changing to fix these estimates is called est_cpu_time_to_completion(), and the variable which the manager prints in that column is called estimated_cpu_time_remaining. So the manager has always been displaying CPU time there, not wall time, but the estimates were so bad before that you couldn't tell.

I guess I could change the manager so that it used the CPU efficiency figure to show the wall time there anyway. But I'm not sure if it's worth the trouble... what do most people really expect that column to show? And the difference is only a few minutes, anyway; compared to how far off the estimates used to be, it doesn't seem like much. Hmmm...
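As a rough illustration of reading that figure from outside the client, the Python sketch below pulls cpu_efficiency out of an XML fragment. The fragment is a made-up, stripped-down stand-in for client_state.xml: the root element name and the value 0.935 are assumptions, and a real file contains many more elements. The helper name is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for client_state.xml.
SAMPLE_STATE = """<client_state>
  <time_stats>
    <cpu_efficiency>0.935000</cpu_efficiency>
  </time_stats>
</client_state>"""


def read_cpu_efficiency(xml_text):
    """Return time_stats/cpu_efficiency as a float, or None if it is absent."""
    root = ET.fromstring(xml_text)
    node = root.find("time_stats/cpu_efficiency")
    if node is None or node.text is None:
        return None
    return float(node.text)


efficiency = read_cpu_efficiency(SAMPLE_STATE)
# Dividing remaining CPU seconds by the efficiency gives a wall time guess.
wall_seconds = 3366.0 / efficiency
```

Returning None for a missing element keeps the caller free to fall back to plain CPU time when the figure is not available.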
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
I definitely vote for wall clock time... it's like, will it finish before 24:00 UTC or after? Let's see if anyone else votes either way. For sure, the WU just finished and the CPU time to go (TTG) stayed exact to within a few seconds... I'm sure more than a few will appreciate it.
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Another thought... instead of me changing the way the manager is intended to work, you might be interested in taking a look at BoincView. As I mentioned in my readme.txt, I've found the latest version of BoincView to work very well with my time estimate fix.
In addition to the CPU Time Used and CPU Time To Completion columns that the standard manager shows, it also has columns for Total CPU Time and Completion At, which is the wall clock date & time that the unit should finish. I think this last column already factors in the recent CPU efficiency for that unit, which BoincView computes and updates constantly. I'd be very curious to know how well that Completion At field matches up with your spreadsheet's final wall clock estimate. It should be close.

I've switched to using BoincView as my main UI for BOINC instead of the standard manager program. It does everything the manager does except for the stats graphs, plus it adds nice work status colors and these new fields, along with data logging and easy remote system switching. It's at http://boincview.amanheis.de/ if you're interested.
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Sounds good... I'll give that a whirl too.

I just discovered a header piece which only exists in the FAAH log on WCG, but not in the Rosetta one... lo and behold, it has pretty much my number in there for what I call the "slip value", which is called the "CPU Idle Factor" here (100 less the CPU Idle Factor below, taken as a percentage, gives 93.6, as opposed to the 93.5 I use). It thus beats me why the BOINC agent starts off with an estimate in the 20 to 30 hour range, when time and again the FAAH units run in 8.75 to 9.5 hours wall clock.

<core_client_version>5.4.9</core_client_version>
<stderr_txt>
Failed to open wcg_checkpoint.dat for reading. rc: 2. File doesn't exist?
INFO: WCGRID_MAX_CPU found. Value = 94
INFO: CPU Idle Factor is 0.063830
World Community Grid AutoDock (projects/www.worldcommunitygrid.org/wcg_faah_autodock_5.09_windows_intelx86)
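The arithmetic behind the 93.6 figure can be checked with a short Python sketch that reads the idle factor out of the two log lines quoted in the post. The function name is hypothetical, and treating the idle factor as a plain fraction of CPU time is an assumption taken from how the post uses it.

```python
import re

# The two INFO lines quoted from the FAAH result log.
LOG = """INFO: WCGRID_MAX_CPU found. Value = 94
INFO: CPU Idle Factor is 0.063830"""


def slip_percent(log_text):
    """Return 100 * (1 - idle factor), or None if the log has no idle factor."""
    match = re.search(r"CPU Idle Factor is ([0-9.]+)", log_text)
    if match is None:
        return None
    return 100.0 * (1.0 - float(match.group(1)))


slip = slip_percent(LOG)  # about 93.6
```

It may be a coincidence, but (100 - 94) / 94 also rounds to 0.063830, so the idle factor might be derived from the WCGRID_MAX_CPU value of 94 rather than measured. That is only a guess from the numbers shown.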
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
To be honest, I'm not sure what all the fuss is about.
My BOINC estimate has always been fairly close. Perhaps it is an effect of multicore or hyperthreading? That could easily make the CPU time misleading. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It depends on how we define fairly close and how close you need the estimates to be to suit your lifestyle. I have always considered the estimates to be out to lunch but it doesn't bother me. No hyperthreading or multicore here, btw.
I don't need close estimates but I understand they're nice to have if you want to cache a week's worth of WU then disconnect your modem and go fishing for a week and find all WUs will make the deadline when you return and plug your modem back in. I never disconnect my modem but I live in an area that sees infrequent electrical storms and they are relatively mild when they do occur. I also have huge spike protection and I rent my modem (no choice, ISP's rules) so I feel differently about zapped modems than do people who own their modems and live in areas that get major electrical storms. |
||
|
|
|