Total posts in this thread: 15
Tom WCG
Cruncher
Joined: May 30, 2006
Post Count: 31
Re: start order for Wu's

I have observed practically all of the issues discussed above on my machine. Glad that I am not the only one. As long as the developers are aware of these issues (which are way beyond me), I am happy.
[Sep 24, 2012 8:41:52 AM]
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187

I also was seeing this in the past week and it was driving me nuts. BOINC was halting CFSW jobs with seconds left and due within 24 hrs and running jobs with several days left to go. It put me in panic mode and I started to micromanage the clients. It appeared to me that panic mode is all screwed up. And my queue wasn't set to 10 days; it was temporarily set to 7 days about a week earlier for a one-time fill up of the queue.

Rob's report that the BOINC developers don't plan to address panic mode's nasty habit of suspending jobs that are nearly done is disappointing. I had a job fail when it resumed after being suspended with 0:00 time remaining. :(

Cheers
[Sep 29, 2012 7:30:34 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974

NixChix wrote: "I also was seeing this in the past week and it was driving me nuts. BOINC was halting CFSW jobs with seconds left and due within 24 hrs and running jobs with several days left to go. It put me in panic mode and I started to micromanage the clients. It appeared to me that panic mode is all screwed up. And my queue wasn't set to 10 days; it was temporarily set to 7 days about a week earlier for a one-time fill up of the queue."

v6.10.xx handles deadlines really badly if you run multiple BOINC projects: it will happily run tasks for a project with a month or more to deadline while other projects' tasks miss their deadlines.

With other client versions, any task whose time to deadline is shorter than the "Connected..." setting is run in EDF (earliest deadline first) mode, so if "Connected..." is 1 day, tasks shouldn't normally miss their deadline.
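
As a rough illustration of the rule described above (not the actual BOINC client code), the sketch below splits tasks into those whose time to deadline is shorter than the "Connected..." interval, which run earliest deadline first, and the rest, which stay in download order. The `Task` fields, the `run_order` function, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    arrival: int              # download order (lower = downloaded earlier)
    hours_to_deadline: float  # time left until the report deadline

def run_order(tasks, connect_interval_hours):
    """Return tasks in the order this simplified scheduler would start them."""
    # Tasks due sooner than the connect interval are treated as urgent.
    urgent = [t for t in tasks if t.hours_to_deadline < connect_interval_hours]
    normal = [t for t in tasks if t.hours_to_deadline >= connect_interval_hours]
    # Urgent tasks: earliest deadline first (EDF).
    urgent.sort(key=lambda t: t.hours_to_deadline)
    # Everything else: plain download (FIFO) order.
    normal.sort(key=lambda t: t.arrival)
    return urgent + normal

# Hypothetical queue with a "Connected..." interval of 1 day (24 hours).
tasks = [
    Task("A", arrival=1, hours_to_deadline=200.0),
    Task("B", arrival=2, hours_to_deadline=20.0),
    Task("C", arrival=3, hours_to_deadline=96.0),
    Task("D", arrival=4, hours_to_deadline=6.0),
]
order = [t.name for t in run_order(tasks, connect_interval_hours=24.0)]
print(order)
```

In this made-up queue, D and B are due within the 24-hour interval, so they jump ahead in deadline order, and A and C follow in the order they were downloaded.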

There are some instances where a computer will still miss the deadline on some of the work. This can happen if it has downloaded too much work, if the estimated run time is wrong, if it is running something else compute-heavy, or if the computer is shut down for some time. Suspending tasks or projects can also cause deadline problems. Getting GPU beta work can also lead to deadline problems, since the GPU uses 1 CPU core while running.

NixChix wrote: "Rob's report that the BOINC developers don't plan to address panic mode's nasty habit of suspending jobs that are nearly done is disappointing. I had a job fail when it resumed after being suspended with 0:00 time remaining. :("

The problem is all the applications lying about their progress and how much time is left. Apparently some badly behaving applications claim only 1 minute or so left, while in reality many hours remain. Since the BOINC client must also handle these badly behaving applications, it can't rely on the claimed time left, so tasks are instead paused while other tasks need high priority.

As for the task failing on resuming, this is strange, especially if it was left in memory while suspended. Still, seeing how many badly behaving applications lie, claiming no time left and showing "100% progress" while in reality they are stuck in this state for several seconds, it's possible that pausing at this point can lead to problems.

No well-behaved application should ever show "100% progress" before it's really finished.
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
----------------------------------------
[Edit 1 times, last edit by Ingleside at Sep 29, 2012 9:25:39 PM]
[Sep 29, 2012 9:23:44 PM]
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187

Ingleside wrote: "As for the task failing on resuming, this is strange, especially if it was left in memory while suspended. Still, seeing how many badly behaving applications lie, claiming no time left and showing "100% progress" while in reality they are stuck in this state for several seconds, it's possible that pausing at this point can lead to problems.

No well-behaved application should ever show "100% progress" before it's really finished."

In this particular case, it was the C4SW app. I do allow the keep-in-memory option, which is one reason I don't like the panic-mode practice of piling up partially completed jobs. It was stuck on 00:00 time left for a longer period after I intervened and allowed it to run again. I think the error message was something about the output missing. At the time I felt it was due to pausing at a critical point. If the badly behaving apps are what is driving the developers to ignore the remaining time, then I would like to at least have an easier time overriding the client behavior, rather than having to babysit the client.

I was able to get all the jobs returned before their deadlines, but I doubt that the client would have done it correctly if left alone. As soon as I manually worked through the jobs that the client scheduler was skipping, the cause of the panic mode ended. Thus the scheduler's panic mode was exacerbating the perceived problem. FIFO would have worked just fine in this case.
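
Whether FIFO would be enough for a given queue can be checked by hand: run the jobs in download order and see whether each one finishes before its deadline. Below is a minimal sketch of that check. The run times and deadlines are made up (they are not the actual queue from this post), `fifo_meets_deadlines` is a hypothetical helper, and it assumes a single core and accurate run-time estimates, which this thread notes is not always the case.

```python
def fifo_meets_deadlines(jobs):
    """jobs: list of (name, hours_remaining, hours_to_deadline) in download order.

    Returns the names of jobs that would finish after their deadline
    if run back to back on a single core.
    """
    clock = 0.0
    late = []
    for name, hours_remaining, hours_to_deadline in jobs:
        clock += hours_remaining  # time at which this job finishes
        if clock > hours_to_deadline:
            late.append(name)
    return late

# Hypothetical queue: one nearly finished job plus three longer ones.
queue = [
    ("job1", 0.1, 24.0),
    ("job2", 5.0, 24.0),
    ("job3", 8.0, 72.0),
    ("job4", 6.0, 96.0),
]
late_jobs = fifo_meets_deadlines(queue)
print(late_jobs)
```

An empty list means plain download order would have returned everything on time, so there would be no need to suspend a nearly finished job in favor of another one.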

Cheers
[Sep 29, 2012 10:05:50 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666

At Sekerob's suggestion ( The CEP2 Forum > Re: Research Log: Updates from the Harvard Team ), I'm trying out BOINC 7.0.42 to use its new app_config.xml feature to control the max number of simultaneous CEP2 WUs that run.
Previously I've stuck with BOINC 6.2.19, and micromanaged execution of CEP2 by having only a small number of these in the cache, and fetching them to run as required. Sometimes I'd run out of CEP2 WUs and would have to temporarily nudge the cache value up a bit to force fetching some more. By default, 6.2.19 very helpfully reports the number of seconds of work that it is fetching, so I could see whether I'd fetched too much and take quick action to reverse this. Later clients don't report the amount fetched unless you turn on some verbose debug option that swamps you with irrelevant info. This was a major reason for staying with 6.2.19.
With 6.2.19, I sort the BOINC Manager display on %Progress, and after downloading new work, the start order for WUs that are Ready to Start is generally FIFO and the displayed list of WUs at 0% Progress usually sorts in that order too. This really helps when micromanaging. However, 6.2.19 occasionally plays random shuffle with the cache display. This can be fixed by switching to "Accessible" View and back to "Grid" View.
Where are these View functions in BOINC 7.0.42?

In 7.0.42, WU start order seems to be very haphazard, with WUs downloaded recently running while WUs downloaded more than 1 day ago still sit at Ready to Start. Reading this thread, it seems that this rot was there in 6.10.58 too. No wonder I get so many repair WUs with 6.2.19! Later clients delay execution of WUs and cause many computers to miss the "fast returner" threshold that qualifies them to be sent repair WUs.
Is there a way of sorting the cache display in to-start order in 7.0.42?
Is there a way to cause WUs that are Ready to Start to start in download order, excluding those at risk of missing their return deadline?
---
Also concerning WU start order in 7.0.42, I have a question re specifying max_concurrent for multiple sciences in the new app_config.xml file, but perhaps app_config.xml deserves its own thread.
[Dec 31, 2012 8:34:11 AM]