Thread Status: Active
Total posts in this thread: 15
This topic has been viewed 2016 times and has 14 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
No work available

Come on, this is not something I want to see.

"Mon 12 Dec 2005 07:46:31 PM CST|World Community Grid|Message from server: (there was work but it was committed to other platforms)"

They can't keep enough wu's queued up for Linux machines?

This is my highest producing machine and it's out of work.
confused
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 13, 2005 1:56:22 AM]
[Dec 13, 2005 1:43:13 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No work available

I now have a new work unit on this machine, thanks.

But please, if I'm willing to donate my machines' time and cover the cost of running the WCG program 24/7, all I ask is that I don't have them running idle, waiting for work.
[Dec 13, 2005 2:27:33 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No work available

ladypcer2,

If your concern is about the cost of electricity or more work being done on your CPU, then you should know that if you don't have any work units, World Community Grid will only use as much power and CPU time as any other background application. This is generally less than leaving a browser window or word processor open.

We try to keep the work units ready to go, but since we've only been running the Linux client for a little over a month, random things still occasionally occur that cause strange behavior. Such is the way of computers, I'm afraid.
[Dec 13, 2005 2:38:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No work available

nelsoc,

If I were you I'd go and find somewhere to hide ... wink

Some of us "Linux Users" utilise older machines just to run a project such as this ... we actually do know how much it "costs" to run our systems ... the point, I'm sure you'll agree, is finding these systems idle. It is a little disheartening, to say the least.

LP is one of a band of people I would call "dedicated" to "crunching" and I know how frustrating it is to find a system down through lack of work.

UD springs to mind ... shhh

Regards
mucks cool
[Dec 13, 2005 3:39:42 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No work available

nelsoc, thank you for your reply.
I have run DC projects since 1999 (starting with Seti) and like Mucks says, I am a dedicated cruncher. I run a small crunching farm on my own dime, and am pretty serious about it.
I'm not meaning to sound too terse. I know the ups and downs of most DC programs, and that they have bumps in the road every so often.
My concern is not having a machine sit idle because it has no work available to it, when it could be running another program.
I could just attach to another Boinc program and set it to switch out every so often, but I prefer to just run WCG full time, for now.
It's just that since I have been through several Boinc programs, maybe I am somewhat gun shy, as most of the others have had problems with running out of work, leaving machines sitting idle, and I'm hoping that's not going to get started here. I guess my tolerance is low at this point.
I think it's a fairly small thing to ask that the project make sure there is enough work to go around.
[Dec 13, 2005 6:07:02 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No work available

ladypcer2,

Perhaps my previous post wasn't clear enough. I'm sorry for that.

I appreciate your response, so please be aware that we have not run out of work units. The simple fact that you have more work to do now can attest to that. We are aware that other DC projects have run out of work, and if, somehow, that happens here, we will let our users know and be completely honest about it. It may seem like it sometimes, but we know that we are not operating in a DC vacuum. smile
[Dec 14, 2005 3:53:04 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: No work available

ladypcer2,

We apologize that you experienced this error. Let me give you a rundown on what happened and what we are doing to prevent this from happening again:

First - how is work assigned by BOINC:

1) First, some terms that BOINC uses are a little different than you would normally think. When a workunit is loaded into BOINC, it automatically creates multiple RESULT records for that workunit. A particular result for the workunit is assigned to a particular client (and the result record is eventually associated with the actual result when the client completes it).

2) When a BOINC client requests work from the server, it is contacting a CGI executable called the scheduling server. This checks a shared memory segment that contains a list of 210 results ready to be handed out. This shared memory segment is populated by something called the feeder. Every 15 seconds it scans the memory segment and fills up any slots in the list that have been emptied due to a result being assigned to a host.

3) The Human Proteome Folding project is one of the slightly more unusual projects in that results computed on Linux deviate slightly from results computed on Windows. We have verified that this in no way impacts the value of the results or their usefulness to scientists. It does, however, impact our backend validation processes. As a result, we have set BOINC so that all results for the same workunit are sent either only to Windows machines or only to Linux machines, so that the results can be validated effectively. When BOINC sends the first result for a workunit to a host, it marks that workunit as being a 'Windows' or 'Linux' workunit. All additional results for that workunit will then be assigned to the appropriate OS.

4) The web addresses www.worldcommunitygrid.org and secure.worldcommunitygrid.org point at our load balancer, which distributes the load to two web servers (which is where the CGI scheduling server resides). Unfortunately our load balancer is a bit out of kilter and it actually sends about 60% of the load to one web server and 40% to the other. The way that BOINC handles running the scheduling server is that it loads odd-numbered results into the shared memory segment on one server and even-numbered results into the shared memory segment on the other server.
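The four points above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not BOINC's actual code: the names `Result`, `Server`, `feed`, `request_work` and `wu_os` are all hypothetical, and the real feeder and scheduler work against a database and a shared memory segment rather than Python lists.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SLOTS = 210  # size of the ready-result list quoted in the post


@dataclass
class Result:
    result_id: int
    workunit_id: int


class Server:
    """Hypothetical model of one web server and its shared-memory list."""

    def __init__(self, parity: int) -> None:
        self.parity = parity  # 0 serves even result IDs, 1 serves odd
        self.slots: List[Optional[Result]] = [None] * SLOTS

    def feed(self, pending: List[Result]) -> None:
        """One feeder pass: refill every empty slot from pending work."""
        mine = [r for r in pending if r.result_id % 2 == self.parity]
        for i, slot in enumerate(self.slots):
            if slot is None and mine:
                result = mine.pop(0)
                pending.remove(result)
                self.slots[i] = result

    def request_work(self, host_os: str, wu_os: Dict[int, str]) -> Optional[Result]:
        """Hand out a result whose workunit is unmarked or matches host_os."""
        for i, result in enumerate(self.slots):
            if result is None:
                continue
            if wu_os.get(result.workunit_id, host_os) == host_os:
                wu_os[result.workunit_id] = host_os  # first send pins the OS
                self.slots[i] = None
                return result
        return None  # slots hold work, but none this platform may take


# Three results per workunit; the even-ID server only ever sees even IDs.
wu_os: Dict[int, str] = {}
pending = [Result(result_id=n, workunit_id=n // 3) for n in range(12)]
even_server = Server(parity=0)
even_server.feed(pending)
first = even_server.request_work("Linux", wu_os)
second = even_server.request_work("Windows", wu_os)
```

In this toy run the Linux host pins workunit 0, so the Windows host has to skip the next result of that workunit and take one from an unmarked workunit instead.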

Ok - that is the background. Now for the problem.

Usually the shared memory segment is filled up with some work marked for Linux, some work marked for Windows and some work that has not yet been assigned to a particular OS. This is how it should be. However, three factors periodically combine: the unbalanced load balancer; the fact that about 85% of our BOINC clients are Linux and 15% are Windows; and the fact that Windows boxes tend to follow the sun (i.e. they are on during the day and off at night, while Linux boxes tend to have a more even distribution). Together these can cause the shared memory segment on the server that gets less traffic to become filled with results that are marked only for Windows. Linux clients requesting work from this server then get the message that you received.
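To see how a segment can drift into a Windows-only state, here is a minimal simulation sketch. It makes several simplifying assumptions that are not from the post: slots are refilled immediately instead of every 15 seconds, only Linux hosts are asking for work (the "night" case), and incoming results arrive in a fixed Windows / unassigned / Linux rotation. The function name `serve_until_starved` and the tag strings are hypothetical.

```python
from itertools import cycle, islice
from typing import Iterator, List

SLOTS = 210  # slots in one server's shared memory segment


def serve_until_starved(segment: List[str], refill: Iterator[str],
                        host_os: str, max_requests: int = 10_000) -> int:
    """Serve requests from host_os until no slot is usable; return the count."""
    served = 0
    while served < max_requests:
        usable = [i for i, tag in enumerate(segment) if tag in ("any", host_os)]
        if not usable:
            break  # every slot is pinned to another platform
        segment[usable[0]] = next(refill)  # hand out the result, refill the slot
        served += 1
    return served


# Assumed arrival pattern: one third of new results are already pinned to Windows.
incoming = cycle(["Windows", "any", "Linux"])
segment = list(islice(incoming, SLOTS))
served = serve_until_starved(segment, incoming, host_os="Linux")
starved = all(tag == "Windows" for tag in segment)
```

Because Windows-pinned results are never taken in this scenario, they accumulate until the segment holds nothing else, which is the condition that produces the "committed to other platforms" message.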

To mitigate this, we are periodically flipping which server gets odd and even results to distribute (I also have a monitor in place now to detect when we are getting close to this condition). This has improved the situation significantly but it is far from an ideal setup. We are looking at changing the way the shared memory segment is populated so that it would use a locking mechanism for results rather than just assigning odd or even results. This would permanently fix the problem.
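The post does not say what the locking mechanism would look like, so the following is only one possible shape for it, sketched in Python with hypothetical names (`ResultPool`, `claim`, `fill_segment`). Instead of splitting results by odd and even IDs, each feeder claims the next unclaimed result under a lock, so either server can receive any result and no result is loaded twice.

```python
import threading
from collections import deque
from typing import Deque, Iterable, List, Optional


class ResultPool:
    """Stand-in for the shared store of ready results (an assumption)."""

    def __init__(self, result_ids: Iterable[int]) -> None:
        self._lock = threading.Lock()
        self._unclaimed: Deque[int] = deque(result_ids)

    def claim(self) -> Optional[int]:
        """Atomically take the next unclaimed result, or None if empty."""
        with self._lock:
            return self._unclaimed.popleft() if self._unclaimed else None


def fill_segment(pool: ResultPool, segment: List[int], slots: int) -> None:
    """Feeder pass for one server: claim results until the segment is full."""
    while len(segment) < slots:
        result_id = pool.claim()
        if result_id is None:
            break
        segment.append(result_id)


pool = ResultPool(range(10))
segment_a: List[int] = []
segment_b: List[int] = []
feeders = [
    threading.Thread(target=fill_segment, args=(pool, seg, 5))
    for seg in (segment_a, segment_b)
]
for feeder in feeders:
    feeder.start()
for feeder in feeders:
    feeder.join()
```

With this approach the busier server simply claims results faster, so neither segment is tied to a fixed half of the work.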

Now there was a second condition that occurred on Monday night that made this situation worse. BOINC at World Community Grid has nearly completed its first batch for the Human Proteome Folding project and is about halfway through its second batch (I am loading a third batch now). The first batch had 600 out of around 40,000 workunits left to finish on Monday. These 600 workunits weren't finished because they were waiting for one more result to be returned so that validation could be performed. Of all the results we ever receive back, about 99.7% arrive within the first seven days after the result is assigned. These 600 workunits had results that had been out for more than 10 days (and thus it is very unlikely that we will get any of those results back). BOINC won't issue new results for these workunits until those results have hit their 21-day deadline.

The Human Proteome Folding project uses a LOT of disk space and we wanted to finish up the first batch so that we could free up that space. So I triggered an extra result for each of those workunits. These were the next results loaded into the shared memory segment as slots became available. What I did not realize is that those 600 workunits had almost exclusively been assigned to Windows machines. Within a couple of hours the scheduling servers only had work to assign to Windows machines. It took about an hour and a half for this situation to clear. It was during this time that you experienced the problem.
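A quick back-of-the-envelope check, using only the figures quoted in this post, shows why 600 extra results were enough to crowd out everything else. The one assumption added here is that the 600 results split roughly evenly between odd and even IDs.

```python
SLOTS_PER_SERVER = 210    # ready-result slots per scheduling server
SERVERS = 2               # web servers behind the load balancer
EXTRA_RESULTS = 600       # extra results triggered at once
BATCH_WORKUNITS = 40_000  # approximate size of the first batch

# Assumption: result IDs are split about evenly between odd and even.
per_server = EXTRA_RESULTS // SERVERS
outstanding_fraction = EXTRA_RESULTS / BATCH_WORKUNITS
can_fill_every_slot = per_server >= SLOTS_PER_SERVER

print(f"extra results per server: {per_server}")
print(f"share of batch still outstanding: {outstanding_fraction:.1%}")
print(f"enough to fill every slot: {can_fill_every_slot}")
```

Each server's share of the extra results is larger than its whole slot list, so once those results reached the front of the queue a segment could hold nothing but Windows-pinned work.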

We will be looking at doing that differently if we do that again in the future (for one - we will only do a few at a time and not 600 at once).

I want to assure you that we have plenty of work that needs to be done and that we are taking steps to ensure that clients from all platforms are always able to obtain new work.

Kevin
[Dec 14, 2005 4:01:52 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No work available

Ah! Thank you both very much, and yes that helps explain things.

I have my "queued wu" setting low because I hadn't had any problems with getting new wu's and because I didn't want to be a "wu hog". Maybe I should bump that up a little to help cover such events, too.

In the last day or two, I have switched all of my machines (Windows and Linux) over to running the Boinc version of WCG, as I prefer that to the UD version.

Thank you again for the explanation. I do appreciate being kept informed on how the process works, and will be a little more patient now. wink
[Dec 14, 2005 6:43:00 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: No work available

In general, if you have computers continually connected to the internet, I agree that keeping the setting very low is good. However, due to the occasional hiccup as described above, or even just due to routine maintenance and database backups, it is handy to have an extra workunit cached. The caching mechanism is one of the things I love about the BOINC agent!

Thanks for your ongoing support and participation!

Kevin
[Dec 14, 2005 7:28:13 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No work available

I'm a complete n00b to BOINC but having a wee dabble with it. Can anyone give an idiot's guide to caching extra units? One thing I like about the UD s/ware is the UDMon utility, which enables me to keep an extra WU (or 2) in case of server or internet problems. I'd like to do the same with BOINC biggrin
[Dec 14, 2005 9:49:56 PM]