Total posts in this thread: 149
This topic has been viewed 19629 times and has 148 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Second really long work unit received

Hello Questar,
Yes, knreed will run this script again. He is taking a week-long vacation, so don't expect it immediately. If your computer is very slow, you might want to abort any work units you have not started yet.

Lawrence
[Aug 10, 2008 4:09:09 AM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Status: Offline
Re: Second really long work unit received

I thought the server had adjusted the timeout for recent jobs, but it appears the adjustment didn't work. I was told that the adjustment was made while I was running my second long task. At that time one of the tasks had aborted, the second one was still running, and mine was running as a make-up job for the one that got aborted.

After my job was done, the other initial task timed out, so another computer was assigned to fill in, and that computer timed out too, which implies that the adjustment didn't work.

Result Name              Status              Sent Time            Time Due / Return Time  CPU Time (hours)  Claimed / Granted BOINC Credit
faah5015_1htg_1qbr_00_4  In Progress         08/09/2008 03:56:07  08/15/2008 06:20:23      0.00             0.0 / 0.0
faah5015_1htg_1qbr_00_3  Error               08/07/2008 08:31:31  08/09/2008 03:43:19     35.24             596.2 / 0.0
faah5015_1htg_1qbr_00_2  Pending Validation  08/02/2008 15:04:22  08/06/2008 01:54:39     56.87             655.7 / 0.0
faah5015_1htg_1qbr_00_0  Error               07/30/2008 07:55:47  08/07/2008 07:48:30     50.45             837.0 / 837.0
faah5015_1htg_1qbr_00_1  Error               07/30/2008 07:41:31  08/02/2008 14:49:55      0.70             3.1 / 0.0

[Aug 10, 2008 12:18:14 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Second really long work unit received

Did you follow knreed's instructions earlier in this thread on how to check the timeout fpops and how to increase them (while the client is not running)?

Added: I forced a few jobs to the front, and the current, shorter ones show a factor of 10 (<rsc_fpops_bound> divided by <rsc_fpops_est>):
<workunit>
<name>faah4297_indazoleSO3H_MIN_xmd01130_01</name>
<app_name>faah</app_name>
<version_num>605</version_num>
<rsc_fpops_est>24713681743955.000000</rsc_fpops_est>
<rsc_fpops_bound>247136817439550.000000</rsc_fpops_bound>

<rsc_memory_bound>125000000.000000</rsc_memory_bound>
<rsc_disk_bound>209715200.000000</rsc_disk_bound>
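
For anyone who wants to check that factor on their own tasks, here is a minimal sketch in Python. It assumes you have copied the workunit fragment out of the client's state file by hand; the helper names `tag_value` and `bound_factor` are made up for this example, and a regex is used rather than an XML parser because the fragment as posted is not a closed element.

```python
import re

# Fragment copied from the post above. The <workunit> element is not closed
# in the post, so a regex is used here instead of an XML parser.
FRAGMENT = """
<workunit>
<name>faah4297_indazoleSO3H_MIN_xmd01130_01</name>
<rsc_fpops_est>24713681743955.000000</rsc_fpops_est>
<rsc_fpops_bound>247136817439550.000000</rsc_fpops_bound>
"""


def tag_value(text, tag):
    """Return the number inside <tag>...</tag> as a float (hypothetical helper)."""
    match = re.search(rf"<{tag}>\s*([0-9.eE+-]+)\s*</{tag}>", text)
    if match is None:
        raise ValueError(f"<{tag}> not found")
    return float(match.group(1))


def bound_factor(text):
    """Ratio of the fpops bound (timeout) to the fpops estimate."""
    return tag_value(text, "rsc_fpops_bound") / tag_value(text, "rsc_fpops_est")


print(bound_factor(FRAGMENT))  # prints 10.0 for the values quoted above
```

A factor of 10 means the task is only aborted for exceeding its bound after roughly ten times the estimated floating-point work.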

[Edit 1 times, last edit by Sekerob at Aug 10, 2008 2:17:50 PM]
[Aug 10, 2008 12:30:59 PM]
Dan60
Senior Cruncher
Brazil
Joined: Mar 29, 2006
Post Count: 185
Status: Offline
Re: this is a really long work unit

I've received faah5011_1gnm_1hps_00_4, which has been processing for 29:00:00 and will keep going for another 7:50:50 or so. I might not get credit for it, but that isn't why I'm in FightAIDS@Home, so it will be delivered today (I hope), one day overdue. :)


best regards
[Aug 10, 2008 1:57:56 PM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Status: Offline
Re: Second really long work unit received

Yes, I did, Sekerob. Mine is the one that is pending validation.
[Aug 11, 2008 10:57:30 AM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Status: Offline
Re: Second really long work unit received

It finally got validated. :) 648 credits for it.
[Aug 12, 2008 3:10:32 AM]
E165852
Cruncher
Joined: Jul 17, 2008
Post Count: 4
Status: Offline
Re: this is a really long work unit

I have caught faah5013_6upj_1izi_00. It may run about 16-17 hours. Hmm, better than the last one, which took more than 34 hours and is still awaiting validation. However, for the mini monster I caught today, someone has already completed the computation, so I will eat up the rest.
It seems a considerable number of monsters were released around the end of July, and now the timed-out WUs are being redistributed. How long will it last...?
[Aug 18, 2008 8:24:41 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: this is a really long work unit

Hi E165852,
Yes, there have been a lot of posts by knreed et al. on the batch of overly large work units that was produced by a mistake in the script that sizes work units. The estimate (a week ago) was that all of them would have been sent out by about the 18th, at which point we could start making up any error units. We'll just have to see how long that takes.

Lawrence
[Aug 18, 2008 8:50:27 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: this is a really long work unit

Would it be a good thing if you could find a way to schedule these extra large WU's only to systems that are running fairly quick CPU speeds?


Yes - there are actually a lot of advantages to doing this. We have been working with David Anderson and BOINC to get this capability added. David has done a lot of work on this already and the folks at Superlink@Technion! are the first BOINC project to put the new code into production. We will be updating our servers to utilize the new code later this year.

Once we have the code, the server will assess the 'effective power' of the computer requesting work and try to send it work that won't take it more than a day or so. Effective power is the raw power of the computer * the amount of time that BOINC is allowed to run work on the computer.
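
To make the 'effective power' idea concrete, here is a rough sketch in Python. It is not the BOINC scheduler code: the function names, the use of floating-point operations per second as the unit of raw power, the list of candidate work unit sizes, and the one-day budget default are all assumptions chosen for illustration.

```python
SECONDS_PER_DAY = 86_400


def effective_power(raw_flops, on_fraction):
    """Raw speed (FLOPS, assumed unit) scaled by the fraction of time BOINC may run."""
    if not 0.0 <= on_fraction <= 1.0:
        raise ValueError("on_fraction must be between 0 and 1")
    return raw_flops * on_fraction


def pick_size(sizes_fpops, raw_flops, on_fraction, max_days=1.0):
    """Largest candidate size (in fpops) expected to finish within max_days.

    Returns None when even the smallest candidate would take too long.
    """
    budget = effective_power(raw_flops, on_fraction) * SECONDS_PER_DAY * max_days
    fitting = [size for size in sizes_fpops if size <= budget]
    return max(fitting) if fitting else None


# Hypothetical host: 2 GFLOPS, BOINC allowed to run half of the time.
candidates = [1e13, 5e13, 2e14]
print(pick_size(candidates, raw_flops=2e9, on_fraction=0.5))
```

With these made-up numbers the host can do about 8.64e13 operations in a day, so the 5e13 work unit is the largest one that fits.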

Once we have tested this and feel good about it, we will modify how we create workunits so that there is a lot of variation in size and computers will be able to get work of the appropriate size. This will reduce load on our servers, as we will be able to send bigger workunits to those powerful, always-on computers, and it will improve our ability to effectively use those computers that are less powerful and are only on infrequently (and thus have a hard time completing work currently).

So it is a definite advantage to do this and we are anxious to get this in place.


Knreed,

Ok.. so I've been turning the algorithm around in my head for a while and I'm wondering if the scheduler / dispatcher code will take into consideration the queue depth a particular client already has prior to the dispatch of a WU to that client.

As I understand things, one of the values used to help determine how reliable a client is perceived to be is how long (total wall clock time) it takes for a WU to be returned from the time of dispatch.

So (in my simple mind) even if a client exists that has a very high clock speed; having a queue depth of WU's pending to execute should have some kind of effect on how that client is perceived as being reliable.

Why? Well, a WU that is dispatched to a client that has a queue depth of say 10 days will likely be returned sometime around the 10th day. Of course this is perfectly fine because the WU completes and is returned within the maximum time period. Additionally, the credits are the same as if the WU was returned within say the total CPU time + 12 minutes.

But, the client that returns the WU within the total CPU + 12 minutes is more reliable simply because the WU is indeed returned more expeditiously.
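
Just to put numbers on that comparison, here is a small sketch in Python. It assumes the queue is worked first-in, first-out with no idle time, so turnaround is simply queue depth plus CPU time; the function names and the 8-hour CPU time and 12-day deadline are made up for the example.

```python
from datetime import timedelta


def estimated_turnaround(queue_depth, cpu_time):
    """Wall clock time from dispatch to return, assuming a FIFO queue and no idle time."""
    return queue_depth + cpu_time


def returns_in_time(queue_depth, cpu_time, deadline):
    """True when the estimated turnaround still meets the deadline."""
    return estimated_turnaround(queue_depth, cpu_time) <= deadline


cpu_time = timedelta(hours=8)     # hypothetical WU run time
deadline = timedelta(days=12)     # hypothetical maximum time period

deep = estimated_turnaround(timedelta(days=10), cpu_time)
shallow = estimated_turnaround(timedelta(minutes=12), cpu_time)

print(deep, returns_in_time(timedelta(days=10), cpu_time, deadline))
print(shallow, returns_in_time(timedelta(minutes=12), cpu_time, deadline))
```

Both clients meet the deadline and would get the same credit, but the turnaround differs by almost ten days, which is the gap I am asking about.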

Wadaya say?

N' thanks in advance for your consideration.
[Aug 21, 2008 12:56:11 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: this is a really long work unit

Barney, the logic used to determine a "reliable" host is independent of the logic that will be used to determine the best size of work unit to send.

For rush jobs sent to "reliable" hosts, the turn-around time is important. For normal tasks, this is not a consideration.
[Aug 21, 2008 1:27:43 AM]