Thread Status: Active
Total posts in this thread: 44
This topic has been viewed 29722 times and has 43 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Unit Information

HFCC_ 00005798_ TrkB_ 0000_ 0-- Valid 3/14/09 15:44:42 3/14/09 18:11:25 2.19 48.1 / 51.0
HFCC_ 00006774_ TrkB_ 0000_ 0-- Valid 3/14/09 19:04:40 3/15/09 00:45:10 2.19 48.1 / 50.2
HFCC_ 00006774_ TrkB_ 0001_ 0-- Valid 3/14/09 19:04:41 3/16/09 01:29:08 2.19 48.0 / 50.6


Seems like the CPU performance tester in the WU isn't all that accurate or consistent.

No complaints; just observations.

Okay... let me spend some time on this: you only see 2.19 hours, which is a rounded figure. The run could really have been 2.1850 or 2.1949 hours.

The little benchmark has a double function: it plays a part in validating the result, and it sets a comparison against the server's base reference. If your machine carries even a slightly different load, the benchmark will vary, and it does. The claim is a claim per hour of run time, and it only changes when the general client benchmark is refreshed. Absent a software update, that refresh happens every 120 hours.

And given that there is kernel time and some PF delta, both of which also vary with the non-deterministic content of the computations, this is in fact an extremely tight sample. The flops vary slightly from run to run, which is not considered when the claim is computed.
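
As a rough check on the rounding argument, here is a minimal Python sketch. It assumes the results page rounds run time to two decimals and that the claim scales linearly with run time; the helper name `displayed_interval` is made up for illustration and is not part of BOINC or the WCG site.

```python
def displayed_interval(shown, decimals=2):
    """Return the (low, high) range of true values that round to `shown`."""
    half_step = 0.5 * 10 ** (-decimals)
    return shown - half_step, shown + half_step


shown_hours = 2.19   # run time as displayed in the results list
claimed = 48.1       # claimed credit displayed for that result

low, high = displayed_interval(shown_hours)

# Assumption: the claim is a fixed per-hour rate times the run time.
rate = claimed / shown_hours

# How far the claim could move purely because of run-time rounding.
claim_spread = (high - low) * rate

print(f"true run time between {low:.4f} and {high:.4f} hours")
print(f"claim could differ by up to about {claim_spread:.2f} credits")
```

Under these assumptions, the 0.1 credit gap between the 48.0 and 48.1 claims in the quoted results sits inside what run-time rounding alone could explain.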


Sekerob,

thanks for the explanation. I appreciate it.
[Mar 18, 2009 3:35:34 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Work Unit Information

David,

Which work units are these that you don't think will make the deadline? What are the deadlines you see and how much CPU time have they used up? Don't abort, because if they're a little bit late you'll still get credit. At the moment you can turn in work that is 5 days overdue and still receive credit. (Note: "at the moment" means that this value does change over time without notice.) Once your work unit is past the deadline, though, another copy will be sent out to another host.

-Uplinger
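
The late-return rule described in the quote can be sketched in a few lines of Python. This is only an illustration: the 5 day grace period is the "at the moment" value from the post and may change without notice, the function name `late_status` is hypothetical, and the timestamps are made up.

```python
from datetime import datetime, timedelta

# "At the moment" value from the post; it can change without notice.
GRACE = timedelta(days=5)


def late_status(deadline, returned, grace=GRACE):
    """Classify a returned result relative to its deadline."""
    if returned <= deadline:
        return "on time"
    if returned <= deadline + grace:
        # Still credited, but a replacement copy has already gone
        # out to another host once the deadline passed.
        return "late, still credited"
    return "past grace period"


# Illustrative deadline, not taken from any real work unit.
deadline = datetime(2009, 3, 18, 6, 0)

print(late_status(deadline, datetime(2009, 3, 18, 5, 0)))
print(late_status(deadline, datetime(2009, 3, 19, 12, 0)))
print(late_status(deadline, datetime(2009, 3, 24, 12, 0)))
```

Credit and duplication are separate questions here: a late result may still be credited even though the server has already issued an extra copy.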

Monday I aborted 41 HFCC tasks sitting in the queue that, together with another sixty or so, came in at 0:46 hours TTC, when the device's actual mean is exactly 3 hours. Just now I canceled 1 more of that lot which I knew would start after the deadline, and 2 are still running that I know will cause extra copies to be sent. My view is that it saved 42 results being crunched in duplication of effort. That is to me the overarching... 126 hours gone toward non-duplicated work, when duplication is not required. Micromanaged it may be, but still, why not do so when it's obvious?
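
For anyone wanting to check the numbers above, a small Python sketch follows. It uses only the figures from the post (0:46 estimated, 3 hours actual mean, 41 + 1 aborted tasks); the variable names are illustrative.

```python
from datetime import timedelta

estimated = timedelta(minutes=46)   # TTC the tasks arrived with
actual_mean = timedelta(hours=3)    # observed mean run time on this device

# How badly the initial estimate undershot the real run time.
underestimate_factor = actual_mean / estimated

# 41 tasks aborted on Monday plus 1 canceled later.
aborted_tasks = 41 + 1

# CPU hours that would otherwise have gone into duplicate results.
hours_saved = aborted_tasks * actual_mean.total_seconds() / 3600

print(f"estimate was low by a factor of about {underestimate_factor:.1f}")
print(f"duplicate crunching avoided: {hours_saved:.0f} hours")
```

An estimate that is low by roughly a factor of four helps explain how a queue that looked fine on arrival could turn out to hold more work than fits before the deadline.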
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Mar 18, 2009 11:21:13 AM]
[Mar 18, 2009 11:20:07 AM]
oliverstirling@hotmail.com
Cruncher
Joined: Aug 23, 2008
Post Count: 2
Status: Offline
Re: Work Unit Information

I know HFCC WUs were being lengthened from 3 to 6 hours, but I've just received a couple that are going to take 36 and 26 hours respectively. Are all the new WUs for this project going to be in this region?
[Mar 18, 2009 11:49:39 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Work Unit Information

No, BOINC is a mess at time estimation when the flops and run time change. My quad ran a dead mean of 3 hours on the first batches with a 4 day deadline (did about 120 till now). The 6 hour target mean has in fact quickly been increased to 7 hours, with a deadline of 10 days... once they come out of the feeder.

That said, the 7 hours is a mean. When the mean was 3 hours, my quad had a few tough ones and yesterday took 10:42 hours to finish one, so there is some substantial variability at times.

Anyway, just let them run and BOINC will learn, if you stick to just HFCC ;-)

Added: Oliver, as a suggestion, can you change your member name to exclude the email address part (My Grid > My Profile)? You can store the address in My Forum Profile so all members can find it under the envelope icon in your forum signature area. That is for your own protection against spammers, phishers and what not.

cheers

Edit 2x: spelling and note added.
[Edit 2 times, last edit by Sekerob at Mar 18, 2009 12:02:20 PM]
[Mar 18, 2009 11:57:26 AM]
oliverstirling
Advanced Cruncher
United Kingdom
Joined: May 7, 2007
Post Count: 107
Status: Offline
Re: Work Unit Information

Cheers for that.

Have changed my profile name as well, didn't really think about it when I switched over from grid.org (my bad!)
[Mar 18, 2009 12:10:25 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Work Unit Information

David,

Which work units are these that you don't think will make the deadline? What are the deadlines you see and how much CPU time have they used up? Don't abort, because if they're a little bit late you'll still get credit. At the moment you can turn in work that is 5 days overdue and still receive credit. (Note: "at the moment" means that this value does change over time without notice.) Once your work unit is past the deadline, though, another copy will be sent out to another host.

-Uplinger

Monday I aborted 41 HFCC tasks sitting in the queue that, together with another sixty or so, came in at 0:46 hours TTC, when the device's actual mean is exactly 3 hours. Just now I canceled 1 more of that lot which I knew would start after the deadline, and 2 are still running that I know will cause extra copies to be sent. My view is that it saved 42 results being crunched in duplication of effort. That is to me the overarching... 126 hours gone toward non-duplicated work, when duplication is not required. Micromanaged it may be, but still, why not do so when it's obvious?

If only these redundancy cancellations could be automated. Until then, it helps when someone reading this agrees and manually(?) aborts a redundant copy without it being resubmitted.

HFCC_ 00006061_ TrkB_ 0003_ 1-- Aborted 3/18/09 12:26:08 3/18/09 12:55:50 0.00 0.0 / 0.0
HFCC_ 00006061_ TrkB_ 0003_ 0-- Valid 3/14/09 12:26:35 3/18/09 12:42:04 2.99 46.1 / 42.7

if the other volunteer would do the same for below it saves another few hours:

HFCC_ 00006061_ TrkB_ 0004_ 1-- In Progress 3/18/09 12:26:08 3/19/09 20:06:56 0.00 0.0 / 0.0
HFCC_ 00006061_ TrkB_ 0004_ 0-- Valid 3/14/09 12:26:35 3/18/09 13:08:31 2.99 46.2 / 43.9

cheers
[Mar 18, 2009 1:26:37 PM]
David_L6
Senior Cruncher
USA
Joined: Aug 24, 2006
Post Count: 296
Status: Offline
Re: Work Unit Information

David,

Which work units are these that you don't think will make the deadline? What are the deadlines you see and how much CPU time have they used up? Don't abort, because if they're a little bit late you'll still get credit. At the moment you can turn in work that is 5 days overdue and still receive credit. (Note: "at the moment" means that this value does change over time without notice.) Once your work unit is past the deadline, though, another copy will be sent out to another host.

-Uplinger


They are all the first HFCC work units that were sent out - 0000*.
The earliest deadline is 3/18/2009 6:05:34 AM. The latest deadline is 3/18/2009 7:27:59 AM. I have quite a few of those work units.

I should just let them run and not worry about the deadline, correct?

Edit: Most of them haven't started yet. 8 are currently running but they will return in time as well as several more that are waiting.



I set an alarm and got up at 1:00 AM (my time) to check on these. I watched the results until 2:30 AM, then went back to bed. They were running and validating just fine. When I checked the results I didn't see that they were being sent out to another machine even though they were late. Sometime this morning all work units that were past deadline and not already running were "aborted by project". Those work units have been sent out to other machines.

I still don't know what would have been the best thing to do. I don't want any work to be wasted through unnecessary duplication. Were the work units that I returned late sent out to other computers, like the ones that were aborted by project? As I said above, that doesn't appear to be the case, looking at the results of my work units that were returned past deadline. Would the past-deadline work units have been aborted if it had been on a weekend instead of Monday morning?

Sorry for all the questions but I just want to know what to do should this happen again.
[Mar 18, 2009 3:26:33 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Work Unit Information

The "aborted by project" status has the same effect as manually canceling tasks before they expire by deadline. The difference is that only 2 were overdue from my quad, and 1 was aborted, probably automatically, on the wingman's machine, since my result came in just after.

When relying on automated cancellation it's not certain when the client communicates with server, because only then will the abort message be fetched. For 24/7 connected machines it could not have been long though i.e. the loss of duplicate crunch time would have been limited.
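
To put a number on that, here is a small Python sketch of how much duplicate crunching an automated abort might leave behind. It is a simplified model of my own, not BOINC's actual scheduler logic: it assumes the client learns of the abort only at its next server contact, and the function name `wasted_hours` and all times (hours on a common clock) are hypothetical.

```python
def wasted_hours(abort_issued, next_contact, task_start, task_runtime):
    """Hours of duplicate crunching done before the client hears of the abort.

    All arguments are hours measured on the same clock.
    """
    task_end = task_start + task_runtime
    # Duplicate work only counts while the task is actually running
    # and after the server has decided the copy is redundant.
    start = max(abort_issued, task_start)
    # It stops when the client next talks to the server, or when the
    # task finishes anyway, whichever comes first.
    end = min(next_contact, task_end)
    return max(0.0, end - start)


# Always-connected machine: contacts the server 1 hour after the abort.
print(wasted_hours(abort_issued=0, next_contact=1, task_start=0, task_runtime=3))

# Rarely connected machine: the whole 3 hour task runs before contact.
print(wasted_hours(abort_issued=0, next_contact=10, task_start=2, task_runtime=3))

# Task had not started before the client picked up the abort.
print(wasted_hours(abort_issued=0, next_contact=1, task_start=2, task_runtime=3))
```

In this model the loss is capped by the shorter of the contact interval and the task's run time, which is why a 24/7 connected machine loses little.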

The weekend question I don't know, but venture it's an automated process.
[Edit 1 times, last edit by Sekerob at Mar 18, 2009 3:37:38 PM]
[Mar 18, 2009 3:35:26 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Work Unit Information

To add to the discussion: I think I read that the latest clients in test have a feature to cancel tasks that have not yet started once they are past deadline.
[Mar 18, 2009 10:24:42 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Unit Information

in 79 work units i have come across 1 error

HFCC_ 00013718_ TrkB_ 0000_ 4--  	jay9850  	  Error   	3/17/09 23:02:36  	3/18/09 02:19:44  	0.00  	0.0 / 0.0


Workunit Status  	

Project Name: Help Fight Childhood Cancer
Created: 3/15/09
Name: HFCC_00013718_TrkB_0000
Minimum Quorum: 1
Replication: 1


Result Name                   Status   Sent Time           Time Due / Return Time   CPU Time (hours)   Claimed / Granted BOINC Credit
HFCC_00013718_TrkB_0000_4--   Error    3/17/09 23:02:36    3/18/09 02:19:44         0.00               0.0 / 0.0
HFCC_00013718_TrkB_0000_3--   Error    3/17/09 19:19:50    3/17/09 22:59:46         0.00               0.0 / 0.0
HFCC_00013718_TrkB_0000_2--   Error    3/17/09 14:25:19    3/17/09 19:09:41         0.00               0.0 / 0.0
HFCC_00013718_TrkB_0000_1--   Error    3/17/09 08:04:34    3/17/09 14:24:31         0.00               0.0 / 0.0
HFCC_00013718_TrkB_0000_0--   Error    3/15/09 13:40:53    3/17/09 08:02:53         0.00               0.0 / 0.0


Result Log  	

<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
A network adapter hardware error occurred. (0x39) - exit code 57 (0x39)
</message>
<stderr_txt>
Failed to get VersionInfo size: 1812
INFO:[22:18:35] Start AutoGrid...
ERROR: Unknown ligand atom type Si
add parameters for it to the parameter library first!
autogrid failed. rc = 57. Exiting
called boinc_finish

</stderr_txt>
]]>
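
A quick way to pull the real cause out of a log like this is sketched below in Python. It only uses the lines shown above. The reading that the "network adapter hardware error" text is simply the operating system's generic description for code 57, and not the actual fault, is my assumption; the AutoGrid `ERROR:` line looks like the real cause.

```python
import re

# stderr section copied from the result log above.
stderr_txt = """\
Failed to get VersionInfo size: 1812
INFO:[22:18:35] Start AutoGrid...
ERROR: Unknown ligand atom type Si
add parameters for it to the parameter library first!
autogrid failed. rc = 57. Exiting
called boinc_finish
"""

# Return code reported by the application itself.
match = re.search(r"rc = (\d+)", stderr_txt)
return_code = int(match.group(1)) if match else None

# Lines the application flagged as errors.
error_lines = [line for line in stderr_txt.splitlines()
               if line.startswith("ERROR:")]

print(f"return code: {return_code} (hex {return_code:#x})")
for line in error_lines:
    print(line)
```

The return code 57 is the same value as the 0x39 in the message line, so both refer to the same exit status. Since the cause is an atom type in the input data, every host should fail the same way, which fits all five copies erroring out.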


not sure if anybody else has had errors with any other WUs, seems like this is just a bad WU.
[Mar 18, 2009 11:24:09 PM]