Thread Status: Active
Total posts in this thread: 140
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Finally I have reactivated my monster WU for a few more hours to collect some more measurements.

Just before killing it I thought of computing its progress rate between two consecutive measurements, i.e. the %/hour over each interval, and it appears that this rate has been more or less stable at around 0.43 % per hour of runtime since the second interval, between 6 and 8.56 %.

This is good news regarding its chances of ever reaching completion. However, at this speed the total runtime should be about 220 hours, i.e. 9.2 days, which is still far too much and above Kevin's new 7-day deadline anyway.

The ever increasing total runtime that I computed previously was the consequence of its "fast" start at 2.4 %/hour between 0 and 6 %.
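
For anyone who wants to redo this arithmetic on their own WU, here is a minimal sketch of the two-phase projection described above. The function name and the split into segments are illustrative only, and it assumes the 0.43 %/hour rate holds all the way to 100 %, which nobody can guarantee.

```python
def projected_total_hours(segments):
    """Sum the hours needed for each (percent_span, percent_per_hour) segment."""
    return sum(span / rate for span, rate in segments)


# Figures taken from the post: a "fast" start, then a steady slower rate.
segments = [
    (6.0, 2.4),    # 0 % to 6 % at about 2.4 %/hour
    (94.0, 0.43),  # remaining 94 %, ASSUMING the 0.43 %/hour rate holds
]

total = projected_total_hours(segments)
print(f"{total:.0f} hours, about {total / 24:.1f} days")
```

With these inputs the projection lands close to the 220 hours and 9.2 days quoted above.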
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Apr 25, 2009 10:06:53 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Did your deadline change affect all beta WUs or just the ones with similar names as in the thread heading? My Beta WU originally had a three day deadline. Now it has a 7 hr deadline. What caused the deadline change with this WU?

Should I abort it? Based on its current time spent (17:31h and 27.53% done), it will take 63.6 total hours to complete. It's running on a P4 HT against a Rice WU.

Darn! It seems that Kevin has had a finger jam and that he has changed the deadline for "in progress" CMD2 beta WUs to 7 hours instead of 7 days. I am seeing it too on mine. crying

Leave him some time to correct his mistake and let your WU run since it seems that it should be within both limits (normal deadline and max flops).

And thank you for reporting this mistake so fast. Jean.
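
The 63.6-hour figure in the question is a plain linear extrapolation from elapsed time and percent done. A small sketch of that calculation follows; the function name is made up for illustration, and it assumes progress stays linear, which the monster WUs in this thread clearly do not always respect.

```python
def estimate_total_hours(elapsed_hours, fraction_done):
    """Linear estimate of total runtime from progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


elapsed = 17 + 31 / 60  # 17:31 of runtime, in hours
total = estimate_total_hours(elapsed, 0.2753)
print(f"{total:.1f} total hours, {total - elapsed:.1f} still to go")
```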

*edited for inappropriate language - CIH
----------------------------------------
[Edit 1 times, last edit by Former Member at Apr 27, 2009 3:43:05 PM]
[Apr 25, 2009 10:19:26 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

It looks as if the projected time for this unit is pretty constant now at 127hrs!

Yep! So it will exceed the current maxflops limit well before the 67 hours that Kevin mentioned (on your racing beast it will be about 50 hours or less).

Therefore, once you think you have nothing more to learn from it, you had better kill it.

Sorry about that, sprigo. Jean.
[Apr 25, 2009 10:34:30 PM]
sprigo
Cruncher
England
Joined: Apr 30, 2007
Post Count: 37
shock Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Killed after nearly 17 hours. crying
----------------------------------------

[Apr 25, 2009 10:44:28 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
sad Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

BETA_CMD2_0001-PP1BA.clustersOccur-PTN11.clustersOccur_5_1--
is my last in progress and is proceeding normally at 85% complete with another 1:20 or so to go. It was due back by 4/27 but now shows due back 05:52 (7 hours after it was sent) so it is now well past the deadline even though nearly finished. Fortunately, the log doesn't show that it has been reassigned yet due to no replies.
It is still crunching and I hope it won't be reassigned. I did have one pending a while ago (also due back 4/27) and changed to 4/25 and the third wing man showed too late after the minimum quorum of 2 was met so # 3 lost out sad
[Apr 25, 2009 11:35:10 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Luckily I didn't get any of these supersized WUs. My lot ranged from 1.2 - 12.5 hrs.
For those of you with long-running WUs that still have short deadlines, and which you want to complete, you may exceed your BOINC's CPU time limit.
This problem occurred in Aug 2008 with some errant batches of FAAH WUs, and knreed posted a possible workaround.
Sorry, Kevin, if you deliberately did not mention this in your posts here, but it is still right there at Re: Second really long work unit received
Posting time was [Aug 4, 2008 5:03:04 PM] UTC, or use your browser to search the page for rsc_fpops_bound
- HTH -
<<=== shiny new badge may go there in a day or 2 cool (PS: Hope the new forum software allows bookmarking individual posts, not just pages.)
[Apr 26, 2009 12:19:48 AM]
GIBA
Ace Cruncher
Joined: Apr 25, 2005
Post Count: 5374
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Bad news. After almost 33 hours, and around 39% of work done, the WU just ended with a Computation Error.

It is a Beta risk when we accept Betas... no problem. Let's get more. I am ready. coffee
----------------------------------------
Cheers ! GIB@ peace coffee
Join BRASIL - BRAZIL@GRID team and be very happy !
http://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=DF99KT5DN1

[Apr 26, 2009 12:25:48 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Fixed my screw up - sorry for the problems with that. Deadlines are now at 7 days.
[Apr 26, 2009 1:46:52 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

@Rickjb
The trick should still work, but it is not really simple and there is a real risk of creating more mess than it solves. Personally I would not recommend it and I will not use it. Anyway, at its latest speed, my monster would need more than 9 days of runtime and I would also have a deadline problem, even with the correct 7-day deadline. I have stopped it again after a few more measurements and I will kill it for good when I am fed up with seeing it in my task list or when Kevin says he no longer needs it.

Regarding linking precisely to a particular post, it is not the link which is broken or not allowed (look at it); it is a bug in the branch operation when the post is not on the first page. So that should soon be OK when the forum software is upgraded.

Cheers. Jean.
----------------------------------------
[Edit 1 times, last edit by JmBoullier at Apr 26, 2009 2:13:13 AM]
[Apr 26, 2009 2:11:21 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

OK - I have tried to fix some things related to this issue:

I have awarded credit to all results that errored out with -177 (0xffffff4f) ERR_RSC_LIMIT_EXCEEDED. As more results come in I will update them as well. You can tell if you have been awarded credit since you will see a value in the granted credit column. I will award it about once a day.

If you are currently running one of these massive workunits and want to help us out a bit more please do the following:

1) Note the name of the workunit that is running long

2) Shut down BOINC

3) Open your client_state.xml file. It will be located at either C:\Documents and Settings\All Users\Application Data\BOINC or C:\Program Files\BOINC (on Windows).

4) Look for something like the following. You will be looking for the workunit name that you noted above:


<workunit>
<name>BETA_CMD2_0001-1I7X_C.clustersOccur-1I7X_C.clustersOccur_0</name>
<app_name>beta8</app_name>
<version_num>610</version_num>
<rsc_fpops_est>40000000000000</rsc_fpops_est>
<rsc_fpops_bound>400000000000000</rsc_fpops_bound>
.....

5) Change this so that the field
<rsc_fpops_bound>400000000000000</rsc_fpops_bound>
becomes
<rsc_fpops_bound>4000000000000000</rsc_fpops_bound>
(note that I added one extra 0).

This increases the allowed runtime by a factor of 10. Feel free to go even longer if you wish.

Be careful to only change this one setting.

6) Save this file.

7) Delete the file client_state_prev.xml

8) Start BOINC again.
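
For those who would rather not hand-edit the file, steps 4 and 5 above can be sketched as a small text substitution. This is only an illustration under assumptions: the function name is invented, it works on the file contents as a string (reading and writing client_state.xml, with BOINC shut down and a backup copy made first, is left to you), and it assumes each entry is closed by a matching </workunit> tag. It uses a regular expression rather than an XML library so that everything else in the file is left exactly as it was.

```python
import re


def raise_fpops_bound(state_text, workunit_name, factor=10):
    """Return state_text with rsc_fpops_bound multiplied for one workunit only."""
    block_re = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)
    bound_re = re.compile(r"(<rsc_fpops_bound>)([^<]+)(</rsc_fpops_bound>)")
    name_tag = f"<name>{workunit_name}</name>"
    changed = 0

    def scale(m):
        # Keep the tags, replace only the number between them.
        return f"{m.group(1)}{float(m.group(2)) * factor:.6f}{m.group(3)}"

    def fix_block(match):
        nonlocal changed
        block = match.group(0)
        if name_tag not in block:
            return block  # some other workunit: leave untouched
        new_block, n = bound_re.subn(scale, block, count=1)
        changed += n
        return new_block

    new_text = block_re.sub(fix_block, state_text)
    if changed != 1:
        raise ValueError(f"expected exactly one match for {workunit_name!r}")
    return new_text


# Tiny stand-in for the real file, using the workunit name from the example above.
sample = """<workunit>
    <name>BETA_CMD2_0001-1I7X_C.clustersOccur-1I7X_C.clustersOccur_0</name>
    <rsc_fpops_est>40000000000000</rsc_fpops_est>
    <rsc_fpops_bound>400000000000000</rsc_fpops_bound>
</workunit>
<workunit>
    <name>some_other_wu</name>
    <rsc_fpops_bound>400000000000000</rsc_fpops_bound>
</workunit>
"""

patched = raise_fpops_bound(
    sample, "BETA_CMD2_0001-1I7X_C.clustersOccur-1I7X_C.clustersOccur_0"
)
print(patched)
```

The function refuses to return anything if it does not find exactly one matching workunit, which follows the "only change this one setting" warning in the steps.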


This change will give your computer upwards of 670 hours to complete the workunit (on the 'average' computer). Obviously the newer processors will have a lower limit since this number is based on the fpops benchmark.
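
To see where these hour figures come from, here is a rough sketch. It assumes the client's limit is simply rsc_fpops_bound divided by the machine's benchmarked floating-point speed; the "average" benchmark value below is not an official number, it is back-derived from the 67-hour limit mentioned earlier in this thread.

```python
def runtime_limit_hours(rsc_fpops_bound, benchmark_flops):
    """Hours a task may run before hitting the fpops bound (assumed formula)."""
    return rsc_fpops_bound / benchmark_flops / 3600


# ASSUMPTION: "average" machine speed implied by 4e14 fpops <-> 67 hours.
AVERAGE_FLOPS = 4e14 / (67 * 3600)  # roughly 1.66e9 flops

for bound in (4e14, 4e15):
    hours = runtime_limit_hours(bound, AVERAGE_FLOPS)
    print(f"bound {bound:.0e}: about {hours:.0f} hours")
```

A machine that benchmarks faster than this reaches the bound in fewer hours, which is why newer processors get a lower limit.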

Newer copies being issued already have this applied.

The deadline is now set to 7 days for the results.

For those that were issued earlier today while the deadline was set to 7 hours and have since been marked as late, I have changed them back so that they are listed as still in progress.

We accept results for 7 days past the deadline and grant credit. So even if your result is going past the 7 days but still within the 14 days you can let it complete.

However, I will go ahead and start granting credit for user aborted results as well - so feel free to abort if you wish and I will manually grant you credit.

I think this covers all cases. Let me know if there is something missing.
[Apr 26, 2009 2:28:09 AM]