Thread Status: Active
Total posts in this thread: 140
Posts: 140   Pages: 14
This topic has been viewed 219133 times and has 139 replies
Dieter Matuschek
Advanced Cruncher
Germany
Joined: Aug 13, 2005
Post Count: 142
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

@jeanne95132

I have a monster that has been running for 85 hours of elapsed time so far, at 61% progress, and I am letting it run on.

The previous replications did indeed error out due to the CPU time limit.
But I've checked the maximum CPU time allowed for my WU as described above.
It's more than enough.


Workunit Status

Project Name: Beta
Created: 24.04.09
Name: BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_8
Minimum Quorum: 2
Replication: 3

Result Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_8_5 | In Progress | 28.04.09 22:37:28 | 03.05.09 21:25:28 | 0.00 | 0.0 / 0.0
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_8_4 | In Progress | 27.04.09 18:52:07 | 02.05.09 17:40:07 | 0.00 | 0.0 / 0.0
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_8_3 | In Progress | 27.04.09 07:44:06 | 02.05.09 06:32:06 | 0.00 | 0.0 / 0.0
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_8_1 | Error | 25.04.09 05:32:39 | 28.04.09 22:37:08 | 81.56 | 641.1 / 641.1
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_8_2 | Error | 25.04.09 05:32:23 | 27.04.09 07:31:09 | 36.49 | 744.8 / 744.8
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_8_0 | Error | 25.04.09 05:32:19 | 27.04.09 18:51:59 | 48.95 | 724.7 / 724.7
----------------------------------------

Ask not what the world can do for you - ask what you can do for the world.
----------------------------------------
[Edit 2 times, last edit by Dieter Matuschek at May 1, 2009 9:38:55 AM]
[May 1, 2009 9:35:31 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Jeanne,
I think that if the CPU time limit had not already been increased by a factor of 10 you would have hit the max CPU time limit by now, or at least you would probably have hit it since you posted. So let's assume that this point is OK if your WU is still running.

If it does not bother you to let it run, I would ask you not to abort it, because that will help the techs and the scientists size the future big WUs when the project really starts.

To be fair I must warn you that the TTC shown by Boinc is probably much below what it will actually be. A linear extrapolation would already give 18 hours to go after your measurement at 75.337 %, and a computation based on the progress speed between 74.920 % and 75.337 % gives 63 hours to go! I hope I am wrong, and that is possible since these two measurements were taken over a short interval; however, the progress speed they give (0.389 % per hour) is close to what I have for my own monster WU.
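
For anyone who wants to redo this kind of estimate, here is a minimal Python sketch of the two methods. The function names are only illustrative and have nothing to do with Boinc itself. The recent-speed case uses the figures quoted above; for the linear case I have assumed the reading of 50:39:00 CPU time at 74.920 % that appears in this thread, so its result is only approximately comparable with the 18 hours mentioned.

```python
def hours_to_go_linear(elapsed_hours, progress_pct):
    """Time to go, assuming the average speed so far holds until the end."""
    return elapsed_hours * (100.0 - progress_pct) / progress_pct


def hours_to_go_from_rate(progress_pct, pct_per_hour):
    """Time to go, assuming the most recent progress speed holds until the end."""
    return (100.0 - progress_pct) / pct_per_hour


# Linear estimate from the assumed reading: 50:39:00 (50.65 hours) at 74.920 %.
linear = hours_to_go_linear(50.65, 74.920)

# Recent-speed estimate at 75.337 % with 0.389 % per hour.
recent = hours_to_go_from_rate(75.337, 0.389)

print(f"linear: {linear:.1f} h to go, recent speed: {recent:.1f} h to go")
```

The gap between the two results is the whole point: when a WU slows down as it goes, the average speed since the start flatters the forecast, and the recent speed is the safer guide.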

Thank you for your help, and good luck. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[May 1, 2009 4:15:54 PM]
petehardy
Senior Cruncher
USA
Joined: May 4, 2007
Post Count: 318
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Hi Everyone,

I wanted to update you all on the 2 monsters I managed to grab.

First:-
BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_25_4

This one is running on a Phenom 9600, XP Pro SP3, Boinc 6.2.28.
It is 50.6% complete after 73hrs 40min; based on linear calculations, it is speeding up slightly.
This came with an FPOP setting of 8000 Zillion.

Second:-
BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_19_2

Running on a dual socket Athlon MP 2000+, W2K3 R2, Boinc 5.10.45.
It is 61.46% complete after 144hrs 50min; this one is also speeding up a bit.
I got this one first and changed the FPOP setting to 9000 Zillion.

You can see that the old 32 bit machine is really good from a badge hunting point of view, but not so good from the credit angle.

Both jobs will probably become "No Reply", which looks like good news for alert badge hunters:

If more copies are sent out and they get started before mine are returned, we can all get some good badge hours.

I'm hoping to get 15-16 DAYS worth.
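
A quick sketch of where a figure like that can come from, assuming plain linear extrapolation from the two progress readings above (the function name is just for illustration). Since both jobs are speeding up a bit, the real totals may well come in somewhat lower.

```python
def projected_total_hours(elapsed_hours, progress_pct):
    """Total runtime if the average speed so far holds until 100 %."""
    return elapsed_hours * 100.0 / progress_pct


# First job: 50.6 % after 73 h 40 min.
first = projected_total_hours(73 + 40 / 60, 50.6)

# Second job: 61.46 % after 144 h 50 min.
second = projected_total_hours(144 + 50 / 60, 61.46)

total_days = (first + second) / 24
print(f"{first:.0f} h + {second:.0f} h = {total_days:.1f} days")
```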


Pete
----------------------------------------

"Patience is a virtue", I can't wait to learn it!
----------------------------------------
[Edit 3 times, last edit by petehardy at May 2, 2009 4:03:08 PM]
[May 1, 2009 7:03:59 PM]
Mathilde2006
Senior Cruncher
Germany
Joined: Sep 30, 2006
Post Count: 269
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Hello,

Today
BETA_CMD2_0001-1RKC_A.clustersOccur-1YDI_A.clustersOccur_5_6

arrived.
After 1 hour 1% of the work was done, after 2 hours 2%, and after 3 hours ~2.5%, with 8h20m currently shown to go (up from 6h shown after download).

Well, I'm very curious how long I'll crunch (Intel Quad 9400, 2.66 GHz, with Vista). It looks like 100-120 hours, with a return time of 5 days.
----------------------------------------

[May 1, 2009 8:02:29 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Hi Jean,
I hope that the boundary actually was globally reset, because I finally found the client.xml file and nowhere could I find anything about fpop-boundary under the only beta WU I found (I exported it to Word, so if Word can't find it, it doesn't exist). I'm curious as to whether the missing fpop line has anything to do with the 64-bit CPU. Does it have a different instruction set? But I guess it doesn't really matter if the end result is the same.

OK, I will let it run to the bitter end so the scientists and techies can figure out WU sizes that won't make us anxious and pro-abortion. You would think there would be a set of standardized BOINC parameters so these things don't happen.

Was the random data I posted useful, or should I give it up, since I definitely will not be spending 24/7 in front of my computer for the next week?

In any case, we are now at
CPU        PROGRESS   TTC        MYTIME
50:39:00   74.920     13:29:05   23:39, Apr 30
61:15:31   78.458     13:24:20   15:02, May 1
Was 67 CPU hours the previous boundary?

Followed someone else's lead and suspended all the other WUs to give it more space.

Thanks for your help.
Jeanne
[May 1, 2009 10:42:29 PM]
TimAndHedy
Senior Cruncher
Joined: Jan 27, 2009
Post Count: 267
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

My long-running unit was the _3.
Does the No Reply from _2 mean there is no one left?

Workunit Status

Project Name: Beta
Created: 4/24/09
Name: BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_11
Minimum Quorum: 2
Replication: 3


Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_11_5 | 610 | Aborted | 4/27/09 04:49:17 | 4/28/09 03:46:27 | 15.42 | 264.1 / 264.1
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_11_4 | - | No Reply | 4/27/09 02:08:12 | 5/2/09 00:56:12 | 0.00 | 0.0 / 0.0
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_11_3 | 610 | Too Late | 4/26/09 13:23:01 | 4/30/09 22:38:10 | 101.61 | 2,820.9 / 0.0
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_11_2 | 610 | Error | 4/25/09 05:26:57 | 4/27/09 04:45:46 | 38.47 | 709.5 / 709.5
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_11_1 | 610 | Aborted | 4/25/09 05:26:32 | 4/26/09 13:05:46 | 7.90 | 114.4 / 114.4
BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_11_0 | 610 | Error | 4/25/09 05:26:28 | 4/27/09 01:18:26 | 42.32 | 733.7 / 733.7
[May 2, 2009 2:19:00 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Hi Jeanne!
I don't know whether you have not found the fpops limit because Word is hiding something (I don't use Word) or because you have not looked for the right thing. If you want to try again you can simply use Notepad and search for <workunit>. Right after it you will probably find "your" WU; otherwise look for the next <workunit> line, and so on. Once you have found the right <workunit> group, the parameter that we are interested in is <rsc_fpops_bound>. To be safe it should have 15 zeroes before the decimal point.
But again, this is just for fun, because with the original limit your WU would already have exceeded it. The 67 hours corresponding to this limit is for machines slower than yours.
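
If you would rather not search by hand, the same lookup can be sketched in a few lines of Python. This is only an illustration under assumptions: the XML below is a made-up sample (the WU name and the values are hypothetical), and I am assuming the real state file can be read as ordinary XML with <name> and <rsc_fpops_bound> sitting inside each <workunit> group. To try it on your own machine you would read the text of your client state file instead of SAMPLE.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape described above; not copied from a real file.
SAMPLE = """
<client_state>
  <workunit>
    <name>EXAMPLE_WU_1</name>
    <rsc_fpops_est>100000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>1000000000000000.000000</rsc_fpops_bound>
  </workunit>
</client_state>
"""


def fpops_bounds(xml_text):
    """Map each workunit name to its rsc_fpops_bound value."""
    root = ET.fromstring(xml_text.strip())
    bounds = {}
    for wu in root.iter("workunit"):
        name = wu.findtext("name")
        bound = wu.findtext("rsc_fpops_bound")
        if name is not None and bound is not None:
            bounds[name] = float(bound)
    return bounds


bounds = fpops_bounds(SAMPLE)
for name, bound in bounds.items():
    # 1e15 is a 1 followed by 15 zeroes, the "safe" size mentioned above.
    print(f"{name}: {bound:.0f} (at least 1e15: {bound >= 1e15})")
```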
"You would think there would be a set of standardized BOINC parameters so these things don't happen."

Boinc is doing what it can: it uses precisely the limits given with the WU to stop computing when these limits are reached. Boinc does not know what is done inside the WUs, so it cannot help with setting these parameters to appropriate values.
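
As a rough illustration of why the same limit means a different number of hours on different machines, here is a small Python sketch. It rests on an assumption of mine that is not confirmed anywhere in this thread: that the fpops bound is turned into a time limit by dividing it by the machine's benchmarked speed. The bound and the two speeds are hypothetical numbers chosen only to show the effect.

```python
def max_cpu_hours(rsc_fpops_bound, benchmark_flops):
    """Hours allowed before the limit is reached (assumed: bound / speed)."""
    return rsc_fpops_bound / benchmark_flops / 3600.0


BOUND = 1.0e15  # hypothetical fpops bound

slow = max_cpu_hours(BOUND, 2.0e9)  # hypothetical slower machine
fast = max_cpu_hours(BOUND, 4.0e9)  # hypothetical machine twice as fast

print(f"slow machine: {slow:.1f} h allowed, fast machine: {fast:.1f} h allowed")
```

Under that assumption a faster machine gets fewer hours for the same bound, which is why a figure like 67 hours only applies to the slower machines.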
"Was the random data I posted useful?"

It helped me to make some calculations but, considering the slowness of the process, more widely spaced measurements are enough and more useful. What you give in your last post (one every 10 hours) is exactly what is needed to check that things are going as expected. And if you cannot do it, the WU will still go on.
I have checked with your latest measurement at 61.25 hours and it confirms a total runtime of 110-115 hours, i.e. still 50-55 hours to go at the time it was taken.

"Followed someone else's lead and suspended all the other WUs to give it more space"

This was for a P4 HT, where two threads are competing for a single physical processor. For a P4 HT it was a very good decision.
In your case you have four physical processors and one WU cannot use more than one, so you can go on running four WUs; it will not slow your big one at all. The only point which could matter would be if not enough RAM were available, but that is not relevant in this case because those monster WUs are using only 5 Mbytes of RAM, i.e. peanuts in our modern world of huge program sizes. If I wanted to make a naughty joke I would say that 5 Mbytes is just enough for Vista to tell you if your computer is switched on or off.

Cheers. Jean.
[May 2, 2009 7:16:54 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

petehardy,
Things seem to be going pretty well for this CMD2 beta test and for your Beta badge!
My own monster is going much like your second one and I am still expecting a total runtime of 235-240 hours, i.e. more or less 10 days! My job is already shown as "No Reply" on the website, together with another one. Two more copies have been sent and another one will probably be sent this afternoon. Like you, I wonder if these extra copies should have been avoided.

Cheers. Jean.
[May 2, 2009 7:25:13 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Hi Mathilde2006!
The only thing which is certain about your WU is that you can definitely ignore what Boinc has to say about its time to completion!
As you said, it started on a 100-hour basis, but the progress between 2 and 3 hours makes me think that you might be in for much longer, say around 200 hours, like many others in this thread.
Keep an eye on it if you want to know where you are going, and consider only the progress speed if you want your forecast to be close to reality.
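
Here is a minimal Python sketch of that kind of forecast, using the rounded figures from your post (1 % after 1 hour, 2 % after 2 hours, about 2.5 % after 3 hours). The starting point at 0 hours and 0 % is my assumption, and the function names are only illustrative.

```python
# (elapsed hours, progress in %); the first point is an assumed start.
measurements = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 2.5)]


def interval_rates(points):
    """Progress speed in % per hour for each pair of consecutive readings."""
    return [(p1 - p0) / (t1 - t0)
            for (t0, p0), (t1, p1) in zip(points, points[1:])]


def projected_total_hours(points):
    """Total runtime if the speed of the last interval holds until 100 %."""
    (t0, p0), (t1, p1) = points[-2], points[-1]
    rate = (p1 - p0) / (t1 - t0)
    return t1 + (100.0 - p1) / rate


rates = interval_rates(measurements)
total = projected_total_hours(measurements)
print(f"speeds: {rates} % per hour, projected total: {total:.0f} h")
```

With only one slow interval to go on, the result is of course a rough guide, which is why a few more readings spaced some hours apart will tell you more.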

Cheers. Jean.
[May 2, 2009 7:32:50 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

"My long running unit was the _3.
Does the No Reply from _2 mean there is no one left."

Not exactly, TimAndHedy. The "No Reply" for copy _4 (not _2) still has a chance of coming back at some point if, like most of ours, it is still computing after the deadline.

Cheers. Jean.
[May 2, 2009 7:39:34 AM]