Total posts in this thread: 140
This topic has been viewed 219164 times and has 139 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Hi again, Jean,
You'll be happy to hear that after 60 CPU hours, the TTC clock started moving down again, though CPU time and TTC time are on different planets. Data at the end.

Re fpops...Word did find it for some other WUs but not for the monster. But I've survived this long without knowing about fpop boundaries, so am giving up the search (and someone already took care of it).

What apparently did impact me though, was that on 4/27, the day the monster downloaded, three other WUs erred out. The WUs are detailed below the info on the error code. The last line of the error code repeats for several pages. If the error codes are useful, I'll be glad to grab them and post them, but they'll probably reach to the moon and back. Just noticed the 4th WU was from 4/26, so not sure if it's related.

Thanks for the info re the cores. I sort of figured it out when I suspended the other 3 and quickly unsuspended them again when the monster still kept trudging along with no increase in speed. BOINC is the only program where I ever see which core is doing what...I mostly run a stock market program on this computer, and as long as it does what I want (heavy duty charts and financial info), I don't care how it does it. I just want to fill the tank and drive, not worry about which cylinders are firing or how fast each wheel is rotating (good thing there are people who know about those things!).

Yes, VISTA definitely knows a thing or two about hogging memory. I debated whether to go back to WinXP Pro, but then I would lose the use of more than half my DRAM. But on May 5th I will be downloading Windows 7. Win7 will be a world-changing event (half+ of the PC world will be using bad language, no doubt), and I hope for the better. Is anyone going to track how it impacts BOINC?...faster, slower, more problems, fewer problems...

Thanks,
Jeanne

---(The error code for the WUs that got killed off)--
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1241630240.000000
Skipping: /computation_deadline
In ExtractGlcmFeatures: End of 0 iteration of outer loop.
In ExtractGlcmFeatures: End of 1 iteration of outer loop.
In ExtractGlcmFeatures: End of 2 iteration of outer loop.
In ExtractGlcmFeatures: End of 3 iteration of outer loop.
In ExtractGlcmFeatures: End of 4 iteration of outer loop.
In ExtractGlcmFeatures: End of 5 iteration of outer loop.
In ExtractGlcmFeatures: End of 6 iteration of outer loop.
In ExtractGlcmFeatures: End of 7 iteration of outer loop.
In ExtractGlcmFeatures: End of 8 iteration of outer loop.
In ExtractGlcmFeatures: End of 9 iteration of outer loop.
In ExtractGlcmFeatures: End of 10 iteration of outer loop.
In ExtractGlcmFeatures: End of 11 iteration of outer loop.
In ExtractGlcmFeatures: End of 12 iteration of outer loop.
In ExtractGlcmFeatures: End of 13 iteration of outer loop.
In ExtractGlcmFeatures: End of 14 iteration of outer loop.
In ExtractGlcmFeatures: End of 15 iteration of outer loop.
In ExtractGlcmFeatures: End of 16 iteration of outer loop.
In ExtractGlcmFeatures: End of 17 iteration of outer loop.
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1241630240.000000
Skipping: /computation_deadline
Can't acquire lockfile - exiting
FILE_LOCK::unlock(): close failed.: No error

---(The four WUs which ERRed out)----
X0000097580244200802271202_0-- jeanne-PC Error 4/27/09 05:44:39 4/27/09 13:22:43 0.30 5.0 / 0.0
X0000097570518200803251243_0-- jeanne-PC Error 4/27/09 04:20:14 4/27/09 13:22:43 1.20 20.1 / 0.0
X0000097570843200803041046_1-- jeanne-PC Error 4/27/09 02:29:20 4/27/09 13:22:43 2.25 37.5 / 0.0
E000564_477A_006w7o00j_0-- jeanne-PC Error 4/26/09 22:49:36 4/27/09 13:22:43 4.74 79.0 / 0.0







---(The TTC starts decreasing after 60 hours!!!)-------
60:25:37 78.148 13:25:12 13:39
60:34:13 78.189 13:25:33 13:54
60:50:31 78.279 13:25:33 14:21
60:56:42 78.328 13:25:25 14:33
60:59:04 78.360 13:24:30 14:35
61:15:31 78.458 13:24:20 15:02
61:25:47 78.532 13:23:45 15:19
61:47:42 78.679 13:22:50 15:55
61:49:50 78.720 13:21:45 15:59
61:51:56 78.736 13:21:32 16:02
64:27:51 79.790 13:12:44 20:21
64:51:59 79.954 13:11:04 21:01
64:57:12 79.986 13:10:49 21:10
65:24:47 80.134 13:10:24 21:56
66:19:04 80.477 13:07:09 23:26
66:27:24 80.567 13:05:07 23:39
66:43:05 80.722 13:01:46 0:06
67:36:10 81.122 12:55:21 1:44
67:45:00 81.188 12:54:29 2:01
72:52:53 83.427 12:12:09 10:44
72:59:42 83.476 12:11:06 10:56
75:10:06 84.440 11:48:18 14:38
75:25:58 84.546 11:45:53 15:06
75:34:36 84.653 11:42:18 15:20
75:44:57 84.734 11:40:07 15:37
81:38:59 87.480 10:35:00 1:33
82:00:43 87.684 10:10:06 2:11
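
A rough way to see how far apart CPU time and TTC are is to work out the progress rate between the first and last rows above and extrapolate it to 100%. This is only a sketch: it assumes the first column is CPU time (h:mm:ss), the second is percent complete and the third is the displayed TTC, which the rows suggest but do not label, and it assumes progress stays linear, which other posts in this thread show is not guaranteed.

```python
def hms_to_hours(text):
    """Convert an 'h:mm:ss' string to fractional hours."""
    h, m, s = (int(part) for part in text.split(":"))
    return h + m / 60 + s / 3600


def remaining_hours(first, last):
    """Extrapolate remaining CPU hours from two (cpu_time, percent) samples."""
    (t0, p0), (t1, p1) = first, last
    # Percent gained per CPU hour between the two samples.
    rate = (p1 - p0) / (hms_to_hours(t1) - hms_to_hours(t0))
    return (100.0 - p1) / rate


# First and last rows of the table (assumed column meaning, see above).
first = ("60:25:37", 78.148)
last = ("82:00:43", 87.684)

estimate = remaining_hours(first, last)
displayed_ttc = hms_to_hours("10:10:06")
print(f"extrapolated: {estimate:.1f} h, displayed TTC: {displayed_ttc:.1f} h")
```

Under these assumptions the extrapolation comes out at roughly 28 hours remaining, against a displayed TTC of a little over 10 hours.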

[May 3, 2009 10:22:18 AM]
petehardy
Senior Cruncher
USA
Joined: May 4, 2007
Post Count: 318
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Hi Everyone,



BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_25_4 Pending Validation 4/27/09 22:33:52 5/3/09 07:11:47 104.77 1,558.0 / 0.0


This one finished! It was at around 75% after 100hrs but it completed the last 25% in just 4hrs or so.

Result Log

<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish

</stderr_txt>
]]>



The other one :-
BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_19_2

is at 76.3% after 182hrs, still going at its normal pace.

Nobody else has returned a completed WU in either of these sets.



Pete
----------------------------------------

"Patience is a virtue", I can't wait to learn it!
[May 3, 2009 11:11:48 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

I just found out I'm also running a huge WU:
BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_40_6

After 10:25 hours I'm at 5,753%, so by these numbers it will take about 180 hours to complete. I guess I will try to edit the XML file and see if I can get this WU to 100%; it has already errored out for others:

610 Error 25-4-09 05:18:43 29-4-09 01:56:29 57.09 681.5 / 681.5
610 Error 25-4-09 05:18:38 27-4-09 14:00:13 52.04 716.2 / 716.2
610 Error 25-4-09 05:17:38 27-4-09 16:41:39 55.16 662.6 / 662.6
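
The "about 180 hours" figure can be reproduced with a single-point linear extrapolation. The sketch below reads 5,753% as 5.753 percent (decimal comma) and assumes progress stays proportional to CPU time; the function name is just for illustration.

```python
def projected_total_hours(elapsed_hm, percent_text):
    """Project total run time from elapsed 'h:mm' and a percent-done string."""
    hours, minutes = (int(part) for part in elapsed_hm.split(":"))
    elapsed = hours + minutes / 60
    # Accept either a decimal comma or a decimal point.
    percent = float(percent_text.replace(",", "."))
    return elapsed * 100.0 / percent


total = projected_total_hours("10:25", "5,753")
print(f"projected total: {total:.0f} hours")
```

With these inputs the projection lands at roughly 181 hours, in line with the estimate in the post.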
[May 3, 2009 4:14:39 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Finished one small monster today: BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_7

It ran for 111 hours (on an E6600 processor), and I am the first and only one to finish so far. Three others terminated with Error status, one Abort, and one still crunching for five days now.

Progress on this work unit was not smooth: during one 24-hour period in the middle, it progressed nearly 50%. Dividing these monsters into smaller work units doesn’t appear to be an easy task.
[May 3, 2009 4:17:52 PM]
TBirdTheYuri
Advanced Cruncher
France
Joined: Mar 5, 2006
Post Count: 115
Status: Offline
Too late

I just finished a Beta unit that kept my Q9450 busy for 92 hours, only to end up in the "too late" state even though it was completed one day before the deadline.
The log for this unit is below:


Unit name:
BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_23_3--

Unit log:
<core_client_version>6.6.20</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
called boinc_finish

</stderr_txt>
]]>

[May 3, 2009 4:37:25 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

After 10:25 hours I'm at 5,753%, so by these numbers it will take about 180 hours to complete. I guess I will try to edit the XML file and see if I can get this WU to 100%,

Xaverius,
Your estimate is consistent with what many of us are/have been experiencing.
If you received your WU as a replacement of one of the three in error in your post, then it has been sent after the 25th and its max_flops should already be correct (one digit followed by 15 zeroes).
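
For anyone wondering why that number matters, here is a minimal sketch of the arithmetic. It assumes the client stops a task once its CPU time exceeds the workunit's floating-point operation bound divided by the host's benchmarked speed. Both the 5e15 bound (just one example of "one digit followed by 15 zeroes") and the 2 GFLOPS host speed are hypothetical illustration values, not figures from this thread.

```python
def max_cpu_hours(fpops_bound, host_flops):
    """CPU-time ceiling in hours implied by an operation bound.

    Assumption: limit (seconds) = fpops_bound / host_flops.
    """
    return fpops_bound / host_flops / 3600.0


# Hypothetical values for illustration only.
example_bound = 5e15        # one digit followed by 15 zeroes
example_host_flops = 2.0e9  # a host benchmarked at 2 GFLOPS

limit = max_cpu_hours(example_bound, example_host_flops)
print(f"limit: {limit:.0f} CPU hours")
```

Under these assumed values the ceiling is close to 700 CPU hours, comfortably above a 180-hour run; a bound several times smaller would cut the task off before it could finish.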

Happy and patient crunching. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[May 3, 2009 7:26:12 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: Too late

TBird,
Don't worry about the "too late" status. The deadlines do not really matter for this particular beta test, at least as long as WUs are returned within 14 days. Also, I suspect that "too late" is sometimes used to flag some groups of WUs as "do not send any more copies".

Cheers. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[May 3, 2009 7:32:36 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

What apparently did impact me though, was that on 4/27, the day the monster downloaded, three other WUs erred out. The WUs are detailed below the info on the error code. The last line of the error code repeats for several pages. If the error codes are useful, I'll be glad to grab them and post them, but they'll probably reach to the moon and back. Just noticed the 4th WU was from 4/26, so not sure if it's related.
.....
---(The four WUs which ERRed out)----
X0000097580244200802271202_0-- jeanne-PC Error 4/27/09 05:44:39 4/27/09 13:22:43 0.30 5.0 / 0.0
X0000097570518200803251243_0-- jeanne-PC Error 4/27/09 04:20:14 4/27/09 13:22:43 1.20 20.1 / 0.0
X0000097570843200803041046_1-- jeanne-PC Error 4/27/09 02:29:20 4/27/09 13:22:43 2.25 37.5 / 0.0
E000564_477A_006w7o00j_0-- jeanne-PC Error 4/26/09 22:49:36 4/27/09 13:22:43 4.74 79.0 / 0.0

Jeanne,
I don't think that the problem of these four WUs erroring at the same time has anything to do with your receiving your monster beta WU. If you want somebody to handle this incident specifically, you should start a separate thread in the "Boinc Agent Support" forum (since it is not related to a single project). Here it will most probably be ignored and will get buried rather quickly.

Regarding your beta WU, it should be in its last day now. :)

Cheers. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[May 3, 2009 7:43:54 PM]
TBirdTheYuri
Advanced Cruncher
France
Joined: Mar 5, 2006
Post Count: 115
Status: Offline
Re: Too late

Oh, I'm not worried about that; I have almost 100 Beta units in my history, so I'm starting to know them. ;)
It's just that I noticed that, despite being uploaded before the deadline, it showed up as "too late". Until now, every Beta unit that showed "Too late" had been uploaded after the deadline, which was logical.
OK for the rest of the explanation. :D
[May 3, 2009 8:02:56 PM]
TimAndHedy
Senior Cruncher
Joined: Jan 27, 2009
Post Count: 267
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

These long ones sure mess up the estimated completion times. The algorithm for it looks like it needs some work.

One out-of-sync unit moved it up 100 hours, but after somewhere around 50 units running at 2.5 hours each it's still at 45 hours.
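
This kind of behaviour is what an asymmetric estimator would produce: one that jumps up at once when a task overruns but comes down only a little per task when tasks finish early. The sketch below is a hypothetical model of that idea, not the client's actual code. The thresholds (1.1 and 0.1), the weights (0.99/0.01 and 0.9/0.1) and the starting point of 2.5 hours are all assumptions chosen for illustration.

```python
def update_estimate(estimate_h, actual_h):
    """One update step of a hypothetical asymmetric run-time estimator (hours)."""
    ratio = actual_h / estimate_h
    if ratio > 1.1:
        # Task ran much longer than predicted: jump straight up.
        return actual_h
    if ratio < 0.1:
        # Task finished far earlier than predicted: give it very little weight.
        return 0.99 * estimate_h + 0.01 * actual_h
    # Otherwise drift toward the observed run time.
    return 0.9 * estimate_h + 0.1 * actual_h


estimate = 2.5                                # assumed starting estimate
estimate = update_estimate(estimate, 100.0)   # one monster WU
after_monster = estimate
for _ in range(50):                           # fifty normal 2.5-hour WUs
    estimate = update_estimate(estimate, 2.5)

print(f"after monster: {after_monster:.1f} h, after 50 short WUs: {estimate:.1f} h")
```

In this toy model the estimate is still above 60 hours after fifty short units. That matches the qualitative picture described above, though not the exact 45-hour figure, which depends on details the model does not capture.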
[May 3, 2009 8:06:23 PM]