Thread Status: Active
Total posts in this thread: 31
This topic has been viewed 5730 times and has 30 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Current CEP bugs / problems

@Crystal Pellet.

Good question: that was my first thought too.

I've only been running BOINC for a few days, so I have no good ideas on troubleshooting it. I'll need some time to figure this client out. It's rather different from the grid clients I've used before or am using now.

I did have some problems with Daemon Tools Lite 4.30.1 (still using it) confusing some (silly) games in this area, i.e. some games search for the CD on the physical DVD drive; sometimes (?) when I ignore the error message they find the ISO mounted on the Daemon Tools drive.

(I made these ISOs myself from my own original disks to speed up the games and avoid the noise of the DVD drive. Some of those ISOs served me well for several years without any problems on Daemon Tools.)

Just last week I had a game refuse to install because the "wrong volume (was) in (the) drive", which was surprising because there was only one volume. I switched off Daemon Tools and the game installed (?).

Any thoughts or pointers would be highly appreciated. As soon as I can at least pinpoint the cause I can solve the problem, but for now I'm completely in the dark.
(It makes me humble after all these years of using computers... probably a beneficial experience, but I'm not amused ;) )
[Dec 20, 2008 10:39:05 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Current CEP bugs / problems

We have just started the Clean Energy Project (CEP) and already we have identified a number of bugs. I will list them here so people will be aware of what is going on.

1) Some work units error out on every machine they are sent to. Apparently this can happen at any stage of progress. The project scientists are looking into this.

2) Sometimes a work unit will end early but return a non-error result code. This can cause anybody else working on the same work unit to get a very low number of points. Again, everybody is looking into this.
[Added] Sekerob suggests that I include a sample line from the error file on the Results Status page for one of these non-error errors:
'[ERROR] Failed to open either source or destination files while copying wcgrestart.rst to ../../projects/www.worldcommunitygrid.org/E000042_595A . . . . . . . .'

3) The CHARMM molecular mechanics package developed by the Karplus group at Harvard University and used by CEP is an enormous Fortran program, much larger than the HCC program. Just like the original HCC program, it is running a very large number of page faults under 'some' circumstances. Running more than one instance on a computer appears to exacerbate this problem. But because the program is so large, tracking these inefficiencies down and trying to correct them is probably going to be a last-priority issue for a while.

4) Figuring out work unit length is not easy. It seems to be very variable. Perhaps we shall have a blinding flash of insight that shows us how to determine the length of a work unit. Perhaps not.

Lawrence


Have all four of these problems been solved now?
[Jan 25, 2009 4:49:11 PM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Status: Offline
Re: Current CEP bugs / problems

We have just started the Clean Energy Project (CEP) and already we have identified a number of bugs. I will list them here so people will be aware of what is going on.

1) Some work units error out on every machine they are sent to. Apparently this can happen at any stage of progress. The project scientists are looking into this.

Yes, this has been fixed.


2) Sometimes a work unit will end early but return a non-error result code. This can cause anybody else working on the same work unit to get a very low number of points. Again, everybody is looking into this.
[Added] Sekerob suggests that I include a sample line from the error file on the Results Status page for one of these non-error errors:
'[ERROR] Failed to open either source or destination files while copying wcgrestart.rst to ../../projects/www.worldcommunitygrid.org/E000042_595A . . . . . . . .'

3) The CHARMM molecular mechanics package developed by the Karplus group at Harvard University and used by CEP is an enormous Fortran program, much larger than the HCC program. Just like the original HCC program, it is running a very large number of page faults under 'some' circumstances. Running more than one instance on a computer appears to exacerbate this problem. But because the program is so large, tracking these inefficiencies down and trying to correct them is probably going to be a last-priority issue for a while.

The scientists have a better understanding. See www.worldcommunitygrid.org/forums/wcg/viewthread?thread=24262

4) Figuring out work unit length is not easy. It seems to be very variable. Perhaps we shall have a blinding flash of insight that shows us how to determine the length of a work unit. Perhaps not.

Not yet.


Lawrence


Have all four of these problems been solved now?

[Jan 25, 2009 5:58:46 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Current CEP bugs / problems

Hi,
Just had this workunit end with a compute error:

2/1/2009 6:46:46 AM|World Community Grid|Computation for task E000278_290A_001l2c004_3 finished
2/1/2009 6:46:46 AM|World Community Grid|Output file E000278_290A_001l2c004_3_2 for task E000278_290A_001l2c004_3 absent
2/1/2009 6:46:46 AM|World Community Grid|Output file E000278_290A_001l2c004_3_3 for task E000278_290A_001l2c004_3 absent

Hope this is of some use.
[Feb 1, 2009 12:12:08 PM]
mclaver
Veteran Cruncher
Joined: Dec 19, 2005
Post Count: 566
Status: Offline
Re: Current CEP bugs / problems

I am not sure this is really fixed. Everyone who processed this workunit received an error and lost processing time.

Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
E000231_526A_001g0k00s_6-- | ASUS-i7-965 | Error | 1/30/09 22:50:51 | 1/31/09 02:42:11 | 3.81 | 92.6 / 0.0

Result Log

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)
</message>
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.

</stderr_txt>
]]>
[Feb 2, 2009 3:38:03 AM]
p3nguin53
Advanced Cruncher
USA
Joined: Dec 8, 2008
Post Count: 95
Status: Offline
Re: Current CEP bugs / problems

I have a WU that suddenly jumped from around 20% done (maybe lower, not sure) to 100%. It uploaded and validated OK, but there are error messages in the Result Log:

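Result Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
E000302_190A_001u0n006_1-- | Valid | 2/2/09 04:34:20 | 2/2/09 10:45:26 | 1.93 | 37.1 / 36.4
E000302_190A_001u0n006_0-- | Valid | 2/2/09 04:34:13 | 2/2/09 22:23:34 | 2.14 | 35.8 / 36.4 <------- MINE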

Result Log:
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.
[ERROR] Failed to open either source or destination files while copying wcgrestart.rst to ../../projects/www.worldcommunitygrid.org/E000302_190A_001u0n006_0_3. Error: 2
####Message = NORMAL STOP
called boinc_finish

</stderr_txt>
]]>
[Feb 2, 2009 10:38:24 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Current CEP bugs / problems

There's an unannounced Windows beta, version 6.28, running. I don't know the targeted fixes, but yours is a known case where results that end prematurely do not generate an error code flag for the validator. Probably both results hit it at the same point, making them "appear" to form a "valid" quorum when they are in fact incomplete.
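
For anyone who wants to check their own Result Logs for this case, here is a rough sketch in Python. It is not an official WCG or BOINC tool. It assumes a premature end looks like the logs pasted in this thread: the "[ERROR] Failed to open either source or destination files while copying wcgrestart.rst" line appears, yet the log still reports "NORMAL STOP". The function name `looks_premature` is made up for this example.

```python
import re

# Pattern taken from the Result Logs quoted in this thread. Assumption: this
# line marks a result that stopped before finishing its work.
COPY_ERROR = re.compile(
    r"\[ERROR\] Failed to open either source or destination files "
    r"while copying wcgrestart\.rst"
)


def looks_premature(stderr_txt: str) -> bool:
    """Return True if the log has the restart-copy error but still ends 'normally'."""
    has_copy_error = COPY_ERROR.search(stderr_txt) is not None
    ended_normally = "NORMAL STOP" in stderr_txt
    return has_copy_error and ended_normally


sample = """Calling initGraphics()
INFO: No state to restore. Start from the beginning.
[ERROR] Failed to open either source or destination files while copying wcgrestart.rst to ../../projects/www.worldcommunitygrid.org/E000302_190A_001u0n006_0_3. Error: 2
####Message = NORMAL STOP
called boinc_finish
"""

print(looks_premature(sample))
```

This only flags candidates from the text of the log. Whether a flagged result is really incomplete is still something the project has to confirm.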
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Feb 2, 2009 10:43:57 PM]
p3nguin53
Advanced Cruncher
USA
Joined: Dec 8, 2008
Post Count: 95
Status: Offline
Re: Current CEP bugs / problems

Thanks for the feedback.

I noticed I have another WU waiting to validate that has the same error. This one ran much longer. I'll watch to see what happens when the other copy reports back.

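Result Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
E000293_000A_001t0e00e_1-- | Pending Validation | 2/1/09 02:25:58 | 2/2/09 04:34:13 | 6.86 | 114.8 / 0.0
E000293_000A_001t0e00e_0-- | In Progress | 2/1/09 02:23:57 | 2/13/09 02:23:57 | 0.00 | 0.0 / 0.0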


<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.
[ERROR] Failed to open either source or destination files while copying wcgrestart.rst to ../../projects/www.worldcommunitygrid.org/E000293_000A_001t0e00e_1_3. Error: 2
####Message = NORMAL STOP
called boinc_finish

</stderr_txt>
]]>
[Feb 2, 2009 10:57:52 PM]
mclaver
Veteran Cruncher
Joined: Dec 19, 2005
Post Count: 566
Status: Offline
Re: Current CEP bugs / problems

Here is another error, with a loss of 5.62 hours of processing. Again it looks like everyone processing this WU got an error, so it isn't just me.

Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
E000284_830A_001s0t00y_4-- | ASUS-i7-965 | Error | 2/2/09 22:17:38 | 2/3/09 04:00:56 | 5.62 | 137.1 / 0.0


Result Log

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)
</message>
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.

</stderr_txt>
]]>

HERE ARE EVERYONE ELSE'S RESULTS

Project Name: The Clean Energy Project
Created: 1/29/09
Name: E000284_830A_001s0t00y
Minimum Quorum: 2
Initial Replication: 2

Result Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
E000284_830A_001s0t00y_4-- | Error | 2/2/09 22:17:38 | 2/3/09 04:00:56 | 5.62 | 137.1 / 0.0
E000284_830A_001s0t00y_3-- | Error | 2/2/09 04:57:44 | 2/2/09 21:49:36 | 5.96 | 97.5 / 0.0
E000284_830A_001s0t00y_2-- | Error | 2/1/09 00:57:40 | 2/2/09 04:53:23 | 7.12 | 99.8 / 0.0
E000284_830A_001s0t00y_1-- | In Progress | 1/31/09 07:48:56 | 2/12/09 07:48:56 | 0.00 | 0.0 / 0.0
E000284_830A_001s0t00y_0-- | Error | 1/31/09 07:47:41 | 2/1/09 00:49:27 | 5.78 | 116.8 / 0.0
E000284_830A_001s0t00y_5-- | Waiting to be sent | - | - | 0.00 | 0.0 / 0.0
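
To put a number on the processing time lost across the whole work unit, here is a small Python sketch that adds up the CPU hours of the errored results. The figures are copied by hand from the table above; the variable names are only for illustration.

```python
# (result suffix, status, CPU hours) copied from the table above.
results = [
    ("4", "Error", 5.62),
    ("3", "Error", 5.96),
    ("2", "Error", 7.12),
    ("1", "In Progress", 0.00),
    ("0", "Error", 5.78),
    ("5", "Waiting to be sent", 0.00),
]

# Only results that ended in Error count as lost time; all were granted 0.0 credit.
errored = [hours for _, status, hours in results if status == "Error"]
lost_hours = round(sum(errored), 2)

print(f"{len(errored)} errored results, {lost_hours} CPU hours with no credit granted")
```

So far that is four machines and about 24.5 CPU hours spent on this one work unit, with two copies still outstanding.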
[Feb 3, 2009 4:08:47 AM]
p3nguin53
Advanced Cruncher
USA
Joined: Dec 8, 2008
Post Count: 95
Status: Offline
Re: Current CEP bugs / problems

I just got the same error as mclaver above on a different WU.

Result Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
E000278_794A_001l2o004_4-- | In Progress | 2/3/09 04:11:45 | 2/7/09 03:14:09 | 0.00 | 0.0 / 0.0
E000278_794A_001l2o004_3-- | Error | 2/2/09 22:21:58 | 2/3/09 04:07:03 | 3.24 | 54.3 / 0.0 <------ MINE
E000278_794A_001l2o004_2-- | Error | 2/1/09 09:52:16 | 2/2/09 22:09:37 | 4.02 | 49.9 / 0.0
E000278_794A_001l2o004_1-- | Error | 1/30/09 23:47:39 | 2/1/09 09:50:21 | 6.99 | 54.3 / 0.0
E000278_794A_001l2o004_0-- | Pending Validation | 1/30/09 23:45:11 | 2/2/09 04:21:32 | 10.69 | 186.9 / 0.0

Result Log:
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)
</message>
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.

</stderr_txt>
]]>

Client Messages:
2/2/2009 10:05:59 PM|World Community Grid|Computation for task E000278_794A_001l2o004_3 finished
2/2/2009 10:05:59 PM|World Community Grid|Output file E000278_794A_001l2o004_3_2 for task E000278_794A_001l2o004_3 absent
2/2/2009 10:05:59 PM|World Community Grid|Output file E000278_794A_001l2o004_3_3 for task E000278_794A_001l2o004_3 absent
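
If you have a long message log and want to pull out every task that finished with missing output files, something like the following Python sketch may help. It assumes the lines use the `timestamp|project|message` layout shown in the client messages above and that the wording "Output file ... for task ... absent" stays the same; the function name `absent_outputs` is made up for this example.

```python
import re
from collections import defaultdict

# Message wording assumed from the client messages pasted above.
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")


def absent_outputs(log_text):
    """Map each task name to the output files the client reported as absent."""
    missing = defaultdict(list)
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # skip anything that is not timestamp|project|message
        match = ABSENT.search(parts[2])
        if match:
            output_file, task = match.groups()
            missing[task].append(output_file)
    return dict(missing)


log = """2/2/2009 10:05:59 PM|World Community Grid|Computation for task E000278_794A_001l2o004_3 finished
2/2/2009 10:05:59 PM|World Community Grid|Output file E000278_794A_001l2o004_3_2 for task E000278_794A_001l2o004_3 absent
2/2/2009 10:05:59 PM|World Community Grid|Output file E000278_794A_001l2o004_3_3 for task E000278_794A_001l2o004_3 absent
"""

for task, files in absent_outputs(log).items():
    print(task, files)
```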
[Feb 3, 2009 4:40:14 AM]