Thread Status: Active
Total posts in this thread: 31
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Current CEP bugs / problems

We have just started the Clean Energy Project (CEP) and already we have identified a number of bugs. I will list them here so people will be aware of what is going on.

1) Some work units error out on every machine they are sent to. Apparently this can happen at any stage of progress. The project scientists are looking into this.

2) Sometimes a work unit will end early but return a non-error result code. This can cause anybody else working on the same work unit to get a very low number of points. Again, everybody is looking into this.
[Added] Sekerob suggests that I include a sample line from the error file on the Results Status page for one of these non-error errors:
'[ERROR] Failed to open either source or destination files while copying wcgrestart.rst to ../../projects/www.worldcommunitygrid.org/E000042_595A . . . . . . . .'

3) The CHARMM molecular mechanics package developed by the Karplus group at Harvard University and used by CEP is an enormous Fortran program, much larger than the HCC program. Just like the original HCC program, it generates a very large number of page faults under some circumstances. Running more than one instance on a computer appears to exacerbate this problem. But because the program is so large, tracking these inefficiencies down and trying to correct them is probably going to be the lowest-priority issue for a while.

4) Figuring out work unit length is not easy. It seems to be very variable. Perhaps we shall have a blinding flash of insight that shows us how to determine the length of a work unit. Perhaps not.

There may be more errors. The complaints posted so far point to more possibilities, but only these 4 problems have been confirmed.

On a personal note, I am running all projects right now to minimize the aggravation caused by a new project that is just starting to have its bugs identified. Some people are running CEP-only, either to help locate the bugs quickly or to earn a badge quickly. CEP will be running for a long time, since we will not start Phase 2 using Q-CHEM until much later.

Lawrence
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 20, 2008 12:32:43 AM]
[Dec 18, 2008 3:23:45 AM]
nasher
Veteran Cruncher
USA
Joined: Dec 2, 2005
Post Count: 1423
Status: Offline
Re: Current CEP bugs / problems

Good to hear that we have a while to get our badges.

I stopped working on these and shifted to the betas
1) to get my badge
2) to hopefully find the bugs that have been noted, as well as others

The biggest problem I have found is that I'm normally the person with the long-running WU who gets short-credited by the end-too-soon bug.
----------------------------------------

[Dec 18, 2008 3:45:15 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Current CEP bugs / problems

Lawrence,

Thank you for keeping us updated about these problems. I personally have encountered each of the four problems you mention, and I feel much better knowing they have been identified, prioritized, and will be addressed.

Steve
[Dec 18, 2008 1:43:08 PM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Status: Offline
Re: Current CEP bugs / problems

Thank you for the status update, Lawrence. I had a unit that errored out on 4 of 5 computers, with one still crunching, presumably crunching too little each day.

JB
[Dec 18, 2008 2:40:45 PM]
eviltoad
Senior Cruncher
Australia
Joined: Nov 5, 2005
Post Count: 190
Status: Offline
Re: Current CEP bugs / problems

The matter of a number of WUs that get stuck in an endless loop and have to be aborted probably needs looking at, as well.
----------------------------------------

[Dec 18, 2008 6:53:58 PM]
GIBA
Ace Cruncher
Joined: Apr 25, 2005
Post Count: 5374
Status: Offline
Re: Current CEP bugs / problems

Thank you, Lawrence, for letting us know. Reading through the CEP forum, I get the sense that there are more issues, and I don't know whether they are bugs or just simple problems.
----------------------------------------
Cheers! GIB@
Join BRASIL - BRAZIL@GRID team and be very happy !
http://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=DF99KT5DN1

[Dec 18, 2008 11:42:42 PM]
mito7
Advanced Cruncher
Slovakia
Joined: Oct 12, 2008
Post Count: 58
Status: Offline
Re: Current CEP bugs / problems

I have a problem with one Energy WU: E000030_234A_00041v000_4
It stops at 28.XXX percent and falls back to 27.XXX percent with the error message "Task E000030_234A_00041v000_4 exited with DLL initialization error". That has happened 10 times so far (even after a computer restart).

Should I abort it or wait? Two replications have finished with errors. The deadline is 20.12. Thanks for any advice.
----------------------------------------

[Dec 19, 2008 2:14:22 PM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Status: Offline
Re: Current CEP bugs / problems

mito7,

I suspect that aborting it would be a wiser action considering that two replications finished with error. No point wasting processor cycles.

JB
[Dec 19, 2008 2:24:13 PM]
GIBA
Ace Cruncher
Joined: Apr 25, 2005
Post Count: 5374
Status: Offline
Re: Current CEP bugs / problems

Mito,
I had the same problem last week and aborted mine. I think aborting it is the better option. Don't spend more time on this wasted WU.

It would be a plus if, before aborting it, you copied the result log from your My Page and posted it here, along with, if you can, the status of the same WU from the other members who are running or have run the replications. It could help the techs with something.
----------------------------------------

[Dec 19, 2008 2:45:57 PM]
mito7
Advanced Cruncher
Slovakia
Joined: Oct 12, 2008
Post Count: 58
Status: Offline
Re: Current CEP bugs / problems

I aborted it. This is my first aborted WU.

I hope the scientists soon resolve the problems with the energy WUs so that we can continue crunching without trouble.
----------------------------------------

[Dec 19, 2008 2:49:40 PM]