| World Community Grid Forums
Thread Status: Active Total posts in this thread: 5
|
|
| Author |
|
|
Viktors
Former World Community Grid Tech Joined: Sep 20, 2004 Post Count: 653 Status: Offline Project Badges:
|
We have noticed that a few of the HPF2 work units are aborting early on both BOINC and UD agents. We are investigating the cause. No action on the part of members is required. Thanks for your patience.
|
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
We have released version 5.07 of the code for the Human Proteome Folding - Phase 2 project on BOINC. This version should resolve a number of the problems members have been experiencing. In particular, it should significantly reduce the occurrence of the exit code 1282 and exit code -1073741819 errors.
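For readers trying to interpret the large negative exit code, here is a minimal sketch, assuming the error was reported on a Windows machine (the post does not say). Windows status codes are usually documented as unsigned 32-bit hex values, so it helps to reinterpret the signed number that way. The helper name `to_status_hex` is hypothetical and is not part of BOINC or any World Community Grid tool.

```python
def to_status_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement unsigned 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    print(to_status_hex(-1073741819))  # 0xC0000005
    print(to_status_hex(1282))         # 0x00000502
```

On Windows, 0xC0000005 is the status code for an access violation (an invalid memory access). The sketch does not attempt to interpret exit code 1282, whose meaning is not stated in the post.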
BOINC users will automatically receive the new version when the client connects to the server to download new work units. If you wish to get the new version of the application immediately, you can reset the project (open the BOINC Manager, go to the Projects tab, select 'World Community Grid', and then click the 'Reset Project' button). We apologize for these problems and appreciate your patience and support while we resolve them. |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
This release is working as expected. We have received 4330 results back that were run using the 5.07 release of Human Proteome Folding - Phase 2. We have experienced no 1282 errors and only a few of the '-1073741819' errors. The -1073741819 errors that are occurring are on machines that are 'cyclers'. These machines typically have some problem that causes them to run the project incorrectly.
One item that users should be aware of: the update from 5.06 to 5.07 has caused floating point values to be computed slightly differently. This has no effect on the scientific value of the data computed. However, it does mean that a result returned by the 5.06 application will not match a result returned by the 5.07 application. As a consequence, we will experience a period with a higher than normal number of inconclusive and invalid statuses assigned to results returned by users. Users who receive an invalid result will still receive credit for their work. They will get full credit for the result and the time they spent working on it. They will be awarded either the canonical credit or the claimed credit, whichever is less. We apologize for the inconvenience and we appreciate your continued support and contribution to our efforts. |
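As a small illustration of the "whichever is less" rule described above, here is a minimal sketch. The function name `granted_credit` and the numbers in the example are hypothetical; this is not the server's actual validation or credit code.

```python
def granted_credit(canonical_credit: float, claimed_credit: float) -> float:
    """Return the smaller of the canonical credit and the claimed credit."""
    return min(canonical_credit, claimed_credit)


if __name__ == "__main__":
    # Made-up values, purely for illustration.
    print(granted_credit(canonical_credit=40.0, claimed_credit=35.5))  # 35.5
    print(granted_credit(canonical_credit=40.0, claimed_credit=52.0))  # 40.0
```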
||
|
|
Viktors
Former World Community Grid Tech Joined: Sep 20, 2004 Post Count: 653 Status: Offline Project Badges:
|
A new version of Rosetta is being used starting today for the UD agents. It should behave better with regard to the throttle settings, but some more work is forthcoming on this. It also uses a newer version of the compiler and a larger stack size, which seems to have reduced the incidence of aborted work units in our tests. Your agent will automatically download the updated code. The first time your agent communicates with the server, the download will take somewhat longer because it includes the updated Rosetta code. After that, work unit downloads will return to their normal size. Sorry for any inconvenience, and thank you for your patience.
|
||
|
|
Viktors
Former World Community Grid Tech Joined: Sep 20, 2004 Post Count: 653 Status: Offline Project Badges:
|
There have been various posts in the forums about long-running, seemingly stuck HPF2 work units, work units that quit early, and ones for which different agents get divergent answers. Most work units seem to be processing normally and completing properly, but we know that a few behave in unusual ways. There are different causes for this. For ones that seem stuck for a long time, the Rosetta program is probably trying to figure out whether they are non-converging. Ones that quit early are probably subject to a subtle bug in Rosetta.
To figure out how best to handle and fix these work units, we need to identify them so that we can do further testing and debugging on them. Instead of terminating problem work units, it would be useful to the tech team if members identified the particular agent running the work unit (for example, using the UD device ID number on the preferences window of the agent (checkmark icon)) and the UTC time and date at which it was running. We have asked the community advisors to help us collect information about these work units so we can use them in our investigations.
We are unable to find all such unusual work units in our testing prior to launch because they are relatively rare. On the production grid, we process a tremendous amount of work each day, and thus very subtle problems reveal themselves. Members who call attention to specific unusual work units will be doing us a great favor. Our behind-the-scenes testing of problem work units is very time consuming, so if members simply let these unusual work units finish, we will be able to tell more about what was going on instead of losing that information.
We will probably be making some changes in Rosetta to speed up the detection of non-convergent work units, and to make the progress bar show finer increments or use some other means of showing whether a work unit is "stuck". Finally, there seems to be a subtle bug that aborts a few work units. Some of these work units have to run a long time to get to the point where the problem occurs, and shortcuts seem to hide the bug in some cases, so testing and debugging them requires a lot of time. Please be patient with us as we take care of these problems. Our team is also extra busy, with its time divided between this work and getting an additional research project ready for launch very soon. Thank you for your patience and assistance. |
||
|
|
|