World Community Grid Forums
Thread Status: Active Total posts in this thread: 17
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
My machine was steadily crunching on one slot/WU (as reflected in the "Cache" tab of the UDMonitor software) when the slot/WU aborted. I am curious: - What causes such abortion of WU? - What can one do to prevent such instances from happening? Also, at least 12 hours of work on the slot/WU just went down the drain, judging from the lack of any change in points when I connected to have a new WU downloaded. What a waste, and such waste takes the wind out of my enthusiasm for participating in the grid! Someone from WCGrid, please correct this situation! If the abortion of the WU cannot be avoided, at least have the points credited to the cruncher! |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Unfortunate.
I don't think the UD client has enough diagnostic information to ever determine exactly what happened. Maybe one of the techs can be more precise. Anyway, I can't see any fair way of awarding points for incomplete work units. After all, it's no use to the scientists. Some encouragement though: this situation is very rare, and it won't impact your score in the long run. |
||
|
Alther
Former World Community Grid Tech United States of America Joined: Sep 30, 2004 Post Count: 414 Status: Offline |
Please use the "contact us" link at the bottom of the home page and please send us your device ID and the approximate time this device was working on the aborted workunit.
This is highly unusual and most of the time the machine either ran out of virtual memory or some hardware error (bad bits in RAM or heat problem) occurred. However, it could be a rare workunit that might actually cause a problem. What is your virtual memory limit set to?
----------------------------------------
Rick Alther
Former World Community Grid Developer |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Alther...
OK, I just got your post and I'll work on what you wanted to know. I do not want to answer right now as I want to take the time to check on the data. I think I need to be exact and precise as to what I shall give you later... Stay tuned... |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Didactylos...
Thanks for taking the time; we crunchers have a passion for what we do here. Competition improves the breed, so I will challenge your ideas... "Anyway, I can't see any fair way of awarding points for incomplete work units. After all, it's no use to the scientists." Well, to that I say: try this... Be prepared to receive zero dollars on your paycheck if after a day's work you were not able to achieve the objectives set forth for that day. After all, it has no use to everyone, not to you, not to your boss. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Alther...
I just tried to send you some data regarding your request. However, I got the following message after clicking the SEND button: [ServletException in:/contact_us.jsp] Exception thrown by getter for property problemTypeList of bean org.apache.struts.taglib.html.BEAN' Please inform me if you got the message. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Be prepared to receive zero dollars on your paycheck if after a day's work you were not able to achieve the objectives set forth for that day. After all, it has no use to everyone, not to you, not to your boss."
The key word in my post was "fair". It would be nice if we could be compensated for our time when things go wrong, but the very fact that something went wrong means that any resulting metadata is suspect at best, and most likely meaningless. Your analogy doesn't fly. Perhaps you can get closer if you consider a job that pays a commission, but since WCG is a voluntary thing a completely different analogy may work better. And to take your analogy a little further (perhaps too far) - if you keep up that performance, you will be fired. Don't forget - you get points for all the negative results you send back, as well as the positive leads you give the scientists. It's all good.... :-) |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Didactylos...
"The key word in my post was "fair". It would be nice if we could be compensated for our time when things go wrong, but the very fact that something went wrong means that any resulting metadata is suspect at best, and most likely meaningless."
You are taking a very mechanical view of things here and more importantly is missing a key point.
First: there was actual output. It was not as if I wanted points in return for nothing. There was actual work done, and even if that is valued at naught, there was actual output as a result of that work. Only, that result was lost, and it was lost through no fault of mine; it was lost because the UDMonitor software or something else screwed up. Up to a point, things were going fine; it was not as if things went wrong, I was unable to produce any output, and now I am trying to get points from that. No sir, none of those. And it was not as if I had garbage results and tried to get points out of them nonetheless. No sir again, none of those.
Second: You and I know we are not working for dollars here, but ultimately there is something we want in return. For me, it is the points; for others, a pat on the back, or perhaps some cheering every now and then. All along we do it for a noble cause: finding cures for diseases. Now, to try and subvert all those 'intangibles' into what I see as your tendency to view the world in a very mechanical no-money-no-honey if-you-don't-have-the-meat-then-get-out-of-here kind of arrangement strikes me as out of line with what I see as the nature of what drives participants in this grid to contribute computing power: the argument for volunteerism.
As to the analogy that I used, I did not think you would read its meaning literally. I used the money thing as a symbol to portray the impersonal, the mechanical, the material; I could have used the intangibles (like, if your girlfriend leaves you.., and those kinds of things) but it probably would not strike a chord with you; not with what I understand of your way of thinking. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"You are taking a very mechanical view of things here and more importantly is missing a key point."
I would like to pursue the mechanical view for a moment. There are two recurring things that come to mind that can cause the loss of points.
(1) Excessive overclocking. Back off until things are stable. The projects give the machine a workout. What overclocks successfully elsewhere may not work correctly here.
(2) Bad memory. Run a memtest86+ memory test to completion with no errors.
The point loss could still be from something else, but we would like to eliminate these two possibilities. [Edit 3 times, last edit by Former Member at Jan 27, 2006 2:30:21 PM] |
||
|
Viktors
Former World Community Grid Tech Joined: Sep 20, 2004 Post Count: 653 Status: Offline |
andzgrid wrote:
"I am curious: - What causes such abortion of WU? - What can one do to prevent such instances from happening?"
I looked through the logs at the servers and could find only one entry for you that indicates what happened on your machine. One of the agent files was altered by something other than the agent. The agent reacts by discarding everything and going to the server for all new files and a fresh work unit. The file may have been damaged by a virus, a bad disk sector, a memory error, or some device driver writing to the wrong place in memory while the agent was also checking the file, or it could have been some problem with UD monitor and the way you are using it. So, at a minimum, you might want to run a full virus scan of your system just to be on the safe side. Also, a disk scan for surface errors might be useful.
Other crashes may or may not return anything to the server. In your case there were none that came back with anything. Crashes can happen for a large number of reasons, possibly due to any number of other software or hardware components on your system. However, the most common are probably:
- Running out of total virtual memory because too much was allocated by all of the running applications on your machine, or because one or more of them had a memory leak. The cure is to run fewer applications or increase the maximum virtual memory limit.
- The DEP setting in XP SP2 or later, and certain firewall / antivirus products which attempt to prevent stack overflows, can cause a crash.
- Running out of free disk space or losing permissions to write to the World Community Grid install directory can crash the agent.
- Deliberately ending one of the agent processes using Task Manager or a similar tool.
- Surprisingly often there might be a subtle hardware problem, sometimes only in the floating point portion of the processor, which is normally not used in common applications. A thorough CPU test might find the problem. Increased CPU heat can make latent problems show up too.
- Memory errors can also cause unpredictable crashes. Try a thorough memory tester.
- Tampering with the agent and its data in some manner.
- Probably some other factors that I am forgetting right now.
There are FAQ entries about the above topics as well as quite a few forum threads. |
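For anyone who wants to check two of the causes above on their own machine, here is a rough Python sketch. It is not how the agent itself works (the thread does not say how the agent detects an altered file); comparing SHA-256 digests is simply one common way to notice that a file has changed. The function names `sha256_of`, `file_was_altered` and `preflight` are made up for this illustration, and the directory you pass in is whatever your own install directory happens to be.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_was_altered(path, expected_digest):
    """True if the file's current digest differs from a digest recorded earlier.

    Assumption: you recorded expected_digest yourself at a time when the
    file was known to be good. This is not the agent's own mechanism.
    """
    return sha256_of(path) != expected_digest


def preflight(directory, min_free_bytes):
    """Report whether a directory is writable and has enough free space."""
    free = shutil.disk_usage(directory).free
    return {
        "writable": os.access(directory, os.W_OK),
        "free_bytes": free,
        "enough_space": free >= min_free_bytes,
    }
```

A possible use: record `sha256_of(some_file)` once, then call `file_was_altered(some_file, recorded)` later to see whether something other than you touched it, and call `preflight(install_dir, 500 * 1024 * 1024)` to confirm the directory is still writable with, say, 500 MB free. The 500 MB figure is only an example threshold, not a documented requirement.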
||
|