Thread Status: Active
Total posts in this thread: 17
Posts: 17   Pages: 2   [ 1 2 | Next Page ]
This topic has been viewed 1457 times and has 16 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Lost points from aborted WU (Work Unit)

My machine was steadily crunching on one slot/WU (as reflected in the "Cache" tab of the UDMonitor software), when the slot/WU aborted.

I am curious:
- What causes such abortion of WU?
- What can one do to prevent such instances from happening?

Also, at least 12 hours of work on the slot/WU just went down the drain, judging from the lack of change in points when I connected to have a new WU downloaded. What a waste, and such waste blows the air out of my enthusiasm for participating in the grid!@#

Someone from WCGrid please correct this situation! If the abortion of the WU cannot be avoided, at least have the points credited to the cruncher!
[Jan 22, 2006 6:17:52 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lost points from aborted WU

Unfortunate.

I don't think the UD client has enough diagnostic information to ever determine exactly what happened. Maybe one of the techs can be more precise.

Anyway, I can't see any fair way of awarding points for incomplete work units. After all, it's no use to the scientists.

Some encouragement though: this situation is very rare, and it won't impact your score in the long run.
[Jan 22, 2006 8:45:10 PM]
Alther
Former World Community Grid Tech
United States of America
Joined: Sep 30, 2004
Post Count: 414
Status: Offline
Re: Lost points from aborted WU (Work Unit)

Please use the "contact us" link at the bottom of the home page and please send us your device ID and the approximate time this device was working on the aborted workunit.

This is highly unusual. Most of the time, either the machine ran out of virtual memory or a hardware error (bad bits in RAM or a heat problem) occurred. However, it could be a rare workunit that might actually cause a problem.

What is your virtual memory limit set to?
----------------------------------------
Rick Alther
Former World Community Grid Developer
[Jan 22, 2006 11:54:27 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lost points from aborted WU (Work Unit)

Alther...

OK, I just got your post and I'll work on what you wanted to know. I do not want to answer right now as I want to take the time to check on the data. I think I need to be exact and precise as to what I shall give you later...

Stay tuned...
[Jan 23, 2006 11:48:49 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lost points from aborted WU

Didactylos...

Thanks for taking the time; we crunchers have a passion for what we do here. Competition improves the breed, so I will challenge your ideas...

"
Anyway, I can't see any fair way of awarding points for incomplete work units. After all, it's no use to the scientists.
"

Well, to that I say:

try this...

Be prepared to receive zero dollars on your paycheck if after a day's work you were not able to achieve the objectives set forth for that day. After all, it is of no use to anyone, not to you, not to your boss.
[Jan 23, 2006 11:58:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lost points from aborted WU (Work Unit)

Alther...

I just tried to send you some data regarding your request.

However, I got the following message after clicking the SEND button:

[ServletException in:/contact_us.jsp] Exception thrown by getter for property problemTypeList of bean org.apache.struts.taglib.html.BEAN'

Please inform me if you got the message.
[Jan 24, 2006 11:09:33 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lost points from aborted WU

"
Be prepared to receive zero dollars on your paycheck if after a day's work you were not able to achieve the objectives set forth for that day. After all, it is of no use to anyone, not to you, not to your boss.
"

The key word in my post was "fair". It would be nice if we could be compensated for our time when things go wrong, but the very fact that something went wrong means that any resulting metadata is suspect at best, and most likely meaningless.

Your analogy doesn't fly. Perhaps you can get closer if you consider a job that pays a commission, but since WCG is a voluntary thing a completely different analogy may work better. And to take your analogy a little further (perhaps too far) - if you keep up that performance, you will be fired.

Don't forget - you get points for all the negative results you send back, as well as the positive leads you give the scientists.

It's all good.... :-)
[Jan 25, 2006 3:53:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lost points from aborted WU

Didactylos...

"
The key word in my post was "fair". It would be nice if we could be compensated for our time when things go wrong, but the very fact that something went wrong means that any resulting metadata is suspect at best, and most likely meaningless.
"

You are taking a very mechanical view of things here and, more importantly, are missing a key point.

First: there was actual output.
It was not as if I wanted points in return for nothing. There was actual work done, and even if that work is valued at naught, there was actual output as a result of it. Only, that result was lost, and it was lost through no fault of mine; it was lost because the UDMonitor software or something else screwed up. Up to a point, things were going fine; it was not as if things went wrong, I was unable to produce any output, and now I am trying to get points from that. No sir, none of that. And it was not as if I had garbage results and tried to get points out of them nonetheless. No sir again, none of that.

Second:
You and I know we are not working for dollars here, but ultimately there is something we want in return. For me, it is the points; for others, a pat on the back, or perhaps some cheering every now and then. All along we do it for a noble cause: finding cures for diseases. Now, to try to subvert all those 'intangibles' into what I see as your tendency to view the world as a very mechanical no-money-no-honey, if-you-don't-have-the-meat-then-get-out-of-here kind of arrangement strikes me as out of line with what I see as the nature of what drives participants in this grid to contribute computing power: the argument for volunteerism.

As to the analogy that I used, I did not think you would read it literally. I used the money thing as a symbol to portray the impersonal, the mechanical, the material; I could have used the intangibles (like, if your girlfriend leaves you, and those kinds of things), but it probably would not strike a chord with you; not with what I understand of your way of thinking.
[Jan 26, 2006 9:38:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lost points from aborted WU

"
You are taking a very mechanical view of things here and, more importantly, are missing a key point.
"
I would like to pursue the mechanical view for a moment. There are two recurring things that come to mind that can cause the loss of points.

(1) excessive overclocking. Back off until things are stable. The projects give the machine a workout. What overclocks successfully elsewhere may not work correctly here.

(2) bad memory. Run a memtest86+ memory test to completion with no errors.

The point loss could still be from something else, but we would like to eliminate these two possibilities.
----------------------------------------
[Edit 3 times, last edit by Former Member at Jan 27, 2006 2:30:21 PM]
[Jan 27, 2006 2:27:59 PM]
Viktors
Former World Community Grid Tech
Joined: Sep 20, 2004
Post Count: 653
Status: Offline
Re: Lost points from aborted WU (Work Unit)

andzgrid wrote:

I am curious:
- What causes such abortion of WU?
- What can one do to prevent such instances from happening?


I looked through the logs at the servers and could only find one entry for you which indicates something about what has happened on your machine. One of the agent files was altered by something other than the agent. The agent reacts by discarding everything and going to the server for all new files and a fresh work unit. The file may have been damaged by a virus, a bad disk sector, a memory error, some device driver writing to the wrong place in memory while the agent was also checking the file, or it could have been some problem with UDMonitor and the way you are using it. So, at a minimum, you might want to run a full virus scan of your system just to be on the safe side. Also, a disk scan for surface errors might be useful.
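
For readers curious how a program can notice that one of its files was changed behind its back, here is a minimal sketch in Python. It is not the agent's actual mechanism, which is not described in this thread; it only illustrates the general idea of recording a digest per file and comparing later. The function names and the flat-directory layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory: Path) -> dict[str, str]:
    """Record a digest for every regular file directly inside `directory`."""
    return {p.name: sha256_of(p) for p in sorted(directory.iterdir()) if p.is_file()}


def altered_files(directory: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose digest no longer matches the manifest."""
    changed = []
    for name, expected in manifest.items():
        path = directory / name
        if not path.is_file() or sha256_of(path) != expected:
            changed.append(name)
    return changed
```

A program using this approach would call `build_manifest` right after writing its files and `altered_files` before trusting them again; any non-empty result means something else touched the data, whatever the underlying cause (virus, bad sector, memory error).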

Other crashes may or may not return anything to the server. In your case there were none that came back with anything. Crashes can happen from a large number of reasons, possibly due to any number of other software or hardware components on your system. However the most common are probably:

- Running out of total virtual memory because too much was allocated by all of the running applications on your machine, or that one or more of them had a memory leak. The cure is to run fewer applications or increase the maximum virtual memory limit.

- The DEP setting in XP SP2 or later, and certain firewall / antivirus products which attempt to prevent stack overflows can cause a crash.

- Running out of free disk space or losing permissions to write to the World Community Grid install directory can crash the agent.

- Deliberately ending one of the agent processes using Task Manager or a similar tool.

- Surprisingly often there might be a subtle hardware problem, sometimes only in the floating point portion of the processor, which is normally not used in common applications. A thorough CPU test might find the problem. Increased CPU heat can make latent problems show up too.

- Memory errors can also cause unpredictable crashes. Try a thorough memory tester.

- Tampering with the agent and its data in some manner.

- Probably some other factors that I am forgetting right now.

There are FAQ entries about the above topics as well as quite a few forum threads.
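
Two of the causes listed above, low disk space and lost write permission, are easy to check with a short script. The sketch below is only an illustration: the 100 MiB threshold and the function names are assumptions, and the directory you pass in should be wherever the agent is installed on your machine.

```python
import shutil
import tempfile
from pathlib import Path


def free_mib(directory: Path) -> float:
    """Free space, in MiB, on the volume that holds `directory`."""
    return shutil.disk_usage(directory).free / (1024 * 1024)


def can_write(directory: Path) -> bool:
    """Create and discard a scratch file to confirm write permission."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False


def preflight(directory: Path, min_free_mib: float = 100.0) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    if not directory.is_dir():
        return [f"{directory} is not a directory"]
    problems = []
    if free_mib(directory) < min_free_mib:  # threshold is an arbitrary example value
        problems.append("low disk space")
    if not can_write(directory):
        problems.append("directory is not writable")
    return problems
```

Calling `preflight(Path(...))` with the install directory returns a list of problem descriptions. It does not cover the other causes in the list (virtual memory, DEP, hardware faults), which need the dedicated tools mentioned above.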
[Jan 27, 2006 9:14:28 PM]