World Community Grid - View Thread - Lost points from aborted WU (Work Unit)

World Community Grid Forums

Category: Support

Forum: Website Support

Thread: Lost points from aborted WU (Work Unit)


Thread Status: Active
Total posts in this thread: 17


This topic has been viewed 2161 times and has 16 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Lost points from aborted WU (Work Unit)

My machine was steadily crunching on one slot/WU (as reflected in the "Cache" tab of the UDMonitor software), when the slot/WU aborted.

I am curious:
- What causes such abortion of WU?
- What can one do to prevent such instances from happening?

Also, at least 12 hours of work on the slot/WU just went down the drain, judging from the lack of change in my points when I connected to download a new WU. What a waste, and such waste takes the air out of my enthusiasm for participating in the grid!

Someone from WCGrid please correct this situation! If the abortion of the WU cannot be avoided, at least have the points credited to the cruncher!

[Jan 22, 2006 6:17:52 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Lost points from aborted WU

Unfortunate.

I don't think the UD client has enough diagnostic information to ever determine exactly what happened. Maybe one of the techs can be more precise.

Anyway, I can't see any fair way of awarding points for incomplete work units. After all, it's no use to the scientists.

Some encouragement though: this situation is very rare, and it won't impact your score in the long run.

[Jan 22, 2006 8:45:10 PM]

Alther
Former World Community Grid Tech
United States of America
Joined: Sep 30, 2004
Post Count: 414
Status: Offline


Re: Lost points from aborted WU (Work Unit)

Please use the "contact us" link at the bottom of the home page and please send us your device ID and the approximate time this device was working on the aborted workunit.

This is highly unusual. Most of the time the cause is that the machine either ran out of virtual memory or had a hardware error (bad bits in RAM or a heat problem). However, it could be a rare workunit that actually causes a problem.

What is your virtual memory limit set to?

----------------------------------------

Rick Alther
Former World Community Grid Developer

[Jan 22, 2006 11:54:27 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Lost points from aborted WU (Work Unit)

Alther...

OK, I just got your post and I'll work on what you wanted to know. I do not want to answer right now as I want to take the time to check on the data. I think I need to be exact and precise as to what I shall give you later...

Stay tuned...

[Jan 23, 2006 11:48:49 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Lost points from aborted WU

Didactylos...

Thanks for taking the time; we crunchers have a passion for what we do here. Competition improves the breed, so I will challenge your ideas...

"
Anyway, I can't see any fair way of awarding points for incomplete work units. After all, it's no use to the scientists.
"

Well, to that I say:

try this...

Be prepared to receive zero dollars on your paycheck if, after a day's work, you were not able to achieve the objectives set forth for that day. After all, it is of no use to anyone: not to you, not to your boss.

[Jan 23, 2006 11:58:44 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Lost points from aborted WU (Work Unit)

Alther...

I just tried to send you some data regarding your request.

However, I got the following message after clicking the SEND button:

[ServletException in:/contact_us.jsp] Exception thrown by getter for property problemTypeList of bean org.apache.struts.taglib.html.BEAN'

Please inform me if you got the message.

[Jan 24, 2006 11:09:33 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Lost points from aborted WU

"Be prepared to receive zero dollars on your paycheck if, after a day's work, you were not able to achieve the objectives set forth for that day. After all, it is of no use to anyone: not to you, not to your boss."

The key word in my post was "fair". It would be nice if we could be compensated for our time when things go wrong, but the very fact that something went wrong means that any resulting metadata is suspect at best, and most likely meaningless.

Your analogy doesn't fly. Perhaps you can get closer if you consider a job that pays a commission, but since WCG is a voluntary thing a completely different analogy may work better. And to take your analogy a little further (perhaps too far) - if you keep up that performance, you will be fired.

Don't forget - you get points for all the negative results you send back, as well as the positive leads you give the scientists.

It's all good.... :-)

[Jan 25, 2006 3:53:44 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Lost points from aborted WU

Didactylos...

You are taking a very mechanical view of things here and, more importantly, are missing a key point.

First: there was actual output.
It was not as if I wanted points in return for nothing. There was actual work done, and even if that is valued at naught, there was actual output as a result of that work. Only, that result was lost, and it was lost through no fault of mine; it was lost because the UDMonitor software or something else screwed up. Up to a point, things were going fine; it was not as if things went wrong, I was unable to produce any output, and now I am trying to get points from that. No sir, none of those. And it was not as if I had garbage results and was trying to get points out of them nonetheless. No sir again, none of those.

Second:
You and I know we are not working for dollars here, but ultimately there is something we want in return. For me, it is the points; for others, a pat on the back, or perhaps some cheering every now and then. All along we do it for a noble cause: finding cures for diseases. Now, to try and subvert all those 'intangibles' into what I see as your tendency to view the world as a very mechanical no-money-no-honey, if-you-don't-have-the-meat-then-get-out-of-here kind of arrangement strikes me as out of line with what I see as the nature of what drives participants in this grid to contribute computing power: the argument for volunteerism.

As to the analogy that I used, I did not think you would read it literally. I used the money thing as a symbol to portray the impersonal, the mechanical, the material; I could have used the intangibles (like, if your girlfriend leaves you, and those kinds of things), but it probably would not strike a chord with you; not with what I understand of your way of thinking.

[Jan 26, 2006 9:38:26 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Lost points from aborted WU

"You are taking a very mechanical view of things here and, more importantly, are missing a key point."

I would like to pursue the mechanical view for a moment. There are two recurring things that come to mind that can cause the loss of points.

(1) excessive overclocking. Back off until things are stable. The projects give the machine a workout. What overclocks successfully elsewhere may not work correctly here.

(2) bad memory. Run a memtest86+ memory test to completion with no errors.

The point loss could still be from something else, but we would like to eliminate these two possibilities.

----------------------------------------
[Edit 3 times, last edit by Former Member at Jan 27, 2006 2:30:21 PM]

[Jan 27, 2006 2:27:59 PM]

Viktors
Former World Community Grid Tech
Joined: Sep 20, 2004
Post Count: 653
Status: Offline


Re: Lost points from aborted WU (Work Unit)

andzgrid wrote:

I am curious:
- What causes such abortion of WU?
- What can one do to prevent such instances from happening?

I looked through the logs at the servers and could only find one entry for you which indicates something about what happened on your machine. One of the agent files was altered by something other than the agent. The agent reacts by discarding everything and going to the server for all new files and a fresh work unit. The file may have been damaged by a virus, a bad disk sector, a memory error, or some device driver writing to the wrong place in memory while the agent was also checking the file, or it could have been some problem with UD monitor and the way you are using it. So, at a minimum, you might want to run a full virus scan of your system just to be on the safe side. Also, a disk scan for surface errors might be useful.
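To make the "altered file" idea concrete, here is a minimal sketch in Python. It is not how the agent actually works (that is not described in this thread); it only assumes a hypothetical scheme in which a digest of each file is recorded once and compared again later. The function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(directory):
    """Record a digest for every regular file directly inside `directory`."""
    return {p.name: file_digest(p) for p in Path(directory).iterdir() if p.is_file()}


def altered_files(directory, baseline):
    """List baseline file names whose content changed or that have gone missing."""
    current = snapshot(directory)
    return sorted(name for name, digest in baseline.items() if current.get(name) != digest)
```

Taking a snapshot of a folder while things are healthy and calling altered_files on it later would show whether anything in that folder changed behind your back. It cannot say what changed it, which matches the situation above: the logs show that a file was altered, not why.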

Other crashes may or may not return anything to the server. In your case there were none that came back with anything. Crashes can happen for a large number of reasons, possibly due to any number of other software or hardware components on your system. However, the most common are probably:

- Running out of total virtual memory because too much was allocated by all of the running applications on your machine, or because one or more of them had a memory leak. The cure is to run fewer applications or increase the maximum virtual memory limit.

- The DEP setting in XP SP2 or later, and certain firewall / antivirus products which attempt to prevent stack overflows can cause a crash.

- Running out of free disk space or losing permissions to write to the World Community Grid install directory can crash the agent.

- Deliberately ending one of the agent processes using task manager or a similar tool.

- Surprisingly often there might be a subtle hardware problem, sometimes only in the floating point portion of the processor, which is normally not used in common applications. A thorough CPU test might find the problem. Increased CPU heat can make latent problems show up too.

- Memory errors can also cause unpredictable crashes. Try a thorough memory tester.

- Tampering with the agent and its data in some manner.

- Probably some other factors that I am forgetting right now.
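Two of the causes above, low disk space and lost write permission, are easy to check for yourself. The following is a rough sketch in Python using only the standard library. The 100 MB threshold and the function names are assumptions picked for illustration, not values taken from the agent, and the virtual memory check is left out because it needs platform-specific tools.

```python
import os
import shutil
import tempfile


def free_disk_mb(directory):
    """Free space, in megabytes, on the volume holding `directory`."""
    return shutil.disk_usage(directory).free / (1024 * 1024)


def can_write(directory):
    """Create and discard a temporary file to confirm write permission."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def preflight(directory, min_free_mb=100):
    """Return a list of problems found; an empty list means both checks passed."""
    if not os.path.isdir(directory):
        return ["directory not found"]
    problems = []
    if not can_write(directory):
        problems.append("cannot write to directory")
    if free_disk_mb(directory) < min_free_mb:  # hypothetical threshold
        problems.append("low disk space")
    return problems
```

Point preflight at the install directory on your own machine; whatever it returns tells you which of the two conditions to look at first.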

There are FAQ entries about the above topics as well as quite a few forum threads.

[Jan 27, 2006 9:14:28 PM]
