World Community Grid - View Thread - HPF2 Error computing [Closed]

World Community Grid Forums

Category: Support

Forum: BOINC Agent Support

Thread: HPF2 Error computing [Closed]


Thread Status: Active
Total posts in this thread: 32


This topic has been viewed 3964 times and has 31 replies

CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Status: Offline


Re: HPF2 Error computing

As I didn't have enough time when I first asked my question about this last error, I will try and detail what I think happened:
- the job ran and finished successfully;
- the result was then immediately uploaded (as with all results);
- the PC was scheduled for shutdown and it did just that;
- next morning, after booting up, the BOINC Manager tried to report the finished task, but the result had already been uploaded, so it just shrugged when the server asked for the result => BAM! Error!

Question: this scenario looks very plausible to me, is this type of thing possible? If so, is anyone working on fixing that?

Thanks in advance!

----------------------------------------

Knowledge is limited. Imagination encircles the world! - Albert Einstein

[Jan 18, 2011 7:58:09 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: HPF2 Error computing

As implausible as it gets. Between uploading the result file (phase 1) and reporting success to the scheduler (phase 2, which clears "Ready to Report"), there can be days. It all depends on when the device connects to the servers. An FAQ in the Start Here forum explains some of the conditions under which phase 2 is carried out. The latest BOINC version actually has even more reporting triggers than the FAQ mentions: some 10 versus 7.

--//--

[Jan 18, 2011 8:21:49 AM]

CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Status: Offline
Project Badges:


Re: HPF2 Error computing

What is implausible?! From what I could see in your reply, in the FAQ you linked and from my personal experience, it just confirms that between the time the task was completed (the night before, in my case) and the time the BOINC Manager reported it (next morning) there's a lag.

So what if the result was sent and the report didn't connect the dots with the server to point in the right direction? Fact is, I ran a task for more than 5 hours and then there was no result to upload. How can you explain that?

----------------------------------------

Knowledge is limited. Imagination encircles the world! - Albert Einstein

----------------------------------------
[Edit 1 times, last edit by CandymanWCG at Jan 18, 2011 8:38:57 AM]

[Jan 18, 2011 8:36:47 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: HPF2 Error computing

Why would that dot-connecting fail, and only for the HPF2 jobs that, for some people, fail to complete? The line 711 error is an execution failure of the science app, nothing at all to do with result file transmission. If it were a transmission problem, there would be entirely different messages in the client log (the stdoutdae.txt file).

The result was uploaded; otherwise the Result Status page would still show "In Progress". There are so many safeguards in the process that if communications fail, the client keeps retrying the upload until it completes successfully (it will try for up to 14 days). ONLY THEN is the Ready to Report scheduler contact made to confirm everything has transferred.
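To make the two phases described above concrete, here is a minimal Python sketch of the flow as the thread describes it: phase 1 retries the output upload for up to 14 days, and only once every file is on the server does the task become "Ready to Report" for a later phase 2 scheduler contact. This is a hypothetical model, not BOINC's actual client code: the names `Task`, `run_upload_phase`, `report` and the backoff values are assumptions for illustration, and time is simulated rather than slept.

```python
from dataclasses import dataclass, field

# Hypothetical model of the upload/report flow described in this thread.
# Not BOINC's real implementation; time is simulated in seconds.
RETRY_WINDOW = 14 * 24 * 3600  # "up to 14 days" of upload retries


@dataclass
class Task:
    name: str
    output_files: list
    uploaded: set = field(default_factory=set)
    state: str = "uploading"


def backoff(attempt: int) -> int:
    """Exponential backoff between retries, capped at 4 hours (assumed values)."""
    return min(60 * 2 ** attempt, 4 * 3600)


def run_upload_phase(task: Task, try_upload, now: int = 0) -> int:
    """Phase 1: retry each output file until it transfers or the window closes."""
    start = now
    for f in task.output_files:
        attempt = 0
        while not try_upload(f):
            now += backoff(attempt)  # wait (simulated) before the next try
            attempt += 1
            if now - start > RETRY_WINDOW:
                task.state = "upload_failed"
                return now
        task.uploaded.add(f)
    # Only when every output file is on the server can the task be reported.
    task.state = "ready_to_report"
    return now


def report(task: Task, scheduler_contact_ok: bool) -> str:
    """Phase 2: a later scheduler contact clears 'Ready to Report'."""
    if task.state == "ready_to_report" and scheduler_contact_ok:
        task.state = "reported"
    return task.state
```

In this model a task can only reach the report phase after its upload succeeded, so a shutdown between the two phases just delays the report; it does not lose the result. A missing output file (as in the "absent" log line quoted later in the thread) would have to come from the science app itself.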

--//--

[Jan 18, 2011 8:54:51 AM]

CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Status: Offline


Re: HPF2 Error computing

I feel we're not on the same page here, but heck, just for the sake of argument I want to ask you something: why would tasks finish downloading, just to error out in the very next second, and only for the HPF2 project? Why is it that only this project has a minimum quorum of 15 and the tasks get sent to 19 different machines? I think I have my answer right in the detailed view of the task: it's because this is a "special" project... ;)

Cheers!

PS: I'm past the point of looking for an explanation. Just too sick of it. Whatever will be, will be. Let the projects run. I just hope I don't get any more situations like this and that the error disappears from my Result Status tab... it's such an eyesore! It seems that this forum is run by other users, and people from the back end don't share their expert advice with us or try to find a way to help us all. All we can do is sit here and whine about error logs and suppositions. I'd very much like to see a techie go into the server back log and see what the heck happened, rather than sit here and wonder or listen to other educated guesses. Thanks for trying, but to no avail...

----------------------------------------

Knowledge is limited. Imagination encircles the world! - Albert Einstein

----------------------------------------
[Edit 1 times, last edit by CandymanWCG at Jan 18, 2011 9:56:25 AM]

[Jan 18, 2011 9:52:06 AM]

sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline


Re: HPF2 Error computing

ERROR:: Exit at: .\nblist.cc line:711
If this error was to do with opening a folder in order to create a file, then the “output file absent” message would make sense, and if you have no file to report you can't report it:
1/16/2011 12:08:01 PM World Community Grid Output file ob337_00040_5_0 for task ob337_00040_5 absent

[Jan 19, 2011 6:16:18 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: HPF2 Error computing

CandymanWCG, you have missed several important points.

Secondly, HPF2 has had a very long-standing issue of WUs completing (aborting) virtually straight away. The Techs know about this and have attempted on numerous occasions to fix it, although it's one of those issues that is extremely hard to pin down (sometimes it happens, sometimes not...). Very frustrating, I know, but thankfully, no time is generally wasted on these WUs. If you get numerous HPF2 WUs that go the same way, I'd suggest de-selecting that project.

You can certainly continue to select this project, but ............ Next you ask a question:

...why would tasks finish downloading, just to error out in the very next second and only for the HPF2 project?

Let me just re-quote the short part of the answer to you (already made):

Secondly, HPF2 has had a very long standing issue of WU's completing (aborting) virtually straight away

As you can see there is a theme here. HPF2 has a history of not running on some machines. There is no solution to the problem. The only choices are to continue to run the project and live with the aborts, or select another project to run. Another question from you:

Why is it that only this project has a minimum quorum of 15 and the tasks get sent to 19 different machines?

The answer is that the scientists, who have responsibility for the project, made this decision. If you think this decision is incorrect, please feel free to contact them. Another question from you:

Fact is, I ran a task for more than 5 hours and then there was no result to upload. How can you explain that?

This is virtually the same question, asked and answered.

Finally, I suggest that you play nice with all of the people who have tried to help you. By the way, over 64,000 HPF2 WUs were validated in the last day, so the project is working. For my part, I have chosen not to run HPF2, as I find the error ratio among my computers is higher than I am willing to accept, although I have validated 1,978 WUs. In or out, that is your choice. I can tell you now, you are not going to get the answer you want. That's the way life is sometimes.

[Jan 19, 2011 7:21:42 PM]

CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Status: Offline


Re: HPF2 Error computing

@skgiven I know I said I don't care about it anymore, but can you please break it down for me? I don't really know what you mean by

If this error was to do with opening a folder in order to create a file

Does this have to do with the program trying to open a folder on the server, or locally?

@astrolab Quoting and unquoting, answering and throwing my questions back at me does not help. Maybe you have missed the point, but I was just saying that this project is a bit more peculiar than others. In fact, you really missed some points when you failed to notice my rhetorical questions amongst the real issues I was asking about.

I really resent the "play nice" part. What is this? Kindergarten? All I did was express my frustration, and maybe you were too busy busting my chops to notice, but I thanked everyone who "bothered" to give me their version of what happened. Also, that "you can just skip this project" alternative is a bunch of BS. That's not what I want. If I wanted that, I would have done it already; it's not like I need somebody's permission or anything. And if this brings me a forum ban, fine, I'll have to live with that, but don't you take that patronizing attitude with me like I don't know which way the pants go on, ok?

In the end, all you have done was just what I said: throwing back answers at me from "past experiences" and other user feedback. I haven't seen anyone in this thread quoting a piece of server log that shows what really happened. That's what I was after. Not suppositions, no matter how well documented they are.

If I choose to complain about it, that is my right (at least until someone decides to ban me for airing out my problems) and it doesn't mean I'm hurting anyone's feelings. So you'll just have to live with that too, I guess.

Peace out!

----------------------------------------

Knowledge is limited. Imagination encircles the world! - Albert Einstein

[Jan 20, 2011 1:03:58 PM]

gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 3010
Status: Offline


Re: HPF2 Error computing

CandymanWCG,

In answer to your question

Why is it that only this project has a minimum quorum of 15 and the tasks get sent to 19 different machines?

the answer (albeit slightly hidden; it took me a little while to find it) is in the FAQs here: Re: Results Status page - HPF2 project

----------------------------------------

[Jan 20, 2011 2:05:35 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: HPF2 Error computing

CandymanWCG: You are new here so we're going to give you some slack.

In the end, all you have done was just what I said: throwing back answers at me from "past experiences" and other user feedback. I haven't seen anyone in this thread quoting a piece of server log that shows what really happened

See, I am pretty sure there is nobody who is going to do this for you. As for 'busting your chops', you are the one who is "sick", "sore", "whine", frustrated, demanding, inflexible and threatening. Your question, which was Why?, was answered. Just admit that you heard the answer and move on.

I'll pass on my opinion on the hierarchy of things and if you read it a few times, it might help you understand how things really work.

The most important group within WCG is the scientists, so their needs are filled first. Everything they want done gets done first. Since you are not a scientist, do not waste our time by demanding anything. Second comes the system. If a hard drive needs to be swapped out or a Linux upgrade installed, it gets done next. After that come the Techs and Admin. They have to work weekends to install a Linux upgrade (higher priority), but they get a day off before they do anything for us (lower). Finally, there's us, the volunteers. What we want done has the very lowest priority. That means that if you as a single volunteer want something, like a server log, you will get your answer only after every other need of the scientists, system and staff is met and after every other higher-priority request by the volunteers.

Here is the catch: since there are just enough Techs and Admin to keep up with the needs of the scientists and system, the volunteers get the slimmest piece of the pie. And a server log for a single volunteer, who can easily switch to a different project to make their problem go away, is NEVER going to appear. Like it or not, that is the reality here at WCG. I am not expecting you, or anyone else, to like the way things are, but that does not change reality. And yes, you do have to play nice, since that is seen as a common courtesy to others. Welcome to WCG.

[Jan 20, 2011 2:25:25 PM]
