Total posts in this thread: 49
This topic has been viewed 5542 times and has 48 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Anybody Else Seeing Inconclusive Status on HPF2 Results?

Well, it is the weekend. I expect the errors will be examined bright and early on Monday.

New projects are always going to have some teething troubles, but we should get past it in a couple of days. We have a great team at WCG supporting the project scientists' efforts.
[Jun 25, 2006 9:08:19 PM]
debrouxl
Advanced Cruncher
France
Joined: Dec 31, 2004
Post Count: 61
Status: Offline
Re: Anybody Else Seeing Inconclusive Status on HPF2 Results?

"observed at one time several hours of 'frozen' percentage."

I did notice that too. I'm running the BOINC client, version 5.4.9, under GNU/Linux on a P4 2.6 GHz with 512 MB of RAM.
I'm posting here because I've just seen a freeze, and this printed in my terminal:
*** glibc detected *** corrupted double-linked list: 0x0984cf90 ***
*** glibc detected *** corrupted double-linked list: 0x09606228 ***

Corrupted double-linked lists are indeed likely to cause a freeze and leave the application in an incorrect state. Closing the BOINC client is the only way to fix the freeze: several days ago, I left the agent unattended for at least 24 hours, and that particular WU never unfroze (same name, same time spent, same percentage).
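
For readers wondering where that message comes from: glibc's allocator keeps free memory chunks on doubly linked lists and, before unlinking a chunk, checks that both neighbours still point back at it. The sketch below only imitates that check in Python so the idea is visible; the `Chunk` class and the `insert_after` and `unlink` helpers are hypothetical names for illustration, not glibc code.

```python
class Chunk:
    """Stand-in for a free heap chunk with forward (fd) and backward (bk) links."""

    def __init__(self, name):
        self.name = name
        self.fd = self  # a lone chunk forms a circular list with itself
        self.bk = self


def insert_after(node, chunk):
    """Link `chunk` into the circular list right after `node`."""
    chunk.fd = node.fd
    chunk.bk = node
    node.fd.bk = chunk
    node.fd = chunk


def unlink(chunk):
    """Remove `chunk`, refusing to proceed if its neighbours disagree about it."""
    if chunk.fd.bk is not chunk or chunk.bk.fd is not chunk:
        raise RuntimeError("corrupted double-linked list: " + chunk.name)
    chunk.fd.bk = chunk.bk
    chunk.bk.fd = chunk.fd


if __name__ == "__main__":
    head, a, b = Chunk("head"), Chunk("a"), Chunk("b")
    insert_after(head, a)
    insert_after(a, b)      # list is now head <-> a <-> b <-> head
    b.bk = head             # simulate a stray write clobbering one link
    try:
        unlink(a)
    except RuntimeError as exc:
        print(exc)
```

The point of the sketch is that the check only fires when something has already overwritten list links, which is consistent with reading the message as a sign of a memory bug in whatever process printed it rather than a normal condition.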

Freezes seldom happened with FAAH, but they happen at nearly every WU switch that shuts down an HPF2 WU. What's more, I had never seen such a message in my terminal before today... This is not proof that the issue is in the HPF2 application, though.


For now, I have set up my preferences so as NOT to receive HPF2 WUs until this is sorted out, since it happens mostly with HPF2 WUs... I know this temporary measure is not satisfactory, but I'd rather not participate (passed the 1 year mark several weeks ago) than return erroneous results...
----------------------------------------
[Jun 26, 2006 9:34:01 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Anybody Else Seeing Inconclusive Status on HPF2 Results?

Debroux Lionel, let me qualify that freezing comment, which can mean three things I know of and maybe a fourth.

1. "Freezing of percent" means that, with non-linear computing and the WUs being chopped up into small internal segments, one segment can take a very long time, so the percent indicator does not progress. Patience required.

2. The BOINC Manager (the front end) losing contact with the science back end. In my case I've killed it many times. The science continues in the background, and killing and restarting the BOINC Manager (BOINCmgr.exe only) picks up where the science had progressed. The resolution in my case was adding ports 443, 1043 and 31416 to the firewall exceptions for BOINC.exe, BOINCmgr.exe and the science applications (the latter have no .exe extension and may be overkill to add).

3. True freezing. You should be able to see in the Task Manager whether the science application has become non-responsive. The CPU time counter in that case is likely frozen. You can then kill it and hope that has not damaged the work unit; otherwise it gets reported back as completed and is awarded the 'error' label.

4. Don't know.
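
The distinction drawn in the list above can be sketched as a small check: if the CPU time reported by the operating system keeps growing while the percentage stays put, the science application is still working (cases 1 or 2); if both stand still, it looks like a true freeze (case 3). This is only an illustration under that assumption; `classify_stall` and the sample format are hypothetical, and the samples would have to be collected by hand or by some other tool.

```python
def classify_stall(samples):
    """Classify a run of observations of one work unit.

    `samples` is a time-ordered list of (cpu_seconds, percent_done) pairs,
    with cpu_seconds taken from the OS process list rather than the manager.
    """
    if len(samples) < 2:
        return "unknown"            # not enough data to compare
    cpu_first, pct_first = samples[0]
    cpu_last, pct_last = samples[-1]
    if pct_last > pct_first:
        return "progressing"        # the percentage moved, nothing to worry about
    if cpu_last > cpu_first:
        return "still computing"    # case 1 or 2: CPU time advances, percent static
    return "frozen"                 # case 3: neither CPU time nor percent moved


if __name__ == "__main__":
    print(classify_stall([(3600.0, 41.2), (5400.0, 41.2)]))
    print(classify_stall([(3600.0, 41.2), (3600.0, 41.2)]))
```

A longer observation window makes the "frozen" verdict more trustworthy, since two samples taken seconds apart will often look identical even on a healthy run.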

One resolution mentioned was to set your WCG BOINC profile to keep the WU in memory while it is preempted. Empirically, I have no need for it, but others have.

cheers
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Jun 26, 2006 10:06:54 AM]
debrouxl
Advanced Cruncher
France
Joined: Dec 31, 2004
Post Count: 61
Status: Offline
Re: Anybody Else Seeing Inconclusive Status on HPF2 Results?

Thanks for your reply.


My problem is clearly 3, true freezing ("same name, same time spent, same percentage" in my previous message).
I have hardly ever seen non-linear computing (1) on my 1000+ WUs; no firewall exceptions have ever been required for local ports (1043, 31416) on my GNU/Linux machine (and 443, HTTPS, is allowed, obviously) (2).
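
For anyone who wants to rule out case 2 on their own machine, a quick way to see whether something is listening on the local ports mentioned above is to try a TCP connection to each. This is a minimal sketch, not a BOINC tool: `port_is_open` is a hypothetical helper, and a successful connection only shows that some process accepts connections on that port, not that it is the BOINC client.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Local ports taken from the posts above.
    for port in (1043, 31416):
        state = "open" if port_is_open("127.0.0.1", port) else "closed or filtered"
        print(port, state)
```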

And those messages:
*** glibc detected *** corrupted double-linked list: ... ***
indicate a bug; they never appeared before and should never appear.


BTW, since I posted my previous message, I had a freeze again... and a new
*** glibc detected *** corrupted double-linked list: ... ***
message in the terminal...
----------------------------------------
[Edit 1 times, last edit by DEBROUX Lionel at Jun 26, 2006 1:43:05 PM]
[Jun 26, 2006 1:42:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Anybody Else Seeing Inconclusive Status on HPF2 Results?

Hello DEBROUX Lionel,
If you are getting a double-linked list error, have you checked the file system on your disk by running chkdsk /F?

Lawrence
[Jun 26, 2006 3:02:47 PM]
teletran
Senior Cruncher
Joined: Jul 27, 2005
Post Count: 378
Status: Offline
Re: Anybody Else Seeing Inconclusive Status on HPF2 Results?

Out of 3 work units I have one inconclusive, one error and one valid. Hope we hear something today about the new work units and any problems with them.
----------------------------------------
[Jun 26, 2006 4:18:15 PM]
debrouxl
Advanced Cruncher
France
Joined: Dec 31, 2004
Post Count: 61
Status: Offline
Re: Anybody Else Seeing Inconclusive Status on HPF2 Results?

"If you are getting a double-linked list error, have you checked the file system on your disk by running chkdsk /F?"

Good idea.
I just ran fsck.vfat -r -v, since I'm using GNU/Linux, and the BOINC agent is on a FAT32 partition: no problems found.

I have stumbled across only one other application that used to generate "corrupted double-linked list" errors, and it was a bug in the application.
----------------------------------------
[Jun 26, 2006 5:24:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Anybody Else Seeing Inconclusive Status on HPF2 Results?

"Out of 3 work units I have one inconclusive, one error and one valid. Hope we hear something today about the new work units and any problems with them."


I have had 3 error out on me in the last 2 days.
I never had 1 error while I was running FAAH exclusively.
I am seriously considering opting out of HPF2 if this continues.
[Jun 26, 2006 5:30:31 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Anybody Else Seeing Inconclusive Status on HPF2 Results?

Got 5 inconclusives but no errors.

The log says:
Result Log

<core_client_version>5.4.9</core_client_version>
<stderr_txt>
Failed to open wcg_checkpoint.dat for reading. rc: 2. File doesn't exist?
Failed to open wcg_hpf2.random for reading. rc: 2. File doesn't exist?
Failed to open wcg_hpf2.random for reading. rc: 2. File doesn't exist?
Rosetta finishing with return code: 0

</stderr_txt>
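
When comparing several of these result logs, it can help to pull out just the missing files and the final return code. The sketch below does that for text shaped like the log above. The `summarize_stderr` helper and both regular expressions are hypothetical, written against this one sample, and treating `rc: 2` as "file not found" is an assumption based on the log's own "File doesn't exist?" hint.

```python
import re

# Patterns written against the sample log above; other log shapes may differ.
MISSING_FILE = re.compile(r"Failed to open (\S+) for reading\. rc: (\d+)\.")
RETURN_CODE = re.compile(r"finishing with return code: (-?\d+)")


def summarize_stderr(text):
    """Return the distinct files reported missing (rc 2) and the final return code."""
    missing = sorted(
        {m.group(1) for m in MISSING_FILE.finditer(text) if m.group(2) == "2"}
    )
    rc = RETURN_CODE.search(text)
    return {
        "missing_files": missing,
        "return_code": int(rc.group(1)) if rc else None,
    }


if __name__ == "__main__":
    sample = """Failed to open wcg_checkpoint.dat for reading. rc: 2. File doesn't exist?
Failed to open wcg_hpf2.random for reading. rc: 2. File doesn't exist?
Failed to open wcg_hpf2.random for reading. rc: 2. File doesn't exist?
Rosetta finishing with return code: 0
"""
    print(summarize_stderr(sample))
```

A return code of 0 alongside missing-file messages, as in this sample, suggests the application finished on its own terms; whether the missing files matter for validation is something only the project techs can say.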


I have switched off HPF2 for now till the techs see what is going on.
[Jun 26, 2006 5:39:29 PM]
teletran
Senior Cruncher
Joined: Jul 27, 2005
Post Count: 378
Status: Offline
Re: Anybody Else Seeing Inconclusive Status on HPF2 Results?

Graham,
I've already opted out of HPF2 until we hear something. Luckily we have other work here to keep us going :)
----------------------------------------
[Edit 1 times, last edit by teletran at Jun 26, 2006 5:48:14 PM]
[Jun 26, 2006 5:47:24 PM]