World Community Grid Forums
Thread Status: Active · Total posts in this thread: 11
Sgt.Joe
Ace Cruncher, USA · Joined: Jul 4, 2006 · Post Count: 7846 · Status: Recently Active
I just put an HP G7 rack mount server into production with two AMD Opteron 6234 CPUs. They are 12-core parts, so it is running 24 work units at a time. This unit was pretty finicky to get running, but I finally succeeded yesterday. It is running exclusively OPN work units. It has returned about 11 valid units, 10 pending validation, and 13 units which have errored out with the message "Finish file present too long."

Does anyone have any clue why this happens? I have returned over 100,000 units for this project and not seen this problem on any of my other machines. The OS is Linux Mint 18, the same OS as all my other Linux machines. If I cannot find a solution, I will take this machine out of the mix, because I don't want to waste time returning this many units in error. I got the machine for nothing, but maybe there was a reason it was free. I may also try MCM on it to see if the problem also occurs with that project. Thanks for any suggestions.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
Martin Schnellinger
Advanced Cruncher · Joined: Apr 29, 2007 · Post Count: 128 · Status: Offline
Dear Sgt. Joe,

On GitHub, I found the following on the topic "finish file too long": https://github.com/BOINC/boinc/pull/3019

As far as I understand, the timeout limit must be increased. Citation:

"When an app finishes, it writes a "finish file", which ensures the client that the app really finished. If the app process is still there N seconds after the finish file appears, the client assumes that something went wrong, and it aborts the job. Previously N was 10. This was too small during periods of heavy paging. I increased it to 300. It has been pointed out that if the app creates the finish file, and its output files are present, it should be treated as successful regardless of whether it exits. This is probably true, but right now we don't have a mechanism for killing a job and marking it as success. The longer timeout makes this moot."

I do not know a really good solution, but would propose trying to uncheck the option "leave BOINC in memory when it pauses". This is only a more or less educated guess, on a trial and error basis.

All the best in these times. I think we should be able to fix this problem, as it is apparently not a new one. Greetings, M
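The mechanism described in that PR can be sketched roughly as follows. This is a simplified illustration, not the actual BOINC client code; the finish-file path and process id here are hypothetical, and 300 is the timeout value from the quoted description:

```shell
#!/bin/sh
# Sketch of the check the PR describes: once a task's finish file
# appears, the client waits up to a timeout for the science app
# process to exit; if it is still alive after that, the task is
# aborted with "Finish file present too long".
check_finish() {
    finish_file=$1   # e.g. <slot_dir>/boinc_finish_called (hypothetical path)
    pid=$2           # process id of the science app
    timeout=$3       # was 10 s before PR 3019, raised to 300 s
    # No finish file yet: the app is simply still working.
    [ -f "$finish_file" ] || { echo "still running"; return 0; }
    waited=0
    # Finish file present: poll until the process exits or we time out.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Finish file present too long"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "finished cleanly"
}
```

Under heavy paging the app can need well over 10 seconds to exit after writing its finish file, which is why the old short timeout produced spurious aborts.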
Sgt.Joe
Ace Cruncher, USA · Joined: Jul 4, 2006 · Post Count: 7846 · Status: Recently Active
Thanks, I will try that.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
Martin Schnellinger
Advanced Cruncher · Joined: Apr 29, 2007 · Post Count: 128 · Status: Offline
Hello,
Additional info: the problem has been discussed in depth here: https://boinc.bakerlab.org/forum_thread.php?id=13860&postid=95357#95357

It seems that changing the cache size could help. Citation:

"Linux has its own built-in cache, you just need to set the size. 1 GB of cache and 1/2 hour write-delay should work wonders; probably half that amount or even less would fix this problem; 5 minutes should be more than enough."

https://lonesysadmin.net/2013/12/22/better-li...rformance-vm-dirty_ratio/
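For reference, the knobs the linked article discusses are the kernel's `vm.dirty_*` sysctls, which control how much dirty data the page cache may hold and how long it may sit before writeback. The byte values below are illustrative examples in the spirit of the quoted advice, not tested recommendations; adjust for your RAM size and workload:

```shell
# Inspect the current writeback settings (the ratio forms are
# percentages of RAM; the *_bytes forms are absolute limits):
sysctl vm.dirty_background_ratio vm.dirty_ratio
sysctl vm.dirty_expire_centisecs vm.dirty_writeback_centisecs

# Example (requires root): switch to absolute byte limits, allowing
# up to ~1 GB of dirty pages before writers are throttled.
#   sudo sysctl -w vm.dirty_background_bytes=268435456    # 256 MB
#   sudo sysctl -w vm.dirty_bytes=1073741824              # 1 GB

# To persist across reboots, add the equivalent lines to /etc/sysctl.conf:
#   vm.dirty_background_bytes = 268435456
#   vm.dirty_bytes = 1073741824
```

Note that setting the `*_bytes` form of a knob zeroes out its `*_ratio` counterpart, and vice versa, so pick one style and stick with it.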
Sgt.Joe
Ace Cruncher, USA · Joined: Jul 4, 2006 · Post Count: 7846 · Status: Recently Active
Thank you. I will investigate.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
geophi
Advanced Cruncher, U.S. · Joined: Sep 3, 2007 · Post Count: 113 · Status: Offline
I used to occasionally get this message on some climateprediction.net tasks, especially when interrupting them for any reason while heavy disk writes were ongoing. This error was discussed quite a bit in the main BOINC support forums (and at SETI), and a newer version of BOINC fixed it for me. Since upgrading in April, I've had no problems, no matter how the task was interrupted. The Linux version of BOINC that has this fix is 7.16.6 (https://boinc.berkeley.edu/forum_thread.php?id=13562&postid=97382), which would be in the repository for Ubuntu 20.04 or Linux Mint 20. Or you could run the BOINC version hosted at Berkeley, which may run on Mint 18 but certainly runs on 19 and 20: https://boinc.berkeley.edu/dl/boinc_ubuntu_7.16.6_x86_64-pc-linux-gnu.sh
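To confirm whether a given box already has the fix before reinstalling, you can query the running client and compare version numbers. `boinccmd --client_version` is a real option of the tool that ships with the distro `boinc-client` package; the `version_ge` helper below is just an illustrative comparison built on `sort -V`:

```shell
#!/bin/sh
# Query the running client's version (needs a running BOINC client;
# the output includes a line like "Client version: 7.16.6"):
#   boinccmd --client_version

# Illustrative helper: succeeds when version $1 is at least version $2.
# sort -V orders version strings numerically per component, so the
# required version sorting first means the installed one is >= it.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example with an arbitrary older version number:
if version_ge 7.9.3 7.16.6; then
    echo "has the fix"
else
    echo "needs upgrade"   # prints this: 7.9.3 sorts before 7.16.6
fi
```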
Sgt.Joe
Ace Cruncher, USA · Joined: Jul 4, 2006 · Post Count: 7846 · Status: Recently Active
I changed the mix from all OPN to half OPN and half MCM. There have been no more errors since 12:00 UTC Dec. 15.
----------------------------------------
Thanks to all for the suggestions. Cheers
Sgt. Joe
*Minnesota Crunchers*
Sgt.Joe
Ace Cruncher, USA · Joined: Jul 4, 2006 · Post Count: 7846 · Status: Recently Active
Well, I did not fix the entire problem. I have gotten the incidence down to about 1 to 2 errors per hundred units. I will do some more tweaking to try to eliminate them entirely.
----------------------------------------
Once again, thank you all for your suggestions. Cheers
Sgt. Joe
*Minnesota Crunchers*
Bryn Mawr
Senior Cruncher · Joined: Dec 26, 2018 · Post Count: 384 · Status: Offline
The problem is exacerbated by the fact that this is a new machine running just one application, which means you're likely to have 24 WUs finishing at pretty much the same time.

As the tasks spread out, the box will process the output and send it in a smooth flow rather than having it back up and hang around.
Sgt.Joe
Ace Cruncher, USA · Joined: Jul 4, 2006 · Post Count: 7846 · Status: Recently Active
Quote:
"The problem is exacerbated by the fact that this is a new machine running just one application which means that you're likely to have 24 WUs finishing at pretty much the same time. As the tasks spread out the box will process the output and send it in a smooth flow rather than having the backing up and hanging around."

You may very well have a point. I had also thought I might be saturating my bandwidth, as I have 144 threads running through an "N" connection on my range extender. However, the errors were specific to one machine, which had been a bit finicky to set up in the first place. At any rate, with some tweaking of the work unit mix, I seem to have alleviated most if not all of the problem. I will still try to optimize the mix a bit more if needed. So far today I have zero errors. Cheers
Sgt. Joe
*Minnesota Crunchers*