Total posts in this thread: 11
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Recently Active
Error- Finish file present too long

I just put an HP G7 rack-mount server into production with two AMD 6234 CPUs. They are 12-core parts, so it is running 24 work units at a time. This unit was pretty finicky to get running, but I finally succeeded yesterday. It is running OPN work units exclusively. So far it has returned about 11 valid units, 10 units pending validation, and 13 units that errored out with the message "Finish file present too long."
Does anyone have any clue why this happens? I have returned over 100,000 units for this project and have not seen this problem on any of my other machines. The OS is Linux Mint 18, the same OS as on all my other Linux machines.
If I cannot find a solution, I will take this machine out of the mix, because I don't want to waste time returning this many units in error. I got the machine for nothing, but maybe there was a reason it was free. I may also try MCM on it to see whether the error occurs with that project too.
Thanks for any suggestions.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 15, 2020 3:53:46 PM]
Martin Schnellinger
Advanced Cruncher
Joined: Apr 29, 2007
Post Count: 128
Status: Offline
Re: Error- Finish file present too long

Dear Sgt. Joe,
On GitHub, I found the following on the topic "finish file too long":

https://github.com/BOINC/boinc/pull/3019

As far as I understand, the timeout limit must be increased.

Quote:

"When an app finishes, it writes a "finish file",
which ensures the client that the app really finished.

If the app process is still there N seconds after the finish file appears,
the client assumes that something went wrong, and it aborts the job.

Previously N was 10.
This was too small during periods of heavy paging.
I increased it to 300.

It has been pointed out that if the app creates the finish file,
and its output files are present,
it should be treated as successful regardless of whether it exits.
This is probably true, but right now we don't have a mechanism
for killing a job and marking it as success.
The longer timeout makes this moot."
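
The rule described in the quote can be sketched in a few lines of Python. This is only an illustration of the logic as the quote words it, not the BOINC client's actual code; the names `TaskState` and `check_task` and the returned status strings are hypothetical. Only the two timeout values (10 and 300 seconds) come from the quote.

```python
from dataclasses import dataclass
from typing import Optional

OLD_TIMEOUT_S = 10    # "Previously N was 10."
NEW_TIMEOUT_S = 300   # "I increased it to 300."


@dataclass
class TaskState:
    # Time (in seconds) at which the finish file was first noticed, or None.
    finish_file_seen_at: Optional[float]
    # Whether the app process still exists.
    process_alive: bool


def check_task(task: TaskState, now: float, timeout_s: float = NEW_TIMEOUT_S) -> str:
    """Classify a task the way the quoted description suggests (hypothetical helper)."""
    if task.finish_file_seen_at is None:
        return "running" if task.process_alive else "exited without finish file"
    if not task.process_alive:
        # Finish file written and process gone: the normal case.
        return "finished"
    if now - task.finish_file_seen_at > timeout_s:
        # Process is still around N seconds after the finish file appeared.
        return "abort: finish file present too long"
    return "waiting for exit"
```

With these assumptions, a task whose process lingers for 60 seconds after writing its finish file would be aborted under the old 10-second limit but still tolerated under the 300-second one.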

I do not know a really good solution, but I would propose unchecking the option "leave BOINC in memory when it pauses".

This is only a more or less educated guess, on a trial-and-error basis.

All the best in these times.

I think we should be able to fix this problem, as it is apparently not a new one but an old one.
Greetings
M
[Dec 15, 2020 4:10:16 PM]
Sgt.Joe
Re: Error- Finish file present too long

Thanks, I will try that.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 15, 2020 4:21:42 PM]
Martin Schnellinger
Re: Error- Finish file present too long

Hello,
Additional info:

The problem has been discussed in depth here:
https://boinc.bakerlab.org/forum_thread.php?id=13860&postid=95357#95357

It seems that changing the cache size could help.

Quote:

Linux has its own built-in cache, you just need to set the size. 1 GB of cache and 1/2 hour write-delay should work wonders;
probably half that amount or even less would fix this problem; 5 minutes should be more than enough.
https://lonesysadmin.net/2013/12/22/better-li...rformance-vm-dirty_ratio/
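
As a rough sketch of what "1 GB of cache" and a write delay could look like as kernel settings, the Python helper below turns a cache size and a delay into sysctl-style lines. The sysctl keys are real Linux VM tunables, but mapping the quote's "cache" onto `vm.dirty_bytes` and its "write-delay" onto `vm.dirty_expire_centisecs` is an assumption, as is starting background writeback at half the limit. The function names are hypothetical, and the values should be checked against the machine's RAM before use.

```python
def writeback_settings(cache_bytes: int, write_delay_seconds: int) -> dict:
    """Build sysctl key/value pairs for the Linux page-cache writeback tunables."""
    return {
        # Amount of dirty memory at which writing processes start to block.
        "vm.dirty_bytes": cache_bytes,
        # Amount at which background writeback starts (half is an arbitrary choice).
        "vm.dirty_background_bytes": cache_bytes // 2,
        # Age at which dirty data becomes eligible for writeout, in 1/100 s.
        "vm.dirty_expire_centisecs": write_delay_seconds * 100,
    }


def to_sysctl_conf(settings: dict) -> str:
    """Render the settings in the `key = value` form used by sysctl config files."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


if __name__ == "__main__":
    # 1 GB of cache with a 5-minute delay, the smaller figure mentioned in the quote.
    print(to_sysctl_conf(writeback_settings(1024**3, 5 * 60)))
```

The printed lines could be placed in a file under /etc/sysctl.d/ and loaded with `sysctl --system`, or tried temporarily with `sysctl -w`.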
[Dec 15, 2020 6:27:44 PM]
Sgt.Joe
Re: Error- Finish file present too long

Thank you. I will investigate.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 15, 2020 8:01:40 PM]
geophi
Advanced Cruncher
U.S.
Joined: Sep 3, 2007
Post Count: 113
Status: Offline
Re: Error- Finish file present too long

I used to get this message occasionally on some climateprediction.net tasks, especially when interrupting them for any reason while heavy disk writes were ongoing. I know this error was discussed quite a bit in the main BOINC support forums (and at SETI), and a newer version of BOINC fixed it for me. Since upgrading in April, I've had no problems, no matter how the task was interrupted. The Linux version of BOINC that has this fix is 7.16.6, which would be in the repository for Ubuntu 20.04 or Linux Mint 20; see https://boinc.berkeley.edu/forum_thread.php?id=13562&postid=97382 for details. Alternatively, you could run the BOINC build hosted at Berkeley, which may run on Mint 18 and certainly runs on 19 and 20: https://boinc.berkeley.edu/dl/boinc_ubuntu_7.16.6_x86_64-pc-linux-gnu.sh
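
To check whether an installed client is at least 7.16.6, the version numbers have to be compared numerically rather than as text (as strings, "7.6" sorts after "7.16"). A minimal Python sketch follows; the function names are hypothetical, and it assumes the version string contains a `major.minor.patch` triple, such as the one the client reports. That every later release keeps the fix is also an assumption.

```python
import re

# Version named in the post above as containing the fix.
FIXED_VERSION = (7, 16, 6)


def parse_version(text: str) -> tuple:
    """Extract the first major.minor.patch triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_finish_file_fix(version_text: str) -> bool:
    """True if the version is 7.16.6 or newer (tuple comparison is numeric)."""
    return parse_version(version_text) >= FIXED_VERSION
```

For example, `has_finish_file_fix("7.16.6 x86_64-pc-linux-gnu")` is true, while a hypothetical older client reporting `7.6.31` would come out false.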
[Dec 15, 2020 8:22:59 PM]
Sgt.Joe
Re: Error- Finish file present too long

I changed the mix from all OPN to half OPN and half MCM. There have been no more errors since 12:00 UTC Dec. 15.
Thanks to all for their suggestions.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 16, 2020 3:29:20 PM]
Sgt.Joe
Re: Error- Finish file present too long

Well, I did not fix the entire problem. I have gotten the incidence down to about 1 to 2 errors per hundred units. I will do some more tweaking to try to eliminate them entirely.
Once again, thank you all for your suggestions.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 18, 2020 9:56:03 PM]
Bryn Mawr
Senior Cruncher
Joined: Dec 26, 2018
Post Count: 384
Status: Offline
Re: Error- Finish file present too long

The problem is exacerbated by the fact that this is a new machine running just one application, which means you're likely to have 24 WUs finishing at pretty much the same time.

As the tasks spread out, the box will process the output and send it in a smooth flow rather than having it back up and hang around.
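
The clustering effect can be illustrated with a toy simulation in Python. It is a simplified model, not a measurement: all 24 slots start together, each task's runtime is drawn uniformly around a mean, and every parameter value (`mean_runtime`, `jitter`, the 60-second window) is a made-up assumption.

```python
import random


def peak_simultaneous_finishes(finish_times, window):
    """Largest number of finishes that fall within any span of `window` seconds."""
    times = sorted(finish_times)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Slide the left edge until the span fits inside the window.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def simulate(slots=24, rounds=10, mean_runtime=3600.0, jitter=600.0, seed=1):
    """Return, for each round of tasks, the peak number of near-simultaneous finishes."""
    rng = random.Random(seed)
    clock = [0.0] * slots  # every slot starts its first task at t = 0
    peaks = []
    for _ in range(rounds):
        # Each slot finishes its current task and immediately starts the next one.
        clock = [t + rng.uniform(mean_runtime - jitter, mean_runtime + jitter)
                 for t in clock]
        peaks.append(peak_simultaneous_finishes(clock, window=60.0))
    return peaks


if __name__ == "__main__":
    print("identical runtimes:", simulate(jitter=0.0, rounds=3))
    print("varied runtimes:   ", simulate(rounds=10))
```

With identical runtimes, all 24 tasks finish together in every round; with varied runtimes, the finish times drift apart. Mixing projects with different task lengths would have a similar spreading effect in this model.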
[Dec 18, 2020 10:35:45 PM]
Sgt.Joe
Re: Error- Finish file present too long

Quote:

"The problem is exacerbated by the fact that this is a new machine running just one application which means that you’re likely to have 24 WUs finishing at pretty much the same time.
As the tasks spread out the box will process the output and send it in a smooth flow rather than having the backing up and hanging around."

You may very well have a point. I also had thought I may be saturating my bandwidth as I have 144 threads running through an "N" connection on my range extender. However, the errors were only specific to one machine which had been a bit finicky to set up in the first place. At any rate, with some tweaking of the work unit mix, I seem to have alleviated most if not all of the problem. I am still going to try to optimize the mix a bit more if needed. So far today I have zero errors.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 19, 2020 4:37:52 PM]