Thread Status: Active
Total posts in this thread: 4
This topic has been viewed 1199 times and has 3 replies
Martin Schnellinger
Advanced Cruncher
Joined: Apr 29, 2007
Post Count: 123
Explanation of an error unit

Hello friends,
Workunit MCM1_0196328_9192 returned as an error for me.

My result log is:
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
finish file present too long
</message>
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0196328_9192.txt -DatabaseFile dataset-sarc1.txt
Settings File
DateOfDesign = 20200218
Designer = Krembil/cubes
WorkOrderID = 0196328_9192
DatasetID = sarc1
RSeed = 360009193
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
RunPermutationAlgorithm = 0
FitnessFn = 0
NumberOfGenesInStartingSignature = 20
NumberOfGenesInSignatureMin = 20
NumberOfGenesInSignatureMax = 20
SearchAlgorithmNumberToCreate = 12071
MinFitness = 0.497
VMethod = NFCV
NFolds = 20
SvmArgs = "-v 0 -t 0 -c 1000"
SvmLearnLimit = 250000



[02:59:40] Initializing
[03:00:13] Running
[03:00:13] EvaluateFitnessOfStartingGeneSignatures 12071
[20:08:59] Writing final output
[20:08:59] Closing Output Stream
[20:08:59] Cleaning up
Result.out = 27420.000000
Run complete, CPU time: 6559.779650
20:09:14 (9056): called boinc_finish(0)

</stderr_txt>
]]>

My wingman's result is Pending validation; his result log is:


<core_client_version>7.20.2</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0196328_9192.txt -DatabaseFile dataset-sarc1.txt
Settings File
DateOfDesign = 20200218
Designer = Krembil/cubes
WorkOrderID = 0196328_9192
DatasetID = sarc1
RSeed = 360009193
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
RunPermutationAlgorithm = 0
FitnessFn = 0
NumberOfGenesInStartingSignature = 20
NumberOfGenesInSignatureMin = 20
NumberOfGenesInSignatureMax = 20
SearchAlgorithmNumberToCreate = 12071
MinFitness = 0.497
VMethod = NFCV
NFolds = 20
SvmArgs = "-v 0 -t 0 -c 1000"
SvmLearnLimit = 250000



[01:18:33] Initializing
[01:18:38] Running
[01:18:38] EvaluateFitnessOfStartingGeneSignatures 12071
[02:03:25] Writing final output
[02:03:25] Closing Output Stream
[02:03:25] Cleaning up
Result.out = 27420.000000
Run complete, CPU time: 101.406250
02:03:25 (17644): called boinc_finish(0)

</stderr_txt>
]]>


Result.out is identical in both cases.

If Result.out is the same, both results should be counted identically.

It is true that the calculation took a very long time in my case; too long, maybe?

Can anyone explain, please?
Thank you very much

Good bye
[Feb 13, 2023 8:45:43 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2084
Re: Explanation of an error unit

Martin Schnellinger, this is your problem:
<message>
finish file present too long
</message>

As to why that happened, that is another story.

It's a timing error: BOINC timed out waiting for the science application to exit after it had signalled completion. It is not a problem with the actual results.
From a message posted by Richard Haselgrove:
"finish file present too long" is a 10 second limit hard coded into the BOINC client: when the science is all over and done, the science application should write a file to say that it's completed and BOINC can clean up and report it: then the science application should shut itself down and get out of the way so that BOINC can move on to the next job.
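The rule described in that quote can be sketched roughly as follows. This is only an illustration of the idea, not the BOINC client's actual code: the function name, its parameters, and the status strings other than the error message are hypothetical, and the 10-second value is simply the limit quoted above.

```python
import time

# Limit quoted above for the older clients (an assumption for this sketch).
FINISH_FILE_TIMEOUT = 10.0  # seconds


def check_finished_task(finish_file_seen_at, process_running, now=None):
    """Classify a task whose finish file has already been seen.

    finish_file_seen_at: time in seconds when the finish file was first seen
    process_running:     True if the science application process still exists
    now:                 current time in seconds (defaults to a monotonic clock)
    """
    if now is None:
        now = time.monotonic()
    if not process_running:
        # The application got out of the way; the result can be reported.
        return "done"
    if now - finish_file_seen_at > FINISH_FILE_TIMEOUT:
        # The finish file exists but the process is still hanging around.
        return "error: finish file present too long"
    return "waiting"


print(check_finished_task(0.0, process_running=True, now=5.0))
print(check_finished_task(0.0, process_running=True, now=15.0))
print(check_finished_task(0.0, process_running=False, now=15.0))
```

The point of the sketch is that the check only looks at how long the process lingers after signalling completion, so a correct Result.out does not prevent the error.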


UPDATE: I've found an interesting addition to this, from post 530714, posted by Rickjb, about system responsiveness:
I tried searching the WCG forum for "Finish file too long"
I was interpreting "file too long" as meaning that there were extra data being scribbled onto some WU result file, but Richard Haselgrove's post at https://boinc.berkeley.edu/dev/forum_thread.php?id=10354&postid=62717 gave me the clue:
"I think the "too long" message refers to any delay between signalling that the app has finished, and the process finally quitting."
Rickjb finishes with a practical case:
For some months I've been running the machine that's now getting errors under Linux x64 from a 16GB USB stick. Until recently, it was a high-speed USB3.0 stick in a USB3.0 port, and system responsiveness including the Xfce GUI was very good. There were no errors crunching WCG.

Then BOINC crashed and would not restart, and I could not reboot Linux. The USB stick has set itself to read-only and it cannot be mounted under either Linux or Windows. I was able to dd the entire 16GB device image to a new USB2.0 stick without any read errors. On another machine I ran fsck -f -y on the newly-copied Linux partition, put the USB2.0 stick into the first machine, booted it and continued. WUs that were in progress restarted and continued successfully. System responsiveness is pretty terrible, and it's generating FFPTL errors, which I guess are happening when BOINC activities are forced to wait for higher-priority Linux filesystem activities.
(FFPTL = finish file present too long)

It also happened to me many years ago: when my system was getting sluggish and unresponsive, BOINC would generate the error "finish file present too long" during the last moments of a task (at 100%).
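One way to look for this kind of sluggishness in a result log is to measure the gaps between the timestamped lines in stderr. The sketch below is a small helper written for this thread, not part of BOINC; the function names are made up, and because the log lines carry no dates it assumes a run crosses midnight at most once.

```python
import re
from datetime import datetime, timedelta

# Matches both "[20:08:59] Cleaning up" and "20:09:14 (9056): called boinc_finish(0)".
STAMP = re.compile(r"^\[?(\d{2}:\d{2}:\d{2})\]?\s+(.*)$")


def parse_events(stderr_text):
    """Return (time, message) pairs for lines that start with HH:MM:SS."""
    events = []
    for line in stderr_text.splitlines():
        m = STAMP.match(line.strip())
        if m:
            events.append((datetime.strptime(m.group(1), "%H:%M:%S"), m.group(2)))
    return events


def gap_seconds(events, first, second):
    """Seconds between the first events whose messages contain the given text."""
    t1 = next(t for t, msg in events if first in msg)
    t2 = next(t for t, msg in events if second in msg)
    delta = t2 - t1
    if delta < timedelta(0):
        delta += timedelta(days=1)  # assumption: at most one midnight crossing
    return delta.total_seconds()


# Excerpts copied from the two logs in the first post.
mine = """[20:08:59] Cleaning up
20:09:14 (9056): called boinc_finish(0)"""
wingman = """[02:03:25] Cleaning up
02:03:25 (17644): called boinc_finish(0)"""

print(gap_seconds(parse_events(mine), "Cleaning up", "boinc_finish"))     # 15.0
print(gap_seconds(parse_events(wingman), "Cleaning up", "boinc_finish"))  # 0.0
```

For the logs in the first post, the erroring host took 15 seconds to get from "Cleaning up" to "called boinc_finish(0)", while the wingman's host did it within the same second. That gap comes before the finish file is written, so it is not the measured limit itself, but it is consistent with a host that was slow to respond at the end of the task.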

Adri
----------------------------------------
[Edit 3 times, last edit by adriverhoef at Feb 14, 2023 11:09:22 AM]
[Feb 13, 2023 10:23:29 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7578
Re: Explanation of an error unit

I concur with Rickjb's findings. I, too, run several systems from 16 GB USB drives and usually have not had any problems. However, a couple of times there have been power glitches of one sort or another which have toasted the drives and turned them into read-only bricks. (That is not the cause of the error in question here.)
The "finish file too long" errors have come in conjunction with systems which became sluggish for unknown reasons, possibly non-fatal degradation of the USB drives. Replacing the drive cured that problem.
I have suspected this error could also have been caused by an internet connection which had become a bit flaky, but I have no real basis for this other than speculation. My thought was that the file existed for too long between the time it was written and the time it should have been transmitted by BOINC.
The good part is this error has not recurred for a considerable length of time.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Feb 13, 2023 11:49:00 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 869
Re: Explanation of an error unit

The latest clients allow a longer time for clean-up; I've not had this error in ages, but sometimes I see a wingman with a recent client returning an Error with
Process still present 5 min after writing finish file.
in the error log. I don't see as many of these as I used to see of the shorter duration ones...

I note that Martin Schnellinger is using client 7.2.47; if there's no good reason he can't upgrade, I'd suggest that he gets something newer :-) -- if he can't then it's likely to happen again...
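To check which of two result logs came from the older client, the version can be read from the <core_client_version> tag and compared component by component. This is a minimal sketch; the helper name is hypothetical, and the two version strings are the ones from the logs in the first post.

```python
import re


def client_version(log_text):
    """Extract <core_client_version> as a tuple of ints, or None if absent."""
    m = re.search(
        r"<core_client_version>(\d+(?:\.\d+)*)</core_client_version>", log_text
    )
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


mine = client_version("<core_client_version>7.2.47</core_client_version>")
wingman = client_version("<core_client_version>7.20.2</core_client_version>")

print(mine)            # (7, 2, 47)
print(wingman)         # (7, 20, 2)
print(mine < wingman)  # True: 7.2.47 is the older client
```

Comparing tuples of integers avoids misreading 7.2.47 as newer than 7.20.2 just because 47 looks larger than 2.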

Cheers - Al.
[Feb 14, 2023 8:43:51 AM]