Total posts in this thread: 18
This topic has been viewed 3119 times and has 17 replies
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Martin,
Thanks for the three links.
We cannot see your results for your first two tasks (MCM1_0219032_6512_0 and MCM1_0219012_6622_0), as they haven't finished yet.

About the "finish file present too long" error: if you search the Internet, a lot can be found.

First off, you are running an old BOINC client (7.2.47) on an old operating system, probably on an old computer, while your wingmen are showing more recent versions (7.14.2 and 7.24.1). If you can, try Linux; it will get recent programs running faster on your old computer.

And now your problem. In general, when a task finishes, the produced data needs to be written to the result file to be uploaded; then all of its files need to be cleaned up so the next task can start. If this doesn't happen fast enough to BOINC's liking on your computer, you will receive this error.

Faster storage makes the error less likely to occur. Also, upgrading to the latest BOINC-version could make it less likely that you will see the error.

Adri
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jun 19, 2024 9:21:48 AM]
[Jun 19, 2024 9:20:36 AM]
Martin Schnellinger
Advanced Cruncher
Joined: Apr 29, 2007
Post Count: 128
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Dear Adri,
there is news in this case:
https://www.worldcommunitygrid.org/contribution/workunit/540896435
and
https://www.worldcommunitygrid.org/contribution/workunit/540896480

have the status "no reply". They are both still running,

with elapsed times of 63 hours 20 minutes and 65 hours 31 minutes.

Wingmen have completed the units, with a valid result in under one hour in one case and
a pending-validation result in 1 hour 4 minutes in the other.

Something must have gone wrong with the units on my computer :)

I will abort them.

As the problem seems to be only on my machine, with others getting valid results, I do not know
what to think.
Thanks for your opinion.
Bye
Martin
[Jun 21, 2024 5:17:33 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hi Martin,
If all your other MCM1 tasks are running properly and only the two that you mentioned in your previous post are the slowest you have ever seen, then it will probably be hard to pinpoint the cause of this slowness.

Remember the point that TigerLily raised: "the difference in time between elapsed and CPU time is likely due to your settings or your computer having other things to do."
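
To illustrate that remark: CPU time only counts the moments a process is actually computing, while elapsed time also includes every moment it spends waiting. Below is a minimal Python sketch of the difference. It is not taken from BOINC or MCM1, and `measure` and `busy` are names made up for this example.

```python
import time

def measure(work, idle_seconds):
    """Run work(), then wait; return (cpu_seconds, elapsed_seconds)."""
    wall_start = time.perf_counter()   # wall-clock ("elapsed") timer
    cpu_start = time.process_time()    # CPU time used by this process
    work()
    time.sleep(idle_seconds)           # waiting costs elapsed time, not CPU time
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return cpu, elapsed

def busy():
    """A small amount of real computation."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total

cpu, elapsed = measure(busy, idle_seconds=0.3)
print(f"CPU time {cpu:.3f} s, elapsed time {elapsed:.3f} s")
```

The sleep stands in for anything that keeps a task off the CPU: throttling, other programs, or a suspended client.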

Would it help to run MCM1 on only half of the available threads on your computer?
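
If you want to try that, the value to enter in BOINC's "Use at most N % of the CPUs" computing preference can be worked out as in the sketch below. `percent_for_half` is a hypothetical helper, and how the client rounds the percentage into a whole number of threads is an assumption that has not been checked here.

```python
import os

def percent_for_half(total_threads):
    """Percentage that corresponds to half of the threads, but at least one."""
    half = max(1, total_threads // 2)
    # Ceiling division, so the percentage never lands just below `half` threads.
    return -(-100 * half // total_threads)

threads = os.cpu_count() or 1   # cpu_count() can return None
print(f"{threads} threads -> set the preference to {percent_for_half(threads)} %")
```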

Adri
[Jun 21, 2024 2:34:06 PM]
AgrFan
Senior Cruncher
USA
Joined: Apr 17, 2008
Post Count: 396
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Suggestion ... reboot your machine and see if the problem happens again.

If you're seeing work taking more than a few hours, it's time to either reboot to see if that clears the issue or abort the task.
----------------------------------------

  • i5-10400 (Comet Lake, 6C/12T) @ 2.9 GHz
  • i5-7400 (Kaby Lake, 4C/4T) @ 3.0 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3330 (Ivy Bridge, 4C/4T) @ 3.0 GHz

----------------------------------------
[Edit 6 times, last edit by AgrFan at Jun 21, 2024 10:00:30 PM]
[Jun 21, 2024 9:46:57 PM]
Martin Schnellinger
Advanced Cruncher
Joined: Apr 29, 2007
Post Count: 128
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hello,
to help with the analysis and the search for errors within workunits, I can provide info about WUs with a very big difference between elapsed time and CPU time:

https://www.worldcommunitygrid.org/contribution/workunit/544772142

26,44 hours versus 2:12 hours.
Please note that in this case the wingman with Windows 10 also has a difference between
elapsed and CPU time:
1:42 elapsed time versus 7:38 CPU time

It seems that the cause of the problem is not necessarily my machine, if the wingman also has a significant difference between elapsed and CPU time.

Another WU, with a difference of 61 hours elapsed versus 2:12, is workunit

https://www.worldcommunitygrid.org/contribution/workunit/544772231

Sixty-one hours, that is almost three days.

I observed that the units stop for a long time at a certain point and then suddenly go on in little time steps.

Thanks for the analysis.

One unit is still fighting.

Thank you for your help. Such long times make contributing hard, and the machine fights hard too.

Bye
Martin
[Jun 28, 2024 2:46:46 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Martin,

First up, if you are using "percent of cpu time" to control fan noise or temperature, problems can arise regarding how well [some/all] MCM1 tasks will run. Problems can also arise if you are letting BOINC have access to all CPUs on a machine that is either slow or also doing non-BOINC things!

Unfortunately, your client didn't return anything useful in the stderr log for the WU you referenced (544772231) so it's not possible to see what the task was doing while it was running...

"I observed that the units stop for a long time at a certain point and then suddenly go on in little time steps"
I'm not sure whether what you are describing is about when a task starts running; if it is, the following might explain what's going on.

When a task starts running, there may be a period where it doesn't pass progress statistics to the BOINC environment.

In some cases that manifests as the progress counting up for a while then [usually] falling back and counting up at a more reasonable rate; in other cases there might appear to be no progress then suddenly it'll leap forwards and start counting... Typically, it gets itself organized at the first checkpoint...

If it keeps restarting for some reason, it may take it a long time to hit a checkpoint that publishes accurate progress...

Without something useful in the returned log, it's a bit difficult to do anything other than guess what's going on. I suspect Adri's remarks about the "age" of the system and BOINC client may go some way towards explaining occasional issues, but without a lot more detail about how your client is configured, we can only speculate :-(

If you are willing to have a look at the BOINC "slot" for a running task to examine the log file being built, you might find something useful -- advice on that should come from a Windows user (which I'm not...)

Sorry I can't be of more help...

Cheers - Al.
[Jun 28, 2024 5:02:18 AM]
Martin Schnellinger
Advanced Cruncher
Joined: Apr 29, 2007
Post Count: 128
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hello Al,
thank you for trying to help.

And no, the stops without obvious reason are not in the startup phase of the units.

They occur just somewhere when the task has already been running for a long time.

Here are the logs of one more task.

CPU time versus elapsed time (in hours) is 2.13 / 98.26 for my unit and
1.42 / 6.92 for that of the wingman (the duplicate).
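
One way to compare the two pairs of figures above is the share of elapsed time that was actually spent computing. A minimal sketch in Python; `cpu_efficiency` is only an illustrative name, and the numbers are copied from the lines above.

```python
def cpu_efficiency(cpu_hours, elapsed_hours):
    """Fraction of the elapsed time during which the task was on the CPU."""
    return cpu_hours / elapsed_hours

# (CPU hours, elapsed hours) as quoted in this post
tasks = {
    "this unit": (2.13, 98.26),
    "wingman": (1.42, 6.92),
}

for name, (cpu, elapsed) in tasks.items():
    share = cpu_efficiency(cpu, elapsed)
    print(f"{name}: {share:.1%} of the elapsed time was CPU time")
```

Both shares are low, which fits the earlier remark that the wingman also shows a gap between elapsed and CPU time, only a much smaller one.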

My log:

https://www.worldcommunitygrid.org/contribution/results/974208390/log

Results log


<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0219377_4142.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0219377_4142
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 40
NumberOfGenesInSignatureMin = 40
NumberOfGenesInSignatureMax = 40
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 4076
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.257266
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"

SvmLearnLimit = 500000
RSeed = 142894242


[12:36:27] Initializing
[12:53:43] Running
[12:53:55] EvaluateFitnessOfStartingGeneSignatures 4076
[23:05:13] Writing final output
[23:05:13] Closing Output Stream
[23:05:13] Cleaning up
Result.out = 1540.000000
Run complete, CPU time: 7681.177238
23:05:21 (5768): called boinc_finish(0)

</stderr_txt>
]]>


The log of the wingman (the duplicate):

https://www.worldcommunitygrid.org/contribution/results/974208389/log

Results log


<core_client_version>8.0.2</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0219377_4142.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0219377_4142
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 40
NumberOfGenesInSignatureMin = 40
NumberOfGenesInSignatureMax = 40
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 4076
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.257266
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"

SvmLearnLimit = 500000
RSeed = 142894242


[06:13:13] Initializing
[06:13:54] Running
[06:13:55] EvaluateFitnessOfStartingGeneSignatures 4076
[13:14:57] Writing final output
[13:14:57] Closing Output Stream
[13:14:57] Cleaning up
Result.out = 1540.000000
Run complete, CPU time: 5103.171875
13:14:57 (7216): called boinc_finish(0)

</stderr_txt>
]]>
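
The two logs above can also be compared mechanically. The sketch below is an illustration in Python, not a WCG or BOINC tool: it copies only the timestamped lines and the CPU-time line from each log, and `parse`, `gap` and `startup_seconds` are names invented here. Because the stamps carry no date, any gap is only known modulo 24 hours.

```python
import re

LOG_7_2_47 = """\
[12:36:27] Initializing
[12:53:43] Running
[12:53:55] EvaluateFitnessOfStartingGeneSignatures 4076
[23:05:13] Writing final output
Run complete, CPU time: 7681.177238
"""

LOG_8_0_2 = """\
[06:13:13] Initializing
[06:13:54] Running
[06:13:55] EvaluateFitnessOfStartingGeneSignatures 4076
[13:14:57] Writing final output
Run complete, CPU time: 5103.171875
"""

STAMP = re.compile(r"^\[(\d\d):(\d\d):(\d\d)\] (.+)$")
CPU = re.compile(r"^Run complete, CPU time: ([\d.]+)$")

def parse(log):
    """Return ([(seconds_since_midnight, label), ...], cpu_seconds)."""
    events, cpu_seconds = [], None
    for line in log.splitlines():
        stamp = STAMP.match(line)
        if stamp:
            hours, minutes, seconds, label = stamp.groups()
            second_of_day = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            events.append((second_of_day, label))
            continue
        cpu = CPU.match(line)
        if cpu:
            cpu_seconds = float(cpu.group(1))
    return events, cpu_seconds

def gap(earlier, later):
    """Seconds between two clock times, assuming less than a day passed."""
    return (later - earlier) % 86400

def startup_seconds(log):
    """Time between the 'Initializing' and 'Running' lines."""
    events, _ = parse(log)
    stamps = {label: second for second, label in events}
    return gap(stamps["Initializing"], stamps["Running"])

for name, log in (("7.2.47", LOG_7_2_47), ("8.0.2", LOG_8_0_2)):
    _, cpu_seconds = parse(log)
    print(f"client {name}: startup {startup_seconds(log)} s, "
          f"CPU time {cpu_seconds / 3600:.2f} h")
```

On these two logs it reports 1036 seconds (about 17 minutes) between "Initializing" and "Running" for the 7.2.47 client and 41 seconds for the 8.0.2 client, assuming neither gap spans midnight.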

Friendly greetings
Martin
[Jun 29, 2024 3:46:14 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Maybe this has been mentioned before, but you should look at Task Manager to see what else is running and if BOINC is using 100% of at least one core for each instance. I suspect you have some other process(es) which are eating your cpu cycles.

Edit: Or you could have an overheating problem where the machine is throttling itself to maintain correct temperatures.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Jun 29, 2024 1:33:45 PM]
[Jun 29, 2024 1:32:36 PM]