| World Community Grid Forums
Thread Status: Active Total posts in this thread: 18
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Martin,
Thanks for the three links. We cannot see the results for your first two tasks (MCM1_0219032_6512_0 and MCM1_0219012_6622_0) yet, as they haven't finished.

About the "finish file present too long" error: a lot can be found by searching the Internet. First off, you are running an old BOINC client (7.2.47) on an old operating system, probably on an old computer, while your wingmen are running more recent versions (7.14.2 and 7.24.1). If you can, try Linux: it may let you run recent programs on your old computer, and run them faster.

Now to your problem. In general, when a task finishes, the data it produced has to be written to the result file for upload, and then all of the task's files have to be cleaned up so the next task can start. If this doesn't happen fast enough for BOINC's liking on your computer, you get this error. Faster storage makes the error less likely to occur. Upgrading to the latest BOINC version could also make it less likely that you will see the error.

Adri
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jun 19, 2024 9:21:48 AM]
||
|
|
Martin Schnellinger
Advanced Cruncher Joined: Apr 29, 2007 Post Count: 128 Status: Offline Project Badges:
|
Dear Adri,
there is news in this case: https://www.worldcommunitygrid.org/contribution/workunit/540896435 and https://www.worldcommunitygrid.org/contribution/workunit/540896480 have the status "No reply". They are both still running, with elapsed times of 63 hours 20 minutes and 65 hours 31 minutes. The wingmen completed the units in under one hour (valid result) in one case and in 1 hour 4 minutes (pending validation) in the other.

Something must have gone wrong with the units on my computer :) I will abort them. As the problem seems to occur only on my machine, while others return valid results, I do not know what to think.

Thanks for your opinion.
Bye
Martin
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Hi Martin,
If all your other MCM1 tasks are running properly and the two you mentioned in your previous post are the slowest you have ever seen, then it will probably be hard to pinpoint the cause of this slowness. Remember the point that TigerLily raised: "the difference in time between elapsed and CPU time is likely due to your settings or your computer having other things to do."

Would it help to run MCM1 on only half of the available threads on your computer?

Adri
||
|
|
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 396 Status: Offline Project Badges:
|
Suggestion ... reboot your machine and see if the problem happens again.
If you're seeing work taking more than a few hours, it's time to either reboot to see if that clears the issue or abort the task.
----------------------------------------
[Edit 6 times, last edit by AgrFan at Jun 21, 2024 10:00:30 PM]
||
|
|
Martin Schnellinger
Advanced Cruncher Joined: Apr 29, 2007 Post Count: 128 Status: Offline Project Badges:
|
Hello,
to help with analysing and tracking down errors in work units, I can provide details of WUs with a very large difference between elapsed time and CPU time.

https://www.worldcommunitygrid.org/contribution/workunit/544772142: 26.44 hours versus 2:12 hours. Please note that in this case the wingman, running Windows 10, also shows a difference between elapsed and CPU time: 1:42 elapsed time versus 7:38 CPU time. It seems that the cause of the problem is not necessarily my machine, if the wingman also has a significant difference between elapsed and CPU time.

Another WU, with a difference of 61 hours elapsed versus 2:12 CPU time, is work unit https://www.worldcommunitygrid.org/contribution/workunit/544772231. Sixty-one hours is almost three days. I observed that the units stop for a long time at a certain point and then suddenly go on in small time steps.

Thanks for the analysis. One unit is still fighting. Thank you for your help; such long run times make contributing hard, and the machine is struggling too.

Bye
Martin
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline Project Badges:
|
Martin,
First up, if you are using "percent of CPU time" to control fan noise or temperature, problems can arise regarding how well [some/all] MCM1 tasks will run. Problems can also arise if you are letting BOINC have access to all CPUs on a machine that is either slow or also doing non-BOINC things!

Unfortunately, your client didn't return anything useful in the stderr log for the WU you referenced (544772231), so it's not possible to see what the task was doing while it was running...

"I observed that the units stop for a long time at a certain point and then suddenly go on in small time steps"

I'm not sure whether what you are describing happens when a task starts running; if it does, the following might explain what's going on. When a task starts running, there may be a period where it doesn't pass progress statistics to the BOINC environment. In some cases that manifests as the progress counting up for a while, then [usually] falling back and counting up at a more reasonable rate; in other cases there might appear to be no progress, then suddenly it'll leap forwards and start counting... Typically, it gets itself organized at the first checkpoint... If it keeps restarting for some reason, it may take a long time to hit a checkpoint that publishes accurate progress...

Without something useful in the returned log, it's a bit difficult to do anything other than guess what's going on. I suspect Adri's remarks about the "age" of the system and BOINC client may go some way towards explaining occasional issues, but without a lot more detail about how your client is configured, we can only speculate :-(

If you are willing to have a look at the BOINC "slot" for a running task to examine the log file being built, you might find something useful -- advice on that should come from a Windows user (which I'm not...)

Sorry I can't be of more help...

Cheers - Al.
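
Looking into the BOINC "slot" of a running task, as suggested above, could be sketched roughly as follows. This is only a sketch under assumptions that are not stated in this thread: the data directory path (`C:\ProgramData\BOINC` is the usual Windows default, but yours may differ), the hypothetical helper name `tail_slot_logs`, and the expectation that each slot directory holds the task's log in a file called `stderr.txt`. Check all three against your own installation.

```python
from pathlib import Path

# Assumption: usual default BOINC data directory on Windows; adjust if yours differs.
DEFAULT_DATA_DIR = Path(r"C:\ProgramData\BOINC")


def tail_slot_logs(data_dir=DEFAULT_DATA_DIR, lines=15):
    """Return {slot name: last `lines` lines of stderr.txt} for each slot that has one."""
    result = {}
    slots_dir = Path(data_dir) / "slots"
    if not slots_dir.is_dir():
        # No slots directory here: wrong path, or BOINC is installed elsewhere.
        return result
    for slot in sorted(slots_dir.iterdir(), key=lambda p: p.name):
        log = slot / "stderr.txt"  # assumed log file name inside each slot
        if log.is_file():
            text = log.read_text(encoding="utf-8", errors="replace")
            result[slot.name] = text.splitlines()[-lines:]
    return result


if __name__ == "__main__":
    for slot, tail in tail_slot_logs().items():
        print(f"--- slot {slot} ---")
        print("\n".join(tail))
```

Running this a few times while a slow task is active would show whether new timestamped lines keep appearing in its log or whether the task has stalled.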
||
|
|
Martin Schnellinger
Advanced Cruncher Joined: Apr 29, 2007 Post Count: 128 Status: Offline Project Badges:
|
Hello Al,
thank you for trying to help. And no, the stops without obvious reason are not in the startup phase of the units. They occur somewhere when the task has already been running for a long time.

Here are the logs of one more task. CPU time / elapsed time is 2.13 / 98.26 hours for my unit and 1.42 / 6.92 hours for that of the wingman (the duplicate).

My log: https://www.worldcommunitygrid.org/contribution/results/974208390/log

Results log
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0219377_4142.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0219377_4142
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 40
NumberOfGenesInSignatureMin = 40
NumberOfGenesInSignatureMax = 40
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 4076
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.257266
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"
SvmLearnLimit = 500000
RSeed = 142894242
[12:36:27] Initializing
[12:53:43] Running
[12:53:55] EvaluateFitnessOfStartingGeneSignatures 4076
[23:05:13] Writing final output
[23:05:13] Closing Output Stream
[23:05:13] Cleaning up
Result.out = 1540.000000
Run complete, CPU time: 7681.177238
23:05:21 (5768): called boinc_finish(0)
</stderr_txt>
]]>

The log of the wingman (the duplicate): https://www.worldcommunitygrid.org/contribution/results/974208389/log

Results log
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0219377_4142.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0219377_4142
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 40
NumberOfGenesInSignatureMin = 40
NumberOfGenesInSignatureMax = 40
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 4076
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.257266
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"
SvmLearnLimit = 500000
RSeed = 142894242
[06:13:13] Initializing
[06:13:54] Running
[06:13:55] EvaluateFitnessOfStartingGeneSignatures 4076
[13:14:57] Writing final output
[13:14:57] Closing Output Stream
[13:14:57] Cleaning up
Result.out = 1540.000000
Run complete, CPU time: 5103.171875
13:14:57 (7216): called boinc_finish(0)
</stderr_txt>
]]>

Friendly greetings
Martin
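
The gap between CPU time and elapsed time in the two logs above can be expressed as a single number: the share of wall-clock time the task actually spent computing. The sketch below is illustrative only. The function names are hypothetical, the regular expression assumes the `Run complete, CPU time:` line looks exactly as it does in the logs quoted in this post, and the elapsed times are the 98.26 and 6.92 hours quoted in the post rather than values read from the logs.

```python
import re

# Matches the CPU-time line as it appears in the MCM1 logs quoted in this thread.
CPU_TIME_RE = re.compile(r"Run complete, CPU time: ([0-9.]+)")


def cpu_seconds(stderr_text):
    """Extract the reported CPU time (in seconds) from a task's stderr text."""
    match = CPU_TIME_RE.search(stderr_text)
    if match is None:
        raise ValueError("no 'Run complete, CPU time:' line found")
    return float(match.group(1))


def cpu_efficiency(cpu_secs, elapsed_hours):
    """Fraction of elapsed (wall-clock) time that was spent as CPU time."""
    return cpu_secs / (elapsed_hours * 3600.0)


if __name__ == "__main__":
    martin = cpu_seconds("Run complete, CPU time: 7681.177238")
    wingman = cpu_seconds("Run complete, CPU time: 5103.171875")
    # Elapsed hours are the figures quoted in the post, not parsed from the log.
    print(f"Martin:  {cpu_efficiency(martin, 98.26):.1%}")
    print(f"Wingman: {cpu_efficiency(wingman, 6.92):.1%}")
```

With the figures quoted in this post, Martin's task comes out at roughly 2% and the wingman's at roughly 20%, so neither machine gave the task a full core, but Martin's was starved far more severely.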
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
Maybe this has been mentioned before, but you should look at Task Manager to see what else is running and whether BOINC is using 100% of at least one core for each instance. I suspect you have some other process(es) eating your CPU cycles.
Edit: Or you could have an overheating problem where the machine is throttling itself to maintain correct temperatures.
Cheers
Sgt. Joe
----------------------------------------
*Minnesota Crunchers*
[Edit 1 times, last edit by Sgt.Joe at Jun 29, 2024 1:33:45 PM]
||
|
|
|