| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 18
|
|
| Author |
|
|
Martin Schnellinger
Advanced Cruncher Joined: Apr 29, 2007 Post Count: 128 Status: Offline Project Badges:
|
Hello,
I worry that workunit MCM1_0218350_8862_0 might give a bad scientific result and that this might go unnoticed, as it has been marked as valid by BOINC. There are two reasons why I think the result could be incorrect.

First: my output log is incomplete and different from that of my wingman (replication 2). My log is as follows:

Results log
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0218350_8862.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0218350_8862
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 100
NumberOfGenesInSignatureMin = 100
NumberOfGenesInSignatureMax = 100
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 2820
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.294839
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"
SvmLearnLimit = 500000
RSeed = 132628962
[00:56:01] Initializing
[01:13:15] Running
[01:13:39] EvaluateFitnessOfStartingGeneSignatures 2820
</stderr_txt>
]]>

The log of MCM1_0218350_8862_1 is longer:

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0218350_8862.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0218350_8862
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 100
NumberOfGenesInSignatureMin = 100
NumberOfGenesInSignatureMax = 100
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 2820
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.294839
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"
SvmLearnLimit = 500000
RSeed = 132628962
[12:16:40] Initializing
[12:16:45] Running
[12:16:45] EvaluateFitnessOfStartingGeneSignatures 2820
[13:39:25] Writing final output
[13:39:25] Closing Output Stream
[13:39:25] Cleaning up
Result.out = 1486.000000
Run complete, CPU time: 4960.343750
13:39:25 (52520): called boinc_finish(0)
</stderr_txt>
]]>

In other words, in my log the lines

[13:39:25] Writing final output
[13:39:25] Closing Output Stream
[13:39:25] Cleaning up
Result.out = 1486.000000
Run complete, CPU time: 4960.343750
13:39:25 (52520): called boinc_finish(0)

are missing.

The second reason that makes the workunit somehow "fishy" and strange is that the replication calculated by my machine shows a huge difference between elapsed time and CPU time: 2.15 hours of CPU time versus 46.25 hours of elapsed time. 46 hours, nearly two whole days for one workunit; this stands out from the rest of my workunits.

Can anyone who has a bit deeper knowledge of the Mapping Cancer Markers project try to give me an explanation for the strange state of this workunit, please? I do not want the scientists to get a bad, incorrect result destroying their work... Thank you MS |
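The comparison of the two logs above can also be done mechanically. The following is only a minimal Python sketch, assuming both stderr logs have been pasted into strings; the helper names `normalize` and `missing_lines` are made up for illustration and are not part of BOINC or World Community Grid. It strips the timestamp and process-ID prefixes (which naturally differ between hosts) and lists the lines that appear in the wingman's log but not in the shorter one.

```python
import re

# Prefixes such as "[13:39:25] " or "13:39:25 (52520): " differ per host, so drop them.
_PREFIX = re.compile(r"^\s*(\[\d{2}:\d{2}:\d{2}\]|\d{2}:\d{2}:\d{2} \(\d+\):)\s*")


def normalize(line: str) -> str:
    """Remove the leading timestamp / process-ID prefix from one log line."""
    return _PREFIX.sub("", line).strip()


def missing_lines(mine: str, wingman: str) -> list[str]:
    """Return normalized lines present in the wingman's log but absent from mine."""
    seen = {normalize(line) for line in mine.splitlines() if line.strip()}
    missing = []
    for line in wingman.splitlines():
        text = normalize(line)
        if text and text not in seen:
            missing.append(text)
    return missing


# Tail of the two logs quoted in the post (settings block omitted, it is identical).
MINE = """[00:56:01] Initializing
[01:13:15] Running
[01:13:39] EvaluateFitnessOfStartingGeneSignatures 2820"""

WINGMAN = """[12:16:40] Initializing
[12:16:45] Running
[12:16:45] EvaluateFitnessOfStartingGeneSignatures 2820
[13:39:25] Writing final output
[13:39:25] Closing Output Stream
[13:39:25] Cleaning up
Result.out = 1486.000000
Run complete, CPU time: 4960.343750
13:39:25 (52520): called boinc_finish(0)"""

if __name__ == "__main__":
    for text in missing_lines(MINE, WINGMAN):
        print(text)
```

Note that lines carrying host-specific values (for example the CPU time) would show up as "missing" even in two complete logs, so the output is a starting point for reading, not a verdict.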
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Hi Martin,
If you can, always supply the Workunit-ID, please. Thank you, Adri |
||
|
|
Martin Schnellinger
Advanced Cruncher Joined: Apr 29, 2007 Post Count: 128 Status: Offline Project Badges:
|
Hello Adri,
Well, the WorkOrderID = 0218350_8862 is already included. I could not find any other ID, not even in "Properties". Where can I find the Workunit-ID, please? Thanks for your collaboration. Martin |
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Hi Martin,
You wrote: "The log of MCM1_0218350_8862_1 is longer". That is a tiny clue for you. If you re-read the posts from 696764 in that thread you'll notice that the name of a workunit is not the same as its ID and without the ID we cannot check anything. Now, about the clue, go to your Results (My Contribution --> Results) on the website, find your workunit with the name MCM1_0218350_8862, and click the name of your task (your task is named MCM1_0218350_8862_0). You will then see all tasks within that workunit. In the location bar of your browser you'll find the URL of that workunit and this is the important part: it contains the workunit-ID. Example: if the URL is https://www.worldcommunitygrid.org/contribution/workunit/510764828, then the workunit-ID is 510764828. Easy! Adri |
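For anyone who wants to do this for several workunits at once, the step "take the last part of the URL" can be sketched in a few lines of Python. This is only an illustration under the assumption that the URL has the shape shown in the example above (path ending in `/workunit/<number>`); the function name `workunit_id_from_url` is hypothetical.

```python
from urllib.parse import urlparse


def workunit_id_from_url(url: str) -> int:
    """Extract the numeric workunit-ID from a .../contribution/workunit/<id> URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Assumed layout: the ID is the last path segment, right after "workunit".
    if len(parts) >= 2 and parts[-2] == "workunit" and parts[-1].isdigit():
        return int(parts[-1])
    raise ValueError(f"no workunit-ID found in {url!r}")


if __name__ == "__main__":
    print(workunit_id_from_url(
        "https://www.worldcommunitygrid.org/contribution/workunit/510764828"
    ))
```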
||
|
|
Martin Schnellinger
Advanced Cruncher Joined: Apr 29, 2007 Post Count: 128 Status: Offline Project Badges:
|
Hello Adri,
the URL of the workunit is as follows: https://www.worldcommunitygrid.org/contribution/workunit/533540742 This means the unit has the ID 533540742. I hope that this is correct and that the techs can check whether something is wrong with the WU. Thanks for your advice. Greetings Martin |
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Hi Martin,
The only other remarkable and weird thing that I noticed about this case, apart from your short Result Log and the total running time (46¼ hours!), is the time that it took to get from 'Initializing' to 'Running'. As it happens, your wingman shows what may be considered a normal startup time:

[12:16:40] Initializing
[12:16:45] Running

(That's 5 seconds.) Your task, however, took considerably longer to get to 'Running':

[00:56:01] Initializing
[01:13:15] Running

That's more than 17 minutes! What can I say? Not much more, I guess. At least you returned your task before the deadline. Since you already supplied both Error Logs, I'll be showing the remaining data that I retrieved from the link that you provided; see below:

workunit 533540742
App: Mapping Cancer Markers
Details:
---------------------------------------------------------------------------------------------------------------------------------------
MCM1_0218350_8862_0 MSWin 7 Valid 2024-06-02T21:45:39 2024-06-08T04:02:51 2.15/46.25 65.2/70.9

Adri |
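The startup times quoted above are simple differences between two log timestamps. A minimal Python sketch of that arithmetic follows; the function name `seconds_between` is made up for illustration. One assumption is labeled in the code: the Result Log timestamps carry no date, so the sketch allows for at most one midnight rollover and cannot tell if more than a day has passed.

```python
from datetime import datetime, timedelta


def seconds_between(start: str, end: str) -> int:
    """Seconds from one "HH:MM:SS" log timestamp to the next."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        # Assumption: the clock passed midnight exactly once between the two lines.
        delta += timedelta(days=1)
    return int(delta.total_seconds())


if __name__ == "__main__":
    print(seconds_between("12:16:40", "12:16:45"))  # wingman: Initializing -> Running
    print(seconds_between("00:56:01", "01:13:15"))  # Martin:  Initializing -> Running
```

With the timestamps from the two logs this gives 5 seconds for the wingman and 1034 seconds (17 minutes 14 seconds) for Martin's task.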
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
One more thing, Martin.
I reconsidered the time between 'Running' and 'EvaluateFitnessOfStartingGeneSignatures' in the workunit for both of you. It took your wingman less than 1 second:

[12:16:45] Running
[12:16:45] EvaluateFitnessOfStartingGeneSignatures 2820

Meanwhile, relatively speaking, your device dragged itself in 24 seconds from 'Running' to 'EvaluateFitnessOfStartingGeneSignatures':

[01:13:15] Running
[01:13:39] EvaluateFitnessOfStartingGeneSignatures 2820

The next step in the Result Log is to write the line 'Writing final output'; also, the final output should be written to the result file. I don't know in what order this happens, but some order is suggested (first 'Writing final output' to the Result Log, then actually writing the output to the result file). (The Result Log and the output file with the result of the computation itself are two different entities.)

Now, it is possible, at least that's what I think, that your device no longer had the time to write the last few lines of the Result Log before it was reported back to the server, while the 'final output' was successfully written to the result file (and also reported back to the server at the same time).

On the positive side, however, I'm assuming that your task must have finished successfully, since it was declared Valid. My final conclusion is that nothing really bad happened with your task in question. Adri |
||
|
|
Link64
Senior Cruncher Joined: Feb 19, 2021 Post Count: 206 Status: Offline Project Badges:
|
"2.15 CPU time versus 46.25 elapsed time."

There was either something else eating up all the CPU time while this WU was running, or it got stuck for some period of time. Whatever it was, it doesn't really matter: your result file was exactly the same as the one returned by your wingman. It's extremely unlikely that two unstable computers get exactly the same result, as that would require them to make EXACTLY the same mistake(s) in billions of computing steps. That will pretty much never happen; they will rather get either different results (if they are just a bit unstable), which won't match, or, in case they are a bit more unstable, simply garbage data that has nothing in common with a result. In both cases, further task(s) for this WU will be sent out until the validator gets back two that match, and those will be marked as valid.

The std_err output isn't important, at least not here; on Milkyway, for example, it is, as it contains the result. Here we upload result files which are compared by the validator. std_err is actually completely ignored, and if it was missing completely or contained random text, it would still not be an issue.

[Edit 2 times, last edit by Link64 at Jun 10, 2024 2:12:21 PM] |
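The idea of "send out tasks until two results match" can be illustrated with a toy example. The Python below is emphatically not the BOINC or WCG validator; it is a simplified sketch that assumes results are compared byte for byte (via a SHA-256 digest) and that a quorum of two matching results is enough. The function name `find_quorum` and the task names are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_quorum(results: dict[str, bytes], quorum: int = 2) -> list[str]:
    """Return the first group of task names whose result files are identical.

    `results` maps a task name to the raw bytes of its uploaded result file,
    in the order the results came back. An empty list means no quorum yet,
    i.e. another task would have to be sent out.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for task, payload in results.items():
        digest = hashlib.sha256(payload).hexdigest()
        groups[digest].append(task)
        if len(groups[digest]) >= quorum:
            return groups[digest]
    return []


if __name__ == "__main__":
    returned = {
        "task_0": b"result-A",
        "task_1": b"garbage from an unstable host",
        "task_2": b"result-A",
    }
    print(find_quorum(returned))
```

In this toy run the first and third results agree, so they form the quorum, while the garbage result matches nothing. The stderr text plays no role in the comparison, which is the point made in the post.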
||
|
|
TigerLily
Senior Cruncher Joined: May 26, 2023 Post Count: 280 Status: Offline Project Badges:
|
Hi Martin Schnellinger,
Thanks for bringing your concerns to our attention. I passed your post along to a member of the tech team. They agree with Link64 that it is very unlikely that the result was invalid as the wingman returned an equivalent result, and the difference in time between elapsed and CPU time is likely due to your settings or your computer having other things to do. |
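To put a number on the gap between elapsed and CPU time, one can compute the fraction of wall-clock time the task actually spent on the CPU. A minimal Python sketch, using the figures reported earlier in the thread (2.15 hours CPU, 46.25 hours elapsed); the function name `cpu_efficiency` is made up for illustration.

```python
def cpu_efficiency(cpu_hours: float, elapsed_hours: float) -> float:
    """Fraction of elapsed (wall-clock) time that was spent as CPU time."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_hours / elapsed_hours


if __name__ == "__main__":
    ratio = cpu_efficiency(2.15, 46.25)
    print(f"{ratio:.1%} of the elapsed time was CPU time")
```

For this task the ratio is under 5%, which fits the explanation that the task spent most of its time waiting, whether because of settings or because the computer was busy with other things.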
||
|
|
Martin Schnellinger
Advanced Cruncher Joined: Apr 29, 2007 Post Count: 128 Status: Offline Project Badges:
|
Hello,
Thank you for explaining the log file and for your readiness to help. Unfortunately, I have some more units taking exceptionally long:

https://www.worldcommunitygrid.org/contribution/workunit/540896480

Log of the wingman:

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0219032_6512.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0219032_6512
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 80
NumberOfGenesInSignatureMin = 80
NumberOfGenesInSignatureMax = 80
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 3167
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.284341
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"
SvmLearnLimit = 500000
RSeed = 139446612
[18:00:47] Initializing
[18:00:52] Running
[18:00:52] EvaluateFitnessOfStartingGeneSignatures 3167
[19:24:23] Writing final output
[19:24:23] Closing Output Stream
[19:24:23] Cleaning up
Result.out = 4395.000000
Run complete, CPU time: 2266.234375
19:24:23 (18544): called boinc_finish(0)
</stderr_txt>
]]>

And

https://www.worldcommunitygrid.org/contribution/workunit/540896435

Log of the wingman:

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0219012_6622.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0219012_6622
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 60
NumberOfGenesInSignatureMin = 60
NumberOfGenesInSignatureMax = 60
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 3576
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.274771
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"
SvmLearnLimit = 500000
RSeed = 139246722
[04:26:40] Initializing
[04:26:44] Running
[04:26:44] EvaluateFitnessOfStartingGeneSignatures 3576
[05:35:41] Writing final output
[05:35:41] Closing Output Stream
[05:35:41] Cleaning up
Result.out = 2784.000000
Run complete, CPU time: 3728.906250
05:35:41 (20740): called boinc_finish(0)
</stderr_txt>
]]>

A third one, which differs a lot from the two mentioned above, gave an error:

https://www.worldcommunitygrid.org/contribution/workunit/540896457

<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
finish file present too long
</message>
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0219011_7188.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0219011_7188
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 60
NumberOfGenesInSignatureMin = 60
NumberOfGenesInSignatureMax = 60
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 3576
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.274771
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"
SvmLearnLimit = 500000
RSeed = 139237288
[03:59:01] Initializing
[03:59:58] Running
[03:59:58] EvaluateFitnessOfStartingGeneSignatures 3576
[06:07:29] Writing final output
[06:07:29] Closing Output Stream
[06:07:29] Cleaning up
Result.out = 1933.000000
Run complete, CPU time: 7607.295164
06:07:33 (6156): called boinc_finish(0)
</stderr_txt>
]]>

The wingmen in this case have a valid result. Thanks for deeper investigation. Good bye Martin |
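When several logs like these need to be compared, it can help to pull out just the closing fields. The following Python sketch is one possible way to do that, assuming the log text has the same line shapes as the logs quoted above; the function name `summarize` and the dictionary keys are made up for illustration and are not a BOINC interface.

```python
import re


def summarize(log: str) -> dict:
    """Extract the closing fields of an MCM1 stderr log; missing fields become None."""
    result = re.search(r"Result\.out = ([\d.]+)", log)
    cpu = re.search(r"Run complete, CPU time: ([\d.]+)", log)
    finish = re.search(r"called boinc_finish\((-?\d+)\)", log)
    message = re.search(r"<message>\s*(.*?)\s*</message>", log, re.S)
    return {
        "result_out": float(result.group(1)) if result else None,
        "cpu_seconds": float(cpu.group(1)) if cpu else None,
        "exit_code": int(finish.group(1)) if finish else None,
        "client_message": message.group(1) if message else None,
    }


# Closing part of the third log from the post (settings block omitted).
THIRD_LOG = """<message>
finish file present too long
</message>
[06:07:29] Writing final output
[06:07:29] Closing Output Stream
[06:07:29] Cleaning up
Result.out = 1933.000000
Run complete, CPU time: 7607.295164
06:07:33 (6156): called boinc_finish(0)"""

if __name__ == "__main__":
    print(summarize(THIRD_LOG))
```

Applied to the third log, the sketch reports a `Result.out` value, a CPU time and `boinc_finish(0)` alongside the client message, which suggests the application itself ran to the end even though the task was reported as an error. An incomplete log, such as the one from the first post, would simply yield `None` for the closing fields.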
||
|
|
|