Total posts in this thread: 18
This topic has been viewed 3116 times and has 17 replies
Martin Schnellinger
Advanced Cruncher
Joined: Apr 29, 2007
Post Count: 128
Status: Offline
Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hello,

I worry that Workunit MCM1_0218350_8862_0 might give a bad scientific result
and that this might go unnoticed, as it has been marked as valid by BOINC.

There are two reasons why I think the result could be incorrect.

First:

My output log is incomplete and different from that of my wingman (replication 2)

My log is as follows:

Results log


<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0218350_8862.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0218350_8862
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 100
NumberOfGenesInSignatureMin = 100
NumberOfGenesInSignatureMax = 100
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 2820
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.294839
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"

SvmLearnLimit = 500000
RSeed = 132628962


[00:56:01] Initializing
[01:13:15] Running
[01:13:39] EvaluateFitnessOfStartingGeneSignatures 2820

</stderr_txt>
]]>


The log of MCM1_0218350_8862_1 is longer:


<core_client_version>7.24.1</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0218350_8862.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0218350_8862
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 100
NumberOfGenesInSignatureMin = 100
NumberOfGenesInSignatureMax = 100
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 2820
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.294839
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"

SvmLearnLimit = 500000
RSeed = 132628962


[12:16:40] Initializing
[12:16:45] Running
[12:16:45] EvaluateFitnessOfStartingGeneSignatures 2820
[13:39:25] Writing final output
[13:39:25] Closing Output Stream
[13:39:25] Cleaning up
Result.out = 1486.000000
Run complete, CPU time: 4960.343750
13:39:25 (52520): called boinc_finish(0)

</stderr_txt>
]]>

In other words:
In my log, the lines

[13:39:25] Writing final output
[13:39:25] Closing Output Stream
[13:39:25] Cleaning up
Result.out = 1486.000000
Run complete, CPU time: 4960.343750
13:39:25 (52520): called boinc_finish(0)

are missing.
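
The comparison above can also be done mechanically. The following is only a minimal sketch, assuming both logs have been saved as plain text; the function names `stage_names` and `missing_stages` are made up for illustration, and only the bracketed `[HH:MM:SS]` stage lines are compared (not the `Result.out` or `boinc_finish` lines).

```python
import re

# Matches the "[HH:MM:SS] " prefix on the application's stage lines.
TIMESTAMP = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]\s*")

def stage_names(log_text):
    """Return the labels of all timestamped lines, in order, without the timestamp."""
    stages = []
    for line in log_text.splitlines():
        line = line.strip()
        if TIMESTAMP.match(line):
            stages.append(TIMESTAMP.sub("", line))
    return stages

def missing_stages(own_log, wingman_log):
    """Return the wingman's stage labels that never appear in our own log."""
    own = set(stage_names(own_log))
    return [stage for stage in stage_names(wingman_log) if stage not in own]

own_log = """\
[00:56:01] Initializing
[01:13:15] Running
[01:13:39] EvaluateFitnessOfStartingGeneSignatures 2820
"""

wingman_log = """\
[12:16:40] Initializing
[12:16:45] Running
[12:16:45] EvaluateFitnessOfStartingGeneSignatures 2820
[13:39:25] Writing final output
[13:39:25] Closing Output Stream
[13:39:25] Cleaning up
"""

print(missing_stages(own_log, wingman_log))
# ['Writing final output', 'Closing Output Stream', 'Cleaning up']
```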

The second reason that makes the workunit seem "fishy" and strange is
that the replication calculated by my machine shows a huge difference between
elapsed time and CPU time:

2.15 CPU time versus 46.25 elapsed time.

46 hours, nearly two whole days for one workunit; this stands out from the rest of the workunits.

Can anyone who has deeper knowledge of the Mapping Cancer Markers project
give me an explanation for the strange state of this workunit, please?

I do not want the scientists to get a bad, incorrect result that destroys their work...

Thank you
MS
[Jun 8, 2024 8:01:34 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hi Martin,
If you can, always supply the Workunit-ID, please.
Thank you,
Adri
[Jun 8, 2024 8:29:03 AM]
Martin Schnellinger
Advanced Cruncher
Joined: Apr 29, 2007
Post Count: 128
Status: Offline
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hello Adri,
well, the WorkOrderID = 0218350_8862
is already included.

I could not find any other ID, not even in "Properties"

Where can I find the Workunit-ID, please?

Thanks for your collaboration
Martin
[Jun 8, 2024 3:10:08 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hi Martin,
You wrote: "The log of MCM1_0218350_8862_1 is longer". That is a tiny clue for you.

If you re-read the posts from 696764 in that thread you'll notice that the name of a workunit is not the same as its ID and without the ID we cannot check anything.

Now, about the clue, go to your Results (My Contribution --> Results) on the website, find your workunit with the name MCM1_0218350_8862, and click the name of your task (your task is named MCM1_0218350_8862_0). You will then see all tasks within that workunit. In the location bar of your browser you'll find the URL of that workunit and this is the important part: it contains the workunit-ID.

Example: if the URL is https://www.worldcommunitygrid.org/contribution/workunit/510764828, then the workunit-ID is 510764828. Easy!
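
For anyone who wants to pull the ID out of many such URLs, here is a small sketch. It assumes, as in the example, that the workunit-ID is the last path segment of the URL; `workunit_id_from_url` is a made-up helper name, not part of any WCG or BOINC tool.

```python
from urllib.parse import urlparse

def workunit_id_from_url(url: str) -> int:
    """Return the numeric workunit-ID found in the last path segment of the URL."""
    path = urlparse(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    if not last_segment.isdigit():
        raise ValueError(f"no numeric workunit-ID in {url!r}")
    return int(last_segment)

print(workunit_id_from_url(
    "https://www.worldcommunitygrid.org/contribution/workunit/510764828"))
# 510764828
```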

Adri
[Jun 8, 2024 5:10:25 PM]
Martin Schnellinger
Advanced Cruncher
Joined: Apr 29, 2007
Post Count: 128
Status: Offline
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hello Adri,

the URL of the Workunit is as follows: https://www.worldcommunitygrid.org/contribution/workunit/533540742

This means the unit has the ID 533540742.

I hope this is correct and the techs can check whether something is wrong with the WU.

Thanks for your advice.
Greetings
Martin
[Jun 9, 2024 11:42:57 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hi Martin,
The only other remarkable and weird thing that I noticed about this case, apart from your short Result Log and the total running time (46¼ hours!), is the time that it took to get from 'Initializing' to 'Running'. As it happens, if we look at your wingman, they are showing what may be considered a normal startup time:
[12:16:40] Initializing
[12:16:45] Running
(That's 5 seconds.)

Your task, however, took considerably longer to get to 'Running':
[00:56:01] Initializing
[01:13:15] Running
That's more than 17 minutes!
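
The gaps between stages can be computed directly from the timestamps. This is just a sketch under the assumption that every stage line has the form `[HH:MM:SS] label`; `stage_gaps` is a made-up helper name.

```python
import re
from datetime import datetime

LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s+(.*)$")

def stage_gaps(log_text):
    """Return (from_stage, to_stage, seconds) for each pair of consecutive stage lines."""
    events = []
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            events.append((datetime.strptime(match.group(1), "%H:%M:%S"), match.group(2)))
    gaps = []
    for (t0, s0), (t1, s1) in zip(events, events[1:]):
        # The log has no dates, so treat an earlier clock time as a midnight rollover.
        seconds = int((t1 - t0).total_seconds() % 86400)
        gaps.append((s0, s1, seconds))
    return gaps

martin = """\
[00:56:01] Initializing
[01:13:15] Running
[01:13:39] EvaluateFitnessOfStartingGeneSignatures 2820
"""
wingman = """\
[12:16:40] Initializing
[12:16:45] Running
[12:16:45] EvaluateFitnessOfStartingGeneSignatures 2820
"""

for name, log in (("Martin", martin), ("wingman", wingman)):
    for start, end, seconds in stage_gaps(log):
        print(f"{name}: {start} -> {end}: {seconds} s")
```

One caveat: because the timestamps carry no date, a gap can only be read modulo 24 hours. For a task with 46 hours of elapsed time, the clock alone cannot rule out that a full day passed between two lines.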

What can I say? Not much more, I guess. At least you returned your task before the deadline.
Since you already supplied both Error Logs, I'll be showing the remaining data that I retrieved from the link that you provided; see below:

workunit 533540742
App: Mapping Cancer Markers
Workunit: MCM1_0218350_8862
Created: 2024-06-02T21:45:04
Quorum: 2
Replication: 2

MCM1_0218350_8862_0 MSWin 7 Valid 2024-06-02T21:45:39 2024-06-08T04:02:51 2.15/46.25 65.2/70.9
MCM1_0218350_8862_1 MSWin 10 Valid 2024-06-02T21:45:39 2024-06-05T18:41:10 1.38/1.38 76.5/70.9
Details: ---------------------------------------------------------------------------------------------------------------------------------------
MCM1_0218350_8862_0  MSWin 7       Valid  2024-06-02T21:45:39  2024-06-08T04:02:51    2.15/46.25     65.2/70.9
Sent Time: 2024-06-02T21:45:39+0000
Due Time: 2024-06-08T21:45:39+0000
Returned: 2024-06-08T04:02:51+0000
Result-ID: 951574651
MCM1_0218350_8862_1 MSWin 10 Valid 2024-06-02T21:45:39 2024-06-05T18:41:10 1.38/1.38 76.5/70.9
Sent Time: 2024-06-02T21:45:39+0000
Due Time: 2024-06-08T21:45:39+0000
Returned: 2024-06-05T18:41:10+0000
Result-ID: 951574652

Adri
[Jun 9, 2024 4:15:57 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

One more thing, Martin.
I reconsidered the time between 'Running' and 'EvaluateFitnessOfStartingGeneSignatures' in the Result Logs of both of you.

It took your wingman less than 1 second:
[12:16:45] Running
[12:16:45] EvaluateFitnessOfStartingGeneSignatures 2820
Your device, by comparison, dragged itself from 'Running' to 'EvaluateFitnessOfStartingGeneSignatures' in 24 seconds:
[01:13:15] Running
[01:13:39] EvaluateFitnessOfStartingGeneSignatures 2820


The next step in the Result Log is to write the line 'Writing final output'; the final output itself should also be written to the result file. I don't know in what order this happens, but the log suggests one (first 'Writing final output' goes to the Result Log, then the output is actually written to the result file). Note that the Result Log and the output file holding the result of the computation are two different entities.
Now, it is possible, at least that's what I think, that your device no longer had time to write the last few lines of the Result Log before the task was reported back to the server, while the 'final output' was successfully written to the result file (and reported back to the server at the same time). On the positive side, I'm assuming that your task must have finished successfully, since it was declared Valid. My conclusion is that nothing really bad happened with your task.

Adri
[Jun 9, 2024 5:51:02 PM]
Link64
Senior Cruncher
Joined: Feb 19, 2021
Post Count: 206
Status: Offline
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

2.15 CPU time versus 46.25 elapsed time.

There was either something else eating up all CPU time while this WU was running or it got stuck for some period of time.
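
To put a number on it, here is a quick back-of-the-envelope calculation using the CPU/elapsed hours reported for the two tasks in this workunit (2.15/46.25 and 1.38/1.38). The helper name `cpu_efficiency` is made up for this sketch.

```python
def cpu_efficiency(cpu_hours, elapsed_hours):
    """Fraction of the elapsed (wall-clock) time that was actually spent on the CPU."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_hours / elapsed_hours

tasks = [
    ("MCM1_0218350_8862_0", 2.15, 46.25),
    ("MCM1_0218350_8862_1", 1.38, 1.38),
]

for name, cpu, elapsed in tasks:
    waiting = elapsed - cpu
    print(f"{name}: {cpu_efficiency(cpu, elapsed):.1%} CPU efficiency, "
          f"{waiting:.2f} h not computing")
# MCM1_0218350_8862_0: 4.6% CPU efficiency, 44.10 h not computing
# MCM1_0218350_8862_1: 100.0% CPU efficiency, 0.00 h not computing
```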

Whatever it was, it doesn't really matter: your result file was exactly the same as the one returned by your wingman. It's extremely unlikely that two unstable computers get exactly the same result, as that would require them to make EXACTLY the same mistake(s) in billions of computing steps. That will pretty much never happen; they will rather get either different results (if they are just a bit unstable), which won't match, or, in case they are a bit more unstable, simply garbage data that has nothing in common with a result. In both cases further task(s) for this WU will be sent out until the validator gets back two that match, and those will be marked as valid.

Stderr output isn't important, at least not here; on Milkyway, for example, it is, as it contains the result. Here we upload result files, which are compared by the validator. Stderr is actually completely ignored, and if it were missing completely or contained random text, it would still not be an issue.
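
To illustrate the idea of matching results, here is a toy sketch of a quorum check. It is NOT the actual WCG or BOINC validator: it assumes results are compared byte for byte via a hash (a real validator may use a looser notion of "equivalent"), the function name `validate` is made up, and the payloads are placeholder bytes. Note that the stderr log is not an input at all.

```python
import hashlib
from collections import defaultdict

def validate(result_files, quorum=2):
    """Return the task names whose result files agree, or [] if no quorum is reached.

    result_files maps a task name to the raw bytes of its uploaded result file.
    """
    groups = defaultdict(list)
    for task, payload in result_files.items():
        groups[hashlib.sha256(payload).hexdigest()].append(task)
    for tasks in groups.values():
        if len(tasks) >= quorum:
            return sorted(tasks)
    return []  # no agreement yet: another task would have to be sent out

# Placeholder payloads, purely hypothetical.
matching = {"task_0": b"same result bytes", "task_1": b"same result bytes"}
differing = {"task_0": b"same result bytes", "task_1": b"garbage"}

print(validate(matching))   # ['task_0', 'task_1']
print(validate(differing))  # []
```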
[Edit 2 times, last edit by Link64 at Jun 10, 2024 2:12:21 PM]
[Jun 10, 2024 1:58:58 PM]
TigerLily
Senior Cruncher
Joined: May 26, 2023
Post Count: 280
Status: Offline
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hi Martin Schnellinger,

Thanks for bringing your concerns to our attention. I passed your post along to a member of the tech team. They agree with Link64 that it is very unlikely that the result was invalid as the wingman returned an equivalent result, and the difference in time between elapsed and CPU time is likely due to your settings or your computer having other things to do.
[Jun 10, 2024 3:50:02 PM]
Martin Schnellinger
Advanced Cruncher
Joined: Apr 29, 2007
Post Count: 128
Status: Offline
Re: Incomplete Result Log for Workunit MCM1_0218350_8862_0

Hello,
Thank you for explaining the log file and for your readiness to help.

Unfortunately, I have some more units taking exceptionally long:


https://www.worldcommunitygrid.org/contribution/workunit/540896480

Log of the wingman:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0219032_6512.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0219032_6512
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 80
NumberOfGenesInSignatureMin = 80
NumberOfGenesInSignatureMax = 80
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 3167
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.284341
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"

SvmLearnLimit = 500000
RSeed = 139446612


[18:00:47] Initializing
[18:00:52] Running
[18:00:52] EvaluateFitnessOfStartingGeneSignatures 3167
[19:24:23] Writing final output
[19:24:23] Closing Output Stream
[19:24:23] Cleaning up
Result.out = 4395.000000
Run complete, CPU time: 2266.234375
19:24:23 (18544): called boinc_finish(0)

</stderr_txt>
]]>


And

https://www.worldcommunitygrid.org/contribution/workunit/540896435

Log of wingman

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0219012_6622.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0219012_6622
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 60
NumberOfGenesInSignatureMin = 60
NumberOfGenesInSignatureMax = 60
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 3576
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.274771
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"

SvmLearnLimit = 500000
RSeed = 139246722


[04:26:40] Initializing
[04:26:44] Running
[04:26:44] EvaluateFitnessOfStartingGeneSignatures 3576
[05:35:41] Writing final output
[05:35:41] Closing Output Stream
[05:35:41] Cleaning up
Result.out = 2784.000000
Run complete, CPU time: 3728.906250
05:35:41 (20740): called boinc_finish(0)

</stderr_txt>
]]>

A third one, which differs a lot from the two mentioned above, gave an error:


https://www.worldcommunitygrid.org/contribution/workunit/540896457


<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
finish file present too long
</message>

<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_windows_x86_64 -SettingsFile MCM1_0219011_7188.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
Settings File
DateOfDesign = 08/05/2014
Designer = Krembil-cubes-2023-09-22
WorkOrderID = 0219011_7188
DatasetID = curatedOvarian_EarlyLate_v1.0
NumberOfGenesInStartingSignature = 60
NumberOfGenesInSignatureMin = 60
NumberOfGenesInSignatureMax = 60
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 3576
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.274771
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -t 1 -d 3 -s 0.03 -r 10"

SvmLearnLimit = 500000
RSeed = 139237288


[03:59:01] Initializing
[03:59:58] Running
[03:59:58] EvaluateFitnessOfStartingGeneSignatures 3576
[06:07:29] Writing final output
[06:07:29] Closing Output Stream
[06:07:29] Cleaning up
Result.out = 1933.000000
Run complete, CPU time: 7607.295164
06:07:33 (6156): called boinc_finish(0)

</stderr_txt>
]]>

The wingmen in this case have a valid result.

Thanks for deeper investigation
Good bye
Martin
[Jun 19, 2024 5:18:54 AM]