Thread Status: Active
Total posts in this thread: 120
This topic has been viewed 12777 times and has 119 replies
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Mapping Cancer Markers Beta Test - Version 7.27 [ Issues Thread ]

http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,35971

Please post issues here.

Since this beta is testing checkpointing, please force your agent to restart from a checkpoint if possible. To do this, turn "Leave Application In memory" off and then suspend the work unit. To make sure the work unit is out of memory, open 'Task Manager' or run 'ps' and confirm that the application is not listed. Then resume it. Another easy way to force a restore from checkpoint is to reboot your machine.

Thanks,
-Uplinger
[Dec 9, 2013 6:40:57 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: Mapping Cancer Markers Beta Test - Version 7.27 [ Issues Thread ]

I wanted to give everyone an update on the issues addressed and not addressed in this beta.

  • Short-running workunits, under a minute of CPU time, are not fixed. We are unable to recreate this locally and need more information. This new build writes more output to stderr that will help track this issue down. Therefore, if you have a very short-running workunit, please post the stderr to this thread.
  • Workunits that continue to run past 100% are not fixed. Again, we are unable to recreate this issue locally but have added output to stderr to help track it down, so if you have a workunit that continues to run past 100% complete, please post the stderr.
  • The invalid issue may improve but is not completely fixed. There are changes to checkpointing, but some issues remain that will still cause invalids for some types of workunits. We are able to recreate this issue locally, so a complete fix should be coming soon.
  • The OS X build has been modified to attempt to address the issue with MCM not running on OS X 10.5. Unfortunately we do not currently have access to a 10.5 machine, so this will have to be verified in the beta.

Thanks,
armstrdj
[Dec 9, 2013 6:59:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Mapping Cancer Markers Beta Test - Version 7.27 [ Issues Thread ]

OK, so in general accepted terms, LAIM OFF :D

... and the tasks are flooding in. 100,000 originals would [idle hope] promise a second reload after completing the first set received.

P.S. Now the dilemma... I've got 7.26 running on one machine on 8 cores. Unloading the client or rebooting is not an option unless I want to take a hit of 8 invalids, so I will just set LAIM OFF, suspend the 7.27 Beta tasks, and do the TM check to verify they've unloaded. D'oh.
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 9, 2013 7:04:31 PM]
[Dec 9, 2013 7:01:45 PM]
deltavee
Ace Cruncher
Texas Hill Country
Joined: Nov 17, 2004
Post Count: 4846
Status: Offline
Re: Mapping Cancer Markers Beta Test - Version 7.27 [ Issues Thread ]

One day deadline.
BETA_ MCM1_ 0000089_ 5561_ 0-- xxxxxxxxx In Progress 12/9/13 19:09:51 12/10/13 19:09:51 0.00 / 0.00 0.0 / 0.0
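For anyone who wants to check the deadline on their own tasks, here is a minimal sketch that computes the window from a status line like the one above. It assumes the first timestamp is the sent time and the second is the due time, in month/day/two-digit-year order; the function name is made up for illustration.

```python
from datetime import datetime

# Timestamp layout assumed from the status line (month/day/2-digit year).
STAMP = "%m/%d/%y %H:%M:%S"


def deadline_window(sent, due):
    """Return the time allowed between the sent and due timestamps."""
    return datetime.strptime(due, STAMP) - datetime.strptime(sent, STAMP)


# Values copied from the status line above.
window = deadline_window("12/9/13 19:09:51", "12/10/13 19:09:51")
print(window.days, "day(s)")
```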
[Dec 9, 2013 7:13:22 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Mapping Cancer Markers Beta Test - Version 7.27 [ Issues Thread ]

Here is a sample stderr.txt of an 'unloaded' 7.27 task after resume. Note the 2 wcg_learn_limit = 500000 entries. The initialization time looks new... I hope it sticks in the production version so folks get a feel for how much 'Elapsed' time was involved, which would permit a check of the true wall-clock time the task ran versus the recorded CPU time (some have an impression of disparity there). The BOINC client log file stdoutdae.txt would have revealed that too.

Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.27_windows_x86_64 -SettingsFile MCM1_0000089_4921.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Settings File
DateOfDesign = 11/08/2013
Designer = PMCC_OCI
WorkOrderID = 0000089_4921
DatasetID = 17_72_SDG_v1
NumberOfGenesInStartingSignature = 16
NumberOfGenesInSignatureMin = 10
NumberOfGenesInSignatureMax = 20
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 1
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 1
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlySimulatedAnnealing
PermutationsNumIterations = 7738
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = NFCV
ModelType = SVM
FitnessFn = 0
MinFitness = -1
SimAnnealEConv = 2
SvmArgs = "-v 0 -c 0.01 -t 1 -d 3 -r 0"
SvmLearnLimit = 500000
RSeed = 344921


[20:07:35] Initializing
wcg_learn_limit = 500000
[20:07:46] Running
[20:07:46] EvaluateFitnessOfStartingGeneSignatures 1
[20:07:48]: Computing pass 0
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.27_windows_x86_64 -SettingsFile MCM1_0000089_4921.txt -DatabaseFile dataset-17_72_SDG_v1.txt
[20:26:18] Initializing
wcg_learn_limit = 500000
[20:26:29] Running
[20:26:29] EvaluateFitnessOfStartingGeneSignatures 1
[20:26:31]: Computing pass 0

edit: The above was after 2 checkpoints had registered [Courtesy easyview with BOINCTasks]
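If you want to pull the same information out of your own stderr.txt, a rough sketch follows. It assumes each application start writes one "Commandline =" line and that the gap between the "Initializing" and "Running" timestamps is the initialization time; that reading of the log is my assumption, not something the techs have confirmed. The function name and the shortened sample text are illustrative only.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "[20:07:35] Initializing" or "[20:07:48]: Computing pass 0".
STAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]:?\s*(.*)$")


def summarize_stderr(text):
    """Count application starts and measure each Initializing -> Running gap."""
    starts = 0
    init_at = None
    init_seconds = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Commandline ="):
            starts += 1  # assumed: one Commandline line per (re)start
            continue
        match = STAMP.match(line)
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%H:%M:%S")
        event = match.group(2)
        if event == "Initializing":
            init_at = when
        elif event == "Running" and init_at is not None:
            gap = when - init_at
            if gap < timedelta(0):  # the log has no dates; handle a midnight rollover
                gap += timedelta(days=1)
            init_seconds.append(int(gap.total_seconds()))
            init_at = None
    return starts, init_seconds


# Shortened copy of the log above (command line arguments trimmed).
SAMPLE = """\
Commandline = wcgrid_beta17_7.27_windows_x86_64 -SettingsFile MCM1_0000089_4921.txt
[20:07:35] Initializing
wcg_learn_limit = 500000
[20:07:46] Running
[20:07:46] EvaluateFitnessOfStartingGeneSignatures 1
[20:07:48]: Computing pass 0
Commandline = wcgrid_beta17_7.27_windows_x86_64 -SettingsFile MCM1_0000089_4921.txt
[20:26:18] Initializing
wcg_learn_limit = 500000
[20:26:29] Running
"""

starts, init_seconds = summarize_stderr(SAMPLE)
print(starts, init_seconds)
```

A start count greater than 1 suggests the task was unloaded and resumed at least once, which is what this beta asks testers to provoke.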
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 9, 2013 7:44:25 PM]
[Dec 9, 2013 7:34:10 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Mapping Cancer Markers Beta Test - Version 7.27 [ Issues Thread ]

One day deadline.
BETA_ MCM1_ 0000089_ 5561_ 0-- xxxxxxxxx In Progress 12/9/13 19:09:51 12/10/13 19:09:51 0.00 / 0.00 0.0 / 0.0


This setting was left by accident. I have changed it to a 4 day deadline.

Thanks,
-Uplinger
[Dec 9, 2013 7:35:48 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Mapping Cancer Markers Beta Test - Version 7.27 [ Issues Thread ]

Not an issue, just a couple of comments:

  • (I'm assuming that) There's no point in restarting the WUs in the first 10 minutes as they'll restart from the beginning, not the checkpoint file.

  • Without knowing what's in the black box, e.g. whether there are multiple passes / steps / phases in the calculations, I'm going to restart mine near the end as well, in case it's somehow process dependent.

    Any comments from the techs on this? How can we best help?

Let's hope we give them what they need to sort this one out!
[Dec 9, 2013 7:36:08 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Mapping Cancer Markers Beta Test - Version 7.27 [ Issues Thread ]

As edited into my previous post, I did a suspend/resume test after 2 checkpoints. I have now done a second round of interrupts after the 10th+ checkpoint, i.e. multiple interrupts.

A little devious: If you run BOINC as a service, you won't see the science tasks running [in Task Manager], and may then mistakenly assume they were unloaded. You need to hit the "Show all user processes" button at the bottom left. Science apps have WCG or wcgrid at the front of their names.
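The same check can be scripted. This is only a sketch: it assumes you already have a list of process names covering all users (from Task Manager, or from something like `ps -e` on a Unix-like system), and it relies on the WCG/wcgrid name prefix mentioned above. The function name and the sample list are made up for illustration.

```python
def science_apps(process_names):
    """Return the names that look like WCG science applications.

    Assumption: science apps start with "WCG" or "wcgrid" (any letter case),
    so a case-insensitive "wcg" prefix test covers both.
    """
    return [name for name in process_names if name.lower().startswith("wcg")]


# Hypothetical snapshot of process names taken after suspending the task.
snapshot = ["boinc.exe", "explorer.exe", "wcgrid_beta17_7.27_windows_x86_64"]
running = science_apps(snapshot)
print(running)
```

An empty result means no science application is still in memory, so resuming the task should force a restore from checkpoint.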

Also, one machine I'm not going to interrupt tasks on... just to make [quasi] sure when uplinger switches on the validator, that interrupted and non-interrupted get to meet [although, how many of the 100,000 will get intentionally interrupted] ;D
[Dec 9, 2013 8:48:33 PM]
Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Status: Offline
Re: Mapping Cancer Markers Beta Test - Version 7.27 [ Issues Thread ]

No issues to report so far, just good news.. Restarted BETA_ MCM1_ 0000089_ 1255_ 1-- and it's already Validated OK.
[Dec 9, 2013 10:05:52 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Mapping Cancer Markers Beta Test - Version 7.27 [ Issues Thread ]

Apis,

Restarting anywhere in the middle of the work unit should be fine. Since I have validation turned on and assimilation turned off, all of the results will stay around longer, allowing us time to investigate issues with the increased logging.

Thanks,
-Uplinger
[Dec 9, 2013 11:12:15 PM]