| World Community Grid Forums
Thread Status: Active Total posts in this thread: 14
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, possible, but very unlikely. The last one lasted for 35s so... yeah. Gotten yet another Beta WU since my last post, I'm feeling rather lucky, took about 35min. So again, I'm feeling uncertain whether those minute-long WUs can be considered grand children. And aren't there posts about some WUs taking several hours? ... Ah. http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=25710

I'd say this is one of the grand children, even if it's reported from a slow computer. From the naming of the WUs I saw even the 4th generation in my most recent Beta WUs:

BETA_ CMD2_ 0001-GPDAA.clustersOccur-TPM1A.clustersOccur_ 496_ 905378_ 905595_ 905560_ 905581_ 0-- <- grand children
BETA_ CMD2_ 0001-GPDAA.clustersOccur-TPM1A.clustersOccur_ 359_ 655651_ 655861_ 655759_ 655779_ 2-- <- grand children
BETA_ CMD2_ 0001-GPDAA.clustersOccur-TPM1A.clustersOccur_ 325_ 593524_ 593718_ 593605_ 593624_ 1-- <- grand children
BETA_ CMD2_ 0001-GPDAA.clustersOccur-TPM1A.clustersOccur_ 313_ 571676_ 572048_ 571944_ 571978_ 1-- <- grand children
BETA_ CMD2_ 0001-GPDAA.clustersOccur-TPM2A.clustersOccur_ 407_ 141444_ 141477_ 141467_ 141469_ 141469_ 141469_ 0-- <- great grand children
BETA_ CMD2_ 0001-GPDAA.clustersOccur-TPM1A.clustersOccur_ 334_ 610396_ 610552_ 610414_ 610429_ 610422_ 610423_ 1-- <- great grand children
BETA_ CMD2_ 0001-GPDAA.clustersOccur-TPM1A.clustersOccur_ 357_ 651640_ 651839_ 0-- <- children
BETA_ CMD2_ 0001-GPDAA.clustersOccur-ITB5A.clustersOccur_ 4679_ 116991_ 116993_ 1-- <- children
BETA_ CMD2_ 0001-GPDAA.clustersOccur-TPM1A.clustersOccur_ 298_ 544217_ 544432_ 544391_ 544432_ 544425_ 544428_ 1-- <- great grand children
BETA_ CMD2_ 0001-TNR1AA.clustersOccur-UGPA2A.clustersOccur_ 84_ 85355_ 85594_ 0-- <- children
BETA_ CMD2_ 0001-GPDAA.clustersOccur-TPM1A.clustersOccur_ 262_ 478777_ 478946_ 1-- <- children
BETA_ CMD2_ 0001-GPDAA.clustersOccur-PYGM.clustersOccur_ 304_ 0-- <- original
BETA_ CMD2_ 0001-GPDAA.clustersOccur-PYGM.clustersOccur_ 20_ 3-- <- original
BETA_ CMD2_ 0001-GPDAA.clustersOccur-PYGM.clustersOccur_ 305_ 1-- <- original
BETA_ CMD2_ 0001-GPDAA.clustersOccur-PYGM.clustersOccur_ 302_ 0-- <- original

OK, the WU names get very long, but I would suggest splitting the remaining positions of a WU into fewer WUs (I would take only 2), and therefore creating more generations if necessary. This "splitting factor" depends on the number of hard-to-calculate positions and their distribution inside the original WU. Only someone who has more insight into the WUs could decide what the right value is. Greetings Thorsten |
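The labels in the list above follow a visible pattern: each generation appears to append one more pair of numbers to the name. A minimal Python sketch of that reading follows. It assumes that the spaces after the underscores are forum line-wrapping, that the last number is a replica index, and that each number pair is a position range. None of this is confirmed by the project, and the helper name `parse_wu_name` is made up for illustration.

```python
# Sketch only: the field layout below is inferred from the names posted in
# this thread, not taken from any official World Community Grid description.
GENERATION_LABELS = ["original", "child", "grandchild", "great-grandchild"]


def parse_wu_name(name: str) -> dict:
    """Split a posted WU name and guess its generation from the number pairs."""
    # Assumption: spaces and the trailing "--" are forum formatting artifacts.
    compact = name.replace(" ", "").rstrip("-")
    fields = compact.split("_")
    # Assumed layout: BETA, CMD2, <batch and protein pair>, <WU number>,
    # zero or more <start>_<end> pairs, then a final replica index.
    range_fields = [int(value) for value in fields[4:-1]]
    if len(range_fields) % 2 != 0:
        raise ValueError("expected the position numbers to come in pairs")
    ranges = list(zip(range_fields[0::2], range_fields[1::2]))
    generation = len(ranges)
    if generation < len(GENERATION_LABELS):
        label = GENERATION_LABELS[generation]
    else:
        label = f"generation {generation}"
    return {
        "wu_number": int(fields[3]),
        "ranges": ranges,
        "generation": generation,
        "label": label,
    }


if __name__ == "__main__":
    fifth = ("BETA_ CMD2_ 0001-GPDAA.clustersOccur-TPM2A.clustersOccur_ "
             "407_ 141444_ 141477_ 141467_ 141469_ 141469_ 141469_ 0--")
    info = parse_wu_name(fifth)
    print(info["label"], info["ranges"][-1])
```

Under these assumptions, the last pair of a name whose start and end are equal would correspond to a single-position WU.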
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
A beta is a beta: it exists to find out exactly what the best method is for this project. knreed posted somewhere that he'd done the full circle and figured out how to do this without great-great-grand children and without ending up with single-position, 6-second work units in large numbers ;>)
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The fifth WU in my list above was a great-grandchild and lasted only 9 sec on my relatively slow machine. It was a one-position WU. I think it might not be possible to avoid these short one-position WUs if two adjacent positions differ from very hard-to-calculate (>1h) to very easy (some seconds).
Another suggestion: A WU should always complete at least one position. From this thread it seems that WUs get aborted even if the calculation of the first position has not been finished. Only my 2ct. Greetings Thorsten |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline |
Workunits do always compute at least one position. We only check whether or not to exit the workunit after the computation of a position (think of the position as the smallest indivisible unit of work: after each position, % complete is updated and the termination check is made).
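The control flow described here can be sketched in a few lines of Python. This is not the project's actual code; `run_workunit`, `compute_position` and `should_exit` are hypothetical names chosen for illustration. The sketch only shows why at least one position always completes: the exit check sits after the computation.

```python
from typing import Callable, Sequence


def run_workunit(
    positions: Sequence[int],
    compute_position: Callable[[int], None],
    should_exit: Callable[[int, float], bool],
) -> int:
    """Process positions one at a time; return how many were completed."""
    completed = 0
    total = len(positions)
    for position in positions:
        # A position is treated as indivisible: it is never interrupted.
        compute_position(position)
        completed += 1
        percent_complete = 100.0 * completed / total
        # The termination check happens only after a position has finished,
        # so the first position is always computed.
        if should_exit(completed, percent_complete):
            break
    return completed


if __name__ == "__main__":
    # Even an exit check that always says "stop" lets one position finish.
    done = run_workunit([10, 11, 12], lambda p: None, lambda n, pct: True)
    print(done)
```

Any positions left over when the loop stops early would be the ones handed on to child workunits.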
One of the issues that is huge for us but not readily apparent to the members is the time it takes a batch to complete. We get the data from the researchers in a batch and return a completed batch back to them. For many reasons it is good to minimize the time it takes for us to complete a batch. Each generation of workunits that is created based on an original workunit significantly extends the length of time that it will take to complete the batch. In this beta test we wound up having 5 generations of workunits. This was with a limit of 10 children created for each workunit. On the production system, the workunits will have a deadline of 10 days and it is likely to take 5*10+ days to complete a batch (the time it takes to complete a batch is determined by the hard-luck workunit that is the hardest to compute and winds up going to computers that don't return the result). 50-60 days is a very long time to wait for the results to complete. If we had placed a limit of 2 children, then it is very likely we would have had considerably more generations and thus an even longer time for the batch to complete. We are likely to leave the limit of 10 children as the max split per workunit in place. In addition, we are likely going to mark workunits that are great grandchildren or lower as needing reliable hosts to finish. We expect that this will limit batch completion time to around 30 days. |
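The 5*10 figure is plain arithmetic, and a short Python sketch makes the dependence on the number of generations explicit. It assumes a simplified model in which each generation can start only once the slowest result of the previous one comes back at the deadline. The function name `worst_case_batch_days` is hypothetical. The result is a lower bound (the "+" in "5*10+"), since a missed deadline means the result has to be sent out again.

```python
def worst_case_batch_days(generations: int, deadline_days: int) -> int:
    """Lower bound on batch time if every generation runs to its deadline."""
    if generations < 1 or deadline_days < 1:
        raise ValueError("generations and deadline_days must be positive")
    # Simplified model: generations run strictly one after another.
    return generations * deadline_days


if __name__ == "__main__":
    # Figures quoted in the post: 5 generations, 10-day deadline.
    print(worst_case_batch_days(generations=5, deadline_days=10))
    # Each extra generation adds one more full deadline to the bound.
    print(worst_case_batch_days(generations=8, deadline_days=10))
```

The second call uses 8 generations purely as an illustrative value; the post does not say how many generations a limit of 2 children would actually produce.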
||
|
|
|