World Community Grid Forums
Thread Status: Active. Total posts in this thread: 24
JmBoullier
Former Community Advisor, Normandy - France. Joined: Jan 26, 2007. Post Count: 3716. Status: Offline
It's not a matter of understanding in that case; it's a typical case of reading what one has in mind instead of reading what is written.
----------------------------------------
Indeed there are several "strange" things regarding this distribution. First the delay, then the 14-day deadline for your wingman while parent deadlines had already been reduced to 10 days by that date. And on top of it, I think that the distribution priority rule described by knreed should apply only to the first copy of a WU. Once one copy has been distributed, the other ones should get a higher distribution priority to ensure a reasonable turnaround time for the WU.
Cheers. Jean.
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0. Status: Offline
Recently got another one of these:
CMD2_0016-ATPB.clustersOccur-XPO2A.clustersOccur_14_0--  -  In Progress         4/07/09 11:56:18   14/07/09 11:56:18  0.00  0.0 / 0.0
CMD2_0016-ATPB.clustersOccur-XPO2A.clustersOccur_14_1--  -  Waiting to be sent  —                  —                  0.00  0.0 / 0.0
Sekerob
Ace Cruncher. Joined: Jul 24, 2005. Post Count: 20043. Status: Offline
Let us know what happens when the 0-- is back.
----------------------------------------
Theory [how I would design this, and feel free to ROFL, for I don't care too much about those who have a low threshold for that compulsion]: when the 0-- is back, the child copy can already speculatively be cut to cover the next X positions. The 1-- copy can be given what the 0-- has and run it to the end, not limited to the 4-hour run time, since it will probably have reached 60% of the total by then. This way the child goes out soonest, with zero duplication of positions (I've typed about that somewhere else, as a non-original thought, methinks).
WCG
Please help to make the Forums an enjoyable experience for All!
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0. Status: Offline
"Let us know what happens when the 0-- is back."

It's back. Nothing in particular has happened. I've had well over a dozen of these. If the scheduler doesn't send the second copy out close to the first, it never comes back to do so until the original deadline is reached. Since we are dealing with a large number of homogeneous redundancy groups, I wonder if there is some time period after which WUs that nobody asks for fall off the scheduler and get lost?
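[Editorial note] The failure mode hypothesised here, a result that nobody requests getting displaced and never re-offered, can be illustrated with a tiny bounded work cache. This is purely a sketch of the poster's guess, not actual BOINC feeder code; `FeederCache` and its behaviour are invented for illustration.

```python
from collections import OrderedDict

class FeederCache:
    """Fixed number of slots; oldest unsent result is evicted when full."""

    def __init__(self, slots):
        self.slots = slots
        self.cache = OrderedDict()  # result name -> HR class it is locked to

    def load(self, name, hr_class):
        if name not in self.cache and len(self.cache) >= self.slots:
            self.cache.popitem(last=False)  # evict the oldest unsent result
        self.cache[name] = hr_class

    def request(self, hr_class):
        # A host takes the first cached result matching its HR class.
        for name, cls in self.cache.items():
            if cls == hr_class:
                del self.cache[name]
                return name
        return None

feeder = FeederCache(slots=2)
feeder.load("CMD2_x_1", "no-sse2")  # rare class: few hosts ever ask for it
feeder.load("CMD2_y_0", "sse2")
feeder.load("CMD2_z_0", "sse2")     # cache full: the no-sse2 result is evicted

# A non-SSE2 host asking now finds nothing, even though its result exists.
```

With many small homogeneous redundancy groups, rare-class results keep losing their slot to popular-class work, which would produce exactly the long "Waiting to be sent" stalls described above.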
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0. Status: Offline
A little over a day after the 0-- result was returned, the 1-- was sent out. Other missing 1-- WUs were also sent out at about the same time. Maybe some tech intervention was involved?
knreed
Former World Community Grid Tech. Joined: Nov 8, 2004. Post Count: 4504. Status: Offline
Let us know when you see this again. The '1' record was already deleted (download error) and the '2' had been sent out and returned by the time I saw the record, so I couldn't see what happened to the '1'.
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0. Status: Offline
Oh, yes, I did have one which ended up showing the 0-- and 2-- with no 1--. And another permutation: the 1-- sent out first and the 0-- sent out 14 days later:
CMD2_0014-GRP75A.clustersOccur-MLE1.clustersOccur_233_0--  -    In Progress         5/07/09 20:05:39   15/07/09 20:05:39  0.00  0.0 / 0.0
CMD2_0014-GRP75A.clustersOccur-MLE1.clustersOccur_233_1--  614  Pending Validation  21/06/09 11:43:25  21/06/09 19:56:15  4.21  19.8 / 0.0
Sekerob
Ace Cruncher. Joined: Jul 24, 2005. Post Count: 20043. Status: Offline
Apart from this and the known PV-jail issue on complete quorums due to a skipped flag, I don't know what the HCMD2 share is on the Linux platform. That may also add to a longer stay, i.e. it may take longer before the next step can be determined. I think Jean posted that the parent tasks have in the meantime been given a 10-day deadline too, down from 14.
----------------------------------------
As for the 0-- or 1-- going first, I can't think of a necessity for either of the two to have priority over the other. Observing the RICE and HPF2 initial distributions, the order in which 'waiting to be sent' copies go out seems rather random, hence I actually hit the column header to sort them by the sequence in which they went out.
WCG
Please help to make the Forums an enjoyable experience for All!
JmBoullier
Former Community Advisor, Normandy - France. Joined: Jan 26, 2007. Post Count: 3716. Status: Offline
"I don't know what the HCMD2 share is on Linux platform. That may add also to a longer stay... i.e. it to take longer before the next step can be determined."

I have not noticed anything abnormal in this area. When I run with 0.15 day of extra work buffer I have more or less one day of work in PV, which is more or less what I have always had with other projects. Because of the server maintenance I had raised the buffer to 0.5 day, and within one day the number of PVs decreased by 25-30%, which tends to show that these WUs are returning quite fast in general.
Cheers. Jean.
knreed
Former World Community Grid Tech. Joined: Nov 8, 2004. Post Count: 4504. Status: Offline
Kremmen,
Thanks for the reference. What you are seeing is the impact of a finer-grained homogeneous redundancy policy. In particular, the non-SSE2 processors are in their own small group of workers, and a result has to be matched with another non-SSE2 computer to complete the workunit. Because of the small size of the pool, this can take an extended period. We have tried to stay away from fine-grained HR for this reason.
Thanks,
Kevin
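[Editorial note] The homogeneous-redundancy behaviour Kevin describes, where a workunit becomes locked to the class of the first host that receives a copy, can be sketched as follows. The class definition (`OS family` plus SSE2 capability) and the function names are illustrative only, not the actual BOINC scheduler code.

```python
def hr_class(host):
    # Coarse example classes: OS family plus SSE2 capability.
    return (host["os"], "sse2" if host["sse2"] else "no-sse2")

def can_send(workunit, host):
    # An unlocked workunit can go anywhere; a locked one only to hosts
    # in the same homogeneous-redundancy class.
    locked = workunit.get("hr_class")
    return locked is None or locked == hr_class(host)

def assign(workunit, host):
    if not can_send(workunit, host):
        return False
    # First assignment locks the workunit to this host's class.
    workunit.setdefault("hr_class", hr_class(host))
    return True

wu = {"name": "CMD2_0016"}
old_box = {"os": "windows", "sse2": False}  # tiny pool of such hosts
new_box = {"os": "windows", "sse2": True}   # large pool

assign(wu, old_box)  # first copy locks the WU to the no-sse2 class
# The wingman copy now has to wait for another non-SSE2 host,
# which is why small HR classes can take a long time to complete a quorum.
```

Once locked, the second copy sits in "Waiting to be sent" until another host from the same small class asks for work, matching the long waits reported earlier in the thread.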