World Community Grid Forums
Category: Completed Research | Forum: Help Cure Muscular Dystrophy - Phase 2 | Thread: "Waiting to be sent" - Please adopt a CMD2 work unit [Resolved]
Thread Status: Active | Total posts in this thread: 30
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Many thanks, Alessandra, for the update.
And at the same time, on behalf of all of us here, congratulations on being honored last year as "Woman Scientist of the Year" by the jury of the 9th Irène Joliot-Curie Prize from the French Ministry of Higher Education and Research. Félicitations !!
[Edit 1 times, last edit by Former Member at Aug 14, 2011 8:10:45 AM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Count me in Jean Pierre, congrats to Alessandra Carbone !
Mysteron347
Senior Cruncher Australia Joined: Apr 28, 2007 Post Count: 179 Status: Offline Project Badges: |
"What we now see is typical project end-game behavior... hours going up, credit per hour going down, brief hikes when the repairs on "No Reply" (NR) go out... the statistics of small numbers."

BUT - that is the case when the project is operating normally, not when the output of work has been abruptly stopped. BOINC hasn't yet stabilised on my new machine and I'm still micromanaging it. When the supply of HCMD2 work was turned off, it was turned OFF - there were no repair units being sent, no No-Replies or completion work - NONE.

I had a few tasks in my cache that I knew were going to be returned late. I noted in this very thread on August 2nd that the system had generated a repair unit, but on arrival of the late unit that repair had been marked with a status of "Other" and sent/received dates of 1/01/70 00:00:00. I'll presume that when the queue is re-started these units will be deleted. The reason I did this was curiosity (I have to do it nowadays - the cat died a long time ago.) I knew that the work had been turned off because I'd received no more in the prior 7 days - and it had been documented in this forum. There was therefore no danger of the job being sent to another machine.

The point is that every unit returned after August 4th was returned late. Every last one. Regardless of the numbers involved, the chart indicates that the runtime on these units was significantly higher than the average before the switch was thrown. Unsurprisingly, this would be because slower processors, given the same mix of units, would take longer to process them and would be likely to hit the wall more often. The issue is to get those slower processors to do useful work, not simply slow down the system as they are currently doing.

"On NR, as long as those tasks are on the live system, they can come in, but if a make-up copy was sent even though the project is on hold [did they send copies?], then these 23-day overdue returns are redundant." All very well for a live system.
This is not a situation that has occurred on a live system. It is a situation that occurred when the system was turned off. No make-up copies have actually been sent, so those very late results are not redundant. They will be used.

P.S. I'd suggest the 10-day deadline needs a closer look, if this published data is to be trusted. "Give us a good reason, other than the progress percent / remaining days, why it should not be trusted to a very large degree?" I can't control the data published on the HCMD project status page. There doesn't seem to be any data from WCG about how much data has actually been delivered. Since the only data visible shows that data delivery for batches 1785+ is incomplete, and those were being crunched what - two months or more ago - there must still be incomplete work from that time outstanding. It's probably not significant at this stage of the project, but extending the deadline would allow results from slower crunchers to be used where they are currently discarded, delaying the work.

I'd suggest that the deadline for any project, not just HCMD2, should be a minimum of (maximum expected runtime + maximum BOINC cache size in days). Maximum runtime would be very difficult to estimate, I'd suppose - but matching the job size to the individual processor capacity available would help significantly, and the 6/12-hour HCMD2 wall would set a limit too. Perhaps at the start of a project the deadline could be let out significantly and then brought in as the project nears completion. The downside would be more credit sitting in Pending Validation (PV).
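The deadline rule suggested above can be sketched in a few lines. This is a hypothetical illustration with assumed numbers, not anything WCG publishes; the function name and the example host are mine:

```python
# Sketch of the suggested minimum deadline:
# (maximum expected runtime, in days) + (maximum BOINC cache size, in days).
def suggested_deadline_days(max_runtime_hours, crunch_hours_per_day, cache_days):
    """Days a slow, part-time host may legitimately need to return a result."""
    runtime_days = max_runtime_hours / crunch_hours_per_day
    return runtime_days + cache_days

# A host crunching 2 hours a day against the 12-hour HCMD2 wall, with a
# 10-day cache, needs 16 days - well past the current 10-day deadline.
print(suggested_deadline_days(12, 2, 10))  # 16.0
```

Under these assumptions, any result from such a host would always arrive after the cutoff and be discarded, which is the waste the post is pointing at.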
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Suppose my English is not up to your snuff... it's fine. Do compare the curves with past projects in their end phases... they're quite similar, whether makeup copies were sent out or not.
--//--
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: |
Mysteron347,
I suspect that these tasks returned after so many days have nothing to do with the speed of those machines, because even the slowest machine will reach the 6-hour limit in time if it runs an average of one hour a day, whatever the speed of the processor.

Considering that it is currently Summer holiday time in many countries, my guess is rather that these tasks are coming back from machines which were switched off for one or two weeks of holidays and which, probably to the surprise of their owners, are allowed to finish the job and be granted credits normally when they start crunching again, simply because the clock was suspended for this project.
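The arithmetic behind this point is easy to check. A sketch with assumed numbers (the 6-hour wall and the 10-day deadline are from the thread; the one-hour-a-day host is the hypothetical):

```python
# The 6-hour wall caps CPU time, not elapsed time: a host crunching only
# one hour a day still hits the wall in 6 days, inside the 10-day deadline.
def days_to_hit_wall(wall_hours=6.0, crunch_hours_per_day=1.0):
    """Elapsed days for a part-time host to accumulate wall_hours of CPU time."""
    return wall_hours / crunch_hours_per_day

print(days_to_hit_wall())  # 6.0 - even a slow, part-time host makes the deadline
# A 20-plus-day return therefore points to a machine that was switched off
# (e.g. for holidays), not one that was merely slow.
```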
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
Agreed, the speed of the processor can have no effect on a project that is limited to a six hour cutoff. (The 12 hour cutoff would only come into effect on a pretty hefty processor or a very small work load.)
I know I have a machine that has gone down for the vacation count. Those results will eventually hit NR and be resent, but if I had loaded it up with HPF2, they'd pick up right where they left off as soon as the machine is turned back on and report for valid credits - even if that is two months from now.
Distributed computing volunteer since September 27, 2000
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Typically the slower devices are the later returners, so when the normal mix in production gives 4.35 hours and the residuals in the field come in, you'll see exactly what was observed: a creep upward, the difference here being that it is capped somewhere between 6 and 12 hours. The fewer results that get returned, the more weight a long-running result has on the averages - and sure enough, for the 54 results that came in this morning the mean is 5.54 hours, more than an hour above full production, with actually a declining trend as the batches get lighter to process.
As for the 12-hour hard cut, I see no reason why a slow device pair would not hit upon a task that goes the full stretch. It all depends on the positions packed in a result and their difficulty. WCG does not yet distribute tasks based on CPU power; as I mentioned in a post above, that's one of the long-term goals. You'd have to visit posts by knreed for some elaboration.
--//--
[Edit 1 times, last edit by Former Member at Aug 15, 2011 12:33:36 AM]
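The creep in the mean is a small-sample effect, easy to illustrate with made-up numbers (only the 4.35-hour production mean comes from the post above; the straggler values are invented for the sketch):

```python
# With few returns, each long, wall-capped result weighs more in the average.
def mean(hours):
    return sum(hours) / len(hours)

production = [4.0, 4.2, 4.35, 4.5, 4.7] * 200  # many ordinary results
late_tail  = [4.5, 5.0, 6.0, 6.0, 11.8]        # a handful of stragglers near the wall

print(round(mean(production), 2))  # 4.35
print(round(mean(late_tail), 2))   # 6.66 - over an hour above production
```

A single 11.8-hour result barely moves a thousand-sample production mean, but in a batch of five it dominates, which is the observed end-game creep.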
BSD
Senior Cruncher Joined: Apr 27, 2011 Post Count: 224 Status: Offline |
Come on wingman, crunch, crunch, crunch.
CMD2_2052-2I85_A.clustersOccur-2IGQ_B.clustersOccur_0_12499_16476_2 -- In Progress 8/18/11 09:32:52 8/22/11 09:32:52 0.00 0.0 / 0.0 <---
CMD2_2052-2I85_A.clustersOccur-2IGQ_B.clustersOccur_0_12499_16476_1 -- 640 Pending Validation 7/22/11 13:01:05 7/23/11 13:00:31 2.67 35.3 / 0.0 <-- Mine
CMD2_2052-2I85_A.clustersOccur-2IGQ_B.clustersOccur_0_12499_16476_0 -- 640 Error 7/22/11 13:00:52 8/3/11 11:18:30 0.00 0.0 / 0.0
BSD
Senior Cruncher Joined: Apr 27, 2011 Post Count: 224 Status: Offline |
CMD2_2052-2I85_A.clustersOccur-2IGQ_B.clustersOccur_0_12499_16476_1 -- 640 Valid 7/22/11 13:01:05 7/23/11 13:00:31 2.67 35.3 / 35.5
Mysteron347
Senior Cruncher Australia Joined: Apr 28, 2007 Post Count: 179 Status: Offline Project Badges: |
Hmm - this seems to be the thread where the infamous "Legend of September" was created.
Maybe September's right, but we got the year wrong?
[Edit 1 times, last edit by Mysteron347 at Sep 21, 2011 1:14:44 AM]