World Community Grid Forums
Category: Completed Research | Forum: Help Cure Muscular Dystrophy - Phase 2 | Thread: Do not think it helps to stock up large caches...
Thread Status: Active | Total posts in this thread: 39
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Sorry Sek, that was not intended as it came across; I got pulled away and did not finish my train of thought.
----------------------------------------
I have also seen this, and I also have a PV list that always seems to have a few wingmen go "No Reply" after waiting a week or longer to validate. I had 3 clear today: 2 from retreads, 1 late return. My personal feeling is that this happens most on Mondays, on jobs with weekend complete-by times. Maybe boxes get shut down Friday night and don't get restarted until Monday AM? I know we have a good number of corporate machines crunching that follow this pattern.

I would like to see some type of check-in system for jobs that are uploaded but not yet "Reported"; I'm not sure whether all the data needed is uploaded at completion or not. It could have saved at least one job that I know of from getting reissued. "Server Abort" does work on WCG, but only if the client reports to the server AND the job has not yet started. Likewise on reissues: if a late job comes in, the reissue will get a "Server Abort" IF it contacts the server AND has not started.

For systems that are only on part of the day (day workers, etc.), the client does adjust the cache accordingly: e.g. on 12 hrs, off 12 hrs, with a cache setting of 1.5 days, the client only downloads approx. 18 hrs of work. Within the constraints of the DCF, that is ;)

I have been crunching here almost 3 years and have never run out of work. I have intermittent systems, some even on dial-up that only connect when they need to, and I have still never had a need for a large cache. The only reason I use 1.5 days is because it gives me the smallest PV list; 95% of all WUs return in 2-3 days.

EDIT: the only thing I see a 10-day cache doing is holding jobs on that computer for 8-9 days before they get started, when they could have gone to someone else and been back a week sooner.

[Edit 1 times, last edit by Former Member at Apr 20, 2010 12:25:51 AM]
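The uptime arithmetic in the post above (1.5-day cache, machine on 12 hours a day, roughly 18 hours of work fetched) can be sketched as a toy model. This is illustrative only, not BOINC's actual work-fetch code; the function name and the simple division by DCF are assumptions:

```python
def approx_work_fetched(cache_days, hours_on_per_day, dcf=1.0):
    """Toy model: hours of work a client might download.

    The client scales its cache by the fraction of the day the
    machine is on; the Duration Correction Factor (DCF) inflates
    runtime estimates, so a higher DCF means fewer hours fetched.
    Illustrative assumption only -- not BOINC's real logic.
    """
    uptime_fraction = hours_on_per_day / 24.0
    return cache_days * 24.0 * uptime_fraction / dcf

# The poster's case: 1.5-day cache, on 12 h / off 12 h per day
print(approx_work_fetched(1.5, 12))  # -> 18.0
```

With a DCF above 1.0 (runtimes longer than estimated), the same settings fetch proportionally less work, which matches the "within the constraints of the DCF" caveat.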
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I set my cache for 0.1 days.
I can't remember a time when I've run out of work. There is also that check box to prevent such a thing... you know which box.
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7595 Status: Offline Project Badges: |
I have machines which are only intermittently connected to the internet. On these machines I use a 4- or 5-day cache, depending on the speed of the machine. This gives room for some margin of error. I take my modem and connect these machines about that often. I expect these machines to crunch 24/7, but once in a while something will happen, such as a power outage, hard drive failure, etc., and I lose jobs. Technical glitches, so to speak.
----------------------------------------Given the number of machines on WCG, it is statistically probable that there will be some small percentage every day which do not return work, for various reasons. The system is designed to take this factor into account with re-issues etc. So, as the song says, "Don't worry, be happy." The work will get crunched, just not as fast as we would all like it to. Cheers
Sgt. Joe
*Minnesota Crunchers* |
Mysteron347
Senior Cruncher Australia Joined: Apr 28, 2007 Post Count: 179 Status: Offline Project Badges: |
I'd go along with Sgt. Joe on this, and I've reset my cache to 5 days.
If a 10-day cache "near guarantees" late reports, then this indicates that the job run-time is being consistently underestimated. With such a large cache, the long jobs should cancel the short. Certainly, manual cancellation simply delays the WU completion overall. I found I was cancelling a few jobs once a month or so, and reducing the cache should make that unnecessary.

Again, we should be careful not to generalise too much. Since I'm unemployed, I have plenty of time to monitor progress and compensate - gives me something to do. An unattended installation will be different, no doubt.

As for "How often do these floods/rains/snow/fires/ISP collapses occur on your end and what has been their max days calamitous duration?"

Er, well, floods aren't a problem here for me. It gets a bit hard even for Mother Nature to flood the Indian Ocean - and we'd have a few more urgent problems than WCG runtimes if that occurred.

Rain? That's another matter. The first rains of winter tend to wash out the dust accumulated on the insulators over summer. That normally leads to a short power outage, sometimes hours but it can be a day or more. Power is more often interrupted by the latest speeding clown wrapping themselves around a power pole, though. That has been known to happen every Saturday night, sometimes for weeks on end.

Snow? Snow within 1000 km would be all over the news. More of a problem in Canadia and Russia and Northern Europe, I hear.

Did have a bit of a hailstorm a few days ago, though. It didn't affect me, but some places not too far away were blacked out for a couple of days. The same places were affected similarly by fires a few months back. Not as bad as the Victorian fires last year, which coincided with the Queensland floods - and there have been more floods in the East this year. Priority for some reason is to get the people and animals safe first, then make sure they're supplied with beer and food. Inexplicably, restoring internets is further down on the list...
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Sgt.Joe wrote: So, as the song says, "Don't worry, be happy." The work will get crunched, just not as fast as we would all like it to.

Yes, it will get crunched, but the point here is that we are unnecessarily wasting other users' CPU time on make-up work, because a 10-day cache = a 10-day deadline.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Another "feature" that is frustrating: the job I went No Reply on was actually completed and uploaded 8 days prior, but sat in "Ready to Report" because the client machine was turned off. I've never understood this behavior in BOINC. I don't see the point of this 2-step activity to report a completed WU (Finish WU -> Ready to Report -> Sent).
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Former Member wrote: Another "feature" that is frustrating: the job I went No Reply on was actually completed and uploaded 8 days prior, but sat in "Ready to Report" because the client machine was turned off. I don't see the point of this 2-step activity to report a completed WU (Finish WU -> Ready to Report -> Sent).

The 2-step exists for good reasons. First, to make absolutely sure that what the client sent is the same as what the servers received; that first step is just a flat database taking in the data records. The second part goes to the highly taxed scheduler, which determines what you have / had / are receiving. You don't want, like yesterday, 635,000 unique hits on the scheduler doing all the calculations for each (7.4 times per second). You want to combine multiple Ready to Reports (RtR) so the scheduler can handle them as a single transaction.

In the example case, the RtR would at latest have been reported within 24 hours, but absent any contact, nothing can be done. If one knows these road trips will happen, briefly look in and hit the Update button again. That's what I do before closing the lid. Very much in a nutshell; please read the BOINC Wikis and FAQs for more info if interested.

PS: fredski..., there is an incentive. The fast returners are, as much as possible, matched to fast returners for quorum requirements... at least, I'm observing dramatic improvement from what it was before. Not a lot of work sits in PV jail at any one time. Those that don't get matched is simply a case where a job was sent to number 1, and the scheduler will only hold its partner a limited time before sending number 2 to anyone who asks for a job. Even HPF2, which requires a minimum of 15 to start quorum checking, is usually complete here within 24 hours. Who can resist that? For WCG it means hundreds of thousands fewer results in PV jail held on the scheduler, and at least an equal number fewer In Progress, if we can get that cache number down collectively.
WCG Global & Research > Make Proposal Help: Start Here!
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Apr 20, 2010 5:58:46 AM] |
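Sekerob's point about batching multiple Ready-to-Report results into a single scheduler transaction can be sketched like this. The class and method names are invented for illustration; this is not BOINC client code:

```python
# Illustrative sketch of why "Ready to Report" exists as a
# separate state: completed results queue up locally and are
# handed to the scheduler in ONE RPC, instead of one RPC per
# result. All names here are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class Client:
    ready_to_report: list = field(default_factory=list)
    rpcs_made: int = 0

    def finish_task(self, name):
        # Step 1: output files uploaded, result marked Ready to Report
        self.ready_to_report.append(name)

    def contact_scheduler(self):
        # Step 2: a single RPC reports every queued result at once,
        # so the scheduler does its bookkeeping in one transaction
        batch = self.ready_to_report
        self.ready_to_report = []
        self.rpcs_made += 1
        return batch

c = Client()
for n in ("faah_001", "faah_002", "hcmd2_003"):
    c.finish_task(n)
reported = c.contact_scheduler()
print(len(reported), c.rpcs_made)  # -> 3 1
```

Three finished results, one scheduler contact: that is the load-saving Sekerob describes, and also why a result can sit in "Ready to Report" for days if the machine is switched off before the next scheduler contact.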
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Mysteron347 wrote: [snip]
If a 10-day cache "near guarantees" late-reports, then this indicates that the job run-time is being consistently underestimated. With such a large cache, the long jobs should cancel the short.

With HFCC/FAAH/HCMD2/DDDT2 it is near impossible to get exact estimates... remember, most of what we calculate is non-deterministic. Statistically one would expect long and short to cancel out, but the controls work on the current series of work to guesstimate the TTC of the rest. So if there is a short series of HCMD2 grandchildren that came with a 4.5-hour estimate [the project running average] and they take just 1-3 hours, the backfill is affected. Then when the parents come in again with 6-12 hour run times... panic, and that changes client behavior for longer, and we see posts about "Why is my client not getting work?"

Not going to expand on this: all forum frequenters have seen many discussions on Duration Correction Factor (DCF) blow-outs on the client side due to the variability. I'm reading of future developments (following the developers' check-in list, and I saw a few cheers by knreed elsewhere) that will mitigate this further, but that will be next year at the earliest.
WCG Global & Research > Make Proposal Help: Start Here!
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Apr 20, 2010 6:19:45 AM] |
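The DCF "blow-out" behaviour described above can be modelled with a deliberately simplified update rule. This is an assumption loosely inspired by how older BOINC clients were reported to behave, not the actual algorithm: the factor jumps up immediately when a job runs longer than estimated, but decays back down only gradually, which is why one batch of long "parents" after short "grandchildren" can depress work fetch for a while.

```python
def update_dcf(dcf, estimated_hours, actual_hours):
    """Simplified DCF update (illustrative assumption, not the
    real BOINC algorithm): raise immediately on underestimates,
    ease back down slowly on overestimates."""
    ratio = actual_hours / estimated_hours
    if ratio > dcf:
        return ratio                    # long job: jump up at once
    return 0.9 * dcf + 0.1 * ratio      # short jobs: drift down slowly

dcf = 1.0
# Short HCMD2 "grandchildren": 4.5 h estimate, ~2 h actual.
# After five of them the DCF has only drifted down to ~0.77.
for _ in range(5):
    dcf = update_dcf(dcf, 4.5, 2.0)
# One long "parent" (4.5 h estimate, 9 h actual) doubles it at once,
# so every remaining estimate suddenly reads twice as long.
dcf = update_dcf(dcf, 4.5, 9.0)
print(round(dcf, 2))  # -> 2.0
```

The asymmetry (fast up, slow down) is the client protecting itself against missed deadlines, at the cost of the "Why is my client not getting work?" symptom when estimates swing.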
PecosRiverM
Veteran Cruncher The Great State of Texas Joined: Apr 27, 2007 Post Count: 1053 Status: Offline Project Badges: |
My problem: I just bumped my cache (going out of town) and got 6 repair jobs on a slow duo (30 hr/WU). I sure hope they run a little faster than listed.
---------------------------------------- |
nasher
Veteran Cruncher USA Joined: Dec 2, 2005 Post Count: 1422 Status: Offline Project Badges: |
Yes, that is the biggest problem with this project: the inability to judge ahead of time how big a work unit will be. If at all possible, keep your machine connected at all times with a very small cache, and that should keep you crunching well. If you can't, then you can't.
----------------------------------------Honestly, if a work unit doesn't get returned, then it doesn't get returned, and someone else will crunch it. I can't remember what project it was, but at one time I had over 8 pages of Pending Validations with just 4 CPU cores running in total. Back when there were RICE units to crunch, it was easy to figure out how many you needed, because they were basically a set run time. Crunch and be happy... and go for the next level of badge.