World Community Grid Forums
Category: Support | Forum: BOINC Agent Support | Thread: Unable to download tasks for a week
Thread Status: Active | Total posts in this thread: 29
Magiceye04
Cruncher Joined: Jul 5, 2008 Post Count: 36 Status: Offline Project Badges: |
Another week with broken download server. When will Krembil grab the phone and call a specialist for such problems?
|
||
|
Link64
Advanced Cruncher Joined: Feb 19, 2021 Post Count: 116 Status: Offline Project Badges: |
"Yeah, it is the best for the system, server and bandwidth when we all stop crunching"
We don't need to stop crunching, but we definitely shouldn't do stupid things that generate more load on the servers without being useful in any way. This applies in particular to people aborting all work units, or even detaching and reattaching and trying to get other work units, as if that would change anything. |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1928 Status: Offline Project Badges: |
@CrystalPellet & @Link64
First of all, let me apologize for a somewhat belated reply. I have been working 17-18h days all week long. Second, I want to apologize for my assertion that the file with that random file name was NOT part of MCM1. It apparently is. But...

While busy at work, I had even less time to babysit my hosts than I usually have. On Thursday (don't recall the time of day), I had four (4) hosts run out of WCG work, due to a (couple of) results stuck uploading, hence no report, hence no new WUs able to be downloaded. On one (1) of these hosts, when kicking the tires to get a stuck upload going again, I noticed that 4x MB MCM1 related file being downloaded, and not only those 912/915 byte sized MCM1 WU files. One time, on one host. Not on any of the other stuck hosts, nor on any of the previous occasions when MCM1 downloads had been stuck. Not on any of the 590,000 MCM1 WUs that I have returned since MCM1 first started.

The only times I have seen a file of this size being downloaded were when I had added MCM1 to a new host, like a new laptop of mine a couple of weeks ago, as part of the initial set of files when the first WUs are downloaded, or a couple of times when WCG was down for a prolonged period since Krembil took over, usually more than 24h continuously. And all of this time, even after the move to Toronto, the size of this file has never been a problem. I don't know how long this file is "valid" and MCM1 WUs keep referring to it, but in my experience/observation this is likely at least a couple of months. It is definitely NOT the case that this file is being downloaded excessively when you just run out of work; rather, it is likely the result of people resetting the project or detaching/re-attaching, and it is rather questionable what that would fix in the first place. Hence my objection that making this file "sticky" doesn't really help to alleviate the current (bad) situation.

All the current problems with downloads and, apparently since some time yesterday (Friday, my time zone), uploads are due to the very nature of the ARP1 project. This was already obviously the case 2 years back, when the same kind of symptoms, stuck downloads (don't recall excessive upload issues back then), appeared at the same time that new ARP1 WUs were released together with a huge number of rather short OPNG WUs. It certainly is not an MCM1 problem.

It is (well, should have been) clear from the onset that ARP1 is vastly different from all the other projects on WCG within the last 3-4 years, even when this was still run under IBM's auspices. From the start, it has been made clear that this project has serious resource requirements, not only for the hosts that are trying to crunch it, but for the overall infrastructure at WCG as well. And it is that latter issue that unfortunately a lot of folks, as is clearly evident from their posts, simply don't care about, just ignoring those restrictions. Their selfish attitude isn't likely to CAUSE the general problem, but it is definitely contributing to making a bad situation worse. And apparently, some of those folks are simply not willing, in their ignorance, to see this and do their part to try and ease the overall situation, for everyone.

Ralf |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7545 Status: Offline Project Badges: |
It is readily apparent from the total work units returned that many hosts are not getting enough work units to crunch and when they are crunched, are hung on the upload portion. And this slowdown has happened with the most ARP units being returned in the 6,000 range.
Perhaps, until the infrastructure at the hosting site can actually handle the traffic and bandwidth requirements, the number of ARP work units being released should be rationed. They could start with a level of 60 per hour and work their way up to whatever level does not bog the system down. This would be one per minute, which should hamper anyone from hoarding or stockpiling too many. If the connection were terminated after the one download of ARP, and the next try from that IP address went to the back of the queue to wait for its next turn, it would also probably free up enough bandwidth to alleviate the problems with MCM downloads and uploads. This would probably involve some tweaking on Krembil's part, but it would probably alleviate a lot of user frustration. Cheers
Sgt. Joe
*Minnesota Crunchers* |
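The rationing idea in this post can be sketched roughly as follows. This is only an illustration under stated assumptions: `ArpRationer`, `request`, and `tick` are hypothetical names, it is not BOINC's or Krembil's actual scheduler code, and a real server would also have to handle timeouts, persistence, and hosts sharing an IP address.

```python
from collections import deque


class ArpRationer:
    """Hypothetical sketch: release at most `per_hour` ARP work units per hour,
    one per turn, with requesters waiting in a first-come, first-served queue."""

    def __init__(self, per_hour=60):
        self.interval = 3600.0 / per_hour  # seconds between releases (60/hour -> 60 s)
        self.next_release = 0.0            # earliest time the next unit may go out
        self.queue = deque()               # waiting requester IPs, oldest first

    def request(self, ip):
        """Put an IP at the back of the queue; an IP holds only one place at a time."""
        if ip not in self.queue:
            self.queue.append(ip)

    def tick(self, now):
        """Return the IP that gets one ARP unit at time `now` (seconds), or None."""
        if not self.queue or now < self.next_release:
            return None
        ip = self.queue.popleft()
        self.next_release = now + self.interval
        return ip


rationer = ArpRationer(per_hour=60)
rationer.request("10.0.0.1")
rationer.request("10.0.0.2")
print(rationer.tick(0.0))   # first requester is served
print(rationer.tick(30.0))  # too early for the next release
```

Raising `per_hour` step by step corresponds to "working their way up" to whatever level the servers can sustain; a host that asks again after being served simply rejoins at the back of the queue.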
||
|
merboy
Cruncher CA Joined: Nov 17, 2004 Post Count: 7 Status: Offline Project Badges: |
Everyone should switch to 'No New Tasks' until this is fixed. Seriously, they don't deserve any of our efforts until this server-side garbage is resolved. Again, for the 82nd time since conversion. smh.
|
||
|
Craig S.
Cruncher Joined: Nov 19, 2005 Post Count: 2 Status: Offline Project Badges: |
It's either a load-balancer issue (e.g., if you hit retry a lot it'll eventually go) or a storage issue on their back end. Either way, the fact that this has occurred before, and that no one on their administrative side either knows about the problem or can fix it, tells me that their IT staff are lackadaisical or incompetent or both.
[redacted] I could half [redacted] their infrastructure better than they are doing at present. At least I know how to run an enterprise class data center.
[Edit 1 times, last edit by savas at Nov 19, 2024 10:39:57 PM] |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1928 Status: Offline Project Badges: |
"It's either a load-balancer issue (e.g., if you hit retry a lot it'll eventually go) or a storage issue on their back end. Either way, the fact that this has occurred before, and that no one on their administrative side either knows about the problem or can fix it, tells me that their IT staff are lackadaisical or incompetent or both."
The problem is that they do not have the financial backing for an enterprise class data center setup. That includes simply not having the financial means to have 24/7 staff on hand, which results in rather frustrating weekends. And right now, they actually don't have a "communications intern" like the last two years, so at times the previous two weren't very responsive either...
"[redacted] I could half [redacted] their infrastructure better than they are doing at present. At least I know how to run an enterprise class data center."
Ralf |
||
|
Eric Pohlke
Cruncher Canada Joined: Feb 4, 2006 Post Count: 15 Status: Offline Project Badges: |
My WUs started getting deferred last week, around Nov 15. Yesterday, my twin EPYCs were waiting to upload results for over 620 WUs, which will reach expiry in 6 hours.
The problem started when the last director took on the World Community Grid project at Krembil; they were excited because they heard and saw a lot of potential for free medical research crunching. However, the new director (Jaideep Bains) is not so inclined toward the project, fails to realize its potential, and has thus cut funding to it. It's a real shame. Some clients are not just using a cell phone or laptop, but have a small data center server to handle impressive workloads at great speeds. And yet there are only 1 or 2 people assigned to the project, on shift only a few days a week for a few hours. Weekends, holidays, forget it. It's a real shame that the institute, which specializes in medical research, doesn't see this project's full potential when working. |
||
|
Eric Pohlke
Cruncher Canada Joined: Feb 4, 2006 Post Count: 15 Status: Offline Project Badges: |
In their annual reports, there is nothing about the World Community Grid Project.
Yet their financial sponsor list is huge.
Sponsors: Abbott AbbVie Aerie AGE-WELL American Academy of Neurology American Foundation for Surgery of the Hand American Medical Systems American Society of Regional Anesthesia and Pain Medicine Amgen Anavex AOSpine Aria Pharmaceuticals Arthritis Research Foundation Arthritis Society Canada AstraZeneca Atuka Aurinia Pharmaceuticals Autism Speaks Avicanna Avir Axoltis Azure Banting Research Foundation Bayer Biogen Bioness BlueRock Therapeutics Boston Scientific Brain Aneurysm Foundation of Canada Brain Canada Bright Focus Foundation Bristol Myers Squibb Canada Research Chairs Canadian Blood Services Canadian Cancer Society Canadian Initiative for Outcomes in Rheumatology cAre Canadian Institutes of Health Research Canadian Pain Society Canadian Rheumatology Association Canadian Stroke Consortium CannScience Innovations CDKL5 Canada Celgene Celixir CENTOGENE Centre for Addiction and Mental Health Centre hospitalier de l'Université de Montréal Cerenovus Cerevel Cervical Spine Research Society Christopher & Dana Reeve Foundation Competitive Drug Development Corindus CorNeat Vision Cure PSP Dravet Syndrome Foundation Dystonia Medical Research Foundation EBT Medical Eli Lilly Endogena Therapeutics Epilepsy Canada Epygenix Therapeutics ES Therapeutics Fighting Blindness Canada Fisher & Paykel Healthcare Fresenius Kabi Functional Neuromodulation Fusmobile GE Canada Genentech Gilead Sciences Glaucoma Research Society of Canada GlaxoSmithKline GRAPPA Green Valley Harrington Discovery Institute Healios Health Canada Heart and Stroke Foundation of Canada HebeCell InnoCentive INSIGHTEC Insmed International Parkinson and Movement Disorders Society International Spine Study Group Foundation Iqvia Jaeb Center for Health Research Janssen Johns Hopkins University Lahey Clinic Foundation Lawson Health Research Institute LifeArc Lucid Lung Health Foundation Lupus Canada Lupus Foundation of America Lupus Ontario Lupus Research Alliance Massachusetts General Hospital McMaster University Medical Decision Modeling Medpace Medtronic Merck Milken Institute Mount Sinai Hospital Multiple System Atrophy Coalition National Institutes of Health National Organization for Rare Disorders National Psoriasis Foundation Natural Sciences and Engineering Research Council of Canada Neuraly New Frontiers in Research Fund Novartis Novo Nordisk Omeract Ontario Brain Institute Ontario Centre of Innovation Ontario Institute for Cancer Research Ontario Ministry of Health Ontario Ministry of Long-Term Care Ontario Ministry of Research and Innovation Oregon Health & Science University Organon panCELLa Paralyzed Veterans of America PAREXEL Parkinson Canada Parkinson's Foundation Passage Bio Patient-Centered Outcomes Research Institute Penumbra Pfizer Pharma Two B Philips Physicians' Services Incorporated Foundation Population Health Research Institute Praxis Spinal Cord Institute PROCEPT BioRobotics Prodeon Medical Queen's University ResMed Foundation Revance Rick Hansen Foundation Roche Rush University Medical Center Sanofi Savoy Foundation Scientus Pharma Sharon Francis Institute Shire Shoppers Drug Mart SickKids Foundation Spinal CSF Leak Canada Spinal Research Spondyloarthritis Research and Treatment Network Stanley Medical Research Institute Stem Cell Network Steminent Biotherapeutics Stryker Sunnybrook Health Sciences Centre Systemic Lupus Erythematosus International Collaborating Clinics The Aneurysm and AVM Foundation The Foundation of the American Society of Neuroradiology The MAYDAY Fund The Michael J. Fox Foundation for Parkinson's Research The Plastic Surgery Foundation The Princess Margaret Cancer Foundation The War Amps Theranexus Theravance Toronto Metropolitan University Transport Canada UCB United States Department of Defense Unity Health Université de Sherbrooke University Medical Centre Utrecht University of Alberta University of British Columbia University of Calgary University of Guelph University of Manchester University of Ottawa University of Pennsylvania University of Texas University of Toronto University of Virginia Vertex Pharmaceuticals Weston Family Foundation Wings for Life Women's College Hospital WSIB Zenflo |
||
|
|