Thread Status: Active
Total posts in this thread: 29
Posts: 29   Pages: 3   [ Previous Page | 1 2 3 ]
This topic has been viewed 1055 times and has 28 replies
Magiceye04
Cruncher
Joined: Jul 5, 2008
Post Count: 36
Status: Offline
Re: Unable to download tasks for a week

Another week with a broken download server. When will Krembil pick up the phone and call a specialist for such problems?
[Nov 16, 2024 8:10:05 AM]
Link64
Advanced Cruncher
Joined: Feb 19, 2021
Post Count: 116
Status: Offline
Re: Unable to download tasks for a week

Yeah, it is the best for the system, server and bandwidth when we all stop crunching
We don't need to stop crunching, but we definitely shouldn't do stupid things that generate more load on the servers without being useful in any way. This applies in particular to people aborting all work units, or even detaching and reattaching and trying to get other work units, as if that would change anything.
----------------------------------------

[Nov 16, 2024 10:14:46 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1928
Status: Offline
Re: Unable to download tasks for a week

@CrystalPellet & @Link64

First of all, let me apologize for a somewhat belated reply. I have been working 17-18h days all week long.

Second, I want to apologize for my assertion that the file with that random file name was NOT part of MCM1. It apparently is. But...

While busy at work, I had even less time to babysit my hosts than I usually have. On Thursday (don't recall the time of day), I had four (4) hosts run out of WCG work due to a (couple of) results stuck uploading, hence no report, hence no new WUs could be downloaded.
On one (1) of these hosts, when kicking the tires to get a stuck upload going again, I noticed that 4x MB MCM1-related file being downloaded, and not only those 912/915-byte MCM1 WU files. One time, on one host. Not on any of the other stuck hosts, nor on any of the previous occasions where MCM1 downloads had been stuck. Not on any of the 590,000 MCM1 WUs that I have returned since MCM1 first started. The only times I have seen a file of this size being downloaded were when I added MCM1 to a new host, like a new laptop of mine a couple of weeks ago, as part of the initial set of files when the first WUs are downloaded, or a couple of times when WCG was down for a prolonged period since Krembil took over, usually more than 24h continuously.

And all of this time, even after the move to Toronto, the size of this file has never been a problem. I don't know how long this file is "valid" and how long MCM1 WUs keep referring to it, but in my experience/observation it is likely at least a couple of months.
It is definitely NOT the case that this file is being downloaded excessively when you just run out of work. Rather, it is likely the result of people resetting the project or detaching/re-attaching, and it is rather questionable what that would fix in the first place. Hence my objection that making this file "sticky" doesn't really help to alleviate the current (bad) situation.

All the current problems with downloads and, apparently since some time yesterday (Friday, my time zone), uploads are due to the very nature of the ARP1 project. This was already obviously the case 2 years back, when the same kind of symptoms, stuck downloads (I don't recall excessive upload issues back then), appeared at the same time as new ARP1 WUs were released together with a huge number of rather short OPNG WUs. It certainly is not an MCM1 problem.

It is (well, should have been) clear from the outset that ARP1 is vastly different from all the other projects on WCG within the last 3-4 years, even when this was still run under IBM's auspices. From the start, it has been made clear that this project has serious resource requirements, not only for the hosts that are trying to crunch it, but for the overall infrastructure at WCG as well. And it is that latter issue that unfortunately a lot of folks, as is clearly evident from their posts, simply don't care about, just ignoring those restrictions. It is their selfish attitude that isn't likely to CAUSE the general problem, but is definitely contributing to making a bad situation worse. And apparently, some of those folks are, in their ignorance, simply not willing to see this and do their part to try and ease the overall situation for everyone.


Ralf
----------------------------------------

[Nov 16, 2024 7:09:20 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7545
Status: Offline
Re: Unable to download tasks for a week

It is readily apparent from the total work units returned that many hosts are not getting enough work units to crunch and that, once units are crunched, they hang on the upload portion. And this slowdown has happened even though the most ARP units being returned has only been in the 6,000 range.
Perhaps, until the infrastructure at the hosting site can actually handle the traffic and bandwidth requirements, the number of ARP work units being released should be rationed. They could start with a level of 60 per hour and work their way up to whatever level does not bog the system down. This would be one per minute, which should keep anyone from hoarding or stockpiling too many. If the connection were terminated after a single ARP download, and the next try from that IP address went to the back of the queue to wait for its next turn, it would also probably free up enough bandwidth to alleviate the problems with MCM downloads and uploads.
This would probably involve some tweaking on Krembil's part, but it would probably alleviate a lot of user frustration.
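To make the rationing idea above concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not how the WCG servers actually work: the class name RationedQueue, the units_per_hour parameter and the host names are all made up for the example. It releases at most one unit per interval (60 per hour is one per minute) and makes a requester rejoin at the back of the queue after being served.

```python
from collections import deque


class RationedQueue:
    """Hypothetical rationer: one work unit per interval, FIFO by requester."""

    def __init__(self, units_per_hour=60):
        self.interval = 3600.0 / units_per_hour  # seconds between releases
        self.waiting = deque()                   # requester IDs, oldest first
        self.next_release = 0.0                  # earliest time of next release

    def request(self, requester):
        """Join the back of the queue; repeat requests while waiting are ignored."""
        if requester not in self.waiting:
            self.waiting.append(requester)

    def release(self, now):
        """Serve the front requester if the interval has elapsed, else return None."""
        if not self.waiting or now < self.next_release:
            return None
        requester = self.waiting.popleft()
        self.next_release = now + self.interval
        return requester


# Example: host-a asks twice, but only holds one place in line.
q = RationedQueue(units_per_hour=60)
for host in ["host-a", "host-b", "host-a"]:
    q.request(host)

# Poll at 0 s, 30 s, 60 s and 120 s.
served = [q.release(t) for t in (0, 30, 60, 120)]
print(served)
```

In this sketch a host that has just been served has to call request() again to get back in line, which is the "back of the queue" behaviour described above. Raising units_per_hour step by step would correspond to working up to whatever level the system can handle.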

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Nov 16, 2024 11:56:42 PM]
merboy
Cruncher
CA
Joined: Nov 17, 2004
Post Count: 7
Status: Offline
Re: Unable to download tasks for a week

Everyone should switch to 'No New Tasks' until this is fixed. Seriously, they don't deserve any of our efforts until this server-side garbage is resolved. Again. For the 82nd time since conversion. smh.
[Nov 17, 2024 11:38:33 PM]
Craig S.
Cruncher
Joined: Nov 19, 2005
Post Count: 2
Status: Offline
Re: Unable to download tasks for a week

It's either a load-balancer issue (e.g., if you hit retry a lot, it'll eventually go through) or a storage issue on their back end. Either way, the fact that this has occurred before, and/or that no one on their administrative side either knows about the problem or can fix it, tells me that their IT staff are lackadaisical or incompetent or both.

[redacted] I could half [redacted] their infrastructure better than they are doing at present.

At least I know how to run an enterprise-class data center.
----------------------------------------
[Edit 1 times, last edit by savas at Nov 19, 2024 10:39:57 PM]
[Nov 19, 2024 10:29:07 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1928
Status: Offline
Re: Unable to download tasks for a week

"It's either a load-balancer issue (e.g., if you hit retry a lot, it'll eventually go through) or a storage issue on their back end. Either way, the fact that this has occurred before, and/or that no one on their administrative side either knows about the problem or can fix it, tells me that their IT staff are lackadaisical or incompetent or both.

[redacted] I could half [redacted] their infrastructure better than they are doing at present.

At least I know how to run an enterprise-class data center."
The problem is that they do not have the financial backing for an enterprise-class data center setup. That includes simply not having the financial means to keep staff on hand 24/7, which results in rather frustrating weekends. And right now, they actually don't have a "communications intern" like in the last two years, though at times the previous two weren't very responsive either...


Ralf
----------------------------------------

[Nov 19, 2024 11:40:09 PM]
Eric Pohlke
Cruncher
Canada
Joined: Feb 4, 2006
Post Count: 15
Status: Offline
Re: Unable to download tasks for a week

My WUs started getting deferred last week, around Nov 15. Yesterday, my twin EPYCs were waiting to upload results for over 620 WUs, which will reach expiry in 6 hours.
The problem started when the last director took on the World Community Grid Project at Krembil; they were excited, as they heard and saw a lot of potential for free medical research crunching. However, the new Director (Jaideep Bains) is not so inclined toward the project, fails to realize its potential, and has thus cut funding to it. It's a real shame, as some clients are not just using a cell phone or laptop, but have a small data center server that handles impressive workloads at great speed. And yet there are only 1 or 2 people assigned to the project on shift, and they are only there a few days a week for a few hours. Weekends, holidays, forget it.
It's a real shame that an institute that specializes in medical research doesn't see this project's full potential when it is working.
[Nov 20, 2024 2:39:28 AM]
Eric Pohlke
Cruncher
Canada
Joined: Feb 4, 2006
Post Count: 15
Status: Offline
Re: Unable to download tasks for a week

In their annual reports, there is nothing about the World Community Grid Project.
Yet their list of financial sponsors is huge.

Sponsors
Abbott
AbbVie
Aerie
AGE-WELL
American Academy of Neurology
American Foundation for Surgery of the Hand
American Medical Systems
American Society of Regional Anesthesia and Pain Medicine
Amgen
Anavex
AOSpine
Aria Pharmaceuticals
Arthritis Research Foundation
Arthritis Society Canada
AstraZeneca
Atuka
Aurinia Pharmaceuticals
Autism Speaks
Avicanna
Avir
Axoltis
Azure
Banting Research Foundation
Bayer
Biogen
Bioness
BlueRock Therapeutics
Boston Scientific
Brain Aneurysm Foundation of Canada
Brain Canada
Bright Focus Foundation
Bristol Myers Squibb
Canada Research Chairs
Canadian Blood Services
Canadian Cancer Society
Canadian Initiative for Outcomes in Rheumatology cAre
Canadian Institutes of Health Research
Canadian Pain Society
Canadian Rheumatology Association
Canadian Stroke Consortium
CannScience Innovations
CDLK5 Canada
Celgene
Celixir
CENTOGENE
Centre for Addiction and Mental Health
Centre hospitalier de l'Université de Montréal
Cerenovus
Cerevel
Cervical Spine Research Society
Christopher & Dana Reeve Foundation
Competitive Drug Development
Corindus
CorNeat Vision
Cure PSP
Dravet Syndrome Foundation
Dystonia Medical Research Foundation
EBT Medical
Eli Lilly
Endogena Therapeutics
Epilepsy Canada
Epygenix Therapeutics
ES Therapeutics
Fighting Blindness Canada
Fisher & Paykel Healthcare
Fresenius Kabi
Functional Neuromodulation
Fusmobile
GE Canada
Genentech
Gilead Sciences
Glaucoma Research Society of Canada
GlaxoSmithKline
GRAPPA
Green Valley
Harrington Discovery Institute
Healios
Health Canada
Heart and Stroke Foundation of Canada
HebeCell
InnoCentive
INSIGHTEC
Insmed
International Parkinson and Movement Disorders Society
International Spine Study Group Foundation
Iqvia
Jaeb Center for Health Research
Janssen
Johns Hopkins University
Lahey Clinic Foundation
Lawson Health Research Institute
LifeArc
Lucid
Lung Health Foundation
Lupus Canada
Lupus Foundation of America
Lupus Ontario
Lupus Research Alliance
Massachusetts General Hospital
McMaster University
Medical Decision Modeling
Medpace
Medtronic
Merck
Milken Institute
Mount Sinai Hospital
Multiple System Atrophy Coalition
National Institutes of Health
National Organization for Rare Disorders
National Psoriasis Foundation
Natural Sciences and Engineering Research Council of Canada
Neuraly
New Frontiers in Research Fund
Novartis
Novo Nordisk
Omeract
Ontario Brain Institute
Ontario Centre of Innovation
Ontario Institute for Cancer Research
Ontario Ministry of Health
Ontario Ministry of Long-Term Care
Ontario Ministry of Research and Innovation
Oregon Health & Science University
Organon
panCELLa
Paralyzed Veterans of America
PAREXEL
Parkinson Canada
Parkinson's Foundation
Passage Bio
Patient-Centered Outcomes Research Institute
Penumbra
Pfizer
Pharma Two B
Philips
Physicians' Services Incorporated Foundation
Population Health Research Institute
Praxis Spinal Cord Institute
PROCEPT BioRobotics
Prodeon Medical
Queen's University
ResMed Foundation
Revance
Rick Hansen Foundation
Roche
Rush University Medical Center
Sanofi
Savoy Foundation
Scientus Pharma
Sharon Francis Institute
Shire
Shoppers Drug Mart
SickKids Foundation
Spinal CSF Leak Canada
Spinal Research
Spondyloarthritis Research and Treatment Network
Stanley Medical Research Institute
Stem Cell Network
Steminent Biotherapeutics
Stryker
Sunnybrook Health Sciences Centre
Systemic Lupus Erythematosus International Collaborating Clinics
The Aneurysm and AVM Foundation
The Foundation of the American Society of Neuroradiology
The MAYDAY Fund
The Michael J. Fox Foundation for Parkinson's Research
The Plastic Surgery Foundation
The Princess Margaret Cancer Foundation
The War Amps
Theranexus
Theravance
Toronto Metropolitan University
Transport Canada
UCB
United States Department of Defense
Unity Health
Université de Sherbrooke
University Medical Centre Utrecht
University of Alberta
University of British Columbia
University of Calgary
University of Guelph
University of Manchester
University of Ottawa
University of Pennsylvania
University of Texas
University of Toronto
University of Virginia
Vertex Pharmaceuticals
Weston Family Foundation
Wings for Life
Women's College Hospital
WSIB
Zenflo
[Nov 20, 2024 2:45:05 AM]