Thread Status: Active
Total posts in this thread: 450
This topic has been viewed 789597 times and has 449 replies
MarkH
Advanced Cruncher
United States of America
Joined: May 16, 2020
Post Count: 50
Status: Offline
Re: Workunits are being sent out

Regret to report that MCM work units are crashing on start after waiting several hours for downloads to be made available. I also had to abort an ARP job that tried to complete downloads all night. I was able to run OPN1, MCM, and ARP jobs a few days ago without all the delays, and then it went really bad again. Just aborted more MCM downloads after 8 hours of waiting in an endless fallback. :-(

I know Krembil is working on this entire mess, and I will try downloading again in a few days. I want to continue being in the fight against these diseases and conditions. I want the Krembil people to know that I and others know you're working hard to get everything right. You've been working for months trying your best to get things to 100% capacity and 99.99% reliability. All of us who care get it, and wish we could give you hugs/handshakes/pats on the back, buy you a beer/wine/coffee/tea, and thank you personally for all your work. :-)

But at some point Krembil's leadership needs to realize they need a lot more help to fix the issues throughout the WCG systems than in-house staff could possibly handle alone. They've got worse luck than I do, and that's not a good thing.
----------------------------------------
"That science of the people, by the people, for the people, shall not perish from the Earth."
[Sep 7, 2022 9:16:06 PM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 294
Status: Offline
Re: Workunits are being sent out

Hang in there. While there are occasional networking problems and droughts where WUs aren't available, overall, things are - at least on the work side - more or less working. I have around five devices and (with few exceptions) received a steady flow of WUs since about 10 days ago - no manual intervention required.

hm, that's interesting to read.
My box with which I crunch MCM does not need manual intervention.
The three other computers with which I crunch OPNG need massive manual intervention.
[Sep 8, 2022 7:36:33 AM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 251
Status: Offline
Re: Workunits are being sent out

My box with which I crunch MCM does not need manual intervention.
The three other computers with which I crunch OPNG need massive manual intervention.


My main box (Ryzen 7 1700+GTX 960 running all projects) also needs a lot of babysitting. Raspberry Pis (OPN1 only) and dual/quad-core x86 systems without a usable GPU (all projects except OPNG) need a lot less, but still need an occasional kick in the pants.

Packing up the OPNG units so they don't require zillions of file requests would go a long way towards giving us breathing room, since they have lots of relatively small files inside. ARP is tougher, though, since those work units are already quite porky. If bundled up, they'd still have a huge bandwidth demand and likely a worse disk footprint while unpacking.
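
To put rough numbers on the request-rate point, here is a minimal Python sketch. The task count and the files-per-task figure are made-up illustration values, not measured OPNG numbers, and `requests_needed` is a hypothetical helper, not anything from BOINC.

```python
def requests_needed(tasks: int, files_per_task: int, bundled: bool) -> int:
    """Count download requests for a batch of tasks.

    Unbundled: every input file is fetched with its own request.
    Bundled: each task ships as a single archive, so one request per task.
    """
    if tasks < 0 or files_per_task < 0:
        raise ValueError("counts must be non-negative")
    return tasks if bundled else tasks * files_per_task


# Assumed figures for illustration only: 20 tasks, 10 small files each.
loose = requests_needed(tasks=20, files_per_task=10, bundled=False)
packed = requests_needed(tasks=20, files_per_task=10, bundled=True)
print(loose, packed)  # prints "200 20"
```

With those assumed figures, bundling cuts the number of requests tenfold while the bytes transferred stay about the same, which is why it would ease a request-rate limit but not a bandwidth limit.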

Of course, that also means beta-testing any such changes, so there's no short-term fix other than eliminating the bottleneck(s) in the infrastructure, wherever they may be. All I can do at this point is report. I get the feeling that we are bumping up against both bandwidth and request rate constraints.
[Sep 8, 2022 1:21:21 PM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 294
Status: Offline
Re: Workunits are being sent out

It seems to me that they are re-sending "old" OPNG tasks from last year :-(
Why I think so: at the very beginning of OPNG, the tasks yielded credit points between 200 and 400; later on, this was changed to a figure between ~800 and ~1000.
Since last night, the tasks again only show some 200-400 points. So my suspicion is that, just to send tasks out, they are using old ones that had already been sent out last year.
So I am questioning how much sense this makes. Just to feed us, we get old stuff which was processed a long time ago?
[Sep 9, 2022 2:29:27 AM]
wildhagen
Veteran Cruncher
The Netherlands
Joined: Jun 5, 2009
Post Count: 728
Status: Offline
Re: Workunits are being sent out

Well, given the untrustworthiness and unreliability of Krembil (given their lies, broken promises and lack of communication), it wouldn't surprise me too much if all the 'work' (not only OPNG) we are doing is pure fake work, in the name of testing.
----------------------------------------
[Edit 1 times, last edit by wildhagen at Sep 9, 2022 4:43:54 AM]
[Sep 9, 2022 4:43:07 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 859
Status: Offline
Re: Workunits are being sent out

@erich56

When OPNG came back on stream, the work units had a new target (receptor) whilst OPN1 was still working on the same targets as before the hiatus. The sudden change suggests that there was a fairly urgent need to see the results for this target. This new target seems to have required more iterations per docking, with corresponding increases in credit awarded per task.

Now it appears that the "rush job" for the new target has finished (or, at least, paused), so OPNG is back processing the same target it was working on before the hiatus! The task names will have lower batch numbers because they are part of a substantial set of batches of available data that was already in the pipeline, though they are higher than pre-hiatus.

The highest OPNG batch numbers I recorded pre-hiatus were around the 0149280 region, and these new batches for the same receptor seem to be above 0150200; there are huge numbers of possible ligands to try to dock, so it is not too surprising if there is a backlog to clear now the "rush job" seems to be over. And there's no benefit in sending out unwanted work, so unless WCG inform us otherwise I'd tend to assume it's "live" data. It's just tough luck that the dockings seem to be a lot easier to achieve, so lower credit :-)

And in the meantime, OPN1 is chugging along looking at the same target that OPNG has now gone back to.

Cheers - Al.

[Edited to point out that I had not seen wildhagen's post when I posted this. It doesn't alter what I've said, as I tend to believe folks until they are proven really untrustworthy.]
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Sep 9, 2022 5:25:32 AM]
[Sep 9, 2022 5:13:56 AM]
wildhagen
Veteran Cruncher
The Netherlands
Joined: Jun 5, 2009
Post Count: 728
Status: Offline
Re: Workunits are being sent out

@erich56

[Edited to point out that I had not seen wildhagen's post when I posted this. It doesn't alter what I've said, as I tend to believe folks until they are proven really untrustworthy.]


Normally I do the same, but they lied and broke promises a few times too many for my liking, let alone the lack of communication. In my book, doing this with the frequency they do makes a party unreliable.

You can't trust anything they say, because they won't deliver on what they say. Repeatedly.

A classic case of 'the boy who cried wolf'-syndrome.
[Sep 9, 2022 5:45:48 AM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 294
Status: Offline
Re: Workunits are being sent out

After about 1 day's outage, WCG is back; however, the network problems are even worse. Whereas the problems so far were only with downloads, now uploads are also strongly affected.
So all in all: it gets worse instead of better. I wonder how that can be, after all the months they took for the transition from IBM to Krembil.
[Sep 10, 2022 7:41:36 AM]
wildhagen
Veteran Cruncher
The Netherlands
Joined: Jun 5, 2009
Post Count: 728
Status: Offline
Re: Workunits are being sent out

It's nearly impossible to download anything at the moment. Even with several retries, it continues to fail.

Situation seems to be worsening a lot over the last few days, and that is not even counting the expired certificate.
[Sep 10, 2022 11:21:58 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 859
Status: Offline
Re: Workunits are being sent out

After about 1 day's outage, WCG is back; however, the network problems are even worse; whereas the problems so far were only with downloads, now uploads are also affected strongly.
So all in all: it gets worse instead of better. I am questioning how come, after all these months they took for the transition from IBM to Krembil.
A moment's consideration might explain why difficulties with uploading would probably appear when the system came back from a lengthy period off-line with an active user base out there. I suspect a lot of us expected that to happen - been there before, had the experience elsewhere...

Think of all those client systems waiting to upload a whole day's worth of work, putting in requests at a much higher rate than would normally be expected... And the same magnification effect applies to downloads, as not only were there lots of systems out there that got cut off whilst downloading but there would now be lots more "empty" systems wanting a top-up once they had finished uploading... It looks bad now, but start saying it has got worse if it's still as bad in 24 hours' time :-)
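
A back-of-the-envelope version of that backlog effect, as a Python sketch. All the rates are invented for illustration (none of them is a real WCG figure), `hours_to_drain` is a hypothetical helper, and the model assumes steady arrival and service rates, which real BOINC clients with their retry back-off will not follow exactly.

```python
def hours_to_drain(outage_hours: float, arrival_rate: float, capacity: float) -> float:
    """Estimate how long a server needs to clear the backlog left by an outage.

    arrival_rate: transfer requests generated per hour by the user base.
    capacity:     transfer requests the server can complete per hour.
    Requests pile up during the outage; afterwards the server clears the
    pile only with whatever capacity is left over after handling new arrivals.
    """
    if capacity <= arrival_rate:
        raise ValueError("with no spare capacity the backlog never drains")
    backlog = outage_hours * arrival_rate
    return backlog / (capacity - arrival_rate)


# Assumed figures: a 24-hour outage, 100 requests/hour arriving,
# and a server that can complete 150 requests/hour.
print(hours_to_drain(outage_hours=24, arrival_rate=100, capacity=150))  # prints 48.0
```

Under those assumptions a one-day outage takes two days to work off, and the less spare capacity there is, the longer the congestion lasts.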

The upload/download speeds during the first couple of hours after the system came back gave a good indication of the sheer volume of traffic there must have been -- I normally see upload and download speeds 10 times better than I was getting at 08:00 UTC today! And now I've got my normal half a day's worth of work things seem to have settled down as far as I'm concerned (especially as I've turned off OPN1/OPNG for now!...)

It probably isn't realistic to expect any BOINC system to add capacity just to handle the aftermath of major unplanned outages (though some folks may think otherwise), so when a BOINC site has an "accident" the end users will see upload/download problems; I've experienced this at several other sites, and I seem to recall one or two occasions when the old WCG had issues of this type (sometimes without actual down-time!)...

Cheers - Al.

P.S. It's probably a blessing that the BOINC client won't request work if the upload queue is too big, else there would have been even more initial traffic!
[Sep 10, 2022 11:30:58 AM]