| World Community Grid Forums
Thread Status: Active Total posts in this thread: 19
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Novel, to me. But forum and result data fetches have been failing here in unpredictable ways since about when the US woke up (I hope). The rest of the intertubes is 'performant'.
Edit: Marked [RESOLVED]; see knreed's post below discussing the mass-mailing issue. [Edit 1 times, last edit by SekeRob* at Jul 11, 2017 12:34:50 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I just had this too, both on the main site and on the forums, at the same time. But it's definitely intermittent.
Load balancer? Hardware being recycled mid-session? |
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
There's much (over)load going on. The quickest place to check is the Who's Active page... it's continuously wiped, no one signed in:
There are 3 Recently Active Users (3 Guests, 0 Member )
User | What they are doing | Duration since last activity | Total time since log in
Guest | Get RSS Feed | 0 minute | 0 minute
Guest | View all recently active users | 0 minute | 0 minute
Guest | View forum index | 0 minute | 0 minute |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
So, a long explanation. As part of moving to the IBM Cloud we have started using the email delivery offering (http://www.softlayer.com/email-delivery) that is available. Although we had a pretty good setup in our old environment, setting up SPF, DKIM, encryption for sending, processing bounces, etc. is really not what we want to spend our time on, so we have stopped doing it ourselves and are using an external service now.
The old environment worked pretty well, and when we sent mass emails there like the one we sent today, we would identify around 0.5% or so of emails as "bounced". Once an email bounces, we disable sending emails to that account in the future. We also processed unsubscribe requests (this is all part of maintaining good email lists and being respectful of your users). However, we had custom built the method to identify bounced messages, and so we missed some cases. Our new email delivery system is not missing those. As a result, the new system is identifying a lot more "bounced" emails than our old system did.
The new email delivery system notifies us of a bounced email by calling an API on our server for each bounced email address. Due to the much better infrastructure that we have now, we sent out the first 20,000 emails in about 3 minutes. This resulted in a surge of "bounced" email API callbacks to our servers at a rate that exceeded our expectations and took the site offline temporarily. We spent the next couple of hours scaling up the API and letting the backlog of bounces get processed. Once those two tasks were done, we were able to resume sending the emails at a more measured pace, and the site has been behaving normally since then.
Since we will have processed the backlog of bounces after this sending, we will be able to scale up our sending next time, because our API is better prepared to handle the load and there will be fewer bounces generated by sending the email. |
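For readers wondering how a per-address bounce callback can be kept from swamping a site, here is a minimal Python sketch of one common approach: accept each callback quickly into a bounded backlog and disable the addresses in a separate step. It is not the actual World Community Grid implementation. The class `BounceProcessor`, its methods, and the `max_pending` limit are all hypothetical names invented for this illustration.

```python
import queue
import threading


class BounceProcessor:
    """Hypothetical sketch: buffer bounce callbacks, disable addresses later."""

    def __init__(self, max_pending=10_000):
        # Bounded backlog so a surge of callbacks cannot grow without limit.
        self.pending = queue.Queue(maxsize=max_pending)
        self.disabled = set()
        self._lock = threading.Lock()

    def handle_callback(self, address):
        """Called once per bounced address; returns False if the backlog is full."""
        try:
            self.pending.put_nowait(address.strip().lower())
            return True
        except queue.Full:
            # The caller could answer the delivery service with a retry-later status.
            return False

    def drain(self, limit=None):
        """Process up to `limit` queued bounces; returns how many were handled."""
        handled = 0
        while limit is None or handled < limit:
            try:
                address = self.pending.get_nowait()
            except queue.Empty:
                break
            with self._lock:
                self.disabled.add(address)
            handled += 1
        return handled

    def may_send_to(self, address):
        """True unless the address has bounced and been disabled."""
        with self._lock:
            return address.strip().lower() not in self.disabled
```

In this sketch the callback does almost no work, so a burst like the one described above would fill the backlog rather than tie up the web tier. A background worker (or a scheduled job) would call `drain()` at whatever rate the database can tolerate.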
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thanks, Kevin!
That's the sort of detailed answer that makes me feel that I'm being kept "in the loop". |
||
|
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 3010 Status: Offline Project Badges:
|
Thanks, Kevin! me too That's the sort of detailed answer that makes me feel that I'm being kept "in the loop". |
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Marked this thread, mistakenly, as resolved... it's not, from the API result fetch side.... every 20th page or so a time-out pop-up appears to tell me where and when in the loop it failed, asking if I want to give the page another chance.
When this started: Kind of when the US woke up. :| |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
@SekeRob
What page are you trying to access? If you are using Chrome can you do the following:
Then access the website and recreate the issue. Once it has captured it, can you expand it to full screen, take a screenshot and send it to support@worldcommunitygrid.org? Also, if you could send a screenshot of the pop-up as well, that would be great. I think what you describe is different from the 503 issue. |
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
It's a pop-up of my own design, shown in response to a -21nnnnnn error, with a continue or cancel option, one time only; if it then fails again, the procedure exits.
Office only knows Internet Explorer, before, middle, after.
The URL, which is dynamically changed with an incremented offset=
https://www.worldcommunitygrid.org/api/member...me&Modtime=1498671859
Yes, this was a belly-up at page 9.
Is it coincidence that the workday ended somewhere in the US, and the problem goes away? It's my bedtime now... if it shows up tomorrow, you'll be the first to hear my lament.
Edit: No, the final test went into Debug mode at 09:44 UTC: Runtime error '-2147217376 (80041020)': XML Parse Error
[Edit 1 times, last edit by SekeRob* at Jul 11, 2017 9:17:49 PM] |
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Midnight visit to the fridge to get a cold glass of water, a peek at the cruncher, and sure enough, the message box was waiting on a reply to continue or break off, at page 1 of 1234 (x250 results). This was about 1:35 UTC, and I went back to shuteye. All had run through by this morning. Then, in the hunt for the missed results (an hour of runtime is long enough for lots of changes in the interim, let alone with the 1.5 hours of added stall), page 1 hung again, then it ran through to the end.
As statistically it is happening most often at page 1, as if someone needs waking up to load a reel in the tape room, I have now programmed an added sleep of 30 seconds before trying the same page offset again, with a ping when it's daytime, one time only, and silence at night. That runs in a do-until loop, max 5 times, then the message box. As far as I know, the Results Status page data remains accessible 24/24, with no lock-out periods. Previously this occurred only very occasionally, which is why a msg box was set, to give me a chance to check the intertubes, but the ISP has been very stable. Anyway, the backoff loop is now 2.5 minutes. |
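The backoff loop described above (retry the same page offset after a 30-second sleep, at most 5 times, for 2.5 minutes in total before giving up) can be sketched roughly as follows in Python. This is not SekeRob's actual Office macro. The function name `fetch_with_backoff`, the injected `fetch` callable, and the `sleep` parameter are hypothetical and exist only to keep the sketch self-contained.

```python
import time


def fetch_with_backoff(fetch, offset, retries=5, pause_seconds=30, sleep=time.sleep):
    """Fetch one page; on failure wait and retry the same offset up to `retries` times.

    `fetch` is any callable taking the page offset (hypothetical stand-in for the
    real results request). With the defaults the total added wait is 5 x 30 s.
    """
    last_error = None
    for attempt in range(retries + 1):  # one initial try plus `retries` retries
        try:
            return fetch(offset)
        except Exception as error:  # broad on purpose for the sketch
            last_error = error
            if attempt < retries:
                sleep(pause_seconds)
    # All attempts failed: this is the point where the message box would appear.
    raise RuntimeError(f"page at offset {offset} failed after {retries} retries") from last_error
```

Passing `sleep` as a parameter is only a convenience so the loop can be exercised without actually waiting. The daytime ping and the final continue/cancel prompt are left out of the sketch.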
||
|
|
|