World Community Grid Forums
Thread Status: Active Total posts in this thread: 29
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline |
@ Mark Reiss
I've seen nothing to indicate a problem with the team update at 00:01 Mar 25. The update at noon, which was initially missing but has now been corrected, would have had no impact on team stats. Can you be more specific when you say the team update did not !!! and give an example of the team stats that were not updated, so that Keith Uplinger knows what he's looking for? [edit] uplinger got his reply in first. [Edit 1 times, last edit by jonnieb-uk at Mar 26, 2015 12:02:21 AM] |
||
|
[CSF] Thomas Dupont
Veteran Cruncher Joined: Aug 25, 2013 Post Count: 685 Status: Offline |
Thanks for letting me know about the stats. Users "have" to inform the staff about this?! It's not the first time that I read comments by users who inform about missing/outdated stats (and the staff seems not to be aware of this).
When we notice an anomaly that has not been reported on the WCG forum, it's important to submit it. That does not mean the WCG staff has not already seen it. It's just that we are passing on the information. A server outage can influence many parameters. Some volunteers (like me and many others who post in this thread) see this as mutual aid. And the WCG staff is very reactive. I participated in many other distributed computing projects in the past and I can assure you that the WCG staff is very effective and very responsive compared to others. BTW, thanks Keith for the stats |
||
|
pcwr
Ace Cruncher England Joined: Sep 17, 2005 Post Count: 10903 Status: Offline |
Majority of my returned WUs since the crash have gone to P Ver status.
OET1 and CEP2. regards, Patrick |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Had the Linux client, which had -sans exception- all OET results [zero redundant project], passing through 'Pending Verification'... like 58, even though there was no single error logged. Then in the afternoon, before website restore, it had one task with error 214 [could not unzip task input file], and the PVer continued for another 30 or so, which is when enough validations of PVer's had collected to go single again. The second [Windows] host running OET did not experience this. Both run CEP2 on 1 core and saw only 1 turning PVer... unaffected it seems.
The PVer series kicked off just before the website went belly up [coincidence], see not how this could affect the BOINC part which continued, bar a few 3600 second back-offs in the morning to report results. For ref., this Linux host had 162 valid yesterday i.e. is quick to get back to get the 20 odd serial valid to go it alone again. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
On 'root cause', of course a secret, but did notice few days ago the forum had [never seen before] over 3800 guests and just under 40 logged in members. If all those lines are committed, could see something is giving.
|
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline |
Sek,
We are still trying to get a firm answer as to the root cause of the outage, but from my understanding the load on the database was not the issue. It was a hardware issue. I fear it was bad physical memory in the device. Without getting a solid answer as to what it was, we are still running off a redundant system as more tests will happen on the problem device today. As for your OET1 having lots of Pending validations, about 12 hours before the main outage, the validator for OET1 was falling behind. This runs off a different server than what failed. But as of this moment, I do not see anything being outstanding. Please let me know if you are still seeing lots of pending validations for OET1. Thanks, -Uplinger |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Kay, it's Pending Verification, meaning an extra copy was being sent out after return of the original _0, by the boatload, same as for pcwr. This occurred -during- the outage.
Hardware failure... well no amount of guest lines could be accountable for that, but it was something odd. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Because the forum was poorly performing, looked at the whoosonline and saw over 1000, then later looked again and saw 1479 and then it was poof, maintenance page.
Coincidences or more hardware burnouts? |
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline |
Sek,
This was not due to counts on the forum. Earlier in the week we encountered a hardware problem. When that server came back up, it mounted a filesystem that it was not supposed to. This was just recently discovered and needed to be corrected quickly as it had the potential for data loss. There was NO data loss, but the potential for it forced me to act quickly without scheduling anything. Sorry about the lack of notification. I was hopeful the downtime would only be a few minutes, but it ended up taking longer than I expected. Thanks, -Uplinger |
||