| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 352
|
|
| Author |
|
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2494 Status: Offline Project Badges:
|
Blame me, it must be my fault. I've been on a break from WCG for a month, not running anything. Today though, I decided to download a bunch of work, which I managed to do without any problems. A few hours later though, the problem started.
So, I guess I broke the system.... Sorry!! |
||
|
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1293 Status: Offline Project Badges:
|
Thanks for the detailed info, Al.
Thank you, Grumpy, for taking responsibility for your actions.
----------------------------------------
[Edit 1 times, last edit by Unixchick at Jun 28, 2025 10:59:02 PM] |
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
I have turned off one machine with 8 threads because it ran out of ARP and MCM. I would normally switch to a back-up project but it is hot here.
My other 8-thread machine is down to 3 ARP and no MCM. I am keeping this machine running so I know when to switch the other one back on, and to complete the last 3 ARP.
Mike |
||
|
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2494 Status: Offline Project Badges:
|
No news on their FB page either. The last update there is from April. Maybe they all went on a 5-week summer vacation. |
||
|
|
bfmorse
Senior Cruncher US Joined: Jul 26, 2009 Post Count: 442 Status: Offline Project Badges:
|
... hmmmm pondering the situation:
What, if anything, should we tell the volunteers?
1.) Nothing.
2.) We are working on the problem. We will fill you in on the details when resolved. ETA: _______.
3.) We are using this issue as a catalyst to migrate to the new configuration, which will improve the reliability of the system. We will fill you in on the details when completed. ETA: _____. Please be patient.
As you can probably surmise, lack of transparency is like a vacuum - it sucks the motivation and good will of the volunteers right out of the picture. Sad, but true. |
||
|
|
GB033533
Senior Cruncher UK Joined: Dec 8, 2004 Post Count: 206 Status: Offline Project Badges:
|
My last two work units started getting this error a few hours ago:
Task MCM1_0235471_5882_0 postponed for 600 seconds: Waiting to acquire lock
which repeated every 10 minutes, and then finally ended in 'Computation error'. I'd never seen this behaviour before. Has anyone else experienced this?
Now dry like everyone else, waiting to report completed work units and for the issues to be resolved. Ho-hum. |
||
|
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1293 Status: Offline Project Badges:
|
I'm down to my last few ARP WUs. I'm grateful we had a good burst of WUs before this breakdown.
I think the techs have been patching an old system for as long as they can. We know the new system is coming soon. There might be a limit to what time and resources can do.
The techs have been very good lately at getting systems back up and going, so something else is going on this weekend. They have been working hard to improve the software system they were given, and soon they will get some new hardware. It is amazing what they have done considering their funding is not where it needs to be.
I'm personally glad they have put their resources into tech, not communication. That said, I do hope we get an update soon.
It is hot where I am too, so I'll give my system a rest once I've finished my queue in about 6 hours, and hope that it is an easy fix for the techs on Monday morning. |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline Project Badges:
|
One effect of whatever has happened this time is that there doesn't seem to have been the last midnight stats run or today's midday stats run (or the web site can't find it...) -- the ARP1 project results page has been stuck at the "statistics are being updated" holding page for over 12 hours, reporting that it should be finished in 1 minute! Yes, right...
Also, at the moment (15:40 UTC) the three ARP1 status files are all empty. On one or two other occasions, failure to run the relevant scripts left the old versions behind, but in this case it remains to be seen whether those will acquire content later in the day (as has happened a couple of times in the past...). If that doesn't happen, let's hope things are resolved by the time Monday's status files are due...
Cheers - Al. |
||
|
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1293 Status: Offline Project Badges:
|
Interesting observations, Al.
We were expecting some maintenance time around early July... and here we are. Could this be a moving of the files and database to a new system??
Truthfully, I was expecting an announcement of the downtime and upgrade, but as I said in an earlier email, while I really, really like the info, the time is better spent on doing. Once we have a nice new shiny system, then they can put some effort into PR and recruiting some new people.
----------------------------------------
[Edit 1 times, last edit by Unixchick at Jun 29, 2025 3:54:43 PM] |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline Project Badges:
|
"Could this be a moving of the files and database to a new system??"
Unfortunately, I suspect it's just a crash that they've not been able to service for one reason or another.
If they need to migrate file store (rather than just moving the server nodes to new hardware) I'd have thought they'd want to do a controlled shutdown akin to those for the IBM->Krembil migration and the recent Christmas->New Year data centre shutdown. New WCG was fairly good about announcing the latter, so I'd expect the same again... As it is there are probably hundreds of thousands of MCM1 results on the upload servers waiting to be reported -- not a good scenario if moving stuff around...
Ah, well, we will presumably find out on Monday (evening by my U.K. clock!)
Oh, and as I used to be a tech troubleshooter (amongst other duties) and often working solo, I agree with the idea that sometimes it's more important to be doing than to be talking about it.
Cheers - Al.
P.S. That ongoing "The statistics update will finish in about 1 minutes" appeals to my strange sense of humour.
[Edit 2 times, last edit by alanb1951 at Jun 29, 2025 8:10:49 PM] |
||
|
|
|