Thread Status: Active
Total posts in this thread: 387
This topic has been viewed 48865 times and has 386 replies
nekomi_ch
Cruncher
Joined: Apr 23, 2024
Post Count: 18
Re: Project Status (First Post Updated)

Server can't open database

Yeah, something has definitely gone wrong.
[Apr 26, 2025 10:20:18 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317

From around 10:00 to 12:05 UTC it was "can't open database" (uploads still working though). As seems to be common recently, it then took a couple of hours to move on, and at 12:20 UTC it was "feeder not running" -- I wonder if that's a built-in "I give up!" time-out...

I do hope there's no correlation between users killing off masses of MAM1 Beta tests (as per multiple threads in the Beta Tests forum) and the database problem :-)

[Edited to add...] For ARP1 progress watchers -- the midday stats run hasn't happened (no surprise there) and as at 14:20 UTC the three text files are empty; whether those will actually get filled when the database comes back remains to be seen.

Cheers - Al.
----------------------------------------
[Edit 2 times, last edit by alanb1951 at Apr 26, 2025 2:32:06 PM]
[Apr 26, 2025 1:58:23 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1295

Thanks to Grumpy for calling it early and to everyone else who reported what they are seeing. We did indeed go "Boom".

"Feeder not running" error

Reminder: This is a good time for machine maintenance. Do you need a system update? Is it time to get the dust out of the box? Does it just need a good "turning it off and on again"?

If all is in good order, then it is time to look at backup projects.
----------------------------------------
[Edit 1 times, last edit by Unixchick at Apr 26, 2025 3:40:58 PM]
[Apr 26, 2025 3:01:15 PM]
Hans Sveen
Veteran Cruncher
Norge
Joined: Feb 18, 2008
Post Count: 984

The websites seem to be working again!
See the latest update from Jurisica Lab 🙂

Updates of completed WUs are still struggling!

Hans S.
[Apr 26, 2025 6:23:46 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1295

Thank you, Dylan! Nice to see the WUs flowing again. It will take a while for the system to catch up.
[Apr 26, 2025 6:32:08 PM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 786

April 26, 2025

12:40 ET - WCG database crashed. Dylan is rebooting VM and hopefully we can recover without Sharcnet sys admins.
Server is up. Assuming DB crash recovery goes well, we should be back online in 1-2 hours.
BOINC database crash recovery was successful; service has been restored: DB recovered and restarted, all BOINC daemons and feeder have been restarted; reset the state of the scheduler coordinator and confirmed job submissions are working, verified download of new MCM1 and ARP1 workunits.
MAM1 Beta Batches w/ Unacceptable Error Rate and Runtime: A small 4-workunit batch (MAM1_9800035) to confirm the fix and a larger 100-workunit batch (MAM1_9800036) were released last night and this morning, respectively. So far no errors have been returned from either batch.
Reported Missing Beta Results; Lack of Results on Community Stats Page: Between 2025-04-24 and 2025-04-25, several beta batches were released for Linux/Windows MAM1 application version 7.04, numbered between 9800000-9800026. The intent was to vary parameters to optimize the quality of signatures returned based on previous runs of MCM1 and our local testing. Unfortunately, these batches revealed multiple issues and in some cases exploded the runtime as many volunteers have reported in the forums. We have updated the workunit records in the BOINC database on 2025-04-25 for all these workunits in this range of batches to hopefully prevent resends and Server Abort any BETA workunits that were awaiting a free slot to begin execution on BOINC clients. For those who continue to run these long running workunits, we will monitor those that remain outstanding and continue to extend deadlines.
New MAM1 Beta Batches 9800027-9800031 Released based on Low/No Error Rate Batches (9800001, 980008): We plotted outcomes for the 100-1000 workunit batches between 9800000-9800026, and released a series of batches 2025-04-25 from 9800027 onward that are based on those with no or low error rate. For these we have varied the length of the signature, and number of iterations, but left the model parameters known to be stable untouched. We will continue tuning the MAM1 model settings for the new dataset based on these preliminary distributions of signatures being returned by beta testers, and strictly confirm resource requirements and runtime of any changes to model parameters in the workunit settings file in all future MAM1 beta batches.
Fixing OOM after suspending or power-cycling the machine running the BOINC client: We will fix these issues in version 7.05 of the MAM1 application next week.
----------------------------------------
Paul.
[Apr 26, 2025 9:50:24 PM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442

Thanks for the update!
[Apr 26, 2025 10:28:01 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846

Thanks for the update.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Apr 26, 2025 10:36:20 PM]
Boca Raton Community HS
Senior Cruncher
Joined: Aug 27, 2021
Post Count: 209

Great update! Looks like we have some of the batches from 9800027 onward on our systems, so we will see how they process.
[Apr 27, 2025 12:05:08 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865

Thank you for the update, and thanks to Dylan and/or savas and/or others for getting systems back up as well as fine-tuning the MAM1 beta! Hope you guys have a wonderful weekend without more work. Much appreciated.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Apr 27, 2025 2:36:18 AM]