World Community Grid Forums
Category: Official Messages | Forum: News | Thread: 2023-04-06 Update (WU Distribution Update)
Thread Status: Active | Total posts in this thread: 51
Cyclops
Senior Cruncher Joined: Jun 13, 2022 Post Count: 295 Status: Offline |
WU Distribution Update
We are working towards resuming a consistent WU supply similar to what we had before the storage system failure. The recent scarcity of OPN1 WUs was caused by a batch that blocked the create-work process for all other projects. We have found and fixed the glitch, and the system is busy creating work for OPN1 right now. We still have an ARP1 backlog of unsent results (see ARP project update), but we now have spare capacity for a larger backlog. After the OPN1 work units are prepared, the system will prepare ARP1 work units.

On the back end, we still had to finalize setup of the new storage, as a networking issue was preventing us from accessing the tape archive. Data center admins have helped to fix it, and the production system on the new storage is being backed up.

We continue to investigate the errors in the BOINC system services, specifically the assimilators and validators. Unfortunately, the application is written such that an unexpected error halts the service (which is what happened when our storage system failed). We are attempting to clear out the problematic data to allow the applications to continue processing other results, but BOINC doesn't seem to have an easy method of flushing specific workunits or results out of its system.

If you have any comments or questions, please leave them in this thread for us to answer.

Thank you for your support, patience and understanding.

WCG team
||
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1265 Status: Recently Active Project Badges: |
Thank you for the update. I have 6 pages of OPN1 (CPU tasks) pending verification, returned on 4/3 or 4/4. Hopefully these will clear in the coming day or two.
----------------------------------------
Here are a few of the task names that I have "pending verification":
OPN1_0128917_01594_0
OPN1_0128917_01584_0
OPN1_0128917_01589_0
Update: work has started to be verified. Keep up the great work; it is much appreciated.
[Edit 3 times, last edit by Speedy51 at Apr 7, 2023 9:26:44 AM]
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12146 Status: Offline Project Badges: |
I seem to have a problem with priority units. They are showing a correct deadline of +3 days but are not being crunched any earlier than if they had a +6 day deadline. This has been occurring on MCM1 & OPN1. I am connected to 7.20.2.
Mike |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12146 Status: Offline Project Badges: |
I have my cache set to 5+1 days. With an 8-thread machine that should result in at least 40 CPU days work.
However, I only have 12 days of ARP, 7 hours OPN and 3 hours MCM but my event log says not requesting tasks: don't need!
I have app_config set to a maximum of 4 ARP, 2 OPN & 2 MCM. That means 3 days work for my ARP threads, only 2 OPN units for another 2.5 hours and only 1 MCM unit for another 80 minutes.
I am crunching version 7.20.2 with Windows 7.
Mike |
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1316 Status: Offline Project Badges: |
Quote (Mike.Gibson): "I have my cache set to 5+1 days. With an 8-thread machine that should result in at least 40 CPU days work."
This is only true if that machine was running 24/7 for a long period before asking for more work.
Quote (Mike.Gibson): "However, I only have 12 days of ARP, 7 hours OPN and 3 hours MCM but my event log says not requesting tasks: don't need!"
You could check in client_state.xml whether the on_frac value in time_stats is near 1 or much lower. |
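The check suggested above can also be scripted. The following is a minimal sketch, assuming client_state.xml parses as well-formed XML with a `time_stats` element directly under the root; the sample document, its values, and the helper name `read_time_stats` are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample; a real client_state.xml is much larger.
SAMPLE = """<client_state>
  <time_stats>
    <on_frac>0.95</on_frac>
    <connected_frac>0.99</connected_frac>
  </time_stats>
</client_state>"""


def read_time_stats(xml_text):
    """Return the numeric children of <time_stats> as a dict of floats."""
    root = ET.fromstring(xml_text)
    stats = root.find("time_stats")  # assumed to be a direct child of the root
    if stats is None:
        return {}
    result = {}
    for child in stats:
        try:
            result[child.tag] = float(child.text)
        except (TypeError, ValueError):
            # Skip empty or non-numeric entries.
            continue
    return result


fracs = read_time_stats(SAMPLE)
print("on_frac:", fracs.get("on_frac"))
```

To try it on a real machine, read the file from the BOINC data directory and pass its contents to `read_time_stats`. The location of that directory depends on the installation.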
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12146 Status: Offline Project Badges: |
Quote (Crystal Pellet):
Quote (Mike.Gibson): "I have my cache set to 5+1 days. With an 8-thread machine that should result in at least 40 CPU days work."
This is only true if that machine was running 24/7 for a long period before asking for more work.
Quote (Mike.Gibson): "However, I only have 12 days of ARP, 7 hours OPN and 3 hours MCM but my event log says not requesting tasks: don't need!"
You could check in client_state.xml whether the on_frac value in time_stats is near 1 or much lower.

<on_frac>0.973955</on_frac>
<connected_frac>0.999969</connected_frac>
<cpu_and_network_available_frac>0.999966

Thanks
Mike |
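As a rough cross-check of the figures in this exchange, the sketch below scales the expected buffer by on_frac. It is a simplified back-of-the-envelope model, not the BOINC client's actual work-fetch logic: the function name is hypothetical, and "5+1 days" is treated as a 5-day minimum buffer, as in the 40 CPU-day estimate above.

```python
def expected_buffer_cpu_days(buffer_days, threads, on_frac=1.0):
    """Simplified estimate: buffer length x threads x fraction of time the client runs."""
    return buffer_days * threads * on_frac


# Figures taken from the posts above.
full = expected_buffer_cpu_days(5, 8)                      # machine assumed on 24/7
scaled = expected_buffer_cpu_days(5, 8, on_frac=0.973955)  # using the posted on_frac
reported = 12 + (7 + 3) / 24                               # 12 days ARP + 7 h OPN + 3 h MCM

print(f"full={full:.2f} scaled={scaled:.2f} reported={reported:.2f}")
```

Under these assumptions, an on_frac of about 0.97 lowers the estimate only from 40 to roughly 39 CPU days, while the reported cache is roughly 12.4 days, so on_frac alone would not account for the gap.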
||
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2092 Status: Offline Project Badges: |
Our old "foe", "transient HTTP error", is back with a vengeance. I think they started sending out OPNG tasks, at the same time as ARP1 tasks are being sent out. That hasn't worked before, and it sure doesn't work now.
----------------------------------------
[Edit 2 times, last edit by Grumpy Swede at Apr 8, 2023 9:06:14 AM] |
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2089 Status: Offline Project Badges: |
Well, Grumpy Swede, although I'm seeing "transient HTTP error", too, the problem is not with downloading tasks but with uploading them. Will post back when the problem is resolved for me or when my upload queue is empty again.
----------------------------------------
Anyway, the problems are transient, so there's hope.
UPDATE: The problem started for me at 08:14 UTC today:
08-Apr-2023 10:14:16 [World Community Grid] Temporarily failed upload of OPN1_0129309_01811_0_r2116729635_0: transient HTTP error
Sometimes an upload succeeds. The upload speed of ARP1-files is nearly abysmally slooooow:
$ wcgresults -X
Adri
[Edit 2 times, last edit by adriverhoef at Apr 8, 2023 10:50:20 AM] |
||
|
Dayle Diamond
Senior Cruncher Joined: Jan 31, 2013 Post Count: 450 Status: Offline Project Badges: |
It's not just you, GS, I'm getting it when I try to upload OPN work.
|
||
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2092 Status: Offline Project Badges: |
@adriverhoef
Yes, it's uploading I was talking about. I failed to say that. Downloading is pretty OK here. The last time we had the same transient crap, it was for both downloading and uploading. |
||
|
|