World Community Grid Forums
Category: Official Messages; Forum: News; Thread: Comprehensive Issue List & Report Thread (Feb. 24, 2023)
Thread Status: Active; Thread Type: Sticky Thread; Total posts in this thread: 427
TPCBF
Master Cruncher; USA; Joined: Jan 2, 2011; Post Count: 1932
WCG stats have not updated yet again.
They are strangely lagging. Most notably, the "Last updated" date & time stamp at the top right of the "Overview" page is no longer updated at around the time it was before this last "data center maintenance". The graph a bit further down that page, as well as the project table below it, are updating, but whether they show the right numbers is something I can't tell for sure: a lot of my hosts are still busy finishing their backup projects' WUs (and I don't just abort those), so I can't really tell whether we are back to the new normal until early next week. The BOINC update script seems to be running, but again, I can't verify the numbers for now...
Ralf
stoneageman
Advanced Cruncher; UK; Joined: Nov 21, 2005; Post Count: 101
Global Statistics Last Updated: 10/7/24 23:59:59 (UTC) [128 hour(s) ago]
My contribution stats not updated at midnight yet again.
Sgt.Joe
Ace Cruncher; USA; Joined: Jul 4, 2006; Post Count: 7579
I have noticed the "waiting to be sent" issue has resurfaced. I hope somebody notices before it gets out of hand again.
Edit: I now see the number of "pending validation" work units has started to balloon again. I am up to 450 when I normally see about 150 to 200.
Edit: Somebody gave some server a kick. My number of "pending validations" is down by about 190, and the "waiting to be sent" work units have now been sent.
Cheers
Sgt. Joe
*Minnesota Crunchers* [Edit 2 times, last edit by Sgt.Joe at Oct 16, 2024 2:20:35 AM]
stoneageman
Advanced Cruncher; UK; Joined: Nov 21, 2005; Post Count: 101
Global statistics Last Updated: 10/7/24 23:59:59 (UTC) [200 hour(s) ago]
My contribution stats not updated at midnight again.
Statistics last updated: 10/14/24 23:59:59 (UTC) [32 hour(s) ago]
Sgt.Joe
Ace Cruncher; USA; Joined: Jul 4, 2006; Post Count: 7579
stoneageman wrote: Global statistics Last Updated: 10/7/24 23:59:59 (UTC) [200 hour(s) ago] My contribution stats not updated at midnight again Statistics last updated: 10/14/24 23:59:59 (UTC) [32 hour(s) ago]
Statistics last updated: 10/16/24 12:06:03 (UTC) [10 hour(s) ago]
cheers
Sgt. Joe
*Minnesota Crunchers*
stoneageman
Advanced Cruncher; UK; Joined: Nov 21, 2005; Post Count: 101
Global Statistics Last Updated: 10/7/24 23:59:59 (UTC) [248 hour(s) ago]
My point is that for statistics to be useful, they need to be updated at consistent time intervals, which they are not currently.
Statistics last updated: 10/14/24 23:59:59 (UTC) [32 hour(s) ago] cheers
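Whether the stats are on schedule can be checked mechanically from the banner text quoted in these posts. The following is a minimal Python sketch, assuming the banner keeps the format shown above ("... last updated: M/D/YY HH:MM:SS (UTC) ..."); the function names and the 24 h interval plus 6 h grace period are assumptions for illustration, not how WCG defines staleness.

```python
import re
from datetime import datetime, timezone

# Assumed banner format, copied from the posts in this thread, e.g.
# "Statistics last updated: 10/14/24 23:59:59 (UTC) [32 hour(s) ago]"
BANNER_RE = re.compile(
    r"last updated:\s*(\d{1,2}/\d{1,2}/\d{2} \d{2}:\d{2}:\d{2})\s*\(UTC\)",
    re.IGNORECASE,
)


def parse_last_updated(banner: str) -> datetime:
    """Extract the UTC timestamp from a 'Last Updated' banner."""
    m = BANNER_RE.search(banner)
    if m is None:
        raise ValueError(f"no timestamp found in {banner!r}")
    # month/day/two-digit year, as displayed on the site
    naive = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


def hours_behind(banner: str, now: datetime) -> int:
    """Whole hours elapsed between the banner's timestamp and `now` (UTC)."""
    return int((now - parse_last_updated(banner)).total_seconds() // 3600)


def is_stale(banner: str, now: datetime,
             expected_interval_h: int = 24, grace_h: int = 6) -> bool:
    """Hypothetical rule: stale once a daily update is more than grace_h hours overdue."""
    return hours_behind(banner, now) > expected_interval_h + grace_h
```

For example, with `now = datetime(2024, 10, 18, 8, 0, tzinfo=timezone.utc)`, `hours_behind` on the "10/7/24 23:59:59" banner returns 248, matching the "[248 hour(s) ago]" shown on the page. Logging these values once a day would show directly whether updates arrive at consistent intervals.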
TPCBF
Master Cruncher; USA; Joined: Jan 2, 2011; Post Count: 1932
stoneageman wrote: Global Statistics Last Updated: 10/7/24 23:59:59 (UTC) [248 hour(s) ago]
Just as for Sgt.Joe, the stats updates work just fine for me again as well. There was about a 48h delay initially after the outage, but since then the only thing I've noticed still amiss is that the "last updated" time stamp on the Overview page is half a day behind; the stats themselves, however, ARE NOT.
stoneageman wrote: My point is that for statistics to be useful, they need to be updated at consistent time intervals, which they are not currently. Statistics last updated: 10/14/24 23:59:59 (UTC) [32 hour(s) ago] cheers
Ralf
Mad_Max
Cruncher; Russia; Joined: Nov 26, 2012; Post Count: 22
I don't see this in the list of already known issues, so I'll report it here:
I noticed that some of the problems with downloading files from the WCG servers are not caused by an overloaded network infrastructure but by errors in the server(s) themselves. Network overloads produce "transient" (temporary) errors, after which the BOINC client simply retries the download later. That has been a well-known problem for a long time and is already in the issues list.
The other kind of download error I want to report is this: for some reason, the project servers periodically answer requests with HTTP error 404 (the requested URL/file was not found), although the URL is in fact correct and the file is present on the server and can be downloaded. This happens not only at times of peak network load but also when the servers are relatively lightly loaded (as, for example, right now, after the issuing of new tasks for the ARP1 sub-project has been suspended).
This is much more serious, because when the BOINC client receives such an error (in BOINC the error code is -224 = "permanent HTTP error"), it makes no further attempts to download the file and the WU ends with an error. Some time later a new copy of the task is sent to another participant, who starts downloading all the files from the very beginning, creating extra load on the already overloaded servers and slowing down data processing in general. A significant amount of time passes before the new copy is issued and all the necessary files are downloaded, and during that time the WU cannot be validated and removed from the main database as processed. This bloats the main database, which adds more load on the servers, and so on.
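The difference between the two error classes can be illustrated with a simplified retry model. This is a minimal Python sketch, not BOINC's actual client code: the status-code mapping (404/410 permanent, 429 and 5xx transient), the function names and the backoff parameters are all assumptions. `fetch` stands in for one HTTP request and returns its status code.

```python
import time
from typing import Callable, Tuple

# Simplified model, NOT BOINC's real mapping: 404/410 are treated as
# permanent (the client gives up, like BOINC's -224 "permanent HTTP error"),
# while 429 and any 5xx are treated as transient (retry later).
PERMANENT = {404, 410}


def classify(status: int) -> str:
    """Map an HTTP status code to 'ok', 'transient' or 'permanent'."""
    if status == 200:
        return "ok"
    if status in PERMANENT:
        return "permanent"
    if status == 429 or 500 <= status < 600:
        return "transient"
    return "permanent"  # assumption: other unexpected codes are not retried


def download(fetch: Callable[[], int], max_attempts: int = 5,
             backoff_s: float = 0.0) -> Tuple[str, int]:
    """Call fetch() until it succeeds; return (outcome, attempts_used).

    Transient errors are retried with exponential backoff; the first
    permanent error ends the download immediately.
    """
    for attempt in range(1, max_attempts + 1):
        outcome = classify(fetch())
        if outcome in ("ok", "permanent"):
            return outcome, attempt
        if attempt < max_attempts:
            time.sleep(backoff_s * 2 ** (attempt - 1))
    return "transient", max_attempts
```

With scripted responses, `download(iter([503, 503, 200]).__next__)` recovers and returns `("ok", 3)`, while `download(iter([404, 200]).__next__)` returns `("permanent", 1)` even though a second request would have succeeded. That is the failure mode described above: one spurious 404 is enough to fail the whole WU.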
Examples of tasks that ended with this error on my computers (there were more than a few dozen such errors in the last ARP1 batch, and I regularly see the same errors from my "wingmen"):
https://www.worldcommunitygrid.org/contribution/results/1152839245/log
https://www.worldcommunitygrid.org/contribution/results/1148651979/log
https://www.worldcommunitygrid.org/contribution/results/1150928483/log
https://www.worldcommunitygrid.org/contribution/results/1152329676/log
An example of a log from one of the links above (copied here because those logs are temporary and will soon be deleted from the server after the final validation of the corresponding WU):
<core_client_version>7.20.2</core_client_version>
An example of how it looks in the log on the BOINC client:
16-Nov-2024 18:48:17 [World Community Grid] Started download of 1ccfb9ea14367479d330a9261f6841a6.
Yet the file was present on the server and available. I checked: I took the full download URL from this BOINC client's client_state.xml file, opened it in a browser, and was able to download the file successfully. It has since been deleted, because processing of the WU that used it has been completed. But you can experiment with widely distributed files, for example even with the main WCG logo itself. The address that can be opened in a browser:
https://download.worldcommunitygrid.org/boinc/slideshow/default_00_v03.png
If you open this picture and press F5 (refresh) several times, the picture loads normally a few times, then suddenly the server returns a 404 error (file not found):
Not Found
And after a few more attempts (continuing to press F5 with pauses), it returns the picture correctly again.
More examples of common files (not tied to any particular WU; these are pictures from the WCG screensaver, so they should be persistent) that periodically return a 404 error when you try to download them:
https://download.worldcommunitygrid.org/boinc/slideshow/opn1_01_v01.png
https://download.worldcommunitygrid.org/boinc/slideshow/mip1_04_v01.png
https://download.worldcommunitygrid.org/boinc/slideshow/stat_v05.png
[Edit 2 times, last edit by Mad_Max at Nov 22, 2024 6:27:14 PM]
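The F5 experiment can be automated with a small probe that requests the same URL repeatedly and tallies the status codes. This is a minimal sketch using Python's standard `urllib`; the request count, the pause between requests and the function names are assumptions chosen for illustration. Network-level failures (`urllib.error.URLError` other than HTTP errors) are not caught and will propagate.

```python
import time
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def http_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status of one GET request, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # fetch the whole body, as a real download would
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # e.g. 404 for "Not Found"


def probe(url: str, n: int = 20, pause_s: float = 2.0,
          get_status: Callable[[str], int] = http_status) -> Counter:
    """Request `url` n times with a pause in between; tally the status codes."""
    tally: Counter = Counter()
    for i in range(n):
        tally[get_status(url)] += 1
        if i < n - 1:
            time.sleep(pause_s)
    return tally
```

For example, `probe("https://download.worldcommunitygrid.org/boinc/slideshow/default_00_v03.png", n=20)` returns a `Counter` of status codes; on a healthy server it should contain only 200. Keep the pause generous so the probe itself does not add noticeable load.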
savas
Cruncher; Joined: Sep 21, 2021; Post Count: 30
Thank you for reporting the issue.
The 404 errors related to "widely distributed files" should no longer be present, as we have isolated the problem to the newly provisioned download servers and taken them out of service while we fix the issue. Separately, we have identified in the Apache logs all instances of a 404 returned from the download servers in the last few days. In each case, the file was no longer present on the filesystem, but we will look back further and investigate these workunits in the BOINC db to isolate the issue. We will post back here with our findings and a resolution soon.
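A log search like the one described can be sketched as follows. This assumes the access logs use Apache's common/combined log format and, hypothetically, that a download URL under `/boinc/` maps directly to a file under a given document root; neither the log location nor the directory layout of the WCG servers is known here. The second function separates 404s for files that are still on disk (which would point to a server-side fault, as in the report above) from 404s for files that have already been deleted.

```python
import os
import re
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Apache common/combined log format:
# host ident user [time] "METHOD path PROTOCOL" status size ...
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})\b'
)


def count_404s(lines: Iterable[str], prefix: str = "/boinc/") -> Counter:
    """Count 404 responses per request path that starts with `prefix`."""
    hits: Counter = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("status") == "404" and m.group("path").startswith(prefix):
            hits[m.group("path")] += 1
    return hits


def split_by_presence(paths: Iterable[str], docroot: str, prefix: str = "/boinc/",
                      exists: Callable[[str], bool] = os.path.isfile
                      ) -> Tuple[List[str], List[str]]:
    """Split 404'd URL paths into (still on disk, already gone).

    Hypothetical mapping: URL path `prefix + name` -> `docroot/name`.
    """
    present: List[str] = []
    missing: List[str] = []
    for path in paths:
        local = os.path.join(docroot, path[len(prefix):])
        (present if exists(local) else missing).append(path)
    return present, missing
```

An open log file object can be passed straight to `count_404s`, and its keys to `split_by_presence`; any path that lands in the "present" list is a 404 that should not have happened.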
Maxxina
Advanced Cruncher; Joined: Jan 5, 2008; Post Count: 115
Or post them in the operational status section of the Jurisica Lab page :)