Thread Status: Active
Thread Type: Sticky Thread
Total posts in this thread: 427
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1932
Re: Comprehensive Issue List & Report Thread (Aug. 9, 2022)

WCG stats have not updated yet again.
They are strangely lagging. Most notably, the "Last updated" date and time stamp in the top right of the "Overview" page is no longer updated around the time it used to be before this last "data center maintenance". The graph a bit further down that page and the project table below it are updating. Whether they show the right numbers I can't tell for sure, as a lot of my hosts are still busy finishing their backlog of project WUs (and I don't just abort those), so I can't really tell if we are back to the new normal until early next week.
The BOINC update script seems to be running, but again, I can't verify the numbers for now...

Ralf
----------------------------------------

[Oct 12, 2024 4:16:50 AM]
stoneageman
Advanced Cruncher
UK
Joined: Nov 21, 2005
Post Count: 101
Re: Comprehensive Issue List & Report Thread (Aug. 9, 2022)

Global Statistics Last Updated: 10/7/24 23:59:59 (UTC) [128 hour(s) ago]

My contribution stats not updated at midnight yet again
----------------------------------------
[Oct 13, 2024 8:09:12 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7579
Re: Comprehensive Issue List & Report Thread (Aug. 9, 2022)

I have noticed the "waiting to be sent" issue has re-surfaced. I hope somebody notices before it gets out of hand again.

Edit: I now see the number of "pending validation" work units has started to balloon again. I am up to 450 when I normally see about 150 to 200.

Edit: Somebody gave some server a kick. My number of "pending validations" is down about 190 and the "waiting to be sent" work units have now been sent.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 2 times, last edit by Sgt.Joe at Oct 16, 2024 2:20:35 AM]
[Oct 14, 2024 11:46:10 AM]
stoneageman
Advanced Cruncher
UK
Joined: Nov 21, 2005
Post Count: 101
Re: Comprehensive Issue List & Report Thread (Aug. 9, 2022)

Global statistics Last Updated: 10/7/24 23:59:59 (UTC) [200 hour(s) ago]

My contribution stats not updated at midnight again
Statistics last updated: 10/14/24 23:59:59 (UTC) [32 hour(s) ago]
----------------------------------------
[Oct 16, 2024 8:34:34 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7579
Re: Comprehensive Issue List & Report Thread (Aug. 9, 2022)

Global statistics Last Updated: 10/7/24 23:59:59 (UTC) [200 hour(s) ago]

My contribution stats not updated at midnight again
Statistics last updated: 10/14/24 23:59:59 (UTC) [32 hour(s) ago]


Statistics last updated: 10/16/24 12:06:03 (UTC) [10 hour(s) ago]

cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Oct 16, 2024 10:17:32 PM]
stoneageman
Advanced Cruncher
UK
Joined: Nov 21, 2005
Post Count: 101
Re: Comprehensive Issue List & Report Thread (Aug. 9, 2022)

Global Statistics Last Updated: 10/7/24 23:59:59 (UTC) [248 hour(s) ago]

My point is that for statistics to be useful, they need to be updated at consistent time intervals, which they are not currently.

Statistics last updated: 10/14/24 23:59:59 (UTC) [32 hour(s) ago]

cheers
----------------------------------------
[Oct 18, 2024 8:07:27 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1932
Re: Comprehensive Issue List & Report Thread (Aug. 9, 2022)

Global Statistics Last Updated: 10/7/24 23:59:59 (UTC) [248 hour(s) ago]

My point is that for statistics to be useful, they need to be updated at consistent time intervals, which they are not currently.

Statistics last updated: 10/14/24 23:59:59 (UTC) [32 hour(s) ago]

cheers
Just as for Sgt.Joe, the stats updates are working just fine for me again as well. There was about a 48h delay initially after the outage, but since then the only thing I have noticed still amiss is that the "last updated" time stamp on the Overview page is half a day behind; the stats themselves, however, ARE NOT.


Ralf
----------------------------------------

[Oct 18, 2024 2:55:17 PM]
Mad_Max
Cruncher
Russia
Joined: Nov 26, 2012
Post Count: 22
Re: Comprehensive Issue List & Report Thread (Aug. 9, 2022)

I don't see this in the list of already known Issues, so I'll report it here:

I noticed that some of the problems with downloading files from the WCG servers are caused not by an overloaded network infrastructure but by errors in the operation of the server(s) themselves.
Network overload produces "transient" (temporary) errors, after which the BOINC client simply retries the download later.
This has been a well-known problem for a long time and is already on the issues list.

But I want to report another kind of download error: for some reason, the project servers periodically respond to requests with HTTP error 404 (the requested URL/file was not found),
even though the URL is correct and the file is present on the server and can be downloaded.

This happens not only at times of peak network load but also when the servers are relatively lightly loaded (as, for example, right now, after the issuance of new tasks for the ARP1 sub-project was suspended).

This is much more serious, because once the BOINC client receives such an error (in BOINC the error code will be -224 = "permanent HTTP error"), it makes no further attempts to download the file, and the WU ends with an error. Some time later, a new copy of the task is sent to another participant, who starts downloading all the files from the very beginning. This creates extra load on the already overloaded servers and slows down data processing in general, because a significant amount of time passes before the new copy is issued and all the necessary files are downloaded, and during that time the WU cannot be validated and removed from the main database as processed. This bloats the main database, which adds more load on the servers, and so on.
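
To make the difference between the two kinds of error concrete, here is a rough sketch of the decision a download client faces. It is not the BOINC client's actual code: the `classify` function, the `Outcome` names, and the status-code policy are all hypothetical and only illustrate the idea. The point is that a 404 lands in the "give up" branch, so a spurious 404 costs the whole task, while an overload error only costs a retry.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    RETRY_LATER = "retry later"  # transient: another attempt is worthwhile
    GIVE_UP = "give up"          # permanent: the download is abandoned


def classify(status: int) -> Outcome:
    """Hypothetical retry policy, NOT taken from the BOINC source."""
    if 200 <= status < 300:
        return Outcome.SUCCESS
    # Assumed here: timeouts, rate limiting and server-side errors
    # are treated as temporary conditions.
    if status in (408, 429) or 500 <= status < 600:
        return Outcome.RETRY_LATER
    # Everything else, including 404, is treated as final.
    return Outcome.GIVE_UP
```

Under this assumed policy, `classify(503)` asks for a retry, while `classify(404)` gives up immediately, even if the file would have been served on the next request.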

Examples of tasks that ended with such errors on my computers (there were more than a few dozen such errors in the last ARP1 batch, and I also regularly see the same errors from my "wingmen"):
https://www.worldcommunitygrid.org/contribution/results/1152839245/log

https://www.worldcommunitygrid.org/contribution/results/1148651979/log

https://www.worldcommunitygrid.org/contribution/results/1150928483/log

https://www.worldcommunitygrid.org/contribution/results/1152329676/log

An example of a log from one of the links above (they are temporary and will soon be deleted from the server after the final validation of the corresponding WU):

<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>1ccfb9ea14367479d330a9261f6841a6.</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
</message>
]]>

An example of how it looks in the BOINC client log:
16-Nov-2024 18:48:17 [World Community Grid] Started download of 1ccfb9ea14367479d330a9261f6841a6.
====cut=======
16-Nov-2024 18:48:30 [World Community Grid] Giving up on download of 1ccfb9ea14367479d330a9261f6841a6.: permanent HTTP error
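
To pull such failures out of a longer client log, a small script along the following lines could be used. It is only a sketch: the pattern is inferred from the two log lines above, the function name `failed_downloads` is hypothetical, and other BOINC client versions may word the message differently.

```python
import re

# Pattern inferred from the sample log lines above (an assumption,
# not a documented BOINC log format).
GIVE_UP = re.compile(
    r"^(?P<ts>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"Giving up on download of (?P<file>\S+): (?P<reason>.+)$"
)


def failed_downloads(lines):
    """Return (timestamp, project, file, reason) for each abandoned download."""
    found = []
    for line in lines:
        m = GIVE_UP.match(line.strip())
        if m:
            found.append(
                (m.group("ts"), m.group("project"), m.group("file"), m.group("reason"))
            )
    return found


# The two lines quoted above, used as input.
LOG = [
    "16-Nov-2024 18:48:17 [World Community Grid] Started download of 1ccfb9ea14367479d330a9261f6841a6.",
    "16-Nov-2024 18:48:30 [World Community Grid] Giving up on download of 1ccfb9ea14367479d330a9261f6841a6.: permanent HTTP error",
]

for entry in failed_downloads(LOG):
    print(entry)
```

Only the second line matches, so the script reports one abandoned download with the reason "permanent HTTP error".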

The file was nevertheless present on the server and available. I checked: I took the full download URL from this BOINC client's client_state.xml file, opened it in a browser, and was able to download the file successfully. It has since been deleted, however, because processing of the WU that used it has been completed.

But you can experiment with widely distributed files, for example the main WCG logo itself.
The address, which can be opened in a browser: https://download.worldcommunitygrid.org/boinc/slideshow/default_00_v03.png

If you open this picture and press F5 (refresh) several times, the picture loads normally a few times, then suddenly the server returns a 404 error (file not found):
Not Found

The requested URL was not found on this server.
Apache/2.4.52 (Ubuntu) Server at download.worldcommunitygrid.org Port 443

And after a few more attempts (pressing F5 with pauses in between), it returns the picture correctly again.
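
The same F5 experiment can be scripted so that the failure rate is counted rather than eyeballed. The sketch below uses only the Python standard library; the helper names `http_status` and `probe` and the default attempt count and pause are assumptions chosen for illustration. Network-level failures (DNS, timeouts) are deliberately not handled and would raise an exception.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def http_status(url: str, timeout: float = 10.0) -> int:
    """Fetch the URL once and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is what we want.
        return err.code


def probe(url: str, attempts: int = 20, pause: float = 1.0, fetch=http_status) -> Counter:
    """Request the same URL repeatedly and count the status codes seen."""
    counts = Counter()
    for i in range(attempts):
        counts[fetch(url)] += 1
        if i < attempts - 1:
            time.sleep(pause)  # pause between attempts, like waiting between F5 presses
    return counts
```

Calling `probe("https://download.worldcommunitygrid.org/boinc/slideshow/default_00_v03.png")` returns a `Counter` of status codes. A healthy server should yield only 200s, so any 404 entries in the result would reproduce the behaviour described above.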

More examples of common files that periodically give a 404 error when you try to download them (they are not tied to any particular WU; they are pictures from the WCG screensaver, so they should be persistent):
https://download.worldcommunitygrid.org/boinc/slideshow/opn1_01_v01.png

https://download.worldcommunitygrid.org/boinc/slideshow/mip1_04_v01.png

https://download.worldcommunitygrid.org/boinc/slideshow/stat_v05.png
----------------------------------------
[Edit 2 times, last edit by Mad_Max at Nov 22, 2024 6:27:14 PM]
[Nov 22, 2024 6:17:58 PM]
savas
Cruncher
Joined: Sep 21, 2021
Post Count: 30
Re: Comprehensive Issue List & Report Thread (Aug. 9, 2022)

Thank you for reporting the issue.
The 404 errors related to "widely distributed files" should no longer be present, as we have isolated the problem to the newly provisioned download servers and taken them out of service while we fix the issue.
Separately, we have identified in the Apache logs all instances of a 404 returned from the download servers in the last few days. In each case, the file was no longer present on the filesystem, but we will look back further and investigate these workunits in the BOINC db to isolate the issue. We will post back here with our findings and a resolution soon.
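
For anyone who wants to run a similar check against their own Apache access log, a minimal sketch follows. It assumes the log uses the standard Common/Combined Log Format; the actual log configuration of the WCG download servers is not known here. The function name `not_found_paths` is hypothetical, and the sample lines are made up for illustration (documentation-range IP addresses, invented timestamps and sizes), reusing only file paths already mentioned in this thread.

```python
import re
from collections import Counter

# Prefix of the Common/Combined Log Format:
# host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def not_found_paths(lines):
    """Count 404 responses per requested path."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "404":
            counts[m.group("path")] += 1
    return counts


# Made-up sample lines, for illustration only (not real server logs).
SAMPLE = [
    '192.0.2.10 - - [22/Nov/2024:18:00:01 +0000] "GET /boinc/slideshow/stat_v05.png HTTP/1.1" 200 51234',
    '192.0.2.11 - - [22/Nov/2024:18:00:02 +0000] "GET /boinc/slideshow/stat_v05.png HTTP/1.1" 404 196',
    '192.0.2.12 - - [22/Nov/2024:18:00:03 +0000] "GET /boinc/slideshow/opn1_01_v01.png HTTP/1.1" 404 196',
    '192.0.2.11 - - [22/Nov/2024:18:00:09 +0000] "GET /boinc/slideshow/stat_v05.png HTTP/1.1" 404 196',
]

for path, n in not_found_paths(SAMPLE).most_common():
    print(n, path)
```

Paths that appear with both 200 and 404 responses in the same period are the interesting ones, since a file that is genuinely missing would never return 200.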
[Nov 22, 2024 7:32:38 PM]
Maxxina
Advanced Cruncher
Joined: Jan 5, 2008
Post Count: 115
Re: Comprehensive Issue List & Report Thread (Aug. 9, 2022)

Or post them under Operational Status on the Jurisica Lab page :)
[Nov 23, 2024 6:34:13 AM]