Thread Status: Active
Total posts in this thread: 214
This topic has been viewed 149961 times and has 213 replies
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

Slow downloads may be caused by the following.
I only run Mapping Cancer Markers, but this project alone might be slowing downloads.
My computer repeatedly downloads the following huge (107 MB) file...
mcm1.dataset-sarc1.txt
which could be slowing everything down.
My computer tries to download more tasks but gets "Tasks are committed to other platforms" or some other error, so it runs out of tasks. When it runs out of tasks, BOINC deletes mcm1.dataset-sarc1.txt.
With IBM, perhaps certain files weren't deleted when all tasks that needed them were finished?
The solution could be for WCG to either...
(1) not have mcm1.dataset-sarc1.txt deleted until "sarc2" or whatever comes out
(2) fix the "Tasks are committed to other platforms" issue
Regardless, perhaps the simple solution is for everyone to store at least a couple days of work and to only run a few projects (like only Mapping Cancer Markers)?


No, that is not the cause of the slow downloads. That large MCM1 file, at least for me, downloads quicker than the small files do. Once it has a connection, it keeps that connection and continues its download. The problem seems to be getting a good connection in the first place. The issue of the large file needing to be downloaded each time you run out of MCM1 work units has always been there as far as I know. It is just more apparent now that the supply of work units is haphazard at best.
Anyway, at this time both OPN and MCM seem to have run out of work for the moment.

10/8/2022 7:22:04 PM | World Community Grid | No tasks are available for OpenPandemics - COVID 19
10/8/2022 7:22:04 PM | World Community Grid | No tasks are available for OpenPandemics - COVID-19 - GPU
10/8/2022 7:22:04 PM | World Community Grid | No tasks are available for Mapping Cancer Markers
(Time is US Central Daylight Time -- UTC-5)

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Oct 9, 2022 12:27:56 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

Anyway, at this time both OPN and MCM seem to have run out of work for the moment.
Was out of the house/office all day and just checked what's going on.
The MCM1 well has been running dry at least since early this morning, if not since late last night.

But I still get OPN1 (no resends) and both my GPU hosts have plenty of OPNG WUs as well. Let's see how it looks on Sunday morning...

Ralf
[Oct 9, 2022 6:28:11 AM]
Link64
Senior Cruncher
Joined: Feb 19, 2021
Post Count: 206
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

I only run Mapping Cancer Markers, but this project alone might be slowing downloads.
My computer repeatedly downloads the following huge (107 MB) file...
mcm1.dataset-sarc1.txt
which could be slowing everything down.

Yes, that's insane. This file should be "sticky" like similar reusable files from, for example, Milkyway or Einstein:

<file>
<name>stars-82_pt2.txt</name>
<nbytes>1539106.000000</nbytes>
<max_nbytes>0.000000</max_nbytes>
<md5_cksum>578e9411673c51fcc24354fb11f31b93</md5_cksum>
<status>1</status>
<sticky/>
<download_url>http://milkyway.cs.rpi.edu/milkyway/download/1a5/stars-82_pt2.txt </download_url>
</file>
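
For readers who want to check which of their own files carry this flag, here is a minimal Python sketch. It assumes a `<file>` fragment shaped like the one quoted above (presumably taken from the BOINC client's state file); the function name `describe_file` is made up for illustration, and the trailing space in the download URL has been trimmed.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the post above (trailing space in the URL trimmed).
FILE_XML = """
<file>
<name>stars-82_pt2.txt</name>
<nbytes>1539106.000000</nbytes>
<max_nbytes>0.000000</max_nbytes>
<md5_cksum>578e9411673c51fcc24354fb11f31b93</md5_cksum>
<status>1</status>
<sticky/>
<download_url>http://milkyway.cs.rpi.edu/milkyway/download/1a5/stars-82_pt2.txt</download_url>
</file>
"""


def describe_file(xml_text):
    """Return (name, size_in_bytes, is_sticky) for one <file> element.

    Hypothetical helper: a file counts as sticky when an empty
    <sticky/> child element is present.
    """
    elem = ET.fromstring(xml_text.strip())
    name = elem.findtext("name")
    size = int(float(elem.findtext("nbytes")))
    is_sticky = elem.find("sticky") is not None
    return name, size, is_sticky


print(describe_file(FILE_XML))
```

Running the same helper on an entry without the `<sticky/>` element reports `False` in the third position, which is what one would expect to see for mcm1.dataset-sarc1.txt if it really is not marked sticky.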

[Oct 9, 2022 8:58:34 AM]
poppinfresh99
Cruncher
Joined: Feb 29, 2020
Post Count: 49
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

No, that is not the cause of the slow downloads. That large MCM1 file, at least for me, downloads quicker than the small files do. Once it has a connection, it keeps that connection and continues its download. The problem seems to be getting a good connection in the first place. The issue of the large file needing to be downloaded each time you run out of MCM1 work units has always been there as far as I know. It is just more apparent now that the supply of work units is haphazard at best.

My thought was that the downloads server can only make so many connections. The small files keep a connection for a moment. The 107 MB file keeps it for a minute or more.
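
The argument above can be put into rough numbers. The following Python sketch is only an illustration: apart from the 107 MB file size, every figure (per-connection throughput, connection setup time, small-file size) is an assumption chosen for the example, not a measurement of the WCG servers.

```python
def slot_seconds(size_mb, throughput_mb_per_s, setup_s):
    """Seconds one transfer occupies a server connection slot."""
    return setup_s + size_mb / throughput_mb_per_s


# All numbers below are hypothetical except the 107 MB file size.
THROUGHPUT = 1.5      # MB/s per connection (assumed)
SETUP = 0.5           # seconds to establish a connection (assumed)
SMALL_FILE_MB = 0.1   # size of a "small" file (assumed)

large = slot_seconds(107, THROUGHPUT, SETUP)
small = slot_seconds(SMALL_FILE_MB, THROUGHPUT, SETUP)

print(f"large file holds a slot for {large:.1f} s")
print(f"small file holds a slot for {small:.2f} s")
print(f"one large download ties up a slot as long as {large / small:.0f} small ones")
```

If the server has a fixed number of slots, the number of slots busy at any moment is roughly the arrival rate of downloads times the average hold time, so a few long transfers can weigh as much as many short ones. Whether that is what actually limits the WCG download server is not something this sketch can show.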

IBM clearly had better throughput and a steady supply of work, but this file not being “sticky” just makes issues worse when issues occur, like coyotes waiting to attack the sick and weak. I am not sure how you can be so confident that this isn’t a cause of the problems.

I do have a couple of 4 MB sticky files from MCM still on my computer from 2021 (.tga files), so I’m not convinced that this was always a problem, though these aren’t .txt files like the 107 MB file. Also, 107 MB is far larger than 4 MB, which could be what is slowing everything down…
----------------------------------------
[Edit 1 times, last edit by poppinfresh99 at Oct 9, 2022 1:45:05 PM]
[Oct 9, 2022 1:43:19 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

I am not sure how you can be so confident that this isn’t a cause of the problems.

You only need to download the large MCM1 file once (as long as you have a steady supply of MCM1 work units), yet the download issues continue even when this file is not being downloaded. Besides, the downloading of the OPN files should be entirely independent of the large MCM1 file. I would welcome any alternative explanation.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Oct 9, 2022 2:11:35 PM]
Link64
Senior Cruncher
Joined: Feb 19, 2021
Post Count: 206
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

I do have a couple of 4 MB sticky files from MCM still on my computer from 2021 (.tga files)

Technically those are not "sticky" files, i.e. files that are downloaded as part of a WU and kept for future WUs; they are "project files". But yes, I agree: even if this large file downloads faster once the connection is there (making the connection is what seems to take the most time on the small ones), everyone downloading it again and again, probably even more often than if the system were stable and people were not running out of WUs, is surely adding to the issue, even though it's hard to tell how much.

I noticed this "issue" shortly after joining WCG when testing the different projects, so I copied it to the projects folder and hardlinked it back to the WCG folder each time BOINC deleted it using this batch file:

fsutil hardlink create "www.worldcommunitygrid.org\mcm1.dataset-sarc1.txt" "mcm1.dataset-sarc1.txt"

However this isn't something we should "need" to do.
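
For anyone not on Windows, or who prefers a script over the batch file, the same workaround can be sketched in Python. This is only a sketch under assumptions: it expects to run from the BOINC projects directory (as the relative paths in the batch command suggest), the helper name `restore_hardlink` is made up for illustration, and hard links only work when both paths are on the same volume.

```python
import os
from pathlib import Path


def restore_hardlink(kept_copy: Path, project_file: Path) -> bool:
    """Recreate project_file as a hard link to kept_copy if it is missing.

    Returns True if a link was created, False if project_file already existed.
    """
    if project_file.exists():
        return False
    # Equivalent in effect to: fsutil hardlink create <new> <existing>
    os.link(kept_copy, project_file)
    return True


if __name__ == "__main__":
    # Paths mirror the batch command above and assume the script is started
    # from the BOINC "projects" directory.
    kept = Path("mcm1.dataset-sarc1.txt")
    target = Path("www.worldcommunitygrid.org") / "mcm1.dataset-sarc1.txt"
    if kept.exists() and target.parent.is_dir():
        print("link created" if restore_hardlink(kept, target) else "already present")
    else:
        print("kept copy or project folder not found; nothing to do")
```

Because the link and the kept copy share the same data on disk, the file is not stored twice, and BOINC deleting its own name for the file leaves the kept copy intact.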
[Oct 9, 2022 2:22:52 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

I do have a couple of 4 MB sticky files from MCM still on my computer from 2021 (.tga files)

Technically those are not "sticky" files, i.e. files that are downloaded as part of a WU and kept for future WUs; they are "project files". But yes, I agree: even if this large file downloads faster once the connection is there (making the connection is what seems to take the most time on the small ones), everyone downloading it again and again, probably even more often than if the system were stable and people were not running out of WUs, is surely adding to the issue, even though it's hard to tell how much.

I noticed this "issue" shortly after joining WCG when testing the different projects, so I copied it to the projects folder and hardlinked it back to the WCG folder each time BOINC deleted it using this batch file:

fsutil hardlink create "www.worldcommunitygrid.org\mcm1.dataset-sarc1.txt" "mcm1.dataset-sarc1.txt"

However this isn't something we should "need" to do.
No, we should not need to do this. And this was never a problem before (the move)...

Just far too many people (possibly including some at Krembil?) are all just looking at the symptoms rather than at the source of the problem.

Ralf
[Oct 9, 2022 3:38:52 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

Since 2022-10-09, 21:20 UTC, I haven't seen a single transient HTTP error on one machine, neither during uploading nor during downloading. It's now 2 hours later (when I started writing this post).
After 2 hours, from 21:20 - 23:20 UTC:
36 tasks were uploaded (13x OPNG, 16x MCM1, 6x OPN1, 1x ARP1) and
44 tasks were downloaded (33x OPNG, 5x MCM1, 5x OPN1, 1x ARP1).

At 23:30 UTC, another 23 OPNG-tasks were downloaded, all also without any transient HTTP error.

It looks like a sudden change: during the last five minutes until the last error was seen, that's between 21:15 and 21:20 UTC, I got 39 transient HTTP errors (5 during uploading and 34 while processing downloads). After that, no more errors.

Is this because the ARP1 well has largely run dry? (I'm only receiving resends of it.)

EDIT: Uploading the ARP1-result took 96 seconds, downloading the ARP1-task took 6 minutes.
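
Counts like the ones above can be pulled out of a saved event log with a short script. The Python sketch below assumes lines in the `date time | project | message` layout quoted earlier in this thread and simply looks for the phrase "transient HTTP error"; the sample lines are invented for illustration and are not real log output.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout as seen in the log lines quoted earlier in the thread,
# e.g. "10/8/2022 7:22:04 PM".
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def count_errors_per_minute(log_lines, needle="transient HTTP error"):
    """Count log lines containing `needle`, grouped by minute."""
    counts = Counter()
    for line in log_lines:
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3 or needle not in parts[2]:
            continue
        try:
            stamp = datetime.strptime(parts[0], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines whose timestamp does not match the layout
        counts[stamp.replace(second=0)] += 1
    return counts


# Hypothetical sample lines; the message texts are made up for this example.
SAMPLE = [
    "10/9/2022 4:15:10 PM | World Community Grid | (hypothetical) download failed: transient HTTP error",
    "10/9/2022 4:15:42 PM | World Community Grid | (hypothetical) upload failed: transient HTTP error",
    "10/9/2022 4:21:00 PM | World Community Grid | No tasks are available for Mapping Cancer Markers",
]

for minute, n in sorted(count_errors_per_minute(SAMPLE).items()):
    print(minute.strftime("%Y-%m-%d %H:%M"), n)
```

Note that the event log shows local time, so the minutes would need converting before comparing them with the UTC times given in this post.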
----------------------------------------
[Edit 2 times, last edit by adriverhoef at Oct 10, 2022 12:04:41 AM]
[Oct 9, 2022 11:54:15 PM]
geophi
Advanced Cruncher
U.S.
Joined: Sep 3, 2007
Post Count: 113
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

The website suddenly got more responsive as well. Earlier today, it took a long time just to click on "Results" and actually list them. Now, it's virtually immediate.
[Oct 10, 2022 12:55:21 AM]
sptrog1
Master Cruncher
Joined: Dec 12, 2017
Post Count: 1592
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

I do not have the measurements that adriverhoef does, but things seem to be better. Updates seem to go through without problems. I am not sitting with empty threads. I do not feel I have to push the transfer button to keep the research flowing. Keep up the good work.
[Oct 10, 2022 2:00:39 AM]