World Community Grid Forums
Thread Status: Active | Total posts in this thread: 159
adriverhoef
Master Cruncher | The Netherlands | Joined: Apr 3, 2009 | Post Count: 2152 | Status: Offline
> I know this has been mentioned before, but is there any technical reason why the files cannot be combined into one file for download? My guess is the challenges are: 1) The server would need to combine/zip/rar all your tasks into a bundle... requiring more/a lot of CPU time for thousands of users.

Let's suppose this could be made possible in a simple way, and have a look at the files in one ARP1 task. There are typically 11 files in a task that is sent to a client:

- One filename ending in ".input", 1,785 bytes.
- Three filenames ending in ".", each 659,852 bytes.
- Three filenames ending in ".7z", each between roughly 14 and 21 MB.
- Three filenames ending in "d01", "d02" and "d03", each 12,485,444 bytes (± 12 MB).
- One more filename ending in ".", 49,385,088 bytes (± 49 MB).

The filenames ending in "." aren't compressed; they contain raw NetCDF Data Format data. They are either small (659,852 bytes) or large (49 MB). My guess is that it is too 'expensive' on resources to compress the larger one and 'a waste of time' to compress the small ones, including the ".input" file, of course. The files ending in ".7z" are already compressed (LZMA); uncompressed, two of them are 87,327,860 bytes and the third is 92,575,412 bytes. All three contain raw NetCDF Data Format data too, as do the files ending in "d01", "d02" and "d03". (BTW, compressing the larger one of 49 MB could be an option; I don't understand why this wasn't done in the IBM days. It would take up ± 12 MB.)
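As a rough sanity check of that last point, a few lines of Python show how LZMA reports a compression ratio. The buffer here is synthetic and only a stand-in; real ARP1 NetCDF output would compress differently.

```python
# Hypothetical illustration: LZMA-compress a buffer and report the ratio.
# The repetitive text below is NOT real NetCDF data, just a stand-in.
import lzma

data = (b"gridpoint temperature 284.15 pressure 1013.2 ") * 200_000  # ~9 MB
packed = lzma.compress(data, preset=6)

ratio = len(packed) / len(data)
print(f"{len(data)} -> {len(packed)} bytes ({ratio:.1%} of original)")
```

Whether the real 49 MB file really shrinks to ± 12 MB depends entirely on how redundant the model output is.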
So, before compressing, the server has, in bytes: 1,785 (the ".input" file), 3 × 659,852 and 3 × 12,485,444 (the fixed sizes), plus the three ".7z" files and the 49 MB file. Since the server already compresses the large files, aside from the 49 MB file, there's no need for the server to compress the 11 files any further before creating the task that will be sent to a client. The server would only need to combine the already compressed files and the smaller files - and optionally compress the 49 MB file - which would then add up to one download of ± 102 MB. (The fixed sizes are 1,785 + 3 × 659,852 + 3 × 12,485,444; the other sizes are 3 × approx. 14-21 MB (averaging a total of 50 MB) + approx. 12 MB.) That's it: one large bundle with 11 files in it, some of them already compressed, others too small to make the effort worthwhile, saving milliseconds of server time.

Adri
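Adri's arithmetic can be sketched in a few lines. The ~17 MB average per ".7z" file and the ~12 MB estimate for the compressed 49 MB file are assumptions taken from the ranges quoted above, not measured values.

```python
# Back-of-the-envelope total for one bundled ARP1 download, using the
# sizes quoted in the post above. The 17 MB average per .7z file and the
# 12 MB figure for the LZMA-compressed 49 MB file are assumptions.
MB = 1024 * 1024

total = (
    1_785               # the ".input" file
    + 3 * 659_852       # three small uncompressed NetCDF files
    + 3 * 12_485_444    # the "d01"/"d02"/"d03" files
    + 3 * 17 * MB       # three .7z files, ~14-21 MB each
    + 12 * MB           # the 49 MB file, if compressed to ~12 MB
)
print(f"approximate bundle size: {total / MB:.0f} MB")
```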
ericinboston
Senior Cruncher | Joined: Jan 12, 2010 | Post Count: 258 | Status: Offline
> Since the server already compresses the large files, aside from the 49 MB file, there's no need for the server to compress the 11 files even further before creating the task that will be sent to a client. ... That's it, one large bundle with 11 files in it, some of them already compressed, others too small to make the effort, saving milliseconds of server time.

Thanks for the breakdown, and yes, that's what I was referring to: the server(s) would need to spend CPU cycles compressing and creating the bundle. For all we know, compressing the remaining files may yield only a 2% reduction yet take 8 seconds to perform. Is it worth the CPU cycles and time to do this? Only WCG/the project can answer that. My guess is they thought about this process and decided it was not worth the time to create compressed bundles.

It may, of course, be time to consider creating non-compressed bundles, so the end user gets 1 file instead of 11, if there is a risk (which I think is the problem in this thread) of some of those 11 files not transferring correctly and causing headaches for everyone.
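The time-versus-saving question is straightforward to measure in principle. A hedged sketch on synthetic, highly repetitive data (real task files would behave quite differently):

```python
# Hypothetical measurement of the trade-off mentioned above: CPU seconds
# spent compressing versus bytes saved. Synthetic data only; real ARP1
# files would give different numbers.
import lzma
import time

data = bytes(range(256)) * 16_384   # 4 MiB repeating pattern (stand-in)

t0 = time.perf_counter()
packed = lzma.compress(data, preset=1)   # fast preset
elapsed = time.perf_counter() - t0

saved = len(data) - len(packed)
print(f"{elapsed:.2f}s spent, {saved / len(data):.0%} saved")
```

Running something like this against actual task files is the only way to answer the "2% for 8 seconds" question.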
Boca Raton Community HS
Advanced Cruncher | Joined: Aug 27, 2021 | Post Count: 125 | Status: Offline
Thank you to both adriverhoef and ericinboston for the insights and analysis. This answers some questions and still leaves some on the table. I feel there is a better way for these work units to be sent, but I am sure the developers have their reasons. Still, things could change/improve!
harmonytom
Cruncher | Joined: Apr 20, 2011 | Post Count: 2 | Status: Offline
Nine days later -- no progress.
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 1948 | Status: Offline
> Thank you to both adriverhoef and ericinboston for the insights and analysis. ... But, things could still change/improve!

Well, one point that Adri has been missing is that putting several files into a single .ZIP/.7z file doesn't mean that they need to be compressed/decompressed again. After all, these kinds of files are referred to as "archive" files, and it has been quite common for the last 40+ years to use such archives to combine files into a single download, going back to the days of dial-up BBSs.

However, I am pretty sure that there is a reason why the developers of the ARP1 application (after all, it is THEIR project, not WCG/Krembil's) have chosen to supply multiple, partially compressed .7z files and partially (apparently) uncompressed .txt files, probably depending on how that source data is generated and processed. It would require a lot of additional work for WCG to change the upload (server-side)/download (client-side) process to handle combining the files before upload and extracting them after download, while fitting into the standard BOINC procedures.

Yes, it is a pain where the sun doesn't shine, but people in general shouldn't be so greedy and just try to grab whatever they can with no regard for the system overall. It seems that the "community" part in World Community Grid doesn't ring a bell in a lot of folks' minds...

Ralf
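Ralf's point about archiving without recompression corresponds to ZIP's "stored" mode: members are concatenated with headers, not compressed a second time. A small sketch with Python's zipfile; the member names and payloads are made up for illustration.

```python
# Illustration of an archive that bundles WITHOUT recompressing:
# ZIP_STORED members keep their original byte size. File names and
# contents here are placeholders, not real ARP1 task files.
import zipfile

members = ["task.input", "part_d01.7z", "part_d02.7z", "part_d03.7z"]

with zipfile.ZipFile("bundle.zip", "w", compression=zipfile.ZIP_STORED) as z:
    for name in members:
        z.writestr(name, b"placeholder payload for " + name.encode())

with zipfile.ZipFile("bundle.zip") as z:
    for info in z.infolist():
        # stored members: compressed size equals original size
        print(info.filename, info.file_size, info.compress_size)
```

Bundling this way costs the server essentially no CPU beyond the I/O of writing the members out.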
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 945 | Status: Recently Active
Regarding packing up multiple files in one archive file (compressed or otherwise)...
As well as the issue of packing the files server-side, there's the matter of unpacking and verifying them client-side. As far as I am aware, BOINC doesn't provide a mechanism for doing that, so in the case of ARP1 it would be up to the application wrapper to do it before starting the actual application. At present the wrapper can simply use the BOINC mechanisms for locating and accessing data files; extra recoding to support do-it-yourself file management would be needed :-)

And it still wouldn't solve the server connectivity issues. The connections would be held open for longer (which would possibly help to an extent), but the time to transmit the total data for a task wouldn't drop significantly unless the infrastructure could support it (and there's plenty of evidence that at present it can't!). So would the effort involved in the software changes be worth it?

Cheers - Al.
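The wrapper-side unpack-and-verify step Al describes might look roughly like this. Everything here is hypothetical: the function name, the ZIP format choice, and the hard-coded expectation of 11 members (from the file breakdown earlier in the thread).

```python
# Hypothetical sketch of what an application wrapper would have to do if
# tasks arrived as one archive: verify the members, then unpack them
# before starting the science application. BOINC provides no such step.
import zipfile

EXPECTED_MEMBERS = 11  # an ARP1 task reportedly ships 11 input files

def unpack_and_verify(bundle: str, dest: str) -> list[str]:
    with zipfile.ZipFile(bundle) as z:
        names = z.namelist()
        if len(names) != EXPECTED_MEMBERS:
            raise RuntimeError(
                f"expected {EXPECTED_MEMBERS} files, got {len(names)}")
        bad = z.testzip()          # CRC-checks every member
        if bad is not None:
            raise RuntimeError(f"corrupt member: {bad}")
        z.extractall(dest)
    return names
```

The CRC check is the interesting part: it is exactly the per-file integrity verification that the current 11-download scheme lacks a single point for.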
gj82854
Advanced Cruncher | Joined: Sep 26, 2022 | Post Count: 102 | Status: Offline
And after everything we are going through, do you want Krembil making changes to anything?
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 1948 | Status: Offline
> Regarding packing up multiple files in one archive file (compressed or otherwise)... As well as the issue of packing the files server-side, there's the matter of unpacking and verifying them client-side. ... so would the effort involved in the software changes be worth it?

That's what I was trying to say as well. Any efforts invested into this would be better applied to other things that are going less than swimmingly...

Ralf
Speedy51
Veteran Cruncher | New Zealand | Joined: Nov 4, 2005 | Post Count: 1288 | Status: Offline
> And after everything we are going through, do you want Krembil making changes to anything?

I say we do want them to make changes. If changes are not made then, for example, new projects cannot be added, and existing projects that currently have no work can't have work sent out when it becomes available.
Link64
Advanced Cruncher | Joined: Feb 19, 2021 | Post Count: 129 | Status: Offline
> I know this has been mentioned before, but is there any technical reason why the files cannot be combined into one file for download?

Maybe I'm confusing something, but IIRC it's possible to transfer more than one file over the same HTTP(S) connection, i.e. not closing it after every file and opening a new one for the next file, but simply continuing with the next file. And IIRC there is, or at least was, one BOINC project doing that, though I have no idea which one. It doesn't cost anything, just different settings on the server.

But like I said, IIRC - I might be completely wrong.
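Link64's recollection matches HTTP/1.1 keep-alive: a client can issue several requests over one persistent connection. A minimal standard-library sketch (the host and paths would be whatever the download server actually serves; nothing here is a real WCG URL):

```python
# Sketch: fetch several paths over ONE persistent HTTP/1.1 connection,
# instead of opening a fresh connection per file. Host/paths are
# placeholders supplied by the caller.
import http.client

def fetch_many(host: str, paths: list[str], port: int = 80) -> list[int]:
    conn = http.client.HTTPConnection(host, port)
    statuses = []
    for path in paths:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()     # drain the body so the connection can be reused
        statuses.append(resp.status)
    conn.close()
    return statuses
```

This avoids repeated TCP (and TLS, for HTTPS) handshakes, though the total bytes transferred stay the same, which is Al's point about the infrastructure bottleneck.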