World Community Grid Forums
Thread Status: Active | Total posts in this thread: 159
adriverhoef
Master Cruncher | The Netherlands | Joined: Apr 3, 2009 | Post Count: 2152 | Status: Offline
> I know this has been mentioned before, but is there any technical reason why the files cannot be combined into one file for download? My guess is the challenges are: 1) The server would need to combine/zip/rar all your tasks into a bundle... requiring more/a lot of CPU time for thousands of users.

Let's suppose this could be made possible in a simple way, and have a look at the files in one ARP1 task. There are typically 11 files in a task that is sent to a client:

- One filename ending in ".input", 1,785 bytes.
- Three filenames ending in ".", each 659,852 bytes.
- Three filenames ending in ".7z", each between roughly 14 and 21 MB.
- Three filenames ending in "d01", "d02" and "d03", each 12,485,444 bytes (± 12 MB).
- One more filename ending in ".", 49,385,088 bytes (± 49 MB).

The filenames ending in "." aren't compressed; they contain raw NetCDF Data Format data. They are either small (659,852 bytes) or large (49 MB). My guess is that it is too 'expensive' on resources to compress the larger one and 'a waste of time' to compress the small ones, including the ".input" file, of course. The files ending in ".7z" are already compressed (LZMA); uncompressed, two of them are 87,327,860 bytes and the third is 92,575,412 bytes. All three contain raw NetCDF Data Format data too, as do the files ending in "d01", "d02" and "d03". (BTW, compressing the larger one of 49 MB could be an option; I don't understand why this wasn't done in the IBM days. It would take up ± 12 MB.)
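As a rough sanity check of that last point, a few lines of Python show how LZMA reports a compression ratio. The buffer here is synthetic and only a stand-in; real ARP1 NetCDF output would compress differently.

```python
# Hypothetical illustration: LZMA-compress a buffer and report the ratio.
# The repetitive text below is NOT real NetCDF data, just a stand-in.
import lzma

data = (b"gridpoint temperature 284.15 pressure 1013.2 ") * 200_000  # ~9 MB
packed = lzma.compress(data, preset=6)

ratio = len(packed) / len(data)
print(f"{len(data)} -> {len(packed)} bytes ({ratio:.1%} of original)")
```

Whether the real 49 MB file really shrinks to ± 12 MB depends entirely on how redundant the model output is.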
So, before compressing, the server has, in bytes: 1,785 (the ".input" file), 3 × 659,852 and 3 × 12,485,444 (the fixed sizes), plus the three ".7z" files and the 49 MB file. Since the server already compresses the large files, aside from the 49 MB file, there's no need for the server to compress the 11 files any further before creating the task that will be sent to a client. The server would only need to combine the already compressed files and the smaller files - and optionally compress the 49 MB file - which would then add up to one download of ± 102 MB. (The fixed sizes are 1,785 + 3 × 659,852 + 3 × 12,485,444; the other sizes are 3 × approx. 14-21 MB (averaging a total of 50 MB) + approx. 12 MB.) That's it: one large bundle with 11 files in it, some of them already compressed, others too small to make the effort worthwhile, saving milliseconds of server time.

Adri
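Adri's arithmetic can be sketched in a few lines. The ~17 MB average per ".7z" file and the ~12 MB estimate for the compressed 49 MB file are assumptions taken from the ranges quoted above, not measured values.

```python
# Back-of-the-envelope total for one bundled ARP1 download, using the
# sizes quoted in the post above. The 17 MB average per .7z file and the
# 12 MB figure for the LZMA-compressed 49 MB file are assumptions.
MB = 1024 * 1024

total = (
    1_785               # the ".input" file
    + 3 * 659_852       # three small uncompressed NetCDF files
    + 3 * 12_485_444    # the "d01"/"d02"/"d03" files
    + 3 * 17 * MB       # three .7z files, ~14-21 MB each
    + 12 * MB           # the 49 MB file, if compressed to ~12 MB
)
print(f"approximate bundle size: {total / MB:.0f} MB")
```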
ericinboston
Senior Cruncher | Joined: Jan 12, 2010 | Post Count: 258 | Status: Offline
> Since the server already compresses the large files, aside from the 49 MB file, there's no need for the server to compress the 11 files even further before creating the task that will be sent to a client. ... That's it, one large bundle with 11 files in it, some of them already compressed, others too small to make the effort, saving milliseconds of server time.

Thanks for the breakdown, and yes, that's what I was referring to: the server(s) would need to spend CPU cycles compressing and creating the bundle. For all we know, compressing the remaining files may yield only a 2% reduction yet take 8 seconds to perform. Is it worth the CPU cycles and time to do this? Only WCG/the project can answer that. My guess is they thought about this process and decided it was not worth the time to create compressed bundles.

It may, of course, be time to consider creating non-compressed bundles, so the end user gets 1 file instead of 11, if there is a risk (which I think is the problem in this thread) of some of those 11 files not transferring correctly and causing headaches for everyone.
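The time-versus-saving question is straightforward to measure in principle. A hedged sketch on synthetic, highly repetitive data (real task files would behave quite differently):

```python
# Hypothetical measurement of the trade-off mentioned above: CPU seconds
# spent compressing versus bytes saved. Synthetic data only; real ARP1
# files would give different numbers.
import lzma
import time

data = bytes(range(256)) * 16_384   # 4 MiB repeating pattern (stand-in)

t0 = time.perf_counter()
packed = lzma.compress(data, preset=1)   # fast preset
elapsed = time.perf_counter() - t0

saved = len(data) - len(packed)
print(f"{elapsed:.2f}s spent, {saved / len(data):.0%} saved")
```

Running something like this against actual task files is the only way to answer the "2% for 8 seconds" question.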
Boca Raton Community HS
Advanced Cruncher | Joined: Aug 27, 2021 | Post Count: 125 | Status: Offline
Thank you to both adriverhoef and ericinboston for the insights and analysis. This answers some questions and still leaves some on the table. I feel there is a better way for these work units to be sent, but I am sure the developers have their reasons. Still, things could change/improve!
harmonytom
Cruncher | Joined: Apr 20, 2011 | Post Count: 2 | Status: Offline
Nine days later -- no progress.
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 1948 | Status: Offline
> Thank you to both adriverhoef and ericinboston for the insights and analysis. ... But, things could still change/improve!

Well, one point that Adri has been missing is that putting several files into a single .ZIP/.7z file doesn't mean that they need to be compressed/decompressed again. After all, these kinds of files are referred to as "archive" files, and it has been quite common for the last 40+ years to use such archives to combine files into a single download, going back to the days of dial-up BBSs.

However, I am pretty sure that there is a reason why the developers of the ARP1 application (after all, it is THEIR project, not WCG/Krembil's) have chosen to supply multiple, partially compressed .7z files and partially (apparently) uncompressed .txt files, probably depending on how that source data is generated and processed. It would require a lot of additional work for WCG to change the upload (server-side)/download (client-side) process to handle combining the files before upload and extracting them after download, while fitting into the standard BOINC procedures.

Yes, it is a pain where the sun doesn't shine, but people in general shouldn't be so greedy and just try to grab whatever they can with no regard for the system overall. It seems that the "community" part in World Community Grid doesn't ring a bell in a lot of folks' minds...

Ralf
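Ralf's point about archiving without recompression corresponds to ZIP's "stored" mode: members are concatenated with headers, not compressed a second time. A small sketch with Python's zipfile; the member names and payloads are made up for illustration.

```python
# Illustration of an archive that bundles WITHOUT recompressing:
# ZIP_STORED members keep their original byte size. File names and
# contents here are placeholders, not real ARP1 task files.
import zipfile

members = ["task.input", "part_d01.7z", "part_d02.7z", "part_d03.7z"]

with zipfile.ZipFile("bundle.zip", "w", compression=zipfile.ZIP_STORED) as z:
    for name in members:
        z.writestr(name, b"placeholder payload for " + name.encode())

with zipfile.ZipFile("bundle.zip") as z:
    for info in z.infolist():
        # stored members: compressed size equals original size
        print(info.filename, info.file_size, info.compress_size)
```

Bundling this way costs the server essentially no CPU beyond the I/O of writing the members out.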
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 945 | Status: Recently Active
Regarding packing up multiple files in one archive file (compressed or otherwise)...
As well as the issue of packing the files server-side, there's the matter of unpacking and verifying them client-side. As far as I am aware, BOINC doesn't provide a mechanism for doing that, so in the case of ARP1 it would be up to the application wrapper to do it before starting the actual application. At present the wrapper can simply use the BOINC mechanisms for locating and accessing data files; extra recoding to support do-it-yourself file management would be needed :-)

And it still wouldn't solve the server connectivity issues. The connections would be held open for longer (which would possibly help to an extent), but the time to transmit the total data for a task wouldn't drop significantly unless the infrastructure could support it (and there's plenty of evidence that at present it can't!). So would the effort involved in the software changes be worth it?

Cheers - Al.
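The wrapper-side unpack-and-verify step Al describes might look roughly like this. Everything here is hypothetical: the function name, the ZIP format choice, and the hard-coded expectation of 11 members (from the file breakdown earlier in the thread).

```python
# Hypothetical sketch of what an application wrapper would have to do if
# tasks arrived as one archive: verify the members, then unpack them
# before starting the science application. BOINC provides no such step.
import zipfile

EXPECTED_MEMBERS = 11  # an ARP1 task reportedly ships 11 input files

def unpack_and_verify(bundle: str, dest: str) -> list[str]:
    with zipfile.ZipFile(bundle) as z:
        names = z.namelist()
        if len(names) != EXPECTED_MEMBERS:
            raise RuntimeError(
                f"expected {EXPECTED_MEMBERS} files, got {len(names)}")
        bad = z.testzip()          # CRC-checks every member
        if bad is not None:
            raise RuntimeError(f"corrupt member: {bad}")
        z.extractall(dest)
    return names
```

The CRC check is the interesting part: it is exactly the per-file integrity verification that the current 11-download scheme lacks a single point for.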
gj82854
Advanced Cruncher | Joined: Sep 26, 2022 | Post Count: 102 | Status: Offline
And after everything we are going through, do you want Krembil making changes to anything?
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 1948 | Status: Offline
> Regarding packing up multiple files in one archive file (compressed or otherwise)... As well as the issue of packing the files server-side, there's the matter of unpacking and verifying them client-side. ... so would the effort involved in the software changes be worth it?

That's what I was trying to say as well. Any efforts invested into this would be better applied to other things that are going less than swimmingly...

Ralf
Speedy51
Veteran Cruncher | New Zealand | Joined: Nov 4, 2005 | Post Count: 1288 | Status: Offline
> And after everything we are going through, do you want Krembil making changes to anything?

I say we do want them to make changes. If changes are not made then, for example, new projects cannot be added, and existing projects that currently have no work can't have work sent out when it becomes available.
Link64
Advanced Cruncher | Joined: Feb 19, 2021 | Post Count: 129 | Status: Offline
> I know this has been mentioned before, but is there any technical reason why the files cannot be combined into one file for download?

Maybe I'm confusing something, but IIRC it's possible to transfer more than one file over the same HTTP(S) connection, i.e. not closing it after every file and opening a new one for the next file, but simply continuing with the next file. And IIRC there is, or at least was, one BOINC project doing that, though I have no idea which one. It doesn't cost anything, just different settings on the server.

But like I said, IIRC - I might be completely wrong.
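Link64's recollection matches HTTP/1.1 keep-alive: a client can issue several requests over one persistent connection. A minimal standard-library sketch (the host and paths would be whatever the download server actually serves; nothing here is a real WCG URL):

```python
# Sketch: fetch several paths over ONE persistent HTTP/1.1 connection,
# instead of opening a fresh connection per file. Host/paths are
# placeholders supplied by the caller.
import http.client

def fetch_many(host: str, paths: list[str], port: int = 80) -> list[int]:
    conn = http.client.HTTPConnection(host, port)
    statuses = []
    for path in paths:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()     # drain the body so the connection can be reused
        statuses.append(resp.status)
    conn.close()
    return statuses
```

This avoids repeated TCP (and TLS, for HTTPS) handshakes, though the total bytes transferred stay the same, which is Al's point about the infrastructure bottleneck.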