World Community Grid Forums

Thread Status: Active | Total posts in this thread: 11
----------------------------------------
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hi
I had a quick look through this forum and didn't see this question posted before, so here goes!

Today a rather large file was transferred to me, which is a concern because of the limited/shared bandwidth where I stay. I have set a conservative daily limit on maximum transfer size, but it might be better to receive a smaller file than to wait longer (e.g. a few days) before I can process it. Would it be possible to transfer the file in zipped format and unzip it using the BOINC client? A size comparison as an example:

File: mcm1.dataset-17_72_SDG_v1.txt
Transfer size: 32 714 954 bytes
Zipped size: 10 018 609 bytes

This would obviously reduce server bandwidth requirements as well, and might benefit people using a mobile client (where operators charge per MB).

Thanks for taking the time to answer my question.

[Edited 1 time, last edit by Former Member at Mar 31, 2014 7:45:29 AM]
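For anyone who wants to reproduce the comparison above, a minimal sketch in Python; the file name is taken from this post, and gzip at its default level is an assumption about the compression a server would apply:

```python
import gzip
import os

# Dataset file from the post above; adjust the path to wherever your
# BOINC project directory keeps it.
PATH = "mcm1.dataset-17_72_SDG_v1.txt"

raw_size = os.path.getsize(PATH)

# Compress in memory with gzip's default level (an assumption; the
# server could use any deflate/gzip setting).
with open(PATH, "rb") as src:
    compressed = gzip.compress(src.read())

print(f"Transfer size: {raw_size:,} bytes")
print(f"Zipped size:   {len(compressed):,} bytes")
print(f"Saving:        {1 - len(compressed) / raw_size:.0%}")
```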
----------------------------------------
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
As far as can be seen, the large 30 MB txt file is a one-time transfer. It seems to serve as a reference database for all the following mcm1 tasks. Why the file is not transmitted in pre-compressed form is unknown, but given that files are already compressed and decompressed in real time during transfer, it is not really a bandwidth concern. The pre-compressed files arrive here at a few hundred KB/s, but the txt file at well over 2 MB/s, which points to real-time decompression on the agent side.

While writing this, it occurred to me that the lack of pre-compression may have something to do with the file's multi-use: all mcm1 tasks symlink to it. Nothing that is transmitted in zip form gets unzipped while sitting in the project folder; such files are stored as-is, and the agent may lack the functionality to unzip them, that being a responsibility of the science application. Just a stab at the unknown why, but either way the file appears to be transmitted only once, not with every new task.
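If you want to check the symlinking yourself, a quick sketch; the data directory is an assumption for a typical Linux install, and Windows clients use small XML link files rather than real symlinks, so this check is Linux-specific:

```python
import os

# Typical BOINC slots directory on Linux (an assumption; adjust as needed).
SLOTS = "/var/lib/boinc-client/slots"

# Report every slot entry that is a symlink back into the project
# directory - for MCM1 this is how tasks share one dataset file.
for slot in sorted(os.listdir(SLOTS)):
    slot_dir = os.path.join(SLOTS, slot)
    if not os.path.isdir(slot_dir):
        continue
    for name in os.listdir(slot_dir):
        full = os.path.join(slot_dir, name)
        if os.path.islink(full):
            print(f"slot {slot}: {name} -> {os.readlink(full)}")
```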
----------------------------------------
seippel
Former World Community Grid Tech | Joined: Apr 16, 2009 | Post Count: 392 | Status: Offline
bjorntfn,
BOINC automatically handles compression for transferred files, so the actual amount transferred for that file should be far less than 32 MB. Also, as lavaflow pointed out, for MCM1 the same dataset file is used for many different work units, and BOINC is smart enough to only transfer it once, which saves some bandwidth there as well.

Seippel
----------------------------------------
BobCat13
Senior Cruncher | Joined: Oct 29, 2005 | Post Count: 295 | Status: Offline
seippel wrote:
> BOINC automatically handles compression for transferred files so the
> actual amount transferred for that file should be far less than 32M.
> Also, as lavaflow pointed out, for MCM1 the same dataset file is used
> for many different work units, and *BOINC is smart enough to only
> transfer it once* to save some bandwidth there as well.

That bolded part is not correct (at least for me). If a machine is running multiple BOINC projects, or even multiple applications here at WCG, and it finishes the last MCM1 task it has, then reports it without receiving a new MCM1 task, the dataset txt file is deleted from the project directory and is downloaded again once a new MCM1 task is received. Here are some messages taken from the client log since the first of March:

02-Mar-2014 01:54:05 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
02-Mar-2014 08:46:59 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
06-Mar-2014 15:03:35 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
08-Mar-2014 02:37:01 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
08-Mar-2014 20:33:25 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
10-Mar-2014 07:27:03 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
10-Mar-2014 20:49:54 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
11-Mar-2014 07:45:35 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
12-Mar-2014 17:14:06 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
14-Mar-2014 05:17:48 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
15-Mar-2014 04:49:55 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
15-Mar-2014 09:35:21 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
21-Mar-2014 15:54:50 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
23-Mar-2014 03:46:18 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
23-Mar-2014 17:03:50 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
24-Mar-2014 06:11:49 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
26-Mar-2014 12:02:52 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
27-Mar-2014 22:25:33 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
29-Mar-2014 17:00:18 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
31-Mar-2014 07:42:28 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt

The client just reported an MCM1 task about 3 hours ago, doesn't have one currently in queue, and the dataset txt file was deleted.

Edit: Here are messages from a Linux machine that runs multiple projects, so it may have times when there are no WCG tasks on it. After it reported the one WCG MCM1 task below, it was empty of WCG tasks, later requested more work, and had to download the dataset txt file again.

31-Mar-2014 05:03:56 [WCG] Sending scheduler request: To report completed tasks.
31-Mar-2014 05:03:56 [WCG] Reporting 1 completed tasks
31-Mar-2014 05:03:56 [WCG] Not requesting tasks: don't need
31-Mar-2014 05:03:58 [WCG] Scheduler request completed
31-Mar-2014 07:38:30 [WCG] Sending scheduler request: To fetch work.
31-Mar-2014 07:38:30 [WCG] Requesting new tasks for CPU
31-Mar-2014 07:38:33 [WCG] Scheduler request completed: got 3 new tasks
31-Mar-2014 07:39:05 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
31-Mar-2014 07:39:16 [WCG] Finished download of mcm1.dataset-17_72_SDG_v1.txt

[Edited 1 time, last edit by BobCat13 at Apr 1, 2014 3:25:36 AM]
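Counting re-downloads like the ones above is easy to automate; a sketch, assuming you have saved the event log to a text file (the log file name is hypothetical):

```python
import re
from collections import Counter

# Saved copy of the BOINC event log (hypothetical file name).
LOG = "wcg_client.log"

# Matches lines like:
# 02-Mar-2014 01:54:05 [WCG] Started download of mcm1.dataset-17_72_SDG_v1.txt
pattern = re.compile(r"Started download of (mcm1\.dataset\S+)")

counts = Counter()
with open(LOG) as fh:
    for line in fh:
        m = pattern.search(line)
        if m:
            counts[m.group(1)] += 1

for name, n in counts.most_common():
    print(f"{name}: downloaded {n} times")
```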
----------------------------------------
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
That deletion is the brainy part of the version 7 agents, after volunteers complained of crud collecting over time. Can you imagine clients that get on and off CEP2 with those 200 MB sets? Maybe this cleaning process would best run only at a restart?

My client has been fighting to get new copies of both the 7.16 and 7.20 fahv since the new logo launch, failing due to a signature / md5 mismatch, so it's simapping.
----------------------------------------
BobCat13
Senior Cruncher | Joined: Oct 29, 2005 | Post Count: 295 | Status: Offline
CEP2 never deleted the 200 MB set when there was no CEP2 task present on the machine, nor did HPF2. Rosetta also doesn't delete its large minirosetta database if you have no Rosetta tasks present. This behavior is specific to the MCM1 dataset txt file.

Just a guess here, but since MCM1 will use different datasets over its run, the files are not referenced under the <app_version> section in client_state.xml, so they get deleted if no MCM1 tasks are present in the queue. If I remember correctly, there was an option to flag files <static/> in client_state.xml, which would keep them on the system even when no tasks that used them were present. That option may have been removed in the version 7 clients, but I'm not sure. The <static/> option would also require the WCG techs to remove the flag once a dataset is no longer needed, so if the techs forget, you could end up with a bunch of unneeded files taking up disk space.

[Edited 1 time, last edit by BobCat13 at Apr 1, 2014 2:35:49 PM]
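Whether such a flag is present for a given file can be checked directly in client_state.xml; a sketch that looks for both spellings, since the exact tag name the client uses is uncertain (the path assumes a Linux install):

```python
import xml.etree.ElementTree as ET

# Client state file (assumed Linux default path; adjust for your setup).
STATE = "/var/lib/boinc-client/client_state.xml"

root = ET.parse(STATE).getroot()

# Older and newer clients name the file entries differently, so try both;
# likewise check both <static/> and <sticky/>, as the exact tag name is
# uncertain.
for tag in ("file_info", "file"):
    for entry in root.iter(tag):
        name = entry.findtext("name", default="?")
        if entry.find("static") is not None or entry.find("sticky") is not None:
            print(f"{name}: flagged to be kept even with no tasks in queue")
```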
----------------------------------------
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Let's extend the bolded line to: "BOINC is smart enough to only transfer it once to save some bandwidth there as well, as long as it's deemed needed for the current batches processed on the node." The file not being replicated time and again into the slots, but symlinked for each task, makes the whole processing / storage arrangement substantially more efficient than what's seen with CEP2 in particular.
----------------------------------------
seippel
Former World Community Grid Tech | Joined: Apr 16, 2009 | Post Count: 392 | Status: Offline
My previous comment was a bit of an oversimplification. Work unit files are categorized into three different types:
1) Files which are specific to a single work unit. These are sent when the work unit is downloaded and deleted after it's done.

2) Files which span a large number of work units, but not the life of the project. These are downloaded once, but deleted from the client machine once all work units that need them are finished. The MCM1 dataset files fall into this category.

3) Files which are needed by all (or nearly all) work units for a project. These are kept even if no work units that need them are currently downloaded. The qcaux.zip files for CEP2 fall into this last category.

Seippel
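The difference between categories 2 and 3 is visible in the project directory itself: with no MCM1 tasks queued, a category-2 file like the mcm1 dataset should be gone, while a category-3 file like qcaux.zip stays put. A small sketch to list what's there (the path is an assumption for a Linux install):

```python
import os

# WCG project directory under the BOINC data dir (assumed Linux layout).
PROJECT = "/var/lib/boinc-client/projects/www.worldcommunitygrid.org"

# List the remaining project files with their sizes; check this while
# no MCM1 tasks are in the queue to see which files persist.
for name in sorted(os.listdir(PROJECT)):
    full = os.path.join(PROJECT, name)
    if os.path.isfile(full):
        print(f"{os.path.getsize(full):>12,}  {name}")
```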
----------------------------------------
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Thanks all for the responses.
I am still not convinced...

The issue I'm having is not whether or not the file is reused by multiple work units, although this does sound like a good idea - it saves bandwidth for people like me. Thanks for the good explanation of this mechanic. In theory BOINC could hang onto the file until the storage is needed for something else instead of deleting it, if there is potential for the file to be used soon. You might find that some users have set conservative values for the storage allowed, and this causes the file to be deleted earlier.

Anyway... back to my question. I'm concerned about the actual size transferred during the initial download - this will always happen at least once. As lavaflow explains, BOINC does allow for files to be zipped/unzipped on the fly, but my setup seems to indicate that this is not happening. It seems a bit strange that my agent reports the transfer size as the full 31 MB, and presumably counts this towards my daily transfer limit, rather than the compressed size.

Another observation - according to the BOINC documentation I could find, if the file is compressed/decompressed by BOINC, then resuming the download is not supported. However, in my situation, when the download stalled after reaching my daily bandwidth limit, the following day the download resumed (it did not restart). This leads me to think that the file is not being compressed and decompressed in the background. Is there any way to confirm this beyond reasonable doubt? My client's only indication of the file size is that the total for the uncompressed file is reported, and the download pauses at the correct number of MB when it reaches my daily limit.

I suppose that being able to resume the download is preferable to zipping it, as this ensures the file always reaches the client (whereas zipping plus a daily limit might prevent the file from ever arriving). Perhaps if the compression were explicitly performed by the science application, the download could be treated as a normal download by BOINC and resumed as usual. Could we request this?

Thanks!

[Edited 2 times, last edit by Former Member at Apr 2, 2014 7:46:00 AM]
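One way to settle it beyond the client's own reporting is to request the same file with and without compression and compare the byte counts on the wire; a sketch, where the URL is hypothetical (substitute the download URL shown in your BOINC transfers tab) and the server is assumed to honour Accept-Encoding:

```python
import urllib.request

# Hypothetical download URL; substitute the real one from the BOINC
# transfers tab. Note this fetches the whole file twice.
URL = "http://download.example.org/mcm1.dataset-17_72_SDG_v1.txt"

def wire_bytes(compressed: bool) -> int:
    req = urllib.request.Request(URL)
    if compressed:
        req.add_header("Accept-Encoding", "gzip, deflate")
    # urllib does not transparently decompress, so len(body) here is
    # the number of bytes actually sent over the wire.
    with urllib.request.urlopen(req) as resp:
        return len(resp.read())

print("without Accept-Encoding:", wire_bytes(False), "bytes")
print("with Accept-Encoding:   ", wire_bytes(True), "bytes")
```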
----------------------------------------
seippel
Former World Community Grid Tech | Joined: Apr 16, 2009 | Post Count: 392 | Status: Offline
bjorntfn,
BOINC uses libcurl to handle the compression/transfer for it, so the file size BOINC shows as transferred is the full file size, not the compressed size. Since BOINC doesn't display the compressed file size anywhere, testing this would require network monitoring outside of BOINC while the transfer is running. For what it's worth, I confirmed it last night using iftop while transferring data for an MCM1 work unit, and the actual amount of data transferred was just over 10 MB (the dataset file was 32 MB). One thing to keep in mind if you test that way is that resumed transfers are not compressed, so if you interrupt the transfer you may see something different.

Another option would be to set the http_debug and http_xfer_debug log flags in a cc_config.xml file (information about setting up a cc_config.xml file can be found in Sekerob's "The Start Here Forum Frequently Asked Questions Index" post). This will show that BOINC requests the transfer to be compressed:

[http_debug][ID#47] Sent header to server: Accept-Encoding: deflate, gzip

Again, ignore the "HTTP: wrote X bytes" messages, because those report what BOINC gets back from libcurl, which is uncompressed.

Seippel
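For reference, the two flags go into the <log_flags> section of a cc_config.xml in the BOINC data directory, along the lines of the sketch below (then restart the client, or tell it to re-read the config files, for the flags to take effect):

```xml
<cc_config>
  <log_flags>
    <http_debug>1</http_debug>
    <http_xfer_debug>1</http_xfer_debug>
  </log_flags>
</cc_config>
```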