Thread Status: Active
Total posts in this thread: 54
This topic has been viewed 60256 times and has 53 replies
petehardy
Senior Cruncher
USA
Joined: May 4, 2007
Post Count: 318
Status: Offline
Re: [Help needed] Can't receive CEP Phase 2 tasks due to network bandwidth issue

Foundation,

Can you increase your bandwidth?
----------------------------------------

"Patience is a virtue", I can't wait to learn it!
[Jul 15, 2010 9:14:05 PM]
Randzo
Senior Cruncher
Slovakia
Joined: Jan 10, 2008
Post Count: 339
Status: Offline
Re: [Help needed] Can't receive CEP Phase 2 tasks due to network bandwidth issue

Hello, if you cannot increase your bandwidth, why don't you just run a mix of projects? You would increase your efficiency and also overcome the upload problem.

Launch article about CEP2:
"The results for a single work unit can be around 50MB, which is about 500 times larger than a typical FightAIDS@Home result. A 756kbps network connection would take approximately 12-15 minutes to download the work unit. At this time, there are no alternative solutions to overcome this issue."

I would like to stress the last sentence.

So we have to trust scientists that they need full results and cannot shape uploads.
[Jul 15, 2010 10:22:27 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline
Re: [Help needed] Can't receive CEP Phase 2 tasks due to network bandwidth issue

"A 756kbps network connection would take approximately 12-15 minutes to download the work unit. At this time, there are no alternative solutions to overcome this issue."

Reminder: there was a mistake in this sentence. It is the upload which takes time. The download is peanuts.
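
As a rough check on the quoted figures, the sketch below estimates an idealized transfer time from file size and link rate. It assumes decimal units (1 MB = 1,000,000 bytes, 1 kbps = 1,000 bit/s), no protocol overhead, and an otherwise idle link. The function name `transfer_minutes` is made up for illustration.

```python
def transfer_minutes(size_mb: float, rate_kbps: float) -> float:
    """Idealized transfer time in minutes for size_mb megabytes at rate_kbps.

    Assumptions: 1 MB = 1_000_000 bytes, 1 kbps = 1_000 bit/s,
    no protocol overhead and no competing traffic.
    """
    if rate_kbps <= 0:
        raise ValueError("rate_kbps must be positive")
    bits = size_mb * 1_000_000 * 8
    seconds = bits / (rate_kbps * 1_000)
    return seconds / 60


if __name__ == "__main__":
    # Result sizes mentioned in this thread, at the quoted 756 kbps rate.
    for size in (14, 50, 68, 80):
        print(f"{size} MB at 756 kbps: {transfer_minutes(size, 756):.1f} min")
```

On many consumer lines the upload rate is lower than the download rate, so a real upload is likely to take longer than this idealized figure.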
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Jul 16, 2010 12:33:02 AM]
Foundation
Cruncher
Joined: Feb 22, 2009
Post Count: 4
Status: Offline
Re: [Help needed] Can't receive CEP Phase 2 tasks due to network bandwidth issue

@petehardy - "Can you increase your bandwidth?"
Yes. All it will take is money. And I guess as much as I have already devoted to this obsession, there's no reason not to consider paying more for a bigger pipe.

"So we have to trust scientists that they need full results and cannot shape uploads."
Being a scientist myself, as well as a fanatical HPC advocate, I disagree with this. Scientists don't trust, we test.

I have some familiarity with ab initio QM calculations, and have fought continuously with students to stop moving and storing the full output on our campus network. It is possible that this project does something fundamentally different in their calculations, but we achieve a 1000-fold decrease in the data that must be saved in exchange for an occasional, fast single-point calculation that recovers the full output.

I would like someone from the scientific team to tell me that my experience doesn't apply, and even take a moment to explain why. But perhaps they don't frequent this forum and I should take the initiative to contact them directly. Or perhaps I just need to move on and devote my resources to projects that aren't such a pain.
[Jul 16, 2010 1:58:59 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [Help needed] Can't receive CEP Phase 2 tasks due to network bandwidth issue

Also, I have some experience with ab initio QM calculations, and would ask the researchers if they have looked at all the ways possible to reduce the results files. I know we routinely generate 100MB+ output files, but only need to save a few hundred kB of that. Do you really need all 80MB of the result? Return the nuclear configuration and do a single point calculation to regenerate the rest only on the promising compounds!


Dear Cliff,

first of all: thanks for supporting our project! Your participation in the CEP2 is greatly appreciated.

Indeed, the size of the results is one of our biggest concerns as well (although the 80MB are somewhat of a worst case scenario, the average is so far 14MB and the maximum 68MB).

We put a lot of effort into designing the work units and tailoring them to the limitations of the WCG. The binary molecular orbital file is responsible for 90-99% of the data volume. Unfortunately, it is central to our analysis and subsequent calculations, so we cannot drop it. Regeneration is also not an option, since it would require nearly as much time as the original calculation on the grid. We still use a number of tricks to reduce the result size, e.g., we convert all binary numbers from double to single precision and recover the full accuracy in house. This reduces the overall size by a factor of 2 and the scaling by 4. Otherwise, our outputs are already stripped as bare as possible without compromising their scientific use.

Thanks for your suggestions anyways and best wishes

Johannes

(Harvard CEP2 Team)
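
The double-to-single precision trick mentioned above can be illustrated with a minimal sketch. It only shows the factor-of-2 size reduction and the rounding error it introduces. How the team recovers full accuracy in house is not described in this thread, so the sketch does not attempt it. The helper names and the sample coefficients are made up.

```python
from array import array


def to_double_precision(values):
    """Pack values as 64-bit floats and return the raw bytes."""
    return array("d", values).tobytes()


def to_single_precision(values):
    """Pack values as 32-bit floats and return the raw bytes."""
    return array("f", values).tobytes()


def max_roundoff(values):
    """Largest absolute error introduced by a double -> single conversion."""
    singles = array("f", values)
    return max(abs(a - b) for a, b in zip(values, singles))


if __name__ == "__main__":
    # Made-up coefficients standing in for molecular orbital data.
    coeffs = [0.1 * i - 3.7 for i in range(1000)]
    print("double bytes:", len(to_double_precision(coeffs)))
    print("single bytes:", len(to_single_precision(coeffs)))
    print("max round-off:", max_roundoff(coeffs))
```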
[Jul 21, 2010 7:14:09 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [Help needed] Can't receive CEP Phase 2 tasks due to network bandwidth issue

One more note to underline that we really need this data: The user bandwidth is only one aspect of the result size - for us it is also a hardware challenge. Believe me, we wouldn't invest in our own high-performance server and storage device (50TB for now) if we didn't need it ;-). They are quite expensive and a challenge to support (it wouldn't be possible without our friends from Harvard Research Computing!).

Best

Johannes

(Harvard CEP2 Team)
[Jul 21, 2010 7:29:11 AM]
Foundation
Cruncher
Joined: Feb 22, 2009
Post Count: 4
Status: Offline
Re: [Help needed] Can't receive CEP Phase 2 tasks due to network bandwidth issue

Johannes,
Thanks for the reply. I've read your response and the web pages about the science and admit that I still don't have a complete picture of what exactly you are calculating. But this is the internet, so I can't let a lack of knowledge stand in the way of offering an opinion! ;-)

The critical ratio for distributed computing is the amount of calculation per unit of network transfer. So the first thing I would look at is whether more work could be done in a single calculation. Make the calculations run 10 times as long for the amount of network traffic, and I could put ten times the number of machines on this project before it chokes my DSL. With intermittently available computational resources, there would be some inefficiency due to interrupted/lost work units, but perhaps you have or could derive an empirical model for this effect, and could find an optimal value. (Or perhaps you already have?)
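
A minimal sketch of this ratio, assuming each running task uploads one result when it finishes and that uploads spread evenly over time. The function name `concurrent_tasks` and the 512 kbps uplink are hypothetical; the 14 MB result size and 6 hour run time are the averages mentioned elsewhere in this thread.

```python
def concurrent_tasks(uplink_kbps: float, result_mb: float, task_hours: float) -> float:
    """Number of simultaneously running tasks a shared uplink can sustain.

    Idealized: decimal units, no protocol overhead, uploads spread evenly.
    """
    if min(uplink_kbps, result_mb, task_hours) <= 0:
        raise ValueError("all arguments must be positive")
    # Uplink capacity in MB per hour.
    uplink_mb_per_hour = uplink_kbps * 1_000 / 8 / 1_000_000 * 3_600
    # Upload demand of one task slot in MB per hour.
    demand_mb_per_hour = result_mb / task_hours
    return uplink_mb_per_hour / demand_mb_per_hour


if __name__ == "__main__":
    uplink = 512  # kbps, assumed DSL uplink for illustration only
    for hours in (6, 60):
        n = concurrent_tasks(uplink, result_mb=14, task_hours=hours)
        print(f"{hours} h tasks: about {int(n)} concurrent tasks")
```

With the result size held fixed, the sustainable task count grows linearly with run time, which is the tenfold gain described above.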

The next question is whether you could use parametric continuity in some fashion to reduce the transferred data volume. As an example, we often use ab initio QM to calculate reaction systems over a trajectory from one (meta-)stable configuration through a transition state to another. To determine the trajectory accurately, hundreds of points along the path are calculated. We evaluate a few physical quantities at each of these points, then store only the nuclear configuration and only at a handful of points over the trajectory. If we subsequently need to calculate additional physical quantities, we do a single-point calculation at the trajectory configuration of interest (usually only at the stable configurations or the transition state). This example of a reaction coordinate as a continuous parameter is where we achieve the factor of 1000 compression I mentioned before. It probably doesn't exactly apply to your calculations, but you might think of something parametric in your problem space and use a similar trick.

Finally, there is the question of whether this project really is (or should be) a search for candidate materials. There are always reasons to store the results from all work performed, the most obvious being to avoid redundant calculations. Also, scientific insight can be gained from looking at the results from both promising and non-promising materials. And it is possible that you are using the detailed results from all materials to guide subsequent material choices, in a directed and dynamic search process within a large dimensional material composition/configuration space. However, I'm guessing that is not the case and rather the researchers have simply defined a large number of materials they wish to evaluate. Therefore, if the goal is to find materials with the right physical properties, it seems a scalar functional "material value" could be formed. Once the "value" of a material has been defined, there are two obvious approaches to be considered. The first is that the detailed results (14-80MB) would be returned only for those materials with sufficiently high "value". The second is that only the material "value" and very limited results be returned from each work unit, with the intent that high "value" materials will be fully investigated in subsequent work units either on your own local machines or at institutions with large bandwidth.

An obvious counter to this "material value" approach would be if the detailed results from a current work unit are not sufficient to evaluate the material's behavior. But then I would point to my first suggestion to do more work in a single work unit.
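
The first of the two approaches could be sketched roughly as follows. Everything here is hypothetical: the `WorkUnitResult` type, the `build_upload` function, and the threshold are illustrations, and how a "material value" would actually be computed is left open. Johannes noted earlier that the orbital data is central to the team's analysis, so this may not fit CEP2 as it stands.

```python
from dataclasses import dataclass


@dataclass
class WorkUnitResult:
    material_id: str
    value: float            # hypothetical scalar "material value"
    detailed_output: bytes  # stands in for the large detailed results


def build_upload(result: WorkUnitResult, threshold: float) -> dict:
    """Always return the score; attach detailed output only if promising."""
    payload = {"material_id": result.material_id, "value": result.value}
    if result.value >= threshold:
        payload["detailed_output"] = result.detailed_output
    return payload


if __name__ == "__main__":
    low = WorkUnitResult("mat-001", 0.2, b"\x00" * 1_000)
    high = WorkUnitResult("mat-002", 0.9, b"\x00" * 1_000)
    for r in (low, high):
        upload = build_upload(r, threshold=0.5)
        print(r.material_id, sorted(upload))
```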

If you could direct me to an available reference which explains the details of the calculations you are doing, I would be interested and might come back with more specific (and useful) suggestions. At the very least, it would keep me busy for a while. :-)

I hope you see this discussion as constructive and not simply as having to deal with a crank. I currently have 16 of my 75 active cores devoted to this project. I have another 200 cores which could be brought online, but I face power, cooling and now bandwidth limitations. Rest assured, my interest is driven by wanting to make as much of this as possible available to you.
Regards, Cliff
[Jul 23, 2010 8:04:50 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: [Help needed] Can't receive CEP Phase 2 tasks due to network bandwidth issue

I fear that making tasks 10x longer is not a viable proposition. For you the issue is bandwidth; for others it is duration and the incremental risk of something going wrong along the way, which is no drama for a farm owner but a big deal for a volunteer running one, two, or a few devices. Further, WCG also likes to accommodate part-time participants (devices powered on for only a few hours a day) that do meet the technical requirements, i.e. the bandwidth and processor speed minimums.

One day in the distant future we'll have work units scaled to device power and member preference (longer or shorter), partially automated, meaning WCG will sense a device's throughput capability and send heavier or lighter units accordingly.

cheers
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Jul 23, 2010 4:05:16 PM]
[Jul 23, 2010 4:00:26 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: [Help needed] Can't receive CEP Phase 2 tasks due to network bandwidth issue

The good news is that the average task run times keep rising. The average is now 6 hours, and my quad is starting to take 8+ CPU hours for the current work. I've taken it offline to see, once the tasks finish, whether there is a proportional reduction from the 15MB sizes I've seen for the 5-hour tasks.
[Jul 25, 2010 2:54:52 PM]
dango
Senior Cruncher
Joined: Jul 27, 2009
Post Count: 307
Status: Offline
Re: [Help needed] Can't receive CEP Phase 2 tasks due to network bandwidth issue

It's my favourite project, but I can't participate in it because of my low bandwidth (256kbit/s = 32kbyte/s).
[Jul 29, 2010 10:22:14 AM]