World Community Grid Forums
Thread Status: Active | Total posts in this thread: 7
giddie
Cruncher | UK | Joined: Nov 21, 2006 | Post Count: 29 | Status: Offline
BOINC 6.12.34
Arch Linux x86-64

I'm trying to set up BOINC clients on diskless nodes that are part of a cluster. That means all disk I/O has to go over the network to an NFS server. The problem I'm seeing is that the clients are not respecting the disk_interval setting (I've tried increasing it to 600, with no effect). Instead, in each slot directory, I see a couple of files being written at least once a *second*: boinc_mmap_file is written this way in every slot, and the other file is different for each slot:

    boinc_dsfl_0
    boinc_gfam_1
    boinc_dsfl_2
    boinc_c3cw_3

Any idea why BOINC is going crazy with these files? I'm not seeing this on otherwise identical systems (that have disks).
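In case it matters, this is roughly how I'm setting disk_interval on the nodes; a minimal sketch, assuming the distro package keeps its data directory in /var/lib/boinc (that path is my assumption). The override file would look like this:

    <global_preferences>
        <disk_interval>600</disk_interval>
    </global_preferences>

and then I ask the running client to re-read it with:

    boinccmd --read_global_prefs_override

My (possibly wrong) understanding is that disk_interval only limits how often apps are asked to checkpoint, so it may not govern boinc_mmap_file at all.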
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Whatever you see, it can't be right. The only science not following the "Write To Disk" limit is CEP2, and that one has only 16 checkpoints maximum over a run time of up to 12 hours. WtD is not dependent on "Run based on preferences"... it's one of those settings that always works.
----------------------------------------
How many nodes are writing back to your NFS server, and how many concurrent threads are running BOINC? If thousands, then yes, that could get into the once-per-second range, but only at the NFS server end, not on the individual node threads. And I'd strongly advise not running CEP2 on your setup; that one is likely to kill efficiency, but the experts are invited to contradict me and explain how. One thought: some firewalls have the habit of showing localhost traffic. That never leaves the node, but I don't expect you to run security software at the device level. --//--
edit: Is this 6.12.34 a build by the distro or one fetched from Berkeley?
[Edit 1 times, last edit by Former Member at Feb 22, 2012 2:16:01 PM]
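If you want hard numbers at the server end, one rough check is to diff two snapshots of the NFS op counters; a sketch, assuming NFSv3 and the stock nfs-utils nfsstat:

    # run on the NFS server: snapshot the v3 op counters a minute apart
    nfsstat -s -3 > /tmp/nfs.before
    sleep 60
    nfsstat -s -3 > /tmp/nfs.after
    diff /tmp/nfs.before /tmp/nfs.after    # compare the write and commit counts

Divide the delta by 60 and by the node count to see whether any single node could really be writing once a second.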
giddie
Cruncher | UK | Joined: Nov 21, 2006 | Post Count: 29 | Status: Offline
The BOINC build is from the distro (boinc-nox). The same package doesn't show the same problem on "normal" installations.
I am testing this with:

    # cd /var/lib/boinc/slots/0
    # watch -n1 ls -lat --full-time

The top two files show an updated timestamp every time watch updates. The problem for the network is the *number* of these small file writes. I'm not sure exactly how many writes are occurring per second, but it's clearly at least once per file per slot per machine.

I'll look into CEP2; thanks for the tip. Our use case for this cluster involves pretty large I/O anyway, and simple tests show pretty decent throughput even with multiple nodes writing simultaneously.
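To get an actual count rather than eyeballing timestamps, I'm thinking of trying inotify-tools on one node; a sketch, assuming the package is available and that inotify sees the node's own writes through the NFS mount (it won't see writes from other clients, but these writes are local):

    # count modify/close_write events in one slot over 60 seconds
    timeout 60 inotifywait -m -q -e modify,close_write /var/lib/boinc/slots/0 | wc -l

    # or watch the individual events with timestamps
    inotifywait -m -e modify --timefmt '%T' --format '%T %w%f %e' /var/lib/boinc/slots/0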
mikey
Veteran Cruncher | Joined: May 10, 2009 | Post Count: 826 | Status: Offline
[quoting giddie's opening post about BOINC 6.12.34 on diskless Arch Linux nodes and the once-per-second writes to boinc_mmap_file]

Check out Dotsch's Linux for running BOINC from a server or a USB disk; he writes it himself and should be a helpful resource. Do a search and you will find him.
giddie
Cruncher | UK | Joined: Nov 21, 2006 | Post Count: 29 | Status: Offline
Thanks, but that's not quite suited to my setup. I already have a diskless environment set up, and I'm hoping to run BOINC on the nodes to harvest wasted time when the cluster is up but we have no jobs to run.
I'd really appreciate some ideas as to why these files might be written so frequently, or other tests I might run to figure out what's going on.
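One test I'm planning is to attach strace to the science app in a slot and see which syscalls actually touch these files; a sketch, with the PID left as a placeholder:

    # trace write-related syscalls of the app in slot 0 for a minute
    timeout 60 strace -tt -f -e trace=write,pwrite64,msync,fsync,fdatasync -p <PID> -o /tmp/slot0.strace

    # map the fd numbers in the log back to file names afterwards
    ls -l /proc/<PID>/fd

If msync on a mapped boinc_mmap_file shows up every second, that would at least point at the mmap heartbeat rather than checkpointing.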
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I know at least 2 guys that run a render farm and a computing-for-rent farm, the former in a PXE setup, diskless AFAIK. The guy with the render farm saves the nodes' client state "as is" when he's got a job and reloads it whenever they're free to BOINC. I've never heard him mention perpetual disk writing to a staging drive; in fact, why would it act any different than on a local host? The WtD is adhered to, but that said, maybe he's using a RAMDISK-type setup with a minimal work queue to contain the memory needs. Here are some Google hits on BOINC on diskless nodes: http://www.google.it/search?q=BOINC+on+diskle...cial&client=firefox-a
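If the slot files do turn out to be the problem, one variant of that render-farm approach would be to keep the live data directory on node-local RAM and only stage to and from the NFS server between sessions; a rough sketch, all paths hypothetical:

    # put the BOINC data directory on a small tmpfs on each node
    mount -t tmpfs -o size=2g,mode=0755 tmpfs /var/lib/boinc

    # restore a previously saved copy from the NFS share, then start the client
    cp -a /mnt/nfs/boinc-saved/. /var/lib/boinc/
    boinc --dir /var/lib/boinc &

The trade-off is that anything not yet reported is lost if a node drops before you copy the directory back.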
----------------------------------------
Sorry, not much of a help on this. You could always send a message to support@worldcommunitygrid.org, f.a.o. the techs. When you're talking hundreds of devices/cores, they will stretch to assist you as best they can. --//--
[Edit 1 times, last edit by Former Member at Feb 28, 2012 4:05:41 PM]
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I had a setup with 4 diskless nodes at one time. Currently, I run just 2 diskless nodes for BOINC. I have experimented with Linux on diskless nodes (note: I am basically a Linux noob). All of the machines use a shared folder on a RAM drive on the PXE server to store the BOINC WUs. This works REALLY well for dedicated CEP2 crunching (which I do when I'm not trying to get badges). I had multiple "odd" issues with my experiments using both Linux Mint 12 and Ubuntu 11.10. I could never identify the exact issues, but I assume it is related to the way file shares work using Samba (the PXE server is a Windows Server 2008 R2 machine). The Windows machines, on the other hand, crunch like there's no tomorrow.
I will tell you that using my RAM drive as a file share from the PXE server, I was able to run 40 concurrent threads of CEP2 simultaneously over gigabit LAN with no issues whatsoever. Based on the data I had collected, the potential existed to run as many as 80 threads of CEP2 simultaneously. However, CEP2 requires a lot of disk space compared to the other work units, and since I had only 32 GB of space on my RAM drive, 40 threads was about the limit of what I could handle without some of the machines choking and not receiving new work units because less than 2 GB of free space was available.
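A simple way to keep an eye on the headroom is to watch free space on the share and the heaviest slots; a sketch, with a hypothetical mount point and layout (one data directory per node under the share):

    # hypothetical mount point for the PXE server's shared RAM drive
    watch -n 60 'df -h /srv/boinc_share; du -sh /srv/boinc_share/*/slots/* 2>/dev/null | sort -h | tail'

When free space gets near the 2 GB floor, that is when nodes stop receiving new work.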