World Community Grid Forums
Thread Status: Active | Total posts in this thread: 7
giddie
Cruncher | UK | Joined: Nov 21, 2006 | Post Count: 29 | Status: Offline
BOINC 6.12.34
Arch Linux x86-64

I'm trying to set up BOINC clients on diskless nodes that are part of a cluster. That means all disk I/O has to go over the network to an NFS server. The problem I'm seeing is that the clients are not respecting the disk_interval setting (I've tried increasing it to 600, with no effect). Instead, in each slot directory, I see a couple of files being written at least once a *second*: boinc_mmap_file is written this way in every slot, and the other file is different for each slot:

    boinc_dsfl_0
    boinc_gfam_1
    boinc_dsfl_2
    boinc_c3cw_3

Any idea why BOINC is going crazy with these files? I'm not seeing this on otherwise identical systems (that have disks).
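In case it matters, this is roughly how I'm setting disk_interval on the nodes; a minimal sketch, assuming the distro package keeps its data directory in /var/lib/boinc (that path is my assumption). The override file would look like this:

    <global_preferences>
        <disk_interval>600</disk_interval>
    </global_preferences>

and then I ask the running client to re-read it with:

    boinccmd --read_global_prefs_override

My (possibly wrong) understanding is that disk_interval only limits how often apps are asked to checkpoint, so it may not govern boinc_mmap_file at all.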
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Whatever you see, it can't be right. The only science not following the "Write To Disk" limit is CEP2, and that one has only 16 checkpoints maximum over a run time of up to 12 hours. WtD is not dependent on "Run based on preferences"... it's one of those settings that always works.
----------------------------------------
How many nodes are writing back to your NFS server, and how many concurrent threads are running BOINC? If thousands, then yes, that could get into the once-per-second range, but only at the NFS server end, not on the individual node threads. And I'd strongly advise not running CEP2 on your setup; that one is likely to kill efficiency, but the experts are invited to contradict me and explain how. One thought: some firewalls have the habit of showing localhost traffic. That never leaves the node, but I don't expect you to run security software at the device level. --//--
edit: Is this 6.12.34 a build by the distro or one fetched from Berkeley?
[Edit 1 times, last edit by Former Member at Feb 22, 2012 2:16:01 PM]
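If you want hard numbers at the server end, one rough check is to diff two snapshots of the NFS op counters; a sketch, assuming NFSv3 and the stock nfs-utils nfsstat:

    # run on the NFS server: snapshot the v3 op counters a minute apart
    nfsstat -s -3 > /tmp/nfs.before
    sleep 60
    nfsstat -s -3 > /tmp/nfs.after
    diff /tmp/nfs.before /tmp/nfs.after    # compare the write and commit counts

Divide the delta by 60 and by the node count to see whether any single node could really be writing once a second.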
giddie
Cruncher | UK | Joined: Nov 21, 2006 | Post Count: 29 | Status: Offline
The BOINC build is from the distro (boinc-nox). The same package doesn't show the same problem on "normal" installations.
I am testing this with:

    # cd /var/lib/boinc/slots/0
    # watch -n1 ls -lat --full-time

The top two files show an updated timestamp every time watch updates. The problem for the network is the *number* of these small file writes. I'm not sure exactly how many writes are occurring per second, but it's clearly at least once per file per slot per machine.

I'll look into CEP2; thanks for the tip. Our use case for this cluster involves pretty large I/O anyway, and simple tests show pretty decent throughput even with multiple nodes writing simultaneously.
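To get an actual count rather than eyeballing timestamps, I'm thinking of trying inotify-tools on one node; a sketch, assuming the package is available and that inotify sees the node's own writes through the NFS mount (it won't see writes from other clients, but these writes are local):

    # count modify/close_write events in one slot over 60 seconds
    timeout 60 inotifywait -m -q -e modify,close_write /var/lib/boinc/slots/0 | wc -l

    # or watch the individual events with timestamps
    inotifywait -m -e modify --timefmt '%T' --format '%T %w%f %e' /var/lib/boinc/slots/0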
mikey
Veteran Cruncher | Joined: May 10, 2009 | Post Count: 826 | Status: Offline
[quoting giddie's opening post about BOINC 6.12.34 on diskless Arch Linux nodes and the once-per-second writes to boinc_mmap_file]

Check out Dotsch's Linux for running BOINC from a server or a USB disk; he writes it himself and should be a helpful resource. Do a search and you will find him.
giddie
Cruncher | UK | Joined: Nov 21, 2006 | Post Count: 29 | Status: Offline
Thanks, but that's not quite suited to my setup. I already have a diskless environment set up, and I'm hoping to run BOINC on the nodes to harvest wasted time when the cluster is up but we have no jobs to run.
I'd really appreciate some ideas as to why these files might be written so frequently, or other tests I might run to figure out what's going on.
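One test I'm planning is to attach strace to the science app in a slot and see which syscalls actually touch these files; a sketch, with the PID left as a placeholder:

    # trace write-related syscalls of the app in slot 0 for a minute
    timeout 60 strace -tt -f -e trace=write,pwrite64,msync,fsync,fdatasync -p <PID> -o /tmp/slot0.strace

    # map the fd numbers in the log back to file names afterwards
    ls -l /proc/<PID>/fd

If msync on a mapped boinc_mmap_file shows up every second, that would at least point at the mmap heartbeat rather than checkpointing.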
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I know at least 2 guys that run a render farm and a computing-for-rent farm, the former in a PXE setup, diskless AFAIK. The guy with the render farm saves the nodes' client state "as is" when he's got a job and reloads it whenever they're free to BOINC. I've never heard him mention perpetual disk writing to a staging drive; in fact, why would it act any different than on a local host? The WtD is adhered to, but that said, maybe he's using a RAMDISK-type setup with a minimal work queue to contain the memory needs. Here are some Google hits on BOINC on diskless nodes: http://www.google.it/search?q=BOINC+on+diskle...cial&client=firefox-a
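If the slot files do turn out to be the problem, one variant of that render-farm approach would be to keep the live data directory on node-local RAM and only stage to and from the NFS server between sessions; a rough sketch, all paths hypothetical:

    # put the BOINC data directory on a small tmpfs on each node
    mount -t tmpfs -o size=2g,mode=0755 tmpfs /var/lib/boinc

    # restore a previously saved copy from the NFS share, then start the client
    cp -a /mnt/nfs/boinc-saved/. /var/lib/boinc/
    boinc --dir /var/lib/boinc &

The trade-off is that anything not yet reported is lost if a node drops before you copy the directory back.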
----------------------------------------
Sorry, not much of a help on this. You could always send a message to support@worldcommunitygrid.org, f.a.o. the techs. When you're talking hundreds of devices/cores, they will stretch to assist you as best they can. --//--
[Edit 1 times, last edit by Former Member at Feb 28, 2012 4:05:41 PM]
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I had a setup with 4 diskless nodes at one time. Currently, I run just 2 diskless nodes for BOINC. I have experimented with Linux on diskless nodes (note: I am basically a Linux noob). All of the machines use a shared folder on a RAM drive on the PXE server to store the BOINC WUs. This works REALLY well for dedicated CEP2 crunching (which I do when I'm not trying to get badges). I had multiple "odd" issues with my experiments using both Linux Mint 12 and Ubuntu 11.10. I could never identify the exact issues, but I assume it is related to the way file shares work using Samba (the PXE server is a Windows Server 2008 R2 machine). The Windows machines, on the other hand, crunch like there's no tomorrow.
I will tell you that using my RAM drive as a file share from the PXE server, I was able to run 40 concurrent threads of CEP2 simultaneously over gigabit LAN with no issues whatsoever. Based on the data I had collected, the potential existed to run as many as 80 threads of CEP2 simultaneously. However, CEP2 requires a lot of disk space compared to the other work units, and since I had only 32 GB of space on my RAM drive, 40 threads was about the limit of what I could handle without some of the machines choking and not receiving new work units because less than 2 GB of free space was available.
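A simple way to keep an eye on the headroom is to watch free space on the share and the heaviest slots; a sketch, with a hypothetical mount point and layout (one data directory per node under the share):

    # hypothetical mount point for the PXE server's shared RAM drive
    watch -n 60 'df -h /srv/boinc_share; du -sh /srv/boinc_share/*/slots/* 2>/dev/null | sort -h | tail'

When free space gets near the 2 GB floor, that is when nodes stop receiving new work.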