World Community Grid Forums
Thread Status: Active Total posts in this thread: 781
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2278 Status: Offline |
Yup, batches 13345 - 41773 seem to be of the larger type: more "jobs" per WU, and maybe more complicated "jobs". They take considerably longer to crunch, and they seem to take some time at the start before they really begin crunching. The percentage starts rising from the beginning, but that's BOINC's pseudo-progress; after some time, once the first "job" is done, it drops back to the real percentage. I haven't seen that on the previous batches, other than on my really slow GPUs (GTX660M and iGPU HD4600).
However, no peace of mind for those with BOINC on SSDs. These batches still hammer the disk between each "job", when they checkpoint. I have BOINC on an HD, so I'm not that worried.
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Apr 27, 2021 1:18:41 PM] |
||
|
Biscotto
Cruncher Italy Joined: Apr 11, 2020 Post Count: 27 Status: Offline |
Does the continuous disk writing I see people complaining about happen for temporary files? I'm not experiencing it, but I have my temporary folder (tmp) mounted on a ramdisk through tmpfs, and on the SSD I only see the occasional "dumps" of data, every few minutes. Could this be why?
----------------------------------------
Papa Ryzen 5 3600 / Mama Radeon RX 560 |
||
|
spRocket
Senior Cruncher Joined: Mar 25, 2020 Post Count: 277 Status: Offline |
Looks like someone must have kicked the upload server... there were a couple of connection failures, then things started draining. I told BOINC to send the rest of the WUs, and the stuck ones promptly became unstuck.
Regarding SSD write volume: I've been concerned about that as well, so last spring I transferred /var/lib/boinc-client to spinning rust (in my case, a ZFS dataset). |
||
|
Eurwin
Cruncher Joined: Apr 28, 2007 Post Count: 17 Status: Offline |
Lots of uploads go to 100% but somehow do not complete. I had the same behaviour for a couple of hours. Now things look "normal" again. |
||
|
Ian-n-Steve C.
Senior Cruncher United States Joined: May 15, 2020 Post Count: 180 Status: Offline |
Quote: Does the continuous disk writing I see people complaining about happen for temporary files? I'm not experiencing it, but I have my temporary folder (tmp) mounted on a ramdisk through tmpfs, and on the SSD I only see the occasional "dumps" of data, every few minutes. Could this be why?
I wouldn't worry about it. The fears about SSD writes are largely FUD and blown out of proportion. Most modern SSDs can handle PETABYTES of writes before failure is a concern, and they have more advanced wear leveling than earlier SSDs. That's continuous writing for 10+ years in most cases, and real-world use will be far below that.
----------------------------------------
EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti |
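As a rough check on the endurance argument in the post above, the arithmetic can be sketched as follows. The endurance rating and the daily write volume used here are hypothetical placeholders, not figures from this thread; substitute the TBW rating from your drive's datasheet and a write rate you have actually measured.

```python
def years_until_rated_endurance(rated_tbw: float, gb_written_per_day: float) -> float:
    """Years of writing at a constant daily rate before reaching the rated endurance.

    rated_tbw: manufacturer endurance rating in terabytes written (TBW).
    gb_written_per_day: average host writes per day, in gigabytes.
    """
    if rated_tbw <= 0 or gb_written_per_day <= 0:
        raise ValueError("both arguments must be positive")
    tb_per_day = gb_written_per_day / 1000.0  # decimal units: 1 TB = 1000 GB
    return rated_tbw / tb_per_day / 365.0


if __name__ == "__main__":
    # Hypothetical example values (assumptions, not measurements from this thread).
    example_tbw = 600.0         # assumed endurance rating, in TBW
    example_gb_per_day = 50.0   # assumed average writes per day, in GB
    years = years_until_rated_endurance(example_tbw, example_gb_per_day)
    print(f"{years:.1f} years to reach the rated endurance")
```

The result scales linearly in both directions: doubling the daily writes halves the estimate, so the measured write rate matters more than the exact rating.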
||
|
maeax
Advanced Cruncher Joined: May 2, 2007 Post Count: 142 Status: Offline |
I have one Ellesmere with an HDD and one Ellesmere with an SSD.
The SSD one is crashing very often because of the large number of checkpoints. BOINC is set to 1200 sec. for backup, but OPNG ignores this. The long-running OPNG tasks are now running on it (1 hour!). Something is wrong with checkpointing and SSDs. https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=639284992
----------------------------------------
AMD Ryzen Threadripper PRO 3995WX 64-Cores/ AMD Radeon (TM) Pro W6600. OS Win11pro
|
||
|
spRocket
Senior Cruncher Joined: Mar 25, 2020 Post Count: 277 Status: Offline |
We're not out of the woods yet. Getting some stuck downloads now, while uploads are fine.
|
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline |
Just an FYI to those who may be running multiple tasks per GPU: these 5-digit batches don't seem to like that, at least not on my older but still powerful hardware, both AMD and Nvidia, Windoze and Linux. Be on the lookout (BOLO) for errors.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
----------------------------------------
[Edit 1 times, last edit by nanoprobe at Apr 27, 2021 1:37:02 PM] |
||
|
Michael Goetz
Cruncher United States Joined: Dec 11, 2017 Post Count: 35 Status: Offline |
Quote: What I would need is a script for Windows10.
Quote: Ask and you shall receive! I had the same problem, so I made a little script. ----WARNING---- This script will retry ALL your uploads and downloads, no matter if they are active, pending or stalled, so don't schedule this to run too often. ----WARNING---- Use at your own risk: @echo off
Also, increase the max_file_xfers_per_project in cc_config.xml
PRO TIP: There's a MUCH easier way to do this. It's a one-liner. There's a "retry all" command in the BOINC client command line interface: "--network_available". This is all you need:
"c:\program files\boinc\boinccmd.exe" --network_available
Then add loop/repeat controls as appropriate to your desires and scripting language. |
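The "add loop/repeat controls" step from the post above could be sketched in Python roughly like this. It is only an illustration: the install path is the one quoted in the post, while the function names, the retry count, and the interval are assumptions made up for the example. Per the warning quoted above, don't schedule retries too often.

```python
import subprocess
import time

# Path as quoted in the post; adjust it for your own installation.
BOINCCMD = r"c:\program files\boinc\boinccmd.exe"


def build_retry_command(boinccmd_path: str = BOINCCMD) -> list:
    """Return the argument list for the client's "retry all transfers" command."""
    return [boinccmd_path, "--network_available"]


def retry_transfers(attempts: int, interval_s: float,
                    run=subprocess.run, sleep=time.sleep) -> list:
    """Run the retry command `attempts` times, pausing `interval_s` between runs.

    `run` and `sleep` are injectable so the loop can be tried out without BOINC.
    Returns the exit code of each run.
    """
    codes = []
    for i in range(attempts):
        result = run(build_retry_command(), check=False)
        codes.append(result.returncode)
        if i < attempts - 1:  # no pause needed after the final attempt
            sleep(interval_s)
    return codes
```

For instance, `retry_transfers(attempts=3, interval_s=15 * 60)` would issue the command three times, 15 minutes apart; those numbers are placeholders, not a recommendation from the thread.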
||
|
Ian-n-Steve C.
Senior Cruncher United States Joined: May 15, 2020 Post Count: 180 Status: Offline |
Quote: I have one Ellesmere with an HDD and one Ellesmere with an SSD. The SSD one is crashing very often because of the large number of checkpoints. BOINC is set to 1200 sec. for backup, but OPNG ignores this. The long-running OPNG tasks are now running on it (1 hour!). Something is wrong with checkpointing and SSDs. https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=639284992
Either something is wrong with your SSD, or something else is wrong with the system that has the SSD. My systems are much faster, run 6-8 GPUs each, and produce many more writes to the SSD, with no issues. SSDs in general are capable of many orders of magnitude more IOPS than an HDD. Your problem is likely system-specific, not SSD-specific. |
||
|