Thread Status: Active
Thread Type: Sticky Thread
Total posts in this thread: 27
Posts: 27   Pages: 3   [ Previous Page | 1 2 3 | Next Page ]
This topic has been viewed 8659 times and has 26 replies
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: CPU work for OpenPandemics stopped

We are down to 8,300 of the "Kracken"-sized jobs left to distribute (almost all of those left are the 1.5 GB size).

We have also been able to get things set up to start building more normal-sized jobs. Those are building through the backend pipeline and will be making jobs available in the next hour. Please note that because we had to reset the entire pipeline, work will only be available intermittently until things get caught up. It will be another 12-18 hours before work is readily available again.
[Oct 7, 2021 12:28:42 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: CPU work for OpenPandemics stopped

The backlog of high memory requirement jobs has been sent out. There will be occasional resends that go out but those will become less frequent over time.

Most jobs that go out now will be slightly larger than normal sized jobs. The backend pipeline is working hard to make this work available. Work will be available intermittently until things get caught up. It will be another 12-18 hours before work is readily available again.
[Oct 7, 2021 12:40:36 AM]
Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 815
Status: Offline
Re: CPU work for OpenPandemics stopped

Thank you for the updates, it's much appreciated.
[Oct 7, 2021 1:38:36 AM]
poppageek
Advanced Cruncher
Joined: Nov 16, 2004
Post Count: 99
Status: Offline
Re: CPU work for OpenPandemics stopped

Yes much appreciated. Thank you.
[Oct 7, 2021 3:20:28 AM]
mwroggenbuck
Advanced Cruncher
USA
Joined: Nov 1, 2006
Post Count: 77
Status: Offline
Re: CPU work for OpenPandemics stopped

My Raspberry Pi computers are working just fine this morning. I know that they are not number crunching behemoths, but they only pull 7 watts. I can leave them on 24/7 without my electric bill going up significantly.

Thanks and much appreciation to everyone (especially knreed) for keeping us up to date and getting things going again. :D
[Oct 7, 2021 11:03:19 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1671
Status: Offline
Re: CPU work for OpenPandemics stopped

Hi knreed,
thank you for the workaround.
While my PCs did not struggle with the memory leak problem, the RPi 3s did not do well.
With the current workaround, I limit each RPi 3 (server configuration, no GUI) to 2 concurrent WUs, and they manage to complete the WUs without swapping. Normally, they run 4 concurrent OPN1 WUs without swapping.
Happy fixing,
Yves
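
A rough sketch of the arithmetic behind a limit like this, in Python. The Pi 3's 1 GB of RAM is real, but the reserved-memory and per-task figures below are illustrative assumptions, not measured OPN1 values, and `max_concurrent_tasks` is a hypothetical helper name.

```python
def max_concurrent_tasks(total_ram_mb, reserved_mb, per_task_mb, cpu_cores):
    """Return how many tasks fit in RAM without swapping, capped by core count.

    reserved_mb is memory set aside for the OS and the BOINC client itself.
    """
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return min(cpu_cores, int(usable_mb // per_task_mb))


# Assumed numbers for a headless RPi 3 (1 GB RAM, 4 cores).
normal = max_concurrent_tasks(1024, 150, 200, 4)   # assumed normal-sized WUs
larger = max_concurrent_tasks(1024, 150, 400, 4)   # assumed larger WUs
```

With these assumed figures, doubling the per-task footprint halves the number of tasks that fit. The resulting number can then be applied in BOINC, for example through the `max_concurrent` or `project_max_concurrent` settings in `app_config.xml`, or by lowering the CPU usage limit in the computing preferences.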
[Oct 7, 2021 3:50:41 PM]
[AF>Libristes>April] Chre44
Cruncher
Joined: Mar 10, 2007
Post Count: 10
Status: Offline
Re: CPU work for OpenPandemics stopped

Hi all,
yes, thank you for the workaround and information, knreed!
[Oct 9, 2021 12:03:42 PM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 254
Status: Offline
Re: CPU work for OpenPandemics stopped

KerSamson wrote:
Hi knreed,
thank you for the workaround.
While my PCs did not struggle with the memory leak problem, the RPi 3s did not do well.
With the current workaround, I limit each RPi 3 (server configuration, no GUI) to 2 concurrent WUs, and they manage to complete the WUs without swapping. Normally, they run 4 concurrent OPN1 WUs without swapping.
Happy fixing,
Yves


If you're running an RPi 3 or similar memory-constrained system, putting swap space on zram, as counterintuitive as it seems, works beautifully. My Pi 3s got through the storm of huge jobs without any of them crashing, even when they sometimes had to wait for RAM, and I normally run four tasks on them. No manual intervention was needed.

Here's a script/service that sets it up for you with appropriate size settings: https://github.com/foundObjects/zram-swap

If I try it on a Pi running Ubuntu, it will complain when it tries to start the service the first time, but restarting it (sudo systemctl restart zram-swap) or rebooting will work. I've been converting my Pi fleet to Ubuntu so that MPICH will interoperate with my (also Ubuntu) x86 systems. MPICH becomes a real headache when your distros don't match. When they do match, it's just an apt install mpich away.
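
To confirm that swap really ended up on zram after the service starts, one option is to look at /proc/swaps. Below is a minimal Python sketch that parses that file's contents. The sample text is made up for illustration (device names, sizes, and priorities will differ on a real Pi), and `swap_devices` and `all_swap_on_zram` are hypothetical helper names, not part of the zram-swap project.

```python
# Made-up example of what /proc/swaps might contain on a Pi with zram swap
# plus a leftover swap file. Real values will differ.
SAMPLE_PROC_SWAPS = """\
Filename                                Type            Size      Used    Priority
/dev/zram0                              partition       1048572   20480   15
/var/swap                               file            102396    0       -2
"""


def swap_devices(proc_swaps_text):
    """Parse the text of /proc/swaps into a list of dicts, one per swap area."""
    devices = []
    for line in proc_swaps_text.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore malformed or empty lines
        name, kind, size_kb, used_kb, priority = fields[:5]
        devices.append({
            "name": name,
            "type": kind,
            "size_kb": int(size_kb),
            "used_kb": int(used_kb),
            "priority": int(priority),
            "zram": name.startswith("/dev/zram"),
        })
    return devices


def all_swap_on_zram(devices):
    """True only if there is at least one swap area and every one is zram."""
    return bool(devices) and all(d["zram"] for d in devices)
```

On a live system you would pass in the real file, e.g. `swap_devices(open("/proc/swaps").read())`, and check that a `/dev/zram` entry is present and has the highest priority.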

The Pi 4, and especially the Pi 400, are surprisingly good number-crunchers. From a credit-hours per watt standpoint, nothing in my CPU zoo comes even close, and they are significantly faster than the throttled-down i7s in my old ThinkPads (which are also crunching). I can generally expect a Pi 400's BOINC RAC to be right around 900 or so, give or take a dozen or two - not bad for about seven watts measured at the wall socket.
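
For anyone who wants to run the same comparison on their own machines, here is the efficiency figure as a small Python sketch. It uses only the two numbers quoted above (a RAC of about 900 at about seven watts); `rac_per_watt` is a hypothetical helper name.

```python
def rac_per_watt(rac, watts):
    """BOINC recent average credit divided by wall-socket power draw."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return rac / watts


# Pi 400 figures quoted in the post: RAC ~900 at ~7 W measured at the wall.
pi400_efficiency = rac_per_watt(900, 7)   # roughly 129 RAC per watt
```

The comparison is only meaningful if the other machines' RAC and wall power are measured the same way, over a similar period and on the same project.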

The downside is that ARM on WCG gets you only OpenPandemics units and nothing else. Higher-capacity ARM systems like these really need more support from WCG - while a Pi 3 would choke on ARP, a Pi 4 or 400 could handle it better than those old laptops - to say nothing of a server/workstation-class ARM system like an Ampere Altra.
[Oct 12, 2021 5:05:52 PM]
rcthardcore
Cruncher
United States
Joined: Jan 29, 2009
Post Count: 13
Status: Offline
Re: CPU work for OpenPandemics stopped

The workset is apparently still too large for the Raspberry Pi 400. The server sends a message saying that there is not enough memory (3814.70 MB RAM needed but only 3455.31 available). Note this is an ARM 64 machine, but running a 32-bit OS.

Something still needs to be fixed. I have been running this project for several months on my Raspberry Pi. It stopped when this forum entry was created.



You need to be running a 64-bit OS. 32-bit is limited to 4 GB of RAM.
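
A quick way to see both halves of the problem, sketched in Python. The two memory figures are the ones from the server message quoted above; the helper names are hypothetical. Note that `pointer_width_bits` reports the width of the running Python interpreter, which normally matches the OS userland, so a 64-bit CPU running a 32-bit OS would be expected to report 32.

```python
import struct


def memory_shortfall_mb(needed_mb, available_mb):
    """How much more memory a job needs than the host reports (0 if it fits)."""
    return max(needed_mb - available_mb, 0.0)


def pointer_width_bits():
    """Pointer width of the running Python interpreter: 32 or 64."""
    return struct.calcsize("P") * 8


# Figures from the server message: 3814.70 MB needed, 3455.31 MB available.
shortfall = memory_shortfall_mb(3814.70, 3455.31)   # about 359 MB short
```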
----------------------------------------
AMD Ryzen 9 5950x
NVIDIA RTX 3090 FE
128 GB DDR4-3200
Windows 10 64-bit 21H1
[Oct 14, 2021 11:32:39 PM]