World Community Grid Forums
Thread Status: Active Total posts in this thread: 7
Jean-David Beyer
Senior Cruncher USA Joined: Oct 2, 2007 Post Count: 339 Status: Offline
For many projects, this is one of the items in their requirements. But what is supposed to happen when it is set? I do enable this option when requested, but I do not understand what effect is expected.
In today's multiprogramming environments, with automatic memory management and demand paging, if the OS suspends a process to allow another to run and there is not enough RAM, the memory management part of the OS writes the least recently used pages out to swap. The only way to stop this, in Linux anyway, is with the mlock(), mlock2(), and munlock() system calls. And to work, the process invoking them must be privileged. Since boinc processes are not privileged, what does leaving suspended processes in memory accomplish?
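A side note on the privilege question raised above: according to the mlock(2) man page, since Linux 2.6.9 an unprivileged process may lock memory up to its RLIMIT_MEMLOCK soft limit, so privilege is only needed to go beyond that limit. The sketch below only reads that limit for the current process; it does not lock anything, and the helper names memlock_limit and describe are made up for illustration.

```python
import resource


def memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK limits, in bytes, for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft, hard


def describe(limit):
    """Render a limit value as text; RLIM_INFINITY means no limit is set."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


if __name__ == "__main__":
    soft, hard = memlock_limit()
    print("soft limit on locked memory:", describe(soft))
    print("hard limit on locked memory:", describe(hard))
```

On many distributions the default soft limit is small, which is another reason memory locking is not a practical way to keep a large science application resident.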
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Workunits that are running save "checkpoints" from time to time. So when you reboot your computer, they do not resume at the point where they stopped before the reboot, but at the last checkpoint they saved.
Sometimes workunits are suspended, either manually by the user or by BOINC, e.g. to first compute another workunit with a nearing deadline. This has nothing to do with the OS suspending the process for a short time!
Now what happens if a workunit is suspended and resumed later? If 'Leave applications in memory while suspended' is NOT checked, the behaviour is the same as with a reboot: the workunit resumes from the latest saved checkpoint, losing some of the work done. If the option is checked, on the other hand, the process remains in memory and later continues exactly where it stopped, so no work is lost.
I would suggest always checking that option unless you have a good reason not to. One such reason could be projects that consume large amounts of memory when you are short of memory.
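The difference described above can be put into a toy model. This is only a sketch of the bookkeeping, not BOINC code: the Task class, its fields, and the 240-minute checkpoint interval are all hypothetical, and the model ignores that BOINC normally tries to switch tasks at a checkpoint.

```python
from dataclasses import dataclass


@dataclass
class Task:
    checkpoint_interval: int   # minutes of work between checkpoints (hypothetical value)
    progress: int = 0          # minutes of work currently held in memory
    last_checkpoint: int = 0   # minutes of work safely saved to disk

    def run(self, minutes: int) -> None:
        """Do some work and record the most recent completed checkpoint."""
        self.progress += minutes
        completed = self.progress // self.checkpoint_interval
        self.last_checkpoint = completed * self.checkpoint_interval

    def suspend_and_resume(self, leave_in_memory: bool) -> int:
        """Suspend, then resume; return the minutes of work that were lost."""
        if leave_in_memory:
            # The process stays in memory and continues where it stopped.
            return 0
        # The process was removed from memory: restart from the checkpoint on disk.
        lost = self.progress - self.last_checkpoint
        self.progress = self.last_checkpoint
        return lost


if __name__ == "__main__":
    for laim in (False, True):
        task = Task(checkpoint_interval=240)
        task.run(300)  # one checkpoint at 240 minutes, then 60 more minutes of work
        lost = task.suspend_and_resume(leave_in_memory=laim)
        print(f"LAIM {'on' if laim else 'off'}: lost {lost} minutes, resumes at {task.progress}")
```

With these made-up numbers, the run with the option off falls back to the 240-minute checkpoint, while the run with the option on keeps all 300 minutes.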
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi Jean-David,
I'm far from the most technical person on the forums, but my first point is that it is possible to run with swapping off, and some (including me) do, because swapping also takes system resources and slows things down. I prefer to run with enough memory to do what I want, and I try not to overload the system.
My second point is that BOINC does application scheduling of its own. I don't run multiple projects (I'm counting WCG as a single project, and I regard all the 'projects' that WCG runs as sub-projects) and, when BOINC switches between projects, it may choose to save system resources by unloading the project being preempted. (There may be other situations; I don't know much in this area.) I'm sure it does its best not to lose too much processing time, but with things like ARP that run for many hours between checkpoints, I feel sure that processing would be lost unless LAIM was set. That makes it a good idea to recommend its use in such circumstances.
I'm sure others can provide more and better information, but I hope this helps.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The setting is global: it is set once and applies to all projects on a client.
Project swapping is in principle only done at a checkpoint, unless a high-priority job pops up. Those are the moments when you want to keep a job in memory, particularly if the checkpoints are many hours apart; otherwise the job resumes from the previous checkpoint. That is why WCG recommends it for ARP1. The exception is the first checkpoint: a running task is kept in memory until it reaches its first checkpoint.
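For anyone who wants to check the client-wide setting outside the BOINC Manager, here is a minimal sketch. It assumes the local override file global_prefs_override.xml stores the option as a leave_apps_in_memory element with the value 0 or 1, which is how I understand the BOINC preferences format. The sample XML and the function name laim_enabled are made up for illustration, and the location of the real file depends on your installation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content in the style of global_prefs_override.xml.
SAMPLE = """<global_preferences>
    <leave_apps_in_memory>1</leave_apps_in_memory>
</global_preferences>"""


def laim_enabled(xml_text: str) -> bool:
    """Return True if the preferences text sets leave_apps_in_memory to 1."""
    root = ET.fromstring(xml_text)
    node = root.find("leave_apps_in_memory")
    if node is None or node.text is None:
        # Element absent: this sketch treats the option as not enabled here.
        return False
    return node.text.strip() == "1"


if __name__ == "__main__":
    print("Leave applications in memory:", laim_enabled(SAMPLE))
```

Because the preference is read by the client and not by any one project, a single value like this covers every project attached to that client.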
Jean-David Beyer
Senior Cruncher USA Joined: Oct 2, 2007 Post Count: 339 Status: Offline
"Project swapping is in principle only done at checkpoint unless a high priority job pops up. It's those moments when you want to keep a job in memory, particularly if the intervals are many hours apart, else the job resumes from the previous checkpoint, the reason why WCG recommends it for ARP1."
Project swapping under control of the boinc client may follow the rules you suggest. But the Linux process scheduler follows its own rules, as does its memory management. Since these can act at any time, and since they have no idea about any boinc checkpointing, this parameter to the boinc client makes little difference to what really happens. At some point, some Linux process may need more and more RAM, and to get it, some boinc tasks may not only be suspended, but their RAM could be swapped out. Then, just before or soon after (as needed), swapped-out pages may be swapped back in. And the running process has no way of knowing that this has happened. So what practical difference does this make?
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"So what practical difference does this make?"
If LAIM is off, BOINC will remove the task from memory. If swapping is off, none of what you describe happens.
Edited to add: This means that when a WU that has been removed from memory restarts, it will HAVE to restart from the last checkpoint. This is not good if an ARP1 WU has been running for four hours since the previous checkpoint: four hours of work lost!
[Edit 1 times, last edit by Former Member at Nov 10, 2019 8:35:21 PM]
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 865 Status: Offline
Jean-David Beyer said:
"So what practical difference does this make?"
I think you're letting swapping confuse you about this setting. Swapping has nothing to do with it. LAIM has everything to do with checkpointing.
If a task is suspended for any reason with LAIM off, or if the system is rebooted or shut down, the work unit resumes from the most recent checkpoint, losing all work since that checkpoint (which could be hours of work).
If LAIM is enabled, however, a task that is suspended and then resumed continues exactly where it left off instead of resuming from the most recent checkpoint on disk. I can't think of any reason to have it disabled. It's a good setting.