World Community Grid Forums
Thread Status: Active Total posts in this thread: 3520
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12564 Status: Offline
Thanks, Kevin
Generations up to 079 have progressed in the last 4 days by 29,550 units out of 34,815 units returned, so 85% of total returns. We seem to be cracking the laggards. We have moved on a generation in the 4 days and the total outstanding units to complete generation 086 is 191,727 compared with 191,544 to complete generation 086 4 days before.
Mike
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2218 Status: Offline
Crystal Pellet posted:
All last 8 ARP-tasks received were from wingman errors from the types:
couldn't start app: Can't get shared memory segment name: shmget() failed
or
couldn't start app: CreateProcess() failed - A required privilege is not held by the client.
Seeing a lot of them at the moment, where this one takes the cake: workunit 776245489:
ARP1_0008692_087_2-- Linux - In Progress 8/11/21 10:55:54 8/19/21 10:55:54 0.00 0.0 / 0.0
Details: Project Name: Africa Rainfall Project
||
|
Acibant
Advanced Cruncher USA Joined: Apr 15, 2020 Post Count: 126 Status: Offline
As best I can tell with some searching, the shared memory segment error should have been fixed in an older version but could still occur if somehow more tasks than cores/threads available are started. Citation.
And the privilege not being held by the client seems to stem from a service install of BOINC, but in a situation where the account running the service no longer has the appropriate rights. Citation. Unfortunately we'd have to have the people running those clients examine their own configurations and post them here to have any hope of resolving the issues.
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12564 Status: Offline
I see that generation 080 has joined the ranks of the stragglers.
I haven't been getting many stragglers. I have mostly been getting re-sends classified as priority. The problem is that the one normal unit I have keeps being pushed back by new priority cases, so it will take at least 3 days to be returned, and counting. It will infringe the 'reliable' status.
Mike
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1330 Status: Offline
I'm running 8 at a time. All 8 again have shorter deadlines:
3 because they are stragglers (generations 075, 077 and 080) and 5 because of wingmen errors. To keep your reliability, only request new ARP work when the running ones are almost ready and, if necessary, push them to the front of the queue by suspending OPNs etc.
||
|
MJH333
Senior Cruncher England Joined: Apr 3, 2021 Post Count: 274 Status: Offline
Hello Mike
I’ve got my first 087: ARP1_0005102_087
Cheers, Mark
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12564 Status: Offline
Thank you, Mark
087 indicates we are at about 47.5%, but for this month I am again assuming 2 generations behind to allow for the stragglers, so 46.4%. The latest interval is 4.66806 days and the 10-interval average is down to 2.94800 days.
The end date forecast would have been May 2022, but, based on Kevin Reed's data on the stragglers, I expect it to be about October or November 2022. I would expect the next generation to start about 16 August.
Mike
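The arithmetic behind these figures can be reproduced with a short sketch. It rests on two assumptions that are not stated in the post: a total of 183 generations (inferred from 087 being "about 47.5%") and a posting date of 12 August 2021 (inferred from other timestamps in the thread). The function names are illustrative only.

```python
from datetime import date, timedelta

# Assumption: 183 generations in total, inferred from "087 ... about 47.5%".
TOTAL_GENERATIONS = 183


def percent_complete(generation, lag=0, total=TOTAL_GENERATIONS):
    """Progress in percent, optionally discounting `lag` generations for stragglers."""
    return 100.0 * (generation - lag) / total


def forecast_end(start, generation, days_per_generation, total=TOTAL_GENERATIONS):
    """Naive forecast: remaining generations times the average interval.

    Adding a timedelta to a date uses whole days only.
    """
    remaining = total - generation
    return start + timedelta(days=remaining * days_per_generation)


# Assumption: the post was written on 12 August 2021.
posted = date(2021, 8, 12)

print(round(percent_complete(87), 1))         # headline figure for generation 087
print(round(percent_complete(87, lag=2), 1))  # allowing 2 generations for stragglers
print(forecast_end(posted, 87, 2.94800))      # forecast from the 10-interval average
print(posted + timedelta(days=4.66806))       # next generation if the latest interval repeats
```

This naive forecast ignores the stragglers entirely, which is why the post pushes the expected end date out to October or November 2022.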
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12564 Status: Offline
Crystal Pellet
Thanks for the reply. Your suggestion would work but it would require me to keep changing my cache limits up and down every time a unit was about to finish, including overnight.
I have an 8-thread machine so am running 4 ARP using app_config.xml and holding just 1 spare by having a limit of 5 in my profile. Occasionally, I get one with the standard deadline, so that becomes my spare. Then I get a whole series of priority cases, mostly resends. Each time I finish a unit my spare starts up and then stops again when a new priority unit arrives. It has now amassed 22 minutes and 35 seconds in bursts of, say, 2 minutes every 6 hours or so over the last nearly 3 days. In another day it will be higher in the pecking order than new priority units and will then complete. By the time it gets going I will have had it for 4 days and it will take another day, so 5 days total.
Mine is a reliable machine, almost never producing errors, but this unit will infringe the 'reliability' test of WCG. It is simply an effect of the catch-up exercise on the stragglers. I think I will put up with it if it gets demoted. It would soon get promoted again.
Mike
||
|
leloft
Cruncher Joined: Jun 8, 2017 Post Count: 23 Status: Offline
Crystal Pellet posted:
I'm running 8 at a time. All 8 again shorter deadlines. 3 because they are stragglers: generation 075, 077 and 080 and 5 because of wingmen errors. To keep your reliability, only request new ARP-work, when the running ones are almost ready and evt. push them to the front of the queue by suspending OPN's etc.

I have been managing a heavily overloaded work cache through proactive use of app_config.xml, as trying to do it through device profiles alone just doesn't work. Some work units have developed estimated times that are very close or equal to the deadline. I have suspended the 12 units with the most favourable 'est to deadline' times so that only 12 of the 24 remaining are running; these 12 now utilise all 24 cores to different extents and seem to be reducing the 'est' times significantly: certainly, they appear to progress visibly faster in boinctui.
This has led me to consider a new strategy: if the device profile were set to maintain a small (0.5d) cache with a number of spare units and CPU availability set at 100%, and app_config were set to keep a core (or more) free (e.g. ARP 12/24, OPN 6/24, MCM 5 (or less)/24) for BOINC to use for 'on-demand' parallel processing of sub-tasks, is it possible that more total work could get done per unit time? If so, would 'optimal' cache and app_config values look something like these?
Many thanks
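For reference, the split proposed above (ARP 12, OPN 6, MCM 5 on 24 threads) can be written out as an app_config.xml with per-application `max_concurrent` limits. The sketch below only generates such a file and checks how many threads the limits leave free; it says nothing about whether total throughput would improve. The short application names `arp1`, `opn1` and `mcm1` are assumptions and should be checked against the names the BOINC client reports for the project.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits):
    """Build app_config.xml text with one <app> entry per application limit."""
    root = ET.Element("app_config")
    for name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


THREADS = 24
# Assumption: arp1, opn1 and mcm1 are the short names of the ARP, OPN and MCM apps.
limits = {"arp1": 12, "opn1": 6, "mcm1": 5}

spare = THREADS - sum(limits.values())  # threads not claimed by any limit
xml_text = build_app_config(limits)

print(f"threads left free: {spare}")
print(xml_text)
```

The generated text would go into the project's directory as app_config.xml, after which the client needs to re-read its config files for the limits to take effect.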
||
|
maeax
Advanced Cruncher Joined: May 2, 2007 Post Count: 142 Status: Offline
I have a device profile (for example, "work") with all the projects I use.
ARP is set to two tasks, all others unlimited. No app_config. All is running well. A second profile, without ARP, is for the other computer. Edit: Every day about 60 days of WCG work.
AMD Ryzen Threadripper PRO 3995WX 64-Cores/ AMD Radeon (TM) Pro W6600. OS Win11pro
[Edit 1 times, last edit by maeax at Aug 12, 2021 12:48:10 PM]
||
|