Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1326
Re: ~35% sys calls, ok?

@Bobby B
Exactly. My point is that some people want longer run times and some want shorter.


I don't want to shift this thread any further off-topic than it already is. All I will say is that if I were able to use Linux on my daily machine I would, but unfortunately I require Windows for too many tasks.

[Aug 20, 2023 2:09:51 AM]
!evil
Cruncher
Joined: Aug 3, 2023
Post Count: 4
Re: ~35% sys calls, ok?

From what I can see, SVMlight calls clock() for timing/verbosity, so those calls should be easy to avoid. I tried the LD_PRELOAD wrapper approach, but the tasks simply fail; if I had to guess, BOINC probably protects us from malicious code.
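For the record, this is roughly the shim I tried (a minimal sketch, not a working fix: the 1-in-1024 caching rate is an arbitrary choice of mine, and it assumes the science app resolves clock(3) dynamically):

/* clock_shim.c - build: gcc -shared -fPIC -o clock_shim.so clock_shim.c -ldl
 * run:   LD_PRELOAD=./clock_shim.so <science app>
 * Interposes clock(3) and serves a cached value most of the time, so the
 * syscall-backed process-CPU-time clock is consulted far less often. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <time.h>

static clock_t (*real_clock)(void);
static clock_t cached;
static unsigned calls;

clock_t clock(void)
{
    if (!real_clock)
        real_clock = (clock_t (*)(void))dlsym(RTLD_NEXT, "clock");
    /* refresh the cached value only on every 1024th call */
    if ((calls++ & 1023) == 0)
        cached = real_clock();
    return cached;
}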

Regarding long vs. short vs. not running a task: the sys calls are wasted CPU cycles; in other words, the research team could potentially get results significantly faster, or more results within their time budget.
[Aug 26, 2023 3:07:41 PM]
thunder7
Senior Cruncher
Netherlands
Joined: Mar 6, 2013
Post Count: 238
Re: ~35% sys calls, ok?

This problem seems to have disappeared on my Linux systems crunching MCM.

48 cores - 2.3% system time (2696 V2)
32 cores - 0.1% system time (7950X)
40 cores - 0.8% system time (2680 V2)

There's been no announcement, but how are other systems running now?
[Oct 26, 2023 6:05:13 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Re: ~35% sys calls, ok?

I don't know how you measure that, thunder7.

One thing that has changed is that we no longer need to download the 102 MB file mcm1.dataset-sarc1.txt;
instead there is now a 48 MB curatedOvarian_EarlyLate_v1.0 file:
-rw-r--r--. 1 boinc boinc 48177891 Oct 20 20:23 ~boinc/projects/www.worldcommunitygrid.org/e55b6bdba4ed0b4b6e315c6767d68e3f.txt

Adri
[Oct 26, 2023 9:23:02 AM]
thunder7
Senior Cruncher
Netherlands
Joined: Mar 6, 2013
Post Count: 238
Re: ~35% sys calls, ok?

I use the 'top' command, which shows:

Tasks: 436 total, 33 running, 403 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.1 sy, 99.9 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 63432.7 total, 59847.3 free, 3010.1 used, 1265.1 buff/cache
MiB Swap: 123567.0 total, 123567.0 free, 0.0 used. 60422.6 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6405 boinc 39 19 78552 39452 2392 R 100.3 0.1 20:08.60 wcgrid_mcm1_map
6400 boinc 39 19 78552 39460 2392 R 100.0 0.1 21:22.31 wcgrid_mcm1_map
6414 boinc 39 19 78848 40544 2392 R 100.0 0.1 18:05.74 wcgrid_mcm1_map
6418 boinc 39 19 78552 38920 2392 R 100.0 0.1 17:03.67 wcgrid_mcm1_map
etc.



0.1 sy means 0.1% system time; 99.9 ni means 99.9% of the time is spent running low-priority (niced) processes such as WCG.
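If you'd rather compute that figure than eyeball top, here's a minimal sketch of mine that samples /proc/stat twice and prints the system-time share over the interval (field order as documented in proc(5)):

/* sys_pct.c - prints the "sy" share of CPU time over a 5-second window.
 * /proc/stat "cpu" fields: user nice system idle iowait irq softirq steal. */
#include <stdio.h>
#include <unistd.h>

static void sample(unsigned long long v[8])
{
    FILE *f = fopen("/proc/stat", "r");
    if (!f)
        return;                       /* sketch: no real error handling */
    fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
           &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
    fclose(f);
}

int main(void)
{
    unsigned long long a[8], b[8], total = 0;
    sample(a);
    sleep(5);                         /* measurement window */
    sample(b);
    for (int i = 0; i < 8; i++)
        total += b[i] - a[i];
    printf("sy: %.1f%%\n", 100.0 * (b[2] - a[2]) / total);  /* field 2 = system */
    return 0;
}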
[Oct 26, 2023 9:49:30 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Re: ~35% sys calls, ok?

Intel:
Tasks: 400 total,   9 running, 383 sleeping,   8 stopped,   0 zombie
%Cpu(s): 0.5 us, 1.1 sy, 50.5 ni, 37.6 id, 10.0 wa, 0.1 hi, 0.1 si, 0.0 st
MiB Mem : 15862.5 total, 351.2 free, 7581.0 used, 7930.3 buff/cache
MiB Swap: 32551.0 total, 28954.2 free, 3596.8 used. 5826.6 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3680229 boinc 39 19 78996 38928 2468 R 99.5 0.2 79:30.10 wcgrid_mcm1_map
3682583 boinc 39 19 78848 38808 2468 R 99.5 0.2 36:27.43 wcgrid_mcm1_map
3680288 boinc 39 19 78524 38716 2468 R 99.2 0.2 78:16.98 wcgrid_mcm1_map
3680320 boinc 39 19 78144 38132 2468 R 99.2 0.2 77:33.39 wcgrid_mcm1_map
3684449 boinc 39 19 78716 38728 2468 R 99.2 0.2 4:24.83 wcgrid_mcm1_map
3681839 boinc 39 19 78988 38844 2468 R 99.0 0.2 51:08.58 wcgrid_mcm1_map
3682066 boinc 30 10 1852368 397936 163728 R 22.0 2.4 11:40.14 wcgrid_opng_aut


Another one, AMD:
Tasks: 467 total,  20 running, 447 sleeping,   0 stopped,   0 zombie
%Cpu(s): 0.0 us, 0.4 sy, 55.7 ni, 43.5 id, 0.1 wa, 0.3 hi, 0.0 si, 0.0 st
MiB Mem : 31997.5 total, 20781.1 free, 2649.9 used, 8566.5 buff/cache
MiB Swap: 8192.0 total, 8070.0 free, 122.0 used. 28323.9 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2528725 boinc 39 19 78392 38368 2468 R 100.0 0.1 35:42.73 wcgrid_mcm1_map
2529279 boinc 39 19 77656 37852 2460 R 100.0 0.1 8:06.72 wcgrid_mcm1_map
2528336 boinc 39 19 79032 38932 2468 R 99.7 0.1 51:16.46 wcgrid_mcm1_map
2528347 boinc 39 19 78392 38448 2468 R 99.7 0.1 49:55.28 wcgrid_mcm1_map
2528573 boinc 39 19 78392 38400 2468 R 99.7 0.1 42:17.72 wcgrid_mcm1_map
2528728 boinc 39 19 78392 38360 2468 R 99.7 0.1 35:37.67 wcgrid_mcm1_map
2528737 boinc 39 19 78988 38860 2468 R 99.7 0.1 34:32.53 wcgrid_mcm1_map
2528743 boinc 39 19 78392 38356 2468 R 99.7 0.1 33:36.44 wcgrid_mcm1_map
2528746 boinc 39 19 78384 38116 2468 R 99.7 0.1 33:09.48 wcgrid_mcm1_map
2528771 boinc 39 19 78988 37256 2468 R 99.7 0.1 29:36.03 wcgrid_mcm1_map
2529564 boinc 30 10 5348492 335972 145204 R 99.7 1.0 3:42.70 wcgrid_opng_aut


And yet another one, Intel:
Tasks: 266 total,   9 running, 257 sleeping,   0 stopped,   0 zombie
%Cpu(s): 0,2 us, 0,2 sy, 99,3 ni, 0,0 id, 0,0 wa, 0,3 hi, 0,1 si, 0,0 st
MiB Mem : 7611,8 total, 2455,4 free, 1354,3 used, 3802,1 buff/cache
MiB Swap: 15803,0 total, 15802,5 free, 0,5 used. 5805,1 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
335636 boinc 39 19 79012 38680 2304 R 99,3 0,5 122:46.53 wcgrid_mcm1_map
335659 boinc 39 19 78996 38604 2304 R 99,3 0,5 114:37.08 wcgrid_mcm1_map
335720 boinc 39 19 78788 38132 2176 R 99,3 0,5 106:28.80 wcgrid_mcm1_map
335737 boinc 39 19 78800 38260 2176 R 99,3 0,5 98:22.71 wcgrid_mcm1_map
335790 boinc 39 19 78552 38248 2176 R 99,3 0,5 92:56.41 wcgrid_mcm1_map
335855 boinc 39 19 78788 38164 2176 R 99,0 0,5 84:11.20 wcgrid_mcm1_map
336119 boinc 39 19 77788 36188 2176 R 99,0 0,5 29:03.04 wcgrid_mcm1_map
335928 boinc 39 19 78116 37588 2304 R 98,7 0,5 70:15.77 wcgrid_mcm1_map
335588 boinc 30 10 403692 216800 63360 S 0,3 2,8 4:45.17 wcgrid_opng_aut

[Oct 26, 2023 11:40:01 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Re: ~35% sys calls, ok?

Regarding "This problem seems to have disappeared on my linux systems crunching MCM" (thunder7):

I think all the Ovarian work units use VMethod=LOO (which, from doing some reading about SVMlight, I think stands for Leave One Out...), and LOO tasks run better than NFCV (N-Fold?) tasks on Linux, especially on older CPUs [*1].

I've not looked at the library code (perhaps !evil has, and can comment again?) but I'd hazard a guess that the Leave One Out training mechanism doesn't result in as much timer activity overall as does NFCV...
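To make that concrete: k-fold cross-validation is k complete training runs, and Leave One Out done naively is the k == n extreme, so how often the (timer-heavy) training loop executes differs a lot between the two -- and SVMlight's documentation mentions it can estimate LOO error without retraining on every split (its xi-alpha estimates), which would square with that guess. A minimal sketch of the loop shape, with hypothetical train()/test() stand-ins rather than SVMlight's real API:

/* cv_sketch.c - illustrative only: each fold is one complete training
 * run, hence one more pass through whatever timer/verbosity calls the
 * trainer makes. train()/test() are made-up stand-ins. */
#include <stdio.h>

static void   train(int fold) { (void)fold; /* full SVM training run */ }
static double test(int fold)  { (void)fold; return 0.0; }

static double cross_validate(int n, int k)
{
    double err = 0.0;
    for (int fold = 0; fold < k; fold++) {
        train(fold);          /* trains on roughly n - n/k samples */
        err += test(fold);
    }
    (void)n;
    return err / k;
}

int main(void)
{
    int n = 200;                                        /* made-up sample count */
    printf("10-fold:   %g\n", cross_validate(n, 10));   /* 10 training runs */
    printf("naive LOO: %g\n", cross_validate(n, n));    /* n training runs  */
    return 0;
}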

Whatever the case regarding LOO versus NFCV as training method, it should be noted that the Ovarian dataset is quite a lot smaller than the Sarcoma one, so tasks will run a fair bit faster anyway :-)

Cheers - Al.

[*1] On Sarcoma tasks, my Intel boxes tend to take between 40% and 60% longer to run with NFCV rather than LOO; on my Ryzen boxes (which have more FPU capability per core!) the difference is down to between 10% and 20%... I don't think the performance difference has that much to do with timers :-)
[Oct 26, 2023 3:28:05 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Re: ~35% sys calls, ok?

Regarding "the Ovarian dataset is quite a lot smaller than the Sarcoma one, so tasks will run a fair bit faster anyway" (alanb1951):

For the first couple of days of the Ovarian work units I noticed the run times were shorter than for the previous units. The units must be getting bigger, because the time for each unit seems to have grown from about 1.7 hours to about 3 hours. They must have started with the easy ones.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Oct 26, 2023 6:54:58 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Re: ~35% sys calls, ok?

Sgt. Joe,

[I should probably have said "probably run a fair bit faster" :-)]

Interesting that you've seen such significant increases -- it doesn't tally with my experience (even on my Intel systems, which are more prone to swings!), but I don't doubt your assessment -- I'd just like to know what's going on (and why!)

I've started poking around in my gathered statistics and parameter logs to see if there's an obvious link that will differentiate my slowest and quickest results. A quick scan with the Mark One Eyeball shows that (unlike Sarcoma units) there are considerable variations in task parameters, some of which may well play into runtime consumed!

I'm going to have to write scripts to match up task parameters with task run-times, but I have done a sweep of CPU times for Ovarian tasks from the start (very late September) to 20th October (as up to date as my easiest-to-query database gets -- it only contains results for days where everything has validated!), counting tasks returned per day and average CPU hours. The full lists would make for a long post, so I'll just show a sample from the results for one of my Ryzens (the 3700X)...

Date        NTasks  Ave. CPU hours
2023-09-29       1  1.2469  (First result!)
2023-09-30      29  1.3649  (10% slower overall)
...            ...     ...
2023-10-12      44  1.3273  (Last day at that level)
2023-10-13      52  1.2176  (Back down to start!)
2023-10-14      42  1.2063
...            ...     ...
2023-10-18      56  1.2007
2023-10-19      20  1.1473  (And even lower!)
2023-10-20      24  1.155
----------  ------  ------
Total          831  1.2865

I've had a look at the results not yet entered into that database, and the times for that system all seem to be in the range 1.17..1.22 hours with an occasional outlier at around 1.4 hours or 1.1 hours, so there's still some variability -- I'll keep looking, and if I spot anything interesting when I start matching parameters to run times, I'll report :-)
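For what it's worth, the per-day figures above come from an aggregation along these lines (a throwaway sketch of mine; the "date,cpu_hours" input is an intermediate extract I make myself, not anything WCG exports directly):

/* daily_avg.c - reads sorted lines of "YYYY-MM-DD,cpu_hours" on stdin and
 * prints tasks per day plus average CPU hours, then overall totals. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char date[11], prev[11] = "";
    double hours, sum = 0.0, total = 0.0;
    int n = 0, all = 0;

    while (scanf("%10[^,],%lf\n", date, &hours) == 2) {
        if (prev[0] != '\0' && strcmp(date, prev) != 0) {
            printf("%s %4d %.4f\n", prev, n, sum / n);  /* close out the day */
            n = 0;
            sum = 0.0;
        }
        strcpy(prev, date);
        sum += hours;
        total += hours;
        n++;
        all++;
    }
    if (n)
        printf("%s %4d %.4f\n", prev, n, sum / n);
    if (all)
        printf("Total      %4d %.4f\n", all, total / all);
    return 0;
}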

Cheers - Al.

P.S. Of course, it could just be that they send my "2 or 3 concurrent task" systems the easy stuff :-)
[Oct 27, 2023 12:18:42 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Re: ~35% sys calls, ok?

Alan:
Just a little further information:
I have just used the results from my T520 with dual Xeon E5-2665. From Oct. 18 through Oct. 23 the units ranged from 1.8 hours to 2.1 hours. Late on Oct. 23 to early on Oct. 24 the units fairly quickly ramped from 2.1 hours to 3.0 hours. Since then the units have gradually ramped up to a very consistent 3.2 hours. This spans a total of about 800 units. This is on a machine running Linux.
I have one machine running Windows (i7-3770) which has been consistent over all these days at 2.2 to 2.3 hours.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Oct 27, 2023 2:11:54 AM]