| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 23
|
|
| Author |
|
|
anhhai
Veteran Cruncher Joined: Mar 22, 2005 Post Count: 839 Status: Offline Project Badges:
|
According to the spec requirements for this project, it is supposed to be the same as FAAH, but while I could run all 64 threads with FAAH concurrently, I can't do that with HSTB. This morning, I noticed that several of my WUs had an elapsed time much greater than the CPU time, and discovered that on several of my machines the CPU usage wasn't 100%. The memory usage was only around 12%, so that shouldn't be a problem.
I had to go and download new UGM WUs so I could run a UGM/HSTB mix. Based on what I see, if I run 31 UGM and 33 HSTB then the CPU usage maxes out at 100%; otherwise it doesn't. I estimate that I have lost about 0.6 days of crunching overall. Any idea why? edit: These machines used to run 64 MCM, which requires a higher spec, without any problems. I have also tried reducing the checkpoint-to-disk interval from 10 seconds to 8 seconds, which didn't help. [Edit 1 times, last edit by anhhai at Mar 26, 2016 5:39:17 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
If you're experimenting with whether excessive checkpoint writing is causing some bottleneck or holdup, shouldn't you be increasing the "Tasks checkpoint to disk at most every" setting, say to 300 seconds?
|
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
10 seconds... 8 seconds? As tonyh205 notes, WtD has to be up up up, particularly for HST1. If the science app followed the setting, you'd be hammering the storage: 8 seconds / 64 cores works out to 1 checkpoint every 0.125 seconds. Note too that the System Requirements page indicates output files of up to 10MB... I haven't looked, but some reported seeing 6MB being uploaded.
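As a rough illustration of the arithmetic above: if every task really did checkpoint once per interval, and the writes were spread evenly, the average gap between writes would just be the interval divided by the number of running tasks. The function name below is made up for the example, and the even-spread assumption is a simplification, not something the project documents.

```python
def mean_seconds_between_checkpoints(interval_s: float, tasks: int) -> float:
    """Average gap between checkpoint writes across all tasks.

    Assumption (hypothetical, for illustration only): each task writes
    exactly once per interval and the writes are evenly spread in time.
    """
    if interval_s <= 0 or tasks <= 0:
        raise ValueError("interval_s and tasks must be positive")
    return interval_s / tasks


if __name__ == "__main__":
    # 8 s and 10 s are the settings mentioned in this thread; 300 s is the
    # value suggested earlier as a test. 64 is the thread count reported.
    for interval in (8, 10, 300):
        gap = mean_seconds_between_checkpoints(interval, 64)
        print(f"{interval:>3} s interval, 64 tasks: one write every {gap:.3f} s")
```

With the 8 second setting this reproduces the 0.125 second figure; at 300 seconds the same 64 tasks would average one write every few seconds instead.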
2 gold sovereigns |
||
|
|
anhhai
Veteran Cruncher Joined: Mar 22, 2005 Post Count: 839 Status: Offline Project Badges:
|
Tony, I just tried increasing the checkpoint time and that didn't help either.
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Some Spock logic: Why read the settings time and again? The app does not; it reads them only once, at startup/restart of a task, so if you haven't restarted your client... :O)
|
||
|
|
anhhai
Veteran Cruncher Joined: Mar 22, 2005 Post Count: 839 Status: Offline Project Badges:
|
:)
I did restart it, but it didn't go to 100%. It went to 96% and stayed there until I paused a few HSTB to run UGM. |
||
|
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline Project Badges:
|
":) I did restart it, but it didn't go to 100%. It went to 96% and stayed there until I paused a few HSTB to run UGM"
I have the same issue, the only difference being that I can't get anywhere near 100% running HST1 exclusively. Nothing I've tried so far has fixed it.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
||
|
|
anhhai
Veteran Cruncher Joined: Mar 22, 2005 Post Count: 839 Status: Offline Project Badges:
|
Nanoprobe,
Sorry, guess I wasn't clear. It got to 96% because I was only running 34-35 HSTB WUs; the rest were UGM. It was 96% before I restarted it and was still 96% right after I restarted it, which suggests that the checkpointing isn't the issue. edit: Sorry, English isn't my first language. [Edit 1 times, last edit by anhhai at Mar 26, 2016 6:39:37 PM] |
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
":) I did restart it, but it didn't go to 100%. It went to 96% and stayed there until I paused a few HSTB to run UGM"
A long long time ago in a far far away land, some testing was done which found that above 15 minutes or so there was no efficiency gain to be had, so 5 minutes on 64 virtual cores with lots of data is still asking much. In the old days the maximum setting was 999 seconds [you can knock yourself out these days and set it to 99999 in the client, or at least you could at one point]. If you never have to reboot that system [Linux with KSplice never has to], who worries about losing 15 minutes of progress once in a blue moon? Oh, yes, did you say Windows or Linux? If dual boot, try the other platform. |
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
BTW, if you were already at 96% after 10 hours of running, any gain from delaying [really, skipping] checkpoints would take quite some time to become visible.
|
||
|
|
|