Thread: Questions about participation drive for CEP2
Total posts in this thread: 152 (viewed 497,174 times)
Bearcat
Master Cruncher
USA
Joined: Jan 6, 2007
Post Count: 2803
Re: Questions about participation drive for CEP2

Both the quad core and the 8 core (no HT) are running Ubuntu 10.10. The quad has 8GB RAM while the 8 core has 4GB. My 8-core/16-thread machine went back to Win7; it has 12GB of RAM in triple channel. The others are running in dual channel. I thought about filling the rest of the slots for quad channel, but it's not worth investing in the older technology. I'm trying to sell both of my Harpertown PCs to build a Sandy Bridge PC.
----------------------------------------
Crunching for humanity since 2007!

[Jan 18, 2011 11:30:42 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Re: Questions about participation drive for CEP2

Ingleside,

If this works and provides improved efficiency, great, but we tried something similar on Linux to link files across slots, and it failed, corrupting results. knreed commented that it contravened BOINC's slot-isolation design.

Hmm, if I'm not misremembering, the Linux approach tried to be "clever" by linking any non-zero files that were identical under /slots/, and possibly across the whole BOINC data directory. But it's possible that one or more temporary files, apart from /qcaux, are identical at task start; if those got linked, any divergence later in the run would lead to corruption...

Looking forward to hands-on reports. Anything that could also reduce the "used" and "reported" BOINC data-directory storage might help improve the RAMdisk approach as well.

Using linkd won't decrease disk usage.
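
For illustration only, a minimal sketch of the cross-slot linking idea described above, and why it breaks. This is a reconstruction under assumptions, not the script that was used; the data path and hashing scheme are mine.

# CAUTION: sketch of the failed approach discussed above -- shown to make
# the corruption mechanism concrete, not as something to run on a live client.
import hashlib
import os

SLOTS_DIR = "/var/lib/boinc-client/slots"  # assumed BOINC slots location

def digest(path, chunk=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def link_identical_files(root):
    """Hard-link non-zero files under `root` whose contents currently match."""
    seen = {}  # (size, sha256) -> first path seen with that content
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size == 0:
                continue  # only non-zero files, as described above
            key = (size, digest(path))
            if key in seen:
                # The dangerous step: files that are equal only *at task
                # start* (e.g. temporary files) now share one inode, so a
                # later in-place write by one task can silently alter the
                # other task's data.
                os.remove(path)
                os.link(seen[key], path)
            else:
                seen[key] = path

if __name__ == "__main__":
    link_identical_files(SLOTS_DIR)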
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Jan 18, 2011 11:52:17 PM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Re: Questions about participation drive for CEP2

Dear Hypernova,
We are sorry to hear about the problems you experienced. We had a few server glitches in the beginning which we fixed back then, so your upload problem may already be ironed out by now. If not, please try our current beta test, where we replace HTTP with HTTPS.
It would be helpful for us and the IBM team if you could give specifics on your 'simply error out' (in case it still persists) – maybe there is something we can do about it, or at least learn from it.

Best wishes,

Your Harvard CEP team


I should have a new rig (Luna) up and running this coming weekend. I will experiment again with CEP2 and let you know what happens.
----------------------------------------

[Jan 20, 2011 11:22:54 AM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Re: Questions about participation drive for CEP2

Luna has started and is in operation now, crunching CEP2 exclusively.
It has a 980X running 12 threads at 4.2 GHz, with 4GB of DDR3 at 1900 MHz.
I already have 15 valid WUs and another 4 in PV (pending validation). The WUs are all of the 635 version, with runtimes varying between 3.8 and 5 hours per WU; one lasted 6 hours. Too early for big stats, but it looks promising: no errors so far. :)
I got some weird messages, but they have not interrupted the crunching or generated errors, so I ignore them.
I will keep this machine running CEP2 exclusively and will post a new stats update in about 10 days, once the flow with the WCG servers has stabilized.
----------------------------------------

[Jan 22, 2011 7:04:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Questions about participation drive for CEP2

Something changed. Whatever "drives" this, the daily number of tasks has stayed fairly level over the past weeks, but the average runtime has dropped by about 1.25 hours per task, and it won't be Luna doing this ;>). The shorter completion times translate to fewer runtime years contributed overall, down from 15 to 13 per day. A fraction of that may be the DDDT2 rains acting as a diversion, but I can't see that as the main cause.
Date        Hrs/Task  Tasks/Day
JAN.01-11 8.88457 14,589
JAN.02-11 8.88461 15,125
JAN.03-11 8.96143 15,384
JAN.04-11 8.98109 14,581
JAN.05-11 9.00964 13,954
JAN.06-11 8.98414 13,472
JAN.07-11 8.90282 13,635
JAN.08-11 8.90202 13,230
JAN.09-11 8.93609 13,926
JAN.10-11 8.86320 14,571
JAN.11-11 8.84945 14,539
JAN.12-11 8.80440 14,308
JAN.13-11 8.79029 14,644
JAN.14-11 8.86567 14,748
JAN.15-11 8.71518 15,167
JAN.16-11 8.68457 15,103
JAN.17-11 8.53599 15,380
JAN.18-11 8.15219 15,313
JAN.19-11 7.91847 15,641
JAN.20-11 7.77700 14,671
JAN.21-11 7.61555 15,022
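
For anyone checking the arithmetic: runtime-years per day is just hours/task times tasks/day, divided by 8,760 hours per year. A quick sketch against the first and last rows above:

# Runtime-years contributed per day = (hours/task * tasks/day) / 8760 hours/year.
rows = [("JAN.01-11", 8.88457, 14589),   # first row of the table above
        ("JAN.21-11", 7.61555, 15022)]   # last row
for date, hrs_per_task, tasks_per_day in rows:
    print(f"{date}: {hrs_per_task * tasks_per_day / 8760:.1f} runtime-years/day")
# Prints ~14.8 for Jan 1 vs ~13.1 for Jan 21 -- the drop from ~15 to ~13 per day.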

[Jan 22, 2011 7:25:02 PM]
anhhai
Veteran Cruncher
Joined: Mar 22, 2005
Post Count: 839
Re: Questions about participation drive for CEP2

Sekerob, the reason is simple. cleanenergy recently (within the last 2 weeks) said that their current batch of WUs are shorter-running jobs. Secondly, shorter jobs plus the limit on how many CEP2 WUs can be on each system means less runtime, period.

Personally, on my 16-thread system I have it set to a 6-CEP2-WU limit and my cache is 1.2 days. So no matter how short or long the CEP2 WUs are, I will only crunch 6 per 1.2 days on that system. I chose 6 because most WUs were taking 10-12 hrs for me; now I notice a lot of WUs finishing in 6-8 hrs. (A quick illustration follows.)
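
Taking those numbers at face value, a sketch of why shorter WUs under a fixed per-host limit mean less CEP2 runtime per day:

# With a fixed limit of 6 CEP2 WUs per 1.2-day cache, the number of WUs
# crunched per day is constant, so daily CEP2 runtime scales with WU length.
LIMIT_WUS, CACHE_DAYS = 6, 1.2
wus_per_day = LIMIT_WUS / CACHE_DAYS          # = 5 WUs/day, whatever their length
for label, hours_per_wu in [("old ~11 h WUs", 11), ("new ~7 h WUs", 7)]:
    print(f"{label}: {wus_per_day * hours_per_wu:.0f} CPU-hours of CEP2 per day")
# old ~11 h WUs: 55 CPU-hours/day vs new ~7 h WUs: 35 CPU-hours/day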
----------------------------------------

[Jan 22, 2011 7:45:17 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Reply to this Post  Reply with Quote 
Re: Questions about participation drive for CEP2

Got a link for that statement about "shorter"? Was it cleanenergy or one of the techs posting this notice? It thoroughly escaped me, but for going over the stats and seeing these shorter average runtimes...

And time for you to up the number to cache, since you read it ;>)
[Jan 22, 2011 8:01:53 PM]
anhhai
Veteran Cruncher
Joined: Mar 22, 2005
Post Count: 839
Re: Questions about participation drive for CEP2

Sekerob, apparently I lied/mistyped. It was actually a little over a month ago that cleanenergy stated this little fact. See the snippet below along with the link.
I guess I could increase the limit from 6 to 7. I never saw any noticeable difference in performance (wasted CPU time) when crunching 3 at a time vs. 1 or 2; my goal was simply to limit my machines to crunching a max of 3 at a time.


Disk and memory requirements also increase with system size. Our library currently contains about 10 million candidate molecules, which are for now sorted by size. They get sent to the grid in (more or less) continuously ascending order (the current max size is 499 of some complicated unit).
The next batch of 2.7 million candidates is made up of recently generated molecules with sizes between ~250 and 499. Hence, there will again be smaller ones, out of order, but they will go up to the same size as the currently biggest molecules. So there will be no big jump in size anytime soon. We will be able to make some more quantitative statements when we compile our internal statistics early next year.


http://www.worldcommunitygrid.org/forums/wcg/...ad,30547_offset,20#307597

Edit: Instead of increasing my CEP2 WU limit from 6 to 7, I just decreased my cache size a little bit.
----------------------------------------

----------------------------------------
[Edited 1 time, last edit by anhhai at Jan 22, 2011 8:57:11 PM]
[Jan 22, 2011 8:42:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Reply to this Post  Reply with Quote 
Re: Questions about participation drive for CEP2

Thanks, now I remember that bit... so it will creep up again, but with 2.7 million candidates (before quorum duplication?) at 15k units daily, that is a long way to go.

Increase your cached number by the factor by which the jobs now run shorter and you'll likely maintain the same "concurrency". You could probably plot that as a curve using 1 core and the cache size, then multiply by the number you want to run concurrently. In practice the scheduler seems tuned to priority-backfill any CEP2 shortfall on the next work-fetch calls.

Sample: cache of 1.5 days, runtime of 8 hours: 36 hours / 8 = 4.5, times 2 cores (half of a quad) makes 9 **. This has to be mixed with sciences that are not limited in supply. I have not tested this, but over time it could balance out. My 1.2-day cache worked with a setting of 5 to get a quasi 2-out-of-4 on the quad, but that was with longer-running work units. Then I had to cut it to 4, which also led to periods where no CEP2 ran at all, so I just manually kicked them up from the rear of the queue (though only when checking in a few times a day), which on completion hastened the backfill again.

** As chosen in the New [CEP2] Project Setting
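
That rule of thumb as a small sketch (the helper name is mine; the limit itself is whatever you pick in the project setting):

# Suggested per-host CEP2 limit = (cache length in WU runtimes) * cores to keep busy.
def suggested_cep2_limit(cache_days, runtime_hours, concurrent_cores):
    wus_per_core = cache_days * 24 / runtime_hours  # WUs one core works through per cache fill
    return round(wus_per_core * concurrent_cores)

print(suggested_cep2_limit(1.5, 8, 2))   # the sample above: 36 / 8 = 4.5, * 2 -> 9
print(suggested_cep2_limit(1.2, 11, 2))  # -> 5, matching the 1.2-day cache with longer WUs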

edit: added ** reference
----------------------------------------
[Edited 1 time, last edit by Former Member at Jan 23, 2011 1:46:47 AM]
[Jan 22, 2011 9:26:08 PM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Re: Questions about participation drive for CEP2

I use a 3-day cache as the standard for all machines and all projects.
----------------------------------------

[Jan 22, 2011 10:03:21 PM]