World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: Discovering Dengue Drugs - Together Phase 2 BETA |
Thread Status: Locked Total posts in this thread: 192
sk..
Master Cruncher Joined: Mar 22, 2007 Post Count: 2324 Status: Offline Project Badges: |
(Quote): "Type A require LARGE memory (1.75MB) and have large result files. Thus they have the bandwidth limit set."
Try GB. Perhaps next time you could use a more reasonable bandwidth limit. Some of us are being nobbled because we have more than one system: four hungry quad-core systems, but only one 10Mb internet connection. |
||
|
TimAndHedy
Senior Cruncher Joined: Jan 27, 2009 Post Count: 267 Status: Offline Project Badges: |
I ended up with one of the big ones succeeding, so it appears they do not all fail.
BETA_erag_a172_ps0000_2 |
||
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline Project Badges: |
My one ps0000 WU that survived the error 29 deathtrap crashed with the Access Violation Error after 20.5h, probably 80-90% done.
Fellow crunchers who've "lost" lots of CPU hours on these tests, we haven't really wasted the time, as this had to happen to find the bugs in the new project. Our time has been well spent! And the lost points? The crunching is much more important than the points, and I expect the techs will eventually give us credit anyway. Right now, I'm sure they have more important things to do though.

I agree that the bandwidth limits should be relaxed, but the underlying difficulty is to set limits that depend on the number of results that will need to be uploaded by each member. That in turn depends on the number and speed of his active devices, and the project mix in each. There is also the factor of the member's willingness to have WCG use a big fraction of his internet bandwidth. I suggest that at least there be a manual override box for DDDT2 (Type A) on the project selection page(s).
[Edit 1 times, last edit by Rickjb at Oct 8, 2009 4:54:00 AM] |
||
|
X-Files 27
Senior Cruncher Canada Joined: May 21, 2007 Post Count: 391 Status: Offline Project Badges: |
I agree with relaxing the bandwidth limits, as long as it's not 56k dial-up. Tighten up the hardware spec though - low DCF should be best. Type A eats a lot of memory, so if the computer is not a dedicated cruncher, trouble awaits.
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
rDCF has nothing to do with bandwidth. It's a correction factor between actual expected computation duration and the original estimate of the fpops in the task header. That value on my quad bounces up and down between 3 and 0.7 when running periods of only HCMD2 [the very smallest system resource project] and even as observed for HFCC yesterday, the range in the last 3 days was between 2.25 and 11 hours on a duo dedicated to that project. All valid, all getting credit close to claimed.
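A rough sketch of the relationship described above, assuming the usual form of the estimate: the fpops figure from the task header divided by the host's benchmarked speed, then scaled by the correction factor. The function names, the 1 GFLOPS host speed and the 3.6e13 fpops figure are illustrative assumptions, not WCG values, and the one-step ratio is a simplification rather than the client's actual update rule.

```python
def estimated_runtime_seconds(fpops_est, host_flops, dcf=1.0):
    """Estimated wall time: header fpops estimate / host speed, scaled by DCF."""
    return fpops_est / host_flops * dcf


def observed_correction_factor(actual_seconds, fpops_est, host_flops):
    """Ratio of actual runtime to the uncorrected estimate (hypothetical helper)."""
    uncorrected = fpops_est / host_flops
    return actual_seconds / uncorrected


# Illustrative numbers only: a task whose header claims 3.6e13 fpops,
# on a host benchmarked at 1e9 flops, gives a 10-hour uncorrected estimate.
uncorrected = estimated_runtime_seconds(3.6e13, 1e9)

# If the task actually took 20 hours, the observed factor is 2.0.
factor = observed_correction_factor(20 * 3600, 3.6e13, 1e9)
print(uncorrected / 3600, factor)
```

Nothing in this calculation involves upload or download speed, which is the point being made: the factor only relates computation time to the original estimate.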
The beta of course is a way to find out what actual bandwidths were encountered on a global scale... it should, as commented, not be that the guy next to the server center gets them and no one in other parts of the world because they happen to sit 50 router hops away... the measurement will be as fast as the slowest link.

But, we're exploding a topic. Take the ratio of A to B to C work units. 1 A generates 2 B which generates 5000 C units, in quorum 2. Are the constraints for A only? Can live with that. Facts of life... at launch there will be only A type ;>)
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All! |
||
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline Project Badges: |
@X-files 27: I think the Type As don't need so much physical memory (220MB ea) as virtual memory (page/swap file) (1.3GB ea).
And I think the virtual memory is not accessed very often. While my 2nd WU was running, there was no noticeable increase in my disc activity. Total page faults, which I assume is a proxy for accesses to system cache/page file, got to 60000 early on, and when I last looked, at about 50% done, it had crept up to only about 65k. For the first 11min, until my 1st WU got error 29, my quad was running 2 Type As plus 2 other WUs (HCC and/or FAAH) in 2GB physical memory, which is less than the 2 x 1.3GB VM needed by the 2 Type As alone.

I have limited knowledge of current virtual memory/swapping characteristics though. For example, I recently discovered that 32- and 64-bit Windows XP may not properly handle swapping between WCG tasks when total usage goes beyond physical memory but stays within the pagefile. Suspended tasks can get killed with exit code 0. BOINC silently restarts these from checkpoints when need be, so the problem is usually hidden. |
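The memory arithmetic in the post above can be checked with a small sketch. It uses only the figures quoted there (about 220MB physical and 1.3GB virtual per Type A, 2GB physical RAM) and assumes 1.3GB is roughly 1300MB. The function and field names are hypothetical, and the two other WUs are left out because their footprint is not given.

```python
def memory_budget(physical_mb, tasks):
    """Sum resident and virtual memory for a set of tasks and compare to RAM."""
    resident = sum(t["resident_mb"] for t in tasks)
    virtual = sum(t["virtual_mb"] for t in tasks)
    return {
        "resident_mb": resident,
        "virtual_mb": virtual,
        # Working sets fit in RAM if their sum does not exceed physical memory.
        "resident_fits": resident <= physical_mb,
        # Virtual allocation larger than RAM means the pagefile must back the rest.
        "virtual_exceeds_physical": virtual > physical_mb,
    }


# Figures from the post; 1.3GB approximated as 1300MB (assumption).
type_a = {"resident_mb": 220, "virtual_mb": 1300}
budget = memory_budget(2048, [type_a, type_a])
print(budget)
```

With these numbers the two Type As need about 2600MB of virtual memory against 2048MB of RAM, yet only about 440MB resident, which is consistent with the low disc activity reported.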
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
The Page Faulting that slows down the computations has very little to do with the swap file exchange. It's a CPU/RAM bottleneck issue that, as has been proven, can be coded around if the exact cause is found.

OT, but can you expand on that last paragraph? Are you talking about checkpoint resumes on suspended tasks? They will if LAIM is not on.
||
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline Project Badges: |
@Sek: (Quote):"OT, but can you expand on that last paragraph" - OT?
Scenario: Quad with only 1GB memory, waiting for return of 2x1GB sticks from RMA. LAIM is ON. Running FAAH + HCC.

I like to keep an even mix of the 2 running, so I micromanage when I'm around, which means suspending & resuming tasks. WTR tasks have absolute priority over RTS, so I grab next-to-run WUs by starting them, running them for a short while, then resuming previous tasks. This goes OK while starting a new WU, and while it allocates itself memory about 2min into the run. (At this point, BOINC Progress moves off 0%.) However, when I then suspend the new task to resume a previous one, there is the likelihood that all suspended tasks will be spat out, even though Task Mgr Commit Charge is well below Limit. The tasks will disappear from Task Manager, and BOINC Messages will suggest resetting the project, but they will stay unchanged in the BOINC Tasks list. When they resume, they will start from 0 CPU time in Task Manager, but from the checkpointed time (I assume) in BOINC. Only tasks that were suspended very near the start restart from 0% in BOINC.

Workaround: Suspend the next-to-run WUs after only a few seconds, before they start grabbing lots of memory. They stay in their original spot in the Tasks list, where I may not notice later that they are WTR, though. This workaround may not work for DDDT2-Type A, because these go to 170MB at or very close to startup.
[Edit 2 times, last edit by Rickjb at Oct 8, 2009 12:15:16 PM] |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Yes, well, what could I say. Don't do that if you know it breaks tasks...
There's been an item on that wishlist for a very, very long time: allow setting a limit on how many copies of the same app run concurrently. One reason is temperature (yes, 4 Primegrid tasks raise my system temps by 6C), another is memory. Doubt it will ever come. There's some interesting functionality under the hood in 6.10, so by the time DDDT-2 launches I'll be trying to break that client, playing with the memory options and swap file limitations. At my own risk of course.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Is there any word on when/if the errored WUs will be resent? I see some of my results have other WUs in the "waiting to be sent" status.
|
||
|
|