World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: CEP2 beta for windows - Version 6.25 |
Thread Status: Active Total posts in this thread: 311
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
As far as the bandwidth limits go, we are likely going to be adopting the 6.10 client shortly. It includes a feature that lets you set your max data transferred. We will modify the server so that if you are using a 6.10 client and have set that field then we will ignore the bandwidth check. This will make sure that people are protected from excess data transfers.

To translate this into how it is shown in the 6.10 preferences (not yet integrated into the WCG device profiles):
- Go to advanced preferences, network tab.
- Enter a value for Max Bandwidth use, which for my duo I've set to 400MB (up+dw).
- Enter the period over which the limit should work. I've set 1 day.
- Specifically set a limit on the upload BW, e.g. 128kB (which for CEP2 then limits concurrent upload speed to 128 / X, where X is the number of files allowed to upload concurrently, default 2, i.e. 64kB each if 2 of the _4 result files upload).

Limiting upload speed leaves room for improved download speed as measured by BOINC. There is a cross-effect where download is impacted by upload throughput.

Note that this is a per-device setting. Anyone who has multiple devices and actually has overall ISP period limits will need to calculate how much the restriction per device needs to be, sized to the number of cores. Of course if you have no limits, set the value as high as you like. At any rate, set any value and the WCG server will be content to ignore the bwdown speed value, for WCG!

See FAQ of July: http://www.worldcommunitygrid.org/forums/wcg/...ead,29406_offset,0#287348 which will be updated shortly with a new local prefs screenshot and the periodic bandwidth budget, not yet available in 6.10.17, but is in 6.10.58
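The arithmetic in the post above can be sketched in a few lines of Python. This is only an illustration: the function names, the device names and the 1200MB example cap are made up, and splitting the cap in proportion to core count is just one reading of "sized to the number of cores".

```python
def per_file_upload_rate(upload_limit_kb, concurrent_uploads=2):
    """Share of the upload limit each file gets when several upload at once."""
    return upload_limit_kb / concurrent_uploads


def split_budget_by_cores(total_budget_mb, cores_per_device):
    """Split an overall ISP budget across devices in proportion to core count.

    cores_per_device maps a (hypothetical) device name to its number of cores.
    """
    total_cores = sum(cores_per_device.values())
    return {
        name: total_budget_mb * cores / total_cores
        for name, cores in cores_per_device.items()
    }


# 128kB upload limit shared by 2 concurrent uploads, as in the post.
rate = per_file_upload_rate(128, 2)

# Hypothetical 1200MB per-period cap shared by a duo and a quad.
budgets = split_budget_by_cores(1200, {"duo": 2, "quad": 4})
```

Each per-device result is the value that would go into that device's Max Bandwidth use field for the chosen period.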
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All! |
||
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges: |
Back from travel, I just checked my Beta status and I see that my Saturn device picked up a few of them, 23 in total.
Unfortunately here is the present status:

10 Error, 0 CPU time, 0.0/0.0
1 Error, 3.79 CPU time, 113.4/0.0
1 Error, 0.10 CPU time, 2.8/0.0
2 Server Aborted, 0 CPU time, 0.0/0.0
6 In Progress
3 Valid

Saturn is W7 64 bit, 6GB RAM, I7 980X 4Ghz, HT on.

Here under the two logs of the typical error with 0 CPU time and with effective CPU time:

Result Log
Result Name: BETA_ E200365_ 738_ A.24.C19H12N2OS2.113.4.set1d06_ 1--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[18:00:45] Number of jobs = 16
[18:00:45] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[18:10:41] Number of jobs = 16
[18:10:41] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[18:20:45] Number of jobs = 16
[18:20:45] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
.......
.......
.......
etc. etc. etc. (many many pages long)
.......
.......
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[18:47:20] Number of jobs = 16
[18:47:20] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[19:07:21] Number of jobs = 16
[19:07:21] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
</stderr_txt>
]]>

And here the log of the one that errored after more than 3 hrs CPU time:

Result Log
Result Name: BETA_ E200367_ 985_ A.24.C19H12N2S3.194.1.set1d06_ 0--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[16:57:23] Number of jobs = 16
[16:57:23] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[17:37:18] Number of jobs = 16
[17:37:18] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[18:17:18] Number of jobs = 16
[18:17:18] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[18:57:20] Number of jobs = 16
[18:57:20] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
...........
...........
...........
INFO: No state to restore. Start from the beginning.
[18:09:27] Number of jobs = 16
[18:09:27] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[18:19:27] Number of jobs = 16
[18:19:27] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[18:29:27] Number of jobs = 16
[18:29:27] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[18:39:29] Number of jobs = 16
[18:39:29] Starting job 0,CPU time has been restored to 0.000000.
[18:41:11] Finished Job #0
[18:41:11] Starting job 1,CPU time has been restored to 96.564619.
[18:46:26] Finished Job #1
[18:46:26] Starting job 2,CPU time has been restored to 364.340335.
[20:24:12] Finished Job #2
[20:24:12] Starting job 3,CPU time has been restored to 5708.014190.
[20:29:46] Finished Job #3
[20:29:46] Starting job 4,CPU time has been restored to 6006.007300.
[20:33:10] Finished Job #4
[20:33:10] Starting job 5,CPU time has been restored to 6207.934994.
[20:36:58] Finished Job #5
[20:36:58] Starting job 6,CPU time has been restored to 6422.545570.
[20:40:22] Finished Job #6
[20:40:22] Starting job 7,CPU time has been restored to 6623.989661.
[20:45:05] Finished Job #7
[20:45:05] Starting job 8,CPU time has been restored to 6892.638983.
[20:48:23] Finished Job #8
[20:48:23] Starting job 9,CPU time has been restored to 7088.357838.
[20:52:00] Finished Job #9
[20:52:00] Starting job 10,CPU time has been restored to 7303.046414.
[20:59:47] Finished Job #10
[20:59:47] Starting job 11,CPU time has been restored to 7755.340113.
[21:04:07] Finished Job #11
[21:04:07] Starting job 12,CPU time has been restored to 8002.820100.
[21:38:32] Finished Job #12
[21:38:32] Starting job 13,CPU time has been restored to 10035.528730.
[22:38:56] Finished Job #13
[22:38:56] Starting job 14,CPU time has been restored to 13655.921937.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[23:29:10] Number of jobs = 16
[23:29:10] Starting job 14,CPU time has been restored to 13655.921937.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[23:39:11] Number of jobs = 16
[23:39:11] Starting job 14,CPU time has been restored to 13655.921937.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[23:49:11] Number of jobs = 16
[23:49:11] Starting job 14,CPU time has been restored to 13655.921937.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
..........
..........
...........
[15:20:18] Number of jobs = 16
[15:20:18] Starting job 14,CPU time has been restored to 13655.921937.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[15:30:19] Number of jobs = 16
[15:30:19] Starting job 14,CPU time has been restored to 13655.921937.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[15:40:20] Number of jobs = 16
[15:40:20] Starting job 14,CPU time has been restored to 13655.921937.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[15:50:20] Number of jobs = 16
[15:50:21] Starting job 14,CPU time has been restored to 13655.921937.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[16:00:21] Number of jobs = 16
[16:00:21] Starting job 14,CPU time has been restored to 13655.921937.
No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
</stderr_txt>
]]>

[Edit 2 times, last edit by Hypernova at Sep 22, 2010 8:29:49 AM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"I hope this is not another cruncher-unfriendly project."
I don't know! Nothing wrong on my side. Greetings
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"hmmmm, maybe I did something stupid. I will try it again later. Thx X-Files"
Same here! So, enough criticism about the bandwidth. Greetings
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
CEP2 is a "Read the My Projects preferences page with explicit invitation to consider the system requirement, manually opt-in" research project!
Cruncher, be aware before selecting; as with any new application, it might need to have its exceptions set in the security software. "No heartbeat" does mean that the system was very busy, probably mighty busy with IO running 6 concurrent. Maybe the rule could be n / 3 cores, rounded up for fractions, so a duo still gets 1 and a regular quad 2, but a hex with HT switched on gets 4 instead of 6. Certainly running 4 concurrent on my Linux box did not kill anything; only valids were returned, but at lower efficiency.
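The n / 3 rule floated above is easy to check with a small sketch. The function name is hypothetical and the rule itself is only a suggestion from this post, not a confirmed WCG setting; the hex figure assumes the count is taken over the 12 logical cores that HT exposes.

```python
import math


def max_concurrent_cep2(logical_cores):
    """Suggested cap on concurrent CEP2 tasks: cores / 3, rounded up."""
    return math.ceil(logical_cores / 3)


# Duo, regular quad, and a hex with HT on (12 logical cores).
caps = {cores: max_concurrent_cep2(cores) for cores in (2, 4, 12)}
```

With these inputs the duo gets 1, the quad gets 2 and the hex with HT gets 4, matching the figures in the post.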
[Edit 1 times, last edit by Sekerob at Sep 22, 2010 9:19:28 AM]
||
|
gb077492
Advanced Cruncher Joined: Dec 24, 2004 Post Count: 96 Status: Offline |
I'm definitely not convinced about this download bandwidth measurement.
I have a four-way machine that was running a couple of betas. By the time one of them finished, the only additional downloads it had done were a single DDT2 job and the master file. Yet I still got a message to say the bandwidth was too low. I downloaded an HPF2 WU which "fixed" it, and then it could get a couple of new beta WUs. This machine is on the IBM internal network, which is no slouch!
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"I'm definitely not convinced about this download bandwidth measurement."
I agree with you.
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
gb077492, it's the small(er) files that cause the download speed rating to deteriorate rapidly, particularly a series of them. HPF2 downloads take long enough (1 MB or more) to lift the mean measured speed up to its near-true value, which is why knreed is implementing this workaround.
Some projects outside WCG seem to have "real" atrociously poor speeds on the DL, so when these clients come to WCG they get affected too. Of course WCG would not want to shut out new arrivals up front.
||
|
gb077492
Advanced Cruncher Joined: Dec 24, 2004 Post Count: 96 Status: Offline |
"HPF2 tasks take long enough (1 MB or more) to lift the mean measured speed up to its near true value"
I deliberately chose the project with the largest download file size on that assumption.

"it's the small(er) files that cause the download speed rating to deteriorate rapidly"
So only measure the speed for files larger than some threshold and ignore the smaller ones as noise. Wouldn't that help?

Mike
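The effect described above, and the threshold idea, can be illustrated with a toy model. Everything here is an assumption for illustration: the thread does not say how BOINC actually averages download speed, so this uses a plain unweighted mean of per-file speeds, a made-up fixed per-transfer overhead of 0.5 s, and a made-up 1 MB/s link.

```python
def simulate_transfer(size_bytes, link_bytes_per_sec=1_000_000, overhead_sec=0.5):
    """Return (size, elapsed seconds) for a transfer with fixed setup overhead."""
    return (size_bytes, overhead_sec + size_bytes / link_bytes_per_sec)


def mean_speed(transfers, min_bytes=0):
    """Unweighted mean of per-file speeds, ignoring files smaller than min_bytes."""
    speeds = [size / secs for size, secs in transfers if size >= min_bytes and secs > 0]
    return sum(speeds) / len(speeds) if speeds else None


# Five 10 kB files and one 2 MB file over the same link.
transfers = [simulate_transfer(10_000) for _ in range(5)]
transfers.append(simulate_transfer(2_000_000))

unfiltered = mean_speed(transfers)                    # dragged down by small files
filtered = mean_speed(transfers, min_bytes=1_000_000) # large files only
```

In this model the small files each measure far below the link speed because the fixed overhead dominates, so counting only transfers above a size threshold gives a figure much closer to the real link.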
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Yes, but how to tell BOINC? WCG is not measuring, only the client, which then tells WCG! A developer issue for Berkeley I'd say. As alluded to, knreed is setting up a trick for client 6.10. Anyone who specifies a bandwidth budget (daily, bi-daily, weekly, monthly) will get tasks needing big bandwidth, so those coming in from projects outside WCG that have really low bandwidth (the projects themselves) or only small files will be able to contribute too.
As for the C4CW comment somewhere on small files: there are no files exchanged once one has a C4CW task on the client of a target... nothing to measure. C4CW has a seed/departure point table and the server tells which seed to pick to get a new task popping up in the task list. To summarize: the BETA test is again used to learn not only about the task but also about its grid load behavior, with the techs working to facilitate least impairment without throwing the child out with the bathwater (no gauges).
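The server-side trick described in this thread (a 6.10 client with a bandwidth budget set skips the download speed check) might look roughly like the sketch below. All names and the threshold parameter are hypothetical; the real WCG scheduler code and its cut-off value are not shown anywhere in the thread.

```python
def parse_version(text):
    """Turn a client version string such as '6.10.58' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def may_receive_big_tasks(client_version, bandwidth_budget_mb,
                          measured_down_speed, min_down_speed):
    """Decide whether a device may get tasks needing big bandwidth.

    A 6.10 or later client that has set a bandwidth budget bypasses the
    speed check; everyone else must meet the (hypothetical) minimum speed.
    """
    has_budget = bool(bandwidth_budget_mb)
    if parse_version(client_version)[:2] >= (6, 10) and has_budget:
        return True
    return measured_down_speed >= min_down_speed


# A 6.10.58 client with a 400MB budget but a poor measured speed still qualifies.
ok = may_receive_big_tasks("6.10.58", 400, measured_down_speed=5, min_down_speed=100)
```

Comparing versions as integer tuples avoids the string-comparison trap where "6.2" would sort after "6.10".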
||
|
|