Thread Status: Active
Total posts in this thread: 28
Posts: 28   Pages: 3
This topic has been viewed 7424 times and has 27 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy Project - Phase 2 Beta April 13, 2016 [ Issues Thread ]

Luck of the draw still, I suspect. It looks as if your _1 and _2 wingmen exited in Job #0 and so agreed with each other, while your _0 exited in Job #3 and so was deemed invalid. Do the result logs concur with that guess? Probably nothing to do with the Windows version.
[Apr 14, 2016 8:35:12 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

The final result for that first workunit of mine that exited with RC = 0x1 in Job #0 is that my _1 turned Invalid, while _0 and _2 turned Valid after both exited in Job #3. Another luck-of-the-draw situation that would be disappointing in production.

BETA_E236441a_240_S.482.C54H18N4O2S6.SVBPLOQAQKCZBI-UHFFFAOYSA-N.12_s1_14_2 -- Microsoft Windows 10 Professional x64 Edition, (10.00.10586.00) 700 Valid 14/04/16 10:09:27 15/04/16 01:47:25 12.07 515.3 / 390.3
BETA_E236441a_240_S.482.C54H18N4O2S6.SVBPLOQAQKCZBI-UHFFFAOYSA-N.12_s1_14_1 -- Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) 700 Invalid 13/04/16 18:18:08 13/04/16 20:25:25 1.91 75.1 / 75.1
BETA_E236441a_240_S.482.C54H18N4O2S6.SVBPLOQAQKCZBI-UHFFFAOYSA-N.12_s1_14_0 -- Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) 700 Valid 13/04/16 18:17:33 14/04/16 10:09:17 9.37 265.4 / 390.3
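The "wingmen agree" reasoning in these two posts can be sketched as a toy quorum check. This is only an illustration under stated assumptions, not WCG's actual validator: it assumes each result is summarised solely by the job number in which it exited, and that two results stopping at the same job form a quorum of 2. The function `validate` is hypothetical.

```python
from collections import Counter


def validate(exit_jobs: dict[str, int], quorum: int = 2) -> dict[str, str]:
    """Toy validator: results 'agree' if they exited in the same job.

    The largest agreeing group wins once it reaches the quorum; every other
    result is marked Invalid. Without a quorum, everything stays Pending.
    """
    job, n = Counter(exit_jobs.values()).most_common(1)[0]
    if n < quorum:
        return {name: "Pending" for name in exit_jobs}
    return {name: "Valid" if j == job else "Invalid" for name, j in exit_jobs.items()}


# Workunit in the table above: _1 exited in Job #0, _0 and _2 in Job #3.
print(validate({"_0": 3, "_1": 0, "_2": 3}))
# → {'_0': 'Valid', '_1': 'Invalid', '_2': 'Valid'}
```

With the pattern from the first post (`{"_0": 3, "_1": 0, "_2": 0}`) the same function marks _0 Invalid instead: whichever pair happens to agree wins, whichever stopping point was "right".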
[Apr 15, 2016 6:47:16 AM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline

Can one of the techs explain the reasoning behind the tasks running for 8, 9 or even 10+ hours before the first checkpoint?
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[Apr 15, 2016 5:13:15 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline

S.482.C54 ... the higher the C number, the more complex the molecule [and this is a far from original 'reasoning'].
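That rule of thumb can be applied mechanically, since the molecular formula is embedded in the workunit name. A minimal sketch, assuming the formula is always the dot-separated field that starts with `C` followed by digits (inferred from the names quoted in this thread, not from any documented naming scheme); `formula_from_wu` is a hypothetical helper, and carbon count is only a rough proxy for run time.

```python
import re

# Assumption: the molecular formula is the dot-separated field that looks like
# C<digits> followed by element symbols with optional counts.
FORMULA_FIELD = re.compile(r"C\d+(?:[A-Z][a-z]?\d*)*")
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def formula_from_wu(name: str) -> dict[str, int]:
    """Hypothetical helper: element counts parsed from a CEP2 workunit name."""
    for field in name.split("."):
        if FORMULA_FIELD.fullmatch(field):
            counts: dict[str, int] = {}
            for symbol, num in ELEMENT.findall(field):
                counts[symbol] = counts.get(symbol, 0) + (int(num) if num else 1)
            return counts
    raise ValueError(f"no molecular formula found in {name!r}")


wus = [
    "BETA_E236441a_240_S.482.C54H18N4O2S6.SVBPLOQAQKCZBI-UHFFFAOYSA-N.12_s1_14_2",
    "BETA_E236441a_623_S.484.C52H20N8S6.BUYQGTISGQSVTE-UHFFFAOYSA-N.15_s1_14_0",
    "BETA_E236441a_784_S.492.C59H27N1O5S4.PURPVAQVVKDEJG-UHFFFAOYSA-N.8_s1_14_2",
]
# Rough proxy for run time: list the workunits with the most carbon atoms first.
for wu in sorted(wus, key=lambda n: formula_from_wu(n)["C"], reverse=True):
    print(formula_from_wu(wu)["C"], wu)
```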
[Apr 15, 2016 5:20:02 PM]
yoro42
Ace Cruncher
United States
Joined: Feb 19, 2011
Post Count: 8979
Status: Offline

After rebooting the laptop, the WU came back with 0% progress and 12 hrs to run. (Restart?)
To the best of my memory, this job was around 70% complete, with 2 hrs to finish, after 10 hrs of run time.

Currently it is the only job running on an 8-processor laptop. All other WUs are suspended, which has not really made it run faster.

I plan to run the following tests. Would a different test order be preferred?

1. Suspend & Resume
2. Suspend at checkpoint & Resume
3. End and Restart BOINC Manager
4. Reboot
5. Shutdown & Start

BETA_E236441a_784_S.492.C59H27N1O5S4.PURPVAQVVKDEJG-UHFFFAOYSA-N.8_s1_14_2 -- LesterGordon  In Progress  4/14/16 17:24:29  4/18/16 17:24:29  0.00 / 0.00  0.0 / 0.0

BoincTasks properties:
2016-04-15 10:19 AM

Computer: LesterGordon
Project World Community Grid

Name BETA_E236441a_784_S.492.C59H27N1O5S4.PURPVAQVVKDEJG-UHFFFAOYSA-N.8_s1_14_2

Application beta11 7.00
Workunit name BETA_E236441a_784_S.492.C59H27N1O5S4.PURPVAQVVKDEJG-UHFFFAOYSA-N.8_s1_14
State Running
Received 4/14/2016 10:24:08 AM
Report deadline 4/18/2016 10:24:29 AM
Estimated app speed 3.07 GFLOPs/sec
Estimated task size 86,121 GFLOPs
CPU time at last checkpoint 00:00:00
CPU time 12:45:39
Elapsed time 12:55:28
Estimated time remaining 02:40:00
Fraction done 70.894%
Virtual memory size 417.28 MB
Working set size 251.71 MB
Directory slots/8
Process ID 2383
----------------------------------------------
BoincTasks properties:
2016-04-15 09:28 AM

Computer: LesterGordon
Project World Community Grid

Name BETA_E236441a_784_S.492.C59H27N1O5S4.PURPVAQVVKDEJG-UHFFFAOYSA-N.8_s1_14_2

Application beta11 7.00
Workunit name BETA_E236441a_784_S.492.C59H27N1O5S4.PURPVAQVVKDEJG-UHFFFAOYSA-N.8_s1_14
State Running
Received 4/14/2016 10:24:08 AM
Report deadline 4/18/2016 10:24:29 AM
Estimated app speed 3.07 GFLOPs/sec
Estimated task size 86,121 GFLOPs
CPU time at last checkpoint 00:00:00
CPU time 11:54:44
Elapsed time 12:04:27
Estimated time remaining 02:42:09
Fraction done 66.179%
Virtual memory size 417.31 MB
Working set size 241.91 MB
Directory slots/8
Process ID 2383
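The two BoincTasks snapshots above, taken about 51 minutes apart, are enough for a rough projection of total run time. This is a naive linear extrapolation that assumes progress continues at the same rate; it is not how the BOINC client computes "Estimated time remaining" (which reported 02:40:00 here), so the two need not agree. The helpers `hms_to_s` and `s_to_hms` are illustrative.

```python
def hms_to_s(t: str) -> int:
    """'HH:MM:SS' -> seconds."""
    h, m, s = (int(x) for x in t.split(":"))
    return h * 3600 + m * 60 + s


def s_to_hms(sec: float) -> str:
    """Seconds -> 'HH:MM:SS', rounded to the nearest second."""
    total = round(sec)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


# (elapsed time, fraction done) from the two BoincTasks snapshots above.
earlier = (hms_to_s("12:04:27"), 0.66179)   # 09:28 AM snapshot
later = (hms_to_s("12:55:28"), 0.70894)     # 10:19 AM snapshot

# Progress per elapsed second between the snapshots, assumed constant from here on.
rate = (later[1] - earlier[1]) / (later[0] - earlier[0])
remaining = (1.0 - later[1]) / rate

print("projected remaining:", s_to_hms(remaining))
print("projected total elapsed:", s_to_hms(later[0] + remaining))
```

On these numbers the projection is roughly 5 h 15 min still to go, for a total elapsed time of just over 18 hours, if progress really does stay linear.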

MESSAGE LOG:

LesterGordon

587 4/15/2016 9:34:34 AM log flags: file_xfer, sched_ops, task, checkpoint_debug, task_debug
586 4/15/2016 9:34:34 AM Config: GUI RPCs allowed from:
585 4/15/2016 9:34:34 AM Config: GUI RPC allowed from any host
584 4/15/2016 9:34:34 AM Config: report completed tasks immediately
583 4/15/2016 9:34:34 AM Not using a proxy
582 4/15/2016 9:34:34 AM Re-reading cc_config.xml
581 4/15/2016 9:34:31 AM [time] dt 10.010824 w2 0.999988 on 0.999088; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
580 4/15/2016 9:34:21 AM [time] dt 10.011783 w2 0.999988 on 0.999088; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
579 4/15/2016 9:34:11 AM [time] dt 10.010726 w2 0.999988 on 0.999088; active 0.999954; gpu_active 0.999954;

bla bla bla to save space

conn -1.000000, cpu_and_net_avail 0.999954
495 4/15/2016 9:20:08 AM [time] dt 10.009042 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
494 4/15/2016 9:19:58 AM [time] dt 10.007967 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
493 4/15/2016 9:19:48 AM [time] dt 10.008128 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
492 4/15/2016 9:19:38 AM [time] dt 10.008095 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
491 4/15/2016 9:19:28 AM [time] dt 10.008527 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
490 4/15/2016 9:19:23 AM log flags: file_xfer, sched_ops, task, checkpoint_debug, task_debug, time_debug
489 4/15/2016 9:19:23 AM Config: GUI RPCs allowed from:
488 4/15/2016 9:19:23 AM Config: GUI RPC allowed from any host
487 4/15/2016 9:19:23 AM Config: report completed tasks immediately
486 4/15/2016 9:19:23 AM Not using a proxy
485 4/15/2016 9:19:23 AM Re-reading cc_config.xml
484 4/15/2016 9:19:18 AM [time] dt 10.038291 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
483 4/15/2016 9:19:08 AM [time] dt 10.045100 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
476 4/15/2016 9:17:58 AM [time] dt 10.033996 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
475 4/15/2016 9:17:49 AM log flags: file_xfer, sched_ops, task, checkpoint_debug, task_debug, time_debug
474 4/15/2016 9:17:49 AM Config: GUI RPCs allowed from:
473 4/15/2016 9:17:49 AM Config: GUI RPC allowed from any host
472 4/15/2016 9:17:49 AM Config: report completed tasks immediately
471 4/15/2016 9:17:49 AM Not using a proxy
470 4/15/2016 9:17:49 AM Re-reading cc_config.xml
469 4/15/2016 9:17:48 AM [time] dt 10.032776 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
468 4/15/2016 9:17:38 AM [time] dt 10.033031 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
458 4/15/2016 9:15:59 AM log flags: file_xfer, sched_ops, task, checkpoint_debug, task_debug, time_debug
457 4/15/2016 9:15:59 AM Config: GUI RPCs allowed from:
456 4/15/2016 9:15:59 AM Config: GUI RPC allowed from any host
455 4/15/2016 9:15:59 AM Config: report completed tasks immediately
454 4/15/2016 9:15:59 AM Not using a proxy
453 4/15/2016 9:15:59 AM Re-reading cc_config.xml
452 4/15/2016 9:15:58 AM [time] dt 10.024863 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
451 4/15/2016 9:15:47 AM [time] dt 10.023308 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; c
440 4/15/2016 9:13:57 AM [time] dt 10.035479 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
439 4/15/2016 9:13:57 AM log flags: file_xfer, sched_ops, task, checkpoint_debug, task_debug, time_debug
438 4/15/2016 9:13:57 AM Config: GUI RPCs allowed from:
437 4/15/2016 9:13:57 AM Config: GUI RPC allowed from any host
436 4/15/2016 9:13:57 AM Not using a proxy
435 4/15/2016 9:13:57 AM Re-reading cc_config.xml
434 4/15/2016 9:13:47 AM [time] dt 10.018375 w2 0.999988 on 0.999087; active 0.999954; gpu_active 0.999954; conn -1.000000, cpu_and_net_avail 0.999954
433 4/15/2016 9:13:41 AM log flags: file_xfer, sched_ops, task, checkpoint_debug, task_debug, time_debug
432 4/15/2016 9:13:41 AM Config: GUI RPCs allowed from:
431 4/15/2016 9:13:41 AM Config: GUI RPC allowed from any host
430 4/15/2016 9:13:41 AM Not using a proxy
429 4/15/2016 9:13:41 AM Re-reading cc_config.xml
428 4/15/2016 9:08:39 AM 14 connections rejected in last 10 minutes
427 4/15/2016 9:08:39 AM GUI RPC request from non-allowed address 2.0.231.209
426 World Community Grid 4/15/2016 9:05:39 AM Scheduler request completed
425 World Community Grid 4/15/2016 9:05:36 AM Not requesting tasks: some task is suspended via Manager
424
37 4/14/2016 9:42:35 PM GUI RPC request from non-allowed address 2.0.222.136
36 4/14/2016 9:32:30 PM 24 connections rejected in last 10 minutes
35 4/14/2016 9:32:30 PM GUI RPC request from non-allowed address 2.0.216.218
34 4/14/2016 9:22:55 PM 11565 integer MIPS (Dhrystone) per CPU
33 4/14/2016 9:22:55 PM 3048 floating point MIPS (Whetstone) per CPU
32 4/14/2016 9:22:55 PM Number of CPUs: 8
31 4/14/2016 9:22:55 PM Benchmark results:
30 4/14/2016 9:22:25 PM GUI RPC request from non-allowed address 2.0.211.39
29 4/14/2016 9:22:23 PM Suspending computation - CPU benchmarks in progress
28 4/14/2016 9:22:23 PM Running CPU benchmarks
27 4/14/2016 9:22:14 PM Not using a proxy
26 4/14/2016 9:22:14 PM gui_rpc_auth.cfg is empty - no GUI RPC password protection
25 4/14/2016 9:22:14 PM (to change preferences, visit a project web site or select Preferences in the Manager)
24 4/14/2016 9:22:14 PM max upload rate: 999000 bytes/sec
23 4/14/2016 9:22:14 PM max download rate: 999000 bytes/sec
22 4/14/2016 9:22:14 PM max disk usage: 10.00GB
21 4/14/2016 9:22:14 PM max memory usage when idle: 5499.50MB
20 4/14/2016 9:22:14 PM max memory usage when active: 5499.50MB
19 4/14/2016 9:22:14 PM Preferences:
18 4/14/2016 9:22:14 PM Reading preferences override file
17 4/14/2016 9:22:14 PM General prefs: using separate prefs for work
16 World Community Grid 4/14/2016 9:22:14 PM Computer location: work
15 World Community Grid 4/14/2016 9:22:14 PM General prefs: from World Community Grid (last modified 09-Apr-2016 10:45:55)
14 World Community Grid 4/14/2016 9:22:14 PM URL http://www.worldcommunitygrid.org/; Computer ID 3503029; resource share 100
13 4/14/2016 9:22:14 PM Config: GUI RPCs allowed from:
12 4/14/2016 9:22:14 PM Local time is UTC -7 hours
11 4/14/2016 9:22:14 PM Disk: 909.02 GB total, 855.16 GB free
10 4/14/2016 9:22:14 PM Memory: 7.67 GB physical, 7.87 GB virtual
9 4/14/2016 9:22:14 PM OS: Linux: 4.2.0-35-generic
8 4/14/2016 9:22:14 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni
7 4/14/2016 9:22:14 PM Processor: 8 GenuineIntel Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz [Family 6 Model 58 Stepping 9]
6 4/14/2016 9:22:14 PM Host name: LesterGordon
5 4/14/2016 9:22:14 PM No usable GPUs found
4 4/14/2016 9:22:14 PM Data directory: /var/lib/boinc-client
3 4/14/2016 9:22:14 PM Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
2 4/14/2016 9:22:14 PM log flags: file_xfer, sched_ops, task
1 4/14/2016 9:22:14 PM Starting BOINC client version 7.2.42 for x86_64-pc-linux-gnu
----------------------------------------

[Apr 15, 2016 6:34:17 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

How is it fair to get zero credit for a WU when my machine does perfectly good processing for 18 hours? The fact that the WU trashes itself before it gets to the end of the first step is NOT my fault; it is NOT an "error" on my side. WCG has the capability to fix this!

They should either:

  • not send the WU to a slow machine -- after all, the servers have an estimate of the capability of each machine;
  • modify the time-out to suit the capability of the machine -- again, WCG has an estimate of the capability of the machine, so it can adjust the time-out to ensure that a certain number of FLOPs are executed;
  • increase the time limit for everyone to eliminate the outliers.
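The second suggestion (scale the time-out to the machine) amounts to giving every host the same floating-point budget rather than the same wall-clock time. A minimal sketch of that policy, with the 18-hour figure taken from this thread and an assumed reference speed of 4 GFLOPS that is purely illustrative; `time_limit_hours` is hypothetical and not part of BOINC or WCG.

```python
BASE_LIMIT_H = 18.0       # fixed limit discussed in this thread
REFERENCE_GFLOPS = 4.0    # hypothetical "typical" host speed, for illustration only


def time_limit_hours(host_gflops: float) -> float:
    """Hypothetical per-host limit: same FLOP budget for every host, 18 h floor."""
    # FLOP budget (in GFLOP) a reference host would get through in the base limit.
    budget_gflop = BASE_LIMIT_H * 3600 * REFERENCE_GFLOPS
    # Wall-clock hours this host needs to execute the same budget.
    needed_h = budget_gflop / host_gflops / 3600
    return max(BASE_LIMIT_H, needed_h)


for speed in (2.0, 3.07, 4.0, 6.0):
    print(f"{speed:5.2f} GFLOPS -> {time_limit_hours(speed):5.1f} h")
```

Keeping 18 hours as a floor means fast hosts are unaffected, while a host at half the reference speed gets 36 hours to do the same work.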


I keep pointing out that the trend towards low-power-drain processors has meant that there is an increasingly wide range in the FLOPS capability of new machines, even without taking into account perfectly serviceable but old and slow machines like mine. I believe this issue really does need looking at.

And, just for the record, after the previous CEP2 beta I got no credit for around 20 WUs that failed in this way, so the beta "fix-up" process is broken too.

I do this for science, not for credit, but that doesn't stop it hurting when things like this don't get sorted out when they've been known for quite a while. And it might even drive others away from WCG altogether. I know that time spent fixing these things is time spent not really helping the science, but this is a voluntary system so grumbles can cost processing power if volunteers walk as a result.

Sorry for the rant, but I needed to get it off my chest.
[Apr 16, 2016 8:46:54 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline

After rebooting the laptop, the WU came back with 0% progress and 12 hrs to run. (Restart?)
To the best of my memory, this job was around 70% complete, with 2 hrs to finish, after 10 hrs of run time.

... snip ...

CPU time at last checkpoint 00:00:00
CPU time 12:45:39
Elapsed time 12:55:28
Estimated time remaining 02:40:00
Fraction done 70.894%

... snip ...

As you can see, the WU was still on its way towards the first checkpoint, so when you rebooted, the WU had to be restarted from the very beginning.
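The general pattern here is checkpoint/resume: a restarted task reloads its last saved state, and if nothing has been saved yet it starts from zero. A toy illustration (not the CEP2 application's actual mechanism; the JSON state file and `run_task` are hypothetical), where `interrupt_at` stands in for the reboot.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


def run_task(state_path: str, total_steps: int, checkpoint_every: int,
             interrupt_at: Optional[int] = None) -> Tuple[int, bool]:
    """Toy long-running task; returns (step it resumed from, finished?).

    interrupt_at simulates a reboot: the process stops at that step with no
    chance to save anything beyond the last checkpoint.
    """
    start = 0
    if os.path.exists(state_path):          # resume from the last checkpoint, if any
        with open(state_path) as f:
            start = json.load(f)["step"]
    for step in range(start, total_steps):
        if interrupt_at is not None and step == interrupt_at:
            return start, False             # work since the last checkpoint is lost
        # ... one unit of real computation would go here ...
        if (step + 1) % checkpoint_every == 0:
            with open(state_path, "w") as f:
                json.dump({"step": step + 1}, f)
    return start, True


for every in (100, 10):
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.json")
        first = run_task(path, 100, every, interrupt_at=70)   # "reboot" at step 70
        second = run_task(path, 100, every)                   # restart after reboot
        print(f"checkpoint every {every:>3}: {first} then {second}")
```

With a checkpoint only every 100 steps, the interruption at step 70 throws away all 70 steps and the restart begins at 0; with one every 10 steps, the restart resumes at step 70.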
[Apr 16, 2016 10:42:59 AM]
yoro42
Ace Cruncher
United States
Joined: Feb 19, 2011
Post Count: 8979
Status: Offline

As you can see, the WU was still on its way towards the first checkpoint, so when you rebooted, the WU had to be restarted from the very beginning.

I see it now.
Can one of the techs explain the reasoning behind the tasks running for 8, 9 or even 10+ hours before the first checkpoint?
S.482.C54 ... the higher the C number, the more complex the molecule [and this is a far from original 'reasoning'].

Thanks to all three of you!
----------------------------------------

[Apr 17, 2016 12:28:04 AM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline

Apis Tintinnambulator: I came here just now to say the same thing.
BETA_E236441a_623_S.484.C52H20N8S6.BUYQGTISGQSVTE-UHFFFAOYSA-N.15_s1_14_0
was sent to a slow machine (not mine) and was terminated at 18.00 h with 0 credit. The log file indicated that it was progressing normally.
I agree that that was unfair.

I expect that the time limit is there to catch jobs that have gone off the rails, to prevent them from looping indefinitely.
A way of extending the time limit for long jobs on slow machines needs to be implemented.
[Apr 19, 2016 9:02:04 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline

Maybe this science app is not able to assess whether it is progressing normally, except for those RC = conditions. If it were looping endlessly, then the max_fpops [which is something like 10-fold the standard] could perhaps be lowered to 2-3-fold and the 18-hour limit removed. Given that the server keeps injecting a variable 'normal' fpops based on returned results, if there are many RC = stops, that normal fpops will be too low, so a reduced max fpops factor would cut out many results on faster machines before the 18th hour... catch 22.5, maybe? We keep inserting our ideas of how it 'should' be without knowing all the criteria.
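The trade-off can be put into rough numbers. As I understand the BOINC client, it aborts a task once its elapsed time exceeds roughly rsc_fpops_bound divided by the host's estimated speed, and the bound is set as a multiple of the estimated task size. The sketch assumes exactly that relationship and plugs in the "Estimated task size" (86,121 GFLOPs) and "Estimated app speed" (3.07 GFLOPs/sec) from the BoincTasks dump earlier in this thread; `elapsed_limit_h` and the halved estimate are hypothetical.

```python
def elapsed_limit_h(est_gflop: float, bound_factor: float, host_gflops: float) -> float:
    """Elapsed-time ceiling in hours, assuming bound = factor x estimate
    and ceiling = bound / host speed (a simplification of the client's check)."""
    bound_gflop = est_gflop * bound_factor
    return bound_gflop / host_gflops / 3600.0


EST_GFLOP = 86_121   # 'Estimated task size' from the BoincTasks dump
HOST_GFLOPS = 3.07   # 'Estimated app speed' for the same host

for factor in (10, 3, 2):
    print(f"{factor:>2}x bound: {elapsed_limit_h(EST_GFLOP, factor, HOST_GFLOPS):5.1f} h")

# Hypothetical: many early RC stops drag the server's 'normal' estimate down to half.
for factor in (10, 3, 2):
    print(f"{factor:>2}x bound, halved estimate: "
          f"{elapsed_limit_h(EST_GFLOP / 2, factor, HOST_GFLOPS):5.1f} h")
```

Under these assumptions a 10-fold bound allows this host about 78 hours and a 3-fold bound about 23 hours, but if the estimate were halved, the 3-fold bound would leave only about 12 hours, which is the catch described above.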

AFAIK, credit is granted, but not always. I do see 18-hour jobs killed at job #0 that do get credited, since it is not the member machine's fault. On the other hand, maybe it is a good idea to opt out then, as continuing is not advancing the science one iota, just tallying time credit.
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Apr 19, 2016 10:46:06 AM]
[Apr 19, 2016 9:20:11 AM]