World Community Grid - View Thread - Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

World Community Grid Forums

Category: Beta Testing

Forum: Beta Test Support Forum

Thread: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Thread Status: Active
Total posts in this thread: 114

This topic has been viewed 22847 times and has 113 replies

Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1412
Status: Offline

Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

2 GB is allowed for 1 slot, and tasks are exceeding it.

I got 3 resends only from Linux machines with this "Maximum disk usage exceeded" and will probably run into the same error on my machine.

[Feb 27, 2016 6:59:23 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

BETA_E236293_985_S.320.C37H29N1O4S1Si1.VYS...UHFFFAOYSA-N.13_s1_14 started as a 3:34 hour WU. I restarted BOINC and my computer within the initial projected timeframe without error, but the WU did not finish within that time. After 15+ hours of computing, I shut down the computer for the night. Next morning, the WU restarted from the beginning with no messages in the event log. It is still running after 2:40 minutes with an estimated 3:05 minutes remaining.

The event log does not have any messages about the beta WU, not even a checkpoint. Messages for other WUs appear, but nothing from this beta.

Partial event log:
2/27/2016 8:45:22 AM | | Starting BOINC client version 7.6.22 for windows_x86_64
2/27/2016 8:45:22 AM | | log flags: file_xfer, sched_ops, task, checkpoint_debug
2/27/2016 8:45:22 AM | | Libraries: libcurl/7.45.0 OpenSSL/1.0.2d zlib/1.2.8
2/27/2016 8:45:22 AM | | Data directory: C:\ProgramData\BOINC
2/27/2016 8:45:22 AM | | Running under account trodr
2/27/2016 8:45:26 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 960 (driver version 361.91, CUDA version 8.0, compute capability 5.2, 2048MB, 1636MB available, 2412 GFLOPS peak)
2/27/2016 8:45:26 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 960 (driver version 361.91, device version OpenCL 1.2 CUDA, 2048MB, 1636MB available, 2412 GFLOPS peak)
2/27/2016 8:45:26 AM | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 4600 (driver version 20.19.15.4331, device version OpenCL 1.2, 1630MB, 1630MB available, 200 GFLOPS peak)
2/27/2016 8:45:26 AM | | OpenCL CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 5.2.0.10094, device version OpenCL 1.2 (Build 10094))
2/27/2016 8:45:26 AM | | All projects have zero resource share; setting to 100
2/27/2016 8:45:26 AM | | Host name: Mango
2/27/2016 8:45:26 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz [Family 6 Model 60 Stepping 3]
2/27/2016 8:45:26 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 pbe fsgsbase bmi1 smep bmi2
2/27/2016 8:45:26 AM | | OS: Microsoft Windows 10: Core x64 Edition, (10.00.10586.00)
2/27/2016 8:45:26 AM | | Memory: 15.92 GB physical, 18.29 GB virtual
2/27/2016 8:45:26 AM | | Disk: 237.92 GB total, 81.93 GB free
2/27/2016 8:45:26 AM | | Local time is UTC -8 hours
2/27/2016 8:45:26 AM | | VirtualBox version: 5.0.10
2/27/2016 8:45:26 AM | GPUGRID | URL http://www.gpugrid.net/; Computer ID 292928; resource share 100
2/27/2016 8:45:26 AM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 3467956; resource share 100
2/27/2016 8:45:26 AM | World Community Grid | General prefs: from World Community Grid (last modified 19-Feb-2016 06:48:36)
2/27/2016 8:45:26 AM | World Community Grid | Host location: none
2/27/2016 8:45:26 AM | World Community Grid | General prefs: using your defaults
2/27/2016 8:45:26 AM | | Reading preferences override file
2/27/2016 8:45:26 AM | | Preferences:
2/27/2016 8:45:26 AM | | max memory usage when active: 16300.71MB
2/27/2016 8:45:26 AM | | max memory usage when idle: 16300.71MB
2/27/2016 8:45:26 AM | | max disk usage: 21.41GB
2/27/2016 8:45:26 AM | | suspend work if non-BOINC CPU load exceeds 45%
2/27/2016 8:45:26 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
…
2/27/2016 11:46:34 AM | World Community Grid | [checkpoint] result FAH2_000071_avx17558_000096_0056_012_wcgfahb00020000_0 checkpointed
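
For anyone wanting to confirm the same thing on their own host, here is a minimal Python sketch. It assumes the `checkpoint_debug` log flag is on (as in the log above) and that checkpoint messages follow the `[checkpoint] result <name> checkpointed` shape of the last line. The function names are made up for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# Matches event-log lines of the form seen above:
#   <date> | <project> | [checkpoint] result <task name> checkpointed
CHECKPOINT_RE = re.compile(r"\|\s*\[checkpoint\] result (\S+) checkpointed\s*$")


def count_checkpoints(log_lines):
    """Return a Counter mapping task name -> number of checkpoint messages."""
    counts = Counter()
    for line in log_lines:
        match = CHECKPOINT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def tasks_without_checkpoint(log_lines, task_names):
    """Return the task names that never appear in a checkpoint message."""
    counts = count_checkpoints(log_lines)
    return [name for name in task_names if counts[name] == 0]


# Sample input copied from the partial event log above.
SAMPLE_LOG = [
    "2/27/2016 8:45:26 AM | | Preferences:",
    "2/27/2016 11:46:34 AM | World Community Grid | [checkpoint] result "
    "FAH2_000071_avx17558_000096_0056_012_wcgfahb00020000_0 checkpointed",
]

# Task name as posted (the "..." is part of what was posted, not filled in).
BETA_TASK = "BETA_E236293_985_S.320.C37H29N1O4S1Si1.VYS...UHFFFAOYSA-N.13_s1_14"

if __name__ == "__main__":
    print(count_checkpoints(SAMPLE_LOG))
    print(tasks_without_checkpoint(SAMPLE_LOG, [BETA_TASK]))
```

On the excerpt above this would list the beta WU as never checkpointed, while the FAH2 task shows one checkpoint.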

--------------

Aborted the job last night after a suspend/resume cycle reset the WU to the beginning. The WU was not going anywhere.

----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 28, 2016 9:23:36 PM]

[Feb 27, 2016 8:07:55 PM]

nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline

Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

FWIW I looked for checkpointing on many of the tasks I received. I didn't find any that checkpointed before 9+ hours of run time.

----------------------------------------

In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.

[Feb 27, 2016 8:41:46 PM]

Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1412
Status: Offline

Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

2 GB is allowed for 1 slot, and tasks are exceeding it.

I got 3 resends only from Linux machines with this "Maximum disk usage exceeded" and will probably run into the same error on my machine.

As expected: first error task of the three mentioned

27 Feb 22:17:27 Aborting task BETA_E236295_747_S.314.C30H22N6O5S1Si1.IQWNYZMZQUSGER-UHFFFAOYSA-N.13_s1_14_2: exceeded disk limit: 2144.85MB > 2048.00MB

[OT]: Can't report the task atm because of the famous:
27 Feb 21:24:56 Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates
after a Linux upgrade :(

Have to wait to restart BOINC/reboot the machine until the 4 other CEP Betas are ready/errored out. [/OT]
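
To see how far over the limit a task went, a small Python sketch like the one below could pull the numbers out of the abort message. It assumes the message always has the `exceeded disk limit: <used>MB > <limit>MB` shape shown above; `disk_overage` is an illustrative name, not a BOINC function.

```python
import re

# Shape of the abort message quoted above:
#   Aborting task <name>: exceeded disk limit: <used>MB > <limit>MB
ABORT_RE = re.compile(
    r"Aborting task (\S+): exceeded disk limit: ([\d.]+)MB > ([\d.]+)MB"
)


def disk_overage(line):
    """Return (task_name, used_mb, limit_mb, overage_mb), or None if no match."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    task = match.group(1)
    used_mb = float(match.group(2))
    limit_mb = float(match.group(3))
    return task, used_mb, limit_mb, round(used_mb - limit_mb, 2)


# The abort line exactly as posted above.
SAMPLE_LINE = (
    "27 Feb 22:17:27 Aborting task "
    "BETA_E236295_747_S.314.C30H22N6O5S1Si1.IQWNYZMZQUSGER-UHFFFAOYSA-N.13_s1_14_2: "
    "exceeded disk limit: 2144.85MB > 2048.00MB"
)

if __name__ == "__main__":
    print(disk_overage(SAMPLE_LINE))
```

For the task above that works out to an overage of about 96.85 MB on the 2048 MB slot limit.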

----------------------------------------
[Edit 1 times, last edit by Crystal Pellet at Feb 27, 2016 9:41:38 PM]

[Feb 27, 2016 9:27:40 PM]

Trotador
Senior Cruncher
Joined: Mar 26, 2009
Post Count: 154
Status: Offline

Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

FWIW I looked for checkpointing on many of the tasks I received. I didn't find any that checkpointed before 9+ hours of run time.

Yeah, no checkpoint in the first many hours of crunching.

----------------------------------------

[Feb 27, 2016 9:35:50 PM]

pvh513
Senior Cruncher
Joined: Feb 26, 2011
Post Count: 260
Status: Offline

Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

I finished five WUs from the new batch so far. Three ended with "Application exited with RC = 0x100", the other two with "Application exited with RC = 0xb".

One of these five units had a "Maximum disk usage exceeded" with my wingman, but I did not get that error. I am running openSUSE 42.1 on all rigs. This WU is called

BETA_E236294_325_S.314.C30H22N8O3S2.FNMFXVKTVDLVCB-UHFFFAOYSA-N.7_s1_14_1--

----------------------------------------
[Edit 1 times, last edit by pvh513 at Feb 27, 2016 10:55:40 PM]

[Feb 27, 2016 10:45:24 PM]

Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1412
Status: Offline

Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

How often will tasks of this workunit be resent when they all give the same error: "Maximum disk usage exceeded"?

BETA_E236295_747_S.314.C30H22N6O5S1Si1.IQWNYZMZQUSGER-UHFFFAOYSA-N.13_s1_14_4-- Linux 3.16.0-38-generic - In Progress 2/28/16 08:00:29 2/29/16 17:36:28 0.00 0.0 / 0.0
BETA_E236295_747_S.314.C30H22N6O5S1Si1.IQWNYZMZQUSGER-UHFFFAOYSA-N.13_s1_14_3-- Linux 2.6.32-504.el6.centos.plus.x86_64 - In Progress 2/27/16 21:30:05 2/29/16 07:06:04 0.00 0.0 / 0.0
BETA_E236295_747_S.314.C30H22N6O5S1Si1.IQWNYZMZQUSGER-UHFFFAOYSA-N.13_s1_14_2-- Linux 3.2.0-98-generic 700 Error 2/27/16 07:54:36 2/28/16 08:00:15 12.86 371.3 / 0.0
BETA_E236295_747_S.314.C30H22N6O5S1Si1.IQWNYZMZQUSGER-UHFFFAOYSA-N.13_s1_14_1-- Linux 3.16.0-38-generic 700 Error 2/26/16 20:50:06 2/27/16 07:54:28 7.18 169.4 / 0.0
BETA_E236295_747_S.314.C30H22N6O5S1Si1.IQWNYZMZQUSGER-UHFFFAOYSA-N.13_s1_14_0-- Linux 4.2.0-1-amd64 700 Error 2/26/16 20:49:23 2/27/16 21:30:03 4.63 166.8 / 0.0

[Feb 28, 2016 8:46:25 AM]

SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline


Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Dreading, got 8 of the 295 batch, 6 running, and some into the 17th hour with 3 recorded checkpoints, all _0 and _1.

Just curious: a while ago there was talk of real biggies coming to the grid [suppose they need opt-in to opt-in]... could these be it, with the disk_bound really needing upping?

Anyway, 5 copies would be the stop-sign, but I'm not going to wait for it. As soon as the first one goes MDUE (Maximum disk usage exceeded), the rest get aborted by hand... beta hours just for the heck of getting them credited on known failed jobs are of no particular interest to me.

[Feb 28, 2016 9:22:58 AM]

Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1412
Status: Offline

Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Next one returned:

28 Feb 10:38:47 Aborting task BETA_E236295_221_S.318.C26H26N4O8S1Si2.FAWZKWOIBUVPJU-UHFFFAOYSA-N.8_s1_14_3: exceeded disk limit: 2104.93MB > 2048.00MB

I can't imagine that in production one should need >2GB for 1 slot.
As far as I can see the errors are only happening on Linux machines. Maybe something wrong with purging temporary files from the slot?

[Feb 28, 2016 9:47:09 AM]

SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline


Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Oh, well, mine run on a W7-64. The purging issue / true obliteration of slot content of old jobs was resolved somewhere around 7.6.9. I have 7.6.22 on the Ubuntu box.

2GB is the minimum setting on the System Requirement page... just had a rejection of UGM work requests, because the full 5GB allowed for BOINC had been used, so upped it to 6GB...

10531 2/28/2016 9:50:24 AM Message from server: Uncovering Genome Mysteries needs 500.00MB more disk space. You currently have 0.00 MB available and it needs 500.00 MB.

and later

10959 2/28/2016 11:02:29 AM Message from server: Uncovering Genome Mysteries needs 4.83MB more disk space. You currently have 495.17 MB available and it needs 500.00 MB.

So gave it another 1GB... not usually running 6 CEP2 concurrent.... 1 or 2 max, because as of now, got a slug under my hands with very poor efficiency.
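
As a quick sanity check on those two server messages, a rough Python sketch can confirm that the reported shortfall is just required minus available. It assumes the message wording stays exactly as quoted above; `parse_disk_message` is a name invented here.

```python
import re

# Wording of the scheduler message as quoted above (spacing before "MB" varies).
MSG_RE = re.compile(
    r"needs ([\d.]+)\s*MB more disk space\. "
    r"You currently have ([\d.]+)\s*MB available and it needs ([\d.]+)\s*MB"
)


def parse_disk_message(message):
    """Return (reported_shortfall, available, required) in MB, or None."""
    match = MSG_RE.search(message)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())


# The two messages exactly as posted above (leading log number and time dropped).
MESSAGES = [
    "Message from server: Uncovering Genome Mysteries needs 500.00MB more disk space. "
    "You currently have 0.00 MB available and it needs 500.00 MB.",
    "Message from server: Uncovering Genome Mysteries needs 4.83MB more disk space. "
    "You currently have 495.17 MB available and it needs 500.00 MB.",
]

if __name__ == "__main__":
    for text in MESSAGES:
        reported, available, required = parse_disk_message(text)
        computed = required - available
        # Allow for rounding to two decimals in the server message.
        print(reported, round(computed, 2), abs(reported - computed) < 0.005)
```

Both messages are consistent with a fixed 500 MB requirement for the request, so the second one only needed a few more MB.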

edit: First finished of 295 with 4 checkpoints, skipping #4 (job 5), earlier observed by others. Guess this is same same continuation with an RC = end ... no science mileage to be made, or gone between nothingness and eternity.

Result Name: BETA_E236295_676_S.314.C30H22N6O5S1Si1.YTTCTFKEFVTPBZ-UHFFFAOYSA-N.1_s1_14_0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[17:14:34] Number of jobs = 5
[17:14:34] Starting job 0,CPU time has been restored to 0.000000.
[06:42:39] Finished Job #0
[06:42:39] Starting job 1,CPU time has been restored to 43698.625718.
[08:10:48] Finished Job #1
[08:10:48] Starting job 2,CPU time has been restored to 48849.139134.
[08:41:17] Finished Job #2
[08:41:17] Starting job 3,CPU time has been restored to 50549.394033.
09:25:55 (23264): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[09:26:28] Number of jobs = 5
[09:26:28] Starting job 3,CPU time has been restored to 50549.394033.
09:31:57 (21456): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[09:34:31] Number of jobs = 5
[09:34:31] Starting job 3,CPU time has been restored to 50549.394033.
Application exited with RC = 0xc0000005
[11:35:22] Finished Job #3
[11:35:22] Starting job 4,CPU time has been restored to 56878.619805.
[11:35:22] Skipping Job #4
11:35:28 (21352): called boinc_finish

</stderr_txt>
]]>
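
The "CPU time has been restored to" values in that stderr give a rough per-job breakdown. Below is a minimal Python sketch, assuming each restored value is the cumulative checkpointed CPU time in seconds at the start of that job (my reading of the log, not something the project has confirmed). CPU time lost in the restarts of job 3 would not show up in these numbers.

```python
import re

# Lines of interest look like:
#   [06:42:39] Starting job 1,CPU time has been restored to 43698.625718.
START_RE = re.compile(
    r"Starting job (\d+),CPU time has been restored to (\d+\.\d+)"
)

# Relevant lines copied from the stderr above.
SAMPLE_STDERR = """\
[17:14:34] Starting job 0,CPU time has been restored to 0.000000.
[06:42:39] Starting job 1,CPU time has been restored to 43698.625718.
[08:10:48] Starting job 2,CPU time has been restored to 48849.139134.
[08:41:17] Starting job 3,CPU time has been restored to 50549.394033.
[09:26:28] Starting job 3,CPU time has been restored to 50549.394033.
[09:34:31] Starting job 3,CPU time has been restored to 50549.394033.
[11:35:22] Starting job 4,CPU time has been restored to 56878.619805.
"""


def cpu_seconds_per_job(stderr_text):
    """Return {job index: CPU seconds}, taken as differences of restored values."""
    restored = {}
    for match in START_RE.finditer(stderr_text):
        # A job restarted after a "No heartbeat" exit reports the same value again.
        restored[int(match.group(1))] = float(match.group(2))
    jobs = sorted(restored)
    return {job: restored[nxt] - restored[job] for job, nxt in zip(jobs, jobs[1:])}


if __name__ == "__main__":
    for job, seconds in cpu_seconds_per_job(SAMPLE_STDERR).items():
        print(f"job {job}: {seconds:.1f} s ({seconds / 3600:.2f} h)")
```

Read this way, job 0 alone accounts for roughly 12.1 hours of CPU time, which fits the reports of no checkpoint in the first many hours.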

----------------------------------------
[Edit 1 times, last edit by SekeRob* at Feb 28, 2016 10:46:32 AM]

[Feb 28, 2016 10:24:38 AM]
