Thread Status: Active
Total posts in this thread: 315
This topic has been viewed 124088 times and has 314 replies
JSYKES
Senior Cruncher
Joined: Apr 28, 2007
Post Count: 206
Re: New Beta Test – May 29, 2019 [Issues Thread]

I have an ARP beta running (slowly!!): 13+ hours so far and only 32% complete. The checkpoint intervals seem long, currently at least 4 hours. I'm not sure if this is a good or bad thing, but I guess checkpoints ought to be more frequent than that to allow for machine usage, routine starts/stops, etc.
The SCC betas have all raced through without a hitch....
----------------------------------------

[Jun 19, 2019 5:33:28 AM]
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 328
Re: New Beta Test – May 29, 2019 [Issues Thread]

I have had one work unit which has gone invalid: BETA_ARP1_0000488_002.
The wingmen had a range of statuses: 1 × no reply, 2 × valid, 1 × error, and my 1 × invalid.

The result log for the wingman in error is:
Result Log

Result Name: BETA_ ARP1_ 0000488_ 002_ 2--


<core_client_version>5.4.11</core_client_version>
<message>
Couldn't start or resume: 2
</message>

My invalid unit logged many error messages throughout, but it kept running for over 23 hours. The errors say the work unit was ‘out of memory’, but my PC is an i7-6700 with 16 GB of memory and peaked at 45% memory usage.
The output from my result log is:
Result Log

Result Name: BETA_ ARP1_ 0000488_ 002_ 4--


<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[18:30:23] INFO: Checkpoint taken at 2018-04-05_06:00:00
ERROR: Out of Memory Error on compression
.
18 identical messages removed
.
ERROR: Out of Memory Error on compression
[22:18:30] INFO: Checkpoint taken at 2018-04-05_12:00:00
ERROR: Out of Memory Error on compression
.
70 identical messages removed
.
ERROR: Out of Memory Error on compression
[09:03:14] INFO: Checkpoint taken at 2018-04-05_18:00:00
ERROR: Out of Memory Error on compression
.
71 identical messages removed
.
ERROR: Out of Memory Error on compression
[11:38:44] INFO: Checkpoint taken at 2018-04-06_00:00:00
ERROR: Out of Memory Error on compression
.
71 identical messages removed
.
ERROR: Out of Memory Error on compression
[13:51:03] INFO: Checkpoint taken at 2018-04-06_06:00:00
ERROR: Out of Memory Error on compression
.
70 identical messages removed
.
ERROR: Out of Memory Error on compression
[17:14:52] INFO: Checkpoint taken at 2018-04-06_12:00:00
ERROR: Out of Memory Error on compression
.
70 identical messages removed
.
ERROR: Out of Memory Error on compression
[21:08:50] INFO: Checkpoint taken at 2018-04-06_18:00:00
ERROR: Out of Memory Error on compression
.
74 identical messages removed
.
ERROR: Out of Memory Error on compression
00:00:58 (28276): called boinc_finish(0)

</stderr_txt>

The number of identical messages removed may be wrong by 1 or 2.
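The wall-clock gaps between checkpoints can be read off the log above. The following is a minimal sketch, assuming the bracketed stamps are local wall-clock times and that less than 24 hours passed between consecutive checkpoints; the helper name `checkpoint_gaps` is made up for illustration and is not part of BOINC or the ARP application.

```python
import re
from datetime import datetime, timedelta

# Checkpoint lines copied from the stderr log above (error lines omitted).
LOG = """\
[18:30:23] INFO: Checkpoint taken at 2018-04-05_06:00:00
[22:18:30] INFO: Checkpoint taken at 2018-04-05_12:00:00
[09:03:14] INFO: Checkpoint taken at 2018-04-05_18:00:00
[11:38:44] INFO: Checkpoint taken at 2018-04-06_00:00:00
[13:51:03] INFO: Checkpoint taken at 2018-04-06_06:00:00
[17:14:52] INFO: Checkpoint taken at 2018-04-06_12:00:00
[21:08:50] INFO: Checkpoint taken at 2018-04-06_18:00:00
"""

PATTERN = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] INFO: Checkpoint taken at (\S+)$")


def checkpoint_gaps(log_text):
    """Return (simulated_time, wall_clock_gap) for each checkpoint after the first.

    Assumption: the log carries no date, so a stamp earlier than the previous
    one is treated as having rolled over midnight exactly once.
    """
    stamps = []
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            wall = datetime.strptime(match.group(1), "%H:%M:%S")
            stamps.append((match.group(2), wall))

    gaps = []
    for (_, previous), (simulated, current) in zip(stamps, stamps[1:]):
        delta = current - previous
        if delta < timedelta(0):
            delta += timedelta(days=1)  # crossed midnight
        gaps.append((simulated, delta))
    return gaps


for simulated, gap in checkpoint_gaps(LOG):
    print(simulated, gap)
```

These are elapsed times, not CPU times: the longest gap (between the 12:00 and 18:00 checkpoints of 2018-04-05) spans midnight and may include time when the machine was off or BOINC was suspended, which the log does not record.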
[Jun 19, 2019 7:36:47 AM]
Dangertk
Cruncher
The Netherlands
Joined: Oct 16, 2009
Post Count: 46
Re: New Beta Test – May 29, 2019 [Issues Thread]

I got one beta WU on Android and one on Windows. Both were SCC betas; both ran without problems and validated, taking about 2 hours on Windows and about 3.25 hours on Android. I noticed that my wingman on the Windows WU claimed 80 points while I claimed 40. I don't know if that's related to the beta, but it seems a bit off.
----------------------------------------
----------------------------------------
[Edit 1 times, last edit by Dangertk at Jun 19, 2019 8:17:21 AM]
[Jun 19, 2019 8:08:32 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: New Beta Test – May 29, 2019 [Issues Thread]

A new kind of Beta

Result Name | OS type / OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
BETA_ ARP1_ 0000441_ 001_ 4-- | Microsoft Windows 10 Core x64 Edition, (10.00.18362.00) | - | In Progress | 6/18/19 23:37:50 | 6/21/19 10:25:50 | 0.00 | 0.0 / 0.0
BETA_ ARP1_ 0000441_ 001_ 3-- | Microsoft Windows 10 Professional x64 Edition, (10.00.17763.00) | - | No Reply | 6/16/19 12:49:48 | 6/18/19 23:37:48 | 0.00 | 0.0 / 0.0
BETA_ ARP1_ 0000441_ 001_ 2-- | Microsoft Windows 10 Education x64 Edition, (10.00.17134.00) | - | No Reply | 6/14/19 01:40:45 | 6/16/19 12:28:45 | 0.00 | 0.0 / 0.0
BETA_ ARP1_ 0000441_ 001_ 0-- | Microsoft Windows 8.1 x64 Edition, (06.03.9600.00) | 721 | Pending Validation | 6/7/19 01:40:39 | 6/8/19 06:14:44 | 20.64 | 173.7 / 0.0
BETA_ ARP1_ 0000441_ 001_ 1-- | Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) | - | No Reply | 6/7/19 01:40:39 | 6/14/19 01:40:39 | 0.00 | 0.0 / 0.0

If testing "how many do not make it to the Beta short deadline check", you've convinced me. 60% failure rate.
[Jun 19, 2019 12:04:02 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: New Beta Test – May 29, 2019 [Issues Thread]

Former Member said:
A new kind of Beta

Result Name | OS type / OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
BETA_ ARP1_ 0000441_ 001_ 4-- | Microsoft Windows 10 Core x64 Edition, (10.00.18362.00) | - | In Progress | 6/18/19 23:37:50 | 6/21/19 10:25:50 | 0.00 | 0.0 / 0.0
BETA_ ARP1_ 0000441_ 001_ 3-- | Microsoft Windows 10 Professional x64 Edition, (10.00.17763.00) | - | No Reply | 6/16/19 12:49:48 | 6/18/19 23:37:48 | 0.00 | 0.0 / 0.0
BETA_ ARP1_ 0000441_ 001_ 2-- | Microsoft Windows 10 Education x64 Edition, (10.00.17134.00) | - | No Reply | 6/14/19 01:40:45 | 6/16/19 12:28:45 | 0.00 | 0.0 / 0.0
BETA_ ARP1_ 0000441_ 001_ 0-- | Microsoft Windows 8.1 x64 Edition, (06.03.9600.00) | 721 | Pending Validation | 6/7/19 01:40:39 | 6/8/19 06:14:44 | 20.64 | 173.7 / 0.0
BETA_ ARP1_ 0000441_ 001_ 1-- | Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) | - | No Reply | 6/7/19 01:40:39 | 6/14/19 01:40:39 | 0.00 | 0.0 / 0.0

If testing "how many do not make it to the Beta short deadline check", you've convinced me. 60% failure rate.

Not sure I understand your point. The last No Reply was after a week. The other 2 after 3 days. It’s beta testing. Short deadlines are sometimes the nature of the beast.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


----------------------------------------
[Edit 2 times, last edit by nanoprobe at Jun 19, 2019 3:02:08 PM]
[Jun 19, 2019 12:32:08 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7844
Re: New Beta Test – May 29, 2019 [Issues Thread]

I finally snagged one of these (BETA_ARP1_0000442_004). It ran for a little over 30 hours on a Linux box. Checkpointed every six hours. CPU is Xeon X5650 hyperthreaded. It ran with 23 other Zika units. No problems.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Jun 19, 2019 4:02:44 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Re: New Beta Test – May 29, 2019 [Issues Thread]

hchc said:
Looks like each work unit simulates 48 hours, with checkpoints taken at:

06:00
12:00
18:00
00:00
06:00
12:00
18:00

On my i3-8100 @ 3.6 GHz, that's about 1-2.5 CPU hours between checkpoints. It's fine for a 24/7 device, but could these checkpoints maybe be doubled? So every 3 simulated hours instead of every 6 simulated hours.

Is doubling the checkpoints to every 3 simulated hours feasible?
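To make the request concrete, here is a small sketch of the checkpoint schedule for a 48-hour simulation at 6-hour and at 3-hour spacing. The start time is an assumption taken from the log quoted earlier in this thread (first checkpoint at 2018-04-05_06:00:00); `checkpoint_times` is a hypothetical helper, not anything from WRF or BOINC.

```python
from datetime import datetime, timedelta


def checkpoint_times(start, simulated_hours, interval_hours):
    """List the simulated times at which a checkpoint would be written.

    The end of the run is excluded, since the unit finishes there rather
    than checkpointing.
    """
    step = timedelta(hours=interval_hours)
    end = start + timedelta(hours=simulated_hours)
    times = []
    current = start + step
    while current < end:
        times.append(current)
        current += step
    return times


# Assumed start of the simulated period (illustrative only).
start = datetime(2018, 4, 5, 0, 0)

every_six = checkpoint_times(start, 48, 6)
every_three = checkpoint_times(start, 48, 3)

print(len(every_six), "checkpoints at 6-hour spacing")
print(len(every_three), "checkpoints at 3-hour spacing")
for t in every_three[:4]:
    print(t.strftime("%Y-%m-%d_%H:%M:%S"))
```

With 6-hour spacing this gives the seven checkpoints listed above; 3-hour spacing gives fifteen. If run time scales roughly linearly with simulated time (an assumption), the 1-2.5 CPU hours between checkpoints would drop to about half that.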
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Jun 19, 2019 6:32:38 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: New Beta Test – May 29, 2019 [Issues Thread]

Jonathan said
For this project, the only method available to validate results is to run redundant copies and check for binary equivalence.
which makes me question the way this WU was handled:
Result Name | OS type / OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
BETA_ ARP1_ 0001170_ 002_ 2-- | Microsoft Windows 7 Professional x86 Edition, Service Pack 1, (06.01.7601.00) | - | In Progress | 6/19/19 16:32:52 | 6/22/19 03:20:52 | 0.00 | 0.0 / 0.0
BETA_ ARP1_ 0001170_ 002_ 0-- | Microsoft Windows XP Home x86 Edition, Service Pack 3, (05.01.2600.00) | 721 | Pending Verification | 6/13/19 14:16:10 | 6/19/19 16:32:38 | 122.41 | 167.7 / 0.0
BETA_ ARP1_ 0001170_ 002_ 1-- | Microsoft Windows 10 Enterprise x64 Edition, (10.00.17134.00) | 721 | Pending Verification | 6/13/19 14:16:10 | 6/15/19 03:04:16 | 32.08 | 167.7 / 0.0

The _0 copy was sent to an x86 machine, while the _1 copy went to an x64 machine. Surely machines with 32-bit and 64-bit architectures are unlikely to produce binary-equivalent results, especially if floating-point calculations are involved, or am I missing something?
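As a generic illustration of why binary equivalence is a strict test for floating-point output, the sketch below shows two sums that are mathematically equal and numerically almost identical, yet differ at the byte level. It says nothing about how the ARP application is actually built for x86 and x64, which this thread does not reveal; `binary_equivalent` and `pack_result` are hypothetical names used only for the example.

```python
import struct


def pack_result(value: float) -> bytes:
    """Serialize a float as an 8-byte little-endian IEEE 754 double."""
    return struct.pack("<d", value)


def binary_equivalent(a: bytes, b: bytes) -> bool:
    """Two results validate only if every byte matches."""
    return a == b


# Same three numbers, added in a different order.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left, right)
print("close:", abs(left - right) < 1e-12)
print("binary equivalent:", binary_equivalent(pack_result(left), pack_result(right)))
```

Anything that changes the order or intermediate precision of floating-point operations can produce this kind of last-bit difference. Whether a 32-bit and a 64-bit build do so depends on compiler settings and math libraries, so matching results across the two are possible but not guaranteed.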
[Jun 19, 2019 7:03:57 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1403
Re: New Beta Test – May 29, 2019 [Issues Thread]

Is doubling the checkpoints to every 3 simulated hours feasible?

Very few people seem to understand that the data analyzed is not from a 6-hour period but from one fixed time, e.g. 06:00 UTC or 12:00 UTC, not the period from 06 to 12.
So checkpointing more often would be very hard, unless one runs this on a virtual machine, where snapshots could be taken more often during the analysis.
[Jun 19, 2019 7:06:20 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Re: New Beta Test – May 29, 2019 [Issues Thread]

Crystal Pellet said:
Is doubling the checkpoints to every 3 simulated hours feasible?

Very few people seem to understand that the data analyzed is not from a 6-hour period but from one fixed time, e.g. 06:00 UTC or 12:00 UTC, not the period from 06 to 12.

Thanks for explaining that these are instantaneous simulations. My question to the WRF developers remains: can the number of checkpoints be doubled or tripled? Slow devices may take 10 calendar hours between checkpoints. Even fast devices can take 1-2 hours between checkpoints.

I'm wondering if any of the developers are monitoring the WCG forums.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Jun 19, 2019 7:13:23 PM]