Total posts in this thread: 17
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Invalid work units after power failure

This morning I had a power failure.
All of these work units started over and repeated the work that was done before.
None of these work units restarted from check point.
All of these work units were marked invalid.
From the log file it looks like the restart included the info from before the power failure.
I don't understand why they were marked Invalid.

c4cw_ target03_ 140895814_ 1-- Robert-PC Invalid 6/8/11 11:11:31 6/8/11 20:30:13 5.81 109.8 / 0.0
c4cw_ target04_ 010771697_ 0-- Robert-PC Invalid 6/8/11 08:47:52 6/8/11 20:30:13 5.83 110.1 / 0.0
c4cw_ target04_ 010754507_ 0-- Robert-PC Invalid 6/8/11 08:24:33 6/8/11 20:30:13 6.25 118.0 / 0.0
c4cw_ target04_ 010670292_ 0-- Robert-PC Invalid 6/8/11 06:45:00 6/8/11 20:30:13 8.73 165.0 / 0.0
c4cw_ target04_ 010657807_ 0-- Robert-PC Invalid 6/8/11 06:15:22 6/8/11 20:30:13 9.21 174.0 / 0.0
c4cw_ target04_ 010647874_ 0-- Robert-PC Invalid 6/8/11 06:07:56 6/8/11 20:30:13 8.61 162.7 / 0.0
c4cw_ target04_ 010632688_ 0-- Robert-PC Invalid 6/8/11 06:01:27 6/8/11 20:30:13 9.49 179.2 / 0.0
c4cw_ target04_ 010639274_ 0-- Robert-PC Invalid 6/8/11 05:50:17 6/8/11 20:30:13 9.68 182.7 / 0.0
[Jun 8, 2011 10:37:40 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid work units after power failure

Without info from the log file, I don't have a clue either.
[Jun 8, 2011 11:31:02 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid work units after power failure

The logs are all about the same.

Result Log

Result Name: c4cw_ target03_ 140895814_ 1--



<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcg_c4cw_lmps_6.41_windows_x86_64 -screen none -in in.wcg.acc -var wcgsteps1 10000 -var wcgsteps2 10000 -var loop 0 -var restart 0 -var rinterval 100 -var ifile in.wcg.acc -var wcgseed 140895814
[06:13:15] Percent complete = 0.499975
[06:14:50] Percent complete = 0.999950
[06:16:26] Percent complete = 1.499925
[06:17:57] Percent complete = 1.999900
[06:19:28] Percent complete = 2.499875
[06:21:01] Percent complete = 2.999850
[06:22:33] Percent complete = 3.499825
[06:24:03] Percent complete = 3.999800
[06:25:32] Percent complete = 4.499775
......

[07:32:11] Percent complete = 25.998700
[07:33:40] Percent complete = 26.498675
[07:35:19] Percent complete = 26.998650
[07:36:51] Percent complete = 27.498625 ---Power failure---
Commandline = projects/www.worldcommunitygrid.org/wcg_c4cw_lmps_6.41_windows_x86_64 -screen none -in in.wcg.acc -var wcgsteps1 10000 -var wcgsteps2 10000 -var loop 0 -var restart 0 -var rinterval 100 -var ifile in.wcg.acc -var wcgseed 140895814
[09:10:29] Percent complete = 0.499975
[09:12:01] Percent complete = 0.999950
[09:13:39] Percent complete = 1.499925
[09:15:14] Percent complete = 1.999900
[09:16:53] Percent complete = 2.499875
[09:18:24] Percent complete = 2.999850
[09:19:54] Percent complete = 3.499825
[09:21:34] Percent complete = 3.999800
[09:23:04] Percent complete = 4.499775
.......

[10:30:13] Percent complete = 25.998700
[10:31:47] Percent complete = 26.498675
[10:33:20] Percent complete = 26.998650
[10:34:55] Percent complete = 27.498625
[10:36:27] Percent complete = 27.998600
[10:38:00] Percent complete = 28.498575
[10:39:34] Percent complete = 28.998550
[10:41:06] Percent complete = 29.498525
[10:42:39] Percent complete = 29.998500
........

[13:51:59] Percent complete = 91.495425
[13:53:32] Percent complete = 91.995400
[13:55:02] Percent complete = 92.495375
[13:56:33] Percent complete = 92.995350
[13:58:06] Percent complete = 93.495325
[13:59:37] Percent complete = 93.995300
[14:01:11] Percent complete = 94.495275
[14:02:43] Percent complete = 94.995250
[14:04:15] Percent complete = 95.495225
[14:05:46] Percent complete = 95.995200
[14:07:18] Percent complete = 96.495175
[14:08:52] Percent complete = 96.995150
[14:10:24] Percent complete = 97.495125
[14:11:57] Percent complete = 97.995100
[14:13:29] Percent complete = 98.495075
[14:15:02] Percent complete = 98.995050
[14:16:36] Percent complete = 99.495025
[14:18:11] Percent complete = 99.995000
Exiting with final cpu time = 18451.345877
14:18:13 (4464): called boinc_finish

</stderr_txt>

Brazoria999
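
For anyone who wants to check their own result log for the same pattern, here is a minimal Python sketch. It assumes the log looks like the one above: each start of the application prints a "Commandline =" line, followed by "[hh:mm:ss] Percent complete = ..." lines. The function names, the 1% tolerance and the shortened sample are made up for illustration; this is not a WCG or BOINC tool, and it does not handle a log that runs past midnight.

```python
import re

# Matches lines such as "[06:13:15] Percent complete = 0.499975"
PROGRESS = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] Percent complete = ([\d.]+)")


def split_runs(lines):
    """Group progress lines into runs; each 'Commandline =' line starts a new run."""
    runs = []
    for line in lines:
        line = line.strip()
        if line.startswith("Commandline ="):
            runs.append([])
            continue
        m = PROGRESS.match(line)
        if m:
            if not runs:  # progress line seen before any Commandline line
                runs.append([])
            h, mi, s, pct = m.groups()
            seconds = int(h) * 3600 + int(mi) * 60 + int(s)
            runs[-1].append((seconds, float(pct)))
    return [r for r in runs if r]


def summarize(runs, tolerance_pct=1.0):
    """Report each run's progress range and wall-clock span in seconds.

    A later run counts as 'resumed' only if its first reported progress is
    within tolerance_pct of where the previous run stopped (an assumption).
    """
    report = []
    prev_last = None
    for run in runs:
        first_t, first_pct = run[0]
        last_t, last_pct = run[-1]
        entry = {"first_pct": first_pct, "last_pct": last_pct,
                 "span_s": last_t - first_t}
        if prev_last is not None:
            entry["resumed"] = first_pct >= prev_last - tolerance_pct
        report.append(entry)
        prev_last = last_pct
    return report


# Shortened sample built from lines in the log above (arguments left out).
SAMPLE = """\
Commandline = (arguments as in the log above)
[06:13:15] Percent complete = 0.499975
[07:35:19] Percent complete = 26.998650
[07:36:51] Percent complete = 27.498625 ---Power failure---
Commandline = (arguments as in the log above)
[09:10:29] Percent complete = 0.499975
[09:12:01] Percent complete = 0.999950
[14:18:11] Percent complete = 99.995000
""".splitlines()

for entry in summarize(split_runs(SAMPLE)):
    print(entry)
```

On the sample, the second run reports its first progress at 0.499975% even though the first run had reached 27.498625%, so it is flagged as not resumed, which matches what the log shows: the work started over after the power failure.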
[Jun 9, 2011 2:04:31 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid work units after power failure

Hi Brazoria999,

It appears that the result validation is so tight that even the slightest difference [due to the unexpected power-failure restart] causes them to be seen as invalid.

Ever since buying a couple of real cheapo UPSes with high surge protection [cost equal to 2 months of Internet rent] I've not had this. We 'had' power outages more frequent than the Sundays in a week, but since returning none, even though we've had quite a few thunderstorms, wet and dry. Yesterday afternoon there were several nearly direct hits [seeing and hearing had no time-lapse], but nothing happened... just crunched on.

--//--

edit: What is curious in your first post is that you show some substantial CPU time variation for the very steady Clean Water tasks. Mine, no matter what, on duo and quad, always finish within a band of a few minutes. Sample:

c4cw_ target04_ 010527836_ 0-- 1112084 Valid 8-6-11 03:17:37 9-6-11 06:31:02 4.47 82.4 / 77.5
c4cw_ target04_ 010359841_ 0-- 1112084 Valid 7-6-11 23:13:57 9-6-11 02:01:34 4.49 82.8 / 77.2
c4cw_ target04_ 010309116_ 0-- 1112084 Valid 7-6-11 21:53:11 9-6-11 01:31:18 4.51 83.1 / 77.4
c4cw_ target04_ 010132698_ 0-- 1112084 Valid 7-6-11 18:16:45 8-6-11 20:59:02 4.52 83.3 / 77.7
c4cw_ target04_ 009945115_ 0-- 1112084 Valid 7-6-11 14:29:44 8-6-11 17:21:35 4.50 83.1 / 78.0
c4cw_ target04_ 009932927_ 0-- 1112084 Valid 7-6-11 14:24:20 8-6-11 14:24:35 4.49 82.8 / 78.4
c4cw_ target04_ 009644609_ 1-- 1112084 Valid 7-6-11 08:32:57 8-6-11 07:55:03 4.50 82.9 / 78.2
c4cw_ target04_ 009517195_ 0-- 1112084 Valid 7-6-11 05:53:08 8-6-11 05:39:35 4.49 82.8 / 77.6
c4cw_ target04_ 009319745_ 0-- 1112084 Valid 7-6-11 01:18:46 8-6-11 03:23:45 4.55 83.8 / 77.6
c4cw_ target04_ 008815821_ 0-- 1112084 Valid 6-6-11 14:27:42 7-6-11 18:54:48 4.48 82.5 / 77.5

The log sample suggests it's likely the time up to the point of the power failure plus the normal run time.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 9, 2011 7:08:53 AM]
[Jun 9, 2011 6:57:15 AM]
joeperry39@gmail.com
Advanced Cruncher
USA
Joined: Nov 22, 2006
Post Count: 140
Status: Offline
Re: Invalid work units after power failure

Ever since buying a couple of real cheapo UPSes with high surge protection [cost equal to 2 months of Internet rent] I've not had this. We 'had' power outages more frequent than the Sundays in a week, but since returning none, even though we've had quite a few thunderstorms, wet and dry. Yesterday afternoon there were several nearly direct hits [seeing and hearing had no time-lapse], but nothing happened... just crunched on.


I have found that a UPS unit is the cheapest insurance you can buy for a computer. My local power company had a problem a couple of years ago and we were getting quick little power failures, usually 3 or 4 seconds in length, sometimes several per hour. The power company finally solved the problem, but not before I lost a hard drive on my computer.

When I purchased a new HD I also added a UPS. Now I have two computers, both protected by UPS. They are well worth the cost, especially if you live in an area with power problems or one subject to weather events that can cause power failures.
----------------------------------------


"Everything in moderation, including moderation" -- Mark Twain
[Jun 9, 2011 8:37:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid work units after power failure

My work units normally finish in a little over 5 hrs.
Each of these included the time it ran before the power failure plus the 5 hrs needed to rerun the entire work unit.

EDIT:
I was referring to the elapsed time that was displayed on the task page. I do not know why the claimed time varied that much.

Brazoria999
----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 9, 2011 9:19:46 PM]
[Jun 9, 2011 9:15:05 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid work units after power failure

One thing to note, which applies in much of Europe, large parts of the US and at least a large part of China [I know little of the SH conditions], is the enormous shortfall of rain, and when it does rain here it pours and runs off before the land, our hillside included, can absorb even a little. We're working the land differently now. The grasses are left between the olive trees and plowing is postponed, else the topsoil would just wash away. Some neighbors regrettably have not adopted that method yet. What's the reason for the story? The ground barely conducts when lightning hits.

On the topic of validation, maybe the techs could look into this issue in between the items on their overloaded to-do list. The differences will have been minimal; in fact the whole task was recomputed from scratch, and this being ''zero redundant'' science, there's no need for absolute verification against a wingman. You could just as well abort these tasks after the next power outage if you see them resume from the start... the progress column in BOINC Manager would show the progress reverting.

2 Euro cents.

--//--

edit: PS, aborts go against quota, NOT against reliability rating!
----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 10, 2011 3:44:22 AM]
[Jun 10, 2011 3:43:11 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid work units after power failure

One last comment. The computer runs Windows 7 64-bit with 8 GB of RAM and an i7 860 (4 cores, HT). I do not believe that this application uses checkpoints. I believe that it is programmed to use them but for some reason decides not to use them. I had a similar problem earlier when I was updating software on the computer: after the first restart c4cw looked like it was doing what it did this time. After the second restart it appeared that it really did start over completely, and it finished as valid.

[Jun 11, 2011 12:14:26 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid work units after power failure

Clean Water checkpoints very frequently, like every 0.99% progress. You can see that in the result log you posted. The default checkpoint write limit in the client is 60 seconds, meaning it writes to disk at most once every 60 seconds. I've changed mine to at most once every 5 minutes (300 seconds).

--//--

Edit: nope it's even 0.49% progress. More than 200 times per result.
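
To put rough numbers on that, here is a small Python sketch using the first nine timestamps from the posted log. It assumes that every "Percent complete" line is a point where the application could checkpoint, and that the client's limit simply means "no sooner than N seconds after the previous checkpoint". Both are assumptions for illustration, not confirmed behaviour of the Clean Water application, and the function names are made up.

```python
import math
from datetime import datetime


def progress_steps(step_pct):
    """Number of progress reports per result for a given step size in percent."""
    return round(100.0 / step_pct)


def mean_interval_seconds(stamps):
    """Average gap in seconds between consecutive hh:mm:ss timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def steps_between_checkpoints(limit_s, interval_s):
    """Progress steps needed before limit_s seconds have passed (assumed model)."""
    return math.ceil(limit_s / interval_s)


# First nine progress timestamps from the result log posted earlier.
STAMPS = ["06:13:15", "06:14:50", "06:16:26", "06:17:57", "06:19:28",
          "06:21:01", "06:22:33", "06:24:03", "06:25:32"]

interval = mean_interval_seconds(STAMPS)
print("progress reports per result:", progress_steps(0.499975))
print("average seconds per report:", interval)
print("steps per checkpoint at 60 s limit:", steps_between_checkpoints(60, interval))
print("steps per checkpoint at 300 s limit:", steps_between_checkpoints(300, interval))
```

Under those assumptions the progress lines arrive roughly every 92 seconds, so a 60 second limit would allow a checkpoint at every step, while a 300 second limit would allow one at about every fourth step.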
----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 11, 2011 8:35:38 AM]
[Jun 11, 2011 8:32:54 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid work units after power failure

I wasn't saying that c4cw didn't checkpoint.
I was saying that it does not use the checkpoints that it has created.
That is part of the problem.

Brazoria999

[Jun 11, 2011 11:50:51 AM]