Total posts in this thread: 48
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: Lots of errors

I agree with not imposing something else. However, the end result is the same whether the server removes that science from the profile or the error rate reduces the queue to zero. The only difference is a lower load of repair units.

People who are chasing one specific project usually are aware enough to know when their computer is misbehaving. At the very least, they should understand the consequences of de-selecting "send other work".

There are always situations where the rules are not appropriate. In this case, I think those situations are rare enough to justify the proposed approach.
----------------------------------------

Distributed computing volunteer since September 27, 2000
[Aug 21, 2011 1:14:21 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of errors

I have another computer that had not run C4CW for a long time, so I tried downloading C4CW onto it and got the same problem. I tried all the suggestions (detach, reboot, etc.) but had no luck. So I have two reliable computers able to run any project except C4CW.
On the plus side, that is plenty to keep my computers busy.
[Aug 21, 2011 8:27:03 AM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: Lots of errors

Short predefined test work units, say 10 minutes long, could be sent out for each project to ascertain which task types are failing. The server would then send only tasks that should work, chosen from the user's selections for that system's profile or, failing that, from any of the user's profiles. If neither is possible, the task type with the lowest overhead (bandwidth and RAM usage) should be the default. At least this way the user will improve their credit rating and eventually start getting their chosen work again, after a set period or a required improvement.
I suppose such systems could go through the same loop repeatedly, so each time round the loop the requirement should increase before normal selection is allowed again.
There was a project opt-out discussion some time ago, and an order-of-preference discussion. Perhaps a user could rank projects from favorite down; in this situation, the emergency profile would try from the top of the user's preferences first and work down if need be.
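
To make the fallback order above concrete, here is a rough Python sketch. It is only an illustration of the suggestion, not how the WCG scheduler works; the names `TaskType` and `pick_task_type`, and the idea of ranking overhead by bandwidth first and RAM second, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TaskType:
    name: str
    bandwidth_mb: float  # hypothetical per-task download size
    ram_mb: float        # hypothetical memory footprint


def pick_task_type(passing, device_profile, other_profiles, all_types):
    """Pick what to send to a host that is in 'emergency' mode.

    passing: set of task-type names whose short test units succeeded.
    device_profile: names chosen for this system, favorite first.
    other_profiles: names from the user's other profiles, favorite first.
    all_types: every TaskType on offer.
    """
    # 1. Prefer the user's own choice for this system, top preference first.
    for name in device_profile:
        if name in passing:
            return name
    # 2. Failing that, try anything from the user's other profiles.
    for name in other_profiles:
        if name in passing:
            return name
    # 3. Otherwise default to the passing type with the lowest overhead.
    candidates = [t for t in all_types if t.name in passing]
    if not candidates:
        return None  # no task type currently works on this host
    return min(candidates, key=lambda t: (t.bandwidth_mb, t.ram_mb)).name
```

The escalating requirement on each pass through the loop is left out here; it would sit around this function, deciding when the host may leave emergency mode.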

You might be surprised how many people have been caught out crunching for only one WCG project. Even experienced crunchers have fallen foul of this problem. For badge hunters with more than one system it's more likely to occur.

Came across a similar problem (continuous c4cw failures) myself yesterday:
c4cw_ target04_ 096838345_ 1-- dualcore Error 20/08/11 14:48:23 20/08/11 14:49:44 0.00 0.0 / 0.0
c4cw_ target04_ 096834702_ 0-- dualcore Error 20/08/11 14:47:03 20/08/11 14:48:23 0.00 0.0 / 0.0
c4cw_ target04_ 096828235_ 0-- dualcore Error 20/08/11 14:36:19 20/08/11 14:37:48 0.00 0.0 / 0.0
c4cw_ target04_ 096828762_ 0-- dualcore Error 20/08/11 14:34:52 20/08/11 14:36:17 0.00 0.0 / 0.0

It's ironic that running other tasks might have caused this, because it's recommended to crunch more than one WCG project on each system; if only c4cw had been selected this might not have happened! c4cw has a fairly high bandwidth requirement, so failures are unfortunate, but at least they fail immediately. Anyway, there is little danger of that system being out of work, and the general rule of crunching more than one project still stands. For now that profile has been edited to be without c4cw.
----------------------------------------
[Edit 1 times, last edit by skgiven at Aug 21, 2011 3:04:07 PM]
[Aug 21, 2011 2:22:25 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of errors

I have a similar problem. On Friday I attempted to bring up a new machine. I installed BOINC 6.10.58 from WCG and attached to WCG. WCG then downloaded lots of files, including four work units for C4CW. I found these errors in the message queue:

File c4cw.target04.water-filled-filtered has wrong size: expected 12642142, got 3153468
Checksum or signature error for c4cw.target04.water-filled-filtered

The four work units errored off immediately. I think the above errors are the most likely root cause.
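
For anyone who wants to check a suspect file by hand, a small Python sketch along these lines would do it. This is not BOINC's own code; `check_download` is a made-up helper, and using MD5 for the checksum is an assumption, since the messages above do not say which checksum is used.

```python
import hashlib
import os


def check_download(path, expected_size, expected_md5=None):
    """Return a list of problems found with a downloaded file (empty if OK)."""
    problems = []
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        problems.append(f"wrong size: expected {expected_size}, got {actual_size}")
    if expected_md5 is not None:
        digest = hashlib.md5()
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large data files are not loaded at once.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected_md5.lower():
            problems.append("checksum mismatch")
    return problems
```

Called as `check_download("c4cw.target04.water-filled-filtered", 12642142)` on the truncated copy, it would report the same size mismatch as the message log.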

Now my new machine has a bad reputation, and it can’t download any more work for 24 hours. After dropping C4CW, on Saturday I downloaded four new work units, three HCC and one FAAH. These all ran to completion and have been validated, but the new machine is currently serving a second 24-hour suspension.

How long will this machine remain on a 24-hour quota of four work units?
[Aug 21, 2011 3:04:20 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of errors

I have a similar problem. On Friday I attempted to bring up a new machine. I installed BOINC 6.10.58 from WCG and attached to WCG. WCG then downloaded lots of files, including four work units for C4CW. I found these errors in the message queue:

File c4cw.target04.water-filled-filtered has wrong size: expected 12642142, got 3153468
Checksum or signature error for c4cw.target04.water-filled-filtered

The four work units errored off immediately. I think the above errors are the most likely root cause.

Now my new machine has a bad reputation, and it can’t download any more work for 24 hours. After dropping C4CW, on Saturday I downloaded four new work units, three HCC and one FAAH. These all ran to completion and have been validated, but the new machine is currently serving a second 24-hour suspension.
How long will this machine remain on a 24-hour quota of four work units?

On the bolded bit, your machine should get an instant doubling of the quota as soon as these WUs have been returned successfully.
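
As a rough illustration of how quickly doubling recovers a quota, here is a tiny Python sketch. It assumes one doubling per successful return and a cap (`max_quota`) whose real value is not given in this thread, so both the function and the numbers are hypothetical.

```python
def quota_after_successes(start_quota, successes, max_quota):
    """Daily quota after a run of successful returns, doubling each time."""
    quota = start_quota
    for _ in range(successes):
        # Double on each success, but never exceed the (assumed) cap.
        quota = min(quota * 2, max_quota)
    return quota
```

Starting from a quota of 4 with a made-up cap of 80, three successful returns would go 4, 8, 16, 32.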

Can't do much about that filtered file. Ingleside noted it's a sticky, so I guess it is best deleted manually. Why it's a sticky file, not removed during a project reset, I don't know. To me it's contrary to being able to quickly restore a possibly corrupt data/science application set.

--//--
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 21, 2011 3:16:02 PM]
[Aug 21, 2011 3:15:02 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of errors

Thanks, SekeRob. I deleted the file. I forgot to mention that I am running C4CW on three other machines without any problems. It appears that this problem only affects machines attempting to add C4CW.

I hope the techs will be able to take a look at this problem and enable machines to join in running C4CW.
[Aug 21, 2011 3:28:25 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of errors

That suggests, together with the past comms on compression check failures and the -200 error, that the server-stored file needs closer inspection. Sorry about that... Dirty Harry might want to go really dirty and copy the file from one machine to another with the same OS and bit size. Who knows, this uncharted territory might work, but dirty it is, and for desperation only.

--//--
[Aug 21, 2011 3:35:27 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of errors

I have already copied the c4cw.target04.water-filled-filtered file from my working machine, and downloads are now successful and work units are running. I assume this should be OK, but I am waiting for the units to complete successfully, which will take a few more hours.
[Aug 21, 2011 4:04:51 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of errors

Already :)))))) Well, there is that teleconnection phenomenon going around the planet in the hyperwave band :O)

Make sure to let us know how things finish up.

--//--
[Aug 21, 2011 4:13:53 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of errors

Update: This problem has now struck another machine, one which has been running C4CW for several months with no errors. The first sign of a problem is shown in the following messages:

8/21/2011 12:10:26 PM World Community Grid Computation for task c4cw_target04_097580182_0 finished
8/21/2011 12:10:26 PM World Community Grid Starting c4cw_target04_097714117_0
8/21/2011 12:10:27 PM World Community Grid Starting c4cw_target04_097708619_0
8/21/2011 12:10:27 PM World Community Grid Starting c4cw_target04_097788936_0
8/21/2011 12:10:27 PM World Community Grid Starting c4cw_target04_097795719_0
8/21/2011 12:10:28 PM World Community Grid Computation for task c4cw_target04_097714117_0 finished
8/21/2011 12:10:28 PM World Community Grid Output file c4cw_target04_097714117_0_0 for task c4cw_target04_097714117_0 absent
8/21/2011 12:10:28 PM World Community Grid Computation for task c4cw_target04_097708619_0 finished
8/21/2011 12:10:28 PM World Community Grid Output file c4cw_target04_097708619_0_0 for task c4cw_target04_097708619_0 absent
8/21/2011 12:10:28 PM World Community Grid Computation for task c4cw_target04_097788936_0 finished
8/21/2011 12:10:28 PM World Community Grid Output file c4cw_target04_097788936_0_0 for task c4cw_target04_097788936_0 absent
8/21/2011 12:10:28 PM World Community Grid Computation for task c4cw_target04_097795719_0 finished
8/21/2011 12:10:28 PM World Community Grid Output file c4cw_target04_097795719_0_0 for task c4cw_target04_097795719_0 absent
8/21/2011 12:10:29 PM World Community Grid Started download of c4cw.target04.water-filled-filtered
8/21/2011 12:10:29 PM World Community Grid Started upload of c4cw_target04_097580182_0_0
8/21/2011 12:10:32 PM World Community Grid Finished download of c4cw.target04.water-filled-filtered
8/21/2011 12:10:32 PM World Community Grid [error] File c4cw.target04.water-filled-filtered has wrong size: expected 12642142, got 3153468
8/21/2011 12:10:32 PM World Community Grid [error] Checksum or signature error for c4cw.target04.water-filled-filtered
8/21/2011 12:10:33 PM World Community Grid Finished upload of c4cw_target04_097580182_0_0

I have dropped C4CW from all machines for now.
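
In case it helps anyone sift through a long message log, a short Python sketch like the following could pull out the `[error]` lines and the tasks whose output file was reported absent. The `summarize` helper and its regular expressions are my own guesses based only on the message format shown above, not anything provided by BOINC.

```python
import re

# Patterns inferred from the messages quoted above (an assumption).
ERROR_RE = re.compile(r"\[error\] (?P<msg>.+)$")
ABSENT_RE = re.compile(r"Output file (?P<file>\S+) for task (?P<task>\S+) absent$")


def summarize(lines):
    """Return (error messages, tasks with an absent output file)."""
    errors, absent_tasks = [], []
    for line in lines:
        line = line.strip()
        m = ERROR_RE.search(line)
        if m:
            errors.append(m.group("msg"))
            continue
        m = ABSENT_RE.search(line)
        if m:
            absent_tasks.append(m.group("task"))
    return errors, absent_tasks
```

Fed the log above, it should list the four tasks that finished within a second of starting, plus the size and checksum errors for c4cw.target04.water-filled-filtered.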
[Aug 21, 2011 7:31:05 PM]