Thread Status: Active
Total posts in this thread: 47
This topic has been viewed 7351 times and has 46 replies
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Re: All jobs are failing with Invalid

Quote: "The Skylake bug is an AVX condition. No science app uses AVX instructions."
If you are referring to WCG, that is so. But Asteroids uses AVX2 quite nicely; it is a big speed improvement there. And the optimized app for DENIS shows a 10x speed improvement over the basic app, though they are now transitioning to an entirely new basic app. And some of the others are (barely) beginning to think about it; I think Einstein is one, though nothing is definite yet.

And while we are on the subject, the days of big speed improvements from one CPU generation to the next are over and done with for good. The basic semiconductor physics of current CMOS technology limits them. From now on, improvements will come only from the instruction set extensions (SSE, AVX, etc.). But not many software developers on the science side are very familiar with them, so it looks like it will be a slow transition.
[Feb 1, 2016 8:16:11 PM]
metalius
Cruncher
Lithuania
Joined: Aug 19, 2014
Post Count: 31
Re: All jobs are failing with Invalid

Every task is marked Invalid if the PC is offline at the moment it reaches 100% progress, for example because it has lost its Internet connection (which can happen quite often if the PC is connected over GSM).
I really do not know what to say to the project technicians.
----------------------------------------

[Mar 7, 2016 6:23:15 PM]
XSmeagolX
Senior Cruncher
Joined: Nov 12, 2009
Post Count: 444
Re: All jobs are failing with Invalid

Hmmm...
I have one offline crunching PC...
On weekends this PC doesn't have an Internet connection...
All 40 WUs went invalid when I connected this PC to the Internet in the morning.

So FAHB is not an option for offline crunching, I think...
----------------------------------------
WCG-Team Captain of Team SETI.Germany

(official Partner of World Community Grid)

[Mar 14, 2016 8:41:00 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: All jobs are failing with Invalid

Quote:
Hmmm...
I have one offline crunching PC...
On weekends this PC doesn't have an Internet connection...
All 40 WUs went invalid when I connected this PC to the Internet in the morning.

So FAHB is not an option for offline crunching, I think...

It's not a beautiful workaround, maybe even a preposterous one, but since these tasks run more than 24 hours each [on my laptop], it can be done without too much inconvenience:

A) Load up fresh FAHB work, e.g. a 3-day buffer. (I have no trouble when I load up a fresh day's worth.)
B) While offline, suspend each task once it has passed 90% and is nearing 100%. (Eventually, you'll have a bunch of tasks more than 90% complete.)
C) When the Internet connection is back, let the Transfers tab clear and the trickles report. (Watch the event log until it stops reporting trickles, as in the sample below, captured with extra log flags enabled.)
3/14/2016 9:43:28 AM [sched_op] Starting scheduler request
3/14/2016 9:43:28 AM [trickle] read trickle file projects/www.worldcommunitygrid.org/trickle_up_FAH2_000081_avx38741_000096_0019_005_0_1457945004.xml
3/14/2016 9:43:28 AM Sending scheduler request: To send trickle-up message.
3/14/2016 9:43:28 AM Not requesting tasks: don't need (job cache full)
3/14/2016 9:43:28 AM [sched_op] CPU work request: 0.00 seconds; 0.00 devices
3/14/2016 9:43:30 AM Finished upload of FAH2_000081_avx38741_000096_0019_005_0_r2044393219_13
3/14/2016 9:43:31 AM Finished upload of FAH2_000081_avx38741_000096_0019_005_0_r2044393219_3
3/14/2016 9:43:31 AM Scheduler request completed
3/14/2016 9:43:31 AM [sched_op] Server version 701
3/14/2016 9:43:31 AM Project requested delay of 121 seconds
3/14/2016 9:43:31 AM [sched_op] Deferring communication for 00:02:01

D) Resume all suspended tasks and let them finish. (LAIM, "Leave applications in memory while suspended", does not need to be on, as FAHB checkpoints frequently.)
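Steps B and D could also be scripted with boinccmd instead of clicking through each task. Below is a minimal Python sketch. It assumes that the `boinccmd --get_tasks` listing contains `name:`, `project URL:` and `fraction done:` lines for each task; check this against your client version. It uses `boinccmd --task <URL> <name> suspend|resume`. The `FAH2_` prefix filter, the 0.90 threshold and the helper names are illustrative choices, not anything the project provides.

```python
import subprocess
import sys

# Assumption: `boinccmd --get_tasks` prints "name:", "project URL:" and
# "fraction done:" lines for each task; verify this against your client.


def parse_tasks(text):
    """Collect name, project URL and fraction done for each task in the listing."""
    tasks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            # A new task block starts at its "name:" line ("WU name:" does not match).
            current = {"name": line.split(":", 1)[1].strip(), "url": None, "fraction_done": 0.0}
            tasks.append(current)
        elif current is not None and line.startswith("project URL:"):
            current["url"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("fraction done:"):
            current["fraction_done"] = float(line.split(":", 1)[1])
    return tasks


def select_tasks(tasks, prefix="FAH2_", threshold=0.90):
    """Tasks with the given name prefix at or past the threshold (illustrative defaults)."""
    return [t for t in tasks
            if t["name"].startswith(prefix) and t["url"] and t["fraction_done"] >= threshold]


def task_command(task, op):
    """Build `boinccmd --task <project URL> <task name> <op>`, op being "suspend" or "resume"."""
    return ["boinccmd", "--task", task["url"], task["name"], op]


def main(op):
    listing = subprocess.run(["boinccmd", "--get_tasks"], capture_output=True,
                             text=True, check=True).stdout
    for task in select_tasks(parse_tasks(listing)):
        subprocess.run(task_command(task, op), check=True)


# Only acts when explicitly called as: python3 script.py suspend|resume
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("suspend", "resume"):
    main(sys.argv[1])
```

Run it with `suspend` while offline (step B) and with `resume` once the trickles have reported (step D). The filter selects every FAH2 task at or above the threshold, so the resume call may also touch tasks that were never suspended. That should be harmless, but it is worth watching the first time.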

The theory is that you get a wave of valids in one stats session.

I've done it, unplanned, and all went valid: I had suspended them once they got past 90%. [I'm not sure whether the trickle handler insists on receiving and processing trickles in sequence, but it seems Spock logic.] One problem may simply be that when trickle 10 comes in and the validator is launched for the task, the trickle handler may not yet have churned through the preceding trickles [it handles over 2.5 million a day]... a timing issue. Compressed versus uncompressed transmission was also mentioned as a problem. It seems the thing they said they would address was not addressed [but lacking positive feedback such as "yes, we did cover that base", that's a huge guess... maybe the techs did].

Summary: if you are willing to put up with it, test whether this works for you when going offline: crunch to 90%, suspend the task, then go online and let the trickles report, then resume.
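Step C, waiting until the event log stops reporting trickles, could also be checked from a saved copy of the log. Below is a rough Python sketch. It assumes log lines in the format of the sample above. The `[sched_op]` and `[trickle]` lines presumably come from the `sched_op_debug` and `trickle_debug` log flags in cc_config.xml. `trickle_status` is a hypothetical helper that lists the trickle files read and reports whether a completed scheduler request follows the last of them.

```python
import re

# Matches lines like:
# "... [trickle] read trickle file projects/<project dir>/trickle_up_<...>.xml"
TRICKLE_READ = re.compile(r"\[trickle\] read trickle file (\S+)")


def trickle_status(lines):
    """Return (trickle file names read, whether a completed scheduler request follows the last read)."""
    files = []
    last_read = -1
    last_done = -1
    for i, line in enumerate(lines):
        match = TRICKLE_READ.search(line)
        if match:
            # Keep only the file name, dropping the projects/... directory part.
            files.append(match.group(1).rsplit("/", 1)[-1])
            last_read = i
        elif "Scheduler request completed" in line:
            last_done = i
    return files, last_done > last_read


def load_log(path):
    """Read a saved event log (for example a copy of stdoutdae.txt) as a list of lines."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read().splitlines()
```

This sketch only looks at ordering in the log. It cannot tell whether the server has actually processed the trickles, which is exactly the timing question raised above.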
[Mar 14, 2016 9:20:09 AM]
XSmeagolX
Senior Cruncher
Joined: Nov 12, 2009
Post Count: 444
Re: All jobs are failing with Invalid

Hi Sek..

This may work when I have direct access to the computer...
But on weekends, I don't have direct access...
My direct access ends every working day at about 16:00... :D
[Mar 14, 2016 10:04:02 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: All jobs are failing with Invalid

Certainly from Monday to Thursday you could start fresh FAHB tasks before going home... I don't know whether your office machine is slow enough not to hit 100% during what I presume is something like a 16:00 to 08:00 away period. BOINC can also be given a custom daily schedule.

If scripting/scheduling is an option for you, boinccmd can be used to read in different configs at different times. Otherwise, yes, having an office machine volunteered is nice, but I would not let myself be distracted by BOINC; choose whatever is convenient among the other projects. FAHB work is in limited supply, so the tasks in the feeder will get crunched anyhow.
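One way to read in different configs at different times is to keep two global preference override files and switch between them from cron. Below is a minimal Python sketch with several assumptions. It assumes a Linux install whose data directory is /var/lib/boinc-client, the Debian/Ubuntu package default. It also assumes two hypothetical files, prefs_office.xml and prefs_away.xml, that you write yourself, for example with different `<work_buf_min_days>` values. The office hours used are illustrative. The script relies on `boinccmd --read_global_prefs_override` to make the client reload the copied file.

```python
import shutil
import subprocess
import sys
from datetime import datetime
from pathlib import Path

# Hypothetical locations: adjust to your BOINC data directory and your own files.
BOINC_DIR = Path("/var/lib/boinc-client")
PROFILES = {
    "office": BOINC_DIR / "prefs_office.xml",  # hypothetical file name
    "away": BOINC_DIR / "prefs_away.xml",      # hypothetical file name
}


def choose_profile(now, start_hour=8, end_hour=16):
    """Return 'office' on weekdays between start_hour and end_hour, otherwise 'away'."""
    if now.weekday() < 5 and start_hour <= now.hour < end_hour:
        return "office"
    return "away"


def apply_profile(name):
    # Copy the chosen file over global_prefs_override.xml, then ask the client to reload it.
    shutil.copyfile(PROFILES[name], BOINC_DIR / "global_prefs_override.xml")
    subprocess.run(["boinccmd", "--read_global_prefs_override"], check=True)


def main():
    apply_profile(choose_profile(datetime.now()))


# Only acts when explicitly asked: python3 script.py --apply
if __name__ == "__main__" and "--apply" in sys.argv[1:]:
    main()
```

A cron job running the script with `--apply` shortly after 08:00 and 16:00 would keep the client in step. Writing into the data directory usually needs the boinc user's or root's permissions.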
[Mar 14, 2016 10:32:43 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: All jobs are failing with Invalid

Hi.

I've only just started getting some tasks from this project. This one took over 13 hours and has gone invalid, yet the result file shown on my user page looks the same as a valid one.

The rig is a Haswell-R Xeon running Ubuntu 14.04 LTS x64.

It has now been resent to two other hosts.

Created: 04/23/2016 20:15:44
Name: FAH2_000077_avx38672_000072_0022_021
[Apr 25, 2016 5:32:13 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: All jobs are failing with Invalid

"It has now been resent to two other hosts."?

FAH2 runs with true zero redundancy, so I'm not sure how to understand it being resent to 2 other hosts because yours went invalid. Maybe if you post a copy of the distribution page, we can reconstruct why those 2 extra copies were issued.
[Apr 25, 2016 1:07:54 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: All jobs are failing with Invalid

Hi SekeRob.

You asked for it.

Project Name: FightAIDS@Home - Phase 2
Created: 04/23/2016 20:15:44
Name: FAH2_000077_avx38672_000072_0022_021
Minimum Quorum: 1
Replication: 2


Result Name | OS Type | OS Version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit

FAH2_000077_avx38672_000072_0022_021_2 | Linux | 3.10.0-327.4.5.el7.x86_64 | - | In Progress | 4/25/16 05:03:35 | 4/29/16 05:03:35 | 8.32 | 116.6 / 0.0

FAH2_000077_avx38672_000072_0022_021_1 | Linux | 3.19.0-20-generic | - | In Progress | 4/25/16 05:03:31 | 4/29/16 05:03:31 | 3.63 | 77.7 / 0.0

FAH2_000077_avx38672_000072_0022_021_0 | Linux | 3.16.0-70-generic | 715 | Invalid | 4/24/16 02:50:20 | 4/25/16 05:02:00 | 13.06 | 490.2 / 0.0   === ME.
[Apr 25, 2016 9:38:58 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: All jobs are failing with Invalid

Hi.

After getting another 3 tasks marked Invalid for no reason that I can see, with over 40 hours gone down the drain, I'm going to pull the plug, so to speak, as I don't want to waste more time on this project.

And the tasks have all been resent twice right away. Is there something wrong with the validator?
[Apr 26, 2016 1:12:20 AM]