World Community Grid Forums
Category: Completed Research Forum: FightAIDS@Home Phase 2 Thread: All jobs are failing with Invalid |
Thread Status: Active Total posts in this thread: 47
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline Project Badges: |
"The Skylake bug is an AVX condition. No science app uses AVX instructions."
If you are referring to WCG, that is so. But Asteroids uses AVX2 quite nicely; it is a big speed improvement there. And the optimized app for DENIS shows a x10 speed improvement over the basic app, though they are now transitioning to a new basic app entirely. And some of the others are (barely) beginning to think about it; I think that Einstein is one, though nothing definite yet.
And while we are on the subject, the days of big speed improvements from one CPU generation to the next are over and done with for good. The basic semiconductor physics of the current CMOS technology limits it. It will only be with the instruction set extensions (SSE, AVX, etc.) that improvements will be made. But not many software developers on the science side are very familiar with them, so it looks like it will be a slow transition. |
||
|
metalius
Cruncher Lithuania Joined: Aug 19, 2014 Post Count: 31 Status: Offline Project Badges: |
Every task is marked as Invalid if, at the moment progress reaches 100%, the PC is offline for some reason, for example because it has lost its Internet connection (this may happen quite often if the PC is connected over GSM).
I really do not know what to say to the project technicians. |
||
|
XSmeagolX
Senior Cruncher Joined: Nov 12, 2009 Post Count: 444 Status: Offline Project Badges: |
Hmmm...
I have one offline crunching PC... So on the weekend this PC doesn't have an internet connection... All 40 WUs went invalid when I connected this PC to the internet in the morning. So FAHB is not an option for offline crunching, I think....
----------------------------------------
WCG-Team Captain of Team SETI.Germany
(official Partner of World Community Grid) |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
"Hmmm... I have one offline-crunch-PC... So on weekend this PC doesn't have internet connection... All 40 WUs went invalid, when i connect this PC to the internet in the morning. So FAHB is not an option for offline crunching, I think...."
It's not a beautiful workaround, maybe preposterous to do, but since they run > 24 hours each [on my laptop], it is something that can be done without too much inconvenience:
A) Load up fresh on FAHB, e.g. for a 3 day buffer. [I have no trouble when I load up a fresh day's worth.]
B) When off-line, suspend each task that has passed 90% and is nearing 100%. (Eventually, you'll have a bunch of > 90% complete tasks.)
C) When the internet is back, let the transfer tab clear and the trickles report. (Watch until the event log stops reporting trickles, like the sample below, taken with extra log flags.)
3/14/2016 9:43:28 AM [sched_op] Starting scheduler request
3/14/2016 9:43:28 AM [trickle] read trickle file projects/www.worldcommunitygrid.org/trickle_up_FAH2_000081_avx38741_000096_0019_005_0_1457945004.xml
3/14/2016 9:43:28 AM Sending scheduler request: To send trickle-up message.
3/14/2016 9:43:28 AM Not requesting tasks: don't need (job cache full)
3/14/2016 9:43:28 AM [sched_op] CPU work request: 0.00 seconds; 0.00 devices
3/14/2016 9:43:30 AM Finished upload of FAH2_000081_avx38741_000096_0019_005_0_r2044393219_13
3/14/2016 9:43:31 AM Finished upload of FAH2_000081_avx38741_000096_0019_005_0_r2044393219_3
3/14/2016 9:43:31 AM Scheduler request completed
3/14/2016 9:43:31 AM [sched_op] Server version 701
3/14/2016 9:43:31 AM Project requested delay of 121 seconds
3/14/2016 9:43:31 AM [sched_op] Deferring communication for 00:02:01
D) Resume all suspended tasks and let them finish. (LAIM does not need to be on, as FAHB checkpoints frequently.)
Theory is, you get a wave of valids in one stats session. I've done it, unplanned, and all went valid... suspending them when getting past 90%.
[Not sure if the trickle handler insists on getting and processing them in sequence, but it seems Spock logic]. One problem may simply be that when trickle 10 comes in and the validator gets launched for the task, the trickle handler may not yet have churned through the previous ones [it does over 2.5 million a day]... a timing issue, but compressed versus uncompressed transmission was also mentioned as a problem. It seems that the thing they were going to address was not addressed [but lacking positive feedback of the "yes, we did cover that base" kind, it's a huge guess... maybe the techs did]. Summary: test if it works for you when going off-line [if you are willing to put up with it]: crunch to 90%, suspend the task, then go online and let the trickles report, then resume. |
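For anyone who would rather script steps B and D than click through the manager, here is a minimal sketch in Python. It assumes `boinccmd` is on the PATH and that `boinccmd --get_tasks` prints one `key: value` line per field, including `name`, `project URL` and `fraction done`; that output layout is an assumption, so check it against your own client first. The helper names (`parse_tasks`, `nearly_done`, `set_task_state`, `main`) and the `FAH2_` name prefix filter are made up for illustration.

```python
import subprocess


def parse_tasks(text):
    """Parse `boinccmd --get_tasks` style text into a list of dicts.

    Assumption: each task is a run of "key: value" lines and every task
    begins with a "name:" line. Lines without ": " are ignored.
    """
    tasks = []
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(": ")
        if not sep:
            continue  # headers and separators such as "1) -----------"
        if key == "name":
            tasks.append({})  # a "name:" line opens a new task record
        if tasks:
            tasks[-1][key] = value
    return tasks


def nearly_done(tasks, threshold=0.90, prefix="FAH2_"):
    """Return the tasks whose name starts with prefix and that are past threshold."""
    selected = []
    for task in tasks:
        try:
            fraction = float(task.get("fraction done", "0"))
        except ValueError:
            continue  # unparsable progress value: leave the task alone
        if task.get("name", "").startswith(prefix) and fraction >= threshold:
            selected.append(task)
    return selected


def set_task_state(task, op):
    """Run `boinccmd --task <project URL> <task name> <op>`; op is "suspend" or "resume"."""
    subprocess.run(
        ["boinccmd", "--task", task["project URL"], task["name"], op],
        check=True,
    )


def main(op="suspend"):
    """Suspend (step B) or resume (step D) every task that is past 90%."""
    result = subprocess.run(
        ["boinccmd", "--get_tasks"], capture_output=True, text=True, check=True
    )
    for task in nearly_done(parse_tasks(result.stdout)):
        set_task_state(task, op)
```

Calling `main("suspend")` periodically while off-line covers step B, and `main("resume")` once the trickles have reported covers step D. The script does not check that the trickles actually went through; that part still needs a look at the event log.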
||
|
XSmeagolX
Senior Cruncher Joined: Nov 12, 2009 Post Count: 444 Status: Offline Project Badges: |
Hi Sek..
This may work when having direct access to the computer... But on the weekend, I don't have direct access... My direct access ends every working day at about 16:00... :D
----------------------------------------
WCG-Team Captain of Team SETI.Germany
(official Partner of World Community Grid) |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Certainly Monday to Thursday you could start fresh FAHB tasks before going home... I don't know whether your office machine is slow enough not to hit 100% in what I presume is something like a 16:00 to 08:00 away period. BOINC can also be set to run on a custom daily schedule.
If scripting/scheduling is an option, boinccmd can be used to read in different configs at different times. Otherwise, yes, having an office machine volunteered is nice, but I would not let myself be distracted by BOINC. Choose whatever is convenient among the other projects. Supply is limited, so the tasks in the feeder will get crunched anyhow. |
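As a rough illustration of the "different configs at different times" idea, here is a small Python sketch meant to be run from cron or Task Scheduler. It assumes two prepared preference files and a BOINC data directory at the paths shown; all of those paths and the profile names are hypothetical and need adjusting. The only client call used is `boinccmd --read_global_prefs_override`, which asks the client to re-read `global_prefs_override.xml` from its data directory.

```python
import shutil
import subprocess
from datetime import datetime, time
from pathlib import Path

# Hypothetical locations: adjust to your own installation.
DATA_DIR = Path("/var/lib/boinc-client")
PROFILES = {
    "office": Path("prefs_office.xml"),  # e.g. settings for working hours
    "away": Path("prefs_away.xml"),      # e.g. settings for the unattended period
}


def pick_profile(now, start=time(8, 0), end=time(16, 0)):
    """Return "office" between start (inclusive) and end (exclusive), else "away"."""
    return "office" if start <= now < end else "away"


def apply_profile(name):
    """Install the chosen override file and tell the client to re-read it."""
    shutil.copyfile(PROFILES[name], DATA_DIR / "global_prefs_override.xml")
    subprocess.run(["boinccmd", "--read_global_prefs_override"], check=True)


def run_once():
    """Pick the profile for the current local time and apply it."""
    apply_profile(pick_profile(datetime.now().time()))
```

Scheduling `run_once()` shortly after 08:00 and again around 16:00 would switch profiles at the boundaries of the office day described above. What goes into each preference file is left open here.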
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi.
I've only just started getting some tasks from this project. This one took over 13 hrs and has gone invalid, yet the result file on my user page looks the same as a valid one. The rig is a Haswell-R Xeon running Ubuntu 14.04 LTS x64. The task has now been resent to two other hosts. Created: 04/23/2016 20:15:44 Name: FAH2_000077_avx38672_000072_0022_021. |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
"It has now been resent to two other hosts."?
FAH2 is truly zero redundant, so I'm not sure how to understand it being resent to 2 other hosts because of yours going invalid. Maybe if you post a copy of the distribution page, we can reconstruct why those 2 extra copies were issued. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi SekeRob.
You asked for it.
Project Name: FightAIDS@Home - Phase 2
Created: 04/23/2016 20:15:44
Name: FAH2_000077_avx38672_000072_0022_021
Minimum Quorum: 1
Replication: 2
Result Name | OS type | OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
FAH2_000077_avx38672_000072_0022_021_2-- | Linux | 3.10.0-327.4.5.el7.x86_64 | - | In Progress | 4/25/16 05:03:35 | 4/29/16 05:03:35 | 8.32 | 116.6 / 0.0
FAH2_000077_avx38672_000072_0022_021_1-- | Linux | 3.19.0-20-generic | - | In Progress | 4/25/16 05:03:31 | 4/29/16 05:03:31 | 3.63 | 77.7 / 0.0
FAH2_000077_avx38672_000072_0022_021_0-- | Linux | 3.16.0-70-generic | 715 | Invalid | 4/24/16 02:50:20 | 4/25/16 05:02:00 | 13.06 | 490.2 / 0.0 === ME. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi.
After getting another 3 tasks marked as Invalid for no reason that I can see, with over 40 hrs gone down the drain, I'm going to pull the plug, so to speak, as I don't want to waste more time on this project. And the tasks have all been resent twice right away. Is there something wrong with the validator? |
||
|
|