World Community Grid Forums
Thread Status: Active Total posts in this thread: 37
dividedbymyself
Cruncher Joined: Aug 10, 2008 Post Count: 43 Status: Offline |
Hi,
I've already returned several results that get reported as invalid in my result status, although they seem to finish without error in BOINC. Some curious things happen before that, though. First of all, my computer is pretty slow: 1300 MHz, 386 MB. An average HPF2 WU takes about 25 hours to finish. I've got the duration time set to 2 hours. When a task is almost finished and already shows 100%, it stops and waits for another turn. I read that's because of a checkpoint near the end. After it starts again it takes just a few moments to complete, and the result is returned to the server. But when I check the result, it is marked as invalid.
Could the invalid result have anything to do with the way the WU ends as I described above? If so, is there something I can do to prevent this from happening? Or is my crunch box just too slow? Any suggestions? I really don't mind the credits I lose because of this, but losing 25 hours per WU is quite a lot when many of my results add nothing to the project.
Bart |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello dividedbymyself,
From your discussion of duration, it sounds as though you are running some non-WCG projects as well. The first thing that comes to mind is that you might be running Vista 64. It has been reported to give problems with HPF2: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=22661
In any event, the first thing to do is to give us more information. If you could post your Messages, that would give us an idea about your system.
Lawrence |
||
|
dividedbymyself
Cruncher Joined: Aug 10, 2008 Post Count: 43 Status: Offline |
Hi Lawrence,
I am running several other (non-WCG) BOINC projects as well on the same computer. The OS is WinXP-SP3, on an AMD Athlon (K3 if I remember well) 32-bit CPU. Memory is 384 MB btw, not 386 ;) Need more info?
Bart
[Edit 1 times, last edit by dividedbymyself at Jan 22, 2009 9:40:16 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi dividedbymyself,
On your Results Status page, if you click on 'Invalid' for a work unit, you will get a Result Log for that work unit. Is there a difference between valid and invalid results? Also, are you getting any Valid results for WCG projects? If so, is there any pattern, such as all HPF2 jobs failing while FAAH jobs always succeed? For that matter, when was the last time an HPF2 job was marked Valid? (Not a trick question. I am just trying to find out whether this is an intermittent problem, one that always happens, or one that happens if and only if you switch projects after reaching 100% but before returning the results.)
Lawrence |
||
|
dividedbymyself
Cruncher Joined: Aug 10, 2008 Post Count: 43 Status: Offline |
I only have two WUs in my history: one from HPF2 and another from TCEP that's running now on my other computer. All the rest have disappeared already.
This is what the last HPF2 result shows:
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<stderr_txt>
called boinc_finish
called boinc_finish
</stderr_txt>
]]>
There seems to be no error here, but maybe you see things differently? Most HPF2 results were fine in the past, but I do not check the results regularly, so to be honest I can't tell how many were invalid. I don't remember seeing many errors when I did check. What I do know is that of the last 3, two were invalid. I think they were all returned this month.
On my other computer (WinXP SP3 on AMD-K6 64) I currently run TCEP, and I remember having had only one failure/invalid result, the first one. The rest were OK, but there is no history of that anymore either, so I can't be 100% sure. I'm sorry I can't give much more information, but the history page does not go back very far, and because I don't check the result page very often I just don't remember them all.
Bart |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi dividedbymyself,
I will compare with my HPF2 result logs once the statistics finish updating. I don't remember for sure, but I think that a double call to boinc_finish is unusual. It may turn out to be a problem with switching after 100%. Lawrence |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Just checked. All 3 of my HPF2 results call boinc_finish just once. So I am going to guess that the problem occurred because BOINC switched to a different project after reaching 100%. I remember somebody else who had a problem at the same point, but I cannot remember what project he was running or just what the problem was.
I don't have a solution. Just from curiosity, have you checked 'Leave in memory' in your profile? Lawrence |
||
|
dividedbymyself
Cruncher Joined: Aug 10, 2008 Post Count: 43 Status: Offline |
"Just from curiosity, have you checked 'Leave in memory' in your profile?"
I don't have that option checked, as I think it consumes too much memory: I'm crunching 5 to 8 projects on that computer, depending on the availability of work.
But out of curiosity... Why can't the checkpoint just get skipped at the end of a WU? Finished is finished, I suppose. I have to assume a lot about the inner workings of the app and I'm probably oversimplifying, but I assume that there's a checkpoint at the end of some sort of loop, so that intermediate results are written to file and the app can restart from there the next time it loads. But when it knows the WU is finished, it could also be told to skip the 100% checkpoint, write the results to file by loop-independent means, and then send them back to base.
But you think that leaving the app in memory could resolve the issue? Wouldn't it affect my available memory, as there are going to be at least 5 apps in memory, depending on available work and how often I restart BOINC or reboot? Btw, I reboot daily, so in the end it could still affect HPF2.
Hmm, lots of questions here... And as far as I can see there's no fail-safe solution.
Bart |
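The idea in the post above (checkpoint inside the loop, but skip the checkpoint once the work unit is finished) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it is not how the HPF2 application or the BOINC API actually works, and the function name `run_work_unit`, the file paths and the `checkpoint_every` parameter are all hypothetical.

```python
import json
import os
import tempfile


def run_work_unit(total_steps, checkpoint_path, result_path, checkpoint_every=10):
    """Hypothetical work-unit loop: checkpoint periodically, but not after the final step."""
    # Resume from an earlier checkpoint if one exists.
    state = {"step": 0, "acc": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    checkpoints_written = 0
    while state["step"] < total_steps:
        state["acc"] += state["step"]  # stand-in for the real computation
        state["step"] += 1

        finished = state["step"] == total_steps
        # Skip the checkpoint when the last step has just been completed.
        if not finished and state["step"] % checkpoint_every == 0:
            with open(checkpoint_path, "w") as f:
                json.dump(state, f)
            checkpoints_written += 1

    # Finished: write the final result directly instead of checkpointing at 100%.
    with open(result_path, "w") as f:
        json.dump({"result": state["acc"]}, f)
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return state["acc"], checkpoints_written
```

With `total_steps=30` and `checkpoint_every=10`, this sketch writes checkpoints after steps 10 and 20 only, then goes straight to the result file. Whether a real science application could be restructured this way is a question for its developers.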
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
The project default switch is 60 minutes. You're much better served with e.g. 240 minutes. The chance of the scheduler allowing the task to finish and pack up is then much greater. I think the current behaviour is a flaw of much too eager pre-emptive scheduling.
I proposed a test to say, e.g., if > 99.5% done, let it complete and pack up for transmission, but I doubt it was heard by the developers. They have very selective hearing actually, because it was reported multiple times on their forums, and possibly a Trac ticket has existed for even longer.
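The proposed test can be written down as a small decision rule. The Python below is a minimal sketch of that proposal only; it is not BOINC's scheduler code, and the function name `should_preempt` and its parameters are hypothetical. The 60-minute default and the 99.5% threshold are the figures mentioned in this post.

```python
def should_preempt(fraction_done, minutes_since_switch,
                   switch_interval_minutes=60.0, near_done_threshold=0.995):
    """Hypothetical rule: never switch away from a task that is almost complete."""
    if fraction_done > near_done_threshold:
        # More than 99.5% done: let it complete and pack up for transmission.
        return False
    # Otherwise switch once the configured interval has elapsed.
    return minutes_since_switch >= switch_interval_minutes
```

For example, `should_preempt(0.999, 120)` returns `False`, so a task at 99.9% keeps running even though its turn is long over, while `should_preempt(0.5, 60)` returns `True`.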
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi dividedbymyself,
In your situation, I would not leave anything in memory either. The correct way to handle this problem is to change BOINC so that it does not switch projects when it is so near the end. I do not expect this correction to be made in the near future with so much else being done to BOINC. So: just another extremely low-occurrence bug to put up with.
Lawrence |
||
|