| World Community Grid Forums
Thread Status: Active Total posts in this thread: 60
KieX
Cruncher Spain Joined: Dec 19, 2009 Post Count: 8 Status: Offline Project Badges:
|
Kevin,

Not sure if the information you requested was for this thread's problem or for the post regarding 'unexpected XML tag or syntax'. But with regards to the association of PV and not receiving any tasks, this is what my log shows for a similarly affected computer:

17/11/2012 02:51:59 | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 2202771; resource share 100
17/11/2012 02:52:00 | World Community Grid | Sending scheduler request: To fetch work.
17/11/2012 02:52:00 | World Community Grid | Requesting new tasks for CPU and ATI
17/11/2012 02:52:03 | World Community Grid | Scheduler request completed: got 1 new tasks
17/11/2012 02:52:03 | World Community Grid | Resent lost task X0960073631347200608011011_4
17/11/2012 02:52:03 | World Community Grid | [error] App version returned from anonymous platform project; ignoring
17/11/2012 02:52:03 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4
17/11/2012 02:58:19 | World Community Grid | Sending scheduler request: To fetch work.
17/11/2012 02:58:19 | World Community Grid | Requesting new tasks for ATI
17/11/2012 02:58:23 | World Community Grid | Scheduler request completed: got 1 new tasks
17/11/2012 02:58:23 | World Community Grid | Resent lost task X0960073631347200608011011_4
17/11/2012 02:58:23 | World Community Grid | [error] App version returned from anonymous platform project; ignoring
17/11/2012 02:58:23 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4

Unless I'm mistaken, the same WU that was initially discarded was subsequently sent again a few minutes later.

I hope this helps.
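For anyone who wants to check their own event log for the same pattern, here is a minimal Python sketch. It only assumes the two message formats visible in the log above ("Resent lost task ..." and "...; discarding ..."); the function names and the SAMPLE text are made up for illustration, and other client versions may word these messages differently.

```python
import re
from collections import Counter

# Message formats taken from the log excerpt in the post above.
RESENT = re.compile(r"Resent lost task (\S+)")
DISCARDED = re.compile(r"discarding (\S+)")


def count_task_events(log_text):
    """Count how often each task is resent and how often it is discarded."""
    resent, discarded = Counter(), Counter()
    for line in log_text.splitlines():
        match = RESENT.search(line)
        if match:
            resent[match.group(1)] += 1
        match = DISCARDED.search(line)
        if match:
            discarded[match.group(1)] += 1
    return resent, discarded


def looping_tasks(log_text):
    """Return tasks that were both resent and discarded more than once."""
    resent, discarded = count_task_events(log_text)
    return sorted(t for t in resent if resent[t] > 1 and discarded[t] > 1)


# Four lines copied from the log above, used as sample input.
SAMPLE = """\
17/11/2012 02:52:03 | World Community Grid | Resent lost task X0960073631347200608011011_4
17/11/2012 02:52:03 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4
17/11/2012 02:58:23 | World Community Grid | Resent lost task X0960073631347200608011011_4
17/11/2012 02:58:23 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4
"""

if __name__ == "__main__":
    print(looping_tasks(SAMPLE))
```

To run it on a real log, read the file contents into a string and pass that string to looping_tasks instead of SAMPLE.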
||
|
|
Tomahawk4196
Advanced Cruncher USA Joined: Aug 16, 2007 Post Count: 93 Status: Offline Project Badges:
|
I apologize for my ignorance, but where exactly is the 'messages log'? And where are the 'messages on your client'?

I see the 'event log' under the Advanced pulldown - is that the same thing? Thanks
||
|
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline Project Badges:
|
I found a workaround if you're using an app_info file, but you may need to have at least 1 WU of each version in your cache for it to work. That's the way I got it to work.

If you have removed the app_info file from the projects folder, use the text below to replace it. Shut down BOINC, then copy and paste the text into Notepad. This text is for my rigs to run 10 WUs at a time. Make the necessary changes for your rigs in the <avg_ncpus>0.80</avg_ncpus>, <max_ncpus>1.0</max_ncpus> and <count>.10</count> areas of the code for both apps. Rename the file app_info.xml, drop it back into the WCG project folder, restart BOINC and you should be good to go. I was able to get new 705 tasks after the 656 task had finished. Then some more 656 tasks showed up, so it seems to be working. *fingers crossed*

<app_info>
  <app>
    <name>hcc1</name>
    <user_friendly_name>Help Conquer Cancer</user_friendly_name>
  </app>
  <file_info>
    <name>wcg_hcc1_img_7.05_windows_intelx86__ati_hcc1</name>
    <executable/>
  </file_info>
  <file_info>
    <name>hcckernel.cl.7.05</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>hcc1</app_name>
    <version_num>705</version_num>
    <platform>windows_intelx86</platform>
    <plan_class>ati_hcc1</plan_class>
    <avg_ncpus>0.80</avg_ncpus>
    <max_ncpus>1.0</max_ncpus>
    <coproc>
      <type>ATI</type>
      <count>.10</count>
    </coproc>
    <file_ref>
      <file_name>wcg_hcc1_img_7.05_windows_intelx86__ati_hcc1</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>hcckernel.cl.7.05</file_name>
      <open_name>hcckernel.cl</open_name>
    </file_ref>
  </app_version>
  <app>
    <name>hcc1</name>
    <user_friendly_name>Help Conquer Cancer</user_friendly_name>
  </app>
  <file_info>
    <name>wcg_hcc1_img_6.56_windows_intelx86__ati_hcc1</name>
    <executable/>
  </file_info>
  <file_info>
    <name>hcckernel.cl.6.56</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>hcc1</app_name>
    <version_num>656</version_num>
    <platform>windows_intelx86</platform>
    <plan_class>ati_hcc1</plan_class>
    <avg_ncpus>0.80</avg_ncpus>
    <max_ncpus>1.0</max_ncpus>
    <coproc>
      <type>ATI</type>
      <count>.10</count>
    </coproc>
    <file_ref>
      <file_name>wcg_hcc1_img_6.56_windows_intelx86__ati_hcc1</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>hcckernel.cl.6.56</file_name>
      <open_name>hcckernel.cl</open_name>
    </file_ref>
  </app_version>
</app_info>
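Since a copy-and-paste slip in this file is easy to make, here is a rough Python sketch for sanity-checking an edited app_info.xml before restarting BOINC. It is not a BOINC tool: the function names are invented for this example, and it only checks two things, that the text is well-formed XML and that every <file_ref> names a file declared in a <file_info>. The tasks-per-GPU figure assumes, as the post implies, that a <count> of .10 lets ten tasks share one GPU. SAMPLE is a trimmed version of the 705 entry above.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of problems found in an app_info.xml string."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    declared = {fi.findtext("name") for fi in root.findall("file_info")}
    problems = []
    for av in root.findall("app_version"):
        version = av.findtext("version_num")
        for ref in av.findall("file_ref"):
            name = ref.findtext("file_name")
            if name not in declared:
                problems.append(f"version {version}: {name} has no <file_info>")
    return problems


def tasks_per_gpu(xml_text):
    """Map version_num to the number of tasks sharing one GPU (1 / <count>)."""
    root = ET.fromstring(xml_text)
    result = {}
    for av in root.findall("app_version"):
        count = float(av.findtext("coproc/count"))
        result[av.findtext("version_num")] = round(1 / count)
    return result


# Trimmed sample based on the 705 entry in the post above.
SAMPLE = """
<app_info>
  <app><name>hcc1</name></app>
  <file_info><name>wcg_hcc1_img_7.05_windows_intelx86__ati_hcc1</name><executable/></file_info>
  <file_info><name>hcckernel.cl.7.05</name><executable/></file_info>
  <app_version>
    <app_name>hcc1</app_name>
    <version_num>705</version_num>
    <coproc><type>ATI</type><count>.10</count></coproc>
    <file_ref><file_name>wcg_hcc1_img_7.05_windows_intelx86__ati_hcc1</file_name><main_program/></file_ref>
    <file_ref><file_name>hcckernel.cl.7.05</file_name><open_name>hcckernel.cl</open_name></file_ref>
  </app_version>
</app_info>
"""

if __name__ == "__main__":
    print(check_app_info(SAMPLE))
    print(tasks_per_gpu(SAMPLE))
```

An empty problem list only means the references line up; it does not prove the client will accept the file.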
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
||
|
|
BladeD
Ace Cruncher USA Joined: Nov 17, 2004 Post Count: 28976 Status: Offline Project Badges:
|
"I apologize for my ignorance, but where exactly is the 'messages log'? And where are the 'messages on your client'? I see the 'event log' under the Advanced pulldown - is that the same thing? Thanks"

Yes.
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
So here is the problem:

17/11/2012 02:58:23 | World Community Grid | [error] App version returned from anonymous platform project; ignoring
17/11/2012 02:58:23 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4

A job gets assigned that uses the old app version. The host is anonymous platform, so it ignores the app version sent. The host can't find an app version that matches the platform, version num and plan_class, so it discards the job.

The problem is that discarding the job does not report to the server that the client isn't running it. Thus the job is still assigned to you. On the next request to the server, you get sent the job again. This continues. Even worse, each time the job is sent to you, the deadline for the job is re-evaluated and possibly slightly increased. Thus it can potentially never pass its deadline. When I started digging into this problem today, there were a lot of computers that were repeatedly being resent the same jobs.

This issue occurs when all three of the following are in use: app_info.xml, homogeneous_app_version and resend_results.

Resolving this issue for the long run is going to be somewhat tricky. As a result, what I am doing now is changing the app_version on the workunits to all be at the 705 level. The new binaries are backward compatible, so this shouldn't be an issue. This should return life to normal for now.
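The loop described above can be illustrated with a toy Python model. This is not BOINC code: ToyScheduler, simulate, the task name and the three-day deadline window are all invented for the example. It only mimics the behaviour described in the post, where the server resends any assigned job the client does not report, pushes the deadline forward on each resend, and never hears about the client's discard.

```python
from dataclasses import dataclass, field

DEADLINE_MINUTES = 3 * 24 * 60  # assumed deadline window, for illustration only


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the scheduler; tracks task name -> deadline."""
    assigned: dict = field(default_factory=dict)

    def handle_request(self, tasks_on_client, now):
        """Resend every assigned task the client does not list."""
        resent = []
        for task in list(self.assigned):
            if task not in tasks_on_client:
                # The deadline is re-evaluated each time the job is resent.
                self.assigned[task] = now + DEADLINE_MINUTES
                resent.append(task)
        return resent


def simulate(requests, local_versions, task_version=656):
    """Return (time, task, deadline) for every resend the client receives."""
    server = ToyScheduler(assigned={"X_example_task": DEADLINE_MINUTES})
    on_client = set()
    history = []
    for i in range(requests):
        now = i * 6  # one scheduler request every six minutes
        for task in server.handle_request(on_client, now):
            if task_version in local_versions:
                on_client.add(task)  # client has a matching app version
            # Otherwise the client discards the task and the server is not told.
            history.append((now, task, server.assigned[task]))
    return history


if __name__ == "__main__":
    for entry in simulate(3, local_versions={705}):
        print(entry)
```

With only version 705 available locally, the 656 task comes back on every request with a later deadline each time. With both versions listed, it is sent once and then stays on the client.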
||
|
|
BSD
Senior Cruncher Joined: Apr 27, 2011 Post Count: 224 Status: Offline |
Computer ID 2015741

Windows 7 x64
8 GB DDR3 RAM
BOINC 7.0.36
GPU Radeon HD 7770, not O/C

I do not have an "app_info.xml" file, and I haven't manually customized any BOINC configuration files.

These are listed as IP, but they are not listed in my BOINC client "Tasks" or in my "stdoutdae" or "stdoutdae.old" files. Most are 705 app (which I'm currently crunching without problem) and one is 656.

X0930076580255200610242352_ 0-- hickory In Progress 11/14/12 12:50:27 11/21/12 12:50:27 0.00 / 0.00 0.0 / 0.0
X0900076591287200610181534_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0900076591296200610181534_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0900076590697200610121550_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0960075851301200609151151_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0900076591297200610181533_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0960074130401200609131417_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0930075120602200610031638_ 0-- hickory In Progress 11/11/12 09:14:50 11/18/12 09:14:50 0.00 / 0.00 0.0 / 0.0

I have about 1.5 pages of "No Reply" for this same device, but a spot check did not find those WUs in my "stdoutdae.(old)" logs.

Very strange. I've manually stopped receiving new work and am finishing up what I have now to see what I can do to help flush/retrieve the zombie WUs.

Edit: Changed PV to IP
[Edit 1 times, last edit by BSD at Nov 17, 2012 4:09:44 AM]
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Here is to normality! We have been missing it.
Lawrence
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"... The problem is that discarding the job does not report to the server that the client isn't running it. Thus the job is still assigned to you. Next request to the server, you get sent the job again. This continues. ..."

It looks like Ingleside knew these issues were coming even during the beta phase. So how did WCG look at those things then? Specifically, what accounts for the WCG scheduling-server not complying* with what appears to be a 'documented procedure'?

Notes:
*From Ingleside [Nov 9, 2012 12:32:05 AM] post: "While scheduling-servers on other BOINC-projects works as documented, example SETI@home, POEM@home, Einstein@home and so on, the WCG scheduling-server does not work as it should."
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
We are the only ones using homogeneous_app_version - and I believe we are the first. And after reviewing the BOINC code today, I've discovered that it has a complete lack of support for the app_info.xml/anonymous mechanism. Additionally, not all projects use the resend lost results feature.

This is why I manually moved the app version numbers up, so that the part of the code that deals with deprecated app versions will no longer be executed. However, since we are using the homogeneous_app_version logic, we still traverse a path of code that other projects aren't using. And it is that code in particular that doesn't check which app/platform/plan_class was sent by the client for anonymous platforms. We need to use this feature because the nvidia results don't match the ati results and this is the only mechanism available to separate the results into different buckets. We are going to have to modify this section of code so that it works correctly.

And just to be clear - the code we are talking about is standard stock BOINC code. It can be found here:

http://boinc.berkeley.edu/trac/browser/boinc/sched/sched_resend.cpp#L155
http://boinc.berkeley.edu/trac/browser/boinc/sched/sched_version.cpp#L536
http://boinc.berkeley.edu/trac/browser/boinc/sched/sched_version.cpp#L420

[Edit 2 times, last edit by knreed at Nov 17, 2012 4:54:33 AM]
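As a rough illustration of the kind of check described as missing, here is a short Python sketch. It is not taken from the BOINC scheduler sources linked above and does not reflect their structure: AppVersion, split_lost_jobs and the task names are hypothetical. The idea is simply that before resending a lost job to an anonymous-platform host, the server compares the job's app, platform, version and plan class with the app versions the client reported.

```python
from typing import NamedTuple


class AppVersion(NamedTuple):
    """Hypothetical record of the fields named in the error message."""
    app: str
    platform: str
    version_num: int
    plan_class: str


def split_lost_jobs(lost_jobs, client_versions):
    """Split lost jobs into (resend, skip) based on the client's app versions.

    lost_jobs maps a task name to the AppVersion it was assigned to.
    client_versions lists the AppVersions the client reported having.
    """
    usable = set(client_versions)
    resend, skip = [], []
    for task, required in lost_jobs.items():
        if required in usable:
            resend.append(task)
        else:
            skip.append(task)  # resending would only be discarded again
    return resend, skip


if __name__ == "__main__":
    v656 = AppVersion("hcc1", "windows_intelx86", 656, "ati_hcc1")
    v705 = AppVersion("hcc1", "windows_intelx86", 705, "ati_hcc1")
    jobs = {"task_a": v656, "task_b": v705}  # made-up task names
    print(split_lost_jobs(jobs, [v705]))
```

What the server should do with the skipped jobs (reassign them, mark them as errors, and so on) is left open here.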
||
|
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline Project Badges:
|
Thanks for the update Kevin. We appreciate all the hard work you and the other techs do keeping everything running as smoothly as possible.
||
|
|
|