Thread Status: Active
Total posts in this thread: 60
This topic has been viewed 255876 times and has 59 replies
KieX
Cruncher
Spain
Joined: Dec 19, 2009
Post Count: 8
Re: PV comedy of error (and no replies)

Kevin,

Not sure if the information you requested was for this thread's problem or for the post regarding 'unexpected XML tag or syntax'. But with regard to the association between PV and not receiving any tasks, this is what my log shows for a similarly affected computer:

17/11/2012 02:51:59 | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 2202771; resource share 100

17/11/2012 02:52:00 | World Community Grid | Sending scheduler request: To fetch work.
17/11/2012 02:52:00 | World Community Grid | Requesting new tasks for CPU and ATI
17/11/2012 02:52:03 | World Community Grid | Scheduler request completed: got 1 new tasks
17/11/2012 02:52:03 | World Community Grid | Resent lost task X0960073631347200608011011_4
17/11/2012 02:52:03 | World Community Grid | [error] App version returned from anonymous platform project; ignoring
17/11/2012 02:52:03 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4
17/11/2012 02:58:19 | World Community Grid | Sending scheduler request: To fetch work.
17/11/2012 02:58:19 | World Community Grid | Requesting new tasks for ATI
17/11/2012 02:58:23 | World Community Grid | Scheduler request completed: got 1 new tasks
17/11/2012 02:58:23 | World Community Grid | Resent lost task X0960073631347200608011011_4
17/11/2012 02:58:23 | World Community Grid | [error] App version returned from anonymous platform project; ignoring
17/11/2012 02:58:23 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4


Unless I'm mistaken, the same WU that was initially discarded was subsequently sent again a few minutes later. I hope this helps.
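
For anyone who wants to check their own log for the same pattern, here is a minimal sketch (not part of the original post). It assumes the event log lines look exactly like the ones quoted above; the regular expression and the function names are illustrative, not anything BOINC provides.

```python
import re
from collections import Counter

# Sample lines copied from the log quoted above. A real check would read
# the text from the BOINC event log instead.
LOG = """\
17/11/2012 02:52:03 | World Community Grid | Resent lost task X0960073631347200608011011_4
17/11/2012 02:52:03 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4
17/11/2012 02:58:23 | World Community Grid | Resent lost task X0960073631347200608011011_4
17/11/2012 02:58:23 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4
"""

# Assumed message layout, taken from the lines above.
DISCARD = re.compile(
    r"No app version found for app (\S+) platform (\S+) "
    r"ver (\d+) class ([^;\s]+); discarding (\S+)"
)


def count_discards(log_text):
    """Return a Counter mapping task name to the number of discards."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DISCARD.search(line)
        if match:
            counts[match.group(5)] += 1
    return counts


def repeated_discards(log_text):
    """Tasks discarded more than once, i.e. resent after being thrown away."""
    return {task: n for task, n in count_discards(log_text).items() if n > 1}


if __name__ == "__main__":
    for task, n in repeated_discards(LOG).items():
        print(f"{task} was discarded {n} times")
```

Any task that shows up in the result has been sent, discarded, and sent again at least once.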
----------------------------------------
Join our small but dedicated team at: TechPowerUp!


[Nov 17, 2012 3:22:03 AM]
Tomahawk4196
Advanced Cruncher
USA
Joined: Aug 16, 2007
Post Count: 93
Re: PV comedy of error (and no replies)

I apologize for my ignorance, but where exactly is the 'messages log'? And where are the 'messages on your client'?

I see the 'event log' under the Advanced pulldown - is that the same thing?

Thanks
----------------------------------------

[Nov 17, 2012 3:26:14 AM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: PV comedy of error (and no replies)

I found a workaround if you're using an app_info file, but you may need to have at least 1 WU of each version in your cache for it to work. That's how I got it to work.
If you have removed the app_info file from the projects folder, use the text below to replace it. Shut down BOINC, then copy and paste the text into Notepad. This text is set up for my rigs to run 10 WUs at a time. Change the
<avg_ncpus>0.80</avg_ncpus>
<max_ncpus>1.0</max_ncpus>
and
<count>.10</count>
values in both <app_version> sections to whatever your rigs need.
Save the file as app_info.xml, drop it back into the WCG project folder, restart BOINC, and you should be good to go. I was able to get new 705 tasks after the 656 task had finished. Then some more 656 tasks showed up, so it seems to be working. *fingers crossed*

<app_info>
    <app>
        <name>hcc1</name>
        <user_friendly_name>Help Conquer Cancer</user_friendly_name>
    </app>
    <file_info>
        <name>wcg_hcc1_img_7.05_windows_intelx86__ati_hcc1</name>
        <executable/>
    </file_info>
    <file_info>
        <name>hcckernel.cl.7.05</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>hcc1</app_name>
        <version_num>705</version_num>
        <platform>windows_intelx86</platform>
        <plan_class>ati_hcc1</plan_class>
        <avg_ncpus>0.80</avg_ncpus>
        <max_ncpus>1.0</max_ncpus>
        <coproc>
            <type>ATI</type>
            <count>.10</count>
        </coproc>
        <file_ref>
            <file_name>wcg_hcc1_img_7.05_windows_intelx86__ati_hcc1</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>hcckernel.cl.7.05</file_name>
            <open_name>hcckernel.cl</open_name>
        </file_ref>
    </app_version>
    <app>
        <name>hcc1</name>
        <user_friendly_name>Help Conquer Cancer</user_friendly_name>
    </app>
    <file_info>
        <name>wcg_hcc1_img_6.56_windows_intelx86__ati_hcc1</name>
        <executable/>
    </file_info>
    <file_info>
        <name>hcckernel.cl.6.56</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>hcc1</app_name>
        <version_num>656</version_num>
        <platform>windows_intelx86</platform>
        <plan_class>ati_hcc1</plan_class>
        <avg_ncpus>0.80</avg_ncpus>
        <max_ncpus>1.0</max_ncpus>
        <coproc>
            <type>ATI</type>
            <count>.10</count>
        </coproc>
        <file_ref>
            <file_name>wcg_hcc1_img_6.56_windows_intelx86__ati_hcc1</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>hcckernel.cl.6.56</file_name>
            <open_name>hcckernel.cl</open_name>
        </file_ref>
    </app_version>
</app_info>
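
To help pick values for other rigs, here is a rough sketch of the arithmetic the post implies (a count of .10 giving 10 WUs at a time). It is not from the original post, and the real BOINC scheduler weighs more factors than this. The function names and the trimmed XML fragment are illustrative assumptions.

```python
import math
import xml.etree.ElementTree as ET

# Trimmed fragment holding only the fields used below. The values are the
# ones from the post; everything else in the real file is omitted.
APP_VERSION = """
<app_version>
  <app_name>hcc1</app_name>
  <version_num>705</version_num>
  <avg_ncpus>0.80</avg_ncpus>
  <coproc>
    <type>ATI</type>
    <count>.10</count>
  </coproc>
</app_version>
"""


def tasks_per_gpu(count):
    """How many tasks fit on one GPU if each claims `count` of it."""
    # The small epsilon guards against float results such as 9.9999999.
    return math.floor(1.0 / count + 1e-9)


def summarize(xml_text):
    """Estimate concurrent tasks and the CPU cores they budget in total."""
    node = ET.fromstring(xml_text)
    count = float(node.findtext("coproc/count"))
    avg_ncpus = float(node.findtext("avg_ncpus"))
    n = tasks_per_gpu(count)
    return {"tasks_per_gpu": n, "cpu_cores_budgeted": round(n * avg_ncpus, 2)}


if __name__ == "__main__":
    print(summarize(APP_VERSION))
```

With the posted values this works out to 10 tasks per GPU and 8 CPU cores budgeted, so a rig with fewer cores would probably want a larger count or a smaller avg_ncpus.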
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[Nov 17, 2012 3:26:16 AM]
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976
Re: PV comedy of error (and no replies)

I apologize for my ignorance, but where exactly is the 'messages log'? And where are the 'messages on your client'?

I see the 'event log' under the Advanced pulldown - is that the same thing?

Thanks

Yes.
----------------------------------------
[Nov 17, 2012 3:45:54 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: PV comedy of error (and no replies)

So here is the problem:

17/11/2012 02:58:23 | World Community Grid | [error] App version returned from anonymous platform project; ignoring
17/11/2012 02:58:23 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4


A job gets assigned that uses the old app version.
The host is anonymous platform, so it ignores the app version sent.
The host can't find an app version that matches the platform, version number, and plan_class, so it discards the job.

The problem is that discarding the job does not report to the server that the client isn't running it. Thus the job is still assigned to you.

Next request to the server, you get sent the job again. This continues.

Even worse, each time the job is sent to you the deadline for the job is re-evaluated and possibly slightly increased. Thus it can potentially never pass its deadline.
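
The loop described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the BOINC scheduler code: the class names, the "deadline = now + 7" rule, and the time units are all hypothetical and only mirror the behaviour described in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AppVersion:
    app: str
    platform: str
    version_num: int
    plan_class: str


@dataclass
class Server:
    # task name -> (required app version, deadline in arbitrary time units)
    assigned: dict = field(default_factory=dict)

    def handle_request(self, now, reported_tasks):
        """Resend every assigned task the client does not report having."""
        resent = []
        for name, (version, _old_deadline) in list(self.assigned.items()):
            if name not in reported_tasks:
                # Hypothetical rule: the deadline is recomputed on each resend.
                self.assigned[name] = (version, now + 7)
                resent.append((name, version))
        return resent


@dataclass
class AnonymousPlatformClient:
    local_versions: set
    tasks: set = field(default_factory=set)

    def receive(self, resent):
        """Keep tasks that match a local app version; drop the rest."""
        discarded = []
        for name, version in resent:
            if version in self.local_versions:
                self.tasks.add(name)
            else:
                # Dropped locally. Nothing is reported back to the server.
                discarded.append(name)
        return discarded


def simulate(rounds):
    """Return (number discarded, current deadline) for each request."""
    task = "X0960073631347200608011011_4"
    v656 = AppVersion("hcc1", "windows_intelx86", 656, "ati_hcc1")
    v705 = AppVersion("hcc1", "windows_intelx86", 705, "ati_hcc1")
    server = Server(assigned={task: (v656, 7)})
    client = AnonymousPlatformClient(local_versions={v705})
    history = []
    for now in range(rounds):
        discarded = client.receive(server.handle_request(now, client.tasks))
        history.append((len(discarded), server.assigned[task][1]))
    return history


if __name__ == "__main__":
    for discarded, deadline in simulate(3):
        print(f"discarded={discarded} deadline={deadline}")
```

In this toy model the task is discarded on every request while its deadline keeps moving forward, which is why it never times out.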

When I started digging into this problem today, there were a lot of computers that were repeatedly being resent the same jobs.

This issue occurs when all three of the following are in use: app_info.xml, homogenous_app_version, and resend_results.


Resolving this issue in the long run is going to be somewhat tricky. As a result, what I am doing now is changing the app_version on all the workunits to the 705 level. The new binaries are backward compatible, so this shouldn't be an issue. This should return life to normal for now.
[Nov 17, 2012 3:48:18 AM]
BSD
Senior Cruncher
Joined: Apr 27, 2011
Post Count: 224
Re: PV comedy of error (and no replies)

Computer ID 2015741
Windows 7 x64
8 GB DDR3 RAM
BOINC 7.0.36
GPU Radeon HD 7770, not O/C
I do not have an "app_info.xml" file, and I haven't manually customized any BOINC configuration files.

These are listed as IP (In Progress), but they do not appear in my BOINC client "Tasks" list or in my "stdoutdae" or "stdoutdae.old" files. Most are for the 705 app (which I'm currently crunching without problems) and one is for 656.

X0930076580255200610242352_ 0-- hickory In Progress 11/14/12 12:50:27 11/21/12 12:50:27 0.00 / 0.00 0.0 / 0.0
X0900076591287200610181534_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0900076591296200610181534_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0900076590697200610121550_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0960075851301200609151151_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0900076591297200610181533_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0960074130401200609131417_ 2-- hickory In Progress 11/14/12 12:50:27 11/17/12 08:02:27 0.00 / 0.00 0.0 / 0.0
X0930075120602200610031638_ 0-- hickory In Progress 11/11/12 09:14:50 11/18/12 09:14:50 0.00 / 0.00 0.0 / 0.0

I have about 1.5 pages of "No Reply" for this same device, but the WUs I spot-checked were not found in my "stdoutdae" or "stdoutdae.old" logs. Very strange.

I've manually stopped receiving new work and am finishing up what I have now to see what I can do to help flush/retrieve the zombie WUs.

Edit: Changed PV to IP
----------------------------------------
[Edit 1 times, last edit by BSD at Nov 17, 2012 4:09:44 AM]
[Nov 17, 2012 4:07:59 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: PV comedy of error (and no replies)

Here is to normality! We have been missing it.

Lawrence
[Nov 17, 2012 4:08:49 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: PV comedy of error (and no replies)

... The problem is that discarding the job does not report to the server that the client isn't running it. Thus the job is still assigned to you. Next request to the server, you get sent the job again. This continues. ...
It looks like Ingleside knew these issues were coming even during the beta phase. So how did WCG look at those things then? Specifically, what accounts for the WCG scheduling-server not complying* with what appears to be a 'documented procedure'?

Notes:
*From Ingleside [Nov 9, 2012 12:32:05 AM] post:
While scheduling-servers on other BOINC-projects works as documented, example SETI@home, POEM@home, Einstein@home and so on, the WCG scheduling-server does not work as it should.
[Nov 17, 2012 4:18:21 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: PV comedy of error (and no replies)

We are the only ones using homogenous_app_version - and I believe we are the first. After reviewing the BOINC code today, I've discovered that this code path completely lacks support for the app_info.xml/anonymous platform mechanism. Additionally, not all projects use the resend lost results feature.

This is why I manually moved the app version numbers up so the part of code that deals with deprecated app versions will no longer be executed.

However, since we are using the homogenous_app_version logic, we still traverse a path of code that other projects aren't using. And it is that code in particular that doesn't check which app/platform/plan_class was sent by the client for anonymous platforms.

We need to use this feature because the nvidia results don't match the ati results and this is the only mechanism available to separate the results into different buckets.

We are going to have to modify this section of code so that it works correctly.
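
As a rough illustration of the kind of check being discussed (a sketch only, not the BOINC code and not the eventual fix): once a workunit is tied to one bucket, a replica should only go to a host that reported a matching app, platform, and plan_class. The names `VersionKey` and `can_send` and the `nvidia_hcc1` plan class string are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionKey:
    app: str
    platform: str
    plan_class: str


def can_send(workunit_bucket: Optional[VersionKey], client_versions: set) -> bool:
    """Decide whether a replica of a workunit may go to this host.

    workunit_bucket is None until the first replica has been assigned.
    After that, every replica must use the same bucket so that results
    from different GPU types are never compared with each other.
    """
    if workunit_bucket is None:
        # Unassigned workunit: any host with a usable version may take it.
        return len(client_versions) > 0
    # The comparison described as missing for anonymous-platform hosts:
    # match the bucket against what the client actually reported.
    return workunit_bucket in client_versions


if __name__ == "__main__":
    ati = VersionKey("hcc1", "windows_intelx86", "ati_hcc1")
    nvidia = VersionKey("hcc1", "windows_intelx86", "nvidia_hcc1")  # hypothetical
    print(can_send(ati, {ati}), can_send(nvidia, {ati}))
```

An ATI-only host is accepted for the ATI bucket and rejected for the other one, which is the separation the homogenous_app_version feature is being used for here.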

And just to be clear - the code we are talking about is standard stock BOINC code. It can be found here:

http://boinc.berkeley.edu/trac/browser/boinc/sched/sched_resend.cpp#L155
http://boinc.berkeley.edu/trac/browser/boinc/sched/sched_version.cpp#L536
http://boinc.berkeley.edu/trac/browser/boinc/sched/sched_version.cpp#L420
----------------------------------------
[Edit 2 times, last edit by knreed at Nov 17, 2012 4:54:33 AM]
[Nov 17, 2012 4:28:44 AM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: PV comedy of error (and no replies)

Thanks for the update, Kevin. We appreciate all the hard work you and the other techs do keeping everything running as smoothly as possible.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[Nov 17, 2012 4:45:54 AM]