Total posts in this thread: 89
This topic has been viewed 463776 times and has 88 replies
wplachy
Senior Cruncher
Joined: Sep 4, 2007
Post Count: 423
Status: Offline
Re: Screen Scrapers - Please Discuss

I'm parsing the HTML of the "My Contribution", "Global Statistics" and "Results Status" pages and pulling everything except the headings, graphics and links. I'm also pulling data from the Member and Team Statistics pages using &xml=true.


For the "My Contribution" stats, how come you don't use http://www.worldcommunitygrid.org/help/viewTopic.do?shortName=profile#335 ?

I didn't know it existed. I'll change my capture logic to use that.


If I were to make available something similar to the verification URL, but returning data from the Results Status page, would you use that instead of scraping the Results Status page? I'm thinking something like:

http://www.worldcommunitygrid.org/verifyMembe...amp;code=VERIFICATIONCODE

with optional parameters for
project
status of result (valid, invalid, pending verification, pending validation, etc)

What I'm pulling from the Results Status pages are: Result Name, Device Name, Status, Sent Time, Due/Return Time, CPU/Elapsed Time and Claimed/Granted Credit.

If it returned that information, I would absolutely use it.

Thank you!!
----------------------------------------
Bill P

[Nov 13, 2013 8:20:14 PM]
jonnieb-uk
Ace Cruncher
England
Joined: Nov 30, 2011
Post Count: 6105
Status: Offline
Re: Screen Scrapers - Please Discuss


Emphasis on the 'if'. Before I build, I wanted to make sure it was what you need.


Yes, that's it exactly. I'm unimportant, but I think SNURK would be very happy if the time could be found to build it.
----------------------------------------

[Nov 13, 2013 8:54:23 PM]
SNURK
Veteran Cruncher
The Netherlands
Joined: Nov 26, 2007
Post Count: 1217
Status: Offline
Re: Screen Scrapers - Please Discuss


That would get my thumbs up too!!
----------------------------------------
[Nov 13, 2013 9:14:12 PM]
pirogue
Veteran Cruncher
USA
Joined: Dec 8, 2008
Post Count: 685
Status: Offline
Re: Screen Scrapers - Please Discuss


If I were to make available something similar to the verification URL, but returning data from the Results Status page, would you use that instead of scraping the Results Status page? I'm thinking something like:

http://www.worldcommunitygrid.org/verifyMembe...amp;code=VERIFICATIONCODE

with optional parameters for
project
status of result (valid, invalid, pending verification, pending validation, etc)
I didn't see this earlier.
That would work.

It might also be nice to have something like &changedafter=TIMESTAMP and only receive WUs that had a status change after that date/time.
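
The effect of such a filter can be sketched on the client side. This is only an illustration: the record layout (name, status, changed) and the function name changed_after are assumptions, and nothing in this thread confirms how, or whether, a changedafter parameter would be implemented on the server.

```python
from datetime import datetime

def changed_after(results, cutoff):
    """Return only the results whose status changed strictly after cutoff."""
    return [r for r in results if r["changed"] > cutoff]

# Hypothetical records; the field names and values are illustrative only.
results = [
    {"name": "WU_A", "status": "valid",
     "changed": datetime(2013, 11, 12, 9, 0)},
    {"name": "WU_B", "status": "pending validation",
     "changed": datetime(2013, 11, 13, 21, 30)},
]

recent = changed_after(results, datetime(2013, 11, 13, 0, 0))
print([r["name"] for r in recent])
```

A tool that remembers the time of its last fetch could pass that time as the cutoff and skip every work unit it has already recorded.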
----------------------------------------

[Nov 13, 2013 10:19:04 PM]
wplachy
Senior Cruncher
Joined: Sep 4, 2007
Post Count: 423
Status: Offline
Re: Screen Scrapers - Please Discuss


If I were to make available something similar to the verification URL, but returning data from the Results Status page, would you use that instead of scraping the Results Status page? I'm thinking something like:

http://www.worldcommunitygrid.org/verifyMembe...amp;code=VERIFICATIONCODE

with optional parameters for
project
status of result (valid, invalid, pending verification, pending validation, etc)
I didn't see this earlier.
That would work.

It might also be nice to have something like &changedafter=TIMESTAMP and only receive WUs that had a status change after that date/time.


+1 here. It would save having to look at all results. Not required by me, but as long as we're compiling a wish list...
----------------------------------------
Bill P

[Nov 14, 2013 2:27:34 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Screen Scrapers - Please Discuss

Hi guys,
after some weeks of business travel, I am back and would like to make my small contribution to this hot topic.
  • As mentioned by GIBA, the device manager menu is well hidden. IMHO, it should be easier to reach. @JM Boullier, you can only bookmark something if you know where it is. I found the device manager topic, but I am not sure that the average user (layman) will realise that such functionality is available.
  • Even if screen scraping is an interesting software development challenge (HTML parsing and DOM), I have deeply missed, for many years, the possibility of fetching the current results as XML. It seems to me more efficient for both tool developers and WCG bandwidth. I have been asking for such a solution for over 5 years, unfortunately without success. jonnieb-uk's comment shows that I am not alone in this wish.
  • Whatever happens with the web site design, I think that a couple of data connectors (an XML-based API) should be clearly defined and remain stable over time. If necessary, a formal Technical Requirements Specification (TRS) could be drafted by some "advanced" members, then agreed and implemented by WCG.
  • Pirogue (thank you for the WCGDAWS update) and SNURK are doing a very good job. I think that better defined and formalised data connectors would help them a lot by making their work more reliable and more efficient.

I currently have three master's students working on a project to develop a contribution monitoring tool (in the sense of WCGDAWS). It should be a web tool that grabs the results automatically (two or three times daily) for a specific member (not for all members), generates reports and makes the information available over a web site. Additionally, if project progress allows, an automated e-mail alarm system should inform the user when performance problems occur on a specific host, e.g. too many invalid or errored results, no work available, etc.
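
As a rough sketch of the "too many invalid or errored results" check, one could count bad results per host and flag hosts above a threshold. Everything here is an assumption made for illustration: the status labels, the function name hosts_to_alert, and the default thresholds do not come from WCG or from the project described above.

```python
from collections import Counter

# Assumed status labels; the real Results Status page may use other wording.
BAD_STATES = {"invalid", "error"}

def hosts_to_alert(results, max_bad_ratio=0.2, min_results=5):
    """Return hosts whose share of bad results exceeds max_bad_ratio.

    results is an iterable of (host, state) pairs. Hosts with fewer than
    min_results results are skipped so that one failure does not raise an alarm.
    """
    total = Counter()
    bad = Counter()
    for host, state in results:
        total[host] += 1
        if state in BAD_STATES:
            bad[host] += 1
    return sorted(
        host for host in total
        if total[host] >= min_results and bad[host] / total[host] > max_bad_ratio
    )

# Hypothetical data: host-a has 2 bad results out of 10, host-b has 3 out of 6.
sample = (
    [("host-a", "valid")] * 8 + [("host-a", "invalid")] * 2
    + [("host-b", "valid")] * 3 + [("host-b", "error")] * 3
)
print(hosts_to_alert(sample))
```

The returned host names could then be handed to whatever e-mail routine the tool uses.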

We already made an attempt last year, but unfortunately the project was not completed as expected.
I hope that the current redesign of the WCG web site will not affect the progress of this project too negatively.
Cheers,
Yves
---
PS: If some members are interested in contributing by defining expectations for data connectors, I could volunteer to write down the requirements as a collaborative effort.
--
PPS: If some members are interested in the contribution monitoring tool, please let me know. I hope that we will have a reasonable, working result by year end.
----------------------------------------
[Nov 17, 2013 9:32:27 AM]
Tullus
Cruncher
Joined: Nov 14, 2008
Post Count: 29
Status: Offline
Re: Screen Scrapers - Please Discuss

Hi KerSamson, have you seen my open source project:
https://code.google.com/p/py-boinc-plotter/
Feel free to glance over the feature page (which is slightly outdated at the moment):
https://code.google.com/p/py-boinc-plotter/wiki/Features

I think it can be used as a starting point for your monitoring tool. Feel free to discuss any issues on the google code issue site, or send me an email.
[Nov 17, 2013 10:14:00 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Screen Scrapers - Please Discuss

Hi Tullus,
Thank you for the information.
We will take a look at your project and consider whether we can integrate part of it into ours.
I will keep you informed.
However, we have only limited time to work out a runnable solution.
Cheers,
Yves
----------------------------------------
[Nov 20, 2013 10:50:57 AM]
widdershins
Veteran Cruncher
Scotland
Joined: Apr 30, 2007
Post Count: 677
Status: Offline
Re: Screen Scrapers - Please Discuss

Hi KerSamson

I'd be interested in trying your monitoring tool when it's available; it sounds interesting.

Thanks for developing it
[Nov 20, 2013 8:27:37 PM]
Tullus
Cruncher
Joined: Nov 14, 2008
Post Count: 29
Status: Offline
Re: Screen Scrapers - Please Discuss

Any progress on the xml task list?

I would like to point out this:
http://boinc.berkeley.edu/trac/wiki/WebRpc#pending
If the "Get result list with pending credit" could be extended to "Get results" and the reply also contains:
Application name/Project Name, Result Name, Device Name, Status, Sent Time, Time Due, Return Time, CPU Time/Elapsed Time, Claimed, Granted BOINC Credit
that would be fantastic (and feel free to pour in any other data you might have about the task, we like data).

The structure might look something like this (note: it is not completely backward compatible with pending.php, so a new call should be added, say "user_results.php"):

<user_results>
  <application>
    <name> mcm1 </name>
    <user_friendly_name> Mapping Cancer Markers 7.26 </user_friendly_name>
    <platform>x86_64-apple-darwin</platform>
    <version_num>726</version_num>
    <app_version_num>726</app_version_num>
    <result>
      <name>MCM1_0000269_9260_1</name>
      <device_name>coffe.local</device_name>
      <sent_time> [...] </sent_time>
      <time_due> [...] </time_due>
      [ <return_time> [...] </return_time> ]
      [ <claimed_credit> N </claimed_credit> ]
      [ <granted_credit> N </granted_credit> ]
      <final_cpu_time>35314.390000</final_cpu_time>
      <final_elapsed_time>36599.654345</final_elapsed_time>
      <state>valid</state>
    </result>
    [...]
  </application>
  [...]
</user_results>
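
To show how little client code such a reply would need, here is a minimal Python sketch that reads the structure proposed above with the standard library. It is only an illustration: "user_results.php" does not exist, the function name parse_user_results is made up, and the sample document keeps only the fields whose values appear in the proposal. Optional elements such as granted_credit are treated as possibly absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical reply following the proposed layout; elided fields are left out.
SAMPLE = """
<user_results>
  <application>
    <name> mcm1 </name>
    <user_friendly_name> Mapping Cancer Markers 7.26 </user_friendly_name>
    <result>
      <name>MCM1_0000269_9260_1</name>
      <device_name>coffe.local</device_name>
      <final_cpu_time>35314.390000</final_cpu_time>
      <final_elapsed_time>36599.654345</final_elapsed_time>
      <state>valid</state>
    </result>
  </application>
</user_results>
"""

def _text(elem, tag):
    """Return the stripped text of a child element, or None if it is absent."""
    value = elem.findtext(tag)
    return value.strip() if value is not None else None

def _num(elem, tag):
    """Return a child element's text as a float, or None if absent or empty."""
    value = _text(elem, tag)
    return float(value) if value else None

def parse_user_results(xml_text):
    """Flatten the proposed <user_results> document into one dict per result."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for app in root.findall("application"):
        app_name = _text(app, "name")
        for res in app.findall("result"):
            rows.append({
                "application": app_name,
                "name": _text(res, "name"),
                "device": _text(res, "device_name"),
                "state": _text(res, "state"),
                "cpu_time": _num(res, "final_cpu_time"),
                "granted_credit": _num(res, "granted_credit"),  # optional
            })
    return rows

rows = parse_user_results(SAMPLE)
print(rows[0]["name"], rows[0]["state"])
```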


And please do share the addition with the rest of the BOINC community.
----------------------------------------
[Edit 1 times, last edit by Tullus at Dec 3, 2013 8:11:53 PM]
[Dec 3, 2013 8:09:10 PM]