| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 89
|
|
| Author |
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
My Statistics and My Team to capture daily Project Stats for myself and the UK Team (Project order is not a concern). Capture of All Time Stats and Last Result Returned for individual Team Members via Multiple Member Comparison.
For these you can use the XML - right?
I also occasionally use a screen scrape to capture data for members identified as having a Great Britain location in Statistics by Geography. (Data by country does not appear to be available in XML format.)
Unfortunately - it is not available.
Is it anticipated that the data currently available in XML format will be affected by the ongoing website redesign?
Not in the near future. Down the road (mid-2014 at the earliest) we will be developing some better visualizations of the data and that is going to require a better API to be developed than what we have now. We might deprecate it sometime after that.
[Edit 1 times, last edit by knreed at Feb 5, 2014 3:34:52 PM] |
||
|
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3716 Status: Offline Project Badges:
|
However, if you have fr_fr as your primary language in your browser, then when you arrive on our site, the French formatting rules should be in effect. Let me know if you see otherwise.
It works as you say, Kevin, and now I can reconstruct the scenario behind my mysterious changes: sometimes I switch languages for whatever reason, and when I come back to French the "Canadian" setup leaves me with the wrong format. And since I log off and log on only once a month to enroll my team in new challenges, I could stay with the wrong format for several days. Now I know how to quickly fix it by correcting the parameter in the address line if necessary, and anyway I have now forced the correct language setting in all my stats bookmarks. So everything is fine for me. Thanks Kevin. |
||
|
|
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline Project Badges:
|
My Statistics and My Team to capture daily Project Stats for myself and the UK Team (Project order is not a concern). Capture of All Time Stats and Last Result Returned for individual Team Members via Multiple Member Comparison.
For these you can use the XML - right?
I've switched All Time Stats and Last Result Returned to XML but I can't make XML work for the UK Team Project Stats. Any suggestions?
that is going to require a better API to be developed than what we have now.
I'll look forward to that. |
||
|
|
Tullus
Cruncher Joined: Nov 14, 2008 Post Count: 29 Status: Offline Project Badges:
|
I do scraping of the public xml on: /verifyMember.do?name={name}&code={code}
In addition I scrape the HTML task list, in a similar manner to WCGDAWS, although my program didn't break ;) My program (which is available here: https://code.google.com/p/py-boinc-plotter/) works for multiple BOINC projects, but has to treat World Community Grid as an exception in many parts of the code. If you could integrate better with the standard BOINC environment that would be fantastic, either by contributing so that other BOINC projects can utilize your work, or by utilizing/modifying the existing BOINC webpage structures. In this way your webpage will benefit from the open source community, and the open source community would benefit from you. In addition, extending the XML support would be great, since parsing HTML is a pain. |
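For anyone wanting to try the XML route rather than parsing HTML, here is a minimal sketch in Python of requesting the /verifyMember.do page and flattening whatever XML comes back. It is not taken from py-boinc-plotter. The host name is borrowed from URLs quoted elsewhere in this thread, the function names are made up for illustration, and the element names in the usage note are invented, since the real layout of the response is not shown here.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: host taken from other URLs quoted in this thread.
BASE_URL = "http://www.worldcommunitygrid.org"


def verify_member_url(name, code, base_url=BASE_URL):
    """Build the /verifyMember.do URL, escaping the name and code safely."""
    query = urlencode({"name": name, "code": code})
    return f"{base_url}/verifyMember.do?{query}"


def flatten_xml(xml_text):
    """Return {path: text} for every leaf element that carries text.

    Nothing is assumed about tag names, so the same helper works whatever
    the response looks like. Repeated paths overwrite earlier ones, which
    is acceptable for a quick look but not for lists of records.
    """
    root = ET.fromstring(xml_text)
    flat = {}

    def walk(elem, path):
        children = list(elem)
        if not children:
            text = (elem.text or "").strip()
            if text:
                flat[path] = text
        for child in children:
            walk(child, f"{path}/{child.tag}")

    walk(root, root.tag)
    return flat


def fetch_member_stats(name, code):
    """Download and flatten the member XML (needs network access)."""
    with urllib.request.urlopen(verify_member_url(name, code), timeout=30) as resp:
        return flatten_xml(resp.read().decode("utf-8"))
```

With a made-up document such as `<r><m><n>T</n><p>5</p></m></r>`, `flatten_xml` returns `{"r/m/n": "T", "r/m/p": "5"}`, so a tool can look values up by path instead of by position in the HTML.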
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
I have three principal instances of screen scraping: My Statistics and My Team to capture daily Project Stats for myself and the UK Team (Project order is not a concern). Capture of All Time Stats and Last Result Returned for individual Team Members via Multiple Member Comparison. I also occasionally use a screen scrape to capture data for members identified as having a Great Britain location in Statistics by Geography. (Data by country does not appear to be available in XML format.)
@jonnieb-uk If you had:
http://www.worldcommunitygrid.org/stat/viewCo...untryCode=GB&xml=true
http://www.worldcommunitygrid.org/stat/viewCo...untryCode=GB&xml=true (sort could be any of cpu, points or results)
http://www.worldcommunitygrid.org/stat/viewCo...untryCode=GB&xml=true (sort could be any of cpu, points or results)
Would that meet your needs? |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
I'm parsing the HTML to pull from the "MY CONTRIBUTION", "Global Statistics" and "Results Status" pages. The data I'm pulling from these pages is everything but the headings, graphics and links. I'm also pulling data from the Member and Team Statistics pages using &xml=true.
For the "My Contribution" stats - how come you don't use: http://www.worldcommunitygrid.org/help/viewTopic.do?shortName=profile#335 ?
If I were to make available something similar to the verification URL, but one that would return data from the Results Status page, would you use that instead of scraping the Results Status page? I'm thinking something like: http://www.worldcommunitygrid.org/verifyMembe...amp;code=VERIFICATIONCODE with optional parameters for project and for status of result (valid, invalid, pending verification, pending validation, etc.) |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
I've switched All Time Stats and Last Result Returned to XML but I can't make XML work for the UK Team Project Stats. Any suggestions?
Which page is the 'UK Team Project Stats'? I can generate this one: http://www.worldcommunitygrid.org/team/viewTe...=L721SPD4BN1&xml=true
Were you referring to a different page? |
||
|
|
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline Project Badges:
|
I've switched All Time Stats and Last Result Returned to XML but I can't make XML work for the UK Team Project Stats. Any suggestions?
Which page is the 'UK Team Project Stats'? I can generate this one: http://www.worldcommunitygrid.org/team/viewTe...=L721SPD4BN1&xml=true Were you referring to a different page?
That looks fine Kevin, thank you. Any problems, I'll let you know.
If you had:
http://www.worldcommunitygrid.org/stat/viewCo...untryCode=GB&xml=true
http://www.worldcommunitygrid.org/stat/viewCo...untryCode=GB&xml=true (sort could be any of cpu, points or results)
http://www.worldcommunitygrid.org/stat/viewCo...untryCode=GB&xml=true
That's what I have been using, and the addition of &xml=true does not result in XML output. |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
If you had:
http://www.worldcommunitygrid.org/stat/viewCo...untryCode=GB&xml=true
http://www.worldcommunitygrid.org/stat/viewCo...untryCode=GB&xml=true (sort could be any of cpu, points or results)
http://www.worldcommunitygrid.org/stat/viewCo...untryCode=GB&xml=true
That's what I have been using, and the addition of &xml=true does not result in XML output.
Emphasis on the 'if'. Before I build, I wanted to make sure it was what you need. |
||
|
|
pirogue
Veteran Cruncher USA Joined: Dec 8, 2008 Post Count: 685 Status: Offline Project Badges:
|
We rolled out the first change of the changes to our website that we are going to be frequently doing over the next 3-6 months. The HTML structure is going to be changing a fair amount as we do this rework and screen scraping will not be a reliable way to access data on an ongoing basis during this work stream. I'd like to hear from those people who are doing screen scraping and let us know what you are doing and what data you are going after and we can see what we can do to help you let your tools remain stable during these changes.
Everybody probably has a slightly different definition of screen scraping and how they implement it. This seems a particularly apt definition:
I'm parsing the HTML from the results pages. This isn't what was broken. Stupidly on my part, a missing "*" brought everything to a grinding halt. I was using what I thought was a good indicator of a successful login to know whether someone was logged in successfully.
Parsing the HTML in generated web pages with programs designed to mine out particular patterns of content. In either guise screen-scraping is an ugly, ad-hoc, last-resort technique that is very likely to break on even minor changes to the format of the data.
A lot of the WCG data is available in XML format, which is easier to handle and (hopefully) resilient to changes in website design. So for example if you are interested in the AllTime Runtime stats of XtremeSystems team members shown at http://www.worldcommunitygrid.org/team/viewTe...&numRecordsPerPage=10 adding "&xml=true" will provide the same data in XML format, which is easily imported into a spreadsheet (in Excel using "from Web" on the Data tab). http://www.worldcommunitygrid.org/team/viewTe...dsPerPage=10&xml=true
Unfortunately in your example Results Status is not available in XML format. I would have suggested that you use pirogue's utility programme WCGDAWS (World Community Grid Device and Workunit Stats), see thread, but it's broken until updated for Friday's changes.
I changed it to look for another, hopefully more reliable, indicator. Having a login mechanism to get at personal result data in XML would be a good thing. Doing so would probably eliminate the need for wcgdaws (also possibly a good thing, at least for me), but I can live with that. I'd be willing to help test it. |
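The "&xml=true" trick quoted above can be applied in a script rather than by hand-editing the address. This is only a sketch: `with_xml` is a hypothetical helper name, and the example.org address in the usage note is a placeholder, not a real World Community Grid URL. As noted in this thread, not every page answers to the parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_xml(url):
    """Return url with xml=true in its query string, added exactly once."""
    parts = urlsplit(url)
    # Keep the existing parameters in order, dropping any earlier xml=...
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "xml"
    ]
    query.append(("xml", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_xml("http://example.org/team/view.do?teamId=ABC&numRecordsPerPage=10")` gives `http://example.org/team/view.do?teamId=ABC&numRecordsPerPage=10&xml=true`. Calling it on a URL that already has the parameter leaves the URL unchanged, so it is safe to run over a list of stats bookmarks.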
||
|
|
|