Thread Status: Active
Total posts in this thread: 5
This topic has been viewed 2202 times and has 4 replies
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
WCG website device statistics scrapers heck

Another Rundgren moment... "Hello It's Me" (Again). As an avid scraper of statistics web pages for which there's no XML, today I ran the usual URL injection, which is equivalent to the Device Statistics Filter - Anytime(All)/Anytime(All) ...

https://www.worldcommunitygrid.org/ms/device/...dSince=0&lastResult=0

yielded more data than before, more than I can see on the website!



BATBK6810J is the last device to be seen at the bottom on the website, but now the table into which the fetched data is injected suddenly asked whether I wanted to allow additional data to be appended. That's not the first time, since one moment the scraped data lands in row 47 and the next in row 46, so I answered yes, except this time all these never-never-land devices showed up, including blank lines. Whilst the routine extracts the Device IDs from the html, they appear to repeat, like 6 with the name ubuntu (probably a failed attempt at installing BOINC on Ubuntu, duh). From recollection, the names except for the ubuntu ones were UD devices. Closing the workbook, reopening and rerunning just pulled the same, i.e. this is reproducible.

Anyway, why do I earn this additional stuff all of a sudden?

P.S. The match numbers are there just because another routine sums points/runtime/results for all items with the same name, and the profile column just connects the device statistics with the device profile table, i.e. both may be ignored for the discussion.
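
For readers who want to reproduce the per-name totals outside a spreadsheet, here is a minimal Python sketch of that kind of routine. It is not the workbook's actual code: the function name `totals_by_name`, the row layout (name, points, runtime, results) and the reading of "match number" as the count of rows sharing a name are all assumptions made for illustration.

```python
from collections import defaultdict

def totals_by_name(rows):
    """Sum points, runtime and results over all rows sharing a device name.

    rows: iterable of (name, points, runtime, results) tuples (assumed layout).
    The "count" field is an assumed stand-in for the "match number" column.
    """
    totals = defaultdict(
        lambda: {"count": 0, "points": 0, "runtime": 0.0, "results": 0}
    )
    for name, points, runtime, results in rows:
        entry = totals[name]
        entry["count"] += 1
        entry["points"] += points
        entry["runtime"] += runtime
        entry["results"] += results
    return dict(totals)
```

Several devices called ubuntu would then collapse into one entry whose count says how many rows were merged.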
[Aug 1, 2017 3:58:41 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: WCG website device statistics scrapers heck

I can't find the device BATBK6810J in our database either via the device id you show or via its name. What user is it under?
[Aug 2, 2017 9:18:24 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: WCG website device statistics scrapers heck

This is the url

https://www.worldcommunitygrid.org/ms/device/...d=149939&deviceType=U

Under my name without * at end.

Had not realized, but the U and the B are a good hook to recover the Agent Type info and compute how many points came from the legacy UD agents (the procedure removes all images from the imported html page). Must have run the device stats half a dozen times since writing (as it updates every 3 hours), but the extras after the BATBK6810J device keep coming back unperturbed.
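
A minimal Python sketch of that hook, assuming the `deviceType` query parameter seen in the link above is present on every device link and that U stands for a legacy UD agent and B for BOINC. The mapping, the function names and the (url, points) row layout are assumptions for illustration, not anything the site documents.

```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping, inferred from this thread: U = legacy UD agent, B = BOINC.
AGENT_TYPES = {"U": "UD", "B": "BOINC"}

def agent_type(url):
    """Return the agent type encoded in a device link, or None if absent/unknown."""
    query = parse_qs(urlparse(url).query)
    codes = query.get("deviceType", [])
    return AGENT_TYPES.get(codes[0]) if codes else None

def legacy_points(rows):
    """Sum the points of rows whose link marks them as UD devices.

    rows: iterable of (url, points) pairs (assumed layout).
    """
    return sum(points for url, points in rows if agent_type(url) == "UD")
```

Rows without a link, such as the blank lines, simply return None from `agent_type` and are left out of the UD total.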
[Aug 2, 2017 9:57:01 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: WCG website device statistics scrapers heck

Ah - I was looking in the BOINC database for the host. Didn't find it since it was a UD device.

What devices were returned that you hadn't previously seen?
[Aug 2, 2017 10:24:32 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
[RESOLVED] WCG website device statistics scrapers heck

All those below the BAT device do not show up on the website pages. Most puzzling are the lines where there are no device names, and thus no url to squeeze a device id out of.

Edit: Overcame part of the missing information problem by coupling the Device Profile listing to the Device Statistics data to get the agent type from there with an index/match search. This leaves the blank lines with just 'Never' on them.
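
The index/match coupling can be sketched in Python as a lookup keyed on the device name. This is an illustration only: the field names `name` and `agent` and the function name are hypothetical. Like a spreadsheet MATCH, the sketch keeps the first profile entry for a name, so devices sharing a name cannot be told apart this way.

```python
def attach_agent_type(stats_rows, profile_rows):
    """Add an 'agent' field to each statistics row by looking up its name.

    stats_rows:   list of dicts with at least a 'name' key (assumed layout).
    profile_rows: list of dicts with 'name' and 'agent' keys (assumed layout).
    Rows whose name is blank or not in the profile listing get agent=None.
    """
    agent_by_name = {}
    for profile in profile_rows:
        # setdefault keeps the first match, mirroring MATCH's first-hit behaviour.
        agent_by_name.setdefault(profile["name"], profile["agent"])

    merged = []
    for row in stats_rows:
        name = row.get("name") or ""
        merged.append({**row, "agent": agent_by_name.get(name)})
    return merged
```

The blank 'Never' lines come back with `agent` set to None, which matches the leftover rows described above.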

BTW, was watching 'The Ultimate Introduction to Web Scraping and Browser Automation', which made an interesting observation, 'you will get things to see, that might not be intended to be seen', advising to talk to the webmaster if ethics calls for that. Pretty much all web browsers come with document inspector features, Firefox having its excellent Firebug add-on, so it's clearly a web deployer's problem. (It's my plan to inject tick marks into device profiles to quickly change project selections... after the summer, maybe.)

Edit2: Seems you've narrowed down the issue and put a fix in, as the 'Never' records did not show up again this morning on one account, though they still did on another. A pity in a way, as I was integrating the Profile & Device Statistics page information, so I recoded to make DevProf the primary listing, pulling in the data from DevStats.
----------------------------------------
[Edit 3 times, last edit by SekeRob* at Aug 4, 2017 11:09:37 AM]
[Aug 2, 2017 10:31:05 PM]