Thread Status: Active
Total posts in this thread: 17
Posts: 17   Pages: 2   [ 1 2 | Next Page ]
This topic has been viewed 3385 times and has 16 replies
wujj123456
Cruncher
Joined: Jun 9, 2010
Post Count: 38
Duplicated results in web API

I've been playing with the web API recently, and I noticed that it returns duplicate results.

In my case, even if I set the query limit to 1000, I get at most 250 results back.

Say there are 730 results. I do:
Fetch from offset=0. Returned 250 results
Fetch from offset=250. Returned 250 results
Keys already exist:
OET1_0003699_x3MWPp_rig_61457_0
ugm1_ugm1_25075_0758_1
ugm1_ugm1_25075_0679_0
Fetch from offset=500. Returned 230 results
Results returned from web: 727

When I combined them, the total was not 730 but 727 results. The three duplicated keys have exactly the same data in both queries.

It's not hard for me to work around this, as Python's dict update handles it naturally anyway. I'm just wondering if it's something the WCG team would care about, since it might be a bug somewhere.
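
A minimal sketch of the dict-based merge described above. The `Name` field and the sample pages are assumptions for illustration, not the real API response, and no request is made to any server.

```python
def merge_pages(pages, key="Name"):
    """Merge pages of result rows into one dict keyed by result name.

    Returns the merged dict and the list of keys seen more than once.
    """
    merged = {}
    duplicates = []
    for page in pages:
        for row in page:
            if row[key] in merged:
                duplicates.append(row[key])
            merged[row[key]] = row  # a later copy overwrites the earlier one
    return merged, duplicates


# Hypothetical sample data: the second page repeats one row from the first.
page1 = [{"Name": "ugm1_ugm1_25075_0758_1"}, {"Name": "ugm1_ugm1_25075_0679_0"}]
page2 = [{"Name": "ugm1_ugm1_25075_0679_0"}, {"Name": "OET1_0003699_x3MWPp_rig_61457_0"}]

merged, duplicates = merge_pages([page1, page2])
print(len(merged), duplicates)
```

Keying on the result name means a repeated row costs nothing, but it also hides the fact that a repeat usually implies another row was skipped.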

If this is not the right place to report API issues (or general bugs), please let me know. I can move this to the appropriate channel. Thanks.
[Apr 24, 2016 6:21:56 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: Duplicated results in web API

Since the Result Status data is 'live' [dynamic] and the fetch per call is restricted to 250, there is a good chance that a subsequent call will fetch a result again: in the meantime, one of the first 250 or a later result in the fetch order may have changed and moved to the top, shifting everything down by one or more. The faster your internet connection and the more efficiently your consecutive queries run, the smaller the chance of this happening.
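
The shifting effect can be reproduced with a small simulation. This is only a sketch under stated assumptions: `make_fake_api` and `fetch_page` are hypothetical stand-ins for the live list, and the choice of 730 rows with 3 rows jumping to the top between the first and second call is made up to mirror the counts in the first post.

```python
def make_fake_api(total=730, moved=3):
    """Hypothetical stand-in for the live result list (no real API involved).

    Between the first and second call, `moved` rows jump from the end of the
    list to the top, shifting every other row down by `moved` positions.
    """
    names = [f"wu_{i:04d}" for i in range(total)]
    calls = 0

    def fetch_page(offset, limit):
        nonlocal names, calls
        if calls == 1:
            names = names[-moved:] + names[:-moved]
        calls += 1
        return names[offset:offset + limit]

    return fetch_page, set(names)


fetch_page, all_names = make_fake_api()
seen = []
for offset in (0, 250, 500):
    seen.extend(fetch_page(offset, 250))

unique = set(seen)
duplicates = len(seen) - len(unique)   # rows fetched twice
missed = all_names - unique            # rows never fetched
print(len(seen), len(unique), duplicates, len(missed))
```

With these assumed numbers the three fetches return 730 rows but only 727 distinct names: three rows show up twice, and the three rows that moved to the top are never seen.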
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Apr 24, 2016 8:19:24 PM]
[Apr 24, 2016 8:18:31 PM]
wujj123456
Cruncher
Joined: Jun 9, 2010
Post Count: 38
Re: Duplicated results in web API

I thought about this, because I do see the order changing sometimes. However, the overall number of fetched results should match the number of available results across multiple back-to-back runs. Each run only takes seconds... In my tests, these WUs have consistently shown up twice. That's why I think the duplication is in the data, rather than a timing issue on my side.

Honestly it's not really a concern to me given that I can tolerate losing some stats. If it's just a timing issue, I would probably end up fetching all results anyway since I run it periodically.

I wonder why the limit is capped at 250. It doesn't really help, since most people would have to loop and query all results anyway. If the goal is to save some server resources, it might be much more efficient to let people pass in a ModTime and only return WUs with a modified time older than that.
----------------------------------------
[Edit 1 times, last edit by wujj123456 at Apr 24, 2016 8:51:25 PM]
[Apr 24, 2016 8:50:59 PM]
Tullus
Cruncher
Joined: Nov 14, 2008
Post Count: 29
Re: Duplicated results in web API

Hi wujj123456. If you are playing with the web API from Python, you might be interested in parts of:

https://code.google.com/archive/p/py-boinc-plotter/

You will most likely want to look at parser.HTMLParser_worldcommunitygrid and task.Task_web_worldcommunitygrid.

I haven't bothered to move the project away from Google Code, so some of it might be a bit dated, but it should not be too bad.
[Apr 25, 2016 7:40:23 AM]
wujj123456
Cruncher
Joined: Jun 9, 2010
Post Count: 38
Re: Duplicated results in web API

Thanks. Well, I guess you missed Google's announcement a year ago:
http://google-opensource.blogspot.com/2015/03/farewell-to-google-code.html

Google Code is gone, along with all projects hosted on it. :-(

From the method name, are you parsing HTML instead of using the API?
[Apr 25, 2016 7:49:53 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: Duplicated results in web API

wujj123456 wrote: "I thought about this, because I do see order changing sometimes. However, overall fetched results should match the number of available results, across multiple back to back runs. Each run only takes seconds... In my tests, these WUs have consistently been shown twice. That's why I think it's duplicated in the data, instead of the timing on my side."

On the emphasized part: no, because between the first and second fetch request one or another canonical result could already have been migrated off... hence the misses. To give an approximation, 12 results come on and another 12 come off per second [1 million plus per day], which on accounts that have thousands of results on their result status pages leads to continuous reordering due to status changes and the [ModTime] of the momentary transaction.
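
As a quick check of the approximate rate quoted above (the only input is the 12-per-second figure from the post):

```python
RESULTS_PER_SECOND = 12          # approximate rate quoted in the post
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

per_day = RESULTS_PER_SECOND * SECONDS_PER_DAY
print(per_day)  # 1036800, i.e. "1 million plus per day"
```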

Duplicates on the database: On a willy-nilly system that would be possible, yes ;-) (The Result Status pages are a direct window onto what's going on in the core BOINC task/result scheduling system.)
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Apr 25, 2016 11:39:23 AM]
[Apr 25, 2016 11:18:15 AM]
wujj123456
Cruncher
Joined: Jun 9, 2010
Post Count: 38
Re: Duplicated results in web API

I tried a few more times, and it looks like the duplicates aren't always the same, even though the number of duplicates I get for the same total number of results is mostly the same. So it does look like a random timing issue, at least across longer periods. I guess the same repeat offenders I got yesterday might just have been a coincidence.

Looking more closely at the fields, the results don't seem to be ordered in any way. (I am only querying results with ValidateState=1.) However, on the website there is a way to order results.

Do you happen to know if I can specify an order with the web API? It's an SQL query in the end, but that is not necessarily exposed through the web API, I suppose. All I need is some ordering that stays stable for 1 minute, so that I can deterministically get all results back. So anything other than ModTime will probably work. (I assume getting all results in one shot is off-limits, given that a cap was implemented in the first place.)
[Apr 26, 2016 4:53:26 AM]
Tullus
Cruncher
Joined: Nov 14, 2008
Post Count: 29
Re: Duplicated results in web API

Yes, I know about the Google Code farewell, I was just too lazy to move. I was about to tell you that the 'downloads' still work, but found they are mostly empty .tar files, so I must have messed up. I have moved it to GitHub:
https://github.com/obtitus/py-boinc-plotter/
I still need to move the wiki somehow.

I was originally parsing the HTML, hence the name :) I am still parsing XML for the badges and stuff.
[Apr 26, 2016 4:20:46 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Re: Duplicated results in web API

wujj123456 wrote: "(I am only querying results with ValidateState=1.)"
Even doing this you will get duplicate units where all is the same except for the mod time. I have stopped wondering why the mod time would change on a unit which has already been validated. It must make sense to the techs. Once I stick my query into a spreadsheet, I can deal with the duplicate issue.
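
The same clean-up can be done in code instead of a spreadsheet. The sketch below keeps only the row with the latest mod time for each unit. The field names `Name` and `ModTime`, the integer timestamps and the sample rows are assumptions for illustration, not confirmed details of the API output.

```python
def keep_latest(rows, key="Name", mod_field="ModTime"):
    """Collapse rows that differ only in mod time, keeping the newest per unit."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[mod_field] > current[mod_field]:
            latest[row[key]] = row
    return list(latest.values())


# Hypothetical rows: "wu_a_0" appears twice with different mod times.
rows = [
    {"Name": "wu_a_0", "ValidateState": 1, "ModTime": 1461700000},
    {"Name": "wu_b_1", "ValidateState": 1, "ModTime": 1461700500},
    {"Name": "wu_a_0", "ValidateState": 1, "ModTime": 1461703600},
]

deduped = keep_latest(rows)
print(deduped)
```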
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Apr 26, 2016 9:33:04 PM]
wujj123456
Cruncher
Joined: Jun 9, 2010
Post Count: 38
Re: Duplicated results in web API

Sgt.Joe wrote: "I have stopped wondering why the mod time would change on a unit which has already been validated. It must make sense to the techs."

I happen to have seen this happen. ModTime can change after validation because the files are deleted. That might eventually happen for all results, but yeah, from the user's point of view, a result is no longer interesting once it has been validated.
[Apr 27, 2016 2:21:17 AM]