| Index | Recent Threads | Unanswered Threads | Who's Active | Guidelines | Search |
| World Community Grid Forums
|
| No member browsing this thread |
|
Thread Status: Active Total posts in this thread: 24
|
|
| Author |
|
|
twilyth
Master Cruncher US Joined: Mar 30, 2007 Post Count: 2130 Status: Offline Project Badges:
|
I'm thinking about doing a project to import the API data for my devices into a MYSQL database. I haven't worked with a db in many years so I may be in over my head but it will be interesting to see how far I can get.
----------------------------------------A lot of the fields are probably obvious in terms of what data type I should but if anyone has imported API data into some other format, I'd appreciate your help. If you also happen to know what the SQL data types are, that would be even better. Also, I think for MYSQL, you have to indicate the maximum length for each field as well but I could be wrong about that. ![]() ![]() |
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Recently Active Project Badges:
|
To download the data, you could use my script wcgresults. For each validated WU, it stores all fields, separated by a tab, on one line, into a plaintext file, and it is possible to change that mechanism by storing the fields into a real database, of course. That shouldn't be too hard, I think.
In the current situation, as the plaintext file grows, the addition of new WUs will get slower, of course, although a significant slowdown may be meaning that you already have processed a million WUs. You can read some more things about the script here (and also elsewhere in that same thread, if needed). To store validated data through the API into a plaintext file, you could use a line like this in your crontab (where N is the value of the minute within the hour): N * * * * $HOME/bin/wcgresults -df $HOME/tmp/wcg.log (and don't forget to chmod +x $HOME/bin/wcgresults) . |
||
|
|
twilyth
Master Cruncher US Joined: Mar 30, 2007 Post Count: 2130 Status: Offline Project Badges:
|
Thanks but I intend to do everything in Python. I also don't want to have to do any of this manually.
----------------------------------------The plan is to create something that will run in the back ground and grab the current day's XML then add that to a database and clean up any redundancies there. Thanks for the information though. I did look at the source code but I don't read - PERL? Is that the language you used? Still, I'll take another look. There might be code I can repurpose if I look a little closer. ![]() ![]() |
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Recently Active Project Badges:
|
The plan is to create something that will run in the back ground and grab the current day's XML then add that to a database and clean up any redundancies there. Exactly!Thanks for the information though. I did look at the source code but I don't read - PERL? Is that the language you used? Still, I'll take another look. There might be code I can repurpose if I look a little closer. It's a shellscript with several uses of Perl. The most interesting (Perl)parts for this purpose are the lookup part (to see if the WU already exists or not): if (open(LOGFILE, $ENV{opt_f})) {and the output part (to describe the statistical data belonging to the WU):printf LOGFILE "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", $AppName, $ClaimedCredit, $CpuTime, $ElapsedTime, $ExitStatus, $GrantedCredit, $DeviceId, $DeviceName, $ModTime, $WorkunitId, $ResultId, $Name, $Outcome, $ReceivedTime, $ReportDeadline, $SentTime, $ServerState, $ValidateState, $FileDeleteState; |
||
|
|
twilyth
Master Cruncher US Joined: Mar 30, 2007 Post Count: 2130 Status: Offline Project Badges:
|
You know, I just created the list of field names and I realized that I have no idea what most of those represent.
----------------------------------------I put a question mark next to the ones I'm not sure about. Some of the single character fields look they might be boolean but I can't tell for sure. I guess once I load them into the database I can query those fields and see what the range of values is. AppName ClaimedCredit CpuTime ? ElapsedTime - is this listed for wu's in process? ? ExitStatus - any idea what the values represent? GrantedCredit DeviceId DeviceName ? ModTime - modified time? WorkunitId ResultId Name ? Outcome - same ? as for exit status ReportDeadline SentTime ? ServerState - same ? as for exit status ? ValidateState - same ? as for exit status ? FileDeleteState - same ? as for exit status If anyone from staff can give me some documentation on these fields I would appreciate it. ![]() ![]() |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I haven't seen the data types posted anywhere but they are easily obtained. Just enter in your browser: https://
----------------------------------------www.worldcommunitygrid.org/api/members/twilyth/results?code=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&Limit=50 substitute your verification code found on your My Profile page instead of the long string of Xs: (https://www.worldcommunitygrid.org/ms/viewMyProfile.do) This query will show you 50 of your results in your browser. Just as easy in python from the interactive prompt: >import requests # you may need to install the requests module >req = requests.get('same query string as above') >mydata = req.json() # you get back a list of dicts ready for processing or writing raw to sql, csv or whatever. Very handy. I found it was important to put together the query string in a useful way for the project which I implemented some time ago in python3 and sqlite3 for the database. See: https://www.worldcommunitygrid.org/help/viewTopic.do?shortName=api The data has a lifecycle as it goes from downloaded, to completed, validated, credit granted and then gets deleted. So you can only get recent data and are limited to 500 results in a request. Some of your question marks relate to the life cycle. I think ElapsedTime differs from CpuTime in that if you have other processing using a lot of CPU time then your elapsed time will be a little longer. Normally they are very close. Hope this helps. [Edit 2 times, last edit by Former Member at Mar 9, 2019 9:14:01 PM] |
||
|
|
twilyth
Master Cruncher US Joined: Mar 30, 2007 Post Count: 2130 Status: Offline Project Badges:
|
Thanks Fuddrux. I have been to the API page but I must not have ever looked carefully at it. Thanks for pointing that out.
----------------------------------------Thanks also for the sql tips. I'm not at that stage but i'll keep this bookmarked. The database part of this is going to be a bit of a detour since it's been so long since I've studied that. Hopefully I can return part of the favor by noting that setting &limit=0 will give you all of the current wu's. ![]() ![]() |
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Recently Active Project Badges:
|
Hopefully I can return part of the favor by noting that setting &limit=0 will give you all of the current wu's. Awesome! I didn't know that. People using my script 'wcgresults' should be aware that they'll need to download a new version of 'wcgresults' to apply that setting of limit=0. Examples: $ wcgresults -dqqql0 |
||
|
|
twilyth
Master Cruncher US Joined: Mar 30, 2007 Post Count: 2130 Status: Offline Project Badges:
|
De nada. I live to serve.
----------------------------------------![]() ![]() ![]() |
||
|
|
|