World Community Grid Forums
Thread Status: Active | Total posts in this thread: 24
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Ah, that's great! Btw, I realized this morning that I should have been checking for database connectivity before I execute the API call and the load. I had MySQL down and got an error when I ran it this morning. So I added a test function for MySQL connectivity.
So my next steps are to add error traps in all functions, as well as a case statement to allow the various functions that could operate standalone to be specified as arguments at run time. I also incorporated the content of the README into the script, function by function, to help clarify each one's purpose. Best, -mark

[Edit 2 times, last edit by Former Member at Apr 17, 2019 1:30:02 AM]
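The two ideas in the post above (a database connectivity test that runs before the API call and load, and a case statement so individual functions can be invoked standalone) might look roughly like this in bash. This is only a sketch under assumed names and connection settings, not the actual wcg_bash code:

```bash
#!/usr/bin/env bash
# Hypothetical sketch only - not the actual wcg_bash code. The connection
# settings, function names, and argument names are all assumptions.
# Credentials are expected to come from ~/.my.cnf rather than the command line.

DB_HOST="localhost"
DB_USER="wcg"
DB_NAME="wcg_stats"

# Placeholder stand-ins for the real download/load functions.
download_workunits() { echo "(placeholder) would call the WCG API here"; }
load_workunits()     { echo "(placeholder) would load the CSV into MySQL here"; }

# Verify the MySQL server is reachable before calling the API or loading data.
check_mysql() {
    if ! mysql --host="$DB_HOST" --user="$DB_USER" "$DB_NAME" \
               --execute="SELECT 1;" >/dev/null 2>&1; then
        echo "ERROR: cannot connect to MySQL on ${DB_HOST} - aborting." >&2
        exit 1
    fi
}

# Case statement so individual functions can be run standalone at run time.
case "${1:-all}" in
    check)    check_mysql ;;
    download) check_mysql && download_workunits ;;
    load)     check_mysql && load_workunits ;;
    all)      check_mysql && download_workunits && load_workunits ;;
    *)        echo "Usage: $0 {check|download|load|all}" >&2; exit 2 ;;
esac
```

Defaulting the argument to "all" keeps the normal scheduled run unchanged while still letting each step be exercised on its own, which is the usual payoff of this kind of dispatcher.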
twilyth
Master Cruncher | US | Joined: Mar 30, 2007 | Post Count: 2130 | Status: Offline
Quoting an earlier post in the thread:

"When I saw your post at the end of March I too was starting almost the same project! To that end, I've got a small script together that uses the WCG API to download current workunits, reformats that output to CSV, and then loads it into a MySQL database. The code can be found on GitHub here: https://github.com/msellan/wcg_bash
The script contains a function to create a table with appropriate schema definitions. So far it's been running for a week without issue. I'm happy to answer questions and am interested in feedback or suggestions from anyone."

I looked at the new overview and code and lifted your table definition parms. I'm going to try the MySQL Python connector, and I'm curious about how you update your table. Do you save all of the data for every work unit from dispatch to validation and credit, or do you overwrite obsolete information? For example, when a status field is updated, do you save that in a new record or overlay the old information? I'm wondering if there might be some useful information to be gleaned from following a WU over its life, assuming one is able to catch every set of updates.

Quoting an earlier post in the thread:

"Sure, when I first downloaded sample workunits from the API I thought there were 18 fields. But later, when I downloaded a full set of workunits, I noticed that some records had an extra field, 'ReceivedTime', which is not sent for a workunit until after the work has completed and it has been 'received' by the server. In the JSON the field doesn't show up for a record until it has a timestamp, so a complete record really has 19 fields. I decided to insert the 'ReceivedTime' field as a placeholder even for records that don't have it assigned yet. I chose to fill it with the Unix Epoch date rather than NULL, as it stands out and allows me to query all workunits that haven't been received, knowing that later UPDATE statements will overwrite the Epoch date once the workunit is processed."

This probably saved me at least a few hours of aggravation. Thank you.

[Edit 1 times, last edit by twilyth at Apr 24, 2019 12:12:59 PM]
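The "Unix Epoch as a placeholder" idea quoted above can be pictured with a short sketch. The table and column names here are invented for illustration and are not the real wcg_bash schema; the point is just that an epoch-dated ReceivedTime makes un-received workunits trivial to query:

```bash
#!/usr/bin/env bash
# Illustrative sketch of the epoch-placeholder idea. Table and column names
# are assumptions, not the actual wcg_bash schema; credentials come from ~/.my.cnf.

mysql --user="wcg" wcg_stats <<'SQL'
CREATE TABLE IF NOT EXISTS workunits (
    workunit_id   BIGINT,
    result_name   VARCHAR(255) NOT NULL,
    app_name      VARCHAR(255),
    device_name   VARCHAR(255),
    status        VARCHAR(64),
    sent_time     DATETIME,
    -- The 19th field: absent from the JSON until the result is returned,
    -- so it is stored as the Unix Epoch to flag "not yet received".
    received_time DATETIME NOT NULL DEFAULT '1970-01-01 00:00:00',
    PRIMARY KEY (result_name)
);

-- All workunits that have not been received yet stand out immediately:
SELECT result_name, sent_time
FROM   workunits
WHERE  received_time = '1970-01-01 00:00:00';
SQL
```

Using a visible sentinel instead of NULL also means later UPDATEs simply overwrite the placeholder once the server reports a real ReceivedTime.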
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
That's really a great question. The hardest part of this little project for me was puzzling over what I wanted to do with the data once I started downloading it!
In the end, I decided to simply do UPDATEs on the records rather than INSERTs for every change to each work unit. I realized after a while that my main goal was to be able to query the outcomes of each work unit (how many succeeded, how long they took to process, which projects fail more often than others, which of my PCs process more units, and so on), so I overwrite the obsolete data, as you mention.

I too wonder if there's interesting or useful information in tracking a workunit across its lifespan. I don't think it would be hard to catch all the updates. I just reworked the most inefficient part of the script, and it now executes in about 10 seconds, down from 3-4 minutes before. So it seems conceivable that it could run every 10 minutes or so; I'm not sure what the minimum frequency would need to be to ensure every change is caught. But since it only makes one call to the API, even at a 10 minute frequency the WCG server(s) shouldn't be hammered too hard.

I'm looking forward to seeing what you create. I've been a unix admin/shell programmer for most of my career, which is why I wrote it in bash, but originally I was thinking about writing it in Python as a Python learning exercise. So seeing a Python solution will be helpful. Best, -mark
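One way to picture the "overwrite rather than accumulate" choice and the 10-minute schedule is the hedged sketch below. The post doesn't say how the UPDATEs are actually generated; this uses MySQL's upsert syntax against the hypothetical workunits table sketched earlier, with made-up example values and an assumed script path in the crontab line:

```bash
#!/usr/bin/env bash
# Sketch only: one way to keep a single, current row per result.
# The real script may issue plain UPDATE statements instead; the result name,
# values, and paths below are invented for illustration.

mysql --user="wcg" wcg_stats <<'SQL'
-- Insert the workunit if it is new, otherwise overwrite the obsolete fields.
INSERT INTO workunits (result_name, app_name, status, sent_time, received_time)
VALUES ('EXAMPLE_WU_0001_0', 'ExampleProject', 'In Progress',
        '2019-04-20 10:15:00', '1970-01-01 00:00:00')
ON DUPLICATE KEY UPDATE
    status        = VALUES(status),
    received_time = VALUES(received_time);
SQL

# A 10-minute polling schedule could be a single crontab entry, for example:
#   */10 * * * * /path/to/wcg_bash/load_script.sh >> /var/log/wcg_load.log 2>&1
```

Because each poll is one API call followed by keyed overwrites, the table size stays bounded by the number of workunits rather than growing with every status change.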
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
@twilyth - I realized the other day that although I'm overwriting obsolete data in the MySQL table, I'm also archiving the original API data downloads and the script-generated SQL load scripts (which contain all the data, just reformatted) into a folder on the filesystem.
So effectively, I could reload all of it into another table, doing INSERTs rather than UPDATEs for all the stages of the work units, to see what that might look like as a fun comparison. If I get around to that experiment sometime soon, I'll post back the results. Best, -mark
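If that experiment ever happens, the replay might look something like the sketch below: create a second table with no unique key and INSERT every archived snapshot, so each stage of a workunit is preserved as its own row. The archive layout (per-run CSV files) and all names are assumptions about how the data might be stored, not a description of the actual script:

```bash
#!/usr/bin/env bash
# Sketch under assumed names: replay archived per-run CSV snapshots into a
# history table so every stage of a workunit is kept as its own row.
# Requires local_infile to be enabled on the MySQL server and client.

ARCHIVE_DIR="$HOME/wcg_archive"   # assumed location of the archived downloads

# Same columns as the (hypothetical) workunits table, but without a primary
# key so repeated states of the same workunit never collide.
mysql --user="wcg" wcg_stats --execute="CREATE TABLE IF NOT EXISTS workunits_history LIKE workunits;"
mysql --user="wcg" wcg_stats --execute="ALTER TABLE workunits_history DROP PRIMARY KEY;" 2>/dev/null || true

# Load the snapshots in filename order (chronological if names are timestamped),
# INSERT-only, never UPDATE.
for csv in "$ARCHIVE_DIR"/*.csv; do
    mysql --local-infile=1 --user="wcg" wcg_stats --execute="
        LOAD DATA LOCAL INFILE '$csv'
        INTO TABLE workunits_history
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\n';"
done
```

The history table could then be queried for per-workunit timelines (time from dispatch to receipt, number of status changes, and so on) without touching the live overwrite-style table.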