| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 21
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Simple question: why did every single validated result from the August 30 12:06 to August 31 00:06 period get a ModTime change after midnight UTC?
Wasted hours paining the brain and verifying my program, trying to understand why last night's 23:56 XML pull was fine, while this morning not a single valid result had a ModTime stamp from that period, EVEN THOUGH there was no change to the result, quorum, or distribution. For example:
MCM1_ 0016433_ 6756_ 1-- 735 Valid 8/29/15 22:33:22 8/30/15 09:32:21 6.66 102.8 / 104.5
MCM1_ 0016433_ 6756_ 0-- 735 Valid 8/29/15 22:33:00 8/30/15 19:45:02 3.71 106.1 / 104.5
The second one was received by the servers yesterday at 19:45:02 and had a ModTime stamp of 1440963908, which is 15-08-30 19:45:08, 6 seconds later. Then this morning, after running the period end, the ModTime stamp for this result had changed to 1440982019, which is 15-08-31 00:46:59. And this is the case for ALL the results, even though there was no transaction. My program works on ModTime filtering [your API], and it's been working great, but this is just plain a mess. What else broke last night? (Apologies to the French for abusing la langue francophone) [Edit 1 times, last edit by Former Member at Aug 31, 2015 11:27:33 AM] |
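The epoch arithmetic in the post above can be checked in a few lines. This is only a minimal sketch: it assumes ModTime is Unix seconds in UTC (consistent with the values quoted) and that stats periods close at 00:06 and 12:06 UTC as the post describes. The function names `modtime_to_utc` and `period_end` are hypothetical and not part of the WCG API, and treating a stamp exactly on a cutoff as belonging to the earlier period is an assumption.

```python
from datetime import datetime, timedelta, timezone

def modtime_to_utc(modtime: int) -> datetime:
    """Convert a ModTime value (assumed Unix seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(modtime, tz=timezone.utc)

def period_end(ts: datetime) -> datetime:
    """Return the first 00:06 or 12:06 UTC cutoff at or after ts.

    ts must already be in UTC; the 00:06/12:06 schedule is an assumption
    taken from the thread, not from any documented API.
    """
    first = ts.replace(hour=0, minute=6, second=0, microsecond=0)
    for hours in (0, 12):
        cutoff = first + timedelta(hours=hours)
        if ts <= cutoff:
            return cutoff
    return first + timedelta(hours=24)

before = modtime_to_utc(1440963908)  # stamp seen in the 23:56 pull
after = modtime_to_utc(1440982019)   # stamp seen the next morning
print(before.strftime("%y-%m-%d %H:%M:%S"), "->", period_end(before))
print(after.strftime("%y-%m-%d %H:%M:%S"), "->", period_end(after))
```

With the two stamps quoted in the post, the first lands in the period closing 15-08-31 00:06 and the second in the one closing 15-08-31 12:06, which is the shift being complained about.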
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
More broke:
MCM1_ 0016420_ 5655_ 1-- 3113135 Valid 8/29/15 09:47:03 8/30/15 00:15:31 6.90 / 7.05 108.4 / 102.4
There is no quorum distribution info to say when the wingman came in:
Project Name: Created: Name: Minimum Quorum: 0 Replication: 0 |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
![]()
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yes, the above pic depicts the inner mood... going through the valid result quorums one by one [and finding those entirely blank quorums as well]. By a stroke of luck, having done a previous update at 23:56 CEST before bed preserved most of the 'correct' modtimes, which let me reconstruct what went into the 00:06 UTC stats.
@Techs, the question was asked before and I'm going to ask it again: is there another field, not part of the API, that records whether a result was included in any past statistics? If yes [have little doubt], can we please get this added to the API? Ideally it's not just a flag but rather a period statistics timestamp, maybe both. If we get this added, then anything going foul with modtime, which is a scheduler tool, is of no consequence to the API users.
BTW, this spill item
MCM1_ 0016402_ 7603_ 1-- 735 Valid 8/28/15 16:42:59 8/31/15 00:03:10 3.34 110.6 / 108.7
MCM1_ 0016402_ 7603_ 0-- 735 Valid 8/28/15 16:42:48 8/29/15 12:23:19 8.32 106.9 / 108.7
had its stamp changed too, so it's no longer recognized as being in the spill...
MCM1_0016402_7603_0 146 mcm1 Valid 106,89 8,3235 0:08:19:25 8,5518 0 108,73 13,06 97,33% 1440996668 1 29-08-15 12:23 04-09-15 16:42 28-08-15 16:42 5 1 15-08-31 04:51:08 |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
There is an additional field called file_delete_state that is used to control when the files associated with the result can be deleted from the filesystem. It goes from 0 -> 1 (ready for delete) -> 2 (deleted). It is not currently represented in the API, but a change to it would affect the ModTime field.
I'll add it to the output so that you can track it as the source of change going forward. |
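For API users, one way to use the new field might be to compare two snapshots of the same result and decide whether a ModTime change was only file housekeeping. The following is a sketch under assumptions, not the server's logic: the dictionary keys (`ModTime`, `FileDeleteState`, `Status`, `Name`) and the function `classify_change` are hypothetical, and the FileDeleteState values in the sample records are made up for illustration.

```python
# Hypothetical field names; the real API output may label them differently.
IGNORED = {"ModTime", "FileDeleteState"}

def classify_change(old: dict, new: dict) -> str:
    """Compare two snapshots of one result and say why ModTime moved."""
    if old["ModTime"] == new["ModTime"]:
        return "unchanged"
    keys = (old.keys() | new.keys()) - IGNORED
    if any(old.get(k) != new.get(k) for k in keys):
        return "substantive"   # status, quorum, etc. really changed
    if old.get("FileDeleteState") != new.get("FileDeleteState"):
        return "housekeeping"  # only the delete state moved, e.g. 0 -> 1
    return "unexplained"       # ModTime moved with no visible cause

# Sample records: ModTime values are from the thread, delete states are illustrative.
old = {"Name": "MCM1_0016433_6756_0", "Status": "Valid",
       "ModTime": 1440963908, "FileDeleteState": 0}
new = {"Name": "MCM1_0016433_6756_0", "Status": "Valid",
       "ModTime": 1440982019, "FileDeleteState": 1}
print(classify_change(old, new))  # prints "housekeeping"
```

A tool filtering on ModTime could then skip "housekeeping" changes when deciding which stats period a result belongs to.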
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
Ok - I've made the code change. The response to the API will look like this once we have released the change.
It should be out late this week or early next (but some factors will play into that so it could be a little longer).
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thx for the quick response. I'll play with this additional field. If being counted into the stats causes the value to be set [does it?], it's as good as needed. Then it would be something along the lines of:
If in archive already, and in active data set with 1 [and modtime is on or before the period timestamp], then count, else ignore. It will have to be traced in my app to understand when the value flips, in order to formulate a conditional count yes/no. Just to be sure, this is not going to break anything for those set up using the current API? I presume that if it's not specified in the mapping template it will remain ignored on the fetch after go-live. If you could add that sort key on modtime while you're at it, it would be great. This throws anything changed to the top of the list. |
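The effect of the requested sort key can be approximated client-side after a fetch. A rough sketch, assuming each result is a dict with a `ModTime` key in Unix seconds; the function `changed_since` and the cutoff value used in the example are hypothetical, and this says nothing about how the server would implement a sort parameter.

```python
def changed_since(results, last_pull_modtime):
    """Return results whose ModTime is newer than the last pull, newest first."""
    changed = [r for r in results if r["ModTime"] > last_pull_modtime]
    return sorted(changed, key=lambda r: r["ModTime"], reverse=True)

# ModTime values are taken from the thread; the cutoff (15-08-31 00:00:00 UTC)
# is only an illustrative stand-in for "time of my previous pull".
results = [
    {"Name": "MCM1_0016433_6756_0", "ModTime": 1440982019},
    {"Name": "MCM1_0016402_7603_0", "ModTime": 1440996668},
    {"Name": "older_result_example", "ModTime": 1440963908},  # made-up name
]
top = changed_since(results, last_pull_modtime=1440979200)
print([r["Name"] for r in top])
```

Anything touched since the previous pull ends up at the top of the list, most recent change first.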
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, the whack continues: more and more results have their modtime changed even though there's no change to quorum, validation state, or anything else, just the modtime. I was already archiving counted results, kind of like the FileDeleteState, using this with an ISNA to conditionally color these and throw them out of totals. That works until, as happened again, I do not run the update shortly after the stats commencement [and I did not]! No telling which stats period these results went into; there are too many. In short, something changed. Until a few days ago, I could leisurely run the update even just before the next stats run, i.e. I had nearly 12 hours of leeway and still got a 100% tally. Now all bets are off, unless running at exactly 00:06 or 12:06... Want to find a volunteer to adapt to that? Effectively, this has relegated my whole project to -unreliable-, just as pirogue's WCGDAWS was sent up the famous creek.
To list those that I'm at a loss for... Those with a modtime change between 13:45:41 UTC and 14:10:37... ran the queries at 16:06
Name AppName Status ModTime Outcome ReceivedTime Last Mod Stats_Period
MCM1_0016513_7110_2 mcm1 Valid 1441202161 1 02-09-15 10:55 15-09-02 13:56:01 Open for Deletion?
OET1_0001174_xSDGP-F_rig_29397_0 oet1 Valid 1441203037 1 02-09-15 09:57 15-09-02 14:10:37 Open for Deletion?
E232526_456_S.234.C25F6H16N2.QBZWHKNIMBUFAM-UHFFFAOYSA-N.4_s1_14_1 cep2 Valid 1441201788 1 02-09-15 08:15 15-09-02 13:49:48 Open for Deletion?
MCM1_0016505_9025_1 mcm1 Valid 1441202061 1 02-09-15 05:33 15-09-02 13:54:21 Open for Deletion?
OET1_0001174_xSDGP-F_rig_55142_0 oet1 Valid 1441202899 1 02-09-15 05:30 15-09-02 14:08:19 Open for Deletion?
MCM1_0016503_3838_0 mcm1 Valid 1441202042 1 02-09-15 05:06 15-09-02 13:54:02 Open for Deletion?
MCM1_0016492_3308_1 mcm1 Valid 1441201906 1 02-09-15 03:56 15-09-02 13:51:46 Open for Deletion?
MCM1_0016507_4136_2 mcm1 Valid 1441202088 1 02-09-15 03:55 15-09-02 13:54:48 Open for Deletion?
MCM1_0016508_3522_2 mcm1 Valid 1441202093 1 02-09-15 03:41 15-09-02 13:54:53 Open for Deletion?
E232525_117_S.230.C29H19N3O2.GUTHJVDTNRYXRO-UHFFFAOYSA-N.3_s1_14_1 cep2 Valid 1441201771 1 02-09-15 03:19 15-09-02 13:49:31 Open for Deletion?
OET1_0001173_xSDGP-F_rig_61747_0 oet1 Valid 1441202677 1 02-09-15 02:22 15-09-02 14:04:37 Open for Deletion?
MCM1_0016498_7257_1 mcm1 Valid 1441201981 1 01-09-15 17:52 15-09-02 13:53:01 Open for Deletion?
MCM1_0016411_2102_0 mcm1 Valid 1441201609 1 29-08-15 14:50 15-09-02 13:46:49 Open for Deletion?
MCM1_0016373_2610_0 mcm1 Valid 1441201541 1 28-08-15 08:50 15-09-02 13:45:41 Open for Deletion?
Edit: Several to correct data and highlight. [Edit 3 times, last edit by Former Member at Sep 2, 2015 6:11:51 PM] |
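The archive approach mentioned in this post (look each result up in the list of already counted results, the spreadsheet ISNA test, and leave matches out of the totals) might look like this outside a spreadsheet. It is only a sketch: the record layout and the function `tally_new_results` are hypothetical, and the changed ModTime in the second pull is a made-up value for illustration.

```python
def tally_new_results(results, archive):
    """Count valid results not yet archived, then archive them.

    archive maps result name -> ModTime at the moment it was first counted,
    so a later ModTime change cannot cause a second count.
    """
    counted = []
    for r in results:
        if r["Status"] != "Valid":
            continue
        if r["Name"] in archive:  # lookup succeeds: already counted
            continue
        archive[r["Name"]] = r["ModTime"]
        counted.append(r["Name"])
    return counted

archive = {}
first_pull = [
    {"Name": "MCM1_0016513_7110_2", "Status": "Valid", "ModTime": 1441202161},
    {"Name": "MCM1_0016373_2610_0", "Status": "Valid", "ModTime": 1441201541},
]
# Same result again with a later (illustrative) ModTime and no other change.
second_pull = [
    {"Name": "MCM1_0016513_7110_2", "Status": "Valid", "ModTime": 1441210000},
]
print(tally_new_results(first_pull, archive))
print(tally_new_results(second_pull, archive))
```

As the post notes, this only protects results that were archived in time: a result first seen after its ModTime has already moved still cannot be placed in the right stats period.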
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Maybe it's time to check out some client-version tolerances at WCG. Again last night a result had a modtime change after the midnight hour despite validating before it. Just one case, so it was easy to find. First it validated at 22:44:41. This one, a 'No Reply', was still started *after* the deadline had expired, on a 6.10.58 client. This would not have happened on v7. The client would have decided to abort this as unstarted.
MCM1_ 0016347_ 1030_ 2-- 735 Server Aborted 9/2/15 15:14:34 9/3/15 00:12:32 0.00 0.0 / 0.0
MCM1_ 0016347_ 1030_ 1-- 735 Valid 8/26/15 15:14:04 9/2/15 22:44:41 3.46 98.0 / 100.1
MCM1_ 0016347_ 1030_ 0-- 735 Valid 8/26/15 15:13:39 8/27/15 15:31:02 8.16 102.1 / 100.1
The log shows it set off at 11:56 and finished at 15:33:00 local time, then reported at 22:44:41. Clients are supposed to report late tasks immediately, yet there seems to have been a delay of at least 11 minutes, from :33 to :44; for all I can tell, it could have delayed reporting by 1:11 or 2:11 hours. Computing back from the derived 22:33, minus 3.57 hours runtime, it was started around 18:36, about 3:15 hours *after* the deadline.
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.35_windows_x86_64 -SettingsFile MCM1_0016347_1030.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Settings File DateOfDesign = 04/17/2015
[11:56:10] Initializing
[11:56:15] Running
[11:56:47] EvaluateFitnessOfStartingGeneSignatures 103312
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.35_windows_x86_64 -SettingsFile MCM1_0016347_1030.txt -DatabaseFile dataset-17_72_SDG_v1.txt
[18:23:57] Initializing
[18:24:03] Running
[18:24:19] EvaluateFitnessOfStartingGeneSignatures 103312
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.35_windows_x86_64 -SettingsFile MCM1_0016347_1030.txt -DatabaseFile dataset-17_72_SDG_v1.txt
[09:41:21] Initializing
[09:41:27] Running
[09:41:43] EvaluateFitnessOfStartingGeneSignatures 103312
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.35_windows_x86_64 -SettingsFile MCM1_0016347_1030.txt -DatabaseFile dataset-17_72_SDG_v1.txt
[13:56:30] Initializing
[13:56:36] Running
[13:56:53] EvaluateFitnessOfStartingGeneSignatures 103312
[15:33:00] Writing final output
[15:33:00] Closing Output Stream
[15:33:00] Cleaning up
Result.out = 1871.000000
Run complete, CPU time: 12457.179086
15:33:00 (3316): called boinc_finish
</stderr_txt>
]]>
Just wondering: whilst an autonomous abort may not work, maybe consider a code enhancement for Berkeley to always force a client to check back with the project scheduler before starting a late task. Likewise for the repair client, checking back with the servers before starting a repair. Something that makes the server aware that one or the other is working on the task. Right now, it is pure chance that the 3rd copy was not computed. What a waste. Anyway, off topic, but it adds to the period-in / period-out determination problem. Not really going to sit here bleary-eyed at 02:06:11 in the morning to run my app... it would have been ahead of the issue for sure, though if there were something equivalent to cron in Windows, it might be possible to do this eyes wide shut.
Errata: The code that 'should' have initiated a client abort:
EXIT_UNSTARTED_LATE 200 Task was aborted due to it having not started and already past the deadline.
[Edit 1 times, last edit by Former Member at Sep 3, 2015 12:39:04 PM] |
||
|
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 328 Status: Offline Project Badges:
|
How different is the Windows Task Scheduler compared to unix cron command for a daily task?
|
||
|
|
|