Thread Status: Active
Total posts in this thread: 10
This topic has been viewed 2125 times and has 9 replies
biscotto
Cruncher
Italy
Joined: Apr 11, 2020
Post Count: 27
Status: Offline
OPN1 tasks don't respect write to disk interval

Hello,
OPN1 tasks seem to write to disk too much: iotop shows each task writing up to 1 GB of data over 30 minutes, which is too high.
It seems OPN1 tasks don't respect the write to disk interval, while MCM tasks do. Is there an official reason for this?
----------------------------------------
Papa Ryzen 5 3600 / Mama Radeon RX 560

----------------------------------------
[Edit 2 times, last edit by Biscotto at Oct 9, 2021 2:18:51 PM]
[Oct 9, 2021 2:17:13 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: OPN1 tasks don't respect write to disk interval

The longest currently running OPN job on my device has run for 2 hours and has written 13 checkpoints, or one every 9.2 minutes. My setting is "at most every 600 seconds", so that's in the ballpark of what I want it to be. A second one has 90 minutes of CPU time with 12 checkpoints done; at 7.5 minutes each, that's not abiding by the preference. I concur, something is not right.
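For anyone who wants to repeat this arithmetic on their own tasks, here is a minimal sketch. The function name is made up for illustration; the task figures and the 600-second preference are the ones quoted in this post.

```python
def average_checkpoint_interval(cpu_minutes: float, checkpoints: int) -> float:
    """Average CPU minutes between checkpoints for one task."""
    if checkpoints <= 0:
        raise ValueError("need at least one checkpoint")
    return cpu_minutes / checkpoints


# The "at most every 600 seconds" setting quoted in the post.
PREFERENCE_SECONDS = 600

# (label, CPU minutes so far, checkpoints written) as reported in the post.
tasks = [("2-hour task", 120, 13), ("90-minute task", 90, 12)]

for label, minutes, count in tasks:
    interval_s = average_checkpoint_interval(minutes, count) * 60
    if interval_s >= PREFERENCE_SECONDS:
        verdict = "within the preference"
    else:
        verdict = "more frequent than the preference"
    print(f"{label}: one checkpoint every {interval_s:.0f} s, {verdict}")
```

Strictly speaking, both tasks checkpoint more often than once per 600 seconds; the first is only slightly under that figure.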
[Oct 9, 2021 6:17:12 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: OPN1 tasks don't respect write to disk interval

BTW, notice that completion times are significantly shorter than before. Normally my device completes 25-30 tasks a day; it's now at over 100, and they are validating.
[Oct 9, 2021 6:20:15 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12149
Status: Offline
Re: OPN1 tasks don't respect write to disk interval

This is probably related to the reason for the recent outage.
[Oct 9, 2021 6:33:02 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 875
Status: Offline
Re: OPN1 tasks don't respect write to disk interval

OPN1 writes a single checkpoint for each job within a work unit, and I don't think it ever writes "timed" checkpoints within a job. So if many of the individual jobs are short, as at present with relatively simple ligands (fewer atoms and few branches), the checkpoints will be close together; if the jobs are long, there may be many minutes between them.
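To make the difference concrete, here is a rough sketch comparing "one checkpoint per job" with a fixed-interval checkpoint. It is only a toy model: the job lengths, the 10-minute interval and both function names are hypothetical, and nothing here is taken from the actual OPN1 code.

```python
from itertools import accumulate


def per_job_checkpoint_times(job_minutes):
    """One checkpoint when each job ends: times are the running total of job lengths."""
    return list(accumulate(job_minutes))


def timed_checkpoint_times(total_minutes, interval_minutes):
    """Checkpoints at a fixed interval, as a timed write-to-disk preference would suggest."""
    count = int(total_minutes // interval_minutes)
    return [interval_minutes * (i + 1) for i in range(count)]


# Two hypothetical work units, each 60 minutes of CPU time in total.
many_short_jobs = [1.0] * 60  # sixty 1-minute jobs
few_long_jobs = [15.0] * 4    # four 15-minute jobs

for name, jobs in [("many short jobs", many_short_jobs), ("few long jobs", few_long_jobs)]:
    per_job = per_job_checkpoint_times(jobs)
    timed = timed_checkpoint_times(sum(jobs), 10.0)
    print(f"{name}: {len(per_job)} per-job checkpoints vs {len(timed)} timed checkpoints")
```

In this model the number of checkpoints follows the number of jobs, not the interval preference, which matches the behaviour described above.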

So yes, in a way it is related to the recent issues and outage; there were work units with huge numbers of (mostly very small) jobs! It is what it is - I doubt there's much that can be done about it...

Cheers - Al.

P.S. OPNG is also a "one checkpoint per job" application and work units with large, multi-branch, ligands are much kinder to one's disks there too!

[Edit to relate to previous issues (and a typo!)...]
----------------------------------------
[Edit 2 times, last edit by alanb1951 at Oct 9, 2021 10:03:28 PM]
[Oct 9, 2021 9:59:49 PM]
TonyEllis
Senior Cruncher
Australia
Joined: Jul 9, 2008
Post Count: 259
Status: Recently Active
Re: OPN1 tasks don't respect write to disk interval

An example of OPN disk activity over the last several days
Pi 3A+ running OPN - change in SSD read/writes readily apparent



(It survived some monster WUs without error using 2G of swap)
----------------------------------------
[Oct 10, 2021 6:44:13 AM]
biscotto
Cruncher
Italy
Joined: Apr 11, 2020
Post Count: 27
Status: Offline
Re: OPN1 tasks don't respect write to disk interval

My problem was really with the amount of data written. Can somebody else check how much data OPN1 tasks write over a span of 30 minutes to 1 hour? If you are on GNU/Linux, a good tool would be iotop.
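As an alternative to watching iotop, here is a minimal sketch that samples the write_bytes counter the Linux kernel exposes in /proc/<pid>/io. The function names and the PID in the usage note are hypothetical, and reading another user's /proc/<pid>/io normally requires root or running as the same user as the BOINC client.

```python
import time
from pathlib import Path


def parse_write_bytes(proc_io_text: str) -> int:
    """Extract the write_bytes counter from the text of /proc/<pid>/io."""
    for line in proc_io_text.splitlines():
        key, _, value = line.partition(":")
        # Exact match, so cancelled_write_bytes is not picked up by mistake.
        if key.strip() == "write_bytes":
            return int(value)
    raise ValueError("no write_bytes field found")


def bytes_written_over(pid: int, seconds: float) -> int:
    """Sample /proc/<pid>/io twice and return the growth in write_bytes."""
    io_path = Path(f"/proc/{pid}/io")
    before = parse_write_bytes(io_path.read_text())
    time.sleep(seconds)
    after = parse_write_bytes(io_path.read_text())
    return after - before
```

Usage would be something like bytes_written_over(12345, 1800) for a 30-minute sample, where 12345 stands in for the PID of one OPN1 task process.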
----------------------------------------
Papa Ryzen 5 3600 / Mama Radeon RX 560

[Oct 10, 2021 5:32:47 PM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 450
Status: Offline
Re: OPN1 tasks don't respect write to disk interval

We've been complaining about this since OPN1 was in beta!

At best, it's a known issue.
[Oct 11, 2021 5:30:10 AM]
biscotto
Cruncher
Italy
Joined: Apr 11, 2020
Post Count: 27
Status: Offline
Re: OPN1 tasks don't respect write to disk interval

"We've been complaining about this since OPN1 was in beta!"
Oh, bummer. Has this been addressed by the maintainers?
----------------------------------------
Papa Ryzen 5 3600 / Mama Radeon RX 560

[Oct 11, 2021 12:32:12 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 875
Status: Offline
Re: OPN1 tasks don't respect write to disk interval

On why there's lots of I/O, and whether a reduction is possible - a programmer's viewpoint...

If you look at the slots directory for a running OPN1 (or OPNG) task you'll find a lot of files with names of the form wcg_checkpoint_??.ckp. Unsurprisingly, those files are the key to the amount of disk I/O that happens.

The majority of those files start life as copies of the receptor.??.map files generated by AutoGrid. One of the files (usually wcg_checkpoint_13.dat) appears to be an accumulation of the individual AutoDock job dialog files. The correspondence between files can be found in wcg_checkpoint.dat.

The size of the results file(s) depends on the sizes of the ligands, the size of the flexres part of the receptor and the number of jobs in a work unit. The dominant part is, of course, the number of jobs! Hopefully, the accumulated dialogs file and the results file(s) are grown by write-append rather than copy-append!

As for the copied .map files: it appears that these files get copied each time a checkpoint is taken, so that can be quite a lot of I/O activity. The sizes of the larger .map files will be about 6 to 8 times the grid size, so for the current receptor that's 1.0 to 1.4MB per file. I am not sure, but I don't think AutoDock actually alters any of those files; if that is indeed the case and the code doing the copying could be persuaded to hook up the actual .map files instead (using links or whatever...) that would considerably reduce the amount of I/O per job (and hence, per task).
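A back-of-envelope sketch of what that copying could add up to, together with the hard-link idea. The 1.0 to 1.4 MB per file comes from the figures above; the number of map files, the number of jobs and both function names are hypothetical, and none of this reflects how the OPN1 wrapper is actually written.

```python
import os
import shutil


def checkpoint_copy_io_mb(map_files: int, mb_per_file: float, jobs: int) -> float:
    """MB written if every map file is copied once per checkpoint (one checkpoint per job)."""
    return map_files * mb_per_file * jobs


def link_or_copy(src: str, dst: str) -> str:
    """Hard-link dst to src where possible; fall back to a real copy otherwise."""
    if os.path.exists(dst):
        os.remove(dst)
    try:
        os.link(src, dst)  # no file data is rewritten, only a directory entry
        return "linked"
    except OSError:
        shutil.copy2(src, dst)  # e.g. filesystem without hard-link support
        return "copied"


# Hypothetical work unit: 12 large map files, 500 jobs.
low = checkpoint_copy_io_mb(12, 1.0, 500)
high = checkpoint_copy_io_mb(12, 1.4, 500)
print(f"estimated copy traffic: {low:.0f} to {high:.0f} MB per task")
```

A hard link is only safe if the .map files really are read-only input, since both names then refer to the same data on disk.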

Of course, even if those files are only used as input, the changes would probably require coding that doesn't follow normal BOINC practice (which might explain why it hasn't happened already, if it would mitigate the problem!). And the likelihood of a fix would also depend on whether the relevant code is in the wrapper or embedded in the actual science code.

Cheers - Al.

P.S. I'd love to have access to the wrapper code for OPN1/OPNG to see how it actually works -- I used to "tune" software as a part of my work, and this sort of puzzle was within my remit...
[Oct 12, 2021 3:53:57 AM]