The checkpointing software DMTCP (https://dmtcp.sourceforge.io) is now available on O2.

Note

DMTCP is NOT guaranteed to work with, or to support, all applications and languages.

A given process may fail to run under DMTCP or fail to restart from a saved checkpoint.

...

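import os

def main():
    # something here
    some_number_here = 10  # placeholder: replace with your own iteration count
    for it in range(some_number_here):
        print(it)
        # ask DMTCP to write a checkpoint before the long step
        os.system('dmtcp_command --checkpoint')
        # do something here that takes
        # a very long time
        # ask DMTCP to write a checkpoint after the long step
        os.system('dmtcp_command --checkpoint')

if __name__ == '__main__':
    main()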
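
Requesting a checkpoint on every loop iteration can be expensive, so one option is to skip the request when the previous checkpoint is still recent. The following is a minimal sketch under stated assumptions, not an O2-provided utility: the class name `CheckpointThrottle`, the `min_interval_s` parameter, and the injectable `runner` and `clock` hooks are hypothetical names introduced for illustration. Only the `dmtcp_command --checkpoint` call is taken from the example above.

```python
import subprocess
import time


class CheckpointThrottle:
    """Request a DMTCP checkpoint at most once every `min_interval_s` seconds."""

    def __init__(self, min_interval_s, runner=subprocess.run, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.runner = runner  # hypothetical hook so the command can be stubbed out
        self.clock = clock    # hypothetical hook so time can be controlled in tests
        self.last = None      # time of the last checkpoint request, if any

    def maybe_checkpoint(self):
        """Return True if a checkpoint was requested, False if it was skipped."""
        now = self.clock()
        if self.last is not None and now - self.last < self.min_interval_s:
            return False
        # Same command as in the example above; check=True raises if it fails.
        self.runner(["dmtcp_command", "--checkpoint"], check=True)
        self.last = now
        return True
```

Inside the loop, `throttle.maybe_checkpoint()` would then replace the direct `os.system(...)` calls, with `throttle = CheckpointThrottle(min_interval_s=3600)` created once before the loop. The one-hour interval is only an example value.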

Note

CAUTION:

Creating a checkpoint can be time consuming and can generate very large files, depending on the RAM (memory) used by the running processes.

When a checkpoint is created, DMTCP writes all data currently loaded in RAM to file, so a job using ~100GB of RAM will create a checkpoint of similar size, which could fill up your storage quota.
Checkpointing 100 jobs that each use only 1GB of RAM would also be enough to fill your $HOME storage quota.
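
The arithmetic behind this caution can be sketched as a quick estimate before submitting jobs. This is a rough illustration only: the function names and the `checkpoints_kept` parameter are hypothetical, the estimate assumes each checkpoint is about as large as the RAM in use and ignores any compression, and the quota value must be replaced with your actual quota.

```python
def checkpoint_footprint_gb(n_jobs, ram_gb_per_job, checkpoints_kept=1):
    """Rough estimate: each checkpoint is about as large as the job's RAM use."""
    return n_jobs * ram_gb_per_job * checkpoints_kept


def fits_in_quota(footprint_gb, quota_gb, used_gb=0.0):
    """Return True if the estimated checkpoint data fits in the remaining quota."""
    return used_gb + footprint_gb <= quota_gb


# The two cases from the note above give the same estimated footprint:
print(checkpoint_footprint_gb(1, 100))   # one job using ~100GB   -> 100
print(checkpoint_footprint_gb(100, 1))   # 100 jobs using 1GB each -> 100
```

Keeping more than one checkpoint per job multiplies the estimate accordingly, which is what `checkpoints_kept` is meant to capture.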

...