weixin_39687881 2020-11-20 20:14

Redesign File Job Store layout

This should begin work on #2650.

I've reorganized the FileJobStore's file layout.

  • We now break things up across jobs/, stats/, files/shared, files/global, and files/for-job.
  • Files that get cleaned up when a job is destroyed are no longer under the job's directory. The jobs/ hierarchy holds only actual jobs.
  • We only create multiple levels of random subdirectories when a directory starts to get a lot of files/directories in it.
  • We sort jobs by (and thus prefix their IDs with) a filename-safe version of the jobName. This also applies to files that live in files/for-job.
  • Each file written gets its own uniquely-named directory, so the file itself could potentially have a more reasonable name.
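
The naming and spill-over behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: the function names (`make_safe`, `pick_directory`, `create_job_dir`, `write_file`), the `MAX_ENTRIES` threshold, the `SPILL_NAMES` alphabet, and the `file-` prefix are all assumptions made for the example.

```python
import os
import random
import re
import tempfile

MAX_ENTRIES = 4          # hypothetical threshold, kept tiny so the demo spills quickly
SPILL_NAMES = "abcdefgh" # hypothetical alphabet for one-character spill subdirectories


def make_safe(job_name: str) -> str:
    """Return a filename-safe version of job_name (assumed rule: keep A-Z, a-z, 0-9, '_', '.', '-')."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", job_name)[:40]


def pick_directory(root: str) -> str:
    """Return root while it has room; once it is full, descend into a random spill subdirectory."""
    current = root
    os.makedirs(current, exist_ok=True)
    while len(os.listdir(current)) >= MAX_ENTRIES:
        # Only add a level of nesting when the current directory is crowded.
        current = os.path.join(current, random.choice(SPILL_NAMES))
        os.makedirs(current, exist_ok=True)
    return current


def create_job_dir(jobs_root: str, job_name: str) -> str:
    """Create a uniquely named job directory whose name starts with the safe job name."""
    parent = pick_directory(jobs_root)
    return tempfile.mkdtemp(prefix=make_safe(job_name) + "-", dir=parent)


def write_file(files_root: str, job_name: str, file_name: str, data: bytes) -> str:
    """Write data under its own unique directory, so the file can keep a readable name."""
    parent = pick_directory(files_root)
    unique_dir = tempfile.mkdtemp(prefix="file-" + make_safe(job_name) + "-", dir=parent)
    path = os.path.join(unique_dir, os.path.basename(file_name))
    with open(path, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store:
        jobs_root = os.path.join(store, "jobs")
        for _ in range(10):
            job_dir = create_job_dir(jobs_root, "align reads/step:1")
            print(os.path.relpath(job_dir, store))
```

Because the safe job name is a prefix of the directory name, a plain sorted listing groups entries by job name. In this sketch a directory holds at most `MAX_ENTRIES` ordinary entries plus one spill subdirectory per character in `SPILL_NAMES`.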

Things I still want to do (eventually):

  • Combine files/for-job and files/global. Have just one directory for all non-shared files written by a job, with a subdirectory for the files that need to get cleaned up when the job is destroyed.
  • Stop naming files based on the call stack, and just keep their full original names in their unique subdirectories.
  • Document the file job store file layout for people who need to dig into it for a failed workflow.

This question comes from the open-source project DataBiosphere/toil.

6 answers

  • weixin_39687881 2020-11-20 20:14
    It looks like src/toil/test/utils/utilsTest.py::UtilsTest::testGetPIDStatus has its own opinions on where shared files ought to be found in the file job store. I will need to adjust it before the tests pass.
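
One way a test could avoid depending on the exact location of a file inside the job store directory is to search for it by name. The sketch below is only an illustration of that idea: `find_by_name` is a hypothetical helper, and it is not how the Toil test is written or how it was adjusted.

```python
import os
from typing import List


def find_by_name(job_store_root: str, file_name: str) -> List[str]:
    """Return every path under job_store_root whose basename equals file_name."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(job_store_root):
        if file_name in filenames:
            matches.append(os.path.join(dirpath, file_name))
    return sorted(matches)
```

A test written this way keeps passing when directories are reorganized, at the cost of being ambiguous if two files share a name, so it would need to assert that exactly one match is returned.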
