This should begin work on #2650.
I've reorganized the `FileJobStore`'s file layout.
- We now break things up across `jobs/`, `stats/`, `files/shared`, `files/global`, and `files/for-job` (see the layout sketch after this list).
- Files that get cleaned up when a job is destroyed are no longer under the job's directory. The `jobs/` hierarchy holds only actual jobs.
- We only create multiple levels of random subdirectories when a directory starts to get a lot of files/directories in it (sketched below).
- We sort jobs by (and thus prefix their IDs with) a filename-safe version of the `jobName`. This also applies to files that live in `files/for-job` (see the naming sketch below).
- Each file written gets its own uniquely-named directory, so the file itself could eventually have a more reasonable name.
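For orientation, here is a rough sketch of what the resulting layout could look like on disk. The specific job and file names are made up, and the comments just restate the bullets above; this is an illustration, not the exact structure the code produces.

```
<job store root>/
├── jobs/                        # actual jobs only
│   └── sortPhase-1a2b3c/        # ID prefixed with the filename-safe jobName
├── stats/
└── files/
    ├── shared/                  # shared files
    ├── global/                  # job-written files that outlive their job
    │   └── file-4d5e6f/         # one uniquely-named directory per file
    │       └── output.txt       # so the file itself can keep a readable name
    └── for-job/                 # job-written files cleaned up with their job
        └── sortPhase-file-7a8b9c/
            └── input.bam
```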
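The "only add random subdirectory levels when a directory fills up" behavior can be pictured roughly as below. `MAX_ENTRIES_PER_DIR` and `get_spill_directory` are hypothetical names, and the reuse/spill policy is only a sketch of the idea, not the actual Toil logic.

```python
import os
import uuid

MAX_ENTRIES_PER_DIR = 1000  # hypothetical threshold


def get_spill_directory(base_dir: str) -> str:
    """Pick a directory under base_dir for a new entry.

    Stay flat while base_dir is small; once it starts accumulating a
    lot of entries, spill into randomly named subdirectories so no
    single directory grows without bound.
    """
    os.makedirs(base_dir, exist_ok=True)
    entries = os.listdir(base_dir)
    if len(entries) < MAX_ENTRIES_PER_DIR:
        return base_dir
    # Crowded: reuse an existing subdirectory that still has room...
    for name in entries:
        child = os.path.join(base_dir, name)
        if os.path.isdir(child) and len(os.listdir(child)) < MAX_ENTRIES_PER_DIR:
            return child
    # ...or start a new random one, adding a level only when needed.
    child = os.path.join(base_dir, uuid.uuid4().hex[:8])
    os.makedirs(child)
    return child
```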
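Likewise, the filename-safe `jobName` prefix and the one-directory-per-file scheme could look something like this. `sanitize_job_name` and `pick_file_path` are hypothetical helpers for illustration; they are not Toil's actual API, and the sanitization rule is an assumption.

```python
import os
import re
import uuid


def sanitize_job_name(job_name: str) -> str:
    """Reduce a jobName to a filename-safe prefix (assumed rule: keep
    alphanumerics, dashes, and underscores; cap the length)."""
    safe = re.sub(r'[^A-Za-z0-9_-]', '_', job_name)
    return safe[:32] or 'job'


def pick_file_path(for_job_dir: str, job_name: str, original_name: str) -> str:
    """Choose where a newly written file should live.

    Each file gets its own uniquely named directory whose name is
    prefixed with the safe jobName (so entries sort by job), and the
    file keeps a readable basename inside it.
    """
    unique_dir = os.path.join(
        for_job_dir,
        sanitize_job_name(job_name) + '-file-' + uuid.uuid4().hex,
    )
    os.makedirs(unique_dir)
    return os.path.join(unique_dir, os.path.basename(original_name))


# e.g. a job named "sort/phase 1" writing "part-001.txt" could end up at
# files/for-job/sort_phase_1-file-<hex>/part-001.txt
```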
Things I still want to do (eventually):
- Combine `files/for-job` and `files/global`. Have just one directory for all non-shared files written by a job, with a subdirectory for the files that need to get cleaned up when the job is destroyed.
- Stop naming files based on the call stack, and just keep their full original names in their unique subdirectories.
- Document the file job store file layout for people who need to dig into it for a failed workflow.