File paths are inherently dubious when working with data. Let's say we have a hypothetical situation: a program called find_brca, and some data called my.genome, both in the /Users/Desktop/ directory.
find_brca takes a single argument, a genome, runs for 4 hours, and returns the probability of that individual developing breast cancer in their lifetime. Some people, presented with a very high probability, might have both of their breasts removed as a precaution.
Obviously, in this scenario, it is absolutely vital that /Users/Desktop/my.genome actually contains the genome we think it does. There are no do-overs: "oops, we used an old version of the file from a previous backup" or any other technical issue will not be acceptable to the patient. How do we ensure we are analysing the file we think we are analysing?
To make matters trickier, let's also assert that we cannot modify find_brca itself, because we didn't write it; it's closed source, proprietary, whatever.
You might think MD5 or other cryptographic checksums could come to the rescue, and while they do help to a degree, you can only hash the file before and/or after find_brca has run; you can never know exactly what data find_brca actually read (without doing some serious low-level system probing with DTrace, ptrace, etc.).
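For concreteness, here is a minimal sketch of that before/after checksumming in Python (find_brca is the hypothetical program above, and the trusted digest is a placeholder), including the gap it cannot close:

import hashlib
import subprocess

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so a large genome need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

genome = "/Users/Desktop/my.genome"
trusted = "<digest recorded when the genome was validated>"  # placeholder

before = sha256_of(genome)
subprocess.run(["find_brca", genome], check=True)  # the hypothetical 4-hour run
after = sha256_of(genome)

if not (before == after == trusted):
    raise RuntimeError("my.genome changed around the run; results are suspect")

# Even when this check passes, nothing proves find_brca read these exact
# bytes: the file could be swapped and restored inside the 4-hour window
# between the two hashes.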
The root of the problem is that file paths do not have a 1:1 relationship with the actual data. Only in a filesystem where files can be requested solely by their checksum - and where, as soon as the data is modified, its checksum is modified too - can we ensure that when we feed find_brca the genome's file path 4fded1464736e77865df232cbcb4cd19, we are actually reading the correct genome.
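To illustrate the addressing scheme (a toy in-process store, not a real filesystem; the names and sample bytes are made up):

import hashlib
from pathlib import Path

class ContentStore:
    """Toy content-addressed store: a blob's only name is its SHA-256 digest."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        (self.root / digest).write_bytes(data)
        return digest  # callers pass this around, never a mutable path

    def get(self, digest: str) -> bytes:
        data = (self.root / digest).read_bytes()
        # Verify on every read: if the bytes no longer hash to their own
        # name, refuse to return anything rather than hand back stale data.
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError("stored data does not match its address")
        return data

store = ContentStore("/tmp/cas")
addr = store.put(b"ACGT...")   # placeholder genome bytes
genome = store.get(addr)       # can only ever be the exact bytes put()

Under this scheme there is no way to "feed the wrong file" by path: a changed or swapped file simply has a different name.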
Are there any filesystems that work like this? If I wanted to create such a filesystem because none currently exists, how would you recommend I go about doing it?
I have my doubts about its stability, but hashfs looks exactly like what you want: http://hashfs.readthedocs.io/en/latest/
HashFS is a content-addressable file management system. What does that mean? Simply, that HashFS manages a directory where files are saved based on the file's hash. Typical use cases for this kind of system are ones where:
- Files are written once and never change (e.g. image storage).
- It's desirable to have no duplicate files (e.g. user uploads).
- File metadata is stored elsewhere (e.g. in a database).
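A quick usage sketch based on the hashfs documentation (the store directory and content are placeholders; double-check the docs for exact signatures):

from io import BytesIO
from hashfs import HashFS

# Root the store at this directory; content is sharded into subdirectories
# derived from its hash (4 levels, 1 hex character each).
fs = HashFS('genome_store', depth=4, width=1, algorithm='sha256')

# put() hashes the content and files it under that digest.
address = fs.put(BytesIO(b'ACGT...'))  # placeholder genome bytes
print(address.id)       # the content hash - the file's only real name
print(address.abspath)  # where it landed on disk

# Retrieval is by hash, so you get exactly the bytes you stored, or nothing.
with fs.open(address.id) as f:
    data = f.read()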
Note: not to be confused with the HashFS a student of mine did a couple of years ago: http://dl.acm.org/citation.cfm?id=1849837