Design notes
============

There are three main abstractions in the design of dedup:

- The chunker interface
- The snapshot layer
- The block layer

The block layer
---------------

From the outside world, the block layer is just an abstraction for
dealing with variable length blocks. All blocks are referenced with
their hash.

The block layer is arranged into a stack of layers. From top to
bottom these are as follows:

- Generic layer
- The compression layer
- The encryption layer
- The storage layer

The generic layer is the one that client code interfaces with. It is
the top level entrypoint to the block layer.
The generic layer calculates the hash of the block and passes it down
to the compression layer.

The compression layer will prepend a compression descriptor to the
block and then compress the block using snappy or lz4. It is possible
to disable compression in which case a special descriptor is prepended
and the data is passed uncompressed to the encryption layer.

The encryption layer will prepend an encryption descriptor to the
block and then encrypt/authenticate the block using XChaCha20 and
Poly1305. It is possible to disable encryption in which case it acts
as a bypass with a special type of encryption descriptor. The block
is then passed to the storage layer.

The storage layer will prepend a storage descriptor and append the
descriptor and the data to a single backing file.

The snapshot layer
------------------

The snapshot abstraction is currently very simplistic. A snapshot is
a file under $repo/archive/. The contents of the file are the
block hashes of the data stored in the snapshot.

The chunker interface
---------------------

The chunker issues variable length blocks. The minimum block size is
512KB, the maximum block size is 8MB and the average block size is
2MB.
These configuration parameters can be modified by editing config.h
but they can be tricky to tune properly.

The buzhash[0] rolling hash algorithm is used to fingerprint the input
stream.

When encryption is enabled, a random seed is generated and stored
encrypted in the repository state file. The seed is XOR-ed with the
buzhash initial state table to mitigate length fingerprinting
attacks.

[0] http://www.serve.net/buz/Notes.1st.year/HTML/C6/rand.012.html
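
The chunker description above can be illustrated with a minimal
sketch. This is not dedup's actual code. The names (WINSIZE,
table_init, chunk_len), the 48-byte window, the 32-bit seed width, the
xorshift-generated placeholder table and the rule "cut when the low
bits of the hash are zero" are all assumptions made for illustration.
Only the block sizes and the idea of XOR-ing a seed into the buzhash
table come from the notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Block sizes as stated in the notes. */
#define BLKSIZE_MIN (512UL * 1024)
#define BLKSIZE_MAX (8UL * 1024 * 1024)
#define BLKSIZE_AVG (2UL * 1024 * 1024)

/* Assumptions: window size and the boundary mask are hypothetical. */
#define WINSIZE  48
#define HASHMASK (BLKSIZE_AVG - 1)

/* Rotate a 32-bit value left by n bits (n is taken modulo 32). */
static uint32_t
rotl32(uint32_t v, unsigned n)
{
	n &= 31;
	return n ? (v << n) | (v >> (32 - n)) : v;
}

/*
 * Fill the table with placeholder values from xorshift32 and XOR the
 * seed into every entry.  A real buzhash table is a fixed set of
 * random constants; the generator here only stands in for it.
 */
static void
table_init(uint32_t tab[256], uint32_t seed)
{
	uint32_t x = 0x9e3779b9u;
	int i;

	assert(tab != NULL);
	for (i = 0; i < 256; i++) {
		x ^= x << 13;
		x ^= x >> 17;
		x ^= x << 5;
		tab[i] = x ^ seed;
	}
}

/*
 * Return the length of the next block in buf[0..len).  No cut is
 * made before BLKSIZE_MIN and a cut is forced at BLKSIZE_MAX.
 */
static size_t
chunk_len(const uint32_t tab[256], const unsigned char *buf, size_t len)
{
	uint32_t h = 0;
	size_t i;

	assert(tab != NULL && buf != NULL);
	if (len <= BLKSIZE_MIN)
		return len;
	if (len > BLKSIZE_MAX)
		len = BLKSIZE_MAX;

	/* Prime the hash with the window ending at the minimum size. */
	for (i = BLKSIZE_MIN - WINSIZE; i < BLKSIZE_MIN; i++)
		h = rotl32(h, 1) ^ tab[buf[i]];

	for (i = BLKSIZE_MIN; i < len; i++) {
		if ((h & HASHMASK) == 0)
			return i;
		/* Slide the window: drop the oldest byte, add buf[i]. */
		h = rotl32(h, 1) ^
		    rotl32(tab[buf[i - WINSIZE]], WINSIZE) ^
		    tab[buf[i]];
	}
	return len;
}
```

A caller would run table_init() once with the repository seed (or 0
when encryption is disabled) and then call chunk_len() repeatedly on
the remaining input. Because the seed changes every table entry, the
same input is likely to be cut at different offsets in repositories
with different seeds, which is the property the notes rely on to make
block lengths less useful as a fingerprint.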