Provide better comment for chunking decisions - dedup - deduplicating backup program

git clone git://bitreich.org/dedup/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65d7roiv6bfj7d652fid.onion/dedup/

---
commit 39f907c919176a8fed990e10e95a7f7371aedf40
parent 90f2ff9fcf896060da4be75c70cd148f18be28e9
Author: z3bra <contactatz3bradotorg>
Date:   Sun, 17 Feb 2019 13:04:03 +0100

Provide better comment for chunking decisions

Diffstat:
  M dedup.c | 21 ++++++++++++++-------

1 file changed, 14 insertions(+), 7 deletions(-)
---
diff --git a/dedup.c b/dedup.c
@@ -74,6 +74,11 @@ char *argv0;
 /*
  * Static table for use in buzhash algorithm.
  * 256 * 32 bits randomly generated unique integers
+ *
+ * To get better pseudo-random results, there is exactly the same number
+ * of 0 and 1 spread amongst these integers. It means that there is
+ * exactly 50% chance that a XOR operation would flip all the bits in
+ * the hash.
  */
 uint32_t buz[] = {
 	0xbc9fa594,0x30a8f827,0xced627a7,0xdb46a745,0xcfa4a9e8,0x77cccb59,0xddb66276,0x3adc532f,
@@ -115,9 +120,9 @@ uint32_t
 buzh_init(uint8_t *buf, size_t size)
 {
 	size_t i;
-	uint32_t fp = 0;
+	uint32_t fp;
 
-	for (i = size - 1; i > 0; i--, buf++)
+	for (i = size - 1, fp = 0; i > 0; i--, buf++)
 		fp ^= ROTL(buz[*buf], i % 32);
 
 	return fp ^ buz[*buf];
@@ -136,11 +141,13 @@ chunk_blk(uint8_t *buf, size_t size)
 	uint32_t fp;
 
 	/*
-	 * Chunking blocks is decided using a rolling hash + binary pattern.
-	 * The buzhash algorithm is used to "fingerprint" a fixed size window.
-	 * Once the lower bits of this fingerprint are all zeros,
-	 * the block is chunked.
-	 * If the pattern can't be matched, then we return the buffer size.
+	 * To achieve better deduplication, we chunk blocks based on a
+	 * recurring pattern occuring on the data stream. A fixed window
+	 * of WINSIZ bytes is slid over the data, and a rolling hash is
+	 * computed for this window.
+	 * When the rolling hash matches a given pattern (see HASHMSK),
+	 * the block is chunked at the end of that window, thus making
+	 * WINSIZ the smallest possible block size.
 	 */
 	fp = buzh_init(buf, WINSIZ);
 	for (i = 1; i < size - WINSIZ; i++) {
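
The chunk_blk comment in the diff describes the chunking rule only in prose, and the rolling update itself lies outside the hunks shown. The following is a minimal, self-contained sketch of that rule, not the code from dedup.c. The helper names (tbl_init, win_hash, win_roll, chunk_len), the values chosen for WINSIZ and HASHMSK, and the table contents are all assumptions made for illustration. In particular, the table is filled by a small xorshift generator and does not have the balanced 0/1 property described for buz[]. The cut condition (low bits of the fingerprint all zero) follows the wording of the removed comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINSIZ  16     /* hypothetical window size in bytes */
#define HASHMSK 0x3f   /* hypothetical mask: cut when the low 6 bits are zero */
#define ROTL(x, n) (((x) << ((n) & 31)) | ((x) >> ((32 - (n)) & 31)))

/* Placeholder table; NOT the balanced table shipped in dedup.c. */
static uint32_t tbl[256];

/* Fill the table with xorshift32 output (illustrative values only). */
static void
tbl_init(void)
{
	uint32_t x = 0x9e3779b9u;
	int i;

	for (i = 0; i < 256; i++) {
		x ^= x << 13;
		x ^= x >> 17;
		x ^= x << 5;
		tbl[i] = x;
	}
}

/* Fingerprint a whole window: the oldest byte ends up rotated the most. */
static uint32_t
win_hash(const uint8_t *buf, size_t size)
{
	uint32_t fp = 0;
	size_t i;

	for (i = 0; i < size; i++)
		fp = ROTL(fp, 1) ^ tbl[buf[i]];

	return fp;
}

/* Slide the window by one byte: drop `out`, append `in`. */
static uint32_t
win_roll(uint32_t fp, uint8_t out, uint8_t in)
{
	return ROTL(fp, 1) ^ ROTL(tbl[out], WINSIZ % 32) ^ tbl[in];
}

/*
 * Return the length of the first chunk in buf: the end of the first
 * window whose fingerprint matches the mask, or size if none matches.
 */
static size_t
chunk_len(const uint8_t *buf, size_t size)
{
	uint32_t fp;
	size_t i;

	assert(buf != NULL);
	if (size <= WINSIZ)
		return size;

	fp = win_hash(buf, WINSIZ);
	for (i = 0; ; i++) {
		if ((fp & HASHMSK) == 0)
			return i + WINSIZ;   /* cut at the end of this window */
		if (i + WINSIZ >= size)
			break;               /* no more bytes to roll in */
		fp = win_roll(fp, buf[i], buf[i + WINSIZ]);
	}

	return size;
}
```

To use the sketch, call tbl_init() once, then call chunk_len() repeatedly on the remaining data, advancing the buffer by the returned length each time. Because a cut can only happen at the end of a full window, no chunk is shorter than WINSIZ except the final one, which is what the new comment in the diff states.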