Increase dedup throughput by a factor of 2 - dedup - deduplicating backup program

git clone git://bitreich.org/dedup/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65d7roiv6bfj7d652fid.onion/dedup/
---
commit 7f8b5b3e7b72b0d64437a87c6b412743f2ab6187
parent ba61c65bb274657b7ea643de789db2a24ea836f8
Author: sin <sin@2f30.org>
Date:   Sun, 10 Mar 2019 09:36:05 +0000

Increase dedup throughput by a factor of 2

Calculating the hash of the entire snapshot inside the loop slows the
process down by 2x. This is because we hash the block twice. We hash
first the raw uncompressed stream (which will become the snapshot
hash) and then we hash the compressed block which is stored in the
block descriptor.

Change the calculation so we only hash the compressed block inside
dedup_chunk(). The hash of the snapshot is the hash of its block
hashes.
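
The scheme described in the commit message can be sketched roughly as
follows. This is a minimal illustration, not the code from dedup.c: it
swaps SHA-256 for a small FNV-1a stand-in so that it builds without a
crypto library, and the names hash_ctx, hash_block(), hash_snapshot()
and MD_SIZE are hypothetical. The struct blk_desc here is reduced to its
digest field. Each block's data is hashed once, and the snapshot digest
is then computed over the block digests only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MD_SIZE 8	/* stand-in digest size; SHA-256 would be 32 bytes */

/* Hypothetical incremental hash context (FNV-1a, 64-bit), NOT SHA-256. */
struct hash_ctx {
	uint64_t state;
};

static void
hash_init(struct hash_ctx *ctx)
{
	ctx->state = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
}

static void
hash_update(struct hash_ctx *ctx, const void *buf, size_t n)
{
	const uint8_t *p = buf;
	size_t i;

	for (i = 0; i < n; i++) {
		ctx->state ^= p[i];
		ctx->state *= 0x100000001b3ULL;	/* FNV-1a prime */
	}
}

static void
hash_final(uint8_t md[MD_SIZE], struct hash_ctx *ctx)
{
	int i;

	/* Serialize the 64-bit state as little-endian bytes. */
	for (i = 0; i < MD_SIZE; i++)
		md[i] = (ctx->state >> (8 * i)) & 0xff;
}

/* Simplified block descriptor: only the digest of the stored block. */
struct blk_desc {
	uint8_t md[MD_SIZE];
};

/* Hash one block's data exactly once and record it in the descriptor. */
static void
hash_block(struct blk_desc *bd, const void *buf, size_t n)
{
	struct hash_ctx ctx;

	hash_init(&ctx);
	hash_update(&ctx, buf, n);
	hash_final(bd->md, &ctx);
}

/* Snapshot digest: a hash over the block digests, in block order. */
static void
hash_snapshot(uint8_t md[MD_SIZE], const struct blk_desc *bds, size_t nr)
{
	struct hash_ctx ctx;
	size_t i;

	assert(nr > 0);
	hash_init(&ctx);
	for (i = 0; i < nr; i++)
		hash_update(&ctx, bds[i].md, sizeof(bds[i].md));
	hash_final(md, &ctx);
}
```

With this layout the second pass touches only MD_SIZE bytes per block
rather than the block data itself. One consequence, as far as the commit
message implies, is that the snapshot hash now depends on the stored
(compressed) blocks and their order rather than on the raw input stream.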

Diffstat:
  M dedup.c | 24 ++++++++++++++++++------

1 file changed, 18 insertions(+), 6 deletions(-)
---
diff --git a/dedup.c b/dedup.c
@@ -229,27 +229,39 @@ dedup(int fd, char *msg)
 {
 	struct snapshot *snap;
 	struct chunker *chunker;
-	SHA256_CTX ctx;
-	ssize_t n;
 
 	snap = alloc_snap();
 	chunker = alloc_chunker(fd, BLKSIZE_MIN, BLKSIZE_MAX,
 	                        HASHMASK_BITS, WINSIZE);
 
-	SHA256_Init(&ctx);
-	while ((n = fill_chunker(chunker)) > 0) {
+	while (fill_chunker(chunker) > 0) {
 		uint8_t *chunkp;
 		size_t chunk_size;
 
 		chunkp = get_chunk(chunker, &chunk_size);
-		SHA256_Update(&ctx, chunkp, chunk_size);
 		snap = grow_snap(snap, snap->nr_blk_descs + 1);
 		dedup_chunk(snap, chunkp, chunk_size);
 		drain_chunker(chunker);
 	}
-	SHA256_Final(snap->md, &ctx);
 
 	if (snap->nr_blk_descs > 0) {
+		SHA256_CTX ctx;
+		uint64_t i;
+
+		/*
+		 * The snapshot hash is calculated over the
+		 * hash of its block descriptors.
+		 */
+		SHA256_Init(&ctx);
+		for (i = 0; i < snap->nr_blk_descs; i++) {
+			struct blk_desc *blk_desc;
+
+			blk_desc = &snap->blk_desc[i];
+			SHA256_Update(&ctx, blk_desc->md,
+			              sizeof(blk_desc->md));
+		}
+		SHA256_Final(snap->md, &ctx);
+
 		if (msg != NULL) {
 			size_t size;
 