data.dedupestor #

DedupeStore

DedupeStore is a content-addressable key-value store with built-in deduplication. It uses blake2b-160 content hashing to identify and deduplicate data, making it ideal for storing files or data blocks where the same content might appear multiple times.

Features

  • Content-based deduplication using blake2b-160 hashing
  • Efficient storage using RadixTree for hash lookups
  • Persistent storage using OurDB
  • Maximum value size limit of 1MB
  • Fast retrieval of data using content hash
  • Automatic deduplication of identical content

Usage

import freeflowuniverse.herolib.data.dedupestor

fn main() ! {
    // Create a new dedupestore
    mut ds := dedupestor.new(
        path: 'path/to/store'
        reset: false // Set to true to reset existing data
    )!

    // Store some data
    data := 'Hello, World!'.bytes()
    hash := ds.store(data)!
    println('Stored data with hash: ${hash}')

    // Retrieve data using hash
    retrieved := ds.get(hash)!
    println('Retrieved data: ${retrieved.bytestr()}')

    // Check if data exists
    exists := ds.exists(hash)
    println('Data exists: ${exists}')

    // Attempting to store the same data again returns the same hash
    same_hash := ds.store(data)!
    assert hash == same_hash // True, data was deduplicated
}

Implementation Details

DedupeStore uses two main components for storage:

  1. RadixTree: Stores mappings from content hashes to data location IDs
  2. OurDB: Stores the actual data blocks

When storing data:

  1. The data is hashed using blake2b-160
  2. If the hash exists in the RadixTree, the data is not stored again and the existing hash is returned
  3. If the hash is new:
       • Data is stored in OurDB, getting a new location ID
       • Hash -> ID mapping is stored in RadixTree
       • The hash is returned

When retrieving data:

  1. The RadixTree is queried with the hash to get the data location ID
  2. The data is retrieved from OurDB using the ID
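
The store and retrieve flows described here can be sketched with in-memory stand-ins. This is a minimal illustration, not the module's implementation: SketchStore is hypothetical, its index map plays the RadixTree's role and its blocks array plays OurDB's role. Hex-encoding the digest is also an assumption. Only blake2b.sum160 from V's crypto.blake2b module is a real library call.

```v
import crypto.blake2b

// SketchStore is a hypothetical in-memory stand-in, not the real DedupeStore.
struct SketchStore {
mut:
	index  map[string]int // hash -> location ID (the RadixTree's role)
	blocks [][]u8         // location ID -> data (OurDB's role)
}

// store hashes the value and writes it only if the hash is new.
fn (mut s SketchStore) store(value []u8) string {
	// Assumption: the hash string is the hex encoding of the 20-byte digest.
	hash := blake2b.sum160(value).hex()
	if hash in s.index {
		return hash // duplicate content: nothing is written
	}
	s.blocks << value.clone()
	s.index[hash] = s.blocks.len - 1
	return hash
}

// get resolves hash -> location ID -> data.
fn (s SketchStore) get(hash string) ![]u8 {
	id := s.index[hash] or { return error('no data for hash ${hash}') }
	return s.blocks[id]
}

fn main() {
	mut s := SketchStore{}
	h1 := s.store('Hello, World!'.bytes())
	h2 := s.store('Hello, World!'.bytes())
	assert h1 == h2 // same content, same hash
	assert s.blocks.len == 1 // the bytes were stored only once
	assert h1.len == 40 // 160 bits = 20 bytes = 40 hex characters
	data := s.get(h1) or { panic(err) }
	println(data.bytestr())
}
```

Because the key is derived from the content, a caller never chooses keys: two writers storing the same bytes arrive at the same hash independently, which is what makes the duplicate check a single index lookup.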

Size Limits

  • Maximum value size: 1MB
  • Attempting to store larger values will result in an error
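
A size check of this kind can be sketched as follows. check_size is a hypothetical helper, not part of the module's API. The exact boundary (whether a value of exactly 1MB is accepted) and the error text are assumptions; only the constant's value comes from the module.

```v
const max_value_size = 1024 * 1024 // 1MB, same value as the module's constant

// check_size is a hypothetical helper illustrating the limit.
fn check_size(value []u8) ! {
	// Assumption: exactly 1MB is still accepted, anything larger is rejected.
	if value.len > max_value_size {
		return error('value size ${value.len} exceeds the limit of ${max_value_size} bytes')
	}
}

fn main() {
	// A value at the limit passes.
	check_size([]u8{len: max_value_size}) or { panic(err) }
	// One byte over the limit is rejected.
	check_size([]u8{len: max_value_size + 1}) or {
		println('rejected: ${err}')
		return
	}
}
```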

Error Handling

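The store and get methods return V result types (!string and ![]u8), so failures such as an oversized value or an unknown hash should be handled explicitly, for example with an if guard: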

// Handle potential errors
if hash := ds.store(large_data) {
    // Success
    println('Stored with hash: ${hash}')
} else {
    // Error occurred
    println('Error: ${err}')
}

Testing

The module includes comprehensive tests covering:

  • Basic store/retrieve operations
  • Deduplication functionality
  • Size limit enforcement
  • Edge cases

Run tests with:

v test lib/data/dedupestor/

Constants #

const max_value_size = 1024 * 1024 // 1MB

fn new #

fn new(args NewArgs) !&DedupeStore

new creates a new deduplication store

struct DedupeStore #

struct DedupeStore {
mut:
	radix &radixtree.RadixTree // For storing hash -> id mappings
	data  &ourdb.OurDB         // For storing the actual data
}

DedupeStore provides a key-value store with deduplication based on content hashing

fn (DedupeStore) store #

fn (mut ds DedupeStore) store(value []u8) !string

store stores a value and returns its hash. If the value already exists (same hash), it returns the existing hash without storing the data again.

fn (DedupeStore) get #

fn (mut ds DedupeStore) get(hash string) ![]u8

get retrieves a value by its hash

fn (DedupeStore) exists #

fn (mut ds DedupeStore) exists(hash string) bool

exists checks if a value with the given hash exists

struct NewArgs #

@[params]
struct NewArgs {
pub mut:
	path  string // Base path for the store
	reset bool   // Whether to reset existing data
}