data.dedupestor
DedupeStore
DedupeStore is a content-addressable key-value store with built-in deduplication. It uses blake2b-160 content hashing to identify and deduplicate data, making it ideal for storing files or data blocks where the same content might appear multiple times.
Features
- Content-based deduplication using blake2b-160 hashing
- Fast hash-to-location lookups using a RadixTree index
- Persistent storage using OurDB
- Maximum value size of 1MB (1,048,576 bytes)
- Fast retrieval of data using content hash
- Automatic deduplication of identical content
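Here is a minimal sketch of how a content key can be derived. The snippet hashes a byte slice with blake2b-160 from V's standard `crypto.blake2b` module and hex-encodes the 20-byte digest. The helper name `content_key` and the hex encoding are assumptions for illustration, and DedupeStore's exact key format may differ.

```v
import crypto.blake2b

// content_key is a hypothetical helper: it returns the blake2b-160 digest
// of `data` (20 bytes), hex-encoded to a 40-character string.
fn content_key(data []u8) string {
	return blake2b.sum160(data).hex()
}

println(content_key('Hello, World!'.bytes()))
```

Identical input always produces the identical key. That property is what lets a content-addressed store recognize a duplicate without comparing the stored bytes themselves.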
Usage
```v
import freeflowuniverse.herolib.data.dedupestor

fn main() ! {
	// Create a new dedupestore
	mut ds := dedupestor.new(
		path:  'path/to/store'
		reset: false // Set to true to reset existing data
	)!

	// Store some data
	data := 'Hello, World!'.bytes()
	hash := ds.store(data)!
	println('Stored data with hash: ${hash}')

	// Retrieve data using its hash
	retrieved := ds.get(hash)!
	println('Retrieved data: ${retrieved.bytestr()}')

	// Check whether data exists
	exists := ds.exists(hash)
	println('Data exists: ${exists}')

	// Storing the same data again returns the same hash
	same_hash := ds.store(data)!
	assert hash == same_hash // holds because the data was deduplicated
}
```
Implementation Details
DedupeStore uses two main components for storage:
- RadixTree: Stores mappings from content hashes to data location IDs
- OurDB: Stores the actual data blocks
When storing data:
1. The data is hashed using blake2b-160.
2. If the hash already exists in the RadixTree, that hash is returned and the data is not stored again.
3. If the hash is new:
   - The data is stored in OurDB, which assigns a new location ID.
   - The hash -> ID mapping is stored in the RadixTree.
   - The hash is returned.
When retrieving data:
1. The RadixTree is queried with the hash to get the data location ID.
2. The data is retrieved from OurDB using that ID.
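The store and retrieve steps can be sketched in memory. In the hypothetical `MemDedupe` type below, a `map[string]u32` stands in for the RadixTree and a `[][]u8` array stands in for OurDB, with the array index serving as the location ID. This illustrates the flow under those assumptions. It is not the module's implementation, and it leaves out the 1MB size check.

```v
import crypto.blake2b

// MemDedupe is a hypothetical in-memory stand-in for DedupeStore:
// `index` plays the RadixTree (hash -> id), `blocks` plays OurDB (id -> data).
struct MemDedupe {
mut:
	index  map[string]u32
	blocks [][]u8
}

// store hashes the value, returns early for known content, and otherwise
// appends the data and records the hash -> id mapping.
fn (mut ds MemDedupe) store(value []u8) !string {
	// 1. Hash the data with blake2b-160.
	hash := blake2b.sum160(value).hex()
	// 2. Known hash: return it without storing the data again.
	if hash in ds.index {
		return hash
	}
	// 3. New hash: store the data under a new id, then map hash -> id.
	id := u32(ds.blocks.len)
	ds.blocks << value.clone()
	ds.index[hash] = id
	return hash
}

// get looks up the id for a hash, then fetches the block stored under it.
fn (mut ds MemDedupe) get(hash string) ![]u8 {
	id := ds.index[hash] or { return error('hash not found: ${hash}') }
	return ds.blocks[int(id)].clone()
}

// exists reports whether a hash has been stored.
fn (mut ds MemDedupe) exists(hash string) bool {
	return hash in ds.index
}

mut demo := MemDedupe{}
key := demo.store('Hello, World!'.bytes()) or { panic(err) }
println('stored under ${key}')
```

A duplicate `store` returns before anything is appended. As a result, `blocks` grows only when new content arrives.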
Size Limits
- Maximum value size: 1MB (max_value_size = 1024 * 1024 bytes)
- store returns an error for any value larger than this limit
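Below is a sketch of the size check. It assumes that a value of exactly `max_value_size` bytes is still accepted, because the limit is described as a maximum. `check_size` is a hypothetical helper and is not part of the module's public API.

```v
const max_value_size = 1024 * 1024 // 1MB, mirroring the module constant

// check_size is hypothetical: it accepts values up to max_value_size bytes
// and returns an error for anything larger.
fn check_size(value []u8) ! {
	if value.len > max_value_size {
		return error('value is ${value.len} bytes; limit is ${max_value_size}')
	}
}

check_size([]u8{len: max_value_size}) or { panic(err) }
check_size([]u8{len: max_value_size + 1}) or { println('rejected: ${err}') }
```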
Error Handling
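The store and get methods return V result types (!string and ![]u8), so callers must handle a possible error. One way is an if guard:
```v
// ds is a DedupeStore created with dedupestor.new(), as in the usage example
large_data := []u8{len: 2 * 1024 * 1024} // 2MB, larger than the 1MB limit
if hash := ds.store(large_data) {
	// Success
	println('Stored with hash: ${hash}')
} else {
	// Error occurred, e.g. the value exceeds max_value_size
	println('Error: ${err}')
}
```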
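Besides an if guard, V lets a caller propagate the error with `!` or handle it inline with an `or` block. The sketch below uses a hypothetical `store_value` function that fails like `store` does for oversized input and returns a placeholder string instead of a real hash. Only the error-handling patterns are meant to carry over.

```v
// store_value is a hypothetical stand-in for DedupeStore.store: it rejects
// values over 1MB and returns a placeholder instead of a real hash.
fn store_value(value []u8) !string {
	if value.len > 1024 * 1024 {
		return error('value exceeds the 1MB limit')
	}
	return 'placeholder-hash'
}

// store_all propagates the first error to its caller with `!`.
fn store_all(values [][]u8) ![]string {
	mut hashes := []string{}
	for value in values {
		hashes << store_value(value)!
	}
	return hashes
}

// An `or` block handles the error locally and supplies a fallback value.
big := []u8{len: 1024 * 1024 + 1}
hash := store_value(big) or {
	println('Error: ${err}')
	''
}
println('hash: "${hash}"')
```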
Testing
The module includes tests covering:
- Basic store/retrieve operations
- Deduplication functionality
- Size limit enforcement
- Edge cases
Run tests with:
```shell
v test lib/data/dedupestor/
```
Constants
const max_value_size = 1024 * 1024 // 1MB
fn new
fn new(args NewArgs) !&DedupeStore
new creates a new deduplication store
struct DedupeStore
```v
struct DedupeStore {
mut:
	radix &radixtree.RadixTree // For storing hash -> id mappings
	data  &ourdb.OurDB         // For storing the actual data
}
```
DedupeStore provides a key-value store with deduplication based on content hashing
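The RadixTree maps each hash to an OurDB location ID. Neither the value type of the tree nor the width of the ID is stated here. If the tree stores byte arrays and IDs are `u32`, which this sketch assumes, then an ID must be serialized before insertion and decoded on lookup. The hypothetical helpers below use a fixed 4-byte little-endian layout.

```v
// id_to_bytes is hypothetical: it encodes a u32 id as 4 little-endian bytes.
fn id_to_bytes(id u32) []u8 {
	return [u8(id), u8(id >> 8), u8(id >> 16), u8(id >> 24)]
}

// bytes_to_id is hypothetical: it decodes 4 little-endian bytes back to a u32.
// It expects at least 4 bytes and panics on shorter input.
fn bytes_to_id(b []u8) u32 {
	return u32(b[0]) | (u32(b[1]) << 8) | (u32(b[2]) << 16) | (u32(b[3]) << 24)
}

println(bytes_to_id(id_to_bytes(258))) // prints 258
```

A fixed-width encoding keeps every stored value the same length. It also makes the round trip independent of how large the ID gets.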
fn (DedupeStore) store
fn (mut ds DedupeStore) store(value []u8) !string
store stores a value and returns its hash. If the value already exists (same hash), it returns the existing hash without storing the data again. Values larger than max_value_size are rejected with an error.
fn (DedupeStore) get
fn (mut ds DedupeStore) get(hash string) ![]u8
get retrieves a value by its hash
fn (DedupeStore) exists
fn (mut ds DedupeStore) exists(hash string) bool
exists checks if a value with the given hash exists
struct NewArgs
```v
struct NewArgs {
pub mut:
	path  string // Base path for the store
	reset bool   // Whether to reset existing data
}
```