data.tst #
Ternary Search Tree (TST) Implementation Plan
STILL BROKEN, CAN'T FIND YET
Overview
A Ternary Search Tree (TST) is a specialized tree structure where each node has three children:- Left: For characters less than the current node's character
- Middle: For characters equal to the current node's character (leads to the next character in the sequence)
- Right: For characters greater than the current node's character
In this implementation, we'll focus on the core TST functionality without any balancing mechanisms. Nodes will be inserted in their natural position based on character comparisons.
Data Structures
classDiagram
class Node {
+char character
+bool is_end_of_string
+[]u8 value
+u32 left_id
+u32 middle_id
+u32 right_id
}
class TST {
+&ourdb.OurDB db
+u32 root_id
+set(key string, value []u8)
+get(key string) []u8
+delete(key string)
+list(prefix string) []string
+getall(prefix string) [][]u8
}
TST --> Node : contains
Core Components
- Node Structure:
character
: The character stored at this nodeis_end_of_string
: Flag indicating if this node represents the end of a keyvalue
: The value associated with the key (if this node is the end of a key)left_id
,middle_id
,right_id
: Database IDs for the three children
- TST Structure:
db
: Reference to the ourdb database for persistenceroot_id
: Database ID of the root node
- Serialization/Deserialization:
- Use the same encoder/decoder as the radix tree
- Serialize node properties efficiently
Key Operations
1. Insertion
flowchart TD
A[Start Insertion] --> B{Is Tree Empty?}
B -->|Yes| C[Create Root Node]
B -->|No| D[Start at Root]
D --> E{Current Char vs Node Char}
E -->|Less Than| F{Left Child Exists?}
F -->|Yes| G[Go to Left Child]
F -->|No| H[Create Left Child]
E -->|Equal To| I{End of Key?}
I -->|Yes| J[Set End of String Flag]
I -->|No| K{Middle Child Exists?}
K -->|Yes| L[Go to Middle Child]
K -->|No| M[Create Middle Child]
E -->|Greater Than| N{Right Child Exists?}
N -->|Yes| O[Go to Right Child]
N -->|No| P[Create Right Child]
G --> E
L --> Q[Move to Next Char]
Q --> E
O --> E
H --> R[Insertion Complete]
J --> R
M --> Q
P --> R
C --> R
The insertion algorithm will:1. Navigate the tree based on character comparisons2. Create new nodes as needed3. Set the end-of-string flag when the entire key is inserted
2. Search
flowchart TD
A[Start Search] --> B{Is Tree Empty?}
B -->|Yes| C[Return Not Found]
B -->|No| D[Start at Root]
D --> E{Current Char vs Node Char}
E -->|Less Than| F{Left Child Exists?}
F -->|Yes| G[Go to Left Child]
F -->|No| C
E -->|Equal To| H{End of Key?}
H -->|Yes| I{Is End of String?}
I -->|Yes| J[Return Value]
I -->|No| C
H -->|No| K{Middle Child Exists?}
K -->|Yes| L[Go to Middle Child]
K -->|No| C
E -->|Greater Than| M{Right Child Exists?}
M -->|Yes| N[Go to Right Child]
M -->|No| C
G --> E
L --> O[Move to Next Char]
O --> E
N --> E
3. Deletion
flowchart TD
A[Start Deletion] --> B{Is Tree Empty?}
B -->|Yes| C[Return Not Found]
B -->|No| D[Find Node]
D --> E{Node Found?}
E -->|No| C
E -->|Yes| F{Is End of String?}
F -->|No| C
F -->|Yes| G{Has Children?}
G -->|Yes| H[Mark Not End of String]
G -->|No| I[Remove Node]
H --> J[Deletion Complete]
I --> J
For deletion, we'll use a simple approach:1. If the node has children, just mark it as not end-of-string2. If the node has no children, remove it from its parent3. No rebalancing is performed
4. Prefix Search
flowchart TD
A[Start Prefix Search] --> B{Is Tree Empty?}
B -->|Yes| C[Return Empty List]
B -->|No| D[Navigate to Prefix Node]
D --> E{Prefix Found?}
E -->|No| C
E -->|Yes| F[Collect All Keys from Subtree]
F --> G[Return Collected Keys]
The prefix search will:1. Navigate to the node representing the end of the prefix2. Collect all keys in the subtree rooted at that node3. Return the collected keys
5. Serialization/Deserialization
flowchart TD
A[Serialize Node] --> B[Add Version Byte]
B --> C[Add Character]
C --> D[Add is_end_of_string Flag]
D --> E[Add Value if End of String]
E --> F[Add Child IDs]
F --> G[Return Serialized Data]
H[Deserialize Node] --> I[Read Version Byte]
I --> J[Read Character]
J --> K[Read is_end_of_string Flag]
K --> L[Read Value if End of String]
L --> M[Read Child IDs]
M --> N[Return Node Object]
Implementation Steps
- Create Basic Structure (tst.v):
- Define the Node and TST structures
- Implement constructor function
- Implement Serialization (serialize.v):
- Create serialize_node and deserialize_node functions
- Ensure compatibility with the encoder
- Implement Core Operations:
- set (insert)
- get (search)
- delete
- list (prefix search)
- getall (get all values with prefix)
- Create Test Files:
- Basic functionality tests
- Prefix search tests
- Serialization tests
- Performance tests for various dataset sizes
- Add Documentation:
- Create README.md with usage examples
- Add inline documentation
File Structure
tst/
├── tst.v # Main implementation
├── serialize.v # Serialization functions
├── tst_test.v # Basic tests
├── prefix_test.v # Prefix search tests
├── serialize_test.v # Serialization tests
└── README.md # Documentation
Performance Considerations
- Unbalanced Tree Characteristics:
- Performance may degrade over time as the tree becomes unbalanced
- Worst-case time complexity for operations could approach O(n) instead of O(log n)
- Best for datasets where keys are inserted in a somewhat random order
- Optimizations:
- Efficient node serialization to minimize storage requirements
- Careful memory management during traversal operations
- Optimized string handling for prefix operations
- Database Interaction:
- Minimize database reads/writes
- Only create new nodes when necessary
- Large Dataset Handling:
- Efficient prefix search algorithm
- Optimize node structure for millions of entries
fn namefix #
fn namefix(s string) string
namefix normalizes a string for consistent key handling- removes leading/trailing whitespace
- converts to lowercase
- replaces special characters with standard ones
fn new #
fn new(args NewArgs) !TST
Creates a new ternary search tree with the specified database path
struct NewArgs #
struct NewArgs {
pub mut:
path string
reset bool
}
struct TST #
struct TST {
mut:
db &ourdb.OurDB // Database for persistent storage
root_id u32 // Database ID of the root node
}
TST represents a ternary search tree data structure
fn (TST) delete #
fn (mut self TST) delete(key string) !
Deletes a key from the tree
fn (TST) get #
fn (mut self TST) get(key string) ![]u8
Gets a value by key from the tree
fn (TST) getall #
fn (mut self TST) getall(prefix string) ![][]u8
Gets all values for keys with a given prefix
fn (TST) list #
fn (mut self TST) list(prefix string) ![]string
Lists all keys with a given prefix
fn (TST) set #
fn (mut self TST) set(key string, value []u8) !
Sets a key-value pair in the tree