core.texttools #
TextTools Module
The TextTools module provides a comprehensive set of utilities for text manipulation and processing in V. It includes functions for cleaning, parsing, formatting, and transforming text in various ways.
Features
Array Operations
to_array(r string) []string
- Converts a comma or newline separated list to an array of strings
to_array_int(r string) []int
- Converts a text list to an array of integers
to_map(mapstring string, line string, delimiter_ string) map[string]string
- Intelligent mapping of a line to a map based on a template
Text Cleaning
name_clean(r string) string
- Normalizes names by removing special characters
ascii_clean(r string) string
- Removes all non-ASCII characters
remove_empty_lines(text string) string
- Removes empty lines from text
remove_double_lines(text string) string
- Removes consecutive empty lines
remove_empty_js_blocks(text string) string
- Removes empty code blocks (...)
Command Line Parsing
cmd_line_args_parser(text string) ![]string
- Parses command line arguments with support for quotes and escaping
text_remove_quotes(text string) string
- Removes quoted sections from text
check_exists_outside_quotes(text string, items []string) bool
- Checks if items exist in text outside of quotes
Text Expansion
expand(txt_ string, l int, expand_with string) string
- Expands text to a specified length with a given character
Indentation
indent(text string, prefix string) string
- Adds indentation prefix to each line
dedent(text string) string
- Removes common leading whitespace from every line
String Validation
is_int(text string) bool
- Checks if text contains only digits
is_upper_text(text string) bool
- Checks if text contains only uppercase letters
Multiline Processing
multiline_to_single(text string) !string
- Converts multiline text to a single line with proper escaping
- Handles comments, code blocks, and preserves formatting
Name/Path Processing
name_fix(name string) string
- Normalizes filenames and paths
name_fix_keepspace(name string) !string
- Like name_fix but preserves spaces
name_fix_no_ext(name_ string) string
- Removes file extension
name_fix_snake_to_pascal(name string) string
- Converts snake_case to PascalCase
name_fix_pascal_to_snake(name string) string
- Converts PascalCase to snake_case
name_split(name string) !(string, string)
- Splits name into site and page components
Text Splitting
split_smart(t string, delimiter_ string) []string
- Intelligent string splitting that respects quotes
Tokenization
tokenize(text_ string) TokenizerResult
- Tokenizes text into meaningful parts
text_token_replace(text string, tofind string, replacewith string) !string
- Replaces tokens in text
Version Parsing
version(text_ string) int
- Converts version strings to comparable integers
- Example: "v0.4.36" becomes 4036
- Example: "v1.4.36" becomes 1004036
Usage Examples
Array Operations
// Convert comma-separated list to array
text := 'item1,item2,item3'
array := texttools.to_array(text)
// Result: ['item1', 'item2', 'item3']
// Smart mapping
r := texttools.to_map('name,-,-,-,-,pid,-,-,-,-,path',
'root 304 0.0 0.0 408185328 1360 ?? S 16Dec23 0:34.06 /usr/sbin/distnoted')
// Result: {'name': 'root', 'pid': '1360', 'path': '/usr/sbin/distnoted'}
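The template mapping above can be pictured as pairing each comma-separated template entry with the whitespace-separated field at the same position, skipping entries marked `-`. The following is a minimal sketch under that assumption; `to_map_sketch` is a hypothetical helper written for illustration and is not the module's `to_map` (it also ignores the delimiter parameter).

```v
// to_map_sketch is a hypothetical helper, not the library's to_map.
// Assumptions: fields are separated by whitespace and '-' means "skip this field".
fn to_map_sketch(template string, line string) map[string]string {
	keys := template.split(',')
	values := line.fields()
	mut result := map[string]string{}
	for i, key in keys {
		k := key.trim_space()
		if k == '-' || k == '' {
			continue
		}
		// A key without a matching field maps to an empty string.
		mut value := ''
		if i < values.len {
			value = values[i]
		}
		result[k] = value
	}
	return result
}

mapped := to_map_sketch('name,-,-,-,-,pid,-,-,-,-,path', 'root 304 0.0 0.0 408185328 1360 ?? S 16Dec23 0:34.06 /usr/sbin/distnoted')
println(mapped['name']) // root
println(mapped['pid']) // 1360
println(mapped['path']) // /usr/sbin/distnoted
```

When the line has fewer fields than the template, the sketch leaves the trailing keys empty, which matches the shape of the `to_map` assertion shown in the function reference.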
Text Cleaning
// Clean name
name := texttools.name_clean('Hello@World!')
// Result: "HelloWorld"
// Remove empty lines
text := texttools.remove_empty_lines('line1\n\nline2\n\n\nline3')
// Result: "line1\nline2\nline3"
Command Line Parsing
// Parse command line with quotes
args := texttools.cmd_line_args_parser("'arg with spaces' --flag=value")!
// Result: ['arg with spaces', '--flag=value']
Indentation
// Add indentation
indented := texttools.indent('line1\nline2', ' ')
// Result: " line1\n line2\n"
// Remove common indentation
dedented := texttools.dedent(' line1\n line2')
// Result: "line1\nline2"
Name Processing
// Convert to snake case
snake := texttools.name_fix_pascal_to_snake('HelloWorld')
// Result: "hello_world"
// Convert to pascal case
pascal := texttools.name_fix_snake_to_pascal('hello_world')
// Result: "HelloWorld"
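To make the two conversions concrete, here is a minimal sketch of how they could work on plain ASCII names. The functions `snake_to_pascal_sketch` and `pascal_to_snake_sketch` are hypothetical stand-ins written for this example; the module's own functions may treat digits, acronyms, or other characters differently.

```v
// Hypothetical stand-ins for illustration only; ASCII names are assumed.

// snake_to_pascal_sketch capitalizes each underscore-separated part and joins them.
fn snake_to_pascal_sketch(name string) string {
	mut out := ''
	for part in name.split('_') {
		if part == '' {
			continue
		}
		out += part.capitalize()
	}
	return out
}

// pascal_to_snake_sketch lowercases each capital letter and puts '_' before it,
// except at the start of the name.
fn pascal_to_snake_sketch(name string) string {
	mut out := ''
	for i, ch in name {
		if ch.is_capital() {
			if i > 0 {
				out += '_'
			}
			out += ch.ascii_str().to_lower()
		} else {
			out += ch.ascii_str()
		}
	}
	return out
}

println(snake_to_pascal_sketch('hello_world')) // HelloWorld
println(pascal_to_snake_sketch('HelloWorld')) // hello_world
```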
Version Parsing
// Parse version string
older := texttools.version('v0.4.36')
// Result: 4036
newer := texttools.version('v1.4.36')
// Result: 1004036
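The two results are consistent with giving each version component three decimal digits: 0 * 1000000 + 4 * 1000 + 36 = 4036 and 1 * 1000000 + 4 * 1000 + 36 = 1004036. The sketch below reproduces that arithmetic. `version_sketch` is a hypothetical function written for illustration, and it assumes a well-formed `vMAJOR.MINOR.PATCH` string with each component below 1000; the module's `version` may handle other inputs differently.

```v
// version_sketch is a hypothetical stand-in, not the library's version().
// Assumption: input looks like 'v1.4.36' and every component is below 1000.
fn version_sketch(text string) int {
	parts := text.trim_space().trim_left('v').split('.')
	mut result := 0
	for part in parts {
		// Shift the previous components by three decimal digits, then add this one.
		result = result * 1000 + part.int()
	}
	return result
}

println(version_sketch('v0.4.36')) // 4036
println(version_sketch('v1.4.36')) // 1004036
```

Because the result is a plain integer, two versions can be ordered with an ordinary `<` comparison.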
Error Handling
Many functions in the module return a Result type (indicated by ! in the function signature). These functions can return errors that should be handled appropriately:
// Example of error handling
name := texttools.name_fix_keepspace('some@name') or {
println('Error: ${err}')
return
}
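Besides handling the error on the spot with an `or` block, a caller that itself returns a Result can pass the error upwards with `!`, or supply a fallback value. The sketch below shows all three patterns. It uses a hypothetical function `keep_spaces_sketch` so that it runs without the module; the behavior of that function is an assumption made for the example and is not the behavior of `name_fix_keepspace`.

```v
// keep_spaces_sketch is hypothetical; it only exists to demonstrate Result handling.
fn keep_spaces_sketch(name string) !string {
	if name.trim_space() == '' {
		return error('name is empty')
	}
	return name.replace('_', ' ')
}

// title_sketch propagates any error from keep_spaces_sketch to its own caller with `!`.
fn title_sketch(name string) !string {
	clean := keep_spaces_sketch(name)!
	return clean.to_upper()
}

// Pattern 1: fall back to a default value.
fallback := keep_spaces_sketch('') or { 'unnamed' }
println(fallback) // unnamed

// Pattern 2: stop the program if the call fails.
fixed := keep_spaces_sketch('my_file') or { panic(err) }
println(fixed) // my file

// Pattern 3: the error raised deep inside surfaces at the outer call.
title := title_sketch('') or { 'error: ${err}' }
println(title) // error: name is empty
```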
Best Practices
- Always use appropriate error handling for functions that return Results
- Consider using dedent() before processing multiline text to ensure consistent formatting
- When working with filenames, use the appropriate name_fix variant based on your needs
- For command line parsing, be aware of quote handling and escaping rules
- When using tokenization, consider the context and whether smart splitting is needed
Contributing
The TextTools module is part of the CrystalLib project. Contributions are welcome through pull requests.
fn action_multiline_fix #
fn action_multiline_fix(content string) string
fn ascii_clean #
fn ascii_clean(r string) string
remove all chars which are not ascii
fn check_exists_outside_quotes #
fn check_exists_outside_quotes(text string, items []string) bool
Tests whether an element of the array exists in the text, ignoring anything inside quotes.
fn cmd_line_args_parser #
fn cmd_line_args_parser(text string) ![]string
Converts a text string to arguments. \n is supported but will be \n, and is only supported within '' or "". ' is not modified; the same goes for ".
fn dedent #
fn dedent(text string) string
remove all leading spaces at same level
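One common way to remove leading whitespace "at the same level" is to find the smallest indentation among the non-blank lines and strip that many characters from every line. The sketch below takes that approach. `dedent_sketch` is a hypothetical function for illustration, and it assumes indentation is made of spaces or tabs counted one character each; the module's `dedent` may differ in details such as tab handling or trailing newlines.

```v
// dedent_sketch is a hypothetical stand-in, not the library's dedent().
// Assumption: each leading space or tab counts as one indentation character.
fn dedent_sketch(text string) string {
	lines := text.split_into_lines()
	// Find the smallest indentation among non-blank lines.
	mut min_indent := -1
	for line in lines {
		if line.trim_space() == '' {
			continue
		}
		indent := line.len - line.trim_left(' \t').len
		if min_indent == -1 || indent < min_indent {
			min_indent = indent
		}
	}
	if min_indent <= 0 {
		return text
	}
	// Strip that many characters from every line; shorter lines are blank anyway.
	mut out := []string{}
	for line in lines {
		if line.len >= min_indent {
			out << line[min_indent..]
		} else {
			out << ''
		}
	}
	return out.join('\n')
}

println(dedent_sketch('    line1\n      line2'))
// line1
//   line2
```

Relative indentation between lines is preserved, so nested blocks keep their shape after the common prefix is removed.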
fn email_fix #
fn email_fix(name string) !string
fn expand #
fn expand(txt_ string, l int, expand_with string) string
texttools.expand('|', 20, ' ')
fn indent #
fn indent(text string, prefix string) string
fn is_int #
fn is_int(text string) bool
fn is_upper_text #
fn is_upper_text(text string) bool
fn multiline_to_single #
fn multiline_to_single(text string) !string
Converts multiline text to a single line, keeping all relevant information. Empty lines are removed (unless inside a parameter). Commented lines (starting with // or #) are removed as well. A multiline value becomes 'line1\nline2\n' and is dedented before being put in '...'. Tabs are replaced with 4 spaces.
fn name_clean #
fn name_clean(r string) string
fn name_fix #
fn name_fix(name string) string
fn name_fix_dot_notation_to_pascal #
fn name_fix_dot_notation_to_pascal(name string) string
fn name_fix_dot_notation_to_snake_case #
fn name_fix_dot_notation_to_snake_case(name string) string
fn name_fix_keepext #
fn name_fix_keepext(name_ string) string
fn name_fix_keepspace #
fn name_fix_keepspace(name string) !string
like name_fix but _ becomes space
fn name_fix_list #
fn name_fix_list(name string) []string
fn name_fix_no_ext #
fn name_fix_no_ext(name_ string) string
remove underscores and extension
fn name_fix_no_md #
fn name_fix_no_md(name string) string
Returns the name, keeping extensions and underscores, except that a trailing .md extension is removed.
fn name_fix_no_underscore #
fn name_fix_no_underscore(name string) string
fn name_fix_no_underscore_no_ext #
fn name_fix_no_underscore_no_ext(name_ string) string
remove underscores and extension
fn name_fix_no_underscore_token #
fn name_fix_no_underscore_token(name string) string
fn name_fix_pascal #
fn name_fix_pascal(name string) string
fn name_fix_pascal_to_snake #
fn name_fix_pascal_to_snake(name string) string
fn name_fix_snake_to_pascal #
fn name_fix_snake_to_pascal(name string) string
fn name_fix_token #
fn name_fix_token(name string) string
fn name_split #
fn name_split(name string) !(string, string)
Returns (sitename, pagename). sitename is an empty string if not specified with site:... or site__...
fn remove_double_lines #
fn remove_double_lines(text string) string
fn remove_empty_js_blocks #
fn remove_empty_js_blocks(text string) string
Removes empty code blocks, which can span multiple lines. Also removes double lines.
fn remove_empty_lines #
fn remove_empty_lines(text string) string
fn split_smart #
fn split_smart(t string, delimiter_ string) []string
Splits strings intelligently, taking the quote characters ' " ` into account.
r0:=texttools.split_smart("'root' 304 0.0 0.0 408185328 1360 ?? S 16Dec23 0:34.06 /usr/sbin/distnoted\n \n")
assert ['root', '304', '0.0', '0.0', '408185328', '1360', '??', 'S', '16Dec23', '0:34.06', '/usr/sbin/distnoted']==r0
fn tel_fix #
fn tel_fix(name_ string) !string
Fixes a string that represents a telephone number.
fn template_replace #
fn template_replace(template_ string) string
Replaces '^^' with '@', '??' with '$', and '\t' with ' '.
fn text_remove_quotes #
fn text_remove_quotes(text string) string
Removes all '..' and "..." sections from a text, including everything between the quotes.
fn text_token_replace #
fn text_token_replace(text string, tofind string, replacewith string) !string
fn to_array #
fn to_array(r string) []string
Converts a comma or \n separated list to a list of strings. Items quoted as '..' are returned without the quotes. See also split_smart, which is more intelligent.
fn to_array_int #
fn to_array_int(r string) []int
fn to_list_map #
fn to_list_map(mapstring string, txt_ string, delimiter_ string) []map[string]string
r4:=texttools.to_list_map("name,-,-,-,-,pid,-,-,-,-,path",t)
assert [{'name': '_cmiodalassistants', 'pid': '1360', 'path': '/usr/sbin/distnoted'}, {'name': '_locationd', 'pid': '1344', 'path': '/usr/sbin/distnoted'}, {'name': 'root', 'pid': '7296', 'path': '/usr/libexec/storagekitd'}, {'name': '_coreaudiod', 'pid': '1344', 'path': '/usr/sbin/distnoted'}] == r4
fn to_map #
fn to_map(mapstring string, line string, delimiter_ string) map[string]string
r3:=texttools.to_map("name,-,-,-,-,pid,-,-,-,-,path", "root 304 0.0 0.0 408185328 1360 ?? S 16Dec23 0:34.06 \n \n")
assert {'name': 'root', 'pid': '1360', 'path': ''} == r3
fn tokenize #
fn tokenize(text_ string) TokenizerResult
fn version #
fn version(text_ string) int
v0.4.36 becomes 4036. v1.4.36 becomes 1004036.
fn wiki_fix #
fn wiki_fix(content_ string) string
enum MultiLineStatus #
enum MultiLineStatus {
start
multiline
comment
}
struct TokenizerItem #
struct TokenizerItem {
pub mut:
toreplace string
// is the most fixed string
matchstring string
}
struct TokenizerResult #
struct TokenizerResult {
pub mut:
items []TokenizerItem
}
fn (TokenizerResult) replace #
fn (mut tr TokenizerResult) replace(text string, tofind string, replacewith string) !string