core.texttools #

TextTools Module

The TextTools module provides a comprehensive set of utilities for text manipulation and processing in V. It includes functions for cleaning, parsing, formatting, and transforming text in various ways.

Features

Array Operations

to_array(r string) []string - Converts a comma or newline separated list to an array of strings
to_array_int(r string) []int - Converts a text list to an array of integers
to_map(mapstring string, line string, delimiter_ string) map[string]string - Intelligent mapping of a line to a map based on a template

Text Cleaning

name_clean(r string) string - Normalizes names by removing special characters
ascii_clean(r string) string - Removes all non-ASCII characters
remove_empty_lines(text string) string - Removes empty lines from text
remove_double_lines(text string) string - Removes consecutive empty lines
remove_empty_js_blocks(text string) string - Removes empty code blocks (...)

Command Line Parsing

cmd_line_args_parser(text string) ![]string - Parses command line arguments with support for quotes and escaping
text_remove_quotes(text string) string - Removes quoted sections from text
check_exists_outside_quotes(text string, items []string) bool - Checks if items exist in text outside of quotes

Text Expansion

expand(txt_ string, l int, expand_with string) string - Expands text to a specified length with a given character

Indentation

indent(text string, prefix string) string - Adds indentation prefix to each line
dedent(text string) string - Removes common leading whitespace from every line

String Validation

is_int(text string) bool - Checks if text contains only digits
is_upper_text(text string) bool - Checks if text contains only uppercase letters

Multiline Processing

multiline_to_single(text string) !string - Converts multiline text to a single line with proper escaping
Handles comments and code blocks, and preserves formatting

Name/Path Processing

name_fix(name string) string - Normalizes filenames and paths
name_fix_keepspace(name string) !string - Like name_fix but preserves spaces
name_fix_no_ext(name_ string) string - Removes file extension
name_fix_snake_to_pascal(name string) string - Converts snake_case to PascalCase
name_fix_pascal_to_snake(name string) string - Converts PascalCase to snake_case
name_split(name string) !(string, string) - Splits name into site and page components

Text Splitting

split_smart(t string, delimiter_ string) []string - Intelligent string splitting that respects quotes

Tokenization

tokenize(text_ string) TokenizerResult - Tokenizes text into meaningful parts
text_token_replace(text string, tofind string, replacewith string) !string - Replaces tokens in text

Version Parsing

version(text_ string) int - Converts version strings to comparable integers
Example: "v0.4.36" becomes 4036
Example: "v1.4.36" becomes 1004036

Usage Examples

Array Operations

// Convert comma-separated list to array
text := 'item1,item2,item3'
array := texttools.to_array(text)
// Result: ['item1', 'item2', 'item3']

// Smart mapping (the signature takes a delimiter; '' is assumed here to select the default delimiters)
r := texttools.to_map('name,-,-,-,-,pid,-,-,-,-,path',
    'root   304   0.0  0.0 408185328   1360   ??  S    16Dec23   0:34.06 /usr/sbin/distnoted',
    '')
// Result: {'name': 'root', 'pid': '1360', 'path': '/usr/sbin/distnoted'}
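
The template mapping above can be pictured as pairing each template entry with the field at the same position and skipping every `-` entry. The following is a minimal sketch of that idea, not the texttools implementation: `to_map_sketch` is a hypothetical name, and it assumes fields are separated by whitespace only.

```v
// Hypothetical sketch, not part of texttools.
// Maps whitespace-separated fields of `line` onto the names in `template`.
fn to_map_sketch(template string, line string) map[string]string {
	keys := template.split(',')
	values := line.fields() // splits on runs of whitespace
	mut result := map[string]string{}
	for i, key in keys {
		if key == '-' {
			continue // placeholder: this field is ignored
		}
		mut value := ''
		if i < values.len {
			value = values[i]
		}
		result[key] = value // a missing field maps to an empty string
	}
	return result
}

row := to_map_sketch('name,-,-,-,-,pid,-,-,-,-,path', 'root   304   0.0  0.0 408185328   1360   ??  S    16Dec23   0:34.06 /usr/sbin/distnoted')
println(row['name']) // root
println(row['pid']) // 1360
println(row['path']) // /usr/sbin/distnoted
```

Mapping a missing field to an empty string mirrors the `'path': ''` result shown in the `to_map` reference entry.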

Text Cleaning

// Clean name
name := texttools.name_clean('Hello@World!')
// Result: "HelloWorld"

// Remove empty lines
text := texttools.remove_empty_lines('line1\n\nline2\n\n\nline3')
// Result: "line1\nline2\nline3"

Command Line Parsing

// Parse command line with quotes (the function returns a Result, so the error is propagated with !)
args := texttools.cmd_line_args_parser("'arg with spaces' --flag=value")!
// Result: ['arg with spaces', '--flag=value']

Indentation

// Add indentation
indented := texttools.indent('line1\nline2', '  ')
// Result: "  line1\n  line2\n"

// Remove common indentation
dedented := texttools.dedent('    line1\n    line2')
// Result: "line1\nline2"
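
To make "common leading whitespace" concrete, here is a minimal sketch of one way to dedent text. It is an illustration under stated assumptions, not the texttools implementation: `dedent_sketch` is a hypothetical name, spaces and tabs each count as one character, and blank lines are ignored when measuring the indentation.

```v
// Hypothetical sketch, not part of texttools.
// Strips the smallest indentation shared by all non-blank lines.
fn dedent_sketch(text string) string {
	lines := text.split_into_lines()
	mut min_indent := -1
	for line in lines {
		if line.trim_space() == '' {
			continue // blank lines do not affect the common indentation
		}
		indent := line.len - line.trim_left(' \t').len
		if min_indent == -1 || indent < min_indent {
			min_indent = indent
		}
	}
	if min_indent <= 0 {
		return text // nothing to strip
	}
	mut out := []string{}
	for line in lines {
		if line.len >= min_indent {
			out << line[min_indent..]
		} else {
			out << line.trim_left(' \t') // short whitespace-only line
		}
	}
	return out.join('\n')
}

println(dedent_sketch('    line1\n      line2'))
```

Relative indentation is preserved: in the call above the second line keeps the two extra spaces it had beyond the first.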

Name Processing

// Convert to snake case
snake := texttools.name_fix_pascal_to_snake('HelloWorld')
// Result: "hello_world"

// Convert to pascal case
pascal := texttools.name_fix_snake_to_pascal('hello_world')
// Result: "HelloWorld"
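
The two conversions can be sketched as follows. This is an illustrative sketch, not the texttools code: both function names are hypothetical, input is assumed to be plain ASCII, and runs of capitals (such as acronyms) are not treated specially.

```v
// Hypothetical sketches, not part of texttools.

// snake_case -> PascalCase: capitalize each underscore-separated part.
fn snake_to_pascal_sketch(name string) string {
	mut out := ''
	for part in name.split('_') {
		out += part.capitalize()
	}
	return out
}

// PascalCase -> snake_case: put '_' before each capital except the first character.
fn pascal_to_snake_sketch(name string) string {
	mut out := ''
	for i, ch in name {
		if ch.is_capital() {
			if i > 0 {
				out += '_'
			}
			out += ch.ascii_str().to_lower()
		} else {
			out += ch.ascii_str()
		}
	}
	return out
}

println(snake_to_pascal_sketch('hello_world')) // HelloWorld
println(pascal_to_snake_sketch('HelloWorld')) // hello_world
```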

Version Parsing

// Parse version string
ver0 := texttools.version('v0.4.36')
// Result: 4036

ver1 := texttools.version('v1.4.36')
// Result: 1004036
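
Both results above are consistent with packing the three components as major * 1000000 + minor * 1000 + patch. The sketch below shows that arithmetic; it is inferred from the two documented examples rather than taken from the texttools source, `version_sketch` is a hypothetical name, and it assumes exactly three numeric components, each below 1000.

```v
// Hypothetical helper, not the texttools implementation.
// Packs major.minor.patch into one int, three decimal digits per component.
fn version_sketch(text string) int {
	// Drop surrounding whitespace and an optional leading 'v'.
	cleaned := text.trim_space().trim_left('vV')
	mut result := 0
	for part in cleaned.split('.') {
		// Assumes each component is a number below 1000.
		result = result * 1000 + part.int()
	}
	return result
}

println(version_sketch('v0.4.36')) // 4036
println(version_sketch('v1.4.36')) // 1004036
```

Because each component occupies its own block of three digits, comparing the resulting integers orders versions the same way as comparing them component by component.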

Error Handling

Many functions in the module return a Result type (indicated by ! in the function signature). These functions can return errors that should be handled appropriately:

// Example of error handling
name := texttools.name_fix_keepspace('some@name') or {
    println('Error: ${err}')
    return
}
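
Besides handling an error on the spot with an or block, V lets a function that itself returns a Result pass the error up with `!`, or supply a fallback value. The sketch below shows all three patterns; `require_non_empty` and `shout` are hypothetical helpers written for this illustration and are not part of texttools.

```v
// Hypothetical helpers for illustration only; not part of texttools.

// Returns the trimmed text, or an error when nothing is left.
fn require_non_empty(text string) !string {
	trimmed := text.trim_space()
	if trimmed == '' {
		return error('text is empty')
	}
	return trimmed
}

// Propagates any error from require_non_empty to its own caller with `!`.
fn shout(text string) !string {
	cleaned := require_non_empty(text)!
	return cleaned.to_upper()
}

// Fallback value: the or block supplies a default when an error occurs.
fallback := require_non_empty('   ') or { 'default' }
println(fallback) // default

// Abort on error: panic with the error if the call fails.
loud := shout(' hi ') or { panic(err) }
println(loud) // HI
```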

Best Practices

Always use appropriate error handling for functions that return Results
Consider using dedent() before processing multiline text to ensure consistent formatting
When working with filenames, use the appropriate name_fix variant based on your needs
For command line parsing, be aware of quote handling and escaping rules
When using tokenization, consider the context and whether smart splitting is needed

Contributing

The TextTools module is part of the CrystalLib project. Contributions are welcome through pull requests.

fn action_multiline_fix #

fn action_multiline_fix(content string) string

fn ascii_clean #

fn ascii_clean(r string) string

remove all chars which are not ascii

fn check_exists_outside_quotes #

fn check_exists_outside_quotes(text string, items []string) bool

tests whether an element of the array exists in the text, ignoring anything inside quotes

fn cmd_line_args_parser #

fn cmd_line_args_parser(text string) ![]string

converts a text string to arguments. \n is supported but is kept as \n, and only within '' or "". ' is not modified; the same holds for "

fn dedent #

fn dedent(text string) string

remove all leading spaces at same level

fn email_fix #

fn email_fix(name string) !string

fn expand #

fn expand(txt_ string, l int, expand_with string) string

texttools.expand('|', 20, ' ')

fn indent #

fn indent(text string, prefix string) string

fn is_int #

fn is_int(text string) bool

fn is_upper_text #

fn is_upper_text(text string) bool

fn multiline_to_single #

fn multiline_to_single(text string) !string

converts multiline text to a single line, keeping all relevant information
empty lines are removed (unless inside a parameter)
commented lines (starting with // or #) are removed as well
multiline values become 'line1\nline2\n'
dedent is also applied before the text is put inside '...'
tabs are replaced with 4 spaces

fn name_clean #

fn name_clean(r string) string

fn name_fix #

fn name_fix(name string) string

fn name_fix_dot_notation_to_pascal #

fn name_fix_dot_notation_to_pascal(name string) string

fn name_fix_dot_notation_to_snake_case #

fn name_fix_dot_notation_to_snake_case(name string) string

fn name_fix_keepext #

fn name_fix_keepext(name_ string) string

fn name_fix_keepspace #

fn name_fix_keepspace(name string) !string

like name_fix but _ becomes space

fn name_fix_list #

fn name_fix_list(name string) []string

fn name_fix_no_ext #

fn name_fix_no_ext(name_ string) string

remove underscores and extension

fn name_fix_no_md #

fn name_fix_no_md(name string) string

returns the name, keeping extensions and underscores; when the name ends in .md, the extension is removed

fn name_fix_no_underscore #

fn name_fix_no_underscore(name string) string

fn name_fix_no_underscore_no_ext #

fn name_fix_no_underscore_no_ext(name_ string) string

remove underscores and extension

fn name_fix_no_underscore_token #

fn name_fix_no_underscore_token(name string) string

fn name_fix_pascal #

fn name_fix_pascal(name string) string

fn name_fix_pascal_to_snake #

fn name_fix_pascal_to_snake(name string) string

fn name_fix_snake_to_pascal #

fn name_fix_snake_to_pascal(name string) string

fn name_fix_token #

fn name_fix_token(name string) string

fn name_split #

fn name_split(name string) !(string, string)

returns (sitename, pagename). sitename is an empty string if not specified with site:... or site__...

fn remove_double_lines #

fn remove_double_lines(text string) string

fn remove_empty_js_blocks #

fn remove_empty_js_blocks(text string) string

removes ??, which can span multiple lines. also removes double lines

fn remove_empty_lines #

fn remove_empty_lines(text string) string

https://en.wikipedia.org/wiki/Unicode#Standardized_subsets

fn split_smart #

fn split_smart(t string, delimiter_ string) []string

split strings in intelligent ways, taking into consideration '"`

// the signature takes a delimiter; '' is assumed here to select the default delimiters
r0 := texttools.split_smart("'root'   304   0.0  0.0 408185328   1360   ??  S    16Dec23   0:34.06 /usr/sbin/distnoted\n \n", '')
assert ['root', '304', '0.0', '0.0', '408185328', '1360', '??', 'S', '16Dec23', '0:34.06', '/usr/sbin/distnoted'] == r0
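
Quote-aware splitting of this kind can be done with a small state machine that remembers whether it is currently inside a quoted section. The following is a minimal sketch of that technique, not the texttools implementation: `split_quoted_sketch` is a hypothetical name, the delimiters are fixed to space, tab, comma, and newline, only ' and " are treated as quotes, and empty quoted items are dropped.

```v
// Hypothetical sketch, not part of texttools.
// Splits on whitespace and commas, but keeps quoted sections together.
fn split_quoted_sketch(text string) []string {
	mut result := []string{}
	mut current := []u8{}
	mut quote := u8(0) // 0 means "not inside quotes"
	for ch in text {
		if quote != 0 {
			if ch == quote {
				quote = 0 // closing quote: leave quoted state
			} else {
				current << ch // delimiters inside quotes are kept
			}
		} else if ch == `'` || ch == `"` {
			quote = ch // opening quote: remember which kind
		} else if ch == ` ` || ch == `\t` || ch == `,` || ch == `\n` {
			if current.len > 0 {
				result << current.bytestr()
				current.clear()
			}
		} else {
			current << ch
		}
	}
	if current.len > 0 {
		result << current.bytestr()
	}
	return result
}

println(split_quoted_sketch("'root user'   304  /usr/sbin/distnoted\n \n"))
```

Tracking the opening quote character, rather than a simple inside/outside flag, lets a ' appear unchanged inside a "..." section and vice versa.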

fn tel_fix #

fn tel_fix(name_ string) !string

fixes a string which represents a telephone number

fn template_replace #

fn template_replace(template_ string) string

replaces '^^' with '@', '??' with '$', and '\t' with ' '

fn text_remove_quotes #

fn text_remove_quotes(text string) string

remove all '..' and "..." from a text, so everything in between the quotes
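
A quote-stripping pass like this also gives a simple way to check for items outside quotes: strip the quoted sections first, then search what remains. The sketch below illustrates both steps under stated assumptions; the function names are hypothetical, escaped quotes are not handled, and this is not the texttools implementation.

```v
// Hypothetical sketches, not part of texttools.

// Drops every '...' and "..." section, including the quote characters.
fn remove_quoted_sketch(text string) string {
	mut out := []u8{}
	mut quote := u8(0) // 0 means "not inside quotes"
	for ch in text {
		if quote != 0 {
			if ch == quote {
				quote = 0
			}
			continue // skip everything inside the quotes
		}
		if ch == `'` || ch == `"` {
			quote = ch
			continue
		}
		out << ch
	}
	return out.bytestr()
}

// True if any item occurs in the text outside of quoted sections.
fn exists_outside_quotes_sketch(text string, items []string) bool {
	unquoted := remove_quoted_sketch(text)
	for item in items {
		if unquoted.contains(item) {
			return true
		}
	}
	return false
}

println(exists_outside_quotes_sketch("echo 'a | b'", ['|'])) // false
println(exists_outside_quotes_sketch("echo 'a' | wc", ['|'])) // true
```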

fn text_token_replace #

fn text_token_replace(text string, tofind string, replacewith string) !string

fn to_array #

fn to_array(r string) []string

converts a comma- or \n-separated list to a list of strings. items quoted as '..' are returned without the quotes. see also split_smart, which is more intelligent

fn to_array_int #

fn to_array_int(r string) []int

fn to_list_map #

fn to_list_map(mapstring string, txt_ string, delimiter_ string) []map[string]string

// t holds the multi-line input text (not shown here); '' is assumed to select the default delimiters
r4 := texttools.to_list_map("name,-,-,-,-,pid,-,-,-,-,path", t, '')
assert [{'name': '_cmiodalassistants', 'pid': '1360', 'path': '/usr/sbin/distnoted'}, {'name': '_locationd', 'pid': '1344', 'path': '/usr/sbin/distnoted'}, {'name': 'root', 'pid': '7296', 'path': '/usr/libexec/storagekitd'}, {'name': '_coreaudiod', 'pid': '1344', 'path': '/usr/sbin/distnoted'}] == r4

fn to_map #

fn to_map(mapstring string, line string, delimiter_ string) map[string]string

// the signature takes a delimiter; '' is assumed here to select the default delimiters
r3 := texttools.to_map("name,-,-,-,-,pid,-,-,-,-,path", "root 304 0.0 0.0 408185328 1360 ?? S 16Dec23 0:34.06 \n \n", '')
assert {'name': 'root', 'pid': '1360', 'path': ''} == r3

fn tokenize #

fn tokenize(text_ string) TokenizerResult

fn version #

fn version(text_ string) int

converts a version string to a comparable integer: v0.4.36 becomes 4036, v1.4.36 becomes 1004036

fn wiki_fix #

fn wiki_fix(content_ string) string

enum MultiLineStatus #

enum MultiLineStatus {
	start
	multiline
	comment
}

struct TokenizerItem #

struct TokenizerItem {
pub mut:
	toreplace string
	// is the most fixed string
	matchstring string
}

struct TokenizerResult #

struct TokenizerResult {
pub mut:
	items []TokenizerItem
}


fn (TokenizerResult) replace #

fn (mut tr TokenizerResult) replace(text string, tofind string, replacewith string) !string
