Module sz

Available on crate feature dep_stringzilla only.

The sz module provides string searching and manipulation functions, designed for high efficiency and compatibility with no_std environments. It offers utilities for byte-string manipulation, including search, reverse search, and edit-distance calculations, suitable for applications ranging from basic string processing to complex text analysis.
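To make the search terminology concrete, here is a minimal sketch of forward search, reverse search, and character-set search over byte slices, written with the standard library only. The function names (find_naive, rfind_naive, find_byte_from, and so on) are hypothetical and are not this module's API; the naive scans stand in for whatever optimized routines the module uses, and the handling of an empty needle is a choice made for this sketch.

```rust
/// Index of the first occurrence of `needle` in `haystack` (naive scan).
/// Assumption of this sketch: an empty needle matches at index 0.
fn find_naive(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Index of the last occurrence of `needle` in `haystack` (naive scan).
/// Assumption of this sketch: an empty needle matches at the end.
fn rfind_naive(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(haystack.len());
    }
    haystack.windows(needle.len()).rposition(|w| w == needle)
}

/// Index of the first byte of `haystack` that appears in `needles`.
fn find_byte_from(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    haystack.iter().position(|b| needles.contains(b))
}

/// Index of the first byte of `haystack` that does not appear in `needles`.
fn find_byte_not_from(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    haystack.iter().position(|b| !needles.contains(b))
}

/// Index of the last byte of `haystack` that appears in `needles`.
fn rfind_byte_from(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    haystack.iter().rposition(|b| needles.contains(b))
}

/// Index of the last byte of `haystack` that does not appear in `needles`.
fn rfind_byte_not_from(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    haystack.iter().rposition(|b| !needles.contains(b))
}
```

With these definitions, find_naive(b"abcabc", b"bc") yields Some(1) and rfind_naive(b"abcabc", b"bc") yields Some(4); rfind_byte_not_from(b"abc  ", b" ") yields Some(2), which is the index a trailing-whitespace trim would cut after.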

Functions

alignment_score: Computes the Needleman-Wunsch alignment score for two strings. This function is commonly used in bioinformatics for sequence alignment, but also applies to other domains that require a detailed comparison between two strings, including gap and substitution penalties.
edit_distance: Computes the Levenshtein edit distance between two strings, using the Wagner-Fischer algorithm. This measure is widely used in applications such as spell-checking and DNA sequence analysis.
edit_distance_bounded: Computes the Levenshtein edit distance between two strings, using the Wagner-Fischer algorithm, with an upper bound on the distance that allows the computation to exit early. This measure is widely used in applications such as spell-checking and DNA sequence analysis.
edit_distance_utf8: Computes the Levenshtein edit distance between two UTF-8 strings, using the Wagner-Fischer algorithm. This measure is widely used in applications such as spell-checking.
edit_distance_utf8_bounded: Computes the Levenshtein edit distance between two UTF-8 strings, using the Wagner-Fischer algorithm, with an upper bound on the distance that allows the computation to exit early. This measure is widely used in applications such as spell-checking.
find: Locates the first substring within haystack that equals needle. This function is similar to the memmem() function in libc and, unlike strstr(), requires the lengths of both haystack and needle to be known beforehand.
find_char_from: Finds the index of the first character in haystack that is also present in needles. This function is particularly useful for parsing and tokenization tasks where a set of delimiter characters is used.
find_char_not_from: Finds the index of the first character in haystack that is not present in needles. This function is useful for skipping over a known set of characters and finding the first character that does not belong to that set.
hamming_distance: Computes the Hamming distance between two strings, counting the number of substituted characters. Any difference in length is added to the result.
hamming_distance_bounded: Computes the Hamming distance between two strings, counting the number of substituted characters, with an upper bound on the distance that allows the computation to exit early. Any difference in length is added to the result.
hamming_distance_utf8: Computes the Hamming distance between two UTF-8 strings, counting the number of substituted variable-length characters. Any difference in length is added to the result.
hamming_distance_utf8_bounded: Computes the Hamming distance between two UTF-8 strings, counting the number of substituted variable-length characters, with an upper bound on the distance that allows the computation to exit early. Any difference in length is added to the result.
randomize: Randomizes the contents of a given byte slice text using characters from a specified alphabet. This function mutates text in place, replacing each byte with a random one from alphabet. It is intended for generating random strings or data sequences over a specific set of characters, such as random DNA sequences or test inputs.
rfind: Locates the last matching substring within haystack that equals needle. This function is useful for finding the most recent or last occurrence of a pattern within a byte slice.
rfind_char_from: Finds the index of the last character in haystack that is also present in needles. This can be used to find the last occurrence of any character from a specified set, useful in parsing scenarios such as finding the last delimiter in a string.
rfind_char_not_from: Finds the index of the last character in haystack that is not present in needles. Useful for text processing tasks such as trimming trailing characters that belong to a specified set.
unary_substitution_costs: Generates a default substitution matrix for use with the Needleman-Wunsch alignment algorithm. The matrix is initialized so that diagonal entries (matching characters) are 0 and off-diagonal entries (mismatches) are -1. With this matrix, alignment scores equal the negative Levenshtein edit distance, which suits basic sequence alignment tasks where all mismatches are penalized equally and matches earn no reward.
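The distance and alignment entries above can be illustrated with a small, standard-library-only sketch. It shows a byte-level Wagner-Fischer Levenshtein distance, a Hamming distance that adds the length difference, and a Needleman-Wunsch score driven by a 0/-1 substitution matrix. All names here (levenshtein_sketch, hamming_sketch, unary_costs_sketch, alignment_score_sketch) are hypothetical, the signatures are not this module's API, and the gap penalty of -1 used in the usage note is an assumption needed for the equivalence with Levenshtein distance to hold.

```rust
/// Levenshtein distance over bytes: Wagner-Fischer with two rolling rows.
fn levenshtein_sketch(a: &[u8], b: &[u8]) -> usize {
    // prev[j] is the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hamming distance over bytes; the length difference is added to the count.
fn hamming_sketch(a: &[u8], b: &[u8]) -> usize {
    let mismatches = a.iter().zip(b).filter(|(x, y)| x != y).count();
    mismatches + a.len().abs_diff(b.len())
}

/// 256x256 matrix with 0 on the diagonal and -1 everywhere else.
fn unary_costs_sketch() -> Vec<[i8; 256]> {
    let mut matrix = vec![[-1i8; 256]; 256];
    for (i, row) in matrix.iter_mut().enumerate() {
        row[i] = 0;
    }
    matrix
}

/// Needleman-Wunsch global alignment score (higher is better).
/// `costs` must have 256 rows; `gap` is the (negative) penalty per gap.
fn alignment_score_sketch(a: &[u8], b: &[u8], costs: &[[i8; 256]], gap: i8) -> isize {
    let gap = gap as isize;
    let mut prev: Vec<isize> = (0..=b.len()).map(|j| j as isize * gap).collect();
    let mut curr = vec![0isize; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = (i as isize + 1) * gap;
        for (j, &cb) in b.iter().enumerate() {
            let diagonal = prev[j] + costs[ca as usize][cb as usize] as isize;
            let up = prev[j + 1] + gap;
            let left = curr[j] + gap;
            curr[j + 1] = diagonal.max(up).max(left);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}
```

For "kitten" and "sitting", levenshtein_sketch returns 3 and alignment_score_sketch with the unary matrix and a gap penalty of -1 returns -3, which matches the negative-Levenshtein relationship described for unary_substitution_costs. A bounded variant could stop as soon as every cell in the current row exceeds the bound; that optimization is omitted here for brevity.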
