Module sz
Available on crate feature dep_stringzilla only.
The sz module provides a collection of string searching and manipulation functions designed for high efficiency and for compatibility with no_std environments. It offers utilities for byte-string manipulation, including forward search, reverse search, and edit-distance calculations, which suit applications ranging from basic string processing to complex text analysis.
Functions
- Computes the Needleman-Wunsch alignment score for two strings. This function is used primarily in bioinformatics for sequence alignment, but it also applies to other domains that require a detailed comparison of two strings accounting for gap and substitution penalties.
- Computes the Levenshtein edit distance between two strings using the Wagner-Fischer algorithm. This measure is widely used in applications such as spell-checking and DNA sequence analysis.
- Computes the Levenshtein edit distance between two strings using the Wagner-Fischer algorithm. This measure is widely used in applications such as spell-checking and DNA sequence analysis.
- Computes the Levenshtein edit distance between two UTF-8 strings using the Wagner-Fischer algorithm. This measure is widely used in applications such as spell-checking.
- Computes the Levenshtein edit distance between two UTF-8 strings using the Wagner-Fischer algorithm. This measure is widely used in applications such as spell-checking.
- Locates the first substring of haystack that equals needle. Like the memmem() function in libc, and unlike strstr(), it requires the lengths of both haystack and needle to be known beforehand.
- Finds the index of the first character in haystack that is also present in needles. This is particularly useful for parsing and tokenization tasks that rely on a set of delimiter characters.
- Finds the index of the first character in haystack that is not present in needles. This is useful for skipping over a known set of characters and finding the first character that does not belong to that set.
- Computes the Hamming edit distance between two strings, counting the number of substituted characters. Any difference in length is also added to the result.
- Computes the Hamming edit distance between two strings, counting the number of substituted characters. Any difference in length is also added to the result.
- Computes the Hamming edit distance between two UTF-8 strings, counting the number of substituted variable-length characters. Any difference in length is also added to the result.
- Computes the Hamming edit distance between two UTF-8 strings, counting the number of substituted variable-length characters. Any difference in length is also added to the result.
- Randomizes the contents of a given byte slice text using characters from a specified alphabet. This function mutates text in place, replacing each byte with a random one from alphabet. It is designed for situations where you need random strings or data sequences over a specific set of characters, such as random DNA sequences or test inputs.
- Locates the last substring of haystack that equals needle. This is useful for finding the last occurrence of a pattern within a byte slice.
- Finds the index of the last character in haystack that is also present in needles. This can be used to find the last occurrence of any character from a specified set, for example the last delimiter in a string during parsing.
- Finds the index of the last character in haystack that is not present in needles. This is useful for text processing tasks such as trimming trailing characters that belong to a specified set.
- Generates a default substitution matrix for the Needleman-Wunsch alignment algorithm. Diagonal entries (matching characters) are zero and off-diagonal entries (mismatches) are -1. This setup effectively produces scores equal to the negative Levenshtein edit distance, which suits basic sequence alignment tasks where all mismatches are penalized equally and matches earn no reward.
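The Levenshtein entries above name the Wagner-Fischer algorithm. The following is a minimal std-only sketch of that dynamic program, not the crate's own implementation; the names levenshtein, levenshtein_bytes and levenshtein_utf8 are hypothetical. The byte version counts every byte as one unit, while the UTF-8 version compares Unicode scalar values, which is one plausible reading of how the UTF-8 variants count "characters".

```rust
/// Wagner-Fischer dynamic program over arbitrary slices.
/// Keeps only two rows of the DP table: O(n*m) time, O(m) memory.
fn levenshtein<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    // prev[j] = distance between the first i items of `a` and the first j items of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, x) in a.iter().enumerate() {
        // Turning i + 1 items of `a` into an empty prefix of `b` takes i + 1 deletions.
        curr[0] = i + 1;
        for (j, y) in b.iter().enumerate() {
            let substitution = prev[j] + if x == y { 0 } else { 1 };
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        // The row just computed becomes the "previous" row for the next item of `a`.
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical byte-level variant: each byte is one unit.
fn levenshtein_bytes(a: &[u8], b: &[u8]) -> usize {
    levenshtein(a, b)
}

/// Hypothetical UTF-8 variant: each `char` (Unicode scalar value) is one unit,
/// so a multi-byte character costs a single edit.
fn levenshtein_utf8(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    levenshtein(a.as_slice(), b.as_slice())
}
```

For example, "café" and "cafe" are one edit apart at the character level but two edits apart at the byte level, because é occupies two bytes in UTF-8.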
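The Hamming entries describe a positional mismatch count plus the difference in length. A minimal std-only sketch of that rule follows; the function names are hypothetical and the UTF-8 version assumes "variable-length characters" means Rust `char`s.

```rust
/// Counts positions where the two sequences differ, then adds the length
/// difference (the unmatched tail of the longer input).
fn hamming<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    let mismatches = a.iter().zip(b.iter()).filter(|(x, y)| x != y).count();
    mismatches + a.len().abs_diff(b.len())
}

/// Hypothetical byte-level variant.
fn hamming_bytes(a: &[u8], b: &[u8]) -> usize {
    hamming(a, b)
}

/// Hypothetical UTF-8 variant: compares `char`s rather than bytes.
fn hamming_utf8(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    hamming(a.as_slice(), b.as_slice())
}
```

Because the length difference is added rather than rejected, inputs of unequal length still yield a distance instead of an error.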
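To illustrate how the Needleman-Wunsch score and the default substitution matrix fit together, here is a std-only sketch; it is not the crate's API. The names default_substitution_matrix and needleman_wunsch_score, the 256x256 byte-indexed layout, and the gap parameter are assumptions. With mismatches scored -1, matches 0 and a gap penalty of -1, the best score equals the negative Levenshtein distance, as the matrix description states.

```rust
/// Hypothetical default matrix: 0 on the diagonal (match), -1 elsewhere (mismatch).
fn default_substitution_matrix() -> Vec<[i8; 256]> {
    let mut m = vec![[-1i8; 256]; 256];
    for i in 0..256 {
        m[i][i] = 0;
    }
    m
}

/// Global alignment (Needleman-Wunsch): maximizes the sum of substitution
/// scores plus `gap` for every inserted or deleted byte.
fn needleman_wunsch_score(a: &[u8], b: &[u8], matrix: &[[i8; 256]], gap: i64) -> i64 {
    // First row: aligning an empty prefix of `a` against j bytes of `b` costs j gaps.
    let mut prev: Vec<i64> = (0..=b.len() as i64).map(|j| j * gap).collect();
    let mut curr = vec![0i64; b.len() + 1];
    for (i, &x) in a.iter().enumerate() {
        curr[0] = (i as i64 + 1) * gap;
        for (j, &y) in b.iter().enumerate() {
            let diagonal = prev[j] + matrix[x as usize][y as usize] as i64;
            let up = prev[j + 1] + gap;
            let left = curr[j] + gap;
            curr[j + 1] = diagonal.max(up).max(left);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}
```

A custom matrix (for example one that rewards matches with positive scores) can be passed in place of the default without changing the recurrence.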
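The forward and reverse substring searches can be modeled, semantically only, with a naive window scan. This sketch assumes byte slices (whose lengths are always known, matching the memmem() comparison) and borrows the empty-needle convention of Rust's str::find and str::rfind; whether the crate behaves the same way for empty needles is an assumption. find_naive and rfind_naive are hypothetical names, and a production implementation would use a faster algorithm.

```rust
/// Index of the first occurrence of `needle` in `haystack`, if any.
fn find_naive(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows(0)` panics, so handle the empty needle explicitly (std::str::find returns Some(0)).
    if needle.is_empty() {
        return Some(0);
    }
    // If the needle is longer than the haystack, `windows` yields nothing and we return None.
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Index of the last occurrence of `needle` in `haystack`, if any.
fn rfind_naive(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // Mirrors std::str::rfind, which returns Some(len) for an empty needle.
    if needle.is_empty() {
        return Some(haystack.len());
    }
    haystack.windows(needle.len()).rposition(|w| w == needle)
}
```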
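The four "character in needles" searches share one idea: build a set of needle bytes once, then scan forward or backward testing membership. Below is a sketch using a 256-entry lookup table, assuming "character" means a single byte; all function names are hypothetical.

```rust
/// 256-entry membership table: set[b] is true if byte `b` occurs in `needles`.
fn byte_set(needles: &[u8]) -> [bool; 256] {
    let mut set = [false; 256];
    for &b in needles {
        set[b as usize] = true;
    }
    set
}

/// First byte that IS in `needles` (e.g. the first delimiter).
fn find_byte_from(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    let set = byte_set(needles);
    haystack.iter().position(|&b| set[b as usize])
}

/// First byte that is NOT in `needles` (e.g. skip leading whitespace).
fn find_byte_not_from(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    let set = byte_set(needles);
    haystack.iter().position(|&b| !set[b as usize])
}

/// Last byte that IS in `needles` (e.g. the last delimiter).
fn rfind_byte_from(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    let set = byte_set(needles);
    haystack.iter().rposition(|&b| set[b as usize])
}

/// Last byte that is NOT in `needles` (e.g. trim trailing whitespace).
fn rfind_byte_not_from(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    let set = byte_set(needles);
    haystack.iter().rposition(|&b| !set[b as usize])
}
```

For trimming, the two "not from" searches give the bounds directly: with start = find_byte_not_from(s, b" ") and end = rfind_byte_not_from(s, b" "), the trimmed text is s[start..=end] whenever both are Some.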
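The randomization entry can be sketched without external crates by pairing a small pseudo-random generator with an index into alphabet. The xorshift64 generator, the seed parameter and the name randomize_from_alphabet are illustrative assumptions, not the crate's API, and the generator is not suitable for security-sensitive use.

```rust
/// Minimal xorshift64 generator (Marsaglia). NOT cryptographically secure.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical helper: overwrites every byte of `text` in place with a byte drawn from `alphabet`.
fn randomize_from_alphabet(text: &mut [u8], alphabet: &[u8], seed: u64) {
    assert!(!alphabet.is_empty(), "alphabet must not be empty");
    // xorshift stays at zero forever if seeded with zero, so substitute a nonzero constant.
    let mut rng = XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed });
    for byte in text.iter_mut() {
        // Modulo reduction is slightly biased when alphabet.len() does not divide 2^64.
        let idx = (rng.next_u64() % alphabet.len() as u64) as usize;
        *byte = alphabet[idx];
    }
}
```

Passing b"ACGT" as the alphabet yields a random DNA-like sequence, and a fixed seed makes test inputs reproducible.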