Struct RegexBuilder
pub struct RegexBuilder { /* private fields */ }
Available on crate feature `dep_regex_lite` only.
A configurable builder for a `Regex`.
This builder can be used to programmatically set flags such as `i` (case insensitive) and `x` (for verbose mode). This builder can also be used to configure things like a size limit on the compiled regular expression.
Implementations§
impl RegexBuilder
pub fn new(pattern: &str) -> RegexBuilder
Create a new builder with a default configuration for the given pattern.
If the pattern is invalid or exceeds the configured size limits, then an error will be returned when `RegexBuilder::build` is called.
pub fn build(&self) -> Result<Regex, Error>
Compiles the pattern given to `RegexBuilder::new` with the configuration set on this builder.
If the pattern isn’t a valid regex or if a configured size limit was exceeded, then an error is returned.
pub fn case_insensitive(&mut self, yes: bool) -> &mut RegexBuilder
This configures whether to enable ASCII case insensitive matching for the entire pattern.
This setting can also be configured using the inline flag `i` in the pattern. For example, `(?i:foo)` matches `foo` case insensitively while `(?-i:foo)` matches `foo` case sensitively.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;
let re = RegexBuilder::new(r"foo(?-i:bar)quux")
    .case_insensitive(true)
    .build()
    .unwrap();
assert!(re.is_match("FoObarQuUx"));
// Even though case insensitive matching is enabled in the builder,
// it can be locally disabled within the pattern. In this case,
// `bar` is matched case sensitively.
assert!(!re.is_match("fooBARquux"));
pub fn multi_line(&mut self, yes: bool) -> &mut RegexBuilder
This configures multi-line mode for the entire pattern.
Enabling multi-line mode changes the behavior of the `^` and `$` anchor assertions. Instead of only matching at the beginning and end of a haystack, respectively, multi-line mode causes them to match at the beginning and end of a line in addition to the beginning and end of a haystack. More precisely, `^` will match at the position immediately following a `\n` and `$` will match at the position immediately preceding a `\n`.
The behavior of this option is impacted by the `RegexBuilder::crlf` setting. Namely, CRLF mode treats both `\r` and `\n` as line terminators, but the anchors never match at the position between a `\r` and a `\n`.
This setting can also be configured using the inline flag `m` in the pattern.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;
let re = RegexBuilder::new(r"^foo$")
    .multi_line(true)
    .build()
    .unwrap();
assert_eq!(Some(1..4), re.find("\nfoo\n").map(|m| m.range()));
pub fn dot_matches_new_line(&mut self, yes: bool) -> &mut RegexBuilder
This configures dot-matches-new-line mode for the entire pattern.
Perhaps surprisingly, the default behavior for `.` is not to match any character, but rather, to match any character except for the line terminator (which is `\n` by default). When this mode is enabled, the behavior changes such that `.` truly matches any character.
This setting can also be configured using the inline flag `s` in the pattern.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;
let re = RegexBuilder::new(r"foo.bar")
    .dot_matches_new_line(true)
    .build()
    .unwrap();
let hay = "foo\nbar";
assert_eq!(Some("foo\nbar"), re.find(hay).map(|m| m.as_str()));
pub fn crlf(&mut self, yes: bool) -> &mut RegexBuilder
This configures CRLF mode for the entire pattern.
When CRLF mode is enabled, both `\r` (“carriage return” or CR for short) and `\n` (“line feed” or LF for short) are treated as line terminators. This results in the following:
- Unless dot-matches-new-line mode is enabled, `.` will now match any character except for `\n` and `\r`.
- When multi-line mode is enabled, `^` will match immediately following a `\n` or a `\r`. Similarly, `$` will match immediately preceding a `\n` or a `\r`. Neither `^` nor `$` will ever match between `\r` and `\n`.
This setting can also be configured using the inline flag `R` in the pattern.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;
let re = RegexBuilder::new(r"^foo$")
    .multi_line(true)
    .crlf(true)
    .build()
    .unwrap();
let hay = "\r\nfoo\r\n";
// If CRLF mode weren't enabled here, then '$' wouldn't match
// immediately after 'foo', and thus no match would be found.
assert_eq!(Some("foo"), re.find(hay).map(|m| m.as_str()));
This example demonstrates that `^` will never match at a position between `\r` and `\n`. (`$` will similarly not match between a `\r` and a `\n`.)
use regex_lite::RegexBuilder;
let re = RegexBuilder::new(r"^")
    .multi_line(true)
    .crlf(true)
    .build()
    .unwrap();
let hay = "\r\n\r\n";
let ranges: Vec<_> = re.find_iter(hay).map(|m| m.range()).collect();
assert_eq!(ranges, vec![0..0, 2..2, 4..4]);
pub fn swap_greed(&mut self, yes: bool) -> &mut RegexBuilder
This configures swap-greed mode for the entire pattern.
When swap-greed mode is enabled, patterns like `a+` will become non-greedy and patterns like `a+?` will become greedy. In other words, the meanings of `a+` and `a+?` are switched.
This setting can also be configured using the inline flag `U` in the pattern.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;
let re = RegexBuilder::new(r"a+")
    .swap_greed(true)
    .build()
    .unwrap();
assert_eq!(Some("a"), re.find("aaa").map(|m| m.as_str()));
pub fn ignore_whitespace(&mut self, yes: bool) -> &mut RegexBuilder
This configures verbose mode for the entire pattern.
When enabled, whitespace will be treated as insignificant in the pattern and `#` can be used to start a comment until the next new line.
Normally, in most places in a pattern, whitespace is treated literally. For example, ` +` will match one or more space characters.
When verbose mode is enabled, `\#` can be used to match a literal `#` and `\ ` can be used to match a literal ASCII whitespace character.
Verbose mode is useful for permitting regexes to be formatted and broken up more nicely. This may make them more easily readable.
This setting can also be configured using the inline flag `x` in the pattern.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;
let pat = r"
    \b
    (?<first>[A-Z]\w*)  # always start with uppercase letter
    \s+                 # whitespace should separate names
    (?:                 # middle name can be an initial!
        (?:(?<initial>[A-Z])\.|(?<middle>[A-Z]\w*))
        \s+
    )?
    (?<last>[A-Z]\w*)
    \b
";
let re = RegexBuilder::new(pat)
    .ignore_whitespace(true)
    .build()
    .unwrap();
let caps = re.captures("Harry Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
assert_eq!("Potter", &caps["last"]);
let caps = re.captures("Harry J. Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
// Since a middle name/initial isn't required for an overall match,
// we can't assume that 'initial' or 'middle' will be populated!
assert_eq!(Some("J"), caps.name("initial").map(|m| m.as_str()));
assert_eq!(None, caps.name("middle").map(|m| m.as_str()));
assert_eq!("Potter", &caps["last"]);
let caps = re.captures("Harry James Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
// Since a middle name/initial isn't required for an overall match,
// we can't assume that 'initial' or 'middle' will be populated!
assert_eq!(None, caps.name("initial").map(|m| m.as_str()));
assert_eq!(Some("James"), caps.name("middle").map(|m| m.as_str()));
assert_eq!("Potter", &caps["last"]);
pub fn size_limit(&mut self, limit: usize) -> &mut RegexBuilder
Sets the approximate size limit, in bytes, of the compiled regex.
This roughly corresponds to the amount of heap memory, in bytes, occupied by a single regex. If the regex would otherwise approximately exceed this limit, then compiling that regex will fail.
The main utility of a method like this is to avoid compiling regexes that use an unexpected amount of resources, such as time and memory. Even if the memory usage of a large regex is acceptable, its search time may not be. Namely, worst case time complexity for search is `O(m * n)`, where `m ~ len(pattern)` and `n ~ len(haystack)`. That is, search time depends, in part, on the size of the compiled regex. This means that putting a limit on the size of the regex limits how much a regex can impact search time.
The default for this is some reasonable number that permits most patterns to compile successfully.
§Example
use regex_lite::RegexBuilder;
assert!(RegexBuilder::new(r"\w").size_limit(100).build().is_err());
pub fn nest_limit(&mut self, limit: u32) -> &mut RegexBuilder
Set the nesting limit for this parser.
The nesting limit controls how deep the abstract syntax tree is allowed to be. If the AST exceeds the given limit (e.g., with too many nested groups), then an error is returned by the parser.
The purpose of this limit is to act as a heuristic to prevent stack overflow for consumers that do structural induction on an AST using explicit recursion. While this crate never does this (instead using constant stack space and moving the call stack to the heap), other crates may.
This limit is not checked until the entire AST is parsed. Therefore, if callers want to put a limit on the amount of heap space used, then they should impose a limit on the length, in bytes, of the concrete pattern string. In particular, this is viable since this parser implementation will limit itself to heap space proportional to the length of the pattern string. See also the untrusted inputs section in the top-level crate documentation for more information about this.
Note that a nest limit of `0` will return a nest limit error for most patterns but not all. For example, a nest limit of `0` permits `a` but not `ab`, since `ab` requires an explicit concatenation, which results in a nest depth of `1`. In general, a nest limit is not something that manifests in an obvious way in the concrete syntax; therefore, it should not be used in a granular way.
§Example
use regex_lite::RegexBuilder;
assert!(RegexBuilder::new(r"").nest_limit(0).build().is_ok());
assert!(RegexBuilder::new(r"a").nest_limit(0).build().is_ok());
assert!(RegexBuilder::new(r"(a)").nest_limit(0).build().is_err());
Trait Implementations§
Auto Trait Implementations§
impl Freeze for RegexBuilder
impl RefUnwindSafe for RegexBuilder
impl Send for RegexBuilder
impl Sync for RegexBuilder
impl Unpin for RegexBuilder
impl UnwindSafe for RegexBuilder
Blanket Implementations§
impl<T> ArchivePointee for T
type ArchivedMetadata = ()
fn pointer_metadata(_: &<T as ArchivePointee>::ArchivedMetadata) -> <T as Pointee>::Metadata
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> ByteSized for T
const BYTE_ALIGN: usize = _
fn byte_align(&self) -> usize
fn ptr_size_ratio(&self) -> [usize; 2]
impl<T, R> Chain<R> for T where T: ?Sized
impl<T> ExtAny for T
fn as_any_mut(&mut self) -> &mut dyn Any where Self: Sized
impl<T> ExtMem for T where T: ?Sized
const NEEDS_DROP: bool = _
fn mem_align_of_val(&self) -> usize
fn mem_size_of_val(&self) -> usize
fn mem_needs_drop(&self) -> bool
Returns `true` if dropping values of this type matters.
fn mem_forget(self) where Self: Sized
Forgets about `self` without running its destructor.
fn mem_replace(&mut self, other: Self) -> Self where Self: Sized
unsafe fn mem_zeroed<T>() -> T
Available on crate feature `unsafe_layout` only. Returns the value of type `T` represented by the all-zero byte-pattern.
unsafe fn mem_transmute_copy<Src, Dst>(src: &Src) -> Dst
Available on crate feature `unsafe_layout` only. Returns the value of type `T` represented by the all-zero byte-pattern.
fn mem_as_bytes(&self) -> &[u8]
Available on crate feature `unsafe_slice` only.
impl<S> FromSample<S> for S
fn from_sample_(s: S) -> S
impl<T> Hook for T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts `self` into a `Left` variant of `Either<Self, Self>` if `into_left` is `true`. Converts `self` into a `Right` variant of `Either<Self, Self>` otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts `self` into a `Left` variant of `Either<Self, Self>` if `into_left(&self)` returns `true`. Converts `self` into a `Right` variant of `Either<Self, Self>` otherwise.
impl<F, T> IntoSample<T> for F where T: FromSample<F>
fn into_sample(self) -> T
impl<T> LayoutRaw for T
fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>
impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool
fn resolve_niched(out: Place<NichedOption<T, N1>>)
Writes data to `out` indicating that a `T` is niched.