Struct RegexBuilder


pub struct RegexBuilder { /* private fields */ }

Available on crate feature dep_regex_lite only.


A configurable builder for a Regex.

This builder can be used to programmatically set flags such as i (case insensitive) and x (for verbose mode). This builder can also be used to configure things like a size limit on the compiled regular expression.

Implementations§


impl RegexBuilder

pub fn new(pattern: &str) -> RegexBuilder

Create a new builder with a default configuration for the given pattern.

If the pattern is invalid or exceeds the configured size limits, then an error will be returned when RegexBuilder::build is called.

pub fn build(&self) -> Result<Regex, Error>

Compiles the pattern given to RegexBuilder::new with the configuration set on this builder.

If the pattern isn’t a valid regex or if a configured size limit was exceeded, then an error is returned.

pub fn case_insensitive(&mut self, yes: bool) -> &mut RegexBuilder

This configures whether to enable ASCII case insensitive matching for the entire pattern.

This setting can also be configured using the inline flag i in the pattern. For example, (?i:foo) matches foo case insensitively while (?-i:foo) matches foo case sensitively.

The default for this is false.

§Example

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"foo(?-i:bar)quux")
    .case_insensitive(true)
    .build()
    .unwrap();
assert!(re.is_match("FoObarQuUx"));
// Even though case insensitive matching is enabled in the builder,
// it can be locally disabled within the pattern. In this case,
// `bar` is matched case sensitively.
assert!(!re.is_match("fooBARquux"));

pub fn multi_line(&mut self, yes: bool) -> &mut RegexBuilder

This configures multi-line mode for the entire pattern.

Enabling multi-line mode changes the behavior of the ^ and $ anchor assertions. Instead of only matching at the beginning and end of a haystack, respectively, multi-line mode causes them to match at the beginning and end of a line in addition to the beginning and end of a haystack. More precisely, ^ will match at the position immediately following a \n and $ will match at the position immediately preceding a \n.

The behavior of this option is impacted by the RegexBuilder::crlf setting. Namely, CRLF mode changes the line terminator to be either \r or \n, but ^ and $ never match at the position between a \r and \n.

This setting can also be configured using the inline flag m in the pattern.

The default for this is false.

§Example

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^foo$")
    .multi_line(true)
    .build()
    .unwrap();
assert_eq!(Some(1..4), re.find("\nfoo\n").map(|m| m.range()));

pub fn dot_matches_new_line(&mut self, yes: bool) -> &mut RegexBuilder

This configures dot-matches-new-line mode for the entire pattern.

Perhaps surprisingly, the default behavior for . is not to match any character, but rather, to match any character except for the line terminator (which is \n by default). When this mode is enabled, the behavior changes such that . truly matches any character.

This setting can also be configured using the inline flag s in the pattern.

The default for this is false.

§Example

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"foo.bar")
    .dot_matches_new_line(true)
    .build()
    .unwrap();
let hay = "foo\nbar";
assert_eq!(Some("foo\nbar"), re.find(hay).map(|m| m.as_str()));

pub fn crlf(&mut self, yes: bool) -> &mut RegexBuilder

This configures CRLF mode for the entire pattern.

When CRLF mode is enabled, both \r (“carriage return” or CR for short) and \n (“line feed” or LF for short) are treated as line terminators. This results in the following:

Unless dot-matches-new-line mode is enabled, . will now match any character except for \n and \r.
When multi-line mode is enabled, ^ will match immediately following a \n or a \r. Similarly, $ will match immediately preceding a \n or a \r. Neither ^ nor $ will ever match between \r and \n.

This setting can also be configured using the inline flag R in the pattern.

The default for this is false.

§Example

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^foo$")
    .multi_line(true)
    .crlf(true)
    .build()
    .unwrap();
let hay = "\r\nfoo\r\n";
// If CRLF mode weren't enabled here, then '$' wouldn't match
// immediately after 'foo', and thus no match would be found.
assert_eq!(Some("foo"), re.find(hay).map(|m| m.as_str()));

This example demonstrates that ^ will never match at a position between \r and \n. ($ will similarly not match between a \r and a \n.)

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^")
    .multi_line(true)
    .crlf(true)
    .build()
    .unwrap();
let hay = "\r\n\r\n";
let ranges: Vec<_> = re.find_iter(hay).map(|m| m.range()).collect();
assert_eq!(ranges, vec![0..0, 2..2, 4..4]);

pub fn swap_greed(&mut self, yes: bool) -> &mut RegexBuilder

This configures swap-greed mode for the entire pattern.

When swap-greed mode is enabled, patterns like a+ will become non-greedy and patterns like a+? will become greedy. In other words, the meanings of a+ and a+? are switched.

This setting can also be configured using the inline flag U in the pattern.

The default for this is false.

§Example

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"a+")
    .swap_greed(true)
    .build()
    .unwrap();
assert_eq!(Some("a"), re.find("aaa").map(|m| m.as_str()));

pub fn ignore_whitespace(&mut self, yes: bool) -> &mut RegexBuilder

This configures verbose mode for the entire pattern.

When enabled, whitespace will be treated as insignificant in the pattern and # can be used to start a comment until the next new line.

Normally, in most places in a pattern, whitespace is treated literally. For example, a space followed by + will match one or more ASCII whitespace characters.

When verbose mode is enabled, \# can be used to match a literal # and \ followed by a space can be used to match a literal ASCII whitespace character.

Verbose mode is useful for permitting regexes to be formatted and broken up more nicely. This may make them more easily readable.

This setting can also be configured using the inline flag x in the pattern.

The default for this is false.

§Example

use regex_lite::RegexBuilder;

let pat = r"
    \b
    (?<first>[A-Z]\w*)  # always start with uppercase letter
    \s+                 # whitespace should separate names
    (?: # middle name can be an initial!
        (?:(?<initial>[A-Z])\.|(?<middle>[A-Z]\w*))
        \s+
    )?
    (?<last>[A-Z]\w*)
    \b
";
let re = RegexBuilder::new(pat)
    .ignore_whitespace(true)
    .build()
    .unwrap();

let caps = re.captures("Harry Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
assert_eq!("Potter", &caps["last"]);

let caps = re.captures("Harry J. Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
// Since a middle name/initial isn't required for an overall match,
// we can't assume that 'initial' or 'middle' will be populated!
assert_eq!(Some("J"), caps.name("initial").map(|m| m.as_str()));
assert_eq!(None, caps.name("middle").map(|m| m.as_str()));
assert_eq!("Potter", &caps["last"]);

let caps = re.captures("Harry James Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
// Since a middle name/initial isn't required for an overall match,
// we can't assume that 'initial' or 'middle' will be populated!
assert_eq!(None, caps.name("initial").map(|m| m.as_str()));
assert_eq!(Some("James"), caps.name("middle").map(|m| m.as_str()));
assert_eq!("Potter", &caps["last"]);

pub fn size_limit(&mut self, limit: usize) -> &mut RegexBuilder

Sets the approximate size limit, in bytes, of the compiled regex.

This roughly corresponds to the amount of heap memory, in bytes, occupied by a single regex. If the regex would otherwise approximately exceed this limit, then compiling that regex will fail.

The main utility of a method like this is to avoid compiling regexes that use an unexpected amount of resources, such as time and memory. Even if the memory usage of a large regex is acceptable, its search time may not be. Namely, worst case time complexity for search is O(m * n), where m ~ len(pattern) and n ~ len(haystack). That is, search time depends, in part, on the size of the compiled regex. This means that putting a limit on the size of the regex limits how much a regex can impact search time.

The default for this is some reasonable number that permits most patterns to compile successfully.

§Example

use regex_lite::RegexBuilder;

assert!(RegexBuilder::new(r"\w").size_limit(100).build().is_err());

pub fn nest_limit(&mut self, limit: u32) -> &mut RegexBuilder

Set the nesting limit for this parser.

The nesting limit controls how deep the abstract syntax tree is allowed to be. If the AST exceeds the given limit (e.g., with too many nested groups), then an error is returned by the parser.

The purpose of this limit is to act as a heuristic to prevent stack overflow for consumers that do structural induction on an AST using explicit recursion. While this crate never does this (instead using constant stack space and moving the call stack to the heap), other crates may.

This limit is not checked until the entire AST is parsed. Therefore, if callers want to put a limit on the amount of heap space used, then they should impose a limit on the length, in bytes, of the concrete pattern string. In particular, this is viable since this parser implementation will limit itself to heap space proportional to the length of the pattern string. See also the untrusted inputs section in the top-level crate documentation for more information about this.

Note that a nest limit of 0 will return a nest limit error for most patterns but not all. For example, a nest limit of 0 permits a but not ab, since ab requires an explicit concatenation, which results in a nest depth of 1. In general, a nest limit is not something that manifests in an obvious way in the concrete syntax; therefore, it should not be used in a granular way.

§Example

use regex_lite::RegexBuilder;

assert!(RegexBuilder::new(r"").nest_limit(0).build().is_ok());
assert!(RegexBuilder::new(r"a").nest_limit(0).build().is_ok());
assert!(RegexBuilder::new(r"(a)").nest_limit(0).build().is_err());

Trait Implementations§

impl Debug for RegexBuilder

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

Auto Trait Implementations§


impl Freeze for RegexBuilder

impl RefUnwindSafe for RegexBuilder

impl Send for RegexBuilder

impl Sync for RegexBuilder

impl Unpin for RegexBuilder

impl UnwindSafe for RegexBuilder

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> ByteSized for T

const BYTE_ALIGN: usize = _

const BYTE_SIZE: usize = _

fn byte_align(&self) -> usize

fn byte_size(&self) -> usize

fn ptr_size_ratio(&self) -> [usize; 2]

impl<T, R> Chain<R> for T where T: ?Sized,

fn chain<F>(self, f: F) -> R where F: FnOnce(Self) -> R, Self: Sized,

fn chain_ref<F>(&self, f: F) -> R where F: FnOnce(&Self) -> R,

fn chain_mut<F>(&mut self, f: F) -> R where F: FnOnce(&mut Self) -> R,

impl<T> ExtAny for T where T: Any + ?Sized,

fn type_id() -> TypeId

fn type_of(&self) -> TypeId

fn type_name(&self) -> &'static str

fn type_is<T: 'static>(&self) -> bool

fn type_hash(&self) -> u64

fn type_hash_with<H: Hasher>(&self, hasher: H) -> u64

fn as_any_ref(&self) -> &dyn Any where Self: Sized,

fn as_any_mut(&mut self) -> &mut dyn Any where Self: Sized,

fn as_any_box(self: Box<Self>) -> Box<dyn Any> where Self: Sized,

fn downcast_ref<T: 'static>(&self) -> Option<&T>

fn downcast_mut<T: 'static>(&mut self) -> Option<&mut T>

impl<T> ExtMem for T where T: ?Sized,

const NEEDS_DROP: bool = _

fn mem_align_of<T>() -> usize

fn mem_align_of_val(&self) -> usize

fn mem_size_of<T>() -> usize

fn mem_size_of_val(&self) -> usize

fn mem_copy(&self) -> Self where Self: Copy,

fn mem_needs_drop(&self) -> bool

fn mem_drop(self) where Self: Sized,

fn mem_forget(self) where Self: Sized,

fn mem_replace(&mut self, other: Self) -> Self where Self: Sized,

fn mem_take(&mut self) -> Self where Self: Default,

fn mem_swap(&mut self, other: &mut Self) where Self: Sized,

unsafe fn mem_zeroed<T>() -> T

unsafe fn mem_transmute_copy<Src, Dst>(src: &Src) -> Dst

fn mem_as_bytes(&self) -> &[u8] where Self: Sync + Unpin,

fn mem_as_bytes_mut(&mut self) -> &mut [u8] where Self: Sync + Unpin,

impl<T> From<T> for T

fn from(t: T) -> T

impl<S> FromSample<S> for S
