devela::_dep::regex_lite

Struct RegexBuilder

pub struct RegexBuilder { /* private fields */ }
Available on crate feature dep_regex_lite only.

A configurable builder for a Regex.

This builder can be used to programmatically set flags such as i (case insensitive) and x (for verbose mode). This builder can also be used to configure things like a size limit on the compiled regular expression.

Implementations§

§

impl RegexBuilder

pub fn new(pattern: &str) -> RegexBuilder

Create a new builder with a default configuration for the given pattern.

If the pattern is invalid or exceeds the configured size limits, then an error will be returned when RegexBuilder::build is called.

pub fn build(&self) -> Result<Regex, Error>

Compiles the pattern given to RegexBuilder::new with the configuration set on this builder.

If the pattern isn’t a valid regex or if a configured size limit was exceeded, then an error is returned.

pub fn case_insensitive(&mut self, yes: bool) -> &mut RegexBuilder

This configures whether to enable ASCII case insensitive matching for the entire pattern.

This setting can also be configured using the inline flag i in the pattern. For example, (?i:foo) matches foo case insensitively while (?-i:foo) matches foo case sensitively.

The default for this is false.

§Example
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"foo(?-i:bar)quux")
    .case_insensitive(true)
    .build()
    .unwrap();
assert!(re.is_match("FoObarQuUx"));
// Even though case insensitive matching is enabled in the builder,
// it can be locally disabled within the pattern. In this case,
// `bar` is matched case sensitively.
assert!(!re.is_match("fooBARquux"));
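To make the "ASCII" qualifier concrete, here is a minimal std-only sketch. It does not call regex_lite at all: the helper name `ascii_ci_eq` is hypothetical, and it only models the comparison rule described above (ASCII letters compare equal regardless of case, other characters must match exactly) using `str::eq_ignore_ascii_case` from the standard library.

```rust
/// Hypothetical helper: compares two strings the way an ASCII-only
/// case insensitive match would, i.e. only `a-z`/`A-Z` are folded.
fn ascii_ci_eq(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}
```

Under this rule, `ascii_ci_eq("FoO", "foo")` holds, while `ascii_ci_eq("É", "é")` does not, because `É` and `é` are outside the ASCII range.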

pub fn multi_line(&mut self, yes: bool) -> &mut RegexBuilder

This configures multi-line mode for the entire pattern.

Enabling multi-line mode changes the behavior of the ^ and $ anchor assertions. Instead of only matching at the beginning and end of a haystack, respectively, multi-line mode causes them to match at the beginning and end of a line in addition to the beginning and end of a haystack. More precisely, ^ will match at the position immediately following a \n and $ will match at the position immediately preceding a \n.

The behavior of this option is impacted by the RegexBuilder::crlf setting. Namely, CRLF mode changes the line terminator to be either \r or \n, but never at the position between a \r and \n.

This setting can also be configured using the inline flag m in the pattern.

The default for this is false.

§Example
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^foo$")
    .multi_line(true)
    .build()
    .unwrap();
assert_eq!(Some(1..4), re.find("\nfoo\n").map(|m| m.range()));

pub fn dot_matches_new_line(&mut self, yes: bool) -> &mut RegexBuilder

This configures dot-matches-new-line mode for the entire pattern.

Perhaps surprisingly, the default behavior for . is not to match any character, but rather, to match any character except for the line terminator (which is \n by default). When this mode is enabled, the behavior changes such that . truly matches any character.

This setting can also be configured using the inline flag s in the pattern.

The default for this is false.

§Example
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"foo.bar")
    .dot_matches_new_line(true)
    .build()
    .unwrap();
let hay = "foo\nbar";
assert_eq!(Some("foo\nbar"), re.find(hay).map(|m| m.as_str()));

pub fn crlf(&mut self, yes: bool) -> &mut RegexBuilder

This configures CRLF mode for the entire pattern.

When CRLF mode is enabled, both \r (“carriage return” or CR for short) and \n (“line feed” or LF for short) are treated as line terminators. This results in the following:

  • Unless dot-matches-new-line mode is enabled, . will now match any character except for \n and \r.
  • When multi-line mode is enabled, ^ will match immediately following a \n or a \r. Similarly, $ will match immediately preceding a \n or a \r. Neither ^ nor $ will ever match between \r and \n.

This setting can also be configured using the inline flag R in the pattern.

The default for this is false.

§Example
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^foo$")
    .multi_line(true)
    .crlf(true)
    .build()
    .unwrap();
let hay = "\r\nfoo\r\n";
// If CRLF mode weren't enabled here, then '$' wouldn't match
// immediately after 'foo', and thus no match would be found.
assert_eq!(Some("foo"), re.find(hay).map(|m| m.as_str()));

This example demonstrates that ^ will never match at a position between \r and \n. ($ will similarly not match between a \r and a \n.)

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^")
    .multi_line(true)
    .crlf(true)
    .build()
    .unwrap();
let hay = "\r\n\r\n";
let ranges: Vec<_> = re.find_iter(hay).map(|m| m.range()).collect();
assert_eq!(ranges, vec![0..0, 2..2, 4..4]);
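The rule for where ^ may match can also be written out by hand. The following is a rough std-only model, not the crate's implementation: the function name `line_starts` and its `crlf` parameter are hypothetical, and it only encodes the rule stated above (in multi-line mode ^ matches at offset 0 and after a line terminator, but never between a \r and a \n).

```rust
/// Hypothetical model: byte offsets at which `^` would match in
/// multi-line mode, with or without CRLF mode.
fn line_starts(hay: &str, crlf: bool) -> Vec<usize> {
    let bytes = hay.as_bytes();
    let mut out = Vec::new();
    for at in 0..=bytes.len() {
        let is_start = if at == 0 {
            // The beginning of the haystack always counts.
            true
        } else {
            let prev = bytes[at - 1];
            if crlf {
                // After `\n`, or after a `\r` that is not
                // immediately followed by `\n`.
                prev == b'\n' || (prev == b'\r' && bytes.get(at) != Some(&b'\n'))
            } else {
                // Without CRLF mode only `\n` terminates a line.
                prev == b'\n'
            }
        };
        if is_start {
            out.push(at);
        }
    }
    out
}
```

For the haystack "\r\n\r\n" with `crlf` set, this model yields the offsets 0, 2 and 4, the same positions as the ranges asserted in the example above. For "a\rb" it yields 0 and 2 with `crlf` set, and only 0 without it.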

pub fn swap_greed(&mut self, yes: bool) -> &mut RegexBuilder

This configures swap-greed mode for the entire pattern.

When swap-greed mode is enabled, patterns like a+ will become non-greedy and patterns like a+? will become greedy. In other words, the meanings of a+ and a+? are switched.

This setting can also be configured using the inline flag U in the pattern.

The default for this is false.

§Example
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"a+")
    .swap_greed(true)
    .build()
    .unwrap();
assert_eq!(Some("a"), re.find("aaa").map(|m| m.as_str()));

pub fn ignore_whitespace(&mut self, yes: bool) -> &mut RegexBuilder

This configures verbose mode for the entire pattern.

When enabled, whitespace will be treated as insignificant in the pattern and # can be used to start a comment until the next new line.

Normally, in most places in a pattern, whitespace is treated literally. For example, " +" (a space followed by +) will match one or more ASCII space characters.

When verbose mode is enabled, \# can be used to match a literal # and "\ " (a backslash followed by a space) can be used to match a literal ASCII whitespace character.

Verbose mode is useful for permitting regexes to be formatted and broken up more nicely. This may make them more easily readable.

This setting can also be configured using the inline flag x in the pattern.

The default for this is false.

§Example
use regex_lite::RegexBuilder;

let pat = r"
    \b
    (?<first>[A-Z]\w*)  # always start with uppercase letter
    \s+                 # whitespace should separate names
    (?: # middle name can be an initial!
        (?:(?<initial>[A-Z])\.|(?<middle>[A-Z]\w*))
        \s+
    )?
    (?<last>[A-Z]\w*)
    \b
";
let re = RegexBuilder::new(pat)
    .ignore_whitespace(true)
    .build()
    .unwrap();

let caps = re.captures("Harry Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
assert_eq!("Potter", &caps["last"]);

let caps = re.captures("Harry J. Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
// Since a middle name/initial isn't required for an overall match,
// we can't assume that 'initial' or 'middle' will be populated!
assert_eq!(Some("J"), caps.name("initial").map(|m| m.as_str()));
assert_eq!(None, caps.name("middle").map(|m| m.as_str()));
assert_eq!("Potter", &caps["last"]);

let caps = re.captures("Harry James Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
// Since a middle name/initial isn't required for an overall match,
// we can't assume that 'initial' or 'middle' will be populated!
assert_eq!(None, caps.name("initial").map(|m| m.as_str()));
assert_eq!(Some("James"), caps.name("middle").map(|m| m.as_str()));
assert_eq!("Potter", &caps["last"]);
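As a rough illustration of what verbose mode discards, the sketch below strips insignificant whitespace and comments from a pattern string using only the standard library. It is a simplified model under stated assumptions, not the crate's parser: the name `strip_verbose` is hypothetical, and the only special cases it knows are the ones described above (a backslash keeps the next character, and # starts a comment that runs to the end of the line).

```rust
/// Hypothetical helper: removes whitespace and `#` comments from a
/// verbose-mode pattern, keeping escaped characters such as `\#`.
fn strip_verbose(pattern: &str) -> String {
    let mut out = String::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' => {
                // Keep the backslash and whatever it escapes.
                out.push(c);
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            '#' => {
                // Skip the comment up to and including the newline.
                for d in chars.by_ref() {
                    if d == '\n' {
                        break;
                    }
                }
            }
            c if c.is_whitespace() => {
                // Unescaped whitespace is insignificant.
            }
            _ => out.push(c),
        }
    }
    out
}
```

Applied to a pattern such as `\s+  # separator`, this model leaves just `\s+`, which is how the long pattern in the example above can be spread over several commented lines.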

pub fn size_limit(&mut self, limit: usize) -> &mut RegexBuilder

Sets the approximate size limit, in bytes, of the compiled regex.

This roughly corresponds to the amount of heap memory, in bytes, occupied by a single regex. If the regex would otherwise approximately exceed this limit, then compiling that regex will fail.

The main utility of a method like this is to avoid compiling regexes that use an unexpected amount of resources, such as time and memory. Even if the memory usage of a large regex is acceptable, its search time may not be. Namely, worst case time complexity for search is O(m * n), where m ~ len(pattern) and n ~ len(haystack). That is, search time depends, in part, on the size of the compiled regex. This means that putting a limit on the size of the regex limits how much a regex can impact search time.

The default for this is some reasonable number that permits most patterns to compile successfully.

§Example
use regex_lite::RegexBuilder;

assert!(RegexBuilder::new(r"\w").size_limit(100).build().is_err());

pub fn nest_limit(&mut self, limit: u32) -> &mut RegexBuilder

Set the nesting limit for this parser.

The nesting limit controls how deep the abstract syntax tree is allowed to be. If the AST exceeds the given limit (e.g., with too many nested groups), then an error is returned by the parser.

The purpose of this limit is to act as a heuristic to prevent stack overflow for consumers that do structural induction on an AST using explicit recursion. While this crate never does this (instead using constant stack space and moving the call stack to the heap), other crates may.

This limit is not checked until the entire AST is parsed. Therefore, if callers want to put a limit on the amount of heap space used, then they should impose a limit on the length, in bytes, of the concrete pattern string. In particular, this is viable since this parser implementation will limit itself to heap space proportional to the length of the pattern string. See also the untrusted inputs section in the top-level crate documentation for more information about this.

Note that a nest limit of 0 will return a nest limit error for most patterns but not all. For example, a nest limit of 0 permits a but not ab, since ab requires an explicit concatenation, which results in a nest depth of 1. In general, a nest limit is not something that manifests in an obvious way in the concrete syntax, therefore, it should not be used in a granular way.

§Example
use regex_lite::RegexBuilder;

assert!(RegexBuilder::new(r"").nest_limit(0).build().is_ok());
assert!(RegexBuilder::new(r"a").nest_limit(0).build().is_ok());
assert!(RegexBuilder::new(r"(a)").nest_limit(0).build().is_err());
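The notion of nesting depth can be pictured with a toy syntax tree. Everything in the sketch below is hypothetical (the `Ast` enum, `depth`, and `within_limit` are not part of regex_lite), and the counting rule is an assumption chosen to agree with the prose and example above: an empty pattern or a single literal has depth 0, and each concatenation or group adds one level. The real parser may count differently. Plain recursion is used here only for brevity.

```rust
/// Hypothetical, heavily simplified syntax tree.
enum Ast {
    Empty,
    Literal(char),
    Concat(Vec<Ast>),
    Group(Box<Ast>),
}

/// Nesting depth under the assumed rule: leaves are at depth 0 and
/// every concatenation or group wraps its children in one more level.
fn depth(ast: &Ast) -> u32 {
    match ast {
        Ast::Empty | Ast::Literal(_) => 0,
        Ast::Concat(items) => 1 + items.iter().map(depth).max().unwrap_or(0),
        Ast::Group(inner) => 1 + depth(inner),
    }
}

/// Whether a tree would pass a given nest limit in this model.
fn within_limit(ast: &Ast, limit: u32) -> bool {
    depth(ast) <= limit
}
```

In this model a lone literal passes a limit of 0, while both the concatenation of two literals (as in ab) and a group around a literal (as in (a)) have depth 1 and are rejected.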

Trait Implementations§

§

impl Debug for RegexBuilder

§

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
§

impl<T> ArchivePointee for T

§

type ArchivedMetadata = ()

The archived version of the pointer metadata for this type.
§

fn pointer_metadata( _: &<T as ArchivePointee>::ArchivedMetadata, ) -> <T as Pointee>::Metadata

Converts some archived metadata to the pointer metadata for itself.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> ByteSized for T

Source§

const BYTE_ALIGN: usize = _

The alignment of this type in bytes.
Source§

const BYTE_SIZE: usize = _

The size of this type in bytes.
Source§

fn byte_align(&self) -> usize

Returns the alignment of this type in bytes.
Source§

fn byte_size(&self) -> usize

Returns the size of this type in bytes. Read more
Source§

fn ptr_size_ratio(&self) -> [usize; 2]

Returns the size ratio between Ptr::BYTES and BYTE_SIZE. Read more
Source§

impl<T, R> Chain<R> for T
where T: ?Sized,

Source§

fn chain<F>(self, f: F) -> R
where F: FnOnce(Self) -> R, Self: Sized,

Chain a function which takes the parameter by value.
Source§

fn chain_ref<F>(&self, f: F) -> R
where F: FnOnce(&Self) -> R,

Chain a function which takes the parameter by shared reference.
Source§

fn chain_mut<F>(&mut self, f: F) -> R
where F: FnOnce(&mut Self) -> R,

Chain a function which takes the parameter by exclusive reference.
Source§

impl<T> ExtAny for T
where T: Any + ?Sized,

Source§

fn type_id() -> TypeId

Returns the TypeId of Self. Read more
Source§

fn type_of(&self) -> TypeId

Returns the TypeId of self. Read more
Source§

fn type_name(&self) -> &'static str

Returns the type name of self. Read more
Source§

fn type_is<T: 'static>(&self) -> bool

Returns true if Self is of type T. Read more
Source§

fn as_any_ref(&self) -> &dyn Any
where Self: Sized,

Upcasts &self as &dyn Any. Read more
Source§

fn as_any_mut(&mut self) -> &mut dyn Any
where Self: Sized,

Upcasts &mut self as &mut dyn Any. Read more
Source§

fn as_any_box(self: Box<Self>) -> Box<dyn Any>
where Self: Sized,

Upcasts Box<self> as Box<dyn Any>. Read more
Source§

fn downcast_ref<T: 'static>(&self) -> Option<&T>

Available on crate feature unsafe_layout only.
Returns some shared reference to the inner value if it is of type T. Read more
Source§

fn downcast_mut<T: 'static>(&mut self) -> Option<&mut T>

Available on crate feature unsafe_layout only.
Returns some exclusive reference to the inner value if it is of type T. Read more
Source§

impl<T> ExtMem for T
where T: ?Sized,

Source§

const NEEDS_DROP: bool = _

Know whether dropping values of this type matters, in compile-time.
Source§

fn mem_align_of<T>() -> usize

Returns the minimum alignment of the type in bytes. Read more
Source§

fn mem_align_of_val(&self) -> usize

Returns the alignment of the pointed-to value in bytes. Read more
Source§

fn mem_size_of<T>() -> usize

Returns the size of a type in bytes. Read more
Source§

fn mem_size_of_val(&self) -> usize

Returns the size of the pointed-to value in bytes. Read more
Source§

fn mem_copy(&self) -> Self
where Self: Copy,

Bitwise-copies a value. Read more
Source§

fn mem_needs_drop(&self) -> bool

Returns true if dropping values of this type matters. Read more
Source§

fn mem_drop(self)
where Self: Sized,

Drops self by running its destructor. Read more
Source§

fn mem_forget(self)
where Self: Sized,

Forgets about self without running its destructor. Read more
Source§

fn mem_replace(&mut self, other: Self) -> Self
where Self: Sized,

Replaces self with other, returning the previous value of self. Read more
Source§

fn mem_take(&mut self) -> Self
where Self: Default,

Replaces self with its default value, returning the previous value of self. Read more
Source§

fn mem_swap(&mut self, other: &mut Self)
where Self: Sized,

Swaps the value of self and other without deinitializing either one. Read more
Source§

unsafe fn mem_zeroed<T>() -> T

Available on crate feature unsafe_layout only.
Returns the value of type T represented by the all-zero byte-pattern. Read more
Source§

unsafe fn mem_transmute_copy<Src, Dst>(src: &Src) -> Dst

Available on crate feature unsafe_layout only.
Reads src as a value of type Dst without moving the contained value. Read more
Source§

fn mem_as_bytes(&self) -> &[u8]
where Self: Sync + Unpin,

Available on crate feature unsafe_slice only.
View a Sync + Unpin self as &[u8]. Read more
Source§

fn mem_as_bytes_mut(&mut self) -> &mut [u8]
where Self: Sync + Unpin,

Available on crate feature unsafe_slice only.
View a Sync + Unpin self as &mut [u8]. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

§

impl<S> FromSample<S> for S

§

fn from_sample_(s: S) -> S

Source§

impl<T> Hook for T

Source§

fn hook_ref<F>(self, f: F) -> Self
where F: FnOnce(&Self),

Applies a function which takes the parameter by shared reference, and then returns the (possibly) modified owned value. Read more
Source§

fn hook_mut<F>(self, f: F) -> Self
where F: FnOnce(&mut Self),

Applies a function which takes the parameter by exclusive reference, and then returns the (possibly) modified owned value. Read more
§

impl<T> Instrument for T

§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
§

impl<F, T> IntoSample<T> for F
where T: FromSample<F>,

§

fn into_sample(self) -> T

§

impl<T> LayoutRaw for T

§

fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>

Returns the layout of the type.
§

impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
where T: SharedNiching<N1, N2>, N1: Niching<T>, N2: Niching<T>,

§

unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool

Returns whether the given value has been niched. Read more
§

fn resolve_niched(out: Place<NichedOption<T, N1>>)

Writes data to out indicating that a T is niched.
§

impl<T> Pointable for T

§

const ALIGN: usize

The alignment of pointer.
§

type Init = T

The type for initializers.
§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
§

impl<T> Pointee for T

§

type Metadata = ()

The metadata type for pointers and references to this type.
§

impl<T, U> ToSample<U> for T
where U: FromSample<T>,

§

fn to_sample_(self) -> U

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<T> WithSubscriber for T

§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more
§

impl<S, T> Duplex<S> for T
where T: FromSample<S> + ToSample<S>,

§

impl<T> Ungil for T
where T: Send,