Syntax
Quantifiers
... of
- used to express a specific amount of a pattern. equivalent to regex {5} (assuming 5 of ...)
... to ... of
- used to express an amount within a range of a pattern. equivalent to regex {5,9} (assuming 5 to 9 of ...)
over ... of
- used to express more than an amount of a pattern. equivalent to regex {6,} (assuming over 5 of ...)
some of
- used to express 1 or more of a pattern. equivalent to regex +
any of
- used to express 0 or more of a pattern. equivalent to regex *
option of
- used to express 0 or 1 of a pattern. equivalent to regex ?
All quantifiers can be preceded by lazy to match as few characters as possible rather than as many as possible (greedy). Equivalent to regex +?, *?, etc.
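The regex equivalents listed above can be tried in any regex engine. The sketch below uses Python's re module as a stand-in (an assumption: Melody targets ECMAScript regular expressions, but these quantifiers behave the same way in both). The Melody snippets in the comments are illustrative, and the matches helper is a hypothetical name.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Melody (illustrative): 5 of "a";          regex: a{5}
exactly_five = matches(r"a{5}", "a" * 5) and not matches(r"a{5}", "a" * 4)

# Melody (illustrative): 5 to 9 of "a";     regex: a{5,9}
five_to_nine = matches(r"a{5,9}", "a" * 9) and not matches(r"a{5,9}", "a" * 10)

# Melody (illustrative): over 5 of "a";     regex: a{6,}
over_five = matches(r"a{6,}", "a" * 6) and not matches(r"a{6,}", "a" * 5)

# some of / any of / option of              regex: + / * / ?
some_any_option = (
    matches(r"a+", "aaa") and not matches(r"a+", "")
    and matches(r"a*", "")
    and matches(r"a?", "") and not matches(r"a?", "aa")
)

# Greedy vs. lazy: .+ grabs as much as it can, .+? as little as it can.
greedy = re.match(r"<.+>", "<b>bold</b>").group()
lazy = re.match(r"<.+?>", "<b>bold</b>").group()

print(greedy)  # <b>bold</b>
print(lazy)    # <b>
```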
Symbols
<char>
- matches any single character. equivalent to regex . (a dot)
<space>
- matches a space character. equivalent to a literal space in regex
<whitespace>
- matches any kind of whitespace character. equivalent to regex \s or [ \t\n\v\f\r]
<newline>
- matches a newline character. equivalent to regex \n
<tab>
- matches a tab character. equivalent to regex \t
<return>
- matches a carriage return character. equivalent to regex \r
<feed>
- matches a form feed character. equivalent to regex \f
<null>
- matches a null character. equivalent to regex \0
<digit>
- matches any single digit. equivalent to regex \d or [0-9]
<vertical>
- matches a vertical tab character. equivalent to regex \v
<word>
- matches a word character (any Latin letter, any digit or an underscore). equivalent to regex \w or [a-zA-Z0-9_]
<alphabetic>
- matches any single Latin letter. equivalent to regex [a-zA-Z]
<alphanumeric>
- matches any single Latin letter or any single digit. equivalent to regex [a-zA-Z0-9]
<boundary>
- matches the position between a character matched by <word> and a character not matched by <word>, without consuming either character. equivalent to regex \b
<backspace>
- matches a backspace control character. equivalent to regex [\b]
All symbols can be preceded by not to match any character other than the symbol.
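A few of the symbol equivalents above, checked with Python's re module as a stand-in engine. This is a sketch under assumptions: Python's \d, \w and \s are Unicode-aware by default, so re.ASCII is used here to line them up with the bracket forms given in the list, and \D is assumed to be the kind of negated class that a not <digit> would correspond to.

```python
import re

ASCII = re.ASCII  # assumption: restrict \d, \w, \s to the ASCII sets listed above

# <whitespace>: \s and [ \t\n\v\f\r] accept the same six characters.
whitespace_agrees = all(
    re.fullmatch(r"\s", ch, ASCII) and re.fullmatch(r"[ \t\n\v\f\r]", ch)
    for ch in " \t\n\v\f\r"
)

# <word>: letters, digits and underscore.
word_ok = re.fullmatch(r"\w+", "snake_case1", ASCII) is not None

# <boundary>: \b matches a position, so "cat" inside "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# <backspace>: inside a character class, \b means the backspace character.
backspace_ok = re.fullmatch(r"[\b]", "\x08") is not None

# Negation (illustrative): \D matches anything that is not a digit.
digits_only = re.sub(r"\D", "", "tel: 555-0100", flags=ASCII)

print(whole_words)  # ['cat', 'cat']
print(digits_only)  # 5550100
```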
Special Symbols
<start>
- matches the start of the string. equivalent to regex ^
<end>
- matches the end of the string. equivalent to regex $
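As a small illustration of anchoring, the sketch below uses Python's re module as a stand-in engine. The is_all_digits helper is a hypothetical name, and the Melody snippet in the comment is illustrative. One engine difference worth keeping in mind: Python's $ also matches just before a trailing newline, which ECMAScript's $ does not do without the m flag.

```python
import re


def is_all_digits(text: str) -> bool:
    # Melody (illustrative): <start>; some of <digit>; <end>;   regex: ^\d+$
    return re.search(r"^\d+$", text) is not None


# Without the anchors, the same pattern finds digits anywhere in the string.
unanchored = re.search(r"\d+", "abc123def").group()

print(is_all_digits("2024"))   # True
print(is_all_digits("v2024"))  # False
print(unanchored)              # 123
```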
Unicode Categories
Note: these are not supported when testing in the CLI (-t or -f) as the regex engine used does not support Unicode categories. These require using the u flag.
<category::letter>
- any kind of letter from any language
  <category::lowercase_letter>
  - a lowercase letter that has an uppercase variant
  <category::uppercase_letter>
  - an uppercase letter that has a lowercase variant
  <category::titlecase_letter>
  - a letter that appears at the start of a word when only the first letter of the word is capitalized
  <category::cased_letter>
  - a letter that exists in lowercase and uppercase variants
  <category::modifier_letter>
  - a special character that is used like a letter
  <category::other_letter>
  - a letter or ideograph that does not have lowercase and uppercase variants
<category::mark>
- a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.)
  <category::non_spacing_mark>
  - a character intended to be combined with another character without taking up extra space (e.g. accents, umlauts, etc.)
  <category::spacing_combining_mark>
  - a character intended to be combined with another character that takes up extra space (vowel signs in many Eastern languages)
  <category::enclosing_mark>
  - a character that encloses the character it is combined with (circle, square, keycap, etc.)
<category::separator>
- any kind of whitespace or invisible separator
  <category::space_separator>
  - a whitespace character that is invisible, but does take up space
  <category::line_separator>
  - line separator character U+2028
  <category::paragraph_separator>
  - paragraph separator character U+2029
<category::symbol>
- math symbols, currency signs, dingbats, box-drawing characters, etc.
  <category::math_symbol>
  - any mathematical symbol
  <category::currency_symbol>
  - any currency sign
  <category::modifier_symbol>
  - a combining character (mark) as a full character on its own
  <category::other_symbol>
  - various symbols that are not math symbols, currency signs, or combining characters
<category::number>
- any kind of numeric character in any script
  <category::decimal_digit_number>
  - a digit zero through nine in any script except ideographic scripts
  <category::letter_number>
  - a number that looks like a letter, such as a Roman numeral
  <category::other_number>
  - a superscript or subscript digit, or a number that is not a digit 0–9 (excluding numbers from ideographic scripts)
<category::punctuation>
- any kind of punctuation character
  <category::dash_punctuation>
  - any kind of hyphen or dash
  <category::open_punctuation>
  - any kind of opening bracket
  <category::close_punctuation>
  - any kind of closing bracket
  <category::initial_punctuation>
  - any kind of opening quote
  <category::final_punctuation>
  - any kind of closing quote
  <category::connector_punctuation>
  - a punctuation character such as an underscore that connects words
  <category::other_punctuation>
  - any kind of punctuation character that is not a dash, bracket, quote or connector
<category::other>
- invisible control characters and unused code points
  <category::control>
  - an ASCII or Latin-1 control character: 0x00–0x1F and 0x7F–0x9F
  <category::format>
  - invisible formatting indicator
  <category::private_use>
  - any code point reserved for private use
  <category::surrogate>
  - one half of a surrogate pair in UTF-16 encoding
  <category::unassigned>
  - any code point to which no character has been assigned
These descriptions are from regular-expressions.info
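To get a feel for which characters fall into which category, Python's standard unicodedata module reports the Unicode general category of a character as a two-letter code. The pairing of Melody category names with those codes below is an assumption made for illustration, based on the descriptions above; it is not taken from Melody itself. EXAMPLES and category_of are hypothetical names.

```python
import unicodedata

# Assumed pairing: Melody category name -> (sample character, general-category code).
EXAMPLES = {
    "lowercase_letter": ("a", "Ll"),
    "uppercase_letter": ("A", "Lu"),
    "titlecase_letter": ("\u01c5", "Lt"),      # Latin capital D with small z with caron
    "non_spacing_mark": ("\u0301", "Mn"),      # combining acute accent
    "space_separator": (" ", "Zs"),
    "line_separator": ("\u2028", "Zl"),
    "math_symbol": ("+", "Sm"),
    "currency_symbol": ("$", "Sc"),
    "decimal_digit_number": ("7", "Nd"),
    "letter_number": ("\u2163", "Nl"),         # Roman numeral four
    "dash_punctuation": ("-", "Pd"),
    "connector_punctuation": ("_", "Pc"),
    "control": ("\x00", "Cc"),
}


def category_of(ch: str) -> str:
    """Return the two-letter Unicode general category of a single character."""
    return unicodedata.category(ch)


# Names whose sample character does not report the expected code.
mismatches = [
    name for name, (ch, code) in EXAMPLES.items() if category_of(ch) != code
]
print(mismatches)  # []
```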
Character Ranges
... to ...
- used with digits or alphabetic characters to express a character range. equivalent to regex [5-9] (assuming 5 to 9) or [a-z] (assuming a to z)
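The two ranges used as examples above behave as follows, shown with Python's re module as a stand-in engine (an assumption; the Melody snippets in the comments are illustrative).

```python
import re

# Melody (illustrative): 5 to 9;   regex: [5-9]
high_digits = re.findall(r"[5-9]", "0123456789")

# Melody (illustrative): a to z;   regex: [a-z]
lowercase = re.findall(r"[a-z]", "Melody 0.1")

print(high_digits)  # ['5', '6', '7', '8', '9']
print(lowercase)    # ['e', 'l', 'o', 'd', 'y']
```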
Literals
"..."
or '...'
- used to mark a literal part of the match. Melody will automatically escape characters as needed. Quotes of the same kind as those surrounding the literal should be escaped.
Raw
`...`
- added directly to the output without any escaping
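The difference between an escaped literal and raw output can be sketched in Python. Here re.escape stands in for Melody's automatic escaping (an assumption: Melody's exact escaping rules may differ), and the Melody snippets in the comments are illustrative.

```python
import re

text = "1+1=2"
candidates = ("1+1=2", "11=2", "1111=2")

# Literal (illustrative Melody: "1+1=2";): special characters are escaped,
# so "+" is matched as a plain plus sign.
literal_pattern = re.escape(text)

# Raw (illustrative Melody: `1+1=2`;): the text is used as-is,
# so "+" keeps its regex meaning of "one or more of the preceding item".
raw_pattern = text

literal_hits = [s for s in candidates if re.fullmatch(literal_pattern, s)]
raw_hits = [s for s in candidates if re.fullmatch(raw_pattern, s)]

print(literal_hits)  # ['1+1=2']
print(raw_hits)      # ['11=2', '1111=2']
```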
Groups
capture
- used to open a capture or named capture block. captured patterns are later available in the list of matches (either positional or named). equivalent to regex (...)
match
- used to open a match block, which matches the contents without capturing. equivalent to regex (?:...)
either
- used to open an either block, which matches one of the statements within the block. equivalent to regex (?:...|...)
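The three group forms above, tried out with Python's re module as a stand-in engine (an assumption). Note one spelling difference: Python writes a named group as (?P<name>...), whereas ECMAScript uses (?<name>...).

```python
import re

# capture: positional and named groups show up in the match result.
m = re.fullmatch(r"(\d+)-(?P<unit>kg|lb)", "42-kg")
positional = m.group(1)
named = m.group("unit")

# match: (?:...) groups the contents without capturing, so no groups are recorded.
no_captures = re.fullmatch(r"(?:ab){2}", "abab").groups()

# either: (?:...|...) accepts any one of the alternatives.
pets = [w for w in ("cats", "dogs", "cows") if re.fullmatch(r"(?:cat|dog)s", w)]

print(positional, named)  # 42 kg
print(no_captures)        # ()
print(pets)               # ['cats', 'dogs']
```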
Assertions
ahead
- used to open an ahead block. equivalent to regex (?=...). use after an expression
behind
- used to open a behind block. equivalent to regex (?<=...). use before an expression
Assertions can be preceded by not to create a negative assertion (equivalent to regex (?!...), (?<!...))
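The four assertion forms can be seen at work below, with Python's re module as a stand-in engine (an assumption; the sample strings are made up for illustration). In each case the asserted text is checked but not included in the match.

```python
import re

# ahead, (?=...): digits that are followed by "px".
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# behind, (?<=...): digits that are preceded by "$".
dollar_amounts = re.findall(r"(?<=\$)\d+", "$5 and 7 and $12")

# not ahead, (?!...): "foo" that is not followed by "bar".
not_followed = [s for s in ("foobar", "foobaz") if re.search(r"foo(?!bar)", s)]

# not behind, (?<!...): whole numbers that are not preceded by "$".
plain_numbers = re.findall(r"(?<!\$)\b\d+", "$5 and 7 and $12")

print(px_values)       # ['10', '30']
print(dollar_amounts)  # ['5', '12']
print(not_followed)    # ['foobaz']
print(plain_numbers)   # ['7']
```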
Variables
let .variable_name = { ... }
- defines a variable from a block of statements. it can later be used with .variable_name. Variables must be declared before being used. Variable invocations cannot be quantified directly; use a group if you want to quantify a variable invocation.
Example:
    let .a_and_b = {
      "a";
      "b";
    }

    .a_and_b;
    "c";

    // abc
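In plain regex terms, the advice to use a group when quantifying can be illustrated with string fragments in Python. This is only an analogy under stated assumptions: it uses ordinary string concatenation, not Melody's variable mechanism, and it is one plausible reading of why a group is needed.

```python
import re

# Stand-in for the variable: let .a_and_b = { "a"; "b"; }
a_and_b = "ab"

# Using the fragment followed by "c" gives the pattern abc.
pattern = a_and_b + "c"

# A quantifier appended directly only repeats the last item of the fragment ...
ungrouped = a_and_b + "{2}c"          # ab{2}c
# ... whereas wrapping the fragment in a group repeats all of it.
grouped = "(?:" + a_and_b + "){2}c"   # (?:ab){2}c

print(bool(re.fullmatch(pattern, "abc")))      # True
print(bool(re.fullmatch(ungrouped, "abbc")))   # True
print(bool(re.fullmatch(grouped, "ababc")))    # True
```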
Extras
/* ... */, // ...
- used to mark comments (note: // ... comments must be on a separate line)