Welcome

Welcome to the Melody documentation!

Melody is a language that compiles to regular expressions and aims to be easier to read and maintain.

Examples

Note: these are for the currently supported syntax and may change

Batman Theme

16 of "na";

2 of match {
  <space>;
  "batman";
}

// 🦇🦸‍♂️

Turns into

(?:na){16}(?: batman){2}
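As a quick sanity check, the compiled pattern can be run through any regex engine. The sketch below uses Python's built-in re module, which is an assumption of this example rather than something Melody ships with; the variable names are made up for illustration.

```python
import re

# Output of the Batman Theme example above.
BATMAN = r"(?:na){16}(?: batman){2}"

theme = "na" * 16 + " batman batman"

# fullmatch requires the whole string to match the pattern.
batman_ok = re.fullmatch(BATMAN, theme) is not None

# Fifteen "na"s are one short of the required 16, so this must fail.
batman_short = re.fullmatch(BATMAN, "na" * 15 + " batman batman") is not None
```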

Twitter Hashtag

"#";
some of <word>;

// #melody

Turns into

#\w+

Introductory Courses

some of <alphabetic>;
<space>;
"1";
2 of <digit>;

// classname 1xx

Turns into

[a-zA-Z]+ 1\d{2}
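To see which strings this pattern accepts, here is a small sketch using Python's re module (chosen only for illustration). The course names are invented sample data.

```python
import re

# Output of the Introductory Courses example above.
COURSE = r"[a-zA-Z]+ 1\d{2}"

candidates = ["Biology 101", "Physics 204", "Math 1A0"]

# Keep only the strings that match the pattern in full.
course_hits = [name for name in candidates if re.fullmatch(COURSE, name)]
```

Only the first candidate is a letters-only name followed by a space, a "1", and exactly two digits.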

Indented Code (2 spaces)

some of match {
  2 of <space>;
}

some of <char>;
";";

// let value = 5;

Turns into

(?: {2})+.+;
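A minimal sketch of how this pattern behaves on a few lines, using Python's re module as an assumed test harness. The helper name is hypothetical.

```python
import re

# Output of the Indented Code example above.
INDENTED = r"(?: {2})+.+;"

def is_indented_statement(line):
    # re.match anchors at the start of the string only.
    return re.match(INDENTED, line) is not None

two_spaces = is_indented_statement("  let value = 5;")
four_spaces = is_indented_statement("    let value = 5;")
no_indent = is_indented_statement("let value = 5;")
```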

Semantic Versions

<start>;

option of "v";

capture major {
  some of <digit>;
}

".";

capture minor {
  some of <digit>;
}

".";

capture patch {
  some of <digit>;
}

<end>;

// v1.0.0

Turns into

^v?(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)$
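The named captures can be read back by name once the pattern is compiled. The sketch below uses Python's re module, which spells named groups as (?P<name>...), so the Melody output is adapted with a simple string replace. That replace and the function name are assumptions of this example. It is only safe here because the pattern contains no lookbehind assertions.

```python
import re

# Output of the Semantic Versions example above.
MELODY_OUTPUT = r"^v?(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)$"

# Python's re uses (?P<name>...) for named groups, so adapt the syntax.
# This blunt replace would also rewrite (?<= and (?<! if they were present.
SEMVER = re.compile(MELODY_OUTPUT.replace("(?<", "(?P<"))

def parse_version(text):
    """Return the named captures as a dict, or None if there is no match."""
    found = SEMVER.match(text)
    return found.groupdict() if found else None

with_prefix = parse_version("v1.0.0")
without_prefix = parse_version("2.10.3")
too_short = parse_version("1.2")
```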

Playground

You can try Melody in your browser by visiting the playground

Install

Cargo

cargo install melody_cli

From Source

git clone https://github.com/yoav-lavi/melody.git
cd melody
cargo install --path crates/melody_cli

Binary

  • macOS binaries (aarch64 and x86_64) can be downloaded from the release page

Community

  • Brew (macOS and Linux)

    Installation instructions
    brew install melody
    
  • Arch Linux (maintained by @ilai-deutel)

    Installation instructions
    1. Installation with an AUR helper, for instance using paru:

      paru -Syu melody
      
    2. Install manually with makepkg:

      git clone https://aur.archlinux.org/melody.git
      cd melody
      makepkg -si
      
  • NixOS (maintained by @jyooru)

    Installation instructions

    Should be the following once the registry is updated.

    If you've successfully installed via this method please open an issue and let me know.

    Thanks!

    nix-env -i melody
    

CLI

USAGE:
    melody [OPTIONS] [INPUT_FILE_PATH]

ARGS:
    <INPUT_FILE_PATH>    Read from a file
                         Use '-' and or pipe input to read from stdin

OPTIONS:
    -f, --test-file <TEST_FILE>
            Test the compiled regex against the contents of a file

        --generate-completions <COMPLETIONS>
            Outputs completions for the selected shell
            To use, write the output to the appropriate location for your shell

    -h, --help
            Print help information

    -n, --no-color
            Print output with no color

    -o, --output <OUTPUT_FILE_PATH>
            Write to a file

    -r, --repl
            Start the Melody REPL

    -t, --test <TEST>
            Test the compiled regex against a string

    -V, --version
            Print version information

Syntax

Quantifiers

  • ... of - used to express a specific amount of a pattern. equivalent to regex {5} (assuming 5 of ...)
  • ... to ... of - used to express an amount within a range of a pattern. equivalent to regex {5,9} (assuming 5 to 9 of ...)
  • over ... of - used to express more than an amount of a pattern. equivalent to regex {6,} (assuming over 5 of ...)
  • some of - used to express 1 or more of a pattern. equivalent to regex +
  • any of - used to express 0 or more of a pattern. equivalent to regex *
  • option of - used to express 0 or 1 of a pattern. equivalent to regex ?

All quantifiers can be preceded by lazy to match as few characters as possible rather than as many as possible (greedy). Equivalent to regex +?, *?, etc.
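The difference between the quantifiers is easiest to see on one input. In this sketch the regex patterns are written by hand from the equivalences listed above (they are not compiler output), and Python's re module is assumed as the engine.

```python
import re

text = "aaaaaaaa"  # eight "a" characters

def matched(pattern):
    # Return the text consumed by the pattern at the start of the input.
    return re.match(pattern, text).group()

exact = matched(r"a{5}")          # 5 of "a"
ranged = matched(r"a{5,9}")       # 5 to 9 of "a"
over = matched(r"a{6,}")          # over 5 of "a"
greedy = matched(r"a+")           # some of "a"
lazy = matched(r"a+?")            # lazy some of "a"
lazy_range = matched(r"a{5,9}?")  # lazy 5 to 9 of "a"
```

Greedy forms take all eight characters when allowed to, while the lazy forms stop at the minimum the quantifier permits.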

Symbols

  • <char> - matches any single character. equivalent to regex .
  • <space> - matches a space character. equivalent to a literal space character in regex
  • <whitespace> - matches any kind of whitespace character. equivalent to regex \s or [ \t\n\v\f\r]
  • <newline> - matches a newline character. equivalent to regex \n
  • <tab> - matches a tab character. equivalent to regex \t
  • <return> - matches a carriage return character. equivalent to regex \r
  • <feed> - matches a form feed character. equivalent to regex \f
  • <null> - matches a null character. equivalent to regex \0
  • <digit> - matches any single digit. equivalent to regex \d or [0-9]
  • <vertical> - matches a vertical tab character. equivalent to regex \v
  • <word> - matches a word character (any latin letter, any digit or an underscore). equivalent to regex \w or [a-zA-Z0-9_]
  • <alphabetic> - matches any single latin letter. equivalent to regex [a-zA-Z]
  • <alphanumeric> - matches any single latin letter or any single digit. equivalent to regex [a-zA-Z0-9]
  • <boundary> - matches the position between a character matched by <word> and a character not matched by <word>, without consuming any characters. equivalent to regex \b
  • <backspace> - matches a backspace control character. equivalent to regex [\b]

All symbols can be preceded by not to match any character other than the symbol.
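The shorthand classes and their bracketed equivalents can be compared directly. This sketch assumes Python's re module; note that Python's \w, \d and \s also match non-ASCII characters on text strings unless re.ASCII is passed, which is why the flag is used here. \W is shown as one common way to express a negated class; the exact output Melody produces for not is not shown in this document.

```python
import re

SAMPLE = "a Z_9\t\n-é"

def chars(pattern):
    # re.ASCII restricts \w, \d and \s to the ASCII sets listed above.
    return re.findall(pattern, SAMPLE, flags=re.ASCII)

word_short = chars(r"\w")
word_long = chars(r"[a-zA-Z0-9_]")
not_word = chars(r"\W")
whitespace = chars(r"\s")

# \b matches a position, so "cat" inside "concat" is skipped.
boundary_hits = re.findall(r"\bcat\b", "cat concat cat.")
```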

Special Symbols

  • <start> - matches the start of the string. equivalent to regex ^
  • <end> - matches the end of the string. equivalent to regex $

Unicode Categories

Note: these are not supported when testing in the CLI (-t or -f), as the regex engine used there does not support Unicode categories. Using the compiled regex requires the u flag.

  • <category::letter> - any kind of letter from any language
    • <category::lowercase_letter> - a lowercase letter that has an uppercase variant
    • <category::uppercase_letter> - an uppercase letter that has a lowercase variant
    • <category::titlecase_letter> - a letter that appears at the start of a word when only the first letter of the word is capitalized
    • <category::cased_letter> - a letter that exists in lowercase and uppercase variants
    • <category::modifier_letter> - a special character that is used like a letter
    • <category::other_letter> - a letter or ideograph that does not have lowercase and uppercase variants
  • <category::mark> - a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.)
    • <category::non_spacing_mark> - a character intended to be combined with another character without taking up extra space (e.g. accents, umlauts, etc.)
    • <category::spacing_combining_mark> - a character intended to be combined with another character that takes up extra space (vowel signs in many Eastern languages)
    • <category::enclosing_mark> - a character that encloses the character it is combined with (circle, square, keycap, etc.)
  • <category::separator> - any kind of whitespace or invisible separator
    • <category::space_separator> - a whitespace character that is invisible, but does take up space
    • <category::line_separator> - line separator character U+2028
    • <category::paragraph_separator> - paragraph separator character U+2029
  • <category::symbol> - math symbols, currency signs, dingbats, box-drawing characters, etc
    • <category::math_symbol> - any mathematical symbol
    • <category::currency_symbol> - any currency sign
    • <category::modifier_symbol> - a combining character (mark) as a full character on its own
    • <category::other_symbol> - various symbols that are not math symbols, currency signs, or combining characters
  • <category::number> - any kind of numeric character in any script
    • <category::decimal_digit_number> - a digit zero through nine in any script except ideographic scripts
    • <category::letter_number> - a number that looks like a letter, such as a Roman numeral
    • <category::other_number> - a superscript or subscript digit, or a number that is not a digit 0–9 (excluding numbers from ideographic scripts)
  • <category::punctuation> - any kind of punctuation character
    • <category::dash_punctuation> - any kind of hyphen or dash
    • <category::open_punctuation> - any kind of opening bracket
    • <category::close_punctuation> - any kind of closing bracket
    • <category::initial_punctuation> - any kind of opening quote
    • <category::final_punctuation> - any kind of closing quote
    • <category::connector_punctuation> - a punctuation character such as an underscore that connects words
    • <category::other_punctuation> - any kind of punctuation character that is not a dash, bracket, quote or connector
  • <category::other> - invisible control characters and unused code points
    • <category::control> - an ASCII or Latin-1 control character: 0x00–0x1F and 0x7F–0x9F
    • <category::format> - invisible formatting indicator
    • <category::private_use> - any code point reserved for private use
    • <category::surrogate> - one half of a surrogate pair in UTF-16 encoding
    • <category::unassigned> - any code point to which no character has been assigned

These descriptions are from regular-expressions.info
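To get a feel for which characters fall into which category, the sketch below looks characters up with Python's standard unicodedata module. The mapping from Melody category names to two-letter codes uses the standard Unicode general category abbreviations; the dictionary and helper are hypothetical, cover only a few categories, and say nothing about how Melody spells the category in its output.

```python
import unicodedata

# Melody category name -> standard Unicode general category code.
# Only a handful of categories are listed; this is not a complete table.
CATEGORY_CODES = {
    "lowercase_letter": "Ll",
    "uppercase_letter": "Lu",
    "decimal_digit_number": "Nd",
    "connector_punctuation": "Pc",
    "currency_symbol": "Sc",
    "math_symbol": "Sm",
    "space_separator": "Zs",
}

def in_category(char, melody_name):
    # Compare the character's general category with the expected code.
    return unicodedata.category(char) == CATEGORY_CODES[melody_name]
```

For example, in_category("é", "lowercase_letter") is true, while in_category("a", "uppercase_letter") is false.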

Character Ranges

  • ... to ... - used with digits or alphabetic characters to express a character range. equivalent to regex [5-9] (assuming 5 to 9) or [a-z] (assuming a to z)
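A short sketch of what the two range equivalents select, using Python's re module as an assumed engine and hand-written patterns:

```python
import re

# 5 to 9 -> [5-9]
digits_5_to_9 = re.findall(r"[5-9]", "0123456789")

# a to z -> [a-z] (lowercase only, so the capital "M" is skipped)
lowercase = re.findall(r"[a-z]", "Melody v0.20")
```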

Literals

  • "..." or '...' - used to mark a literal part of the match. Melody will automatically escape characters as needed. Quotes (of the same kind surrounding the literal) should be escaped
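Automatic escaping matters because many characters have a special meaning in regex. The sketch below uses Python's re.escape as a stand-in for Melody's escaping; which characters each one escapes may differ, so treat this as an illustration of the idea only. The literal helper is hypothetical.

```python
import re

def literal(text):
    # re.escape is Python's own helper, used here in place of Melody's
    # automatic escaping of literals.
    return re.escape(text)

PRICE = "$4.99?"
escaped = literal(PRICE)

# With escaping, the pattern matches exactly the literal text.
escaped_matches = re.fullmatch(escaped, PRICE) is not None
# The escaped "." no longer matches an arbitrary character.
escaped_rejects = re.fullmatch(escaped, "$4x99?") is None
# Without escaping, "$" is an end-of-string anchor and nothing matches.
unescaped_fails = re.search(PRICE, PRICE) is None
```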

Raw

  • `...` - added directly to the output without any escaping

Groups

  • capture - used to open a capture or named capture block. captured patterns are later available in the list of matches (either positional or named). equivalent to regex (...)
  • match - used to open a match block, matches the contents without capturing. equivalent to regex (?:...)
  • either - used to open an either block, matches one of the statements within the block. equivalent to regex (?:...|...)
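The practical difference between the three group kinds is what ends up in the list of matches. A minimal sketch with a hand-written pattern, assuming Python's re module:

```python
import re

# (\d+)        capture: kept in the list of matches
# (?:\.\d+)?   match:   grouped for the quantifier, not captured
# (?:px|em)    either:  one of the alternatives, not captured
SIZE = r"(\d+)(?:\.\d+)?(?:px|em)"

whole_px = re.fullmatch(SIZE, "12px").groups()
fractional_em = re.fullmatch(SIZE, "12.5em").groups()
wrong_unit = re.fullmatch(SIZE, "12pt")
```

Only the capture group contributes to groups(); the other two shape the match without being recorded.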

Assertions

  • ahead - used to open an ahead block. equivalent to regex (?=...). use after an expression
  • behind - used to open a behind block. equivalent to regex (?<=...). use before an expression

Assertions can be preceded by not to create a negative assertion (equivalent to regex (?!...), (?<!...))
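The four assertion forms can be compared on a single input. The patterns below are written by hand from the regex equivalents above, with Python's re module assumed as the engine; the sample text is invented.

```python
import re

text = "10 USD, 20 EUR, $30"

# ahead: digits followed by " USD"
ahead = re.findall(r"\d+(?= USD)", text)

# not ahead: whole numbers not followed by " USD"
# (\b stops the engine from matching just the "1" of "10")
not_ahead = re.findall(r"\b\d+\b(?! USD)", text)

# behind: digits preceded by "$"
behind = re.findall(r"(?<=\$)\d+", text)

# not behind: whole numbers not preceded by "$"
not_behind = re.findall(r"(?<!\$)\b\d+", text)
```

Assertions check the surrounding text without consuming it, so the results contain only the digits.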

Variables

  • let .variable_name = { ... } - defines a variable from a block of statements, which can later be used with .variable_name. Variables must be declared before being used. Variable invocations cannot be quantified directly; use a group if you want to quantify a variable invocation

    example:

    let .a_and_b = {
      "a";
      "b";
    }
    
    .a_and_b;
    "c";
    
    // abc
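One way to picture variables is as reusable pattern fragments that are expanded in place. The sketch below mimics that with plain Python strings; it is an analogy, not how the Melody compiler is implemented, and all names are hypothetical. It also shows why quantifying a multi-character fragment calls for a group.

```python
import re

# Stand-in for: let .a_and_b = { "a"; "b"; }
a_and_b = "ab"

# Stand-in for: .a_and_b; "c";
pattern = a_and_b + "c"  # "abc"

# Wrapping the fragment in a group lets the quantifier cover all of it.
grouped = f"(?:{a_and_b}){{2}}c"  # "(?:ab){2}c"
# Without the group, the quantifier binds to the last character only.
ungrouped = f"{a_and_b}{{2}}c"    # "ab{2}c"

grouped_ok = re.fullmatch(grouped, "ababc") is not None
ungrouped_ok = re.fullmatch(ungrouped, "ababc") is not None
```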
    
    

Extras

  • /* ... */, // ... - used to mark comments (note: // ... comments must be on a separate line)

Future Feature Status

🐣 - Partially implemented

❌ - Not implemented

❔ - Unclear what the syntax will be

❓ - Unclear whether this will be implemented

Melody                           Regex          Status
not "A";                         [^A]           🐣
variables / macros                              🐣
<...::...>                       \p{...}        🐣
not <...::...>                   \P{...}        🐣
file watcher
multiline groups in REPL
flags: global, multiline, ...    /.../gm...
(?)                              \#
(?)                              \k<name>
(?)                              \uYYYY
(?)                              \xYY
(?)                              \ddd
(?)                              \cY
(?)                              $1
(?)                              $`
(?)                              $&
(?)                              x20
(?)                              x{06fa}
any of "a", "b", "c" *           [abc]
multiple ranges *                [a-zA-Z0-9]
regex optimization
standard library / patterns
reverse compiler

* these are expressible in the current syntax using other methods

Performance

Last measured on v0.20.0

Measured on an 8 core 2021 MacBook Pro 14-inch, Apple M1 Pro using criterion:

  • 8 lines:

    compiler/normal (8 lines)
                              time:   [4.3556 µs 4.3674 µs 4.3751 µs]
    slope  [4.3556 µs 4.3751 µs] R^2            [0.9996144 0.9996931]
    mean   [4.3377 µs 4.3678 µs] std. dev.      [16.019 ns 30.154 ns]
    median [4.3270 µs 4.3777 µs] med. abs. dev. [3.1402 ns 41.334 ns]
    
  • 1M lines:

    compiler/long input (1M lines)
                              time:   [470.04 ms 472.35 ms 474.78 ms]
    mean   [470.04 ms 474.78 ms] std. dev.      [2.0458 ms 5.3453 ms]
    median [469.54 ms 475.24 ms] med. abs. dev. [734.10 µs 6.8144 ms]
    
  • Deeply nested:

    compiler/deeply nested
                              time:   [4.2357 µs 4.2561 µs 4.2782 µs]
    slope  [4.2357 µs 4.2782 µs] R^2            [0.9988854 0.9988087]
    mean   [4.2474 µs 4.2752 µs] std. dev.      [13.698 ns 29.574 ns]
    median [4.2426 µs 4.2819 µs] med. abs. dev. [2.7127 ns 43.193 ns]
    

To reproduce, run cargo bench or cargo xtask benchmark
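For a rough sense of scale, the figures above can be turned into a per-line cost. This is back-of-the-envelope arithmetic on the middle value of each reported time triple, not an additional measurement; the 8-line case includes fixed overhead, so the two numbers are not directly comparable.

```python
# Middle values of the "time" triples reported by criterion above.
SHORT_TIME_S = 4.3674e-6  # compiler/normal (8 lines)
LONG_TIME_S = 472.35e-3   # compiler/long input (1M lines)

# Convert seconds per run into nanoseconds per input line.
ns_per_line_short = SHORT_TIME_S / 8 * 1e9
ns_per_line_long = LONG_TIME_S / 1_000_000 * 1e9
```

Both cases work out to roughly half a microsecond per line (about 546 ns and 472 ns respectively).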

Changelog

[v0.20.0] - 2024-11-24

Breaking

  • Sets the MSRV to Rust 1.70.0

Fixes

  • Removes use of atty as it is unmaintained and has a low-severity CVE

Dependencies

  • Updates dependencies

Refactoring

  • Clippy fixes

[v0.19.0] - 2023-07-16

Breaking

  • Sets the MSRV to Rust 1.65.0

Features

  • Adds console.error output for panics on the Wasm version
  • Deno no longer requires an init function

Fixes

  • Fixes a few edge cases with hyphens and slashes

Dependencies

  • Updates dependencies

Refactoring

  • Clippy fixes

[v0.18.1] - 2022-06-25

Fixes

  • Fixes playground link

Dependencies

  • Updates dependencies

Refactoring

  • Clippy fixes

[v0.18.0] - 2022-04-24

Features

  • Adds support for unicode categories

Misc.

  • Update dependencies

[v0.17.0] - 2022-04-23

Features

  • Add support for testing matches in a file in the CLI

Refactoring

  • Remove anyhow in compiler in favor of emitting specific error variants

[v0.16.0] - 2022-04-13

Features

  • Add support for testing matches in CLI

[v0.15.0] - 2022-04-13

Features

  • Add shell completions for CLI
  • Add Deno support

[v0.14.0] - 2022-04-11

Features

  • Support stdin in CLI
  • Emit proper exit codes on specific errors

[v0.13.10] - 2022-03-11

Fixes

  • Fixes unnecessary grouping in quantifiers

[v0.13.9] - 2022-03-11

Misc.

  • Version bump for documentation update

[v0.13.8] - 2022-03-11

Misc.

  • Version bump for documentation update

[v0.13.7] - 2022-03-11

Misc.

  • Version bump for documentation update

[v0.13.6] - 2022-03-11

Fixes

  • Handles a few possible panics

[v0.13.5] - 2022-03-11

Misc.

  • Version bump

[v0.13.4] - 2022-03-11

Tooling

  • Strips binaries

Dependencies

  • Updates dependencies

[v0.13.3] - 2022-03-09

Refactoring

  • Replaces lazy_static with once_cell

[v0.13.2] - 2022-03-09

Performance

  • Improves literal parse performance

Refactoring

  • Reports a few possible panics with a ParseError

[v0.13.1] - 2022-03-08

Fixes

  • Fixes an issue with single letter variable identifiers matching a following space
  • Fixes a clash between REPL commands and variables

[v0.13.0] - 2022-03-08

Breaking

  • <alphabet> is now <alphabetic>

Features

  • Support for lazy quantifiers
  • All symbols now have negative counterparts
  • <alphanumeric> symbol added
  • Adds an experimental implementation of variables

[v0.12.4] - 2022-03-06

Misc.

  • Version bump

[v0.12.3] - 2022-03-06

Fixes

  • Fixes an issue with identifying negative char ranges

[v0.12.2] - 2022-03-05

Refactoring

  • Performance improvements

Misc.

  • Adds keywords and categories to cargo.toml files

[v0.12.1] - 2022-03-04

Misc.

  • CLI documentation update

[v0.12.0] - 2022-03-04

Breaking

  • Produces clean output (no // and new newline after output)

Features

  • Adds favicons for documentation and playground
  • The Melody playground now supports add to homescreen
  • Adds #![forbid(unsafe_code)]

Benchmarks

  • Adds benchmarks

[v0.11.1] - 2022-03-03

Fixes

  • Fixes possible panics

Tests

  • Adds tests
  • Adds tests for CLI

Refactoring

  • Removes duplicated code

[v0.11.0] - 2022-03-02

Breaking

  • ParseError now contains only one message field, may be changed in the future
  • Line comments (//) may only be used on a separate line
  • The REPL currently accepts blocks on a single line but not multiple lines
  • Semicolons are no longer optional

Features

  • Uses a Pest grammar and an AST to parse Melody
  • Adds support for nested groups
  • Adds support for negative ranges
  • Adds initial support for negative character classes
  • Adds support for <backspace>, <boundary>
  • Adds support for inline comments
  • Enforces group closing
  • Supports NO_COLOR in CLI
  • -n removes color from REPL as well

[v0.10.3] - 2022-02-26

Fixes

  • Removes quantifiers after newlines

[v0.10.2] - 2022-02-26

Fixes

  • Fixes the handling of some newline issues in the REPL
  • Adds an error message for a read error in the REPL

[v0.10.1] - 2022-02-26

Fixes

  • Trims only the end off of REPL input

[v0.10.0] - 2022-02-26

Breaking

  • Changes the -f, --file CLI argument to -o, --output

Features

  • Adds descriptions to CLI commands

[v0.9.0] - 2022-02-26

Features

  • Adds ahead, not ahead, behind and not behind assertions

[v0.8.0] - 2022-02-26

Features

  • Changes <space> to <whitespace> (thanks @amirali #34)
  • Adds <space> and <alphabet> (thanks @amirali #34)
  • Adds long versions for REPL commands
  • Adds .s, .source to print the current source in the REPL
  • Adds .c, .clear to clear REPL history
  • Adds better error reporting to the playground

Fixes

  • Fixes some undo / redo issues in the REPL

Refactoring

  • Better error handling in the CLI

[v0.7.0] - 2022-02-24

Features

  • Adds a REPL for melody_cli
  • Adds better error messages for the playground

[v0.6.0] - 2022-02-23

Features

  • Adds support for raw sequences (`...`)
  • Allows any word character in capture names
  • Adds auto escaping for literals
  • Adds the Melody version number to the documentation

Syntax Changes

  • Changes start, end, and char to symbols (<start>, <end>, <char>)
  • either creates a non capturing group

Refactoring

  • cargo clippy fixes in melody_wasm

Fixes

  • Uses the correct url in the documentation site config