Add verbose helper (#43)
* Add verbose helper

* Cosmetic change

* Fix linter
hedhyw authored Jun 23, 2022
1 parent 77da81f commit cd2ab1d
Showing 7 changed files with 414 additions and 206 deletions.
272 changes: 71 additions & 201 deletions README.md
@@ -7,17 +7,23 @@

![rex-gopher](_docs/gopher.png)

This is a regular expressions builder for gophers!
**This is a regular expressions builder for gophers!**

- **[Why?](#why)**
- **[FAQ](#faq)**
- **[Documentation](_docs/library.md)**
- **[Examples](pkg/examples_test.go)**
- **[License](#license)**

## Why?

It improves readability and helps you construct regular expressions from human-friendly building blocks. It also allows commenting and reusing blocks, which improves code quality, and it provides a convenient way to use parameterized patterns. Custom patterns are easy to implement, either from scratch or by combining existing ones.

It is just a builder, so it returns a standard [`*regexp.Regexp`](https://pkg.go.dev/regexp#Regexp).

The library supports [groups](#groups), [composites](#simple-composite), [classes](#character-classes), [flags](#flags), [repetitions](#repetitions), and, if you want, you can even use [raw regular expressions](#raw-regular-expression) anywhere. It also contains a set of [predefined helpers](#helper) for matching phones, emails, etc.
The library supports [groups](_docs/library.md#groups), [composites](_docs/library.md#groups), [classes](_docs/library.md#character-classes), [flags](_docs/library.md#flags), [repetitions](_docs/library.md#repetitions), and, if you want, you can even use `raw regular expressions` anywhere. It also contains a set of [predefined helpers](_docs/library.md#helper) with patterns for number ranges, phones, emails, etc.

Let's see an example of validating or matching `some_id[#]` using verbose patterns:
Let's see an example of validating or matching `someid[#]` using a verbose pattern:
```golang
re := rex.New(
rex.Chars.Begin(), // `^`
@@ -36,231 +42,95 @@ re := rex.New(
Yes, it requires more code, but it has its advantages.
> More, but simpler code, fewer bugs.
You can still use original regular expressions. Example of matching
numbers between `-111.99` and `1111.99` using a combination of patterns
and raw regular expression:
You can still use original regular expressions as is in any place. Example of
matching numbers between `-111.99` and `1111.99` using a combination of
patterns and raw regular expression:

```golang
re := rex.New(
rex.Common.Raw(`^`),
rex.Helper.NumberRange(-111, 1111),
rex.Common.Raw(`\.[0-9]{2}$`),
rex.Common.RawVerbose(`
# RawVerbose is a synonym for Raw,
# but ignores comments, spaces and new lines.
\. # Decimal delimiter.
[0-9]{2} # Only two digits.
$ # The end.
`),
).MustCompile()

// Produces:
// ^((?:\x2D(?:0|(?:[1-9])|(?:[1-9][0-9])|(?:10[0-9])|(?:11[0-1])))|(?:0|(?:[1-9])|(?:[1-9][0-9])|(?:[1-9][0-9][0-9])|(?:10[0-9][0-9])|(?:110[0-9])|(?:111[0-1])))\.[0-9]{2}$
```
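Because the builder returns an ordinary `*regexp.Regexp`, the generated pattern can be checked with nothing but the standard library. The following is a minimal sketch that assumes the string in the `// Produces:` comment above is accurate; the helper name `inRange` and the sample inputs are illustrative and not part of the library.

```go
package main

import (
	"fmt"
	"regexp"
)

// producedPattern is copied verbatim from the "Produces" comment above.
const producedPattern = `^((?:\x2D(?:0|(?:[1-9])|(?:[1-9][0-9])|(?:10[0-9])|(?:11[0-1])))|(?:0|(?:[1-9])|(?:[1-9][0-9])|(?:[1-9][0-9][0-9])|(?:10[0-9][0-9])|(?:110[0-9])|(?:111[0-1])))\.[0-9]{2}$`

var rangeRe = regexp.MustCompile(producedPattern)

// inRange is a hypothetical helper: it reports whether s is a number
// between -111 and 1111 followed by exactly two decimal digits.
func inRange(s string) bool {
	return rangeRe.MatchString(s)
}

func main() {
	for _, s := range []string{"-111.99", "0.00", "1111.99", "1112.00", "5.5"} {
		fmt.Printf("%q -> %v\n", s, inRange(s))
	}
}
```

The first three inputs match; `1112.00` is out of range and `5.5` has only one decimal digit, so both are rejected.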

> The style you prefer is up to you.
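Go's `regexp` package has no verbose flag, so a helper like `RawVerbose` has to strip comments and whitespace before compiling. The sketch below illustrates that idea only; `stripVerbose` is a hypothetical function, not the library's implementation, and it deliberately ignores edge cases such as an escaped `\#` or a space inside a character class.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stripVerbose is a hypothetical, simplified stand-in for a verbose helper:
// it drops everything after `#` on each line and removes all whitespace.
// It does not handle an escaped `\#` or spaces inside character classes.
func stripVerbose(pattern string) string {
	var b strings.Builder
	for _, line := range strings.Split(pattern, "\n") {
		// Cut the trailing comment, if there is one.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		// Remove the remaining spaces and tabs.
		b.WriteString(strings.Join(strings.Fields(line), ""))
	}
	return b.String()
}

func main() {
	raw := stripVerbose(`
		\.        # Decimal delimiter.
		[0-9]{2}  # Only two digits.
		$         # The end.
	`)
	fmt.Println(raw)                                          // \.[0-9]{2}$
	fmt.Println(regexp.MustCompile(raw).MatchString("3.14")) // true
}
```

A production version would need a small tokenizer to respect escapes and character classes, which is why this is kept as an illustration.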
## Meme

<img alt="Drake Hotline Bling meme" width=350px src="_docs/meme.png" />

_The picture contains two frame fragments from [the video](https://www.youtube.com/watch?v=uxpDa-c-4Mc)._

## Documentation

```golang
import "github.com/hedhyw/rex/pkg/rex"

func main() {
rex.New(/* tokens */).MustCompile() // The same as `regexp.MustCompile`.
rex.New(/* tokens */).Compile() // The same as `regexp.Compile`.
rex.New(/* tokens */).String() // Get constructed regular expression as a string.
}
```

### Common

Common operators for core operations.

```golang
rex.Common.Raw(raw string) // Raw regular expression.
rex.Common.Text(text string) // Escaped text.
rex.Common.Class(tokens ...dialect.ClassToken) // Include specified characters.
rex.Common.NotClass(tokens ...dialect.ClassToken) // Exclude specified characters.
```

### Character classes

Single characters and classes that can be used as-is, as well as children of `rex.Common.Class` or `rex.Common.NotClass`.

```golang
rex.Chars.Begin() // `^`
rex.Chars.End() // `$`
rex.Chars.Any() // `.`
rex.Chars.Range('a', 'z') // `[a-z]`
rex.Chars.Runes("abc") // `[abc]`
rex.Chars.Single('r') // `r`
rex.Chars.Unicode(unicode.Greek) // `\p{Greek}`
rex.Chars.UnicodeByName("Greek") // `\p{Greek}`

rex.Chars.Digits() // `[0-9]`
rex.Chars.Alphanumeric() // `[0-9A-Za-z]`
rex.Chars.Alphabetic() // `[A-Za-z]`
rex.Chars.ASCII() // `[\x00-\x7F]`
rex.Chars.Whitespace() // `[\t\n\v\f\r ]`
rex.Chars.WordCharacter() // `[0-9A-Za-z_]`
rex.Chars.Blank() // `[\t ]`
rex.Chars.Control() // `[\x00-\x1F\x7F]`
rex.Chars.Graphical() // `[[:graph:]]`
rex.Chars.Lower() // `[a-z]`
rex.Chars.Printable() // `[ [:graph:]]`
rex.Chars.Punctuation() // `[!-/:-@[-`{-~]`
rex.Chars.Upper() // `[A-Z]`
rex.Chars.HexDigits() // `[0-9A-Fa-f]`
```

If you want to combine multiple character classes, use `rex.Common.Class`:
```golang
// Only specific characters:
rex.Common.Class(rex.Chars.Digits(), rex.Chars.Single('a'))
// It will produce `[0-9a]`.

// All characters except:
rex.Common.NotClass(rex.Chars.Digits(), rex.Chars.Single('a'))
// It will produce `[^0-9a]`.
```
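The two generated classes quoted in the comments above, `[0-9a]` and `[^0-9a]`, can be tried out with the standard `regexp` package. This sketch assumes those outputs are accurate; the anchors, the `+` repetition and the variable names are added here for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// only accepts strings made solely of digits and the letter "a".
	only = regexp.MustCompile(`^[0-9a]+$`)
	// except accepts strings that contain neither digits nor "a".
	except = regexp.MustCompile(`^[^0-9a]+$`)
)

func main() {
	fmt.Println(only.MatchString("a1a2"))  // true
	fmt.Println(only.MatchString("b1"))    // false
	fmt.Println(except.MatchString("xyz")) // true
	fmt.Println(except.MatchString("xya")) // false
}
```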

### Groups

Helpers for grouping expressions.

```golang
// Define a captured group. That can help to select part of the text.
rex.Group.Define(rex.Chars.Single('a'), rex.Chars.Single('b')) // (ab)
// A group that defines "OR" condition for given expressions.
// Example: "a" or "rex", ...
rex.Group.Composite(rex.Chars.Single('a'), rex.Common.Text("rex")) // (?:a|rex)
// Define non-captured group. The result will not be captured.
rex.Group.NonCaptured(rex.Chars.Single('a')) // (?:a)

// Define a group with a name.
rex.Group.Define(rex.Chars.Single('a')).WithName("my_name") // (?P<my_name>a)
```
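Since the result is a plain `*regexp.Regexp`, named and captured groups are read back with the usual standard-library calls. The sketch below uses the patterns from the comments above as raw strings; `groupValue` is a hypothetical convenience function, and `SubexpIndex` requires Go 1.15 or newer.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	captured  = regexp.MustCompile(`(ab)`)           // Captured group.
	composite = regexp.MustCompile(`(?:a|rex)`)      // Non-captured "OR" group.
	named     = regexp.MustCompile(`(?P<my_name>a)`) // Named group.
)

// groupValue is a hypothetical helper that returns the text captured by
// the named group, or an empty string when there is no match or no such group.
func groupValue(re *regexp.Regexp, name, s string) string {
	m := re.FindStringSubmatch(s)
	idx := re.SubexpIndex(name)
	if m == nil || idx < 0 {
		return ""
	}
	return m[idx]
}

func main() {
	fmt.Println(captured.NumSubexp())             // 1
	fmt.Println(composite.NumSubexp())            // 0
	fmt.Println(composite.MatchString("rex"))     // true
	fmt.Println(groupValue(named, "my_name", "cat")) // a
}
```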

## Flags

```golang
// TODO: https://github.com/hedhyw/rex/issues/31
```

### Repetitions
## FAQ

Helpers that specify how to repeat characters. They can be called on character class tokens.
1. **It is too verbose. Too much code.**

```golang
RepetableClassToken.Repeat().OneOrMore() // `+`
RepetableClassToken.ZeroOrMore() // `*`
RepetableClassToken.ZeroOrOne() // `?`
RepetableClassToken.EqualOrMoreThan(n int) // `{n,}`
RepetableClassToken.Between(n, m int) // `{n,m}`

// Example:
rex.Chars.Digits().Repeat().OneOrMore() // [0-9]+
rex.Group.Define(rex.Chars.Single('a')).Repeat().OneOrMore() // (a)+
```
More, but simpler code, fewer bugs.
Anyway, you can still use the raw regular expressions syntax in combination with helpers.
```golang
rex.New(
rex.Chars.Begin(),
rex.Group.Define(
// `Raw` can be placed anywhere in blocks.
rex.Common.Raw(`[a-z]+\d+[A-Z]*`),
),
rex.Chars.End(),
)
```
Or just use a raw regular expression with comments:
```golang
rex.Common.RawVerbose(`
^ # Start of the line.
[a-zA-Z0-9]+ # Local part.
@ # Delimiter.
[a-zA-Z0-9\.]+ # Domain part.
$ # End of the line.
`)
```

## Helper
2. **Should I know regular expressions?**

Common regular expression patterns that are ready to use.
> ⚠️ These patterns are likely to be changed in new versions.
It is better to know them in order to use this library most effectively.
But in any case, it is not strictly necessary.

```golang
rex.Helper.NumberRange(-5, 123) // Defines number range pattern without leading zeros.
rex.Helper.Phone() // Combines PhoneE164 and PhoneE123.
rex.Helper.PhoneE164() // +155555555
rex.Helper.PhoneE123() // Combines PhoneNationalE123 and PhoneInternationalE123.
rex.Helper.PhoneNationalE123() // (607) 123 4567
rex.Helper.PhoneInternationalE123() // +22 607 123 4567
rex.Helper.HostnameRFC952() // Hostname by RFC-952 (stricter).
rex.Helper.HostnameRFC1123() // Hostname by RFC-1123.
rex.Helper.Email() // Unquoted email pattern, it doesn't check RFC 5322 completely, due to high complexity.
rex.Helper.IP() // IPv4 or IPv6.
rex.Helper.IPv4() // 127.0.0.1 (without leading zeros)
rex.Helper.IPv6() // 2001:0db8:85a3:0000:0000:8a2e:0370:7334
rex.Helper.MD5Hex() // d41d8cd98f00b204e9800998ecf8427e
rex.Helper.SHA1Hex() // da39a3ee5e6b4b0d3255bfef95601890afd80709
rex.Helper.SHA256Hex() // e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```
3. **Is it language-dependent? Is it transferable to other languages?**

## Examples
This library can be used only in Go. If you want to use any part of it
elsewhere, just call `rex.New(...).String()` and copy-paste
the generated regular expression.

### Simple email validator
4. **What about my favourite `DSL`?**

Let's describe a simple email regular expression in order to show the basic functionality (there is a more advanced helper `rex.Helper.Email()`):
Every IDE has convenient auto-completion for languages. So all helpers
of this library are easy to use out of the box. Also, it is easier
to create custom parameterized helpers.

```golang
// We can define a set of characters and reuse the block.
customCharacters := rex.Common.Class(
rex.Chars.Range('a', 'z'), // `[a-z]`
rex.Chars.Upper(), // `[A-Z]`
rex.Chars.Single('-'), // `\x2D`
rex.Chars.Digits(), // `[0-9]`
) // `[a-zA-Z-0-9]`
5. **Is it stable?**

re := rex.New(
rex.Chars.Begin(), // `^`
customCharacters.Repeat().OneOrMore(),

// Email delimiter.
rex.Chars.Single('@'), // `@`

// Allow dot after delimiter.
rex.Common.Class(
rex.Chars.Single('.'), // \.
customCharacters,
).Repeat().OneOrMore(),

// Email should contain at least one dot.
rex.Chars.Single('.'), // `\.`
rex.Chars.Alphanumeric().Repeat().Between(2, 3),
It is a `0.X.Y` version, but there are some backward compatibility guarantees:
- `rex.Chars` helpers can change output to an alternative synonym.
- `rex.Common` helpers can be deprecated, but not removed.
- Some `rex.Group` methods can be deprecated.
- `rex.Helper` can be changed with breaking changes due to specification complexities.
- The test coverage should be `~100%` without covering [test helpers](internal/test/test.go).
- Any breaking change will be prevented as much as possible.

rex.Chars.End(), // `$`
).MustCompile()
```
_All of the above may not be respected when upgrading the major version._

#### Simple composite
6. **I have another question. I found an issue. I have a feature request. I want to contribute.**

```golang
re := rex.New(
rex.Chars.Begin(),
rex.Group.Composite(
// Text matches exact text (symbols will be escaped).
rex.Common.Text("hello."),
// OR one or more numbers.
rex.Chars.Digits().Repeat().OneOrMore(),
),
rex.Chars.End(),
).MustCompile()

re.MatchString("hello.") // true
re.MatchString("hello") // false
re.MatchString("123") // true
re.MatchString("hello.123") // false
```

## Example match usage

```golang
re := rex.New(
// Define a named group.
rex.Group.Define(
rex.Helper.Phone(),
).WithName("phone"),
).MustCompile()

const text = `
E.164: +15555555
E.123.Natl: (607) 123 4567
E.123.Intl: +22 607 123 4567
`

submatches := re.FindAllStringSubmatch(text, -1)
// submatches[0]: +15555555
// submatches[1]: (607) 123 4567
// submatches[2]: +22 607 123 4567
```
Please, [create an issue](https://github.com/hedhyw/rex/issues/new?labels=question&title=I+have+a+question).

#### More examples
## License

More examples can be found here: [examples_test.go](examples_test.go).
- The library is under the [MIT License](LICENSE).
- [The gopher](_docs/gopher.png) is under the [Creative Commons Attribution 3.0](https://creativecommons.org/licenses/by/3.0/) license. It was originally created by [Renée French](https://en.wikipedia.org/wiki/Ren%C3%A9e_French) and redrawn by me.
- [The meme](_docs/meme.png) contains two frame fragments from [the video](https://www.youtube.com/watch?v=uxpDa-c-4Mc).