Skip to main content

Regular Expression

Regular expression is a simpler way to describe a regular language.

Regular Expression

Definition

Regular expression is an expression that can yield a language (later proved to be a regular language). We note L()\mathcal{L}(\cdot) as the language generated by the regular expression \cdot.

Regular expression is defined as,

Any sentence vv is a valid regular expression, L(v)={v}\mathcal{L}(v) = \{ v \}.

The * of a regular expression rr is still a regular expression, L(r)=L(r)\mathcal{L}(r^*) = \mathcal{L}(r)^*

The ++ of a regular expression rr is still a regular expression, L(r+)=L(r)+\mathcal{L}(r^+) = \mathcal{L}(r)^+

If rr is a regular expression, rnr^n is also a regular expression, L(rn)=L(r)n\mathcal{L}(r^n) = \mathcal{L}(r)^n

If r1r_1 and r2r_2 are regular expressions, then r1r2r_1 r_2 is also a regular expression, L(r1r2)=L(r1)L(r2)\mathcal{L}(r_1r_2) = \mathcal{L}(r_1)\mathcal{L}(r_2).

If r1r_1 and r2r_2 are regular expressions, then r1+r2r_1 + r_2 is also a regular expression, L(r1+r2)=L(r1)L(r2)\mathcal{L}(r_1 + r_2) = \mathcal{L}(r_1) \cup \mathcal{L}(r_2)

Example

For example, the regular expression,

(0+1)1(0+1)(0+1)^*1(0+1)^*

Yields the language,

{ss{0,1}s contains the symbol 1}\{s | s \in \{0,1\}^* \land s \text{ contains the symbol 1}\}

The expression

0+(0+1)0^+(0 + 1)^*

Yields the language,

{ss{0,1}s starts with the symbol 0}\{s | s \in \{0,1\}^* \land s \text{ starts with the symbol 0}\} (0+1)13(0+1)(0+1)^*1^3(0+1)^*

This regular expression yields the language,

{ss{0,1}s contains three consecutive 1s}\{s | s \in \{0,1\}^* \land s \text{ contains three consecutive 1s}\}

Regular Expression can be Represented by e-NFA

We can convert regular expression to e-NFA based on patterns above.

We will firstly simplify the expression by using as little symbols as possible, so that we need to remember fewer patterns.

  • We can eliminate ana^n with aaa\ldots a (nn consecutive aa)
  • We can eliminate a+a^+ with aaaa^*

Now we only have three patterns,

  • r1+r2r_1+r_2
  • r1r2r_1r_2
  • rr^*
  • vv

Sentence

Obviously, if a regular expression is a sentence vv, we can convert it to an e-NFA directly, with,

The qsqs and qfqf are the two proximity. Of course, we can remove them to get the same e-NFA, but during the conversion from regular expression to e-NFA, we always keep two nodes with empty transition at the proximity so that later it will be convenient for joining the patterns.

Star Closure

We note,

As a sub graph that accepts the language of the regular expression rr.

And the e-NFA for rr^* is,

Please note that all transition here are empty transition.

Concatenation

Union

Example

Let's convert,

r=0+(00+11)r = 0^+(00 + 11)^*

To e-NFA.

We first remove the ++,

r=00(00+11)r = 00^*(00 + 11)^*

Then we factor it into,

r=r0r1(r2+r3)r = r_0r_1^*(r_2+r_3)^*

Where r0=0r_0 = 0, r1=0r_1 = 0, r2=00r_2 = 00, r3=11r_3 = 11.

When can convert r0r_0 to,

r1r_1^* to,

r2+r3r_2+r_3 to,

(r2+r3)(r_2+r_3)^* to,

Finally we concatenate them,

We can remove some obviously redundant nodes,

We can further reduce this into NFA, then DFA, minimized DFA.

e-NFA can Test Regular Expressions

tip

DFA and NFA are special e-NFA.

If we get an e-NFA, we can convert it to a regular expression. The method is step-by-step pattern matching.

We note,

Stands for T({q0},v)={q1}vL(r)\mathcal{T}(\{q_0\}, v) = \{q_1\} \, v \in \mathcal{L}(r).

For convenience.

Please note that, during the process, we need to mark all Fλ\mathcal{F}^\lambda in each step as accept states.

Patterns

Typical patterns contains,

Concatenation

Into,

Union

Into,

Star Closure

Into,

Example

Let's take a look back at the e-NFA we got in the previous section,

We can merge some obvious patterns,

Two parallel edges matches our previous union pattern, so,

We can remove an empty transition,

Then,

Finally,

Thus the given e-NFA can be converted into 00(00+11)0^* 0(00 + 11)^*.

This seems a bit different from 00(00+11)00^*(00+11)^*, but they generates the same language.