Character Classes


Character classes are used to distinguish between kinds of characters, for example between digits and letters.

Let’s start with a practical case. Imagine you have a phone number like +3(522)865-42-76 and wish to turn it into pure digits (35228654276). To do that, you need to find and remove everything that’s not a digit. Character classes are there to help you with that.

A character class is a special notation that matches any single character from a certain set.

We will start with the “digit” class. It is written as \d and matches any single digit.

In the example below, let’s find the first digit:

let str = "+3(522)865-42-76";

let regexp = /\d/;

console.log(str.match(regexp)); // 3

Without the g flag, the regular expression finds only the first match, that is, the first digit matched by \d.

Adding the g flag will enable finding all the digits, like this:

let str = "+3(522)865-42-76";

let regexp = /\d/g;

console.log(str.match(regexp)); // array of matches: 3,5,2,2,8,6,5,4,2,7,6

// join them to get the digits-only phone number:

console.log(str.match(regexp).join('')); // 35228654276

That was the character class for digits. There are other character classes, too.

The most used character classes are as follows:

  • \d (from “digit”): a digit, that is, a character from 0 to 9.
  • \s (from “space”): a whitespace character. This includes spaces, tabs (\t), newlines (\n), and a few rarer characters such as \v, \f, and \r.
  • \w (from “word”): a letter of the Latin alphabet, a digit, or an underscore (_). Non-Latin letters don't belong to this class.
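To make the three classes concrete, here is a small sketch that collects the matches of each class from one sample string. The helper name matchesOf and the sample text are made up for illustration and are not part of any API.

```javascript
// Hypothetical helper: returns every match of `regexp` in `text` as an array.
// `regexp` should carry the g flag; match() returns null when nothing matches,
// so we fall back to an empty array.
function matchesOf(text, regexp) {
  return text.match(regexp) || [];
}

const sample = "Room_7 is\tfree"; // illustrative string with a space and a tab

console.log(matchesOf(sample, /\d/g)); // the single digit: 7
console.log(matchesOf(sample, /\s/g).length); // 2 (one space, one tab)
console.log(matchesOf(sample, /\w/g).join("")); // Room_7isfree
```

Note that the underscore is kept by \w, while the space and the tab are not.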

A regular expression can contain both ordinary characters and character classes.

For example, CSS\d matches the string CSS followed by a digit:

let str = "It is CSS3?";

let regexp = /CSS\d/;

console.log(str.match(regexp)); // CSS3

Multiple character classes can be used, like this:

console.log("It is HTML5!".match(/\s\w\w\w\w\d/)); // ' HTML5'

Inverse Classes

There is an “inverse class” for every character class, denoted by the same letter in uppercase.

“Inverse” means that it matches all the other characters:

  • \D - non-digit. Matches any character except \d (for instance, a letter).
  • \S - non-space. Matches any character except \s (for instance, a letter).
  • \W - non-word character. Matches anything except \w (for instance, a non-Latin letter or a space).
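The inverse classes give a shorter route to the digits-only phone number from the start of the chapter: instead of collecting the digits, remove everything that is not a digit. A minimal sketch, where the function name digitsOnly is just an illustrative choice:

```javascript
// Illustrative helper: strips every non-digit character from a string.
// \D matches any character that \d does not; the g flag replaces all of them.
function digitsOnly(text) {
  return text.replace(/\D/g, "");
}

console.log(digitsOnly("+3(522)865-42-76")); // 35228654276
```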

A Dot

A dot (.) is a special character class that matches “any character except a newline”.

For example:

console.log("W".match(/./)); // W

In the example below, the dot is in the middle of a regexp:

let regexp = /HTM.5/;

console.log("HTML5".match(regexp)); // HTML5

console.log("HTM-5".match(regexp)); // HTM-5

console.log("HTM 5".match(regexp)); // HTM 5 (a space is a character, too)

Note that the dot means “any character”, but not the “absence of a character”.

There must be a character for it to match:

console.log("HTM5".match(/HTM.5/)); // null, no match, as there's no character for the dot

By default, a dot doesn’t match the newline character \n.

For example, the regexp W.D matches W, then any single character except a newline, then D. So it fails when a newline \n sits between them:

console.log("W\nD".match(/W.D/)); // null (no match)

Sometimes you want a dot to mean literally “any character”, including a newline.

The s flag does that. If a regexp has it, the dot matches any character at all:

console.log("W\nD".match(/W.D/s)); // W\nD (match!)
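The behaviour of the dot with and without the s flag can be compared side by side. The sketch below is only an illustration: the helper name matchesWD and the list of test strings are assumptions made for this example.

```javascript
// Illustrative helper (not from the article): tests the pattern W.D
// either with the s flag (dot matches newlines too) or without it.
function matchesWD(text, useDotAll) {
  const regexp = useDotAll ? /W.D/s : /W.D/;
  return regexp.test(text);
}

for (const text of ["W-D", "W D", "W\nD", "WD"]) {
  // JSON.stringify makes the newline visible as \n in the output
  console.log(JSON.stringify(text), matchesWD(text, false), matchesWD(text, true));
}
// "W-D" true true
// "W D" true true
// "W\nD" false true
// "WD" false false
```

Only the string with the newline changes its result, and the string with nothing between W and D never matches, with or without the flag.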

Pay special attention to spaces. We usually treat the strings 1-5 and 1 - 5 as nearly identical, but a regexp that doesn’t take the spaces into account will fail to match.

Here is an attempt to find digits separated by a hyphen:

console.log("1 - 5".match(/\d-\d/)); // null, no match!

Now, let’s fix it by adding spaces to the regular expression, \d - \d:

console.log("1 - 5".match(/\d - \d/)); // 1 - 5, now it works

// or we can use the \s class:

console.log("1 - 5".match(/\d\s-\s\d/)); // 1 - 5, also works

A space is a character, equal in importance to any other. You can’t add or remove spaces in a regexp and expect it to work the same way. In a regexp, every character matters.
