TripleTen QUANTIFIERS
Sun Apr 07 2024 22:57:21 GMT+0000 (Coordinated Universal Time)
Saved by @Marcelluki
Quantifiers

In the previous lesson, we discussed sets and ranges. Now it's time to learn about quantifiers, which indicate how many times a character or expression must occur in order to count as a match. We'll also look at some issues you might face when using quantifiers and how to overcome them.

Basic Quantifiers

Let's say you want to find the word "millennium", but you forgot how many "L"s are in the word:

    const str = 'How many Ls in the word "millennium"?';

To do this sort of search, you'll need a regular expression that matches a string starting with 'mi', followed by one or two 'l's, and ending with 'ennium'.

One Occurrence to Infinite Occurrences: The + Quantifier

If you place the + quantifier to the right of a character, the engine will look for matches in which this character occurs one or more times in a row.

    const str = 'The correct spelling of the word "millennium" is with two Ls';
    const regex = /mil+ennium/;
    // this regular expression matches both variants: with one "L" and with two "L"s
    // (it would also match three or more, since + sets no upper limit)

    str.match(regex); // [ 'millennium' ]

Zero Occurrences to Infinite Occurrences: The * Quantifier

Another quantifier for matching repeated characters is the asterisk, *, which works much like +. Let's take a moment to contrast the two. A character followed by + must be present in the string. This quantifier tells the engine: "Look for this specific character, and after it, there could be any number of the same character." On the other hand, if we put the * quantifier after a character, our regular expression can find a match even if that character isn't present.
Consider the following example:

    const exc = 'artist';
    const esc = 'artiste';
    const regex = /artiste*/; // the letter "e" may or may not occur

    exc.match(regex); // [ 'artist' ]
    esc.match(regex); // [ 'artiste' ]

To reiterate, if you place + after a character, the engine looks for matches where the character occurs at least once. If you place * after a character, the engine instead looks for matches with zero or more occurrences. So, in cases like the example above, you can use the * quantifier to create regular expressions that match different spellings of a word. The * quantifier tells the engine that the preceding character may be absent, or may appear any number of times in a row.

An Optional Character: The ? Quantifier

There's one more way to make a character optional: the ? quantifier. The asterisk * accepts any number of occurrences, but the question mark matches only zero occurrences or one occurrence of your chosen character:

    /* makes the letter u optional and matches both spelling variants:
       favourite and favorite. */
    const regex = /favou?rite/g;
    const str = 'favourite or favorite';

    str.match(regex); // ['favourite', 'favorite']

Either One Alternative or the Other: The | Operator

Although it is covered here alongside the quantifiers, | is strictly an alternation operator: it doesn't count repetitions. Instead, it allows us to create "forks" in a regular expression and search for alternatives. For example, if you write a|b, you're telling the engine that either a or b is fine:

    const someSymbol = /cent(er|re)/g;
    const str = 'center or centre';

    console.log(str.match(someSymbol)); // ['center', 'centre']

A good use for | is text written in both American and British English, since there are many subtle spelling differences between the two.
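
To compare +, *, ? and | side by side, here is a minimal sketch that runs each one against the same list of spellings. The sample words, the `patterns` object, and the `acceptedWords` helper are made up for illustration, and the ^ and $ anchors are an assumption that goes beyond this lesson: they force each pattern to match the whole word.

```javascript
// Hypothetical sample words with zero, one, two, and three "l"s after "mi".
const words = ['miennium', 'milennium', 'millennium', 'milllennium'];

// ^ and $ (not covered in this lesson) force the whole word to match,
// so each pattern is judged on the full spelling.
const patterns = {
  plus: /^mil+ennium$/,       // one or more "l"
  star: /^mil*ennium$/,       // zero or more "l"
  optional: /^mil?ennium$/,   // zero or one "l"
  either: /^mi(l|ll)ennium$/, // exactly "l" or exactly "ll"
};

// Hypothetical helper: lists the sample words that a pattern accepts.
function acceptedWords(regex) {
  return words.filter((word) => regex.test(word));
}

const results = {};
for (const [name, regex] of Object.entries(patterns)) {
  results[name] = acceptedWords(regex);
}

console.log(results);
// plus:     [ 'milennium', 'millennium', 'milllennium' ]
// star:     [ 'miennium', 'milennium', 'millennium', 'milllennium' ]
// optional: [ 'miennium', 'milennium' ]
// either:   [ 'milennium', 'millennium' ]
```

In this sketch, only the alternation accepts exactly one or two "l"s, which is what the "millennium" search at the start of the lesson asked for. The + quantifier also accepts longer runs, so it is the looser choice.
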

Managing the Number of Repetitions: The {} Quantifier

To search for a run of repeated characters, you can list each one in the regular expression:

    const regionCode = /\d\d\d/;
    const phoneNumber = 'My phone number: +1(555)324-41-5';

    phoneNumber.match(regionCode); // [ '555' ]

The {} quantifier saves us from writing this out. All you have to do is specify the number of repetitions inside curly braces:

    const regionCode = /\d{3}/;
    const phoneNumber = 'My phone number: +1(555)324-41-5';

    phoneNumber.match(regionCode); // [ '555' ]

In addition to an exact number of repetitions, you can specify a range. For example, to find matches with 2 to 5 repetitions:

    const str = 'this much, thiiis much, thiiiiiiis much';
    const regex = /thi{2,5}s/;

    str.match(regex); // [ 'thiiis' ]
    // in the word "this" the letter "i" occurs only once
    // and in the word "thiiiiiiis" the letter "i" occurs more than 5 times

You can also omit the maximum number of repetitions. For instance, you can search for anything from one repetition of a character to an unlimited number of repetitions. To do that, omit the second number inside the curly braces while keeping the comma:

    const someSymbol = /a{1,}/g;
    const str = 'alohaa';

    console.log(str.match(someSymbol)); // ['a', 'aa']

Lazy and Greedy Quantifiers

When deciding what to return, a regular expression sometimes faces a choice. For example, let's imagine we want to find a string that both starts and ends with "e", using the quantifier {2,11} to limit how many characters may appear between the opening and closing letters:

    const str = 'Everyone else knows that book, you know';
    const someSymbols = /e.{2,11}e/gi;

    console.log(str.match(someSymbols)); // ['Everyone else']

The match() method returned the string 'Everyone else'. But the word 'Everyone' could also have qualified here.
After all, it starts with "e", ends in "e", and has 6 letters between the first letter and the last one. What happened? The engine chose the longer match over the shorter one. That's why this behavior is called greedy matching. In fact, greediness is the default behavior of every quantifier, including {}:

    const str = 'eeeeeeeee'; // 9 repetitions of the letter "e"
    const regex = /e{1,11}/;

    str.match(regex); // [ 'eeeeeeeee' ] - the greedy quantifier found all "e"s

Sometimes, our matches are greedy. Hey, nobody's perfect! The flip side of greediness is laziness. Simply stated, a lazy quantifier matches as few characters as it can, so it finds a short match instead of a long one. To make {} lazy, we just need to put a question mark after it, {}?:

    const someSymbols = /e.{2,11}?e/gi;
    const str = 'Everyone else knows that book, you know';

    console.log(str.match(someSymbols)); // ['Everyone', 'else']
    /* the lazy quantifier has found the shortest matches this time */

You can also make the + quantifier lazy by adding ? in the same way:

    const someSymbols = /e.+?n/gi;
    const str = 'Ed\'s son can swim like a fish';

    console.log(str.match(someSymbols)); // [ "Ed's son" ]

To emphasize the difference again, notice what happens to the code above when we remove the ?. Instead of 'Ed\'s son', the longer possible match, 'Ed\'s son can', is returned:

    const someSymbols = /e.+n/gi;
    const str = 'Ed\'s son can swim like a fish';

    console.log(str.match(someSymbols)); // [ "Ed's son can" ]

Quantifier    Matches
+             matches a character 1 or more times in a row.
*             matches a character 0 or more times in a row.
|             matches one of two alternatives (the one on the left of | or the one on the right).
?             matches a character 0 or 1 times.
{}            matches an exact number of occurrences or a range of repeated characters.

Remember that quantifiers can be greedy and lazy.
If a quantifier has to choose between a longer and a shorter string that both satisfy the regular expression, a lazy quantifier goes for the shorter match, while a greedy one returns the longer string:

    const str = 'free from worries';
    const regexLazy = /fr.+?[es]/; // lazy quantifier
    const regexGreedy = /fr.+[es]/; // greedy quantifier

    console.log(str.match(regexLazy)); // [ 'free' ]
    console.log(str.match(regexGreedy)); // [ 'free from worries' ]

Before moving on to the tasks, let's take a two-question quiz to review the quantifiers themselves and the difference between greedy and lazy quantifiers.
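
As a further recap of {} and of greedy versus lazy matching, the following sketch applies both modes to inputs that are not from the lesson. The `text` and `digits` strings and all variable names are hypothetical, chosen only to make the difference easy to see.

```javascript
// Hypothetical input: a sentence with two quoted phrases.
const text = 'She said "yes" and then "no way"';

// Greedy: .+ takes as much as it can, so the match runs from the
// first quotation mark to the last one in the string.
const greedyQuotes = text.match(/".+"/g);
// [ '"yes" and then "no way"' ]

// Lazy: .+? takes as little as it can, so each quoted phrase
// becomes its own match.
const lazyQuotes = text.match(/".+?"/g);
// [ '"yes"', '"no way"' ]

// Hypothetical input: a run of seven digits.
const digits = '1234567';

const exactThree = digits.match(/\d{3}/g);     // [ '123', '456' ]
const greedyRange = digits.match(/\d{2,4}/g);  // [ '1234', '567' ]
const lazyRange = digits.match(/\d{2,4}?/g);   // [ '12', '34', '56' ]
const fiveOrMore = digits.match(/\d{5,}/g);    // [ '1234567' ]

console.log(greedyQuotes, lazyQuotes);
console.log(exactThree, greedyRange, lazyRange, fiveOrMore);
```

Note how the lazy range stops at the minimum of two digits each time, which leaves the final "7" unmatched because a single digit can't satisfy {2,4}.
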