Combining Regular Expressions In Python - \w And \s

January 21, 2024 Post a Comment

I want my code to only return the special characters ['.', '*', '=', ','] I want to remove all digits/alphabetical characters ('\W') and all white spaces ('\S') import re original

Solution 1:

You were close, but you need to specify these escape sequences inside a character class.

re.findall(r'[^\w\s]', original_string)
# ['.', '*', '=', ',']

Note that the caret ^ indicates negation (i.e., don't match these characters).

Alternatively, instead of removing what you don't need, why not extract what you do?

re.findall(r'[.*=,]', original_string)
# ['.', '*', '=', ',']

Solution 2:

Here, we can also add our desired special chars in a [], swipe everything else, and then collect only those chars:

([\s\S].*?)([.*=,])?

Python Test

# coding=utf8# the above tag defines encoding for this document and is for Python 2.x compatibilityimport re

regex = r"([\s\S].*?)([.*=,])?"

test_str = "John is happy. He owns 3*4=12, apples"

subst = "\\2"# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)

if result:
    print (result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

JavaScript Demo

const regex = /([\s\S].*?)([.*=,])?/gm;
const str = `John is happy. He owns 3*4=12, apples`;
const subst = `$2`;

// The substituted value will be contained in the result variableconst result = str.replace(regex, subst);

console.log('Substitution result: ', result);

RegEx

If this wasn't our desired expression, we can modify/change it in regex101.com.

RegEx Circuit

We can also visualize expressions in jex.im:

Demo

Solution 3:

The regular expression \W\S matches a sequence of two characters; one non-word, and one non-space. If you want to combine them, that's [^\w\s] which matches one character which does not belong to either the word or the whitespace group.

However, there are many characters which are not one of the ones you enumerate which match this expression. If you want to remove characters which are not in your set, the character class containing exactly all those characters is simply [^.*=,]

Perhaps it's worth noting that inside [...] you don't need to (and in fact should not) backslash-escape e.g. the literal dot. By default, a character class cannot match a newline character, though there is an option re.DOTALL to change this.

If you are trying to extract and parse numerical expressions, regex can be a useful part of the lexical analysis, but you really want a proper parser.

Introduction to Python Course