Regex For Name Extraction On Text File

I've got a plain text file containing a list of authors and abstracts and I'm trying to extract just the author names to use for network analysis. My text follows this pattern and

Solution 1:

If you are trying to match the names, I would try to match the entire substring instead of part of it.

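You could use the following regular expression and modify it if needed. It matches a capitalized first name, an optional middle initial followed by a period, and a capitalized last name. The trailing comma is required but sits outside the capturing group, so it is not returned.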

>>> import re
>>> regex = re.compile(r'\b([A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+),')
>>> print(regex.findall(text))
['David L. Gallimore', 'Katherine Garduno', 'Russell C. Keller']
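
For anyone who wants to run this outside the interpreter, here is a minimal self-contained sketch. The question's actual file contents are not shown, so the `text` value (including the "Example Lab" affiliation) is a made-up stand-in that only assumes each name is followed by a comma.

```python
import re

# Hypothetical sample input; the real file layout is an assumption.
text = (
    "David L. Gallimore, Example Lab\n"
    "Katherine Garduno, Example Lab\n"
    "Russell C. Keller, Example Lab\n"
)

# First name, optional middle initial with a period, last name.
# The comma is required but sits outside the capturing group.
regex = re.compile(r'\b([A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+),')

# With exactly one capturing group, findall returns the group contents only.
names = regex.findall(text)
print(names)
```

Names with hyphens, apostrophes, or more than one middle initial would likely need a looser character class than `[A-Z][a-z]+`.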


Solution 2:

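Try this one:

[A-Za-z]+ (?:[A-Za-z]+\. )?[A-Za-z]+(?=,)

It makes the middle initial optional and uses a lookahead, (?=,), so the comma is required but left out of the match. Note that a non-capturing group such as (?:,+) would not do this: it still consumes the comma, so the comma stays in the overall match. The dot is escaped so that it matches only a literal period. Because [A-Za-z] also accepts lowercase letters, this pattern will match any two words before a comma, not just capitalized names.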
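
To check how the trailing comma is handled, here is a small sketch comparing a non-capturing group with a lookahead. Both patterns are modeled on the one above with the dot escaped (my adjustment), and the `sample` string is invented for illustration.

```python
import re

# Hypothetical sample; the real file layout is an assumption.
sample = "Katherine Garduno, Russell C. Keller, Example Lab"

# Non-capturing group: the comma is consumed and stays in the match.
with_group = re.compile(r"[A-Za-z]+ (?:[A-Za-z]+\. )?[A-Za-z]+(?:,)")

# Lookahead: the comma must follow the name but is not consumed.
with_lookahead = re.compile(r"[A-Za-z]+ (?:[A-Za-z]+\. )?[A-Za-z]+(?=,)")

# No capturing groups here, so findall returns the full matches.
group_matches = with_group.findall(sample)
lookahead_matches = with_lookahead.findall(sample)

print(group_matches)
print(lookahead_matches)
```

The first list keeps the comma on each name and the second does not, which is why the lookahead (or a capturing group around the name, as in Solution 1) is the safer choice when the results are used directly.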
