Regex For Name Extraction On Text File
I've got a plain text file containing a list of authors and abstracts and I'm trying to extract just the author names to use for network analysis. My text follows this pattern and
Solution 1:
If you are trying to match the names, I would try to match the entire substring instead of part of it.
You could use the following regular expression and modify it if needed.
>>> import re
>>> # text is assumed to hold the contents of the file as a single string
>>> regex = re.compile(r'\b([A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+),')
>>> print(regex.findall(text))
['David L. Gallimore', 'Katherine Garduno', 'Russell C. Keller']
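For anyone who wants to run this end to end, here is a minimal self-contained sketch in Python 3. The sample text and the helper name extract_names are made up for illustration, since the exact layout of the original file isn't shown; the only thing assumed is that each author name is followed by a comma.

```python
import re

# Pattern from the answer above: First [M.] Last, with a trailing comma
# that sits outside the capturing group.
NAME_RE = re.compile(r'\b([A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+),')

def extract_names(text):
    """Return every 'First [M.] Last' name that is followed by a comma."""
    # With exactly one capturing group, findall returns that group's text.
    return NAME_RE.findall(text)

# Hypothetical sample; the real file may be laid out differently.
sample_text = (
    "David L. Gallimore, Katherine Garduno, Russell C. Keller, "
    "analytical chemistry group\n"
    "This abstract describes a measurement method.\n"
)

names = extract_names(sample_text)
print(names)
# ['David L. Gallimore', 'Katherine Garduno', 'Russell C. Keller']
```

Keep in mind that this pattern only handles plain capitalized names. Hyphenated or apostrophized surnames, multi-word surnames, and a final author who has no trailing comma would all be missed, and any other capitalized two-word phrase followed by a comma would be picked up as a name.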
Solution 2:
Try this one:
[A-Za-z]* ?([A-Za-z]+.) [A-Za-z]*(?:,+)
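It is meant to make the middle name optional, and it puts the comma in a non-capturing group so that the comma is not captured. Note that a non-capturing group still consumes the comma, so it remains part of the overall match; to leave it out of the match entirely, use a lookahead, (?=,), instead. Also, the unescaped . matches any character, not just a literal period (write \. for that), and since the pattern has one capturing group, re.findall would return only that group rather than the full name.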
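Here is a small sketch of the difference between keeping the comma out of a captured group and keeping it out of the match itself. It does not use the pattern above verbatim; as an assumption for illustration, it uses the stricter First [M.] Last shape from Solution 1, and the sample line is made up.

```python
import re

# Hypothetical sample line; the original file's layout is not shown.
sample = "David L. Gallimore, Katherine Garduno,"

# Non-capturing group: the comma is consumed, so it is part of the match.
with_group = re.compile(r"[A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+(?:,)")

# Lookahead: a comma must follow, but it is not included in the match.
with_lookahead = re.compile(r"[A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+(?=,)")

group_matches = [m.group(0) for m in with_group.finditer(sample)]
lookahead_matches = [m.group(0) for m in with_lookahead.finditer(sample)]

print(group_matches)      # ['David L. Gallimore,', 'Katherine Garduno,']
print(lookahead_matches)  # ['David L. Gallimore', 'Katherine Garduno']
```

With the lookahead version, the full match is already the clean name, so no capturing group is needed to strip the comma.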