Python/pandas Regex For A Wide Variety Of Dates

I need to extract dates in a wide variety of formats from a text file using Python. The formats that must be extracted correctly are the ones listed at the top of the Input sample below.

Solution 1:

Code

\d+/\d+(?:/\d+)?|(?:\d+ )?(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)[.,]?(?:-\d+-\d+| \d+(?:th|rd|st|nd)?,? \d+| \d+)|\d{4}
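A minimal sketch of applying this pattern from Python with the standard `re` module. The names `DATE_PATTERN` and `sample` are illustrative and not part of the original answer; the pattern itself is copied unchanged and only split across adjacent string literals for readability.

```python
import re

# Same pattern as above; adjacent raw strings are concatenated by Python.
DATE_PATTERN = re.compile(
    r"\d+/\d+(?:/\d+)?"
    r"|(?:\d+ )?(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?"
    r"|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
    r"[.,]?(?:-\d+-\d+| \d+(?:th|rd|st|nd)?,? \d+| \d+)"
    r"|\d{4}"
)

# Hypothetical sample line covering several of the required formats.
sample = "04/20/09; Mar-20-2009; 20 March, 2009; Mar 21st, 2009; Feb 2009; 6/2008; 2010"

# The pattern has no capturing groups, so findall returns the full matches.
matches = DATE_PATTERN.findall(sample)
print(matches)
# ['04/20/09', 'Mar-20-2009', '20 March, 2009', 'Mar 21st, 2009', 'Feb 2009', '6/2008', '2010']
```

The order of the alternatives matters: the slash and month-name forms are tried before the bare `\d{4}`, so a year that belongs to a longer date is consumed as part of that date rather than matched on its own.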

Results

Input

04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009
Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009
Feb 2009; Sep 2009; Oct 2010 (shall be parsed to 02/01/2009, 09/01/2009 etc)
6/2008; 12/2009 (shall be parsed to 06/01/2008 etc).
2009; 2010 (shall be parsed to 01/01/2009 and 01/01/2010)
AFeb 1977: Symmes Hospital\n
NV fire fighter died Sep 2007 while working. Was friend from deployment to San Marino and trainings for years prior. Still troubling to pt. Didn't go to his funeral. Spiritual/Religion: 's Cathy Bowers is a 50 yo single Caucasian female who presents to the ANH Eating Disorders Department for an evaluation and treatment recommendations for low weight. She shared that she has recently lost a great deal of weight and is having difficulty meeting her calorie needs due to difficulties with gagging/swallowing, and aversions to specific food textures. Specifically, since May 2012, she has lost 18 lbs, going from 128 lbs (BMI = 19.5, normal range) to 110.2 lbs (BMI = 16.8, underweight range) at a height of 5\'8" tall. She has had amenorrhea for 2 months. Her current weight is her lowest since high school, when she was a model and weighed 98 lbs (BMI = 14.9, underweight range). At that time, she had amenorrhea, felt pressure to be thin in order to keep her job, and most likely met criteria for frank anorexia nervosa nervosa-restricting type.

Output

Only the matches are shown below.

04/20/2009
04/20/09
4/20/09
4/3/09
Mar-20-2009
Mar 20, 2009
March 20, 2009
Mar. 20, 2009
Mar 20 2009
20 Mar 2009
20 March 2009
20 Mar. 2009
20 March, 2009
Mar 20th, 2009
Mar 21st, 2009
Mar 22nd, 2009
Feb 2009
Sep 2009
Oct 2010
02/01/2009
09/01/2009
6/2008
12/2009
06/01/2008
2009
2010
01/01/2009
01/01/2010
Feb 1977
Sep 2007
May 2012
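The input notes that month-only and year-only dates should be parsed to the first day of the month or year. The regex only extracts the text, so a separate normalization step is needed. The following is one possible sketch using `datetime.strptime`, not part of the original answer: the `normalize` helper and the `FORMATS` list are hypothetical names, month-first order is assumed for slash dates, and English month names (the default C locale) are assumed.

```python
import re
from datetime import datetime

# Candidate formats, tried in order. strptime fills a missing day or month
# with 1, which gives the "first of the month/year" behaviour.
FORMATS = [
    "%m/%d/%Y", "%m/%d/%y", "%m/%Y",
    "%b %d %Y", "%B %d %Y",
    "%d %b %Y", "%d %B %Y",
    "%b %Y", "%B %Y",
    "%Y",
]

def normalize(match):
    """Return a datetime for one regex match, or None if no format fits."""
    # Drop ordinal suffixes: "21st" -> "21".
    cleaned = re.sub(r"(?<=\d)(?:st|nd|rd|th)\b", "", match)
    # Treat '.', ',' and '-' as separators, then collapse whitespace.
    cleaned = " ".join(re.sub(r"[.,-]", " ", cleaned).split())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None

for raw in ["Feb 2009", "6/2008", "2009", "Mar 21st, 2009"]:
    print(raw, "->", normalize(raw).strftime("%m/%d/%Y"))
# Feb 2009 -> 02/01/2009
# 6/2008 -> 06/01/2008
# 2009 -> 01/01/2009
# Mar 21st, 2009 -> 03/21/2009
```

Trying explicit formats keeps the behaviour predictable: a string that fits none of them comes back as `None` instead of being guessed at.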

Explanation

  • Match any one of the following alternatives
    • \d+/\d+(?:/\d+)? Match one or more digits, then /, then one or more digits, optionally followed by another / and one or more digits
    • (?:\d+ )?(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)[.,]?(?:-\d+-\d+| \d+(?:th|rd|st|nd)?,? \d+| \d+) Match an optional run of digits followed by a space (the day), then a month name or its short form, then an optional dot . or comma ,, then one of: - digits - digits; or a space and digits with an optional th, rd, st, or nd and an optional comma, then a space and more digits; or a space followed by one or more digits
    • \d{4} Match exactly four digits. This is meant for standalone years, but it will also match four digits inside a longer number, so you may need to adapt it to your needs. Adding word boundaries, as in \b\d{4}\b, might be a good first step.
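The effect of the suggested word boundaries can be checked with a small comparison. The sample string and variable names here are made up for illustration.

```python
import re

# Hypothetical text with a long number that is not a year.
text = "Patient ID 1234567, seen in 2009."

loose = re.findall(r"\d{4}", text)       # bare four-digit match
strict = re.findall(r"\b\d{4}\b", text)  # four digits as a whole token

print(loose)   # ['1234', '2009']
print(strict)  # ['2009']
```

Note that `\b` only rules out adjacent letters, digits, and underscores, so a four-digit token such as a house number would still match.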
