
Python - Iterate Over A List Of Attributes

I have a feature in my data set that is a pandas timestamp object. It has (among many others) the following attributes: year, hour, dayofweek, month. I can create new features base

Solution 1:

Don't use .apply here. pandas has built-in utilities for handling datetime data; use the dt accessor on the Series:

In [10]: from datetime import datetime
    ...: import pandas as pd
    ...:

In [11]: start = datetime(2011, 1, 1)
    ...: end = datetime(2012, 1, 1)
    ...:

In [12]: df = pd.DataFrame({'data':pd.date_range(start, end)})

In [13]: df.dtypes
Out[13]:
data    datetime64[ns]
dtype: object

In [14]: df['year'] = df.data.dt.year

In [15]: df['hour'] = df.data.dt.hour

In [16]: df['month'] = df.data.dt.month

In [17]: df['dayofweek'] = df.data.dt.dayofweek

In [18]: df.head()
Out[18]:
        data  year  hour  month  dayofweek
0 2011-01-01  2011     0      1          5
1 2011-01-02  2011     0      1          6
2 2011-01-03  2011     0      1          0
3 2011-01-04  2011     0      1          1
4 2011-01-05  2011     0      1          2

Or, dynamically as you wanted using getattr:

In [24]: df = pd.DataFrame({'data': pd.date_range(start, end)})

In [25]: nomtimes = ["year", "hour", "month", "dayofweek"]

In [26]: df.head()
Out[26]:
        data
0 2011-01-01
1 2011-01-02
2 2011-01-03
3 2011-01-04
4 2011-01-05

In [27]: for t in nomtimes:
    ...:     df[t] = getattr(df.data.dt, t)
    ...:

In [28]: df.head()
Out[28]:
        data  year  hour  month  dayofweek
0 2011-01-01  2011     0      1          5
1 2011-01-02  2011     0      1          6
2 2011-01-03  2011     0      1          0
3 2011-01-04  2011     0      1          1
4 2011-01-05  2011     0      1          2

And if you must use a one-liner, go with:

In [30]: df = pd.DataFrame({'data': pd.date_range(start, end)})

In [31]: df.head()
Out[31]:
        data
0 2011-01-01
1 2011-01-02
2 2011-01-03
3 2011-01-04
4 2011-01-05

In [32]: df = df.assign(**{t: getattr(df.data.dt, t) for t in nomtimes})

In [33]: df.head()
Out[33]:
        data  dayofweek  hour  month  year
0 2011-01-01          5     0      1  2011
1 2011-01-02          6     0      1  2011
2 2011-01-03          0     0      1  2011
3 2011-01-04          1     0      1  2011
4 2011-01-05          2     0      1  2011

Solution 2:

You just need getattr():

df[i] = df["timeStamp"].apply(lambda x : getattr(x, i))
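The one-liner above only makes sense inside a loop over the attribute names. A minimal self-contained sketch, assuming a column called timeStamp and a name list called nomtimes as in the other answers (the sample dates are illustrative), might look like this:

```python
import pandas as pd

# Illustrative sample data; the column name "timeStamp" is an assumption
# borrowed from the other answers on this page.
df = pd.DataFrame(
    {"timeStamp": pd.to_datetime(["2018-05-05 15:00", "2015-01-30 11:00"])}
)
nomtimes = ["year", "hour", "month", "dayofweek"]

for i in nomtimes:
    # getattr(x, i) looks up the attribute whose name is stored in i,
    # so each pass through the loop adds one new column.
    df[i] = df["timeStamp"].apply(lambda x: getattr(x, i))

print(df)
```

This still calls a Python function once per row, so the dt accessor from Solution 1 is likely the better choice for large frames.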

Solution 3:

operator.attrgetter

You can extract attributes in a loop:

from operator import attrgetter

for i in nomtimes:
    df[i] = df['timeStamp'].apply(attrgetter(i))

Here's a complete example:

import pandas as pd
from operator import attrgetter

df = pd.DataFrame({'timeStamp': ['2018-05-05 15:00', '2015-01-30 11:00']})
df['timeStamp'] = pd.to_datetime(df['timeStamp'])

nomtimes = ['year', 'hour', 'month', 'dayofweek']

for i in nomtimes:
    df[i] = df['timeStamp'].apply(attrgetter(i))

print(df)

            timeStamp  year  hour  month  dayofweek
0 2018-05-05 15:00:00  2018    15      5          5
1 2015-01-30 11:00:00  2015    11      1          4

Your code does not work because dot syntax cannot take an attribute name from a variable. The expression does not substitute the string stored in i; it tries to access an attribute literally named i, as your first example demonstrates. To look up an attribute by name, use getattr or attrgetter instead.
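The difference can be seen on a single Timestamp. This is a small sketch with an illustrative value; the variable names ts, dot_failed and by_name are made up for the demonstration:

```python
import pandas as pd

ts = pd.Timestamp("2018-05-05 15:00")
i = "year"

# Dot syntax ignores the variable: ts.i asks for an attribute literally
# named "i", which a Timestamp does not have.
try:
    ts.i
    dot_failed = False
except AttributeError:
    dot_failed = True

# getattr uses the string stored in i, so this is equivalent to ts.year.
by_name = getattr(ts, i)

print(dot_failed, by_name)
```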

Getting rid of the for loop

You might ask if there's any way to extract all attributes from a datetime object in one go rather than sequentially. The benefit of attrgetter is you can specify multiple attributes directly to avoid a for loop altogether:

attributes = df['timeStamp'].apply(attrgetter(*nomtimes))
df[nomtimes] = pd.DataFrame(attributes.values.tolist())
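To see why this works, note that attrgetter called with several names returns one tuple per object, so apply yields a Series of tuples that can be expanded into columns. The sketch below uses illustrative data and, as an assumption that goes beyond the original snippet, a non-default index with index=df.index passed explicitly so the new columns line up with the existing rows:

```python
from operator import attrgetter

import pandas as pd

nomtimes = ["year", "hour", "month", "dayofweek"]
get_all = attrgetter(*nomtimes)

# With several names, attrgetter returns a tuple of values.
values = get_all(pd.Timestamp("2018-05-05 15:00"))  # (2018, 15, 5, 5)

# Illustrative frame with a non-default index (an assumption for this demo).
df = pd.DataFrame(
    {"timeStamp": pd.to_datetime(["2018-05-05 15:00", "2015-01-30 11:00"])},
    index=["a", "b"],
)

# apply produces a Series holding one tuple per row.
attributes = df["timeStamp"].apply(get_all)

# Expand the tuples into columns, reusing df's index so rows stay aligned.
expanded = pd.DataFrame(attributes.tolist(), index=df.index, columns=nomtimes)
df = df.join(expanded)

print(df)
```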

Using dt accessor instead of apply

But pd.Series.apply is just a thinly veiled loop, and often it's not necessary. Borrowing @juanpa.arrivillaga's idea, you can access attributes directly via the pd.Series.dt accessor:

attributes = pd.concat(attrgetter(*nomtimes)(df['timeStamp'].dt), axis=1, keys=nomtimes)
df = df.join(attributes)
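As a quick sanity check, the accessor-based version can be compared against the apply-based one. This is a sketch on illustrative data; via_dt, via_apply and same_values are names invented for the comparison, and values are compared through NumPy because the two approaches may produce different integer dtypes:

```python
from operator import attrgetter

import pandas as pd

df = pd.DataFrame(
    {"timeStamp": pd.to_datetime(["2018-05-05 15:00", "2015-01-30 11:00"])}
)
nomtimes = ["year", "hour", "month", "dayofweek"]

# Vectorised: attrgetter pulls several Series off the dt accessor at once.
via_dt = pd.concat(
    attrgetter(*nomtimes)(df["timeStamp"].dt), axis=1, keys=nomtimes
)

# Row by row: one Python call per element, for comparison only.
via_apply = pd.DataFrame(
    {name: df["timeStamp"].apply(attrgetter(name)) for name in nomtimes}
)

# Compare raw values so that dtype differences do not matter.
same_values = bool((via_dt.to_numpy() == via_apply.to_numpy()).all())
print(same_values)
```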
