Most Efficient Way To Un-dummy Variables In Pandas Df
So in the screenshot below, we have 3 different energy sites: ID01, ID18, and ID31. They are stored as dummy (one-hot) variables, and for visualization purposes I want to just create
Solution 1:
Setup
import pandas as pd

# three dummy (one-hot) site columns plus two constant columns A and B
data = pd.DataFrame([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0]
], columns=['ID01', 'ID18', 'ID31']).assign(A=1, B=2)
data
   ID01  ID18  ID31  A  B
0     1     0     0  1  2
1     0     1     0  1  2
2     0     0     1  1  2
3     1     0     0  1  2
4     0     1     0  1  2
dot product with strings and objects
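This works if these are truly dummy values (0 or 1, with exactly one 1 per row). Multiplying a string by 1 returns the string and multiplying it by 0 returns an empty string, so each row sums to the single matching column name. A row with two 1s would give concatenated names, and a row of all zeros an empty string.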
def undummy(d):
    # 1 * 'ID01' -> 'ID01' and 0 * 'ID18' -> '', so each row sums to its site name
    return d.dot(d.columns)
data.assign(Site=data.filter(regex='^ID').pipe(undummy))
   ID01  ID18  ID31  A  B  Site
0     1     0     0  1  2  ID01
1     0     1     0  1  2  ID18
2     0     0     1  1  2  ID31
3     1     0     0  1  2  ID01
4     0     1     0  1  2  ID18
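To see why the values must be true dummies, here is a small sketch that runs the same dot-product undummy on made-up rows that break the assumption (two 1s in one row, none in another). The undummy_strict helper is hypothetical, not part of pandas: it checks that every value is 0 or 1 and that each row contains exactly one 1 before using the dot product.

```python
import pandas as pd

def undummy(d):
    # dot product of 0/1 values with the column names:
    # 1 * 'ID01' -> 'ID01', 0 * 'ID18' -> '', and the strings are summed per row
    return d.dot(d.columns)

def undummy_strict(d):
    # Hypothetical helper: reject input that is not one-hot before using dot
    if not d.isin([0, 1]).all().all():
        raise ValueError("values must be 0 or 1")
    if not (d.sum(axis=1) == 1).all():
        raise ValueError("each row must contain exactly one 1")
    return undummy(d)

# Made-up rows that violate the assumption: two 1s, then no 1s, then a valid row
bad = pd.DataFrame([[1, 1, 0], [0, 0, 0], [0, 0, 1]],
                   columns=['ID01', 'ID18', 'ID31'])

print(undummy(bad).tolist())  # ['ID01ID18', '', 'ID31']
try:
    undummy_strict(bad)
except ValueError as err:
    print(err)  # each row must contain exactly one 1
```

The plain version silently returns 'ID01ID18' and an empty string; the checked version fails loudly instead, at the cost of two extra passes over the data.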
argmax slicing
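This works, but it can produce unexpected results if the data is not as represented in the question. argmax returns the position of the first maximum in each row, so a row of all zeros is labeled with the first column (ID01), and a row with several 1s gets only the first matching column.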
def undummy(d):
    # position of each row's maximum (the 1), mapped back to its column name
    return d.columns[d.values.argmax(1)]
data.assign(Site=data.filter(regex='^ID').pipe(undummy))
   ID01  ID18  ID31  A  B  Site
0     1     0     0  1  2  ID01
1     0     1     0  1  2  ID18
2     0     0     1  1  2  ID31
3     1     0     0  1  2  ID01
4     0     1     0  1  2  ID18
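The argmax caveat is easy to reproduce. The sketch below, using made-up rows, shows an all-zero row being labeled ID01 and a row with two 1s keeping only ID18. The undummy_or_nan variant is a hypothetical helper, not a pandas function: it keeps the argmax label only where the row maximum is 1 and returns NaN otherwise. It does not detect rows with several 1s.

```python
import pandas as pd

def undummy(d):
    # column position of each row's maximum, mapped back to a column name
    return d.columns[d.values.argmax(1)]

def undummy_or_nan(d):
    # Hypothetical variant: rows with no 1 get NaN instead of the first column
    labels = pd.Series(undummy(d).to_numpy(), index=d.index)
    return labels.where(d.values.max(axis=1) == 1)

# Made-up rows: an all-zero row and a row with two 1s
odd = pd.DataFrame([[0, 0, 0], [0, 1, 1]], columns=['ID01', 'ID18', 'ID31'])

print(list(undummy(odd)))            # ['ID01', 'ID18']
print(undummy_or_nan(odd).tolist())  # first entry is NaN, second is 'ID18'
```

If you are on pandas 1.5 or later, pd.from_dummies is another option: with its default settings it raises a ValueError for rows that have no 1 or more than one 1, rather than guessing a label.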