Pandas DataFrame: Removing the First Row of Every Number
So, basically, I have a data frame whose first column looks like this:
#1
#2
#2
#3
#3
#3
#3
#4
#4
#5
As you can see, the first column consists of randomly repeated numbers, and I want to remove the first row of every number.
Solution 1:
Let's assume you have a DataFrame named df with two columns.
Setup
import numpy as np
import pandas as pd

col1 = """#1
#2
#2
#3
#3
#3
#3
#4
#4
#5""".splitlines()
df = pd.DataFrame(dict(col1=col1, col2=3.14))
df
  col1  col2
0   #1  3.14
1   #2  3.14
2   #2  3.14
3   #3  3.14
4   #3  3.14
5   #3  3.14
6   #3  3.14
7   #4  3.14
8   #4  3.14
9   #5  3.14
Solution
We can use NumPy's unique function with return_index set to True. That returns the position of the first occurrence of each unique value. We then use those positions to look up the corresponding index labels and drop those rows.
_, i = np.unique(df.col1.values, return_index=True)
df.drop(df.index[i]).assign(col1=lambda d: d.col1.str[1:])
  col1  col2
2    2  3.14
4    3  3.14
5    3  3.14
6    3  3.14
8    4  3.14
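To see exactly which positions np.unique hands back, here is a small sketch on the same values as a plain NumPy array. It also shows that the unique values come back sorted as strings (so "#10" sorts before "#2"); that does not affect the answer above, because only the first-occurrence positions are used for dropping.

```python
import numpy as np

values = np.array(["#1", "#2", "#2", "#3", "#3", "#3", "#3", "#4", "#4", "#5"])

# return_index=True adds a second array: the position of each value's first occurrence
uniq, first_pos = np.unique(values, return_index=True)
print(uniq.tolist())       # ['#1', '#2', '#3', '#4', '#5']
print(first_pos.tolist())  # [0, 1, 3, 7, 9]

# Uniques are sorted lexicographically, but positions still point at first occurrences
uniq2, pos2 = np.unique(np.array(["#2", "#10", "#2"]), return_index=True)
print(uniq2.tolist())  # ['#10', '#2']
print(pos2.tolist())   # [1, 0]
```

Dropping positions 0, 1, 3, 7 and 9 leaves rows 2, 4, 5, 6 and 8, which matches the output above.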
Solution 2:
Use duplicated with boolean indexing, then remove the # either by position with str[1:] or with str.strip:
print (df)
a
0 #1
1 #2
2 #2
3 #3
4 #3
5 #3
6 #3
7 #4
8 #4
9 #5
s = df.loc[df['a'].duplicated(), 'a'].str[1:]
print (s)
2 2
4 3
5 3
6 3
8 4
Name: a, dtype: object
Or:
s = df.loc[df['a'].duplicated(), 'a'].str.strip('#')
print (s)
2 2
4 3
5 3
6 3
8 4
Name: a, dtype: object
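The two variants agree on this data, but they are not interchangeable in general: str[1:] drops the first character whatever it is, while str.strip('#') removes '#' from both ends and leaves values without a '#' alone. A minimal sketch comparing them (the "odd" values are made up purely for illustration); str.lstrip('#') would be the choice if only leading '#' characters should go.

```python
import pandas as pd

df = pd.DataFrame({"a": ["#1", "#2", "#2", "#3", "#3", "#3", "#3", "#4", "#4", "#5"]})
mask = df["a"].duplicated()  # True for every repeat after the first occurrence

by_slice = df.loc[mask, "a"].str[1:]         # drop the first character unconditionally
by_strip = df.loc[mask, "a"].str.strip("#")  # drop '#' from both ends only
print(by_slice.equals(by_strip))             # True for this data

# Hypothetical values where the two approaches diverge
odd = pd.Series(["#7#", "7"])
print(odd.str[1:].tolist())         # ['7#', '']
print(odd.str.strip("#").tolist())  # ['7', '7']
```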
Detail:
print (df['a'].duplicated())
0    False
1    False
2     True
3    False
4     True
5     True
6     True
7    False
8     True
9    False
Name: a, dtype: bool
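The mask works because duplicated defaults to keep='first': the first occurrence of each value is False and every later repeat is True, so selecting the True rows is exactly "drop the first row of every number". A short sketch on a smaller Series, with keep='last' shown only for contrast (it would flag every occurrence except the last one instead):

```python
import pandas as pd

s = pd.Series(["#1", "#2", "#2", "#3", "#3"])

# keep="first" (the default): first occurrence False, later repeats True
first_mask = s.duplicated()
# keep="last": every occurrence except the last one is True
last_mask = s.duplicated(keep="last")

print(first_mask.tolist())     # [False, False, True, False, True]
print(last_mask.tolist())      # [False, True, False, True, False]
print(s[first_mask].tolist())  # ['#2', '#3']
```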
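EDIT: If you need to keep a DataFrame (with all its columns) instead of a Series, filter the rows first and then strip the #. The .copy() makes the filtered result an independent frame, so the assignment does not raise pandas' SettingWithCopyWarning:
df = df[df['a'].duplicated()].copy()
df['a'] = df['a'].str.strip('#')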
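The same DataFrame result can also be written as one chained expression with assign, which returns a new frame rather than assigning into a filtered slice (the pattern behind pandas' SettingWithCopyWarning). A minimal sketch; the column b is a hypothetical extra column added only to show that other columns survive the filter.

```python
import pandas as pd

# 'b' is a hypothetical second column, included only to show it is kept
df = pd.DataFrame({
    "a": ["#1", "#2", "#2", "#3", "#3", "#3", "#3", "#4", "#4", "#5"],
    "b": list(range(10)),
})

# Keep the repeated rows, then replace 'a' in the new frame returned by assign
out = df[df["a"].duplicated()].assign(a=lambda d: d["a"].str.strip("#"))
print(out)
```

The original df is left untouched, which can be handy if you still need it afterwards.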