
Pandas DataFrame: Removing the First Row of Every Number

So, basically, I have a data frame whose first column looks like this: #1 #2 #2 #3 #3 #3 #3 #4 #4 #5. As you can see, the first column consists of randomly repeated numbers, and I want to remove the first row of every number.

Solution 1:

Let's assume you have a dataframe named df with two columns.

Setup

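import numpy as np
import pandas as pd

# One label per line, as in the question
col1 = """#1
#2
#2
#3
#3
#3
#3
#4
#4
#5""".splitlines()

# col2 is just a constant filler column
df = pd.DataFrame(dict(col1=col1, col2=3.14))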

df

  col1  col2
0   #1  3.14
1   #2  3.14
2   #2  3.14
3   #3  3.14
4   #3  3.14
5   #3  3.14
6   #3  3.14
7   #4  3.14
8   #4  3.14
9   #5  3.14

Solution

We can use NumPy's unique function with return_index set to True. For each unique value, it returns the position of that value's first occurrence. We then use those positions to look up the matching index labels and drop those rows. Finally, str[1:] removes the leading # from col1.

_, i = np.unique(df.col1.values, return_index=True)
df.drop(df.index[i]).assign(col1=lambda d: d.col1.str[1:])

  col1  col2
2    2  3.14
4    3  3.14
5    3  3.14
6    3  3.14
8    4  3.14
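
To see what return_index actually hands back, here is a minimal sketch on the same ten labels. The names values, first_idx and remaining are only illustrative, and np.setdiff1d is used here just to show which positions are left after the drop; it is not part of the answer above.

```python
import numpy as np

values = np.array(["#1", "#2", "#2", "#3", "#3", "#3", "#3", "#4", "#4", "#5"])

# uniques is sorted; first_idx[k] is the position where uniques[k] first appears
uniques, first_idx = np.unique(values, return_index=True)

# Positions that survive once the first occurrence of each label is dropped
remaining = np.setdiff1d(np.arange(len(values)), first_idx)

print(first_idx)   # positions of the first occurrences
print(remaining)   # positions that are kept
```

Note that np.unique sorts the unique values, so first_idx follows the sorted order of the labels rather than their order of appearance. That does not matter here because the positions are only used to drop rows.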

Solution 2:

Use duplicated with boolean indexing to keep only the rows that repeat an earlier value, then remove the # either by position with str[1:] or with str.strip:

print (df)
    a
0  #1
1  #2
2  #2
3  #3
4  #3
5  #3
6  #3
7  #4
8  #4
9  #5

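# Selecting a single column returns a Series, so store it under a new name
# and leave df intact for the alternatives below
s = df.loc[df['a'].duplicated(), 'a'].str[1:]
print (s)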
2    2
4    3
5    3
6    3
8    4
Name: a, dtype: object

Or:

s = df.loc[df['a'].duplicated(), 'a'].str.strip('#')
print (s)
2    2
4    3
5    3
6    3
8    4
Name: a, dtype: object
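
The two variants give the same result on this data, but they are not equivalent in general. A small sketch of the difference, using a made-up Series (the values are assumptions chosen only to show the edge cases):

```python
import pandas as pd

s = pd.Series(['#2', '##7#', '42'])

# str[1:] always drops the first character, whatever it is
by_position = s.str[1:]

# str.strip('#') drops every leading and trailing '#', and nothing else
by_strip = s.str.strip('#')

print(by_position.tolist())
print(by_strip.tolist())
```

If every value is guaranteed to start with exactly one #, either works. If some values might lack the prefix, str.strip('#') is the safer choice because it leaves them unchanged.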

Detail:

print (df['a'].duplicated())
0    False
1    False
2     True
3    False
4     True
5     True
6     True
7    False
8     True
9    False
Name: a, dtype: bool

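EDIT:

To keep the result as a DataFrame instead of a Series:

# .copy() avoids a SettingWithCopyWarning on the assignment below
df = df[df['a'].duplicated()].copy()
df['a'] = df['a'].str.strip('#')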
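
Putting the pieces together, a self-contained sketch might look like the following. The helper name drop_first_occurrences is hypothetical and not part of pandas; it only wraps the duplicated and str.strip calls used above.

```python
import pandas as pd

def drop_first_occurrences(frame, column):
    """Hypothetical helper: drop the first row of each value in `column`."""
    # duplicated() is False for the first occurrence and True for later repeats
    mask = frame[column].duplicated()
    out = frame[mask].copy()  # copy so the assignment below does not warn
    out[column] = out[column].str.strip('#')
    return out

df = pd.DataFrame({'a': ['#1', '#2', '#2', '#3', '#3', '#3', '#3', '#4', '#4', '#5']})
result = drop_first_occurrences(df, 'a')
print(result)
```

One caveat: duplicated flags every later repeat of a value anywhere in the column. If the same number can come back in a separate, non-adjacent run and you want to drop the first row of each run, a mask such as frame[column].eq(frame[column].shift()) may be closer to what you need.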
