BeautifulSoup - Fetching Text Either Side Of A Br Tag
I have unfortunately become stuck on the following problem: an a element contains 'TEXT ONE' and 'TEXT TWO' separated by a br tag, and I need text one and text two separately.
Solution 1:
I would avoid relying on the presence of the br element and would instead locate all the text nodes inside the a element:
In [1]: from bs4 import BeautifulSoup
In [2]: html = """ <a href="someurl">
...: "TEXT ONE"
...: <br>
...: "TEXT TWO"
...: </a>"""
In [3]: soup = BeautifulSoup(html, "html.parser")
In [4]: [item.strip() for item in soup.a(text=True)]
Out[4]: ['"TEXT ONE"', '"TEXT TWO"']
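Note that a(text=True) is a short version of a.find_all(text=True). In Beautiful Soup 4.4.0 and later the same argument is also available under the name string, so a.find_all(string=True) is equivalent.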
You can, of course, unpack it into separate variables if needed:
In [5]: text_one, text_two = [item.strip() for item in soup.a(text=True)]
In [6]: text_one
Out[6]: '"TEXT ONE"'
In [7]: text_two
Out[7]: '"TEXT TWO"'
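For reference, here is the same approach written as a plain script rather than an IPython session. It is a minimal sketch that assumes Beautiful Soup 4.4.0 or later, where the argument is spelled string. The name texts is only illustrative, and the whitespace filter is an addition for markup that contains blank text nodes.

```python
from bs4 import BeautifulSoup

html = """<a href="someurl">
"TEXT ONE"
<br>
"TEXT TWO"
</a>"""

soup = BeautifulSoup(html, "html.parser")

# string=True matches every text node inside the a element;
# nodes that are empty after stripping are skipped.
texts = [node.strip() for node in soup.a.find_all(string=True) if node.strip()]

# Unpacking assumes exactly two non-empty text nodes (an assumption about the markup).
text_one, text_two = texts
print(text_one)
print(text_two)
```

The unpacking raises a ValueError if the a element holds more or fewer than two non-empty text nodes, which can be a useful early warning when the page layout changes.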
Solution 2:
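You could use the .previousSibling and .nextSibling attributes after finding the br tag (in bs4 these are spelled .previous_sibling and .next_sibling; the camelCase names are the older Beautiful Soup 3 spellings). The snippet assumes container is the parsed element that holds the a tag: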
>>> container.a.find("br").previousSibling
' \n"TEXT ONE"\n '
>>> container.a.find("br").nextSibling
'\n "TEXT TWO"\n '
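Since container is not defined in the snippet above, here is a self-contained sketch of the same idea. The div wrapper, the helper name sibling_text, and the choice of the bs4-style names previous_sibling and next_sibling are assumptions made for illustration. The helper returns None when the neighbour of the br tag is not a text node.

```python
from bs4 import BeautifulSoup, NavigableString

# Assumed markup: the a element from the question wrapped in a div.
html = """<div><a href="someurl">
"TEXT ONE"
<br>
"TEXT TWO"
</a></div>"""


def sibling_text(node):
    # Hypothetical helper: stripped text for a text node,
    # None for a tag or a missing sibling.
    if isinstance(node, NavigableString):
        return node.strip()
    return None


container = BeautifulSoup(html, "html.parser").div
br = container.a.find("br")

text_one = sibling_text(br.previous_sibling)
text_two = sibling_text(br.next_sibling)
print(text_one)
print(text_two)
```

This approach depends on the br tag being present and on its direct neighbours being text nodes, so it is more fragile than collecting every text node inside the a element.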
Solution 3:
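You can do the same in several ways. Here is another one, which iterates over the .strings generator of each a element and collapses the whitespace in every string: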
from bs4 import BeautifulSoup
content='''
<a href="someurl">
"TEXT ONE"
<br>
"TEXT TWO"
</a>
'''
soup = BeautifulSoup(content, 'lxml')  # requires the lxml package
for items in soup.select('a'):
    # split() + join() collapses runs of whitespace and trims both ends
    elem = [' '.join(item.split()) for item in items.strings]
    print(elem)
Output:
['"TEXT ONE"', '"TEXT TWO"']
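If only leading and trailing whitespace needs removing, the stripped_strings generator may be a shorter route. This is a minimal sketch on the same markup. It uses html.parser and find_all so that lxml is not required, and results is just an illustrative name.

```python
from bs4 import BeautifulSoup

content = """
<a href="someurl">
"TEXT ONE"
<br>
"TEXT TWO"
</a>
"""

soup = BeautifulSoup(content, "html.parser")

# stripped_strings yields each text node with surrounding whitespace removed
# and skips nodes that contain only whitespace.
results = [list(link.stripped_strings) for link in soup.find_all("a")]
print(results)
```

Unlike the split/join version, stripped_strings leaves whitespace inside a string untouched, so multi-line text nodes keep their internal line breaks.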