
BeautifulSoup - Extracting Texts Within One Class

I'm trying to extract the texts from the webpage below:
Category1: Text1 I want > Category2:

Solution 1:

Since the id of an element is unique, you can find the first <a> tag using id="category1". To find the <a> tag that follows it, you can use the find_next() method.

from bs4 import BeautifulSoup

html = '''<div class="MYCLASS">Category1: <a id=category1 href="SomeURL" >Text1 I want</a> &gt; Category2: <a href="SomeURL" >Text2 I want</a></div>'''
soup = BeautifulSoup(html, 'lxml')

a_tag1 = soup.find('a', id='category1')
print(a_tag1)    # or use `a_tag1.text` to get the text
a_tag2 = a_tag1.find_next('a')
print(a_tag2)

Output:

<a href="SomeURL" id="category1">Text1 I want</a>
<a href="SomeURL">Text2 I want</a>
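
If you only want the texts rather than the whole tags, the find() / find_next() idea above could be wrapped in a small helper. This is just a sketch, not part of the original answer: the name category_texts is made up for illustration, and it uses the built-in "html.parser" instead of "lxml" only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

HTML = '''<div class="MYCLASS">Category1: <a id=category1 href="SomeURL" >Text1 I want</a> &gt; Category2: <a href="SomeURL" >Text2 I want</a></div>'''


def category_texts(html, first_id="category1"):
    """Return the text of the <a> with the given id and of the next <a> after it.

    `category_texts` and `first_id` are hypothetical names used for illustration.
    """
    soup = BeautifulSoup(html, "html.parser")
    first = soup.find("a", id=first_id)
    if first is None:
        # No tag with that id, so there is nothing to extract
        return []
    texts = [first.get_text(strip=True)]
    second = first.find_next("a")  # next <a> in document order
    if second is not None:
        texts.append(second.get_text(strip=True))
    return texts


print(category_texts(HTML))  # ['Text1 I want', 'Text2 I want']
```

The None checks matter because find() and find_next() return None when nothing matches, and calling .text on None raises an AttributeError.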


Solution 2:

You need to change your code a little.

from bs4 import BeautifulSoup

html = '''<div class="MYCLASS">Category1: <a id=category1 href="SomeURL" >Text1 I want</a> &gt; Category2: <a href="SomeURL" >Text2 I want</a></div>'''
soup = BeautifulSoup(html, "lxml")

# Search inside each matching div, not the whole document
for div in soup.find_all('div', class_='MYCLASS'):
    for url in div.find_all('a'):
        print(url.text.strip())
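
The nested loops above can also be written as a single CSS selector with select(). The snippet below is a sketch under a couple of assumptions: the second div with class OTHER is invented here to show that links outside MYCLASS are skipped, and "html.parser" is used in place of "lxml" so that nothing extra has to be installed.

```python
from bs4 import BeautifulSoup

# The div with class "OTHER" is a made-up addition for illustration only
html = (
    '<div class="MYCLASS">Category1: <a id="category1" href="SomeURL">Text1 I want</a>'
    ' &gt; Category2: <a href="SomeURL">Text2 I want</a></div>'
    '<div class="OTHER"><a href="SomeURL">Unrelated</a></div>'
)
soup = BeautifulSoup(html, "html.parser")

# "div.MYCLASS a" matches every <a> that is a descendant of a div with class MYCLASS
texts = [a.get_text(strip=True) for a in soup.select("div.MYCLASS a")]
print(texts)  # ['Text1 I want', 'Text2 I want']
```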

This loop does not rely on the id, so you can remove the id from the 'a' tag and run the same code.

If you need the text of specific ids, you need to know those ids in advance.

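ids = ['id1', 'id2']  # placeholders: replace with the ids you are looking for
for div in soup.find_all('div', class_='MYCLASS'):
    for tag_id in ids:
        # Search inside the current div only
        for url in div.find_all('a', id=tag_id):
            print(url.text.strip())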
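
If you want to keep track of which text belongs to which id, one option is to collect the results in a dict. This is only a sketch: the id "category2" and the id "missing" are hypothetical (the original markup only has category1), and "html.parser" is used so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# "category2" is a hypothetical id added for illustration
html = (
    '<div class="MYCLASS">Category1: <a id="category1" href="SomeURL">Text1 I want</a>'
    ' &gt; Category2: <a id="category2" href="SomeURL">Text2 I want</a></div>'
)
soup = BeautifulSoup(html, "html.parser")

wanted_ids = ["category1", "category2", "missing"]  # "missing" matches nothing
text_by_id = {}
for div in soup.find_all("div", class_="MYCLASS"):
    for wanted in wanted_ids:
        tag = div.find("a", id=wanted)
        if tag is not None:  # skip ids that are not present in this div
            text_by_id[wanted] = tag.get_text(strip=True)

print(text_by_id)  # {'category1': 'Text1 I want', 'category2': 'Text2 I want'}
```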
