Fetching Lawyers Details From A Set Of Urls Using Bs4 In Python
I am an absolute beginner at web scraping with Python and know very little about programming in Python. I am just trying to extract information about the lawyers listed for Tennessee on superlawyers.com.
Solution 1:
You need to extract those details by visiting each lawyer's page and using the appropriate selectors. Something like:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

records = []
final = []

with requests.Session() as s:
    # Start at the Tennessee landing page and collect the city links
    res = s.get('https://attorneys.superlawyers.com/tennessee/', headers={'User-agent': 'Super Bot 9000'})
    soup = bs(res.content, 'lxml')
    cities = [item['href'] for item in soup.select('#browse_view a')]

    for c in cities:
        # Each city page lists practice-area categories in the second column
        r = s.get(c)
        s1 = bs(r.content, 'lxml')
        categories = [item['href'] for item in s1.select('.three_browse_columns:nth-of-type(2) a')]

        for c1 in categories:
            # Each category page links to individual lawyer profiles;
            # some hrefs are redirect URLs with the real link after a '*'
            r1 = s.get(c1)
            s2 = bs(r1.content, 'lxml')
            lawyers = [item['href'].split('*')[1] if '*' in item['href'] else item['href']
                       for item in s2.select('.indigo_text .directory_profile')]
            final.append(lawyers)

    # Flatten the nested lists and de-duplicate the profile URLs
    final_list = {item for sublist in final for item in sublist}

    for link in final_list:
        # Visit each profile and pull out the fields of interest
        r = s.get(link)
        soup = bs(r.content, 'lxml')
        name = soup.select_one('#lawyer_name').text
        firm = soup.select_one('#firm_profile_page').text
        address = ' '.join([string for string in soup.select_one('#poap_postal_addr_block').stripped_strings][1:])
        practices = ' '.join([item.text for item in soup.select('#pa_list li')])
        row = [name, firm, address, practices]
        records.append(row)

df = pd.DataFrame(records, columns=['Name', 'Firm', 'Address', 'Practices'])
print(df)
df.to_csv(r'C:\Users\User\Desktop\Lawyers.csv', sep=',', encoding='utf-8-sig', index=False)
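One caveat: `select_one` returns `None` when a selector matches nothing, so `.text` will raise an `AttributeError` on any profile page that is missing one of these elements. A small helper that falls back to a default keeps the loop from crashing partway through; `safe_text` below is my own suggested name, not part of the original answer, and the `#lawyer_name` / `#firm_profile_page` selectors are the same ones used above:

```python
from bs4 import BeautifulSoup as bs

def safe_text(soup, selector, default=''):
    """Return the stripped text of the first element matching selector,
    or `default` if the page has no such element."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else default

# Quick demonstration on a fragment where one field is missing
html = '<div id="lawyer_name"> Jane Doe </div>'
soup = bs(html, 'html.parser')
print(safe_text(soup, '#lawyer_name'))               # found: 'Jane Doe'
print(safe_text(soup, '#firm_profile_page', 'N/A'))  # absent: 'N/A'
```

In the profile loop you would then write `name = safe_text(soup, '#lawyer_name')` and so on, and a missing field becomes an empty cell in the CSV instead of a crash.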