While Web Scraping For A Table In Python, An Empty Table Is Returned
Solution 1:
The reason is simple: there is an extra space in the class string you are searching for.
If you change the class to g-summary-table svelte-2wimac (a single space between the two class names), the tags should be returned correctly.
The following code should work:
import requests
from bs4 import BeautifulSoup
# Fetch the page and parse the returned HTML
response = requests.get("https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html")
soup = BeautifulSoup(response.content, 'html.parser')
# The class string has to match the attribute exactly, with single spaces
table = soup.find_all('table', class_="g-summary-table svelte-2wimac")
print(table)
I have done similar scraping on the NYTimes interactive pages, and whitespace in class names is easy to get wrong. If the class string has an extra space or is missing one, find_all returns an empty list.
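To make the whitespace issue concrete, here is a minimal sketch that runs against a small inline HTML string instead of the live page. The markup is a hypothetical stand-in, not the real page source. It compares the exact class string, the same string with one extra space, and two lookups that do not depend on the spacing at all (matching a single class, or using a CSS selector).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup; the real page is much larger.
html = '<table class="g-summary-table svelte-2wimac"><tr><td>1</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# A multi-class string has to equal the class attribute exactly.
exact = soup.find_all("table", class_="g-summary-table svelte-2wimac")
# Same string with two spaces between the class names: no match.
extra_space = soup.find_all("table", class_="g-summary-table  svelte-2wimac")

# Alternatives that do not depend on spacing: one class, or a CSS selector.
single_class = soup.find_all("table", class_="g-summary-table")
by_selector = soup.select("table.g-summary-table.svelte-2wimac")

print(len(exact), len(extra_space), len(single_class), len(by_selector))
```

If the exact class string keeps failing, searching on a single class name or with select() is usually less fragile, because neither depends on how many spaces separate the classes.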
If you cannot find the tags, I would recommend printing the entire document first with print(soup.prettify()) and locating the tags you plan to scrape. Make sure you copy the class name exactly as it appears in the output printed by BeautifulSoup.
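Reading the full prettify() output can be slow on a large page. As a shortcut, the sketch below lists only the class names of every table that BeautifulSoup sees. The inline HTML is a made-up example; on the real page you would reuse the soup object built from the response.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = """
<div><table class="g-summary-table svelte-2wimac"></table>
<table class="other"></table><table></table></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect the class list of every <table>; tables without a class give [].
table_classes = [t.get("class", []) for t in soup.find_all("table")]
for classes in table_classes:
    # Joined with single spaces, this is the string class_ must match.
    print(" ".join(classes))
```

The printed strings can be copied straight into the class_ argument.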
Solution 2:
As an alternative, you can download the data in JSON format and read it into pandas. This uses the same starting code as above and works off the soup object.
Several APIs are available (three are shown below); their URLs can be pulled out of the HTML like this:
import re
import pandas as pd
# Each URL is read from the script text that mentions it.
# The line index and quote position depend on the page layout at the time of writing.
latest_dataset = soup.find(string=re.compile('latest')).splitlines()[2].split('"')[1]
requests.get(latest_dataset).json()
latest_timeseries = soup.find(string=re.compile('timeseries')).splitlines()[2].split('"')[3]
requests.get(latest_timeseries).json()
allwithrate = soup.find(string=re.compile('all_with_rate')).splitlines()[2].split('"')[1]
requests.get(allwithrate).json()
# Load the last dataset into a DataFrame
pd.DataFrame(requests.get(allwithrate).json())
Output of the last one:
     geoid    location  last_updated  total_vaccinations  people_vaccinated     display_name  ...                      Region          IncomeGroup                    country  gdp_per_cap  vaccinations_rate  people_fully_vaccinated
0      MUS   Mauritius    2021-02-17              3843.0             3843.0        Mauritius  ...          Sub-Saharan Africa          High income                  Mauritius  11099.24028             0.3037                      NaN
1      DZA     Algeria    2021-02-19             75000.0                NaN          Algeria  ...  Middle East & North Africa  Lower middle income                    Algeria  3973.964072             0.1776                      NaN
2      LAO        Laos    2021-03-17             40732.0            40732.0             Laos  ...         East Asia & Pacific  Lower middle income                    Lao PDR   2534.89828             0.5768                      NaN
3      MOZ  Mozambique    2021-03-23             57305.0            57305.0       Mozambique  ...          Sub-Saharan Africa           Low income                 Mozambique  503.5707727             0.1943                      NaN
4      CPV  Cape Verde    2021-03-24              2184.0             2184.0       Cape Verde  ...          Sub-Saharan Africa  Lower middle income                 Cabo Verde  3603.781793             0.4016                      NaN
..     ...         ...           ...                 ...                ...              ...  ...                         ...                  ...                        ...          ...                ...                      ...
243    GUF         NaN           NaN                 NaN                NaN    French Guiana  ...                         NaN                  NaN                        NaN          NaN                NaN                      NaN
244    KOS         NaN           NaN                 NaN                NaN           Kosovo  ...                         NaN                  NaN                        NaN          NaN                NaN                      NaN
245    CUW         NaN           NaN                 NaN                NaN          Curaçao  ...   Latin America & Caribbean          High income                    Curacao  19689.13982                NaN                      NaN
246    CHI         NaN           NaN                 NaN                NaN  Channel Islands  ...       Europe & Central Asia          High income            Channel Islands  74462.64675                NaN                      NaN
247    SXM         NaN           NaN                 NaN                NaN     Sint Maarten  ...   Latin America & Caribbean          High income  Sint Maarten (Dutch part)  29160.10381                NaN                      NaN
[248 rows x 17 columns]
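The splitlines()[2].split('"')[1] lookups above depend on the exact line and quote positions in the page's script, so they can break when the layout changes. One possible alternative, sketched here under the assumption that the URLs appear as double-quoted http(s) strings ending in .json, is to collect them with a regular expression. The script text, the example.com URLs and the find_json_urls helper are all hypothetical and only illustrate the idea.

```python
import re

# Hypothetical script text; the real page's variable names and URLs may differ.
script_text = '''
var config = {
  latest: "https://example.com/data/latest.json",
  timeseries: "https://example.com/data/timeseries.json",
  other: "not-a-url"
};
'''

def find_json_urls(text):
    """Return every double-quoted http(s) URL ending in .json, in order."""
    return re.findall(r'"(https?://[^"\s]+\.json)"', text)

urls = find_json_urls(script_text)
print(urls)
```

On the real page, the text passed in would be the string returned by soup.find(string=re.compile(...)), and each URL found could then go to requests.get(...).json() as before.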