
While Web Scraping For A Table In Python, An Empty Table Is Returned

I need to grab a table from a website by web scraping with the BeautifulSoup library in Python, from the URL https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker

Solution 1:

Very simple: the empty result comes from an extra space in the class name you're searching for.

If you change the class to g-summary-table svelte-2wimac, the tags should be correctly returned.

The following code should work:

import requests
from bs4 import BeautifulSoup

# Download the page and parse it
response = requests.get("https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html")
soup = BeautifulSoup(response.content, 'html.parser')
# The class string must match the page's class attribute exactly (single space)
table = soup.find_all('table', class_="g-summary-table svelte-2wimac")
print(table)

I've done similar scraping on the NYTimes interactive pages, and spaces can be very tricky. If you add an extra space or miss one, an empty result is returned.

If you cannot find the tags, I would recommend printing the entire document first with print(soup.prettify()) and locating the tags you plan to scrape. Make sure you copy the exact text of the class name from the contents printed by BeautifulSoup.
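
To illustrate why the spacing matters, here is a minimal sketch that runs against a small in-memory document instead of the live page. The HTML snippet is hypothetical; only the class names are taken from the answer above. It compares an exact class string, the same string with a doubled space, and a CSS selector (an alternative not used in the original answer), then lists the class attribute of every table as a quicker check than reading the whole prettified output.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page;
# only the class names come from the answer above.
html = """
<table class="g-summary-table svelte-2wimac">
  <tr><td>example</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string passed to class_ is compared with the whole
# attribute value, so it has to match character for character.
exact = soup.find_all("table", class_="g-summary-table svelte-2wimac")
extra_space = soup.find_all("table", class_="g-summary-table  svelte-2wimac")

# A CSS selector matches individual classes, so spacing and order
# in the attribute no longer matter.
by_selector = soup.select("table.g-summary-table")

# Listing the class attribute of every table shows what was parsed.
table_classes = [t.get("class") for t in soup.find_all("table")]

print(len(exact), len(extra_space), len(by_selector))
print(table_classes)
```

Of the three searches, only the one with the doubled space should come back empty. Selecting on a single class with soup.select is usually less fragile, although generated suffixes such as svelte-2wimac may still change when the site is rebuilt.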


Solution 2:

As an alternative, if you want to download the data in JSON format and then read it into pandas, you can do the following. It uses the same starting code as above and works off the soup object.

Several APIs are available (three are shown below), and their URLs can be pulled out of the HTML like this:

import re
import pandas as pd

# Each URL is pulled from the inline script text by position,
# so these lookups break if the page layout changes
latest_dataset = soup.find(string=re.compile('latest')).splitlines()[2].split('"')[1]
latest = requests.get(latest_dataset).json()

latest_timeseries = soup.find(string=re.compile('timeseries')).splitlines()[2].split('"')[3]
timeseries = requests.get(latest_timeseries).json()

allwithrate = soup.find(string=re.compile('all_with_rate')).splitlines()[2].split('"')[1]
# Fetch once and load the JSON records into a DataFrame
df = pd.DataFrame(requests.get(allwithrate).json())
print(df)

Output of the last one:

    geoid    location last_updated  total_vaccinations  people_vaccinated     display_name  ...                      Region          IncomeGroup                    country  gdp_per_cap  vaccinations_rate people_fully_vaccinated
0     MUS   Mauritius   2021-02-17              3843.0             3843.0        Mauritius  ...          Sub-Saharan Africa          High income                  Mauritius  11099.24028             0.3037                     NaN
1     DZA     Algeria   2021-02-19             75000.0                NaN          Algeria  ...  Middle East & North Africa  Lower middle income                    Algeria  3973.964072             0.1776                     NaN
2     LAO        Laos   2021-03-17             40732.0            40732.0             Laos  ...         East Asia & Pacific  Lower middle income                    Lao PDR   2534.89828             0.5768                     NaN
3     MOZ  Mozambique   2021-03-23             57305.0            57305.0       Mozambique  ...          Sub-Saharan Africa           Low income                 Mozambique  503.5707727             0.1943                     NaN
4     CPV  Cape Verde   2021-03-24              2184.0             2184.0       Cape Verde  ...          Sub-Saharan Africa  Lower middle income                 Cabo Verde  3603.781793             0.4016                     NaN
..    ...         ...          ...                 ...                ...              ...  ...                         ...                  ...                        ...          ...                ...                     ...
243   GUF         NaN          NaN                 NaN                NaN    French Guiana  ...                         NaN                  NaN                        NaN          NaN                NaN                     NaN
244   KOS         NaN          NaN                 NaN                NaN           Kosovo  ...                         NaN                  NaN                        NaN          NaN                NaN                     NaN
245   CUW         NaN          NaN                 NaN                NaN          Curaçao  ...   Latin America & Caribbean          High income                    Curacao  19689.13982                NaN                     NaN
246   CHI         NaN          NaN                 NaN                NaN  Channel Islands  ...       Europe & Central Asia          High income            Channel Islands  74462.64675                NaN                     NaN
247   SXM         NaN          NaN                 NaN                NaN     Sint Maarten  ...   Latin America & Caribbean          High income  Sint Maarten (Dutch part)  29160.10381                NaN                     NaN

[248 rows x 17 columns]
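
The positional lookups in Solution 2 (splitlines()[2].split('"')[1]) depend on the exact layout of the page's inline script. As a hedged alternative, the sketch below pulls quoted URLs out of script text with a regular expression. The script text, the example.com URLs, the helper name find_json_urls, and the assumption that the data URLs end in .json are all made up for illustration; the real page may look different.

```python
import re

# Hypothetical inline-script text; the real page's script and URLs will differ.
script_text = '''
var config = {
  latest: "https://example.com/data/latest.json",
  timeseries: "https://example.com/data/timeseries.json"
};
'''


def find_json_urls(text):
    """Return every double-quoted http(s) URL ending in .json, in order."""
    return re.findall(r'"(https?://[^"]+\.json)"', text)


urls = find_json_urls(script_text)
print(urls)
```

With the soup object from Solution 1, the text passed in could be the string returned by soup.find(string=re.compile('latest')), and each URL found could then go to requests.get(...).json() as before.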
