
Create Loop To Extract Urls To Json And Csv

I set up a loop to scrape 37900 records. Because of the way the URL/server is set up, each URL displays at most 200 records. Each URL ends with 'skip=200', o

Solution 1:

I have nothing to test against, so treat this as untested.

I think you massively over-complicated this. You've since edited the question, but there are a few points to make:

  1. You define jsnlist = [] but never use it. Why?
  2. You called your own object json (now gone, but I'm not sure whether you understand why). Naming your own object json shadows the actual module, and the code will grind to a halt before you even get into the loop.
  3. There is no reason at all to save this data to disk before trying to create a DataFrame.
  4. Opening the .json file in write mode ('w') wipes all existing data on each iteration of your loop.
  5. Appending JSON to a file does not produce a valid document that can be parsed when read back in. At best, it might be JSON Lines.
  6. Appending DataFrames in a loop has terrible complexity (quadratic in the total number of rows) because it copies all the existing data each time.
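Points 2 and 5 are easy to check without touching the server. The following is only a small sketch using the standard library; the page payloads are made up for illustration and are not taken from the question.

```python
import io
import json

# Hypothetical page payloads standing in for two responses.
pages = ([{"id": 1}], [{"id": 2}])

# Point 5: two JSON documents written back to back are not one valid document.
buffer = io.StringIO()
for page in pages:
    buffer.write(json.dumps(page))

try:
    json.loads(buffer.getvalue())
    appended_is_valid = True
except json.JSONDecodeError:
    appended_is_valid = False

# JSON Lines: one document per line, parsed line by line.
lines = "\n".join(json.dumps(page) for page in pages)
parsed = [json.loads(line) for line in lines.splitlines()]


# Point 2: rebinding the name json hides the module.
def shadowed():
    json = {"value": []}           # local name shadows the imported module
    return hasattr(json, "dumps")  # a dict has no dumps()
```

Here appended_is_valid ends up False, while the line-by-line version parses, which is why an appended file is JSON Lines at best.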

Your approach should look something like this:

import pandas as pd
import requests

# session is the requests.Session() set up to authenticate access to these URLs
# (the authentication details are not shown here)
session = requests.Session()

# One offset per page: 0, 200, 400, ..., 37800
skip = range(0, 37900, 200)

# Build the URL for each page
Page = []
for i in skip:
    endpoint = "https://~/Projects?&$skip={}".format(i)
    Page.append(endpoint)

# Collect the 'value' entry of each page in memory instead of writing to disk
jsnlist = []
for j in Page:
    response = session.get(j)
    responsejs = response.json()
    responsejsval = responsejs['value']  # only the 'value' entry of each JSON response is wanted
    jsnlist.append(responsejsval)

# Build the DataFrame once, after the loop
df = pd.DataFrame(jsnlist)
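If each 'value' turns out to be a list of records (an assumption, since the real payload is not shown), then append gives one entry per page rather than one per record. A minimal sketch of the flattening variant follows; collect_records, get_page and fake_get_page are hypothetical names, and the fake server only exists so the loop can run without network access.

```python
def collect_records(get_page, total, page_size=200):
    """Gather the 'value' entries of every page into one flat list.

    get_page is a hypothetical callable: it takes a skip offset and
    returns the parsed JSON body (a dict) for that page.
    """
    records = []
    for skip in range(0, total, page_size):
        body = get_page(skip)
        # Assumption: body['value'] is a list of row dicts, so extend()
        # keeps one entry per record instead of one entry per page.
        records.extend(body["value"])
    return records


# Stand-in for the real server: 450 fake records served 200 at a time.
def fake_get_page(skip, total=450, page_size=200):
    stop = min(skip + page_size, total)
    return {"value": [{"id": n} for n in range(skip, stop)]}


rows = collect_records(fake_get_page, total=450)
```

With the real session, get_page could be something like lambda skip: session.get("https://~/Projects?&$skip={}".format(skip)).json(), and the flat list could then be passed to pd.DataFrame in one go.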

The final df = pd.DataFrame(jsnlist) line might take some work, depending on how responsejs['value'] is structured, so you'll need to show what we're up against. I'd need to see responsejs['value'] to answer fully.
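One way to show what responsejs['value'] looks like is to summarise its shape for a single page. This is only a sketch: describe_value is a hypothetical helper, and the sample payload and its field names are invented, not taken from the real server.

```python
def describe_value(value):
    """Summarise the shape of one page's 'value' entry."""
    summary = {"type": type(value).__name__}
    if isinstance(value, (list, dict)):
        summary["length"] = len(value)
    if isinstance(value, list) and value:
        first = value[0]
        summary["first_item_type"] = type(first).__name__
        if isinstance(first, dict):
            # Sorted keys of the first record hint at the future column names.
            summary["first_item_keys"] = sorted(first)
    return summary


# Hypothetical payload; the real field names are not known.
sample = [{"Id": 1, "Name": "a"}, {"Id": 2, "Name": "b"}]
info = describe_value(sample)
```

If the summary shows a list of flat dicts, passing the flattened records to pd.DataFrame will probably be enough. If the records contain nested dicts, something like pd.json_normalize may be needed instead.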
