Creating a python dataframe by parsing JSON API response











In this SO question the OP is unable to scrape a table from a dynamically loaded website. In monitoring the web traffic, via Chrome dev tools, I found that there is an API request made that returns a JSON string with the required info.



The following is the answer I wrote to extract the columns of interest from the API response.



Guide to columns of interest:





  1. Course Title = title


  2. Trainer = name (within trainers)


  3. Rating = rating


  4. Vendor = name (within vendors)


  5. IT Path = path_label (within paths)


  6. Skill Level = display (within difficulty)


  7. Course URL = concatenation of base with seoslug


The vendors field is sometimes empty, hence my use of a conditional expression in the assignment to vendors. I am not sure what the usual placeholder value is for missing string values in Python.



I use repeated list comprehensions, each looping over the JSON object data, where data = response.json().



I couldn't think of a way to remove these repeated loops and still have legible code.



I generate a dataframe by joining the lists in a dictionary and then converting with pandas.



I welcome any and all feedback please.





JSON response:



Example JSON dictionary within response. The response has a collection of such dictionaries.
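The example dictionary itself did not survive in this copy of the post. Judging only from the keys that the code below reads, a single record might look roughly like the following sketch. Every value here is an invented placeholder, and the real response very likely carries more fields than these.

```python
# Hypothetical record: the key names come from the code in the question,
# the values are invented placeholders for illustration only.
sample_item = {
    'title': 'Example Course',
    'trainers': [{'name': 'Example Trainer'}],
    'rating': 4.5,
    'vendors': [],  # may be empty, which is why the question guards this field
    'paths': [{'path_label': 'Example Path'}],
    'difficulty': {'display': 'Intermediate'},
    'seoslug': 'example-course',
}

# The API is assumed to return a list of such dictionaries.
data = [sample_item]
print(sorted(sample_item))
```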







Python 3



import requests
import pandas as pd


def main():
    base = 'https://www.cbtnuggets.com/it-training/'
    response = requests.get('https://api.cbtnuggets.com/site-gateway/v1/all/courses/for/search?archive=false')

    data = response.json()

    titles = [item['title'] for item in data]
    trainers = [item['trainers'][0]['name'] for item in data]
    ratings = [item['rating'] for item in data]
    vendors = [item['vendors'][0]['display'] if len(item['vendors']) != 0 else 'N/A' for item in data]
    paths = [item['paths'][0]['path_label'] for item in data]
    skillLevel = [item['difficulty']['display'] for item in data]
    links = [base + item['seoslug'] for item in data]

    df = pd.DataFrame(
        {'Course Title': titles,
         'Trainer': trainers,
         'Rating': ratings,
         'Vendor': vendors,
         'IT Path': paths,
         'Skill Level': skillLevel,
         'Course URL': links
         })

    #print(df)
    df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8', index=False)


if __name__ == "__main__":
    main()









  • 1




    The only thing I'll say is that, if each publication on Code Review had a presentation as clear, complete and pleasant as yours, the overall quality of this site would be improved. Pretty nice code BTW.
    – Calak
    Nov 19 at 22:07















python beginner python-3.x web-scraping






edited 2 days ago

























asked Nov 19 at 18:39









QHarr





3 Answers




















accepted











I couldn't think of a way to remove these repeated loops and still have legible code.




There is a way:



titles, trainers, ratings, vendors, paths, skillLevel, links = zip(*((
    item['title'],
    item['trainers'][0]['name'],
    item['rating'],
    item['vendors'][0]['display'],
    item['paths'][0]['path_label'],
    item['difficulty']['display'],
    base + item['seoslug']
) for item in data))


I haven't tested this, so you should.
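For readers unsure what zip(*...) does here, a reduced sketch may help. It uses two columns and two invented records instead of the live API, and it includes the empty-vendors guard from the question, since indexing [0] on an empty list would raise an IndexError. The sample data and the 'N/A' placeholder are assumptions for illustration.

```python
# Hypothetical records shaped like the ones the question's code reads.
data = [
    {'title': 'Course A', 'vendors': [{'display': 'Vendor X'}]},
    {'title': 'Course B', 'vendors': []},  # no vendor listed
]

# One pass over data builds one tuple per course (one row each).
rows = (
    (
        item['title'],
        item['vendors'][0]['display'] if item['vendors'] else 'N/A',
    )
    for item in data
)

# zip(*rows) passes each row as a separate argument, so zip pairs up
# the first elements, then the second elements: rows become columns.
titles, vendors = zip(*rows)

print(titles)   # ('Course A', 'Course B')
print(vendors)  # ('Vendor X', 'N/A')
```

Note that the columns come back as tuples rather than lists, and that the unpacking raises a ValueError when data is empty, because zip then yields nothing to assign.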






  • Sorry for delay. If I replace with item['vendors'][0]['display'] if len(item['vendors']) != 0 else 'N/A' , this works beautifully. This is basically where I was trying to get to. Thank you.
    – QHarr
    yesterday












  • What is the purpose of the * in the above please? Is it unpacking?
    – QHarr
    yesterday












  • @QHarr Yes, it's the unpacking operator.
    – Reinderien
    yesterday










  • Many thanks for that.
    – QHarr
    yesterday































The only major thing I would change is that:



if len(item['vendors']) != 0


Is the same as:



if item['vendors']


Because an empty list is falsy: it evaluates to False in a boolean context. If you want to try it out:

a = []
bool(a) # False
b = [1,2,3]
bool(b) # True


I would also be careful with the [0] indexing, because the lists you are reading from might hold more than one dictionary, in which case you would miss the rest. This is the line that I am referring to:



    paths = [item['paths'][0]['path_label'] for item in data]
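If every path matters rather than just the first, one possible approach is to join all labels of a course into a single cell. The sketch below assumes the record shape used in the question; the helper name all_path_labels, the '; ' separator, and the sample record are hypothetical.

```python
# Hypothetical record with two paths; key names follow the question's code.
item = {'paths': [{'path_label': 'Path One'}, {'path_label': 'Path Two'}]}


def all_path_labels(item, sep='; '):
    """Join every path_label of one course into a single string."""
    return sep.join(path['path_label'] for path in item['paths'])


print(all_path_labels(item))  # Path One; Path Two
```

An empty paths list simply yields an empty string, so no separate guard is needed for that case.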





  • Thank you for the feedback. Great point about vendors. And yes, I made an assumption, that the first value would suffice, for the other line you pointed out about dictionaries. +
    – QHarr
    Nov 19 at 19:19































Instead of splitting the data into separate variables by column, you could convert each JSON object into a flat dictionary using a function similar to this:



def course_dict(item):
    return {'Course Title': item['title'],
            'Vendor': item['vendors'][0]['display'] if item['vendors'] else None,
            # and so on
            }


and construct the dataframe using



data = response.json()
df = pd.DataFrame([course_dict(item) for item in data])


Keeping related data together makes the code easier to follow. Also, since your final output is a csv file, you could skip the dataframe and use csv.DictWriter instead.
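As a rough sketch of how both suggestions could fit together, the function below fills in the "# and so on" part with the remaining expressions from the question, then writes the rows with csv.DictWriter. The sample record, the BASE constant name, and the in-memory buffer are assumptions made so the example runs without the live API or a file path.

```python
import csv
import io

BASE = 'https://www.cbtnuggets.com/it-training/'


def course_dict(item):
    """Flatten one course record into a single-level dictionary."""
    return {
        'Course Title': item['title'],
        'Trainer': item['trainers'][0]['name'],
        'Rating': item['rating'],
        'Vendor': item['vendors'][0]['display'] if item['vendors'] else None,
        'IT Path': item['paths'][0]['path_label'],
        'Skill Level': item['difficulty']['display'],
        'Course URL': BASE + item['seoslug'],
    }


# Hypothetical record standing in for response.json().
data = [{
    'title': 'Course A',
    'trainers': [{'name': 'Trainer A'}],
    'rating': 4.5,
    'vendors': [],
    'paths': [{'path_label': 'Path One'}],
    'difficulty': {'display': 'Beginner'},
    'seoslug': 'course-a',
}]

rows = [course_dict(item) for item in data]

# Written to an in-memory buffer here; a real script would pass a file
# opened with open(path, 'w', newline='', encoding='utf-8') instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Taking the field names from the first row keeps the column order in one place, at the cost of assuming data is not empty.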




I am not sure what the usual placeholder value is for missing string values in Python.




None is the usual placeholder for missing values of any type.
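A small illustration of why None can be more convenient than a string such as 'N/A': it stays distinguishable from any real string, including an empty one, and the standard csv module writes it as an empty field. The sample values are invented.

```python
import csv
import io

vendors = ['Vendor X', None, '']  # a value, a missing value, an empty string

# None can be told apart from a real (even empty) string with "is None".
missing = [v is None for v in vendors]
print(missing)  # [False, True, False]

# The csv module writes None as an empty field.
buffer = io.StringIO()
csv.writer(buffer).writerow(['Course A', None])
print(repr(buffer.getvalue()))  # 'Course A,\r\n'
```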






  • Thanks. How would I expand this for the other items? Create a series of functions?
    – QHarr
    yesterday










  • @QHarr Just expand the function, adding the other items next to the two
    – Janne Karila
    yesterday










  • Thank you very much.
    – QHarr
    yesterday











First answer (accepted): answered Nov 19 at 19:31 by Reinderien












Second answer: answered Nov 19 at 19:17 by Anthony Herrera




Third answer: answered yesterday by Janne Karila











