Web scraping the titles and descriptions of trending YouTube videos
This scrapes the titles and descriptions of trending YouTube videos and writes them to a CSV file. What improvements can I make?
from bs4 import BeautifulSoup
import requests
import csv

source = requests.get("https://www.youtube.com/feed/trending").text
soup = BeautifulSoup(source, 'lxml')

csv_file = open('YouTube Trending Titles on 12-30-18.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['Title', 'Description'])

for content in soup.find_all('div', class_="yt-lockup-content"):
    try:
        title = content.h3.a.text
        print(title)
        description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
        print(description)
    except Exception as e:
        description = None
    print('\n')
    csv_writer.writerow([title, description])

csv_file.close()
python csv web-scraping beautifulsoup youtube
edited 1 hour ago by Jamal♦
asked 2 hours ago by austingae
2 Answers
Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostPopular list of videos? If you make a GET request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostPopular, you will get the same information in a documented JSON format.
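As a rough sketch of that GET request using only the standard library (no official client): the function and variable names here are made up for the demo, and "YOUR_KEY" and the region code are placeholders you would supply yourself.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"

def most_popular_url(key, region_code="US", max_results=20):
    """Build the documented videos.list URL for the mostPopular chart."""
    params = {
        "key": key,              # your API key (placeholder here)
        "part": "snippet",
        "chart": "mostPopular",
        "regionCode": region_code,
        "maxResults": max_results,
    }
    return API_URL + "?" + urlencode(params)

def extract_rows(response_json):
    """Pull (title, description) pairs out of the JSON response shape."""
    return [
        (item["snippet"]["title"], item["snippet"]["description"])
        for item in response_json.get("items", [])
    ]

print(most_popular_url("YOUR_KEY"))
```

Fetching that URL (with any HTTP library) and feeding the parsed JSON to extract_rows gives the same rows the scraper was after.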
Using the Python client, the code looks like:

import csv
import googleapiclient.discovery

def most_popular(yt, **kwargs):
    popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
    for video in popular['items']:
        yield video['snippet']

yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Title', 'Description'])
    csv_writer.writerows(
        [snip['title'], snip['description']]
        for snip in most_popular(yt, maxResults=20, regionCode=…)
    )

I've also restructured the code so that all of the CSV-writing code appears together, and inside a with open(…) as f: … block.
Great suggestion! I searched but didn't find that API. Are mostPopular and trending synonyms?
– Josay
1 hour ago

@Josay When I include the regionCode in the API call ('CA' for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.
– 200_success
1 hour ago
answered 1 hour ago by 200_success
Context manager
You open a file at the beginning of the program and close it explicitly at the end.
Python provides a nice way to allocate and release resources (such as files) easily: they are called Context managers. They give you the guarantee that the cleanup is performed at the end even in case of exception.
In your case, you could write:
with open('YouTube Trending Titles on 12-30-18.csv','w') as file:
....
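A quick demonstration of the guarantee mentioned above: even when the body of the with block raises, the file still gets closed. (The path and error here are made up for the demo.)

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "demo_trending.csv")
try:
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(["Title", "Description"])
        raise RuntimeError("simulated failure mid-write")
except RuntimeError:
    pass

print(f.closed)  # True: the file was closed despite the exception
```

With the explicit open()/close() pattern, the close() call would simply never run here.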
Exception
All exceptions are caught by except Exception as e. It may look like a good idea at first but this can lead to various issues:
- it's hard to know what types of error are actually expected here
- most errors are better not caught (except for special situations). For instance, if you make a typo, you'll end up with an ignored NameError or AttributeError and debugging will be more painful than it should be.

Also, from the content of the except block, it looks like you are only expecting the logic about description to fail. If so, it would be clearer to put the smallest amount of code in the try (...) except.
For instance:

title = content.h3.a.text
try:
    description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
except Exception as e:
    description = None
print(title)
print(description)
print('\n')
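Going one step further, the except clause can name the one error that is actually expected: find() returns None when the div is missing, and None.text raises AttributeError. A sketch using tiny stand-in objects instead of a real bs4 tag (FakeTag and FakeText are made up for the demo):

```python
class FakeText:
    """Stand-in for a bs4 tag exposing a .text attribute."""
    def __init__(self, text):
        self.text = text

class FakeTag:
    """Stand-in for a bs4 element: find() returns None when nothing matches."""
    def __init__(self, description=None):
        self._desc = description

    def find(self, *args, **kwargs):
        return self._desc

def get_description(content):
    # Only AttributeError is expected: None.text when the div is missing.
    # A typo elsewhere would now surface instead of being swallowed.
    try:
        return content.find('div', class_="yt-lockup-description").text
    except AttributeError:
        return None

print(get_description(FakeTag(FakeText("A video description"))))
print(get_description(FakeTag()))
```

The first call prints the description; the second prints None instead of crashing.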
Proper solution
Google usually offers APIs to retrieve things such as trending videos. I haven't found it but I'll let you try to find something that works properly. Google is your friend...
answered 1 hour ago by Josay