Web scraping the titles and descriptions of trending YouTube videos












This scrapes the titles and descriptions of trending YouTube videos and writes them to a CSV file. What improvements can I make?



from bs4 import BeautifulSoup
import requests
import csv

source = requests.get("https://www.youtube.com/feed/trending").text
soup = BeautifulSoup(source, 'lxml')

csv_file = open('YouTube Trending Titles on 12-30-18.csv','w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['Title', 'Description'])

for content in soup.find_all('div', class_="yt-lockup-content"):
    try:
        title = content.h3.a.text
        print(title)

        description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
        print(description)

    except Exception as e:
        description = None

    print('\n')
    csv_writer.writerow([title, description])

csv_file.close()









Tags: python csv web-scraping beautifulsoup youtube






asked 2 hours ago by austingae; edited 1 hour ago by Jamal
2 Answers






          Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular list of videos? If you make a GET request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostpopular, you will get the same information in a documented JSON format.



          Using the Python client, the code looks like:



import csv
import googleapiclient.discovery

def most_popular(yt, **kwargs):
    popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
    for video in popular['items']:
        yield video['snippet']

yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Title', 'Description'])
    csv_writer.writerows(
        [snip['title'], snip['description']]
        for snip in most_popular(yt, maxResults=20, regionCode=…)
    )


I've also restructured the code so that all of the CSV-writing code appears together, and inside a with open(…) as f: … block.
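If you would rather not add the Google API client as a dependency, the same endpoint can also be called directly with requests. This is only a sketch: the API key is a placeholder you would supply yourself, and the region code is just an example value.

import csv
import requests

API_KEY = '...'  # placeholder: your own YouTube Data API key

# videos.list with chart=mostPopular returns the same data as the client library call
response = requests.get(
    'https://www.googleapis.com/youtube/v3/videos',
    params={
        'key': API_KEY,
        'part': 'snippet',
        'chart': 'mostPopular',
        'maxResults': 20,
        'regionCode': 'US',  # example region; use the one you care about
    },
)
response.raise_for_status()

with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Title', 'Description'])
    csv_writer.writerows(
        [item['snippet']['title'], item['snippet']['description']]
        for item in response.json()['items']
    )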






answered 1 hour ago by 200_success























• Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms? – Josay, 1 hour ago










• @Josay When I include the regionCode in the API call ('CA' for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing. – 200_success, 1 hour ago





















          Context manager



          You open a file at the beginning of the program and close it explicitly at the end.



Python provides a nice way to allocate and release resources (such as files) easily: context managers. They guarantee that the cleanup is performed at the end, even if an exception is raised.



          In your case, you could write:



with open('YouTube Trending Titles on 12-30-18.csv','w') as file:
    ....
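Keeping the rest of your logic unchanged, the whole script could then be structured along these lines (just a sketch of the context-manager change; the next section refines the exception handling):

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get("https://www.youtube.com/feed/trending").text
soup = BeautifulSoup(source, 'lxml')

# The with-block replaces the explicit open()/close() pair: the file is closed
# when the block exits, even if an exception is raised inside it.
with open('YouTube Trending Titles on 12-30-18.csv', 'w') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['Title', 'Description'])
    for content in soup.find_all('div', class_="yt-lockup-content"):
        try:
            title = content.h3.a.text
            description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
        except Exception:
            description = None
        csv_writer.writerow([title, description])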


          Exception



          All exceptions are caught by except Exception as e. It may look like a good idea at first but this can lead to various issues:




          • it's hard to know what types of error are actually expected here

• most errors are better not caught (except in special situations). For instance, if you make a typo, you'll end up with a silently ignored NameError or AttributeError, and debugging will be more painful than it should be.


Also, from the content of the except block, it looks like you only expect the description logic to fail. If so, it would be clearer to put the smallest possible amount of code inside the try (...) except.



          For instance:



title = content.h3.a.text
try:
    description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
except Exception as e:
    description = None
print(title)
print(description)
print('\n')
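Since find() simply returns None when nothing matches, another possibility (assuming the missing description div really is the only failure you expect) is to avoid the try/except altogether:

title = content.h3.a.text
# find() returns None when no matching element exists, so we can test for that
# instead of catching an exception.
description_div = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2")
description = description_div.text if description_div is not None else None
print(title)
print(description)
print('\n')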


          Proper solution



Google usually offers an API to retrieve things such as trending videos. I haven't found it, but I'll let you try to find something that works properly. Google is your friend...






answered 1 hour ago by Josay




















