Reaching the “philosophy” wiki page [on hold]























Is my code correct?



import requests
import bs4
import time
import urllib.request

def input_link():

    start_url = "https://en.wikipedia.org/wiki/Special:Random"
    response = requests.get(start_url)
    html = response.text
    soup = bs4.BeautifulSoup(html,"html.parser")
    return soup

def right_link(soup):

    for element in soup.find_all("p"):
        if element.find("a"):
            article_link = element.find("a").get('href')
            link = "http://en.wikipedia.org" + article_link
            return link

def loop(link):

    for element in range(50):
        response = requests.get(link)
        html = response.text
        soup = bs4.BeautifulSoup(html,"html.parser")
        for element in soup.find_all("p"):
            if element.find("a"):
                link = element.find("a").get('href')

    return("http://en.wikipedia.org"+link)

def continue_crawl(link):

    target_url = "https://en.wikipedia.org/wiki/Philosophy"
    if link == target_url:
        print("We've found the target article!")
    elif len(link) > 50:
        print("The search has gone on suspiciously long, aborting search!")
    elif link == link:
        print("We've arrived at an article we've already seen, aborting search!")

def main():

    for i in range(10):
        x = input_link()
        y = right_link(x)
        z = loop(y)
        a = continue_crawl(z)
        time.sleep(2) # slow down otherwise wiki server will block you

main()




















put on hold as unclear what you're asking by t3chb0t, Toby Speight, Sᴀᴍ Onᴇᴌᴀ, Heslacher, Graipher yesterday


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.











  • What do you mean is it correct? Does it work?
    – jonrsharpe
    2 days ago










  • elif link == link: is always going to be true, so at least that part is probably not correct.
    – Graipher
    yesterday










  • @ozge If you could add some introduction to what your code is supposed to do and confirm that your code works, I'm sure your question will be reopened :)
    – IEatBagels
    yesterday
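
To illustrate the point raised in the comments about `elif link == link:` always being true: a check for "already seen" needs something to compare against, such as a history of visited URLs. The following is only a sketch under that assumption. The `search_history` list, the `max_steps` parameter, and the boolean return value are not in the original code; they are introduced here for illustration.

```python
def continue_crawl(search_history, target_url, max_steps=25):
    """Return True if the crawl should go on, False if it should stop.

    search_history is assumed to be a list of the URLs visited so far,
    with the most recent one last (a hypothetical structure, not part
    of the code in the question).
    """
    current = search_history[-1]
    if current == target_url:
        print("We've found the target article!")
        return False
    if len(search_history) > max_steps:
        print("The search has gone on suspiciously long, aborting search!")
        return False
    # Compare the current URL against everything visited before it.
    if current in search_history[:-1]:
        print("We've arrived at an article we've already seen, aborting search!")
        return False
    return True
```

Note that the length check here counts visited pages rather than the characters of a single URL, which is what `len(link) > 50` measures in the question.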





















python web-scraping beautifulsoup wikipedia














edited 2 days ago by 200_success















asked 2 days ago by ozge


















1 Answer













This is essentially a wild guess, given that you haven't really stated your requirements - nor the definition of 'correct' - but:



It appears that you're crawling Wikipedia to play the degrees-of-separation game. The way you're doing it is essentially the worst one. You shouldn't be scraping the web page itself. You should be calling the API. Better yet - download a dump of the Wikipedia database and operate on that.
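
As a rough sketch of what calling the API could look like, the snippet below asks the MediaWiki Action API for the links on a page instead of parsing HTML. It uses only the standard library. The helper names (`build_links_query`, `extract_link_titles`, `fetch_link_titles`) and the User-Agent string are made up for this example, and result continuation is not handled. As far as I know, `prop=links` returns links sorted by title rather than in page order, so a "first link in the article" rule would need a different approach.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_links_query(title):
    """Build the query parameters that ask for the article links on one page."""
    return {
        "action": "query",
        "prop": "links",
        "titles": title,
        "plnamespace": 0,      # main (article) namespace only
        "pllimit": "max",
        "format": "json",
        "formatversion": 2,    # "pages" comes back as a list
    }


def extract_link_titles(payload):
    """Collect the linked page titles from a decoded API response."""
    titles = []
    for page in payload.get("query", {}).get("pages", []):
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles


def fetch_link_titles(title):
    """Fetch one batch of link titles (network call, no continuation)."""
    url = API_URL + "?" + urllib.parse.urlencode(build_links_query(title))
    request = urllib.request.Request(
        url, headers={"User-Agent": "philosophy-crawl-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_link_titles(json.load(response))
```

Keeping the parsing separate from the network call makes it possible to check `extract_link_titles` against a saved response without hitting the server.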



There are other minor problems: you're missing a shebang, and you aren't checking the module name (`__name__ == "__main__"`) before calling main.
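
For reference, the two minor points above usually look something like the sketch below. The body of `main` is a placeholder, not the crawl from the question, and the return value is only there for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a script entry point; the crawl itself is left out."""


def main():
    # Placeholder for the crawl logic from the question.
    print("Starting crawl...")
    return 0


# Only run main() when the file is executed directly,
# not when it is imported as a module.
if __name__ == "__main__":
    main()
```

With the guard in place, importing the file (for example from a test) no longer starts a crawl as a side effect.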
















answered 2 days ago by Reinderien














