Reaching the “philosophy” wiki page [on hold]

Is my code correct?

import requests
import bs4
import time
import urllib.request


def input_link():
    start_url = "https://en.wikipedia.org/wiki/Special:Random"
    response = requests.get(start_url)
    html = response.text
    soup = bs4.BeautifulSoup(html,"html.parser")
    return soup


def right_link(soup):
    for element in soup.find_all("p"):
        if element.find("a"):
            article_link = element.find("a").get('href')
            link = "http://en.wikipedia.org" + article_link
            return link


def loop(link):
    for element in range(50):
        response = requests.get(link)
        html = response.text
        soup = bs4.BeautifulSoup(html,"html.parser")
        for element in soup.find_all("p"):
            if element.find("a"):
                link = element.find("a").get('href')

    return("http://en.wikipedia.org"+link)


def continue_crawl(link):
    target_url = "https://en.wikipedia.org/wiki/Philosophy"
    if link == target_url:
        print("We've found the target article!")
    elif len(link) > 50:
        print("The search has gone on suspiciously long, aborting search!")
    elif link == link:
        print("We've arrived at an article we've already seen, aborting search!")


def main():
    for i in range(10):
        x = input_link()
        y = right_link(x)
        z = loop(y)
        a = continue_crawl(z)
        time.sleep(2) # slow down otherwise wiki server will block you


main()

edited 2 days ago by 200_success
asked 2 days ago by ozge (new contributor)

put on hold as unclear what you're asking by t3chb0t, Toby Speight, Sᴀᴍ Onᴇᴌᴀ, Heslacher, Graipher yesterday

Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.

What do you mean is it correct? Does it work?
– jonrsharpe
2 days ago

elif link == link: is always going to be true, so at least that part is probably not correct.
– Graipher
yesterday

@ozge If you could add some introduction to what your code is supposed to do and confirm that your code works, I'm sure your question will be reopened :)
– IEatBagels
yesterday
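
Graipher's point about `elif link == link:` can be illustrated with a small sketch. It assumes the crawl keeps a list of every URL visited so far (called `search_history` here, a hypothetical name not in the posted code) and that the check returns a boolean instead of only printing. Note also that `len(link) > 50` in the posted code measures the length of the URL string, not the number of steps taken; the version below counts steps, with 50 kept as the limit.

```python
def continue_crawl(search_history, target_url, max_steps=50):
    """Return True if the crawl should keep going, False if it should stop.

    search_history is the list of URLs visited so far, newest last
    (a hypothetical structure; the posted code keeps no history).
    """
    current = search_history[-1]
    if current == target_url:
        print("We've found the target article!")
        return False
    if len(search_history) > max_steps:
        print("The search has gone on suspiciously long, aborting search!")
        return False
    # Compare against earlier entries only, not against itself.
    if current in search_history[:-1]:
        print("We've arrived at an article we've already seen, aborting search!")
        return False
    return True
```

A caller would append each newly fetched URL to `search_history` and loop while `continue_crawl(...)` returns True.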

python web-scraping beautifulsoup wikipedia

1 Answer

This is essentially a wild guess, given that you haven't really stated your requirements - nor the definition of 'correct' - but:

It appears that you're crawling Wikipedia to play the degrees-of-separation game. The way you're doing it is essentially the worst one. You shouldn't be scraping the web page itself. You should be calling the API. Better yet - download a dump of the Wikipedia database and operate on that.
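
As a rough illustration of the API route, here is a minimal sketch that asks the MediaWiki API for the article-namespace links on a page instead of parsing rendered HTML. The function names and the User-Agent string are made up for this example, and it uses only the standard library rather than `requests`. It does not follow API continuation, so it returns at most one batch of links. As far as I know the API returns links ordered by title rather than by position in the text, so this suits a degrees-of-separation search better than a "first link" walk.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_links_query(title, limit=50):
    """Build query parameters for the links on one page (article namespace only)."""
    return {
        "action": "query",
        "prop": "links",
        "titles": title,
        "plnamespace": 0,
        "pllimit": limit,
        "format": "json",
        "formatversion": 2,
    }


def extract_link_titles(payload):
    """Pull linked page titles out of a decoded formatversion=2 response."""
    titles = []
    for page in payload.get("query", {}).get("pages", []):
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles


def fetch_link_titles(title, limit=50):
    """Perform the HTTP request (network access required; no continuation)."""
    url = API_URL + "?" + urllib.parse.urlencode(build_links_query(title, limit))
    request = urllib.request.Request(
        url, headers={"User-Agent": "philosophy-crawler-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_link_titles(json.load(response))
```

Calling `fetch_link_titles("Philosophy")` would then return a list of page titles, which can be fed back into the same function for the next hop.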

There are other minor problems: you're missing a shebang, and you aren't checking the module name (`if __name__ == "__main__":`) before calling main.
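
For reference, the two minor points might look like this in a script skeleton. The body of `main` is a placeholder, not the posted crawl.

```python
#!/usr/bin/env python3
"""Hypothetical script skeleton; the crawl itself is elided."""


def main():
    # Placeholder body: the real script would run the crawl here.
    print("crawl would start here")
    return 0


# Run main() only when executed as a script, not when imported as a module.
if __name__ == "__main__":
    main()
```

With the guard in place, importing the file (for example from a test) no longer starts a crawl as a side effect.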

answered 2 days ago by Reinderien
