Reaching the “philosophy” wiki page [on hold]
Is my code correct?
```python
import requests
import bs4
import time
import urllib.request

def input_link():
    start_url = "https://en.wikipedia.org/wiki/Special:Random"
    response = requests.get(start_url)
    html = response.text
    soup = bs4.BeautifulSoup(html,"html.parser")
    return soup

def right_link(soup):
    for element in soup.find_all("p"):
        if element.find("a"):
            article_link = element.find("a").get('href')
            link = "http://en.wikipedia.org" + article_link
            return link

def loop(link):
    for element in range(50):
        response = requests.get(link)
        html = response.text
        soup = bs4.BeautifulSoup(html,"html.parser")
        for element in soup.find_all("p"):
            if element.find("a"):
                link = element.find("a").get('href')
                return("http://en.wikipedia.org"+link)

def continue_crawl(link):
    target_url = "https://en.wikipedia.org/wiki/Philosophy"
    if link == target_url:
        print("We've found the target article!")
    elif len(link) > 50:
        print("The search has gone on suspiciously long, aborting search!")
    elif link == link:
        print("We've arrived at an article we've already seen, aborting search!")

def main():
    for i in range(10):
        x = input_link()
        y = right_link(x)
        z = loop(y)
        a = continue_crawl(z)
        time.sleep(2) # slow down otherwise wiki server will block you

main()
```
python web-scraping beautifulsoup wikipedia
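A note on link selection: `right_link` (and the inner loop of `loop`) takes the first `<a>` inside the first `<p>` that contains one. On Wikipedia that is often a footnote link such as `#cite_note-1` or a namespaced page such as `/wiki/Help:IPA`, and prefixing `#cite_note-1` with the domain does not produce an article URL. Below is a minimal sketch that uses only the standard library's `html.parser` instead of BeautifulSoup. `FirstArticleLinkFinder` and `first_article_link` are hypothetical names, and the filter (keep `/wiki/` links whose title contains no colon) is an assumption rather than an official Wikipedia rule. It also does not skip links in parentheses or italics, which some versions of the Philosophy game require.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

WIKI_BASE = "https://en.wikipedia.org"


class FirstArticleLinkFinder(HTMLParser):
    """Hypothetical helper: records the first /wiki/ article link inside a <p>.

    Assumption: fragment links ("#cite_note-1") and namespaced pages
    ("/wiki/Help:IPA", i.e. any title containing a colon) are skipped.
    """

    def __init__(self):
        super().__init__()
        self.p_depth = 0        # > 0 while the parser is inside a <p>
        self.first_link = None  # absolute URL once a suitable link is found

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        elif tag == "a" and self.p_depth and self.first_link is None:
            href = dict(attrs).get("href") or ""
            title = href[len("/wiki/"):]
            if href.startswith("/wiki/") and ":" not in title:
                self.first_link = urljoin(WIKI_BASE, href)

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth:
            self.p_depth -= 1


def first_article_link(html):
    """Return the first article link found in a paragraph, or None."""
    finder = FirstArticleLinkFinder()
    finder.feed(html)
    return finder.first_link
```

In the question's code, a function like this could take the place of the `find_all("p")` / `find("a")` loops, fed with `response.text`.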
put on hold as unclear what you're asking by t3chb0t, Toby Speight, Sᴀᴍ Onᴇᴌᴀ, Heslacher, Graipher yesterday
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
What do you mean, is it correct? Does it work? – jonrsharpe, 2 days ago
`elif link == link:` is always going to be true, so at least that part is probably not correct. – Graipher, yesterday
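Building on Graipher's comment: the final `elif` compares `link` with itself, and `len(link) > 50` measures the length of the URL string rather than the number of steps taken. In addition, `loop` returns from inside the first iteration of its `for element in range(50)` loop, so each call follows only one link. One way to address all three, sketched under assumptions, is to keep every visited URL in a list and pass the link-following step in as a function. `crawl`, `next_link` and `max_steps` are hypothetical names; in practice `next_link` would fetch a page and return its first article link (or `None`).

```python
def continue_crawl(search_history, target_url, max_steps=50):
    """Return True if the crawl should keep going.

    search_history is the list of URLs visited so far, most recent last.
    max_steps is a hypothetical limit on the number of pages visited.
    """
    current = search_history[-1]
    if current == target_url:
        print("We've found the target article!")
        return False
    if len(search_history) > max_steps:
        print("The search has gone on suspiciously long, aborting search!")
        return False
    if current in search_history[:-1]:  # seen before => we are in a cycle
        print("We've arrived at an article we've already seen, aborting search!")
        return False
    return True


def crawl(start_url, target_url, next_link, max_steps=50):
    """Follow next_link(url) from start_url until continue_crawl says stop."""
    history = [start_url]
    while continue_crawl(history, target_url, max_steps):
        link = next_link(history[-1])
        if link is None:
            print("This article has no outgoing article link, aborting search!")
            break
        history.append(link)
    return history
```

Passing `next_link` in keeps the stopping logic testable without network access, for example `crawl("A", "P", {"A": "B", "B": "P"}.get)` returns `["A", "B", "P"]`.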
@ozge If you could add some introduction to what your code is supposed to do and confirm that your code works, I'm sure your question will be reopened :) – IEatBagels, yesterday
edited 2 days ago by 200_success
asked 2 days ago by ozge
1 Answer
This is essentially a wild guess, since you haven't stated your requirements (nor your definition of 'correct'), but:
It appears that you're crawling Wikipedia to play the degrees-of-separation game. Scraping the rendered web pages is about the least efficient way to do that: you download whole HTML pages just to pull out a few links. Instead, call the MediaWiki API, which returns structured JSON. Better yet, download a dump of the Wikipedia database and operate on that locally.
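For the API route, the MediaWiki Action API's `action=query&prop=links` returns a page's outgoing links as JSON, in batches chained together by a `continue` block. The following is a rough sketch using only the standard library. The function names and the User-Agent string are hypothetical, and note that `prop=links` lists links alphabetically rather than in page order, so it suits a degrees-of-separation search better than a strict first-link rule.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def links_query(title, cont=None):
    """Build query parameters for prop=links, article namespace (0) only."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "links",
        "plnamespace": 0,
        "pllimit": "max",
    }
    if cont:
        params.update(cont)  # the API's "continue" block, passed back verbatim
    return params


def link_titles(data):
    """Extract linked article titles from one API response (default format)."""
    titles = []
    for page in data.get("query", {}).get("pages", {}).values():
        titles.extend(link["title"] for link in page.get("links", []))
    return titles


def fetch_all_links(title):
    """Request batches until the response no longer contains "continue"."""
    titles, cont = [], None
    while True:
        url = API_URL + "?" + urlencode(links_query(title, cont))
        # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
        req = Request(url, headers={"User-Agent": "philosophy-crawler-example/0.1"})
        with urlopen(req) as resp:
            data = json.load(resp)
        titles.extend(link_titles(data))
        if "continue" not in data:
            return titles
        cont = data["continue"]
```

Usage would be along the lines of `fetch_all_links("Philosophy")`, which performs real HTTP requests; only the pure helpers `links_query` and `link_titles` can be exercised offline.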
There are other minor problems: you're missing a shebang line, and you aren't checking the module name (`if __name__ == "__main__":`) before calling `main()`.
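Those two fixes together look roughly like the skeleton below, with the crawling logic left out; the docstring and placeholder body are illustrative only. The shebang lets the file run directly (e.g. `./crawler.py`, a hypothetical file name) on Unix-like systems once it is marked executable, and the `__name__` check keeps `main()` from running when the module is merely imported, for example by a test.

```python
#!/usr/bin/env python3
"""Philosophy-game crawler (skeleton: the crawling logic is omitted)."""


def main():
    # Placeholder: the question's crawl loop would go here.
    print("crawler starting")


if __name__ == "__main__":
    main()
```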
answered 2 days ago by Reinderien