Bash script to read a web page list from a text file
I want to read a list of web pages and check whether any of them have been updated. Is it better to use wget or curl, and how should I do that?
The list of web pages is in a simple text file. If the contents of a page are unchanged, the script should print nothing. If the contents have changed since the last time the script ran, it should print that page's address to standard output.
bash shell-script wget
asked Nov 23 at 11:50 by Βάσω Κουπετσιδου (new contributor), edited Nov 23 at 12:32
Please don't re-post the same question. I have now reopened this one and closed your first one as a duplicate.
– terdon♦
Nov 23 at 12:28
Βάσω, please edit your question and show us a few lines of your file with the webpages so we know what we're dealing with. And do you keep the previous versions of the webpages somewhere? Are they simple html files? More complex pages which are generated on the fly?
– terdon♦
Nov 23 at 12:28
Also, please look at the comments on your original question and edit this one to provide the details requested.
– terdon♦
Nov 23 at 12:29
1 Answer
#!/bin/sh

i=1
while IFS= read -r url; do
    file="data-$i.out"

    # Fetch the URL, saving the result next to the previous run's copy.
    curl -o "$file.new" "$url"

    # cmp -s is silent; it exits non-zero if the two files differ, or if
    # the old file does not exist yet. In either case, report the URL.
    if ! cmp -s "$file" "$file.new"; then
        printf '%s\n' "$url"
    fi

    # Keep the newly fetched copy around for the next run.
    mv -f "$file.new" "$file"

    i=$(( i + 1 ))
done <url-list.txt
This would read the URLs from url-list.txt, line by line, and use curl to fetch each one, saving the output in a file called data-N.out.new, where N is an integer (the ordinal number of the URL in the file).
If there is no old data-N.out file, or if that file differs from data-N.out.new, then the URL is printed to standard output. The fetched data file is then renamed, ready for the next time you run the script. The first time you run the script, all URLs will be output, since none of them have been seen before.
Reordering the URLs, or adding new URLs at the top, would cause URLs to be flagged as changed, because the contents of the corresponding data files change. You could fix this by using, for example, the base64-encoded URL as part of the output filename instead of $i.
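A minimal sketch of that variant, assuming the base64 and tr utilities are available; the data-/.out naming is kept from the script above, the variable name key is only illustrative, and '/' characters in the encoding are translated to '_' so the result is a valid filename (very long URLs might instead need hashing, which is not shown here):

#!/bin/sh
while IFS= read -r url; do
    # Derive the state filename from the URL itself rather than from its
    # position in the list, so reordering the list does not cause false
    # "changed" reports. tr -d '\n' removes any line wrapping that base64
    # may add to long input.
    key=$(printf '%s' "$url" | base64 | tr -d '\n' | tr '/' '_')
    file="data-$key.out"

    curl -o "$file.new" "$url"
    if ! cmp -s "$file" "$file.new"; then
        printf '%s\n' "$url"
    fi
    mv -f "$file.new" "$file"
done <url-list.txt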
Whether you use curl or wget or some other web client is essentially unimportant.
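For instance, the curl line in the script above could be swapped for an equivalent wget call, with everything else left unchanged (a sketch; -q suppresses wget's progress output and -O names the output file):

    wget -q -O "$file.new" "$url"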
answered Nov 23 at 12:00 by Kusalananda (117k), edited Nov 23 at 12:35
This question is a duplicate. Vote to open the old one, and put answer there. (And maybe edit question to make it clear.)
– ctrl-alt-delor
Nov 23 at 12:03
Yes that would work. Note there are some edits of the other question (just grammar, punctuation).
– ctrl-alt-delor
Nov 23 at 12:18