Bash script to read a web page list from a text file

I want to read a list of web pages and check whether any of them have been updated. Is it better to use wget or curl, and how should I do that?

The list of web pages is in a plain text file. If a page's contents are unchanged, the script should print nothing; if the contents have changed since the last time the script ran, it should print the page's address to stdout.

bash shell-script wget

asked Nov 23 at 11:50 (edited Nov 23 at 12:32)
Βάσω Κουπετσιδου

  • Please don't re-post the same question. I have now reopened this one and closed your first one as a duplicate.
    – terdon
    Nov 23 at 12:28

  • Βάσω, please edit your question and show us a few lines of your file with the webpages so we know what we're dealing with. And do you keep the previous versions of the webpages somewhere? Are they simple html files? More complex pages which are generated on the fly?
    – terdon
    Nov 23 at 12:28

  • Also, please look at the comments on your original question and edit this one to provide the details requested.
    – terdon
    Nov 23 at 12:29

1 Answer

#!/bin/sh

i=1
while IFS= read -r url; do
    file="data-$i.out"

    # Fetch the page into a temporary file next to the saved copy.
    curl -o "$file.new" "$url"

    # cmp -s is silent; it exits non-zero when the files differ or
    # when "$file" does not exist yet, so new URLs are also reported.
    if ! cmp -s "$file" "$file.new"; then
        printf '%s\n' "$url"
    fi

    # Keep the fetched copy around for the next run.
    mv -f "$file.new" "$file"

    i=$(( i + 1 ))
done <url-list.txt


This would read the URLs from url-list.txt, line by line, and use curl to fetch each one, saving the output in a file called data-N.out.new, where N is an integer (the ordinal number of the URL in the list).



If there is no old data-N.out file, or if this file differs from data-N.out.new, then the URL is printed to standard output.
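
As a quick illustration of the cmp -s behaviour the test relies on (a throwaway sketch, not part of the script; a and b are scratch files):

printf 'hello\n' > a
printf 'hello\n' > b
cmp -s a b && echo identical    # exit status 0: contents match
printf 'world\n' > b
cmp -s a b || echo changed      # non-zero: contents differ (or a file is missing)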



The fetched data file is then renamed, so that it is available for comparison the next time the script runs.
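
To get the "since the last run" behaviour automatically, you could run the script periodically, e.g. from cron. A hypothetical crontab entry (the directory and script name are placeholders) might look like:

# Run hourly; cd first so url-list.txt and the data-*.out state
# files resolve relative to one fixed directory.
0 * * * * cd /path/to/checker && ./check-pages.sh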



The first time you run the script, all URLs will be output, as none of them has been seen before.



Reordering the URLs, or adding new URLs at the top, would cause URLs to be flagged as changed, since the contents of the corresponding data files would have changed. You could fix this by using e.g. the base64-encoded URL as part of the output filename instead of $i.
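
A minimal sketch of that variation, assuming a base64(1) utility with GNU's -w option is available (everything else is unchanged from the script above):

#!/bin/sh

while IFS= read -r url; do
    # Derive a stable filename from the URL itself; -w 0 disables
    # line wrapping (GNU base64), and tr maps "/" and "+" (valid in
    # base64 output, awkward in filenames) to "_" and "-".
    name=$(printf '%s' "$url" | base64 -w 0 | tr '/+' '_-')
    file="data-$name.out"

    curl -o "$file.new" "$url"

    if ! cmp -s "$file" "$file.new"; then
        printf '%s\n' "$url"
    fi

    mv -f "$file.new" "$file"
done <url-list.txt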



Whether you use curl, wget, or some other web client is essentially unimportant.
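
For instance, the fetch line could be swapped for wget without changing anything else; -q and -O are standard wget options:

# -q silences wget's progress output; -O writes the page to the
# given file instead of a name derived from the URL.
wget -q -O "$file.new" "$url"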

answered Nov 23 at 12:00 (edited Nov 23 at 12:35)
Kusalananda

  • This question is a duplicate. Vote to open the old one, and put answer there. (And maybe edit question to make it clear.)
    – ctrl-alt-delor
    Nov 23 at 12:03

  • Yes that would work. Note there are some edits of the other question (just grammar, punctuation).
    – ctrl-alt-delor
    Nov 23 at 12:18