How to follow all “HTML links”, and yet only save .zip files

I am trying to do this:

function download() {
  # Recurse (-r), resume partial files (-c), and skip any URL containing "?".
  wget \
    -r \
    -c \
    -A .zip,.html,.jsp,.php,.cgi \
    --reject-regex '.*\?.*' \
    --secure-protocol=TLSv1_2 \
    "$1"
}

The problems I'm facing are:

I only want the .zip files.

The .zip files are linked from "HTML" pages, which can be found under URLs such as /foo, /foo.html, /foo.jsp, /foo.php, and /foo.cgi, and I'm sure a few others I am not aware of.

So I am trying to say: "Visit every HTML link, but download only the .zip files." I am wondering how to do this properly with wget. I am also skipping links with URL parameters, because otherwise wget downloads them all; if there is a better way to handle this (such as just collecting the links from those pages without saving them), that would be good to know.

The above doesn't work because it misses links without an extension, like /foo. Plus, I don't want to save the .php and other files, just the .zip ones. Right now I am leaving off the -A, downloading everything, and then removing the unwanted files in a custom script after wget finishes, which doesn't seem right. Basically, I am wondering how to do this correctly with wget.
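One direction that might fit the "collect the links, don't save the pages" idea is a two-pass run: crawl with --spider first so nothing is kept on disk, pull the .zip URLs out of the log, then fetch only those. The sketch below is untested against any real site. The function names download_zips and extract_zip_urls are hypothetical, and it assumes wget's log prints each visited URL as a whitespace-separated token, which may vary between wget versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; a sketch under the assumptions stated above.

# Read a wget log on stdin and print the unique URLs that end in .zip.
# URLs containing "?" are dropped, matching the intent of the reject regex.
extract_zip_urls() {
  tr -s '[:space:]' '\n' | grep -E '^https?://[^?]+\.zip$' | sort -u
}

# Pass 1: crawl recursively in spider mode, so pages are not kept on disk.
# Pass 2: download only the .zip URLs that appeared in the crawl log.
download_zips() {
  local list
  list=$(mktemp) || return 1
  wget --spider -r --reject-regex '.*\?.*' --secure-protocol=TLSv1_2 "$1" 2>&1 \
    | extract_zip_urls > "$list"
  # Only start the second pass if at least one .zip URL was found.
  if [ -s "$list" ]; then
    wget -c --secure-protocol=TLSv1_2 -i "$list"
  fi
  rm -f -- "$list"
}
```

It would be called the same way as the original function, for example download_zips "https://example.com/" (a placeholder URL). The cost is that every linked URL gets requested once during the crawl, and the log filtering step is the fragile part.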

asked Nov 16 at 17:43

Lance Pollard

1397

wget
