Delete .pdf files only if .xlsx files in directory have same filename?
I have folders with hundreds of pdf and xls(x) files that were mass exported from legal e-discovery systems. The filenames in these exports correspond to Bates numbers, such as ABCD_00000001.pdf, ABCD_00000002.pdf, ... , ABCD_00002000.pdf. These mass exports include a blank pdf file for every single xls(x) file, with both having the exact same filename apart from the extension. E.g., ABCD_00000005.xlsx is the xlsx file that was produced in the e-discovery system and ABCD_00000005.pdf is an extraneous blank pdf file that was created in the mass export.
These extraneous .pdf files probably result from user error on the part of the people running these mass exports, but I don't usually have control over that side of the process. So I wanted to know if there is any relatively straightforward way to delete these extraneous .pdf files without forcing someone to go through them manually.
bash shell files directory rm
asked Aug 10 at 16:26 by ck_chicago
edited Aug 10 at 16:28 by Jesse_b
Are they .xls or .xlsx? Or could they be either? Why can't you just delete all pdf files in that directory? Are there some pdf files you would like to save?
– Jesse_b, Aug 10 at 16:28
2 Answers
Loop over the pdf files and use parameter expansion to extract the base name:
    #!/bin/bash
    # Remove each pdf that has a spreadsheet with the same base name.
    for pdf in *.pdf ; do
        base=${pdf%.pdf}    # strip the .pdf suffix
        if [[ -f $base.xls || -f $base.xlsx ]] ; then
            rm -- "$pdf"
        fi
    done
Update: I had the logic backwards at first; it should be fixed now.
answered Aug 10 at 16:31 by choroba
edited Dec 16 at 21:36 by Rui F Ribeiro
Added slightly more efficient alternative. If this folder is extensive, it could make a difference in processing time.
– Xalorous, Aug 10 at 17:00
@Xalorous: But your "alternative" removed the files my solution tried to keep.
– choroba, Aug 10 at 17:01
It's possible that I pasted the wrong one. I'll post a separate answer with the more efficient version.
– Xalorous, Aug 10 at 17:04
I agree with your decision to reject the suggested edit: new answers should be posted as new answers, not edits. But I believe that Xalorous got it right and you (choroba) got it backwards. If ABCD_0000005.xlsx and ABCD_0000005.pdf both exist, your code leaves ABCD_0000005.pdf alone. But if important.pdf exists, and there’s no corresponding spreadsheet, your code deletes important.pdf.
– G-Man, Aug 10 at 17:43
Actually, I missed that his logic is inverted. Taking out the ! would fix it though. The real question is what if there's a pdf without matching xls(x). What does OP want to happen then? If they want them gone, then they just need to run rm -rf *.pdf. If not, then my answer. Reading the question strictly, we're only to delete pdf files with matching xls(x) files.
– Xalorous, Aug 10 at 17:51
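The disagreement in the comments above comes down to which pdf files a loop actually touches. One way to settle that before deleting anything is a dry run. The following is only a sketch and not part of either answer: the function name list_paired_pdfs is hypothetical, and it sticks to plain POSIX sh constructs so it should also run under bash. It prints, without removing, every pdf that has a same-named .xls or .xlsx file beside it.

```shell
# Hypothetical helper (not from the answers): print, without deleting,
# every pdf in directory $1 that has a same-named .xls or .xlsx beside it.
# The body runs in a subshell, so the cd does not affect the caller.
list_paired_pdfs() (
    cd "${1:-.}" || exit 1
    for pdf in *.pdf ; do
        [ -e "$pdf" ] || continue      # an unmatched glob stays literal; skip it
        base=${pdf%.pdf}               # ABCD_00000005.pdf -> ABCD_00000005
        if [ -f "$base.xls" ] || [ -f "$base.xlsx" ] ; then
            printf '%s\n' "$pdf"
        fi
    done
)
```

Calling list_paired_pdfs with the export folder as its argument prints one candidate per line. A pdf without a matching spreadsheet (such as the important.pdf from the comment above) is never listed. Once the list looks right, the printf line can be swapped for rm -- "$pdf".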
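Loop over the .xls(x) files and remove the matching pdf files.
    # *.xls* matches .xls and .xlsx (and any other extension starting with .xls)
    for xls in *.xls* ; do
        # strip the extension, append .pdf; -f ignores a missing pdf
        /bin/rm -f "${xls%.xls*}"".pdf"
    done
If there's no matching pdf, rm -f does nothing, so it won't hurt anything.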
answered Aug 10 at 17:04 by Xalorous
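The question mentions folders (plural), while the loop above handles one directory at a time. As a rough sketch only, and assuming the same pairing rule should apply in every subfolder, the idea can be extended with find. The function name remove_paired_pdfs_recursive is hypothetical and not from the answer. Note that -name is case-sensitive, so files ending in .XLSX would be skipped.

```shell
# Hypothetical helper (an assumption, not from the answers): walk the tree
# under $1 and delete the same-named pdf next to every .xls/.xlsx file found.
remove_paired_pdfs_recursive() {
    find "${1:-.}" -type f \( -name '*.xls' -o -name '*.xlsx' \) -exec sh -c '
        for sheet do
            rm -f -- "${sheet%.*}.pdf"   # -f: stay quiet when no pdf exists
        done
    ' sh {} +
}
```

It is worth trying this on a copy of the export first, since the deletion is not reversible.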
You don’t really need the ""; i.e., you could do /bin/rm -f "${xls%.xls*}.pdf". But this looks like it should work.
– G-Man, Aug 10 at 17:42
I tried it both ways; the match failed without the "" in the middle. Or maybe I made a different mistake at the same time. I'm new to bash.
– Xalorous, Aug 10 at 17:46
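On the "" question in the comments above: adjacent quoted strings are joined into a single word by the shell, so the two spellings should expand to the same filename. A small check, using a sample name from the question (the variable names with_gap and without_gap are just for illustration):

```shell
# Compare the two spellings discussed above on a sample filename.
xls='ABCD_00000005.xlsx'
with_gap="${xls%.xls*}"".pdf"      # two quoted strings side by side
without_gap="${xls%.xls*}.pdf"     # one quoted string
printf '%s\n%s\n' "$with_gap" "$without_gap"
```

Both lines print ABCD_00000005.pdf, which suggests the failure mentioned in the last comment had some other cause.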