Delete .pdf files only if .xlsx files in directory have same filename?




















3







I have folders with hundreds of PDF and XLS(X) files that were mass-exported from legal e-discovery systems. The filenames in these exports correspond to Bates numbers, such as ABCD_00000001.pdf, ABCD_00000002.pdf, ..., ABCD_00002000.pdf. These mass exports include a blank PDF file for every single XLS(X) file, with both having exactly the same base filename. For example, ABCD_00000005.xlsx is the spreadsheet that was produced in the e-discovery system, and ABCD_00000005.pdf is an extraneous blank PDF file that was created in the mass export.

These extraneous .pdf files probably result from user error on the part of the people running the mass exports, but I don't usually have control over that side of the process. Is there a relatively straightforward way to delete these extraneous .pdf files without forcing someone to go through them manually?

















bash shell files directory rm














edited Aug 10 at 16:28 by Jesse_b

asked Aug 10 at 16:26 by ck_chicago








  • 1




    Are they .xls or .xlsx? Or could they be either? Why can't you just delete all pdf files in that directory? Are there some pdf files you would like to save?
    – Jesse_b
    Aug 10 at 16:28
























2 Answers


























6






Loop over the PDF files and use parameter expansion to extract the base name:

    #!/bin/bash
    # Delete each PDF that has a spreadsheet with the same base name.
    for pdf in *.pdf ; do
        base=${pdf%.pdf}    # strip the .pdf suffix
        if [[ -f $base.xls || -f $base.xlsx ]] ; then
            rm -- "$pdf"
        fi
    done


Update: I got the logic backwards, should be fixed now.
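
Before deleting anything from a real export, it may be worth previewing which files the loop would remove. The following is only a sketch and not part of the original answer: the function name `list_redundant_pdfs` and its optional directory argument are assumptions made for illustration. It prints each PDF that has a same-named .xls or .xlsx file and deletes nothing.

```shell
# Hypothetical helper: print each PDF in a directory (default: the current one)
# that has a same-named .xls or .xlsx file. Nothing is deleted.
list_redundant_pdfs() (
    dir=${1:-.}
    for pdf in "$dir"/*.pdf ; do
        [ -f "$pdf" ] || continue          # skip the literal pattern when no PDF exists
        base=${pdf%.pdf}                   # strip the .pdf suffix
        if [ -f "$base.xls" ] || [ -f "$base.xlsx" ] ; then
            printf '%s\n' "$pdf"
        fi
    done
)
```

Running `list_redundant_pdfs /path/to/export` and reviewing the list first keeps the actual `rm` as a separate, deliberate step.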














edited Dec 16 at 21:36 by Rui F Ribeiro

answered Aug 10 at 16:31 by choroba












  • Added slightly more efficient alternative. If this folder is extensive, it could make a difference in processing time.
    – Xalorous
    Aug 10 at 17:00










  • @Xalorous: But your "alternative" removed the files my solution tried to keep.
    – choroba
    Aug 10 at 17:01












  • It's possible that I pasted the wrong one. I'll post a separate answer with the more efficient version.
    – Xalorous
    Aug 10 at 17:04










  • I agree with your decision to reject the suggested edit — new answers should be posted as new answers, not edits.  But I believe that Xalorous got it right and you (choroba) got it backwards.  If ABCD_0000005.xlsx and ABCD_0000005.pdf both exist, your code leaves ABCD_0000005.pdf alone.  But if important.pdf exists, and there’s no corresponding spreadsheet, your code deletes important.pdf.
    – G-Man
    Aug 10 at 17:43












  • Actually, I missed that his logic is inverted. Taking out the ! would fix it though. The real question is what if there's a pdf without matching xls(x). What does OP want to happen then? If they want them gone, then they need to just rm -rf *.pdf. If not, then my answer. Reading the question strictly, we're only to delete pdf files with matching xls(x) files.
    – Xalorous
    Aug 10 at 17:51




























4






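Loop over the .xls(x) files and remove the matching PDF files.

    for xls in *.xls* ; do
        # Strip the .xls/.xlsx suffix and delete the PDF with the same base name.
        /bin/rm -f -- "${xls%.xls*}"".pdf"
    done

If there's no matching PDF, rm -f silently ignores the missing file, so it won't hurt anything.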
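
One caveat: the pattern `*.xls*` also matches names such as report.xlsm or data.xls.bak, and the PDF sharing their base name would be removed as well. The following is a sketch of a stricter variant, assuming only files ending exactly in .xls or .xlsx should trigger a deletion; the function name `remove_pdfs_for_spreadsheets` and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: for each file ending exactly in .xls or .xlsx in a
# directory (default: the current one), delete the PDF with the same base name.
remove_pdfs_for_spreadsheets() (
    dir=${1:-.}
    for sheet in "$dir"/*.xls "$dir"/*.xlsx ; do
        [ -f "$sheet" ] || continue        # skip unmatched glob patterns
        rm -f -- "${sheet%.*}.pdf"         # strip the last extension, then add .pdf
    done
)
```

PDFs with no spreadsheet counterpart are never touched, since the loop only ever builds PDF names from spreadsheets that exist.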
















answered Aug 10 at 17:04 by Xalorous












  • You don’t really need the ""; i.e., you could do /bin/rm -f "${xls%.xls*}.pdf".  But this looks like it should work.
    – G-Man
    Aug 10 at 17:42










  • I tried it both ways, the match failed without the "" in the middle. Or maybe I made a different mistake at the same time. I'm new to bash.
    – Xalorous
    Aug 10 at 17:46

















