Finding all “Non-Binary” files
Is it possible to use the find command to find all the "non-binary" files in a directory? Here's the problem I'm trying to solve.
I've received an archive of files from a Windows user. This archive contains source code and image files. Our build system doesn't play nice with files that have Windows line endings. I have a command-line program (flip -u) that will flip line endings between *nix and Windows. So, I'd like to do something like this:
find . -type f | xargs flip -u
However, if this command is run against an image file, or other binary media file, it will corrupt the file. I realize I could build a list of file extensions and filter with that, but I'd rather have something that's not reliant on me keeping that list up to date.
So, is there a way to find all the non-binary files in a directory tree? Or is there an alternate solution I should consider?
files find text newlines
You could use the file utility somewhere in your script/pipeline to identify whether the file is data or text
– lk-
Aug 24 '12 at 18:59
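(For illustration, a rough sketch of that idea; the exact wording of file's descriptions varies between versions and platforms, so treat the case patterns below as an assumption rather than a guarantee:)
# Classify each entry in the current directory using file(1)'s
# one-line description; -b suppresses the leading "filename:" prefix.
for f in *; do
    case $(file -b "$f") in
        *text*) echo "text:  $f" ;;
        *)      echo "other: $f" ;;
    esac
done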
What do you mean by non-binary (everything on a modern computer is binary)? I am guessing you are using the distinction from the old CP/M operating system, which had text and binary files. Text files could be of any length but had to end with a ctrl-z, and binary files had to be a multiple of a 512-byte block. If so, you mean text files. (I also note that you write about line endings in non-binary files; this also would suggest that they are text files.) Is this correct?
– ctrl-alt-delor
Jan 6 '17 at 17:05
All files are binary, it is just a matter of interpretation. Are you asking how to find text files?
– ctrl-alt-delor
May 17 '17 at 20:21
@richard I come from an era where we called files meant to be interpreted as plain text "plain text", and all other files (images, word processing docs, etc.) "binary". I know it's all just ones and zeros under the hood :)
– Alan Storm
May 17 '17 at 20:28
Ah, I see what you mean about my terms -- I'll use binary/text in the future to avoid confusion. Re: the \r\n thing -- it's my understanding those are the ASCII characters for a typewriter's carriage return (move to the beginning of the line) and line feed (move down one line). So \r\n is a "more accurate" model of the real-world physical thing an end-of-line character was for. Pre OS X, Macs used just a \r for this. I usually write the whole thing off as "arbitrary choices made in a rush that we're still dealing with"
– Alan Storm
May 17 '17 at 22:29
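(If you want to check which line-ending convention a particular file actually uses, od makes the control characters visible; "somefile.txt" below is just a placeholder name:)
od -c somefile.txt | head
# Windows-style lines end in \r \n ; Unix-style lines end in a bare \n .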
9 Answers
Accepted answer:
I'd use file and pipe the output into grep or awk to find text files, then extract just the filename portion of file's output and pipe that into xargs.
Something like:
file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'\n' -r flip -u
Note that the pattern searches for 'ASCII text' rather than just any 'text' - you probably don't want to mess with Rich Text documents or Unicode text files etc.
You can also use find (or whatever) to generate a list of files to examine with file:
find /path/to/files -type f -exec file {} + |
awk -F: '/ASCII text/ {print $1}' | xargs -d'\n' -r flip -u
The -d'\n' argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. i.e. it's an alternative to xargs -0 when the input source doesn't or can't generate NULL-separated output (such as find's -print0 option). According to the changelog, xargs got the -d/--delimiter option in Sep 2005, so it should be in any non-ancient Linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).
Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical unix users, this is pathologically insane, but isn't unheard of if the files originated on Mac or Windows machines.
Also note that file is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.
I have used numerous variations of this method many times in the past with success.
Thanks for this solution! For some reason file displays English text rather than ASCII text on my Solaris system, so I modified that portion accordingly. Also, I replaced awk -F: '{print $1}' with the equivalent cut -f1 -d:.
– Andrew Cheong
Dec 10 '13 at 18:12
worth saying grep -I filters binaries
– xenoterracide
Aug 10 '16 at 17:31
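(As a rough sketch of that tip, assuming GNU grep: -r recurses, -I skips binaries, -l lists matching files, and -Z NUL-terminates the names so xargs -0 can handle odd filenames. The pattern . needs at least one character to match, so empty files are skipped as well:)
grep -rIlZ . . | xargs -0 -r flip -u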
Looking for the word text should be sufficient. This will also pick up file descriptions like ASCII Java program text or HTML document text or troff or preprocessor input text.
– user1024
Nov 1 '16 at 23:02
My answer is partially a response/improvement upon this answer. Very good point about grepping for ASCII text to avoid messing up RTFs.
– Wildcard
Nov 5 '16 at 16:03
xenoterracide: You saved my life, man! Just a flag -I and BINGO
– Sergio Abreu
Jan 4 '17 at 21:38
No. There is nothing special about a binary or non-binary file. You can use heuristics like 'contains only characters in 0x01–0x7F', but that'll call text files with non-ASCII characters binary files, and unlucky binary files text files.
Now, once you've ignored that...
zip files
If it's coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a option to pay attention to this and convert. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).
zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.
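(A minimal sketch of that route; the archive name is hypothetical, and -a only helps where the archiver flagged entries correctly:)
zipinfo source-drop.zip            # listing includes a t/b flag per entry
unzip -a source-drop.zip -d src    # converts line endings of entries marked as text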
other files
The file command will look at a file and try to identify it. In particular, you'll probably find its -i (output MIME type) option useful; only convert files with type text/*.
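(For example, a rough sketch of that filter; it assumes GNU xargs for -d and that no filename contains ':' or a newline:)
find . -type f -exec file -i {} + |
awk -F: '$2 ~ /text\//{print $1}' |
xargs -d '\n' -r flip -u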
A general solution to only process non-binary files in bash using file -b --mime-encoding:
while IFS= read -d '' -r file; do
[[ "$(file -b --mime-encoding "$file")" = binary ]] &&
{ echo "Skipping $file."; continue; }
echo "Processing $file."
# ...
done < <(find . -type f -print0)
I contacted the author of the file utility and he added a nifty -00 parameter in version 5.26 (released 2016-04-16; it's e.g. in current Arch and Ubuntu 16.10) which prints file\0result\0 for multiple files fed to it at once. This way you can do e.g.:
find . -type f -exec file -00 --mime-encoding {} + |
awk 'BEGIN{ORS=RS="\0"}{if(NR%2)f=$0;else if(!/binary/)print f}' | …
(The awk part filters out every binary file. ORS is the output record separator.)
Can be also used in a loop of course:
while IFS= read -d '' -r file; do
echo "Processing $file."
# ...
done < <(find . -type f -exec file -00 --mime-encoding {} + |
awk 'BEGIN{ORS=RS="\0"}{if(NR%2)f=$0;else if(!/binary/)print f}')
Based on this and the previous method, I created a little bash script for filtering out binary files which utilizes the new method using the -00 parameter of file in newer versions of it and falls back to the previous method on older versions:
#!/bin/bash
# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#
[[ $# -eq 0 ]] && exit
if [[ "$(file -v)" =~ file-([1-9][0-9]|[6-9]|5\.([3-9][0-9]|2[6-9])) ]]; then
file -00 --mime-encoding -- "$@" |
awk 'BEGIN{ORS=RS="\0"}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
for f do
[[ "$(file -b --mime-encoding -- "$f")" != binary ]] &&
printf '%s\0' "$f"
done
fi
Or here is a more POSIX-y one, but it requires support for sort -V:
#!/bin/sh
# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#
[ $# -eq 0 ] && exit
if [ "$(printf '%s\n' 'file-5.26' "$(file -v | head -1)" | sort -V | head -1)" = 'file-5.26' ]; then
file -00 --mime-encoding -- "$@" |
awk 'BEGIN{ORS=RS="\0"}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
for f do
[ "$(file -b --mime-encoding -- "$f")" != binary ] &&
printf '%s\0' "$f"
done
fi
Cas's answer is good, but it assumes sane filenames; in particular it is assumed that filenames will not contain newlines.
There's no good reason to make this assumption here, since it is quite simple (and actually cleaner in my opinion) to handle that case correctly as well:
find . -type f -exec sh -c 'file "$1" | grep -q "ASCII text"' sh {} \; -exec flip -u {} \;
The find command only makes use of POSIX-specified features. Using -exec to run arbitrary commands as boolean tests is simple, robust (handles odd filenames correctly), and more portable than -print0.
In fact, all parts of the command are specified by POSIX except for flip.
Note that file doesn't guarantee accuracy of the results it returns. However, in practice grepping for "ASCII text" in its output is quite reliable.
(It might miss some text files perhaps, but is very very unlikely to incorrectly identify a binary file as "ASCII text" and mangle it—so we are erring on the side of caution.)
Argument-less file calls can be quite slow, e.g. for videos it will tell you everything about the encoding.
– phk
Nov 6 '16 at 17:27
Also you are assuming no file starts with -.
– phk
Nov 6 '16 at 17:29
And I see no reason why you wouldn't just do a single call to file, it can take multiple files as arguments.
– phk
Nov 6 '16 at 17:45
@phk, to address your comments: (1) it's good to know the potential slowness, but I see no POSIX way to prevent that; (2) I make zero assumptions about file names, as the find command will prefix ./ to any filename passed to the shell command; (3) Using grep as a test on a single file command's output at a time is the only POSIX way I can see to guarantee correct handling of filenames that may contain newlines.
– Wildcard
Nov 6 '16 at 18:59
I looked over your final "POSIX-y" solution and I think it's clever—but you assume that file supports the --mime-encoding flag and the -- separator, neither of which is guaranteed by POSIX.
– Wildcard
Nov 6 '16 at 19:02
The accepted answer didn't find all of them for me. Here is an example using grep's -I to ignore binaries, and ignoring all hidden files...
find . -type f -not -path '*/.*' -exec grep -Il '.' {} \; | xargs -L 1 echo
Here it is in use in a practical application (dos2unix): https://unix.stackexchange.com/a/365679/112190
Hope that helps.
find . -type f -exec grep -I -q . {} \; -print
This will find all regular files (-type f) in the current directory (or below) that grep thinks are non-empty and non-binary.
It uses grep -I to distinguish between binary and non-binary files. The -I flag will cause grep to exit with a non-zero exit status when it detects that a file is binary. A "binary" file is, according to grep, a file that contains characters outside the printable ASCII range.
The -q option to grep will cause it to quit with a zero exit status if the given pattern is found, without emitting any data. The pattern that we use is a single dot, which will match any character.
If the file is found to be non-binary and if it contains at least one character, the name of the file is printed.
If you feel brave, you can plug your flip -u into it as well:
find . -type f -exec grep -I -q . {} \; -print -exec flip -u {} \;
Try this:
find . -type f -print0 | xargs -0 -r grep -Z -L -U '[^ -~]' | xargs -0 -r flip -u
Where the argument of grep '[^ -~]' is '[^<tab><space>-~]'.
If you type it on a shell command line, type Ctrl+V before Tab.
In an editor, there should be no problem.
'[^<tab><space>-~]' will match any character which is not ASCII text (carriage returns are ignored by grep).
-L will print only the filenames of files which do not match.
-Z will output filenames separated with a null character (for xargs -0).
It's worth noting that with Perl-like regex grep -P (if available) \t is available. Alternatively, using locale translation if the shell supports it: $'\t' (bash and zsh do).
– phk
Jan 6 '17 at 19:51
Alternate solution:
The dos2unix command will convert line endings from Windows CRLF to Unix LF, and automatically skip binary files. I apply it recursively using:
find . -type f -exec dos2unix {} \;
Since dos2unix can take multiple filenames as arguments, it is much more efficient to do find . -type f -exec dos2unix {} +
– Anthon
Sep 21 '17 at 20:41
sudo find / \( -type f -and -path '*/git/*' -iname 'README' \) -exec grep -liI '100644|100755' {} \; -exec flip -u {} \;
i. \( -type f -and -path '*/git/*' -iname 'README' \): searches for files within a path containing the name git and with the name README. If you know any specific folder and filename to search for, this is useful.
ii. -exec runs a command on the file name generated by find.
iii. \; indicates the end of the command.
iv. {} is the file/folder name found by the previous find search.
v. Multiple commands can be run subsequently by appending -exec "command" \;, such as with -exec flip -u {} \;.
vi. grep:
1. -l lists the name of the file.
2. -I searches only non-binary files.
3. -q quiet output.
4. '100644|100755' searches for either 100644 or 100755 within the file found. If found, it then runs flip -u. | is the or operator for grep.
you can clone this test directory and try it out: https://github.com/alphaCTzo7G/stackexchange/tree/master/linux/findSolution204092017
more detailed answer here: https://github.com/alphaCTzo7G/stackexchange/blob/master/linux/findSolution204092017/README.md
9 Answers
9
active
oldest
votes
9 Answers
9
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
18
down vote
accepted
I'd use file
and pipe the output into grep or awk to find text files, then extract just the filename portion of file
's output and pipe that into xargs.
something like:
file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u
Note that the grep searches for 'ASCII text' rather than any just 'text' - you probably don't want to mess with Rich Text documents or unicode text files etc.
You can also use find
(or whatever) to generate a list of files to examine with file
:
find /path/to/files -type f -exec file {} + |
awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u
The -d'n'
argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. i.e. it's an alternative to xargs -0
when the input source doesn't or can't generate NULL-separated output (such as find
's -print0
option). According to the changelog, xargs got the -d
/--delimiter
option in Sep 2005 so should be in any non-ancient linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).
Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical unix users, this is pathologically insane, but isn't unheard of if the files originated on Mac or Windows machines.
Also note that file
is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.
I have used numerous variations of this method many times in the past with success.
1
Thanks for this solution! For some reasonfile
displaysEnglish text
rather thanASCII text
on my Solaris system, so I modified that portion accordingly. Also, I replacedawk -F: '{print $1}'
with the equivalentcut -f1 -d:
.
– Andrew Cheong
Dec 10 '13 at 18:12
2
worth sayinggrep -I
filters binaries
– xenoterracide
Aug 10 '16 at 17:31
Looking for the wordtext
should be sufficient. This will also pick upfile
descriptions likeASCII Java program text
orHTML document text
ortroff or preprocessor input text
.
– user1024
Nov 1 '16 at 23:02
My answer is partially a response/improvement upon this answer. Very good point about grepping forASCII text
to avoid messing up RTFs.
– Wildcard
Nov 5 '16 at 16:03
1
xenoterracide: You saved my life man ! Just a flag -I and BINGO
– Sergio Abreu
Jan 4 '17 at 21:38
|
show 2 more comments
up vote
18
down vote
accepted
I'd use file
and pipe the output into grep or awk to find text files, then extract just the filename portion of file
's output and pipe that into xargs.
something like:
file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u
Note that the grep searches for 'ASCII text' rather than any just 'text' - you probably don't want to mess with Rich Text documents or unicode text files etc.
You can also use find
(or whatever) to generate a list of files to examine with file
:
find /path/to/files -type f -exec file {} + |
awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u
The -d'n'
argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. i.e. it's an alternative to xargs -0
when the input source doesn't or can't generate NULL-separated output (such as find
's -print0
option). According to the changelog, xargs got the -d
/--delimiter
option in Sep 2005 so should be in any non-ancient linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).
Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical unix users, this is pathologically insane, but isn't unheard of if the files originated on Mac or Windows machines.
Also note that file
is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.
I have used numerous variations of this method many times in the past with success.
1
Thanks for this solution! For some reasonfile
displaysEnglish text
rather thanASCII text
on my Solaris system, so I modified that portion accordingly. Also, I replacedawk -F: '{print $1}'
with the equivalentcut -f1 -d:
.
– Andrew Cheong
Dec 10 '13 at 18:12
2
worth sayinggrep -I
filters binaries
– xenoterracide
Aug 10 '16 at 17:31
Looking for the wordtext
should be sufficient. This will also pick upfile
descriptions likeASCII Java program text
orHTML document text
ortroff or preprocessor input text
.
– user1024
Nov 1 '16 at 23:02
My answer is partially a response/improvement upon this answer. Very good point about grepping forASCII text
to avoid messing up RTFs.
– Wildcard
Nov 5 '16 at 16:03
1
xenoterracide: You saved my life man ! Just a flag -I and BINGO
– Sergio Abreu
Jan 4 '17 at 21:38
|
show 2 more comments
up vote
18
down vote
accepted
up vote
18
down vote
accepted
I'd use file
and pipe the output into grep or awk to find text files, then extract just the filename portion of file
's output and pipe that into xargs.
something like:
file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u
Note that the grep searches for 'ASCII text' rather than any just 'text' - you probably don't want to mess with Rich Text documents or unicode text files etc.
You can also use find
(or whatever) to generate a list of files to examine with file
:
find /path/to/files -type f -exec file {} + |
awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u
The -d'n'
argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. i.e. it's an alternative to xargs -0
when the input source doesn't or can't generate NULL-separated output (such as find
's -print0
option). According to the changelog, xargs got the -d
/--delimiter
option in Sep 2005 so should be in any non-ancient linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).
Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical unix users, this is pathologically insane, but isn't unheard of if the files originated on Mac or Windows machines.
Also note that file
is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.
I have used numerous variations of this method many times in the past with success.
I'd use file
and pipe the output into grep or awk to find text files, then extract just the filename portion of file
's output and pipe that into xargs.
something like:
file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u
Note that the grep searches for 'ASCII text' rather than any just 'text' - you probably don't want to mess with Rich Text documents or unicode text files etc.
You can also use find
(or whatever) to generate a list of files to examine with file
:
find /path/to/files -type f -exec file {} + |
awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u
The -d'n'
argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. i.e. it's an alternative to xargs -0
when the input source doesn't or can't generate NULL-separated output (such as find
's -print0
option). According to the changelog, xargs got the -d
/--delimiter
option in Sep 2005 so should be in any non-ancient linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).
Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical unix users, this is pathologically insane, but isn't unheard of if the files originated on Mac or Windows machines.
Also note that file
is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.
I have used numerous variations of this method many times in the past with success.
edited Mar 21 '17 at 10:24
answered Aug 25 '12 at 1:15
cas
38.5k450100
38.5k450100
1
Thanks for this solution! For some reasonfile
displaysEnglish text
rather thanASCII text
on my Solaris system, so I modified that portion accordingly. Also, I replacedawk -F: '{print $1}'
with the equivalentcut -f1 -d:
.
– Andrew Cheong
Dec 10 '13 at 18:12
2
worth sayinggrep -I
filters binaries
– xenoterracide
Aug 10 '16 at 17:31
Looking for the wordtext
should be sufficient. This will also pick upfile
descriptions likeASCII Java program text
orHTML document text
ortroff or preprocessor input text
.
– user1024
Nov 1 '16 at 23:02
My answer is partially a response/improvement upon this answer. Very good point about grepping forASCII text
to avoid messing up RTFs.
– Wildcard
Nov 5 '16 at 16:03
1
xenoterracide: You saved my life man ! Just a flag -I and BINGO
– Sergio Abreu
Jan 4 '17 at 21:38
|
show 2 more comments
1
Thanks for this solution! For some reasonfile
displaysEnglish text
rather thanASCII text
on my Solaris system, so I modified that portion accordingly. Also, I replacedawk -F: '{print $1}'
with the equivalentcut -f1 -d:
.
– Andrew Cheong
Dec 10 '13 at 18:12
2
worth sayinggrep -I
filters binaries
– xenoterracide
Aug 10 '16 at 17:31
Looking for the wordtext
should be sufficient. This will also pick upfile
descriptions likeASCII Java program text
orHTML document text
ortroff or preprocessor input text
.
– user1024
Nov 1 '16 at 23:02
My answer is partially a response/improvement upon this answer. Very good point about grepping forASCII text
to avoid messing up RTFs.
– Wildcard
Nov 5 '16 at 16:03
1
xenoterracide: You saved my life man ! Just a flag -I and BINGO
– Sergio Abreu
Jan 4 '17 at 21:38
1
1
Thanks for this solution! For some reason
file
displays English text
rather than ASCII text
on my Solaris system, so I modified that portion accordingly. Also, I replaced awk -F: '{print $1}'
with the equivalent cut -f1 -d:
.– Andrew Cheong
Dec 10 '13 at 18:12
Thanks for this solution! For some reason
file
displays English text
rather than ASCII text
on my Solaris system, so I modified that portion accordingly. Also, I replaced awk -F: '{print $1}'
with the equivalent cut -f1 -d:
.– Andrew Cheong
Dec 10 '13 at 18:12
2
2
worth saying
grep -I
filters binaries– xenoterracide
Aug 10 '16 at 17:31
worth saying
grep -I
filters binaries– xenoterracide
Aug 10 '16 at 17:31
Looking for the word
text
should be sufficient. This will also pick up file
descriptions like ASCII Java program text
or HTML document text
or troff or preprocessor input text
.– user1024
Nov 1 '16 at 23:02
Looking for the word
text
should be sufficient. This will also pick up file
descriptions like ASCII Java program text
or HTML document text
or troff or preprocessor input text
.– user1024
Nov 1 '16 at 23:02
My answer is partially a response/improvement upon this answer. Very good point about grepping for
ASCII text
to avoid messing up RTFs.– Wildcard
Nov 5 '16 at 16:03
My answer is partially a response/improvement upon this answer. Very good point about grepping for
ASCII text
to avoid messing up RTFs.– Wildcard
Nov 5 '16 at 16:03
1
1
xenoterracide: You saved my life man ! Just a flag -I and BINGO
– Sergio Abreu
Jan 4 '17 at 21:38
xenoterracide: You saved my life man ! Just a flag -I and BINGO
– Sergio Abreu
Jan 4 '17 at 21:38
|
show 2 more comments
up vote
9
down vote
No. There is nothing special about a binary or non-binary file. You can use heuristics like 'contains only characters in 0x01–0x7F', but that'll call text files with non-ASCII characters binary files, and unlucky binary files text files.
Now, once you've ignored that...
zip files
If its coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a
option to pay attention to this and convert. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).
zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.
other files
The file command will look at a file and try to identify it. In particular, you'll probably find its -i
(output MIME type) option useful; only convert files with type text/*
add a comment |
up vote
9
down vote
No. There is nothing special about a binary or non-binary file. You can use heuristics like 'contains only characters in 0x01–0x7F', but that'll call text files with non-ASCII characters binary files, and unlucky binary files text files.
Now, once you've ignored that...
zip files
If its coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a
option to pay attention to this and convert. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).
zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.
other files
The file command will look at a file and try to identify it. In particular, you'll probably find its -i
(output MIME type) option useful; only convert files with type text/*
add a comment |
up vote
9
down vote
up vote
9
down vote
No. There is nothing special about a binary or non-binary file. You can use heuristics like 'contains only characters in 0x01–0x7F', but that'll call text files with non-ASCII characters binary files, and unlucky binary files text files.
Now, once you've ignored that...
zip files
If its coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a
option to pay attention to this and convert. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).
zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.
other files
The file command will look at a file and try to identify it. In particular, you'll probably find its -i
(output MIME type) option useful; only convert files with type text/*
No. There is nothing special about a binary or non-binary file. You can use heuristics like 'contains only characters in 0x01–0x7F', but that'll call text files with non-ASCII characters binary files, and unlucky binary files text files.
Now, once you've ignored that...
zip files
If its coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a
option to pay attention to this and convert. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).
zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.
other files
The file command will look at a file and try to identify it. In particular, you'll probably find its -i
(output MIME type) option useful; only convert files with type text/*
answered Aug 24 '12 at 19:00
derobert
71.5k8152210
71.5k8152210
add a comment |
add a comment |
up vote
6
down vote
A general solution to only process non-binary files in bash
using file -b --mime-encoding
:
while IFS= read -d '' -r file; do
[[ "$(file -b --mime-encoding "$file")" = binary ]] &&
{ echo "Skipping $file."; continue; }
echo "Processing $file."
# ...
done < <(find . -type f -print0)
I contacted the author of the file utility and he added a nifty -00
paramter in version 5.26 (released 2016-04-16, is e.g. in current Arch and Ubuntu 16.10) which prints fileresult
for multiple files fed to it at once, this way you can do e.g.:
find . -type f -exec file -00 --mime-encoding {} + |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}' | …
(The awk
part is to filter out every file that isn't non-binary. ORS
is the output separator.)
Can be also used in a loop of course:
while IFS= read -d '' -r file; do
echo "Processing $file."
# ...
done < <(find . -type f -exec file -00 --mime-encoding {} + |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}')
Based of this and the previous I created a little bash
script for filtering out binary files which utilizes the new method using the -00
parameter of file
in newer versions of it and falls back to the previous method on older versions:
#!/bin/bash
# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#
[[ $# -eq 0 ]] && exit
if [[ "$(file -v)" =~ file-([1-9][0-9]|[6-9]|5.([3-9][0-9]|2[6-9])) ]]; then
file -00 --mime-encoding -- "$@" |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
for f do
[[ "$(file -b --mime-encoding -- "$f")" != binary ]] &&
printf '%s' "$f"
done
fi
Or here a more POSIX-y one, but it requires support for sort -V
:
#!/bin/sh
# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#
[ $# -eq 0 ] && exit
if [ "$(printf '%sn' 'file-5.26' "$(file -v | head -1)" | sort -V)" =
'file-5.26' ]; then
file -00 --mime-encoding -- "$@" |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
for f do
[ "$(file -b --mime-encoding -- "$f")" != binary ] &&
printf '%s' "$f"
done
fi
add a comment |
up vote
6
down vote
A general solution to only process non-binary files in bash
using file -b --mime-encoding
:
while IFS= read -d '' -r file; do
[[ "$(file -b --mime-encoding "$file")" = binary ]] &&
{ echo "Skipping $file."; continue; }
echo "Processing $file."
# ...
done < <(find . -type f -print0)
I contacted the author of the file utility and he added a nifty -00
paramter in version 5.26 (released 2016-04-16, is e.g. in current Arch and Ubuntu 16.10) which prints fileresult
for multiple files fed to it at once, this way you can do e.g.:
find . -type f -exec file -00 --mime-encoding {} + |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}' | …
(The awk
part is to filter out every file that isn't non-binary. ORS
is the output separator.)
Can be also used in a loop of course:
while IFS= read -d '' -r file; do
echo "Processing $file."
# ...
done < <(find . -type f -exec file -00 --mime-encoding {} + |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}')
Based of this and the previous I created a little bash
script for filtering out binary files which utilizes the new method using the -00
parameter of file
in newer versions of it and falls back to the previous method on older versions:
#!/bin/bash
# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#
[[ $# -eq 0 ]] && exit
if [[ "$(file -v)" =~ file-([1-9][0-9]|[6-9]|5.([3-9][0-9]|2[6-9])) ]]; then
file -00 --mime-encoding -- "$@" |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
for f do
[[ "$(file -b --mime-encoding -- "$f")" != binary ]] &&
printf '%s' "$f"
done
fi
Or here a more POSIX-y one, but it requires support for sort -V
:
#!/bin/sh
# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#
[ $# -eq 0 ] && exit
if [ "$(printf '%sn' 'file-5.26' "$(file -v | head -1)" | sort -V)" =
'file-5.26' ]; then
file -00 --mime-encoding -- "$@" |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
for f do
[ "$(file -b --mime-encoding -- "$f")" != binary ] &&
printf '%s' "$f"
done
fi
add a comment |
up vote
6
down vote
up vote
6
down vote
A general solution to only process non-binary files in bash
using file -b --mime-encoding
:
while IFS= read -d '' -r file; do
[[ "$(file -b --mime-encoding "$file")" = binary ]] &&
{ echo "Skipping $file."; continue; }
echo "Processing $file."
# ...
done < <(find . -type f -print0)
I contacted the author of the file utility and he added a nifty -00
paramter in version 5.26 (released 2016-04-16, is e.g. in current Arch and Ubuntu 16.10) which prints fileresult
for multiple files fed to it at once, this way you can do e.g.:
find . -type f -exec file -00 --mime-encoding {} + |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}' | …
(The awk
part is to filter out every file that isn't non-binary. ORS
is the output separator.)
Can be also used in a loop of course:
while IFS= read -d '' -r file; do
echo "Processing $file."
# ...
done < <(find . -type f -exec file -00 --mime-encoding {} + |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}')
Based of this and the previous I created a little bash
script for filtering out binary files which utilizes the new method using the -00
parameter of file
in newer versions of it and falls back to the previous method on older versions:
#!/bin/bash
# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#
[[ $# -eq 0 ]] && exit
if [[ "$(file -v)" =~ file-([1-9][0-9]|[6-9]|5.([3-9][0-9]|2[6-9])) ]]; then
file -00 --mime-encoding -- "$@" |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
for f do
[[ "$(file -b --mime-encoding -- "$f")" != binary ]] &&
printf '%s' "$f"
done
fi
Or here a more POSIX-y one, but it requires support for sort -V
:
#!/bin/sh
# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#
[ $# -eq 0 ] && exit
if [ "$(printf '%sn' 'file-5.26' "$(file -v | head -1)" | sort -V)" =
'file-5.26' ]; then
file -00 --mime-encoding -- "$@" |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
for f do
[ "$(file -b --mime-encoding -- "$f")" != binary ] &&
printf '%s' "$f"
done
fi
A general solution to only process non-binary files in bash
using file -b --mime-encoding
:
while IFS= read -d '' -r file; do
[[ "$(file -b --mime-encoding "$file")" = binary ]] &&
{ echo "Skipping $file."; continue; }
echo "Processing $file."
# ...
done < <(find . -type f -print0)
I contacted the author of the file utility and he added a nifty -00
paramter in version 5.26 (released 2016-04-16, is e.g. in current Arch and Ubuntu 16.10) which prints fileresult
for multiple files fed to it at once, this way you can do e.g.:
find . -type f -exec file -00 --mime-encoding {} + |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}' | …
(The awk
part is to filter out every file that isn't non-binary. ORS
is the output separator.)
Can be also used in a loop of course:
while IFS= read -d '' -r file; do
echo "Processing $file."
# ...
done < <(find . -type f -exec file -00 --mime-encoding {} + |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}')
Based of this and the previous I created a little bash
script for filtering out binary files which utilizes the new method using the -00
parameter of file
in newer versions of it and falls back to the previous method on older versions:
#!/bin/bash
# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#
[[ $# -eq 0 ]] && exit
if [[ "$(file -v)" =~ file-([1-9][0-9]|[6-9]|5.([3-9][0-9]|2[6-9])) ]]; then
file -00 --mime-encoding -- "$@" |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
for f do
[[ "$(file -b --mime-encoding -- "$f")" != binary ]] &&
printf '%s' "$f"
done
fi
Or here a more POSIX-y one, but it requires support for sort -V
:
#!/bin/sh
# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#
[ $# -eq 0 ] && exit
if [ "$(printf '%sn' 'file-5.26' "$(file -v | head -1)" | sort -V)" =
'file-5.26' ]; then
file -00 --mime-encoding -- "$@" |
awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
for f do
[ "$(file -b --mime-encoding -- "$f")" != binary ] &&
printf '%s' "$f"
done
fi
edited Mar 24 at 18:30
answered Mar 2 '16 at 11:10
phk
3,97652152
3,97652152
add a comment |
add a comment |
up vote
4
down vote
Cas's answer is good, but it assumes sane filenames; in particular it is assumed that filenames will not contain newlines.
There's no good reason to make this assumption here, since it is quite simple (and actually cleaner in my opinion) to handle that case correctly as well:
find . -type f -exec sh -c 'file "$1" | grep -q "ASCII text"' sh {} ; -exec flip -u {} ;
The find
command only makes use of POSIX-specified features. Using -exec
to run arbitrary commands as boolean tests is simple, robust (handles odd filenames correctly), and more portable than -print0
.
In fact, all parts of the command are specified by POSIX except for flip
.
Note that file
doesn't guarantee accuracy of the results it returns. However, in practice grepping for "ASCII text" in its output is quite reliable.
(It might miss some text files perhaps, but is very very unlikely to incorrectly identify a binary file as "ASCII text" and mangle it—so we are erring on the side of caution.)
Argument-less filecalls
can be quite slow, e.g. for videos it will tell you everything about the encoding.
– phk
Nov 6 '16 at 17:27
Also you are assuming no file starts with-
.
– phk
Nov 6 '16 at 17:29
And I see no reason why you wouldn't just do a single call tofile
, it can take multiple files as arguments.
– phk
Nov 6 '16 at 17:45
@phk, to address your comments: (1) it's good to know the potential slowness, but I see no POSIX way to prevent that; (2) I make zero assumptions about file names, as thefind
command will prefix./
to any filename passed to the shell command; (3) Usinggrep
as a test on a singlefile
command output at a time is the only POSIX way I can see to guarantee correct handling of filenames that may contain newlines.
– Wildcard
Nov 6 '16 at 18:59
I looked over your final "POSIX-y" solution and I think it's clever—but you assume thatfile
supports the--mime-encoding
flag and the--
separator, neither of which is guaranteed by POSIX.
– Wildcard
Nov 6 '16 at 19:02
|
show 1 more comment
up vote
4
down vote
Cas's answer is good, but it assumes sane filenames; in particular it is assumed that filenames will not contain newlines.
There's no good reason to make this assumption here, since it is quite simple (and actually cleaner in my opinion) to handle that case correctly as well:
find . -type f -exec sh -c 'file "$1" | grep -q "ASCII text"' sh {} ; -exec flip -u {} ;
The find
command only makes use of POSIX-specified features. Using -exec
to run arbitrary commands as boolean tests is simple, robust (handles odd filenames correctly), and more portable than -print0
.
In fact, all parts of the command are specified by POSIX except for flip
.
Note that file
doesn't guarantee accuracy of the results it returns. However, in practice grepping for "ASCII text" in its output is quite reliable.
(It might miss some text files perhaps, but is very very unlikely to incorrectly identify a binary file as "ASCII text" and mangle it—so we are erring on the side of caution.)
Argument-less filecalls
can be quite slow, e.g. for videos it will tell you everything about the encoding.
– phk
Nov 6 '16 at 17:27
Also you are assuming no file starts with-
.
– phk
Nov 6 '16 at 17:29
And I see no reason why you wouldn't just do a single call tofile
, it can take multiple files as arguments.
– phk
Nov 6 '16 at 17:45
@phk, to address your comments: (1) it's good to know the potential slowness, but I see no POSIX way to prevent that; (2) I make zero assumptions about file names, as thefind
command will prefix./
to any filename passed to the shell command; (3) Usinggrep
as a test on a singlefile
command output at a time is the only POSIX way I can see to guarantee correct handling of filenames that may contain newlines.
– Wildcard
Nov 6 '16 at 18:59
I looked over your final "POSIX-y" solution and I think it's clever—but you assume thatfile
supports the--mime-encoding
flag and the--
separator, neither of which is guaranteed by POSIX.
– Wildcard
Nov 6 '16 at 19:02
|
show 1 more comment
up vote
4
down vote
up vote
4
down vote
Cas's answer is good, but it assumes sane filenames; in particular it is assumed that filenames will not contain newlines.
There's no good reason to make this assumption here, since it is quite simple (and actually cleaner in my opinion) to handle that case correctly as well:
find . -type f -exec sh -c 'file "$1" | grep -q "ASCII text"' sh {} ; -exec flip -u {} ;
The find
command only makes use of POSIX-specified features. Using -exec
to run arbitrary commands as boolean tests is simple, robust (handles odd filenames correctly), and more portable than -print0
.
In fact, all parts of the command are specified by POSIX except for flip
.
Note that file
doesn't guarantee accuracy of the results it returns. However, in practice grepping for "ASCII text" in its output is quite reliable.
(It might miss some text files perhaps, but is very very unlikely to incorrectly identify a binary file as "ASCII text" and mangle it—so we are erring on the side of caution.)
Cas's answer is good, but it assumes sane filenames; in particular it is assumed that filenames will not contain newlines.
There's no good reason to make this assumption here, since it is quite simple (and actually cleaner in my opinion) to handle that case correctly as well:
find . -type f -exec sh -c 'file "$1" | grep -q "ASCII text"' sh {} ; -exec flip -u {} ;
The find
command only makes use of POSIX-specified features. Using -exec
to run arbitrary commands as boolean tests is simple, robust (handles odd filenames correctly), and more portable than -print0
.
In fact, all parts of the command are specified by POSIX except for flip
.
Note that file
doesn't guarantee accuracy of the results it returns. However, in practice grepping for "ASCII text" in its output is quite reliable.
(It might miss some text files perhaps, but is very very unlikely to incorrectly identify a binary file as "ASCII text" and mangle it—so we are erring on the side of caution.)
edited Apr 13 '17 at 12:36
Community♦
1
1
answered Nov 5 '16 at 16:01
Wildcard
22.6k961164
22.6k961164
Argument-less filecalls
can be quite slow, e.g. for videos it will tell you everything about the encoding.
– phk
Nov 6 '16 at 17:27
Also you are assuming no file starts with-
.
– phk
Nov 6 '16 at 17:29
And I see no reason why you wouldn't just do a single call tofile
, it can take multiple files as arguments.
– phk
Nov 6 '16 at 17:45
@phk, to address your comments: (1) it's good to know the potential slowness, but I see no POSIX way to prevent that; (2) I make zero assumptions about file names, as thefind
command will prefix./
to any filename passed to the shell command; (3) Usinggrep
as a test on a singlefile
command output at a time is the only POSIX way I can see to guarantee correct handling of filenames that may contain newlines.
– Wildcard
Nov 6 '16 at 18:59
I looked over your final "POSIX-y" solution and I think it's clever, but you assume that file supports the --mime-encoding flag and the -- separator, neither of which is guaranteed by POSIX.
– Wildcard
Nov 6 '16 at 19:02
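For reference, the kind of non-POSIX test that comment refers to could look like the following sketch. It assumes a file implementation that supports --mime-encoding (GNU and BSD file do; POSIX does not guarantee it) and only converts files whose encoding is reported as us-ascii or utf-8, again erring on the side of caution:
find . -type f -exec sh -c 'case "$(file --mime-encoding "$1")" in *us-ascii|*utf-8) exit 0 ;; *) exit 1 ;; esac' sh {} \; -exec flip -u {} \;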
up vote
4
down vote
The accepted answer didn't find all of them for me. Here is an example using grep's -I to ignore binaries, and ignoring all hidden files...
find . -type f -not -path '*/.*' -exec grep -Il '.' {} \; | xargs -L 1 echo
Here it is in use in a practical application: dos2unix
https://unix.stackexchange.com/a/365679/112190
Hope that helps.
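If GNU grep is available (an assumption; -Z/--null on output is a GNU extension), the matched filenames can be NUL-delimited instead, so names containing spaces or newlines survive the pipeline, for example:
find . -type f -not -path '*/.*' -exec grep -IlZ '.' {} + | xargs -0 flip -u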
answered May 17 '17 at 17:37 phyatt
up vote
2
down vote
find . -type f -exec grep -I -q . {} \; -print
This will find all regular files (-type f) in the current directory (or below) that grep thinks are non-empty and non-binary.
It uses grep -I to distinguish between binary and non-binary files. The -I flag will cause grep to exit with a non-zero exit status when it detects that a file is binary. A "binary" file is, according to grep, a file that contains characters outside the printable ASCII range.
The -q option to grep will cause it to quit with a zero exit status if the given pattern is found, without emitting any data. The pattern that we use is a single dot, which will match any character.
If the file is found to be non-binary and if it contains at least one character, the name of the file is printed.
If you feel brave, you can plug your flip -u into it as well:
find . -type f -exec grep -I -q . {} \; -print -exec flip -u {} \;
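The same test also composes with dos2unix (used in other answers here) in place of flip; as a sketch, assuming dos2unix is installed:
find . -type f -exec grep -I -q . {} \; -exec dos2unix {} \;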
edited Dec 4 at 13:31
answered May 17 '17 at 20:09 Kusalananda
up vote
1
down vote
Try this:
find . -type f -print0 | xargs -0 -r grep -Z -L -U '[^ -~]' | xargs -0 -r flip -u
Where the argument of grep '[^ -~]' is '[^<tab><space>-~]'.
If you type it on a shell command line, type Ctrl+V before Tab.
In an editor, there should be no problem.
'[^<tab><space>-~]' will match any character which is not ASCII text (carriage returns are ignored by grep).
-L will print only the names of files which do not match.
-Z will output filenames separated with a null character (for xargs -0).
It's worth noting that with Perl-like regex, grep -P (if available), \t is available. Alternatively, if the shell supports it, $'\t' can be used (bash and zsh do).
– phk
Jan 6 '17 at 19:51
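Building on that comment: in bash or zsh (an assumption; this quoting is not plain POSIX sh), the literal tab can be written with $'...' quoting instead of Ctrl+V Tab, along these lines:
find . -type f -print0 | xargs -0 -r grep -Z -L -U $'[^\t -~]' | xargs -0 -r flip -u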
edited Jan 6 '17 at 19:49 phk
answered Jan 6 '17 at 15:24 Vouze
up vote
1
down vote
Alternate solution:
The dos2unix command will convert line endings from Windows CRLF to Unix LF, and automatically skip binary files. I apply it recursively using:
find . -type f -exec dos2unix {} \;
Since dos2unix can take multiple filenames as arguments, it is much more efficient to do find . -type f -exec dos2unix {} +
– Anthon
Sep 21 '17 at 20:41
answered Sep 21 '17 at 20:08 Spark
up vote
0
down vote
sudo find / \( -type f -and -path '*/git/*' -iname 'README' \) -exec grep -liI '100644\|100755' {} \; -exec flip -u {} \;
i. \( -type f -and -path '*/git/*' -iname 'README' \): searches for files whose path contains git and whose name is README. If you know a specific folder and filename to search for, this is useful.
ii. -exec runs a command on each filename generated by find.
iii. \; marks the end of an -exec command.
iv. {} is replaced by the file/folder name found by the preceding find search.
v. Multiple commands can be run in sequence by appending further -exec "command" \; clauses, such as -exec flip -u {} \; here.
vi. grep:
1. -l lists the name of each matching file
2. -I searches only non-binary files
3. -i matches case-insensitively
4. '100644\|100755' searches for either 100644 or 100755 within the file found; if a match is found, flip -u is then run. \| is the alternation ("or") operator in grep's basic regular expressions.
You can clone this test directory and try it out: https://github.com/alphaCTzo7G/stackexchange/tree/master/linux/findSolution204092017
A more detailed answer is here: https://github.com/alphaCTzo7G/stackexchange/blob/master/linux/findSolution204092017/README.md
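As a simpler sketch of the same grouping and two-stage -exec pattern applied to the original question (the '*/src/*' path and '*.c' glob below are placeholders, not part of the answer above):
find . \( -type f -path '*/src/*' -name '*.c' \) -exec grep -Iq . {} \; -exec flip -u {} \;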
answered Sep 4 '17 at 21:04 alpha_989