Finding all “Non-Binary” files











Is it possible to use the find command to find all the "non-binary" files in a directory? Here's the problem I'm trying to solve.



I've received an archive of files from a Windows user. This archive contains source code and image files. Our build system doesn't play nice with files that have Windows line endings. I have a command-line program (flip -u) that will flip line endings between *nix and Windows. So, I'd like to do something like this:



find . -type f | xargs flip -u


However, if this command is run against an image file, or other binary media file, it will corrupt the file. I realize I could build a list of file extensions and filter with that, but I'd rather have something that's not reliant on me keeping that list up to date.



So, is there a way to find all the non-binary files in a directory tree? Or is there an alternate solution I should consider?










  • You could use the file utility somewhere in your script/pipeline to identify whether the file is data or text
    – lk-
    Aug 24 '12 at 18:59








  • What do you mean by non-binary (everything on a modern computer is binary)? I am guessing you are using the distinction from the old CP/M operating system, which had text and binary files. Text files could be of any length but had to end with a Ctrl-Z, and binary files had to be a multiple of a 512-byte block. If so, you mean text files. (I also note that you write about line endings in non-binary files; this also suggests that they are text files.) Is this correct?
    – ctrl-alt-delor
    Jan 6 '17 at 17:05










  • All files are binary, it is just a matter of interpretation. Are you asking for how to find text files?
    – ctrl-alt-delor
    May 17 '17 at 20:21










  • @richard I come from an era where we called files meant to be interpreted as plain text "plain text", and all other files (images, word processing docs, etc.) "binary". I know it's all just ones and zeros under the hood :)
    – Alan Storm
    May 17 '17 at 20:28






  • Ah, I see what you mean about my terms -- I'll use binary/text in the future to avoid confusion. Re: the \r\n thing -- it's my understanding those are the ASCII characters for a typewriter's carriage return (move to the beginning of the line) and line feed (move down one line). So \r\n is a "more accurate" model of the real-world physical thing an end-of-line character was for. Pre OS X, Macs used just a \r for this. I usually write the whole thing off as "arbitrary choices made in a rush that we're still dealing with"
    – Alan Storm
    May 17 '17 at 22:29















9 Answers























18 votes (accepted)










I'd use file and pipe the output into grep or awk to find text files, then extract just the filename portion of file's output and pipe that into xargs.



something like:



file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'\n' -r flip -u


Note that the pattern searches for 'ASCII text' rather than just any 'text' - you probably don't want to mess with Rich Text documents or Unicode text files etc.



You can also use find (or whatever) to generate a list of files to examine with file:



find /path/to/files -type f -exec file {} + | 
awk -F: '/ASCII text/ {print $1}' | xargs -d'\n' -r flip -u


The -d'\n' argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. In other words, it's an alternative to xargs -0 when the input source doesn't or can't generate NUL-separated output (such as find's -print0 option). According to the changelog, xargs got the -d/--delimiter option in Sep 2005, so it should be in any non-ancient Linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).



Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical unix users, this is pathologically insane, but isn't unheard of if the files originated on Mac or Windows machines.
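
A linefeed-proof variant is sketched below. This is not part of the original answer; it assumes bash (for read -d '') and a file whose description for the files you care about starts with "ASCII text":

find . -type f -print0 |
while IFS= read -r -d '' f; do
    # file -b prints only the description, so filenames never pass through the text pipeline
    case $(file -b -- "$f") in
        "ASCII text"*) flip -u "$f" ;;
    esac
done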



Also note that file is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.



I have used numerous variations of this method many times in the past with success.

























  • Thanks for this solution! For some reason file displays English text rather than ASCII text on my Solaris system, so I modified that portion accordingly. Also, I replaced awk -F: '{print $1}' with the equivalent cut -f1 -d:.
    – Andrew Cheong
    Dec 10 '13 at 18:12






  • worth saying grep -I filters binaries
    – xenoterracide
    Aug 10 '16 at 17:31












  • Looking for the word text should be sufficient. This will also pick up file descriptions like ASCII Java program text or HTML document text or troff or preprocessor input text.
    – user1024
    Nov 1 '16 at 23:02










  • My answer is partially a response/improvement upon this answer. Very good point about grepping for ASCII text to avoid messing up RTFs.
    – Wildcard
    Nov 5 '16 at 16:03






  • xenoterracide: You saved my life man! Just a flag -I and BINGO
    – Sergio Abreu
    Jan 4 '17 at 21:38


















9 votes













No. There is nothing special about a binary or non-binary file. You can use heuristics like 'contains only characters in 0x01–0x7F', but that'll call text files with non-ASCII characters binary files, and unlucky binary files text files.
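
For illustration only (this is not from the answer), such a heuristic might look like the sketch below; it flags anything containing bytes outside printable ASCII and ordinary whitespace, so it will misclassify UTF-8 text exactly as described. "somefile" is a placeholder name:

# Rough heuristic sketch: grep exits 0 if a non-printable, non-whitespace byte is found
if LC_ALL=C grep -q '[^[:print:][:space:]]' somefile; then
    echo "looks binary"
else
    echo "looks like plain ASCII text"
fi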



Now, once you've ignored that...



zip files



If it's coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a option to pay attention to this and convert. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).



zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.
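
As a rough illustration (the archive name here is a placeholder):

# Let unzip convert the entries the archive itself marks as text
unzip -a source-from-windows.zip -d src

# Or just inspect the listing: each entry is marked t (text) or b (binary)
zipinfo source-from-windows.zip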



other files



The file command will look at a file and try to identify it. In particular, you'll probably find its -i (output MIME type) option useful; only convert files with type text/*
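
A minimal sketch of that idea (not from the answer; it assumes a file where -i prints the MIME type, as described above):

# Convert only files whose detected MIME type is text/*
find . -type f -exec sh -c '
    for f do
        case $(file -b -i -- "$f") in
            text/*) flip -u "$f" ;;
        esac
    done
' sh {} +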


































    6 votes















    A general solution to only process non-binary files in bash using file -b --mime-encoding:



    while IFS= read -d '' -r file; do
        [[ "$(file -b --mime-encoding "$file")" = binary ]] &&
            { echo "Skipping $file."; continue; }

        echo "Processing $file."

        # ...

    done < <(find . -type f -print0)


    I contacted the author of the file utility and he added a nifty -00 parameter in version 5.26 (released 2016-04-16; it is e.g. in current Arch and Ubuntu 16.10) which prints file\0result\0 for multiple files fed to it at once. This way you can do e.g.:



    find . -type f -exec file -00 --mime-encoding {} + |
        awk 'BEGIN{ORS=RS="\0"}{if(NR%2)f=$0;else if(!/binary/)print f}' | …


    (The awk part is there to filter out every binary file. ORS is the output record separator.)



    It can also be used in a loop, of course:



    while IFS= read -d '' -r file; do

        echo "Processing $file."

        # ...

    done < <(find . -type f -exec file -00 --mime-encoding {} + |
        awk 'BEGIN{ORS=RS="\0"}{if(NR%2)f=$0;else if(!/binary/)print f}')


    Based on this and the previous method, I created a little bash script for filtering out binary files. It uses the new method with the -00 parameter of file where available and falls back to the previous method on older versions:



    #!/bin/bash

    # Expects files as arguments and returns the ones that do
    # not appear to be binary files as a zero-separated list.
    #
    # USAGE:
    # filter_binary_files.sh [FILES...]
    #
    # EXAMPLE:
    # find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
    #

    [[ $# -eq 0 ]] && exit

    if [[ "$(file -v)" =~ file-([1-9][0-9]|[6-9]|5\.([3-9][0-9]|2[6-9])) ]]; then
        file -00 --mime-encoding -- "$@" |
            awk 'BEGIN{ORS=RS="\0"}{if(NR%2)f=$0;else if(!/binary/)print f}'
    else
        for f do
            [[ "$(file -b --mime-encoding -- "$f")" != binary ]] &&
                printf '%s\0' "$f"
        done
    fi


    Or here is a more POSIX-y one, but it requires support for sort -V:



    #!/bin/sh

    # Expects files as arguments and returns the ones that do
    # not appear to be binary files as a zero-separated list.
    #
    # USAGE:
    # filter_binary_files.sh [FILES...]
    #
    # EXAMPLE:
    # find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
    #

    [ $# -eq 0 ] && exit

    if [ "$(printf '%s\n' 'file-5.26' "$(file -v | head -1)" | sort -V | head -1)" = 'file-5.26' ]; then
        file -00 --mime-encoding -- "$@" |
            awk 'BEGIN{ORS=RS="\0"}{if(NR%2)f=$0;else if(!/binary/)print f}'
    else
        for f do
            [ "$(file -b --mime-encoding -- "$f")" != binary ] &&
                printf '%s\0' "$f"
        done
    fi



































      4 votes













      Cas's answer is good, but it assumes sane filenames; in particular it is assumed that filenames will not contain newlines.



      There's no good reason to make this assumption here, since it is quite simple (and actually cleaner in my opinion) to handle that case correctly as well:



      find . -type f -exec sh -c 'file "$1" | grep -q "ASCII text"' sh {} \; -exec flip -u {} \;


      The find command only makes use of POSIX-specified features. Using -exec to run arbitrary commands as boolean tests is simple, robust (handles odd filenames correctly), and more portable than -print0.



      In fact, all parts of the command are specified by POSIX except for flip.



      Note that file doesn't guarantee accuracy of the results it returns. However, in practice grepping for "ASCII text" in its output is quite reliable.



      (It might miss some text files perhaps, but is very very unlikely to incorrectly identify a binary file as "ASCII text" and mangle it—so we are erring on the side of caution.)





























      • Argument-less file calls can be quite slow, e.g. for videos it will tell you everything about the encoding.
        – phk
        Nov 6 '16 at 17:27










      • Also you are assuming no file starts with -.
        – phk
        Nov 6 '16 at 17:29










      • And I see no reason why you wouldn't just do a single call to file, it can take multiple files as arguments.
        – phk
        Nov 6 '16 at 17:45










      • @phk, to address your comments: (1) it's good to know the potential slowness, but I see no POSIX way to prevent that; (2) I make zero assumptions about file names, as the find command will prefix ./ to any filename passed to the shell command; (3) Using grep as a test on a single file command output at a time is the only POSIX way I can see to guarantee correct handling of filenames that may contain newlines.
        – Wildcard
        Nov 6 '16 at 18:59












      • I looked over your final "POSIX-y" solution and I think it's clever—but you assume that file supports the --mime-encoding flag and the -- separator, neither of which is guaranteed by POSIX.
        – Wildcard
        Nov 6 '16 at 19:02




















      4 votes













      The accepted answer didn't find all of them for me. Here is an example using grep's -I to ignore binaries, and ignoring all hidden files...



      find . -type f -not -path '*/.*' -exec grep -Il '.' {} \; | xargs -L 1 echo


      Here it is in use in a practical application: dos2unix



      https://unix.stackexchange.com/a/365679/112190



      Hope that helps.


































        2 votes













        find . -type f -exec grep -I -q . {} \; -print


        This will find all regular files (-type f) in the current directory (or below) that grep thinks are non-empty and non-binary.



        It uses grep -I to distinguish between binary and non-binary files. The -I flag will cause grep to exit with a non-zero exit status when it detects that a file is binary. A "binary" file is, according to grep, a file that contains characters outside the printable ASCII range.



        The -q option to grep will cause it to quit with a zero exit status if the given pattern is found, without emitting any data. The pattern that we use is a single dot, which will match any character.



        If the file is found to be non-binary and if it contains at least one character, the name of the file is printed.



        If you feel brave, you can plug your flip -u into it as well:



        find . -type f -exec grep -I -q . {} \; -print -exec flip -u {} \;



































          1 vote













          Try this:



          find . -type f -print0 | xargs -0 -r grep -Z -L -U '[^         -~]' | xargs -0 -r flip -u


          Where the argument of grep '[^ -~]' is '[^<tab><space>-~]'.



          If you type it on a shell command line, type Ctrl+V before Tab.
          In an editor, there should be no problem.





          • '[^<tab><space>-~]' will match any character which is not ASCII text (carriage returns are ignored by grep).


          • -L will print only the names of files that do not match


          • -Z will output filenames separated with a null character (for xargs -0)
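
          If typing the literal tab is awkward, a variant is sketched below. It is not from the answer; it assumes bash or zsh, where $'\t' expands to a tab character:

          tab=$'\t'
          find . -type f -print0 |
            xargs -0 -r grep -Z -L -U "[^$tab -~]" |
            xargs -0 -r flip -u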





























          • It's worth noting that with Perl-like Regex grep -P (if available) \t is available. Alternatively, using locale translation if the shell supports it: $'\t' (bash and zsh do).
            – phk
            Jan 6 '17 at 19:51




















          1 vote













          Alternate solution:



          The dos2unix command will convert line endings from Windows CRLF to Unix LF, and automatically skip binary files. I apply it recursively using:



          find . -type f -exec dos2unix {} \;


























          • Since dos2unix can take multiple filenames as argument, it is much more efficient to do find . -type f -exec dos2unix {} +
            – Anthon
            Sep 21 '17 at 20:41


















          0 votes













          sudo find / \( -type f -and -path '*/git/*' -iname 'README' \) -exec grep -liI '100644\|100755' {} \; -exec flip -u {} \;



          i. \( -type f -and -path '*/git/*' -iname 'README' \): searches for files within a path containing the name git and with the name README. If you know a specific folder and filename to search for, this will be useful.



          ii. -exec runs a command on the file names generated by find



          iii. \; indicates the end of the command



          iv. {} is the file/folder name found by the previous find search



          v. Multiple commands can be run subsequently by appending -exec "command" \;, such as -exec flip -u {} \;



          vi. grep

          1. -l lists the name of the file
          2. -I searches only non-binary files
          3. -q quiet output
          4. '100644\|100755' searches for either 100644 or 100755 within the file found. If found, it then runs flip -u. \| is the "or" operator for grep.


          you can clone this test directory and try it out: https://github.com/alphaCTzo7G/stackexchange/tree/master/linux/findSolution204092017



          more detailed answer here: https://github.com/alphaCTzo7G/stackexchange/blob/master/linux/findSolution204092017/README.md




























            9 Answers
            9






            active

            oldest

            votes








            9 Answers
            9






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            18
            down vote



            accepted










            I'd use file and pipe the output into grep or awk to find text files, then extract just the filename portion of file's output and pipe that into xargs.



            something like:



            file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u


            Note that the grep searches for 'ASCII text' rather than any just 'text' - you probably don't want to mess with Rich Text documents or unicode text files etc.



            You can also use find (or whatever) to generate a list of files to examine with file:



            find /path/to/files -type f -exec file {} + | 
            awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u


            The -d'n' argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. i.e. it's an alternative to xargs -0 when the input source doesn't or can't generate NULL-separated output (such as find's -print0 option). According to the changelog, xargs got the -d/--delimiter option in Sep 2005 so should be in any non-ancient linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).



            Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical unix users, this is pathologically insane, but isn't unheard of if the files originated on Mac or Windows machines.



            Also note that file is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.



            I have used numerous variations of this method many times in the past with success.






            share|improve this answer



















            • 1




              Thanks for this solution! For some reason file displays English text rather than ASCII text on my Solaris system, so I modified that portion accordingly. Also, I replaced awk -F: '{print $1}' with the equivalent cut -f1 -d:.
              – Andrew Cheong
              Dec 10 '13 at 18:12






            • 2




              worth saying grep -I filters binaries
              – xenoterracide
              Aug 10 '16 at 17:31












            • Looking for the word text should be sufficient. This will also pick up file descriptions like ASCII Java program text or HTML document text or troff or preprocessor input text.
              – user1024
              Nov 1 '16 at 23:02










            • My answer is partially a response/improvement upon this answer. Very good point about grepping for ASCII text to avoid messing up RTFs.
              – Wildcard
              Nov 5 '16 at 16:03






            • 1




              xenoterracide: You saved my life man ! Just a flag -I and BINGO
              – Sergio Abreu
              Jan 4 '17 at 21:38















            up vote
            18
            down vote



            accepted










            I'd use file and pipe the output into grep or awk to find text files, then extract just the filename portion of file's output and pipe that into xargs.



            something like:



            file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u


            Note that the grep searches for 'ASCII text' rather than any just 'text' - you probably don't want to mess with Rich Text documents or unicode text files etc.



            You can also use find (or whatever) to generate a list of files to examine with file:



            find /path/to/files -type f -exec file {} + | 
            awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u


            The -d'n' argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. i.e. it's an alternative to xargs -0 when the input source doesn't or can't generate NULL-separated output (such as find's -print0 option). According to the changelog, xargs got the -d/--delimiter option in Sep 2005 so should be in any non-ancient linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).



            Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical unix users, this is pathologically insane, but isn't unheard of if the files originated on Mac or Windows machines.



            Also note that file is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.



            I have used numerous variations of this method many times in the past with success.






            share|improve this answer



















            • 1




              Thanks for this solution! For some reason file displays English text rather than ASCII text on my Solaris system, so I modified that portion accordingly. Also, I replaced awk -F: '{print $1}' with the equivalent cut -f1 -d:.
              – Andrew Cheong
              Dec 10 '13 at 18:12






            • 2




              worth saying grep -I filters binaries
              – xenoterracide
              Aug 10 '16 at 17:31












            • Looking for the word text should be sufficient. This will also pick up file descriptions like ASCII Java program text or HTML document text or troff or preprocessor input text.
              – user1024
              Nov 1 '16 at 23:02










            • My answer is partially a response/improvement upon this answer. Very good point about grepping for ASCII text to avoid messing up RTFs.
              – Wildcard
              Nov 5 '16 at 16:03






            • 1




              xenoterracide: You saved my life man ! Just a flag -I and BINGO
              – Sergio Abreu
              Jan 4 '17 at 21:38













            up vote
            18
            down vote



            accepted







            up vote
            18
            down vote



            accepted






            I'd use file and pipe the output into grep or awk to find text files, then extract just the filename portion of file's output and pipe that into xargs.



            something like:



            file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u


            Note that the grep searches for 'ASCII text' rather than any just 'text' - you probably don't want to mess with Rich Text documents or unicode text files etc.



            You can also use find (or whatever) to generate a list of files to examine with file:



            find /path/to/files -type f -exec file {} + | 
            awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u


            The -d'n' argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. i.e. it's an alternative to xargs -0 when the input source doesn't or can't generate NULL-separated output (such as find's -print0 option). According to the changelog, xargs got the -d/--delimiter option in Sep 2005 so should be in any non-ancient linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).



            Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical unix users, this is pathologically insane, but isn't unheard of if the files originated on Mac or Windows machines.



            Also note that file is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.



            I have used numerous variations of this method many times in the past with success.






            share|improve this answer














            I'd use file and pipe the output into grep or awk to find text files, then extract just the filename portion of file's output and pipe that into xargs.



            something like:



            file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u


            Note that the grep searches for 'ASCII text' rather than any just 'text' - you probably don't want to mess with Rich Text documents or unicode text files etc.



            You can also use find (or whatever) to generate a list of files to examine with file:



            find /path/to/files -type f -exec file {} + | 
            awk -F: '/ASCII text/ {print $1}' | xargs -d'n' -r flip -u


            The -d'n' argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. i.e. it's an alternative to xargs -0 when the input source doesn't or can't generate NULL-separated output (such as find's -print0 option). According to the changelog, xargs got the -d/--delimiter option in Sep 2005 so should be in any non-ancient linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).



            Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical unix users, this is pathologically insane, but isn't unheard of if the files originated on Mac or Windows machines.



            Also note that file is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.



            I have used numerous variations of this method many times in the past with success.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Mar 21 '17 at 10:24

























            answered Aug 25 '12 at 1:15









            cas

            38.5k450100




            38.5k450100








            • 1




              Thanks for this solution! For some reason file displays English text rather than ASCII text on my Solaris system, so I modified that portion accordingly. Also, I replaced awk -F: '{print $1}' with the equivalent cut -f1 -d:.
              – Andrew Cheong
              Dec 10 '13 at 18:12






            • 2




              worth saying grep -I filters binaries
              – xenoterracide
              Aug 10 '16 at 17:31












            • Looking for the word text should be sufficient. This will also pick up file descriptions like ASCII Java program text or HTML document text or troff or preprocessor input text.
              – user1024
              Nov 1 '16 at 23:02










            • My answer is partially a response/improvement upon this answer. Very good point about grepping for ASCII text to avoid messing up RTFs.
              – Wildcard
              Nov 5 '16 at 16:03






            • 1




              xenoterracide: You saved my life man ! Just a flag -I and BINGO
              – Sergio Abreu
              Jan 4 '17 at 21:38














            • 1




              Thanks for this solution! For some reason file displays English text rather than ASCII text on my Solaris system, so I modified that portion accordingly. Also, I replaced awk -F: '{print $1}' with the equivalent cut -f1 -d:.
              – Andrew Cheong
              Dec 10 '13 at 18:12






            • 2




              worth saying grep -I filters binaries
              – xenoterracide
              Aug 10 '16 at 17:31












            • Looking for the word text should be sufficient. This will also pick up file descriptions like ASCII Java program text or HTML document text or troff or preprocessor input text.
              – user1024
              Nov 1 '16 at 23:02










            • My answer is partially a response/improvement upon this answer. Very good point about grepping for ASCII text to avoid messing up RTFs.
              – Wildcard
              Nov 5 '16 at 16:03






            • 1




              xenoterracide: You saved my life man ! Just a flag -I and BINGO
              – Sergio Abreu
              Jan 4 '17 at 21:38








            1




            1




            Thanks for this solution! For some reason file displays English text rather than ASCII text on my Solaris system, so I modified that portion accordingly. Also, I replaced awk -F: '{print $1}' with the equivalent cut -f1 -d:.
            – Andrew Cheong
            Dec 10 '13 at 18:12




            Thanks for this solution! For some reason file displays English text rather than ASCII text on my Solaris system, so I modified that portion accordingly. Also, I replaced awk -F: '{print $1}' with the equivalent cut -f1 -d:.
            – Andrew Cheong
            Dec 10 '13 at 18:12




            2




            2




            worth saying grep -I filters binaries
            – xenoterracide
            Aug 10 '16 at 17:31






            worth saying grep -I filters binaries
            – xenoterracide
            Aug 10 '16 at 17:31














            Looking for the word text should be sufficient. This will also pick up file descriptions like ASCII Java program text or HTML document text or troff or preprocessor input text.
            – user1024
            Nov 1 '16 at 23:02




            Looking for the word text should be sufficient. This will also pick up file descriptions like ASCII Java program text or HTML document text or troff or preprocessor input text.
            – user1024
            Nov 1 '16 at 23:02












            My answer is partially a response/improvement upon this answer. Very good point about grepping for ASCII text to avoid messing up RTFs.
            – Wildcard
            Nov 5 '16 at 16:03




            My answer is partially a response/improvement upon this answer. Very good point about grepping for ASCII text to avoid messing up RTFs.
            – Wildcard
            Nov 5 '16 at 16:03




            1




            1




            xenoterracide: You saved my life man ! Just a flag -I and BINGO
            – Sergio Abreu
            Jan 4 '17 at 21:38




            xenoterracide: You saved my life man ! Just a flag -I and BINGO
            – Sergio Abreu
            Jan 4 '17 at 21:38












            up vote
            9
            down vote













            No. There is nothing special about a binary or non-binary file. You can use heuristics like 'contains only characters in 0x01–0x7F', but that'll call text files with non-ASCII characters binary files, and unlucky binary files text files.



            Now, once you've ignored that...



            zip files



            If its coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a option to pay attention to this and convert. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).



            zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.



            other files



            The file command will look at a file and try to identify it. In particular, you'll probably find its -i (output MIME type) option useful; only convert files with type text/*






            share|improve this answer

























              up vote
              9
              down vote













              No. There is nothing special about a binary or non-binary file. You can use heuristics like 'contains only characters in 0x01–0x7F', but that'll call text files with non-ASCII characters binary files, and unlucky binary files text files.



              Now, once you've ignored that...



              zip files



              If its coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a option to pay attention to this and convert. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).



              zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.



              other files



              The file command will look at a file and try to identify it. In particular, you'll probably find its -i (output MIME type) option useful; only convert files with type text/*






              share|improve this answer























                up vote
                9
                down vote










                up vote
                9
                down vote









                No. There is nothing special about a binary or non-binary file. You can use heuristics like 'contains only characters in 0x01–0x7F', but that'll call text files with non-ASCII characters binary files, and unlucky binary files text files.



                Now, once you've ignored that...



                zip files



                If its coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a option to pay attention to this and convert. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).



                zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.



                other files



                The file command will look at a file and try to identify it. In particular, you'll probably find its -i (output MIME type) option useful; only convert files with type text/*






                share|improve this answer












                No. There is nothing special about a binary or non-binary file. You can use heuristics like 'contains only characters in 0x01–0x7F', but that'll call text files with non-ASCII characters binary files, and unlucky binary files text files.



                Now, once you've ignored that...



                zip files



                If its coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a option to pay attention to this and convert. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).



                zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.



                other files



                The file command will look at a file and try to identify it. In particular, you'll probably find its -i (output MIME type) option useful; only convert files with type text/*







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Aug 24 '12 at 19:00









                derobert

                71.5k8152210




                71.5k8152210






















                    up vote
                    6
                    down vote















                    A general solution to only process non-binary files in bash using file -b --mime-encoding:



                    while IFS= read -d '' -r file; do
                    [[ "$(file -b --mime-encoding "$file")" = binary ]] &&
                    { echo "Skipping $file."; continue; }

                    echo "Processing $file."

                    # ...

                    done < <(find . -type f -print0)


                    I contacted the author of the file utility and he added a nifty -00 paramter in version 5.26 (released 2016-04-16, is e.g. in current Arch and Ubuntu 16.10) which prints fileresult for multiple files fed to it at once, this way you can do e.g.:



                    find . -type f -exec file -00 --mime-encoding {} + |
                    awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}' | …


                    (The awk part is to filter out every file that isn't non-binary. ORS is the output separator.)



                    Can be also used in a loop of course:



                    while IFS= read -d '' -r file; do

                    echo "Processing $file."

                    # ...

                    done < <(find . -type f -exec file -00 --mime-encoding {} + |
                    awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}')


                    Based of this and the previous I created a little bash script for filtering out binary files which utilizes the new method using the -00 parameter of file in newer versions of it and falls back to the previous method on older versions:



                    #!/bin/bash

                    # Expects files as arguments and returns the ones that do
                    # not appear to be binary files as a zero-separated list.
                    #
                    # USAGE:
                    # filter_binary_files.sh [FILES...]
                    #
                    # EXAMPLE:
                    # find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
                    #

                    [[ $# -eq 0 ]] && exit

                    if [[ "$(file -v)" =~ file-([1-9][0-9]|[6-9]|5.([3-9][0-9]|2[6-9])) ]]; then
                    file -00 --mime-encoding -- "$@" |
                    awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
                    else
                    for f do
                    [[ "$(file -b --mime-encoding -- "$f")" != binary ]] &&
                    printf '%s' "$f"
                    done
                    fi


                    Or here a more POSIX-y one, but it requires support for sort -V:



                    #!/bin/sh

                    # Expects files as arguments and returns the ones that do
                    # not appear to be binary files as a zero-separated list.
                    #
                    # USAGE:
                    # filter_binary_files.sh [FILES...]
                    #
                    # EXAMPLE:
                    # find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
                    #

                    [ $# -eq 0 ] && exit

                    if [ "$(printf '%sn' 'file-5.26' "$(file -v | head -1)" | sort -V)" =
                    'file-5.26' ]; then
                    file -00 --mime-encoding -- "$@" |
                    awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
                    else
                    for f do
                    [ "$(file -b --mime-encoding -- "$f")" != binary ] &&
                    printf '%s' "$f"
                    done
                    fi





                    share|improve this answer



























                      up vote
                      6
                      down vote















                      A general solution to only process non-binary files in bash using file -b --mime-encoding:



                      while IFS= read -d '' -r file; do
                      [[ "$(file -b --mime-encoding "$file")" = binary ]] &&
                      { echo "Skipping $file."; continue; }

                      echo "Processing $file."

                      # ...

                      done < <(find . -type f -print0)


                      I contacted the author of the file utility and he added a nifty -00 paramter in version 5.26 (released 2016-04-16, is e.g. in current Arch and Ubuntu 16.10) which prints fileresult for multiple files fed to it at once, this way you can do e.g.:



                      find . -type f -exec file -00 --mime-encoding {} + |
                      awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}' | …


                      (The awk part is to filter out every file that isn't non-binary. ORS is the output separator.)



                      Can be also used in a loop of course:



                      while IFS= read -d '' -r file; do

                      echo "Processing $file."

                      # ...

                      done < <(find . -type f -exec file -00 --mime-encoding {} + |
                      awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}')


                      Based of this and the previous I created a little bash script for filtering out binary files which utilizes the new method using the -00 parameter of file in newer versions of it and falls back to the previous method on older versions:



                      #!/bin/bash

                      # Expects files as arguments and returns the ones that do
                      # not appear to be binary files as a zero-separated list.
                      #
                      # USAGE:
                      # filter_binary_files.sh [FILES...]
                      #
                      # EXAMPLE:
                      # find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
                      #

                      [[ $# -eq 0 ]] && exit

                      if [[ "$(file -v)" =~ file-([1-9][0-9]|[6-9]|5.([3-9][0-9]|2[6-9])) ]]; then
                      file -00 --mime-encoding -- "$@" |
                      awk 'BEGIN{ORS=RS=""}{if(NR%2)f=$0;else if(!/binary/)print f}'
                      else
                      for f do
                      [[ "$(file -b --mime-encoding -- "$f")" != binary ]] &&
                      printf '%s' "$f"
                      done
                      fi


                      Or here a more POSIX-y one, but it requires support for sort -V:



#!/bin/sh

# Expects files as arguments and returns the ones that do
# not appear to be binary files as a zero-separated list.
#
# USAGE:
# filter_binary_files.sh [FILES...]
#
# EXAMPLE:
# find . -type f -mtime +5 -exec ./filter_binary_files.sh {} + | xargs -0 ...
#

[ $# -eq 0 ] && exit

if [ "$(printf '%s\n' 'file-5.26' "$(file -v | head -1)" | sort -V | head -1)" = \
     'file-5.26' ]; then
    file -00 --mime-encoding -- "$@" |
        awk 'BEGIN{ORS=RS="\0"}{if(NR%2)f=$0;else if(!/binary/)print f}'
else
    for f do
        [ "$(file -b --mime-encoding -- "$f")" != binary ] &&
            printf '%s\0' "$f"
    done
fi
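
The version check in the POSIX variant is the usual sort -V idiom: sort the required version together with the installed one and see which comes first. A standalone sketch of that idiom (the function name is mine, not part of the answer):

# Succeeds if the locally installed file(1) reports version $1 or newer.
file_is_at_least() {
    req="file-$1"
    cur=$(file -v | head -n 1)
    [ "$(printf '%s\n' "$req" "$cur" | sort -V | head -n 1)" = "$req" ]
}

file_is_at_least 5.26 && echo "can use file -00" || echo "use the per-file fallback"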





answered Mar 2 '16 at 11:10, edited Mar 24 at 18:30
– phk

























                            up vote
                            4
                            down vote













                            Cas's answer is good, but it assumes sane filenames; in particular it is assumed that filenames will not contain newlines.



                            There's no good reason to make this assumption here, since it is quite simple (and actually cleaner in my opinion) to handle that case correctly as well:



find . -type f -exec sh -c 'file "$1" | grep -q "ASCII text"' sh {} \; -exec flip -u {} \;


                            The find command only makes use of POSIX-specified features. Using -exec to run arbitrary commands as boolean tests is simple, robust (handles odd filenames correctly), and more portable than -print0.



                            In fact, all parts of the command are specified by POSIX except for flip.



                            Note that file doesn't guarantee accuracy of the results it returns. However, in practice grepping for "ASCII text" in its output is quite reliable.



                            (It might miss some text files perhaps, but is very very unlikely to incorrectly identify a binary file as "ASCII text" and mangle it—so we are erring on the side of caution.)
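
Before letting it loose with flip -u, the same test can be used for a harmless dry run that only lists the files it would touch:

find . -type f -exec sh -c 'file "$1" | grep -q "ASCII text"' sh {} \; -print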






answered Nov 5 '16 at 16:01, edited Apr 13 '17 at 12:36 by Community
– Wildcard























                            • Argument-less file calls can be quite slow, e.g. for videos it will tell you everything about the encoding.
                              – phk
                              Nov 6 '16 at 17:27










                            • Also you are assuming no file starts with -.
                              – phk
                              Nov 6 '16 at 17:29










                            • And I see no reason why you wouldn't just do a single call to file, it can take multiple files as arguments.
                              – phk
                              Nov 6 '16 at 17:45










                            • @phk, to address your comments: (1) it's good to know the potential slowness, but I see no POSIX way to prevent that; (2) I make zero assumptions about file names, as the find command will prefix ./ to any filename passed to the shell command; (3) Using grep as a test on a single file command output at a time is the only POSIX way I can see to guarantee correct handling of filenames that may contain newlines.
                              – Wildcard
                              Nov 6 '16 at 18:59












                            • I looked over your final "POSIX-y" solution and I think it's clever—but you assume that file supports the --mime-encoding flag and the -- separator, neither of which is guaranteed by POSIX.
                              – Wildcard
                              Nov 6 '16 at 19:02

















                            up vote
                            4
                            down vote













                            The accepted answer didn't find all of them for me. Here is an example using grep's -I to ignore binaries, and ignoring all hidden files...



find . -type f -not -path '*/.*' -exec grep -Il '.' {} \; | xargs -L 1 echo


                            Here it is in use in a practical application: dos2unix



                            https://unix.stackexchange.com/a/365679/112190



                            Hope that helps.






answered May 17 '17 at 17:37
– phyatt

























                                    up vote
                                    2
                                    down vote













find . -type f -exec grep -I -q . {} \; -print


                                    This will find all regular files (-type f) in the current directory (or below) that grep thinks are non-empty and non-binary.



It uses grep -I to distinguish between binary and non-binary files. The -I flag will cause grep to exit with a non-zero exit status when it detects that a file is binary. A "binary" file is, according to grep, a file that contains characters outside the printable ASCII range.



                                    The -q option to grep will cause it to quit with a zero exit status if the given pattern is found, without emitting any data. The pattern that we use is a single dot, which will match any character.



                                    If the file is found to be non-binary and if it contains at least one character, the name of the file is printed.
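
To see the test in isolation, you can run it against a single file and check the exit status (the file names here are just examples):

grep -I -q . README.md && echo 'looks like text' || echo 'binary or empty'
grep -I -q . logo.png  && echo 'looks like text' || echo 'binary or empty'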



                                    If you feel brave, you can plug your flip -u into it as well:



find . -type f -exec grep -I -q . {} \; -print -exec flip -u {} \;





answered May 17 '17 at 20:09, edited Dec 4 at 13:31
– Kusalananda



























                                            up vote
                                            1
                                            down vote













Try this:



                                            find . -type f -print0 | xargs -0 -r grep -Z -L -U '[^         -~]' | xargs -0 -r flip -u


                                            Where the argument of grep '[^ -~]' is '[^<tab><space>-~]'.



                                            If you type it on a shell command line, type Ctrl+V before Tab.
                                            In an editor, there should be no problem.





• '[^<tab><space>-~]' will match any character which is not printable ASCII text (carriage returns are ignored by grep).


• -L will print only the names of the files that do not match


                                            • -Z will output filenames separated with a null character (for xargs -0)
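
If you would rather not type a literal Tab at the prompt, bash and zsh can produce it via $'...' quoting (as also noted in a comment below); a sketch of the same pipeline written that way:

# $'\t' expands to a literal tab character, so no Ctrl+V trick is needed.
find . -type f -print0 |
    xargs -0 -r grep -Z -L -U $'[^\t -~]' |
    xargs -0 -r flip -u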






answered Jan 6 '17 at 15:24, edited Jan 6 '17 at 19:49 by phk
– Vouze























• It's worth noting that with Perl-like regex grep -P (if available) \t is available. Alternatively, using ANSI-C quoting if the shell supports it: $'\t' (bash and zsh do).
                                              – phk
                                              Jan 6 '17 at 19:51

















                                            up vote
                                            1
                                            down vote













                                            Alternate solution:



                                            The dos2unix command will convert line endings from Windows CRLF to Unix LF, and automatically skip binary files. I apply it recursively using:



find . -type f -exec dos2unix {} \;





answered Sep 21 '17 at 20:08
– Spark





















• Since dos2unix can take multiple file names as arguments, it is much more efficient to do find . -type f -exec dos2unix {} +
                                              – Anthon
                                              Sep 21 '17 at 20:41















                                            up vote
                                            0
                                            down vote













sudo find / \( -type f -and -path '*/git/*' -iname 'README' \) -exec grep -liI '100644\|100755' {} \; -exec flip -u {} \;



i. \( -type f -and -path '*/git/*' -iname 'README' \): searches for regular files named README whose path contains a directory called git. This is useful if you know a specific folder and file name to search for.

ii. -exec runs a command on each file name generated by find.

iii. \; indicates the end of that command.

iv. {} stands for the file/folder name found by the preceding find search.

v. Multiple commands can be run subsequently by appending another -exec "command" \;, such as -exec flip -u {} \;.

vi. grep

1. -l lists the name of the matching file
2. -I searches only non-binary files
3. -i makes the match case-insensitive
4. '100644\|100755' searches for either 100644 or 100755 within the file found; if found, flip -u is then run on it. \| is the "or" operator for grep.


You can clone this test directory and try it out: https://github.com/alphaCTzo7G/stackexchange/tree/master/linux/findSolution204092017



A more detailed answer is here: https://github.com/alphaCTzo7G/stackexchange/blob/master/linux/findSolution204092017/README.md






answered Sep 4 '17 at 21:04
– alpha_989
























