Can we trust the files in a filesystem that was repaired by e2fsck?

Topic

If a filesystem was successfully repaired by e2fsck, it is guaranteed that it is in a consistent (clean) state. However, it is not easy to assess the reliability of the files themselves after the repair.

This question asks for criteria to judge the integrity of the data stored in ext2 and ext4 filesystems that were repaired after being damaged in a specific failure scenario.

Background

I use an ext2 filesystem on an external USB HDD (i.e. platter-based, no flash) to back up several Linux machines. For that, I mount the drive manually with the options rw, relatime (in total), so no sync option is used.

Just recently, after doing a large backup (several hundred GB) from an openSUSE 13.1 system (Linux kernel 3.11.6-4) and after all write activity to the USB HDD had finished, I was not able to unmount that drive: the umount command blocked and did not return. The same applied to a subsequently issued sync command, which entered an uninterruptible sleep (ps state D).

At that point I unplugged the USB HDD, but this did not unblock the hanging commands.

An attempt to power off the machine thereafter by standard means (pm-utils) also got stuck. To bring the machine down, I used the SysRq sequence r, e, i, s, u, b. But even there, the requests s (sync) and u (remount read-only) did not succeed: according to the kernel documentation for sysrq.c (sysrq.txt), these requests are not complete until they explicitly announce that they are, which neither of them did in this case. So none of the mounted filesystems was confirmed to be cleanly unmounted when SysRq b (reboot) hit, which finally initiated a complete reboot.

Checking all involved filesystems (ext4 on root partition and ext2 on USB HDD) with e2fsck, I luckily found the root filesystem clean, and the filesystem on the USB HDD only showed wrong counts of free blocks and free inodes, which could be repaired by e2fsck.

The systemd journal of the machine used here did not show any entry related to the blocking of umount and sync. In particular, there were no entries related to I/O problems. The USB unplug event and the rest of my measures, apart from the SysRq requests, were properly logged.

S.M.A.R.T. and badblocks tests that were performed on the USB HDD after that incident did not reveal any anomalies. The drive, which is about 5 months old, seems to work normally now.

Variations

I have encountered the same scenario several times over the last few years with different USB HDDs (none of them older than 16 months) and on different Linux machines running different kernel versions. The only deviation in my treatment was that I sometimes used the power button instead of SysRq to bring the machine down.

At each of these incidents, I checked all possibly affected filesystems (all ext2 and ext4) with e2fsck, finding all of them in one of the following error states:

1. Clean filesystem.

2. Unclean filesystem which e2fsck could repair by just replaying the journal (ext4).

3. Filesystem showing wrong counts of free blocks and free inodes which could be corrected by e2fsck.

4. Filesystem containing orphaned inodes which e2fsck connected to lost+found.

5. Filesystem containing multiply-claimed blocks (claimed by several files) which were cloned by e2fsck.
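
To keep a record of how each check ended, the exit status of e2fsck can be decoded and logged next to its output. The following is only a minimal sketch: the bit meanings are those listed in the EXIT CODE section of e2fsck(8), while the names E2FSCK_EXIT_BITS and describe_e2fsck_status are made up for this illustration. Note that the exit status is much coarser than the error states listed above: every kind of repaired damage is reported alike as "errors corrected", so the full e2fsck output is still needed to tell them apart.

```python
# Exit-status bits as listed in the EXIT CODE section of e2fsck(8).
# The dictionary and function names are hypothetical, chosen for this sketch.
E2FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "filesystem errors corrected, system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def describe_e2fsck_status(status):
    """Return the meanings of all bits set in an e2fsck exit status."""
    if status == 0:
        return ["no errors"]
    # The exit status is a bitwise OR of the conditions above.
    return [text for bit, text in E2FSCK_EXIT_BITS.items() if status & bit]
```

For example, a status of 12 decodes to "filesystem errors left uncorrected" together with "operational error", which would be a clear reason not to trust the filesystem at all.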

The actual question

An ext2 or ext4 filesystem that was affected by the scenario described above and was thereafter successfully repaired by e2fsck is surely in a consistent (clean) state.

But what about contents and metadata of the files within that filesystem?

Is there a clear correlation between the filesystem damage found by e2fsck and data corruption? For example:

If no damage other than wrong counts was found in the filesystem,
the actual file data are okay.

Or:

If the filesystem contains multiply-claimed blocks, the contents of at
least one file are corrupted.

Or is it the opposite: are filesystem and file data independent, in the sense that damage to one allows no conclusion about damage to the other, at least without exact knowledge of what caused the damage at the device communication level?

In the latter case, the described scenario could have corrupted the file contents even if the filesystem was later found to be clean. Right?

Is there any empirical evidence, or are there reasoned criteria, that can be used to assess the integrity of the files depending on the filesystem errors that were found by e2fsck?

In this context, the answer of Gilles to "How to test file system correction done by fsck" is a good read.

The distinction between filesystem and data integrity is also addressed in the section "Data Mode" in the kernel documentation of the ext4 filesystem. I was pointed to the latter by the excellent answer of Mikel to "Do journaling filesystems guarantee against corruption after a power failure?", which is also very relevant to this topic.

Own guess and impact

systemd offers the service unit (template) systemd-fsck@.service, which by default "preens" filesystems selected by passno in /etc/fstab at boot time. According to the description of the -p option in the man page e2fsck(8), preening "automatically fix[es] any filesystem problems that can be safely fixed without human intervention." Unfortunately, the description does not specify whether "safely" refers to the filesystem consistency alone or whether it also includes the contents and metadata of the files.

However, since this systemd service initiates the preening in a way that is totally transparent to the user, at least some experts evidently trust the results of such filesystem repairs sufficiently.

So, based on a vague feeling (!), I would say that for clean filesystems (error state 1 described above) and for those that could be repaired by just replaying the journal (error state 2), it is safe to assume that the files themselves are not corrupted, even after such an incident.

For filesystems that were in error state 5, on the other hand, I would fall back on a backup.

So, why all that fuss? Agreed: in the case of a standard home or root filesystem, I would just compare its contents against the latest backup. But in this case, the backups themselves are on the affected USB HDD. If there are doubts about their integrity, several machines need to be backed up again immediately. In addition, this renders meaningless the older backups that were accumulated on that drive under a revolving backup strategy and that could otherwise have served as snapshots of the corresponding data.

So it would be quite useful to have some reasoned and reliable criteria on how far we can trust the data on an ext2 or ext4 filesystem that was repaired after being affected by the described scenario.
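
One criterion that does not depend on e2fsck at all would be a checksum manifest recorded on the source machine at backup time and kept somewhere other than the backup drive. The sketch below only illustrates that idea and assumes such a manifest exists, which was not the case in the incident described here; the function names sha256_of, build_manifest and compare_to_manifest are hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash one file in chunks so that large backup files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file below root (as a relative path) to its SHA-256."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def compare_to_manifest(root, manifest):
    """Return (missing, changed) relative paths versus a stored manifest."""
    current = build_manifest(root)
    missing = sorted(p for p in manifest if p not in current)
    changed = sorted(
        p for p in manifest if p in current and current[p] != manifest[p]
    )
    return missing, changed
```

If compare_to_manifest returns two empty lists for the mounted backup, the file contents match what was recorded at backup time, whatever e2fsck had to repair. This covers file contents only, not metadata such as ownership or timestamps.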

Further findings

Trying to solve that problem on my own, I found this excellent chapter about fsck in Oracle's System Administration Guide for Sun. Although it describes the UFS version of fsck, the general ideas apply to e2fsck as well. But this very detailed document, too, focuses on the usage of fsck and on the filesystem itself rather than on the latter's payload.

In this answer to "What does fsck -p (preen) do on ext4?", Noah posted a list of filesystem errors that can be handled automatically by fsck preening an ext4 filesystem and those that cannot. It would be great to have a similar list indicating which filesystem errors also imply a corruption of file data and which do not (assuming, of course, that such a correlation exists at all).

In his answer, Michael Prokopec mentioned the importance of write caches to this question. In this respect, I found in the answer of Tall Jeff to "SATA Disks that handle write caching properly?" that at least most SATA drives have write caching enabled by default. However, according to the same post, drives try to flush these caches as fast as they can. But of course there are no guarantees...
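
As a side note on checking the cache setting: newer Linux kernels (not the 3.11 kernel involved in the incident above) expose their view of a disk's cache mode in the sysfs file queue/write_cache, which reads "write back" or "write through". The sketch below merely reads that file; the function name write_cache_mode and the device name in the usage note are hypothetical, and for a drive behind a USB bridge the kernel's view is not guaranteed to match what the drive really does.

```python
import os


def write_cache_mode(device, sysfs_root="/sys/block"):
    """Return the kernel's view of a disk's cache mode, or None if unknown.

    device is a whole-disk name such as "sdb" (a hypothetical example).
    sysfs_root is a parameter only so the function can be tried on a copy.
    """
    path = os.path.join(sysfs_root, device, "queue", "write_cache")
    try:
        with open(path, "r", encoding="ascii") as handle:
            return handle.read().strip()
    except OSError:
        # Attribute missing (older kernel) or no such device.
        return None
```

Calling write_cache_mode("sdb") would then return "write back" if the kernel believes the drive caches writes, and None if the kernel does not provide the attribute.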

edited Nov 19 at 19:47

asked Nov 13 at 12:53

Jürgen

linux synchronization crash unmounting e2fsck


1 Answer

As long as the system was not doing a major disk-intensive job when things went wrong, and the drive was not purposely configured to cache data before writing, you can be reasonably sure that the data is trustworthy if all the checks pass. However, depending on the age of the drive and the use case, I would clone the drive to a newer one and use the new drive.

edited Nov 19 at 7:13

answered Nov 19 at 7:06

Michael Prokopec


With "... if all the checks pass..." you mean the cases where e2fsck found the filesystem clean or could repair it by just replaying the journal, together with passing the S.M.A.R.T. and badblocks tests, right?
– Jürgen
Nov 19 at 19:50

Does your answer reflect practical experience, or was it derived from deeper knowledge of ext filesystems and the methods of their repair, or from an insight into the related communication protocols?
– Jürgen
Nov 19 at 19:50

Good that you mentioned write caches: I found an interesting post about that and updated my question by appending the paragraph at the end. I agree that it is very unlikely that an only partial write, cached or not, results in a clean filesystem. So the latter is a good sign with respect to data integrity. But can we take it as a guarantee? I also updated the information about the age of the drives: less than 16 months in all cases.
– Jürgen
Nov 19 at 19:50

I have experience with these file systems and their repair.
– Michael Prokopec
Nov 19 at 19:52
