NVMe PCIe disk power cycling
I want to test an NVMe SSD connected to a PCIe slot on my motherboard.
The test procedure is a specific algorithm that writes workloads to the SSD while the SSD is exposed to radiation (e.g. neutrons).
I am running Fedora 22 with kernel 4.4.6.
My current software works with SATA SSDs. Since an SSD can become unresponsive under radiation, it is sometimes necessary to power cycle it in order to resume operation. An externally controlled power supply makes this possible.
Now I would like to port my software to test NVMe PCIe SSDs.
I have modified a PCIe extender to apply voltage to the SSD externally; the derived power lines (+12V and 3.3V) are isolated from the PCIe connector's power lines. With this setup, the SSD is recognized and works when the machine boots with the external power supply on.
Removing the device and rescanning the PCI bus works as long as the NVMe SSD stays powered on:
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
followed by:
echo 1 > /sys/bus/pci/rescan
However, if I power the device off and then on again after removing it, the PCI bus rescan does not work (and no message appears in dmesg).
If I "brutally" power off the SSD (with my controlled power supply) without first removing it through sysfs, I get the following:
[ 192.688934] nvme 0000:01:00.0: Failed status: ffffffff, reset controller
[ 192.689274] Trying to free nonexistent resource <000000000000e000-000000000000e0ff>
[ 192.699900] nvme 0000:01:00.0: Refused to change power state, currently in D3
[ 192.699946] Trying to free nonexistent resource <000000000000e000-000000000000e0ff>
[ 192.699953] nvme 0000:01:00.0: Device failed to resume
As expected, rescanning the PCI bus then does nothing.
Question: what is needed to power cycle the SSD without rebooting my test station? From similar threads I understand that this problem is not trivial, so I would be content with a wide range of solutions or hints, including:
- Adding kernel boot parameters
- Use of setpci commands (hints?)
- Use of extra logic, e.g. wiring modifications on the PCIe extender to "fool" the PCIe bus
- Modifications to the kernel sources (hints?)
linux-kernel ssd pci
edited Jul 26 '17 at 12:32 by Patryk
asked Apr 14 '16 at 12:52 by mamahuhu
Hot-plugging PCIe devices is discussed on electronics.SE at electronics.stackexchange.com/a/208796. In short, if you want to go the standards-compliant way, you need to ensure that both your motherboard hardware and BIOS software support hot-plug.
– ssice
Apr 14 '16 at 13:05
You may also have some luck building your own PCIe adaptor that talks to your drive and proxies the communication to the motherboard, fooling it into thinking the drive is still alive while it is being power cycled; but I assume that would be quite expensive work.
– ssice
Apr 14 '16 at 13:09
@ssice Thanks for your comments. I had already read the (very interesting) hot-plugging thread on electronics.SE. Since I am "just" power cycling the device, not physically removing it, I had high hopes that my case would be simpler. As for the "communication" proxy, that is indeed rather heavy a modification. I could manage pin shorting or the use of simple electronic components, though.
– mamahuhu
Apr 14 '16 at 13:15
Not quite a reboot, but you could try power suspend and resume, which can be quite fast.
– meuh
Apr 15 '16 at 15:16
@meuh I had thought about it, but your remark made me search more thoroughly in this direction. It turns out that with
rtcwake -m mem -s 5
I can suspend for 5 seconds, and the voltage is indeed 0V on my SSD (I checked with a voltmeter). I'm using VNC to connect to the PC testing the SSD, and it turns out I do not even lose the session (it just freezes for the 5 seconds). Thanks again for all the nice inputs!
– mamahuhu
Apr 15 '16 at 17:38
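The suspend-and-resume approach from the last comment could be wrapped in a small helper. This is only a minimal sketch under assumptions: the function name and the SYSFS, RTCWAKE, BDF and SECS variables are hypothetical (the overrides exist so the logic can be dry-run away from real hardware), and the thread does not say whether a rescan is ever needed after resume, so the rescan is done only if the device node is missing.

```shell
# Sketch: power cycle the SSD by suspending the host to RAM for a few seconds.
# SYSFS, RTCWAKE, BDF and SECS are illustrative overrides, not from the thread.
SYSFS=${SYSFS:-/sys}
RTCWAKE=${RTCWAKE:-rtcwake}
BDF=${BDF:-0000:01:00.0}   # PCI address of the SSD, as in the question
SECS=${SECS:-5}            # suspend duration used in the comment

power_cycle_by_suspend() {
    dev="$SYSFS/bus/pci/devices/$BDF"
    # Suspend to RAM; the RTC alarm wakes the machine after $SECS seconds.
    "$RTCWAKE" -m mem -s "$SECS" || return 1
    # If the SSD is no longer enumerated after resume, ask for a bus rescan.
    if [ ! -d "$dev" ]; then
        echo 1 > "$SYSFS/bus/pci/rescan"
    fi
    # Succeed only if the device node is present again.
    [ -d "$dev" ]
}
```

Run as root, `power_cycle_by_suspend && echo "SSD is back"` would report whether the device is enumerated after the host wakes up. Whether the slot really loses power during suspend depends on the hardware and should be checked with a voltmeter, as done in the comment.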
1 Answer
This is unlikely to get the device working again, but it might make the device responsive enough to respond to the remove. While the device is OK, save all the PCI configuration registers, and restore them after the power cycle. You can get some way towards this by finding the controller slot (a SATA controller in this example):
$ lspci | grep SATA
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
then listing the register names and passing each to setpci (you don't need to be root):
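$ setpci --dumpregs |
  awk -v slot='00:1f.2' 'NR>1 && !/ E?CAP/{
    # skip the header line and the CAP/ECAP entries; the name is the last field
    reg = tolower($NF)
    printf "%s=",reg
    fflush()  # make sure "reg=" is written before setpci prints the value
    system("setpci -s " slot " " reg)
  }'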
This gets you lines like
vendor_id=8086
device_id=1e03
command=0407
status=02b0
base_address_0=0000f0b1
base_address_1=0000f0a1
base_address_2=0000f091
base_address_3=0000f081
base_address_4=0000f061
base_address_5=f7c06000
Obviously some of these registers are read-only, or have read-only bits. The idea is to call sudo setpci -s "$slot" with each of these lines as the argument, ignoring this aspect.
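The restore step described above might look like the following. It is a minimal sketch, assuming the name=value lines were saved to a file beforehand; the function name restore_regs and the SETPCI override are hypothetical and are there only so the loop can be dry-run without touching hardware.

```shell
# Sketch: feed saved "register=value" lines back to setpci, one per call.
# SETPCI is an illustrative override; by default the real setpci is used.
SETPCI=${SETPCI:-setpci}

restore_regs() {
    slot=$1   # e.g. 00:1f.2
    file=$2   # file of lines such as command=0407
    while IFS= read -r line; do
        case $line in
            # Only lines of the form name=value are passed on.
            *=*) "$SETPCI" -s "$slot" "$line" ;;
        esac
    done < "$file"
}
```

A real run needs root, for example `sudo sh -c '. ./restore.sh; restore_regs 00:1f.2 saved_regs.txt'` (both file names are placeholders). Writes to read-only registers are simply attempted, in line with the "ignoring this aspect" remark.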
The above only handles the basic PCI configuration registers. However, you will need to save and restore some capability registers too. This takes more effort, depending on the register. You also need to be root to read them. For example,
sudo setpci -s 00:1f.2 CAP_MSI+0.l CAP_MSI+4.l CAP_MSI+8.l
will print the MSI capabilities registers:
00017005
fee0200c
000041b1
Compare these with the values shown by
sudo lspci -s "$slot" -vvv
...
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee0200c Data: 41b1
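To make the comparison concrete, the first value printed by setpci (00017005) can be decoded by hand. The sketch below assumes the standard MSI capability layout (capability ID in the low byte, next-capability pointer in the second byte, message control in the upper 16 bits); the function name decode_msi is hypothetical.

```shell
# Sketch: decode the first dword of an MSI capability as printed by setpci.
decode_msi() {
    hdr=$((0x$1))                      # e.g. 00017005
    ctrl=$(( (hdr >> 16) & 0xffff ))   # message control word
    # Fields: capability ID, next pointer, enable bit,
    # enabled/capable vector counts, per-vector masking, 64-bit addressing.
    printf 'cap_id=%02x next=%02x enable=%d count=%d/%d maskable=%d 64bit=%d\n' \
        $(( hdr & 0xff )) \
        $(( (hdr >> 8) & 0xff )) \
        $(( ctrl & 1 )) \
        $(( 1 << ((ctrl >> 4) & 7) )) \
        $(( 1 << ((ctrl >> 1) & 7) )) \
        $(( (ctrl >> 8) & 1 )) \
        $(( (ctrl >> 7) & 1 ))
}

decode_msi 00017005
```

For 00017005 this prints `cap_id=05 next=70 enable=1 count=1/1 maskable=0 64bit=0`, which agrees with the lspci line "Enable+ Count=1/1 Maskable- 64bit-". The other two values, fee0200c and 000041b1, are the message address and data shown as "Address: fee0200c Data: 41b1".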
answered Apr 14 '16 at 14:21 by meuh
Thanks for these useful insights. However, there is a major issue: when the SSD is power cycled, it disappears from /sys/bus/pci/devices. Thus, all setpci -s '01:00.0' commands no longer work, since the SSD is no longer enumerated.
– mamahuhu
Apr 15 '16 at 11:19