NVMe PCIe disk power cycling












I want to test an NVMe SSD that is connected to a PCIe slot of my motherboard.
The test procedure is a specific algorithm that writes workloads to the SSD while the SSD is exposed to radiation (e.g. neutrons).



I am running Fedora 22, with kernel 4.4.6.



My current software works successfully with SATA SSDs. Since the SSD can become unresponsive due to radiation, it is sometimes necessary to power cycle it in order to resume operations. This is made possible by an externally controlled power supply.



Now, I would like to port my software to test NVMe PCIe SSDs.
I have modified a PCIe extender to apply voltage to the SSD externally; the derived power lines (+12V and +3.3V) are isolated from the PCIe connector power lines. With this setup, the SSD is correctly recognized -- and works -- when booting with the external power supply on.



Removing the device and re-scanning the PCI bus works as long as the NVMe SSD is powered on, namely:



echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove



followed by:



echo 1 > /sys/bus/pci/rescan



However, if I power the device off and then on again after removing it, the PCI bus rescan does not work (and no message appears in dmesg).
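
For scripting the test loop, the remove and rescan steps above could be wrapped in small helpers. This is only a sketch: the function names and the SYSFS_PCI override are illustrative assumptions (not part of any tool), and the helpers merely automate the two echo commands, so they do not by themselves fix the failed rescan after a power cycle.

```shell
#!/bin/sh
# Sketch only: helper names and the SYSFS_PCI override are assumptions.
# SYSFS_PCI can point at a fake tree so the logic can be tried without root.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# Report whether a device (e.g. 0000:01:00.0) is currently enumerated.
pci_present() {
    [ -d "$SYSFS_PCI/devices/$1" ]
}

# Detach the device: same effect as "echo 1 > .../devices/<dev>/remove".
pci_remove() {
    if ! pci_present "$1"; then
        echo "device $1 is not enumerated" >&2
        return 1
    fi
    echo 1 > "$SYSFS_PCI/devices/$1/remove"
}

# Re-enumerate the bus: same effect as "echo 1 > /sys/bus/pci/rescan".
pci_rescan() {
    echo 1 > "$SYSFS_PCI/rescan"
}
```

Run as root against the real sysfs tree, a test script would call pci_remove 0000:01:00.0, toggle the external supply, call pci_rescan, and then use pci_present to decide whether the SSD came back.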



If I "brutally" power off the SSD (with my controlled power supply) without first removing it through sysfs, I get the following:



[  192.688934] nvme 0000:01:00.0: Failed status: ffffffff, reset controller
[ 192.689274] Trying to free nonexistent resource <000000000000e000-000000000000e0ff>
[ 192.699900] nvme 0000:01:00.0: Refused to change power state, currently in D3
[ 192.699946] Trying to free nonexistent resource <000000000000e000-000000000000e0ff>
[ 192.699953] nvme 0000:01:00.0: Device failed to resume


And obviously, rescanning the PCI bus does nothing.
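
A test harness could watch for this condition automatically. The sketch below assumes the message wording shown in the log above (kernel 4.4.6; other kernel versions may phrase it differently), and nvme_looks_dead is a hypothetical helper name. It only detects the failure; it does not recover from it.

```shell
#!/bin/sh
# Sketch only: nvme_looks_dead is a hypothetical helper name.
# Reads kernel-log text on stdin and succeeds if it contains one of the
# failure messages quoted above for the given PCI address.
nvme_looks_dead() {
    grep -Fq \
        -e "nvme $1: Failed status: ffffffff" \
        -e "nvme $1: Device failed to resume"
}
```

For example: dmesg | nvme_looks_dead 0000:01:00.0 && echo "SSD needs a power cycle". Fixed-string matching (-F) is used so that the dots in the PCI address are not treated as regular-expression wildcards.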



Question: what would be necessary to power cycle the SSD without rebooting my test station? From similar threads, I understand that this problem is not trivial, so I would be content with a wide range of solutions -- or hints -- including:




  • Adding kernel boot parameters

  • Use of setpci commands (hints?)

  • Use of extra logic, e.g. wire modifications on the PCIe extender to "fool" the PCIe bus

  • Modifications in the kernel sources (hints?)










  • Hot-plugging PCIe devices is discussed on electronics.SE at electronics.stackexchange.com/a/208796 but in short, if you want to go the standards-compliant way you need to ensure both your motherboard hardware and BIOS software support hot-plug.
    – ssice
    Apr 14 '16 at 13:05












  • You may also have some luck building your own PCIe adaptor that talks to your drive and proxies the communication to the motherboard, fooling it into thinking the drive is still alive while it is being power cycled; but I assume that would be quite expensive work.
    – ssice
    Apr 14 '16 at 13:09










  • @ssice Thanks for your comments. I had already read the (very interesting) hot-plugging thread on electronics.SE. Since I was "just" power cycling the device -- and not physically removing it -- I had high hopes that my case would be simpler. Concerning the "communication" proxy, that is indeed a rather heavy modification. I could manage pin shorting or the use of simple electronic components, though.
    – mamahuhu
    Apr 14 '16 at 13:15








  • Not quite a reboot, but you could try power suspend and resume, which can be quite fast.
    – meuh
    Apr 15 '16 at 15:16






  • @meuh I had thought about it, but your remark made me search more thoroughly in this direction. It turns out that with rtcwake -m mem -s 5 I can suspend for 5 seconds, and the voltage is indeed 0V on my SSD (I checked with a voltmeter). I'm using VNC to connect to the PC testing the SSD, and it even turns out that I do not lose the session (it just freezes for the 5 seconds). Thanks again for all the nice input!
    – mamahuhu
    Apr 15 '16 at 17:38
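
The suspend-based workaround from the comment above could be scripted along these lines. This is a sketch under assumptions: suspend_cycle and the RTCWAKE override are illustrative names, and whether the slot actually loses power during suspend depends on the machine (it was measured as 0V on the setup described above, which says nothing about other boards).

```shell
#!/bin/sh
# Sketch only: suspend_cycle and the RTCWAKE override are assumptions.
# RTCWAKE can be set to "echo" for a dry run that only prints the arguments.
RTCWAKE="${RTCWAKE:-rtcwake}"

# Suspend to RAM and wake up after the given number of seconds (default 5),
# using the same rtcwake options as in the comment above.
suspend_cycle() {
    secs="${1:-5}"
    case "$secs" in
        ''|*[!0-9]*)
            echo "seconds must be a whole number" >&2
            return 2
            ;;
    esac
    "$RTCWAKE" -m mem -s "$secs"
}
```

Called as root, suspend_cycle 5 reproduces the command from the comment; with RTCWAKE=echo it only prints the arguments, which is handy when checking a test script on a machine that should not be suspended.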
















linux-kernel ssd pci






edited Jul 26 '17 at 12:32
– Patryk

asked Apr 14 '16 at 12:52
– mamahuhu












1 Answer


This is unlikely to get the device working again, but it might make the device responsive enough to respond to the remove. While the device is OK, save all the PCI configuration registers, and restore them after the power cycle. You can get some way towards this by finding the controller slot:



$ lspci | grep SATA
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)


then listing the register names and passing each to setpci (you don't need to be root):



$ setpci --dumpregs |
  awk -v slot='00:1f.2' 'NR>1 && !/ E?CAP/{
    reg = tolower($NF)
    printf "%s=",reg
    system("setpci -s " slot " " reg)
  }'


This gets you lines like



vendor_id=8086
device_id=1e03
command=0407
status=02b0
base_address_0=0000f0b1
base_address_1=0000f0a1
base_address_2=0000f091
base_address_3=0000f081
base_address_4=0000f061
base_address_5=f7c06000


Obviously some of these registers are read-only, or have read-only bits. The idea is to call sudo setpci -s "$slot" with each of these lines, ignoring this aspect.
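
The restore step could look roughly like the following. It is a sketch, not a tested recovery procedure: restore_regs and the SETPCI override are hypothetical names, and the dump file is assumed to contain register=value lines like those listed above.

```shell
#!/bin/sh
# Sketch only: restore_regs and the SETPCI override are assumptions.
# SETPCI can be set to "echo" for a dry run that only prints the arguments.
SETPCI="${SETPCI:-setpci}"

# Write back every "register=value" line of a saved dump to the given slot.
# No attempt is made to skip read-only registers, matching the
# "ignoring this aspect" approach described above.
restore_regs() {
    slot="$1"
    dump="$2"
    while IFS= read -r line; do
        case "$line" in
            ?*=?*) "$SETPCI" -s "$slot" "$line" || echo "failed: $line" >&2 ;;
        esac
    done < "$dump"
}
```

Run as root, restore_regs 00:1f.2 saved.txt would issue one setpci write per saved line (saved.txt being whatever file the earlier listing was redirected to). Lines without an = sign are silently skipped.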



The above only handles the basic PCI configuration registers. However, you will need to save and restore some capability registers too. This will need more effort, depending on the register. You also need to be root to read them. For example,



sudo setpci -s 00:1f.2 CAP_MSI+0.l CAP_MSI+4.l CAP_MSI+8.l


will print the MSI capabilities registers:



00017005
fee0200c
000041b1


Compare these with the values shown by



sudo lspci -s "$slot" -vvv
...
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee0200c Data: 41b1
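
To keep such capability registers alongside the basic ones, they could be saved in the same register=value form, which is also the form setpci accepts for writes. The sketch below assumes a hypothetical helper name (save_regs) and an overridable SETPCI variable; which capability registers actually matter for a given device is left open.

```shell
#!/bin/sh
# Sketch only: save_regs and the SETPCI override are assumptions.
SETPCI="${SETPCI:-setpci}"

# Print "register=value" for each register given after the slot, e.g.
#   save_regs 00:1f.2 CAP_MSI+0.l CAP_MSI+4.l CAP_MSI+8.l
# Reading capability registers needs root, as noted above.
save_regs() {
    slot="$1"
    shift
    for reg in "$@"; do
        value=$("$SETPCI" -s "$slot" "$reg") || return 1
        printf '%s=%s\n' "$reg" "$value"
    done
}
```

Each register is read with a separate setpci call so that a failing read stops the dump early instead of producing a misaligned list.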





  • Thanks for these useful insights. However, there is a major issue: when the SSD is power cycled, it disappears from /sys/bus/pci/devices. Thus, all setpci -s '01:00.0' commands no longer work, since the SSD is no longer enumerated.
    – mamahuhu
    Apr 15 '16 at 11:19











answered Apr 14 '16 at 14:21
– meuh











