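NVMe drives fail easily, and once one has failed, recovering the data is very difficult.
# This is a drive that failed: it first dropped offline outright, then came back online after being moved to a different NVMe slot
# Log analysis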
smartctl -a /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-362.18.1.el9_3.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 980 1TB
Serial Number: S64ANG0R527805M
Firmware Version: 2B4QFXO7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 5
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization: 996,993,064,960 [996 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 d51150cabe
Local Time is: Mon Feb 12 10:19:22 2024 CST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0055): Comp DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Namespace 1 Features (0x10): NP_Fields
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    5.24W        -        -    0  0  0  0        0       0
 1 +    4.49W        -        -    1  1  1  1        0       0
 2 +    2.19W        -        -    2  2  2  2        0     500
 3 -  0.0500W        -        -    3  3  3  3      210    1200
 4 -  0.0050W        -        -    4  4  4  4     1000    9000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 17 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 9%
Data Units Read: 27,463,840 [14.0 TB]
Data Units Written: 101,410,022 [51.9 TB]
Host Read Commands: 385,873,598
Host Write Commands: 3,311,342,087
Controller Busy Time: 5,090
Power Cycles: 19
Power On Hours: 5,372
Unsafe Shutdowns: 15
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 3411
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 17 Celsius
Temperature Sensor 2: 22 Celsius
Thermal Temp. 2 Transition Count: 209329
Thermal Temp. 2 Total Time: 149581
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
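Although the overall self-assessment above reads PASSED, a few counters in this output stand out: 15 unsafe shutdowns against 19 power cycles, 3411 minutes at or above the warning temperature threshold, and 209329 thermal transitions. The sketch below shows one way to pull such counters out of a saved copy of the output so two drives can be compared side by side. It assumes the output was first saved to a text file (for example with `smartctl -a /dev/nvme0n1 > nvme0.txt`); `smart_field` and `report_counters` are hypothetical helper names, not part of smartmontools.

```shell
# Sketch only: helpers for reading counters from saved `smartctl -a` text.
# smart_field and report_counters are illustrative names (assumptions).

# Print the first value token of a "Label: value" line, without thousands separators.
# $1 = label text before the colon, $2 = file holding the saved smartctl output
smart_field() {
    awk -F': *' -v key="$1" '{
        label = $1
        gsub(/ +/, " ", label)          # tolerate padded labels in raw output
        if (label == key) {
            gsub(/,/, "", $2)           # 101,410,022 -> 101410022
            split($2, parts, " ")       # drop trailing text such as "[51.9 TB]"
            print parts[1]
            exit
        }
    }' "$2"
}

# Print the counters that are worth checking even when the result is PASSED.
# $1 = file holding the saved smartctl output
report_counters() {
    printf 'unsafe shutdowns:       %s of %s power cycles\n' \
        "$(smart_field 'Unsafe Shutdowns' "$1")" \
        "$(smart_field 'Power Cycles' "$1")"
    printf 'warning temp time:      %s min\n' \
        "$(smart_field 'Warning Comp. Temperature Time' "$1")"
    printf 'media/integrity errors: %s\n' \
        "$(smart_field 'Media and Data Integrity Errors' "$1")"
}
```

With the output saved as above, `report_counters nvme0.txt` prints the three lines for that drive. Matching on the label text rather than on line position keeps the helper independent of which optional fields a given model reports.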
This is the newly installed replacement drive
smartctl -a /dev/nvme1n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-362.18.1.el9_3.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: MZXLR960HBHQ-000H3
Serial Number: S6C7NA0R700412
Firmware Version: MPK7525Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 960,197,124,096 [960 GB]
Unallocated NVM Capacity: 0
Controller ID: 65
NVMe Version: 1.3
Number of Namespaces: 64
Namespace 1 Size/Capacity: 960,197,124,096 [960 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 97110014a5
Local Time is: Mon Feb 12 10:23:19 2024 CST
Firmware Updates (0x17): 3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x005e): Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec
Optional NVM Commands (0x007f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Resv Timestmp
Log Page Attributes (0x0e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 256 Pages
Warning Comp. Temp. Threshold: 65 Celsius
Critical Comp. Temp. Threshold: 73 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +   25.00W   12.00W        -    0  0  0  0      180     180
 1 +   11.00W   11.00W        -    1  1  1  1      180     180
 2 +    9.00W    9.00W        -    2  2  2  2      180     180
 3 +    9.00W    9.00W        -    2  2  2  2      180     180
 4 -    0.00W        -        -    3  3  3  3      120     120
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         1
 1 -    4096       0         0
 2 -     512       8         3
 3 -    4096       8         2
 4 -    4096      64         3
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 12 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 163,176 [83.5 GB]
Data Units Written: 1,007,053 [515 GB]
Host Read Commands: 3,377,116
Host Write Commands: 24,578,068
Controller Busy Time: 6
Power Cycles: 14
Power On Hours: 810
Unsafe Shutdowns: 13
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 12 Celsius
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
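The bracketed sizes after Data Units Read and Data Units Written in both outputs follow from the NVMe convention that one data unit is 1000 blocks of 512 bytes, that is 512,000 bytes. The conversion can be sketched as below, with `data_units_to_tb` as a hypothetical helper name: the failed drive's 101,410,022 written units come to about 51.9 TB, while the replacement's 1,007,053 units are roughly 0.5 TB.

```shell
# Sketch only: convert an NVMe "Data Units" counter to decimal terabytes.
# data_units_to_tb is an illustrative name (assumption), not a smartmontools tool.
# $1 = data unit count, with or without thousands separators
data_units_to_tb() {
    units=$(printf '%s' "$1" | tr -d ',')
    # one data unit = 1000 * 512 bytes; 1 TB = 10^12 bytes
    awk -v units="$units" 'BEGIN { printf "%.1f\n", units * 512000 / 1e12 }'
}

# Example:
#   data_units_to_tb 101,410,022    # prints 51.9
```

Rounding to one decimal place can differ slightly from the figure smartctl prints, so small mismatches in the last digit are expected.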