NVMe Drive Dropout: Log Analysis

NVMe drives are prone to failure, and once one dies, recovering its data is very difficult.

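# This is the drive that failed. It first dropped off the system entirely, then came back online after being moved to a different NVMe slot.
# Analyzing the logs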

smartctl -a /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-362.18.1.el9_3.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 980 1TB
Serial Number:                      S64ANG0R527805M
Firmware Version:                   2B4QFXO7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      5
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            996,993,064,960 [996 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 d51150cabe
Local Time is:                      Mon Feb 12 10:19:22 2024 CST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0055):     Comp DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x10):        NP_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.24W       -        -    0  0  0  0        0       0
 1 +     4.49W       -        -    1  1  1  1        0       0
 2 +     2.19W       -        -    2  2  2  2        0     500
 3 -   0.0500W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     1000    9000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        17 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    9%
Data Units Read:                    27,463,840 [14.0 TB]
Data Units Written:                 101,410,022 [51.9 TB]
Host Read Commands:                 385,873,598
Host Write Commands:                3,311,342,087
Controller Busy Time:               5,090
Power Cycles:                       19
Power On Hours:                     5,372
Unsafe Shutdowns:                   15
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    3411
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               17 Celsius
Temperature Sensor 2:               22 Celsius
Thermal Temp. 2 Transition Count:   209329
Thermal Temp. 2 Total Time:         149581

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
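The SMART/Health block above is easier to reason about once it is turned into numbers. Note that the report itself shows an overall PASSED result, 0 media and data integrity errors, and an empty error log. Below is a minimal Python sketch that assumes the `Key:   value` layout smartctl 7.2 prints here. `parse_health` and `data_units_to_bytes` are hypothetical helpers and are not part of smartmontools. Under the NVMe specification, one Data Unit equals 1,000 × 512 bytes, which is how 101,410,022 units written becomes the 51.9 TB that smartctl shows.

```python
import re

# Matches "Key:   value" lines from `smartctl -a` on an NVMe device (assumed layout).
# The value must be a hex literal (0x..) or a comma-grouped integer followed by
# whitespace, '%' or end of line, so strings such as a firmware version
# ("2B4QFXO7") or "1.4" are skipped instead of being half-parsed.
LINE_RE = re.compile(r"^([A-Za-z][^:]*?):\s+(0x[0-9A-Fa-f]+|[0-9][0-9,]*)(?=\s|%|$)")


def parse_health(text: str) -> dict:
    """Hypothetical helper: map each numeric field name to an int."""
    fields = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        key = " ".join(m.group(1).split())  # "Warning  Comp." -> "Warning Comp."
        raw = m.group(2)
        fields[key] = int(raw, 16) if raw.startswith("0x") else int(raw.replace(",", ""))
    return fields


def data_units_to_bytes(units: int) -> int:
    """NVMe counts data in units of 1,000 x 512 bytes."""
    return units * 1000 * 512


# Excerpt of the failed drive's SMART/Health block, copied from the report above.
SAMPLE = """\
Critical Warning:                   0x00
Available Spare:                    100%
Data Units Read:                    27,463,840 [14.0 TB]
Data Units Written:                 101,410,022 [51.9 TB]
Power Cycles:                       19
Unsafe Shutdowns:                   15
Media and Data Integrity Errors:    0
Warning  Comp. Temperature Time:    3411
"""

if __name__ == "__main__":
    health = parse_health(SAMPLE)
    written = data_units_to_bytes(health["Data Units Written"])
    print(f"written: {written / 1e12:.1f} TB")
    # "Warning Comp. Temperature Time" is reported in minutes.
    print(f"above warning temp: {health['Warning Comp. Temperature Time'] / 60:.1f} h")
```

The parser is intended for the SMART/Health block. It skips most non-numeric lines in the information section, but it is not a substitute for a structured output format.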

This is the new replacement drive:

smartctl -a /dev/nvme1n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-362.18.1.el9_3.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       MZXLR960HBHQ-000H3
Serial Number:                      S6C7NA0R700412
Firmware Version:                   MPK7525Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 960,197,124,096 [960 GB]
Unallocated NVM Capacity:           0
Controller ID:                      65
NVMe Version:                       1.3
Number of Namespaces:               64
Namespace 1 Size/Capacity:          960,197,124,096 [960 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 97110014a5
Local Time is:                      Mon Feb 12 10:23:19 2024 CST
Firmware Updates (0x17):            3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x005e):   Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec
Optional NVM Commands (0x007f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Resv Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         256 Pages
Warning  Comp. Temp. Threshold:     65 Celsius
Critical Comp. Temp. Threshold:     73 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W   12.00W       -    0  0  0  0      180     180
 1 +    11.00W   11.00W       -    1  1  1  1      180     180
 2 +     9.00W    9.00W       -    2  2  2  2      180     180
 3 +     9.00W    9.00W       -    2  2  2  2      180     180
 4 -     0.00W       -        -    3  3  3  3      120     120

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         1
 1 -    4096       0         0
 2 -     512       8         3
 3 -    4096       8         2
 4 -    4096      64         3

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        12 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    163,176 [83.5 GB]
Data Units Written:                 1,007,053 [515 GB]
Host Read Commands:                 3,377,116
Host Write Commands:                24,578,068
Controller Busy Time:               6
Power Cycles:                       14
Power On Hours:                     810
Unsafe Shutdowns:                   13
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               12 Celsius

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
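Side by side, the thermal counters are where the two reports differ most. The failed 980 logged 3,411 minutes above its 82 °C warning threshold, along with 209,329 Thermal Temp. 2 transitions totalling 149,581 seconds. The replacement logged 0 minutes above its own 65 °C warning threshold and reports no Thermal Temp. 2 lines at all. The sketch below uses values copied by hand from the two reports and restates those counters in hours and ratios. The `DRIVES` layout and the `thermal_summary` name are illustrative assumptions. The script only re-expresses what the drives reported and does not by itself establish why the 980 dropped off.

```python
# Counters copied by hand from the two smartctl reports above.
# Units follow the NVMe SMART/Health log page: "Warning Comp. Temperature Time"
# is in minutes and "Thermal Temp. 2 Total Time" is in seconds.
DRIVES = {
    "Samsung SSD 980 1TB (failed)": {
        "power_on_hours": 5372,
        "power_cycles": 19,
        "unsafe_shutdowns": 15,
        "warning_temp_minutes": 3411,
        "thermal2_transitions": 209329,
        "thermal2_seconds": 149581,
    },
    "MZXLR960HBHQ-000H3 (replacement)": {
        "power_on_hours": 810,
        "power_cycles": 14,
        "unsafe_shutdowns": 13,
        "warning_temp_minutes": 0,
        # No "Thermal Temp. 2" lines appear in this drive's report.
    },
}


def thermal_summary(c: dict) -> dict:
    """Illustrative helper: restate the raw counters in hours and ratios."""
    warn_hours = c["warning_temp_minutes"] / 60
    secs = c.get("thermal2_seconds")  # None when the field was not reported
    return {
        "hours_above_warning_temp": warn_hours,
        "share_of_power_on_time": warn_hours / c["power_on_hours"],
        "hours_in_thermal2": None if secs is None else secs / 3600,
        "unsafe_shutdowns_per_power_cycle": c["unsafe_shutdowns"] / c["power_cycles"],
    }


if __name__ == "__main__":
    for name, counters in DRIVES.items():
        s = thermal_summary(counters)
        t2 = "not reported" if s["hours_in_thermal2"] is None else f"{s['hours_in_thermal2']:.1f} h"
        print(name)
        print(f"  above warning temp: {s['hours_above_warning_temp']:.1f} h "
              f"({s['share_of_power_on_time']:.1%} of power-on time)")
        print(f"  thermal temp. 2 time: {t2}")
        print(f"  unsafe shutdowns / power cycles: {s['unsafe_shutdowns_per_power_cycle']:.2f}")
```

Both drives show a high share of unsafe shutdowns (15 of 19 and 13 of 14 power cycles), so that counter alone does not single out the failed drive.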
