I installed the Monitor Disk Health contrib to check the health of my disks. Sure enough, I got an email saying this:
Objet: SMART error (OfflineUncorrectableSector) detected on host: sme
This email was generated by the smartd daemon running on:
host name: sme
DNS domain: tchoupi.no-ip.net
NIS domain: (none)
The following warning/error was logged by the smartd daemon:
Device: /dev/sda, 757 Offline uncorrectable sectors
For details see host's SYSLOG (default: /var/log/messages).
You can also use the smartctl utility for further investigation.
Another email message will be sent in 1 days if the problem persists
This command:
tail -f /var/log/messages | tai64nlocal | grep /dev/sd
gives me:
Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sda, 757 Offline uncorrectable sectors
Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 250 to 251
Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sda, SMART Usage Attribute: 199 UDMA_CRC_Error_Count changed from 197 to 194
Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sdb, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 55 to 54
Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sdb, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 55 to 54
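To track these attribute changes over time rather than reading them by eye, a small parser could help. What follows is only a minimal Python sketch: it assumes the smartd line format is exactly what appears in the log excerpt above, and the names `CHANGE_RE` and `parse_smartd_line` are made up for this illustration.

```python
import re

# Assumption: attribute-change lines look exactly like the smartd lines quoted above:
# "Device: <dev>, SMART <kind> Attribute: <id> <name> changed from <old> to <new>"
CHANGE_RE = re.compile(
    r"Device: (?P<device>[^,\s]+), SMART (?P<kind>\w+) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)

def parse_smartd_line(line):
    """Return a dict describing one attribute change, or None for any other line."""
    m = CHANGE_RE.search(line)
    if m is None:
        return None
    return {
        "device": m.group("device"),
        "kind": m.group("kind"),
        "id": int(m.group("id")),
        "name": m.group("name"),
        "old": int(m.group("old")),
        "new": int(m.group("new")),
    }

# One line taken from the log excerpt above.
sample = (
    "Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sda, SMART Usage Attribute: "
    "199 UDMA_CRC_Error_Count changed from 197 to 194"
)
change = parse_smartd_line(sample)
# Print the device, the attribute name and the change in the normalized value.
print(change["device"], change["name"], change["new"] - change["old"])
```

The numbers in these lines are normalized values, not raw counts. As far as I understand, a falling normalized value generally means the attribute is getting worse.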
Without further ado, I ran the command
smartctl -a /dev/sda
which returns this:
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: Maxtor 6Y160M0
Serial Number: Y44MLZBE
Firmware Version: YAR51EW0
User Capacity: 163,928,604,672 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Fri Feb 12 11:06:27 2010 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 302) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  72) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   200   199   063    Pre-fail  Always   -           15237
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always   -           1619
  5 Reallocated_Sector_Ct   0x0033   251   162   063    Pre-fail  Always   -           24
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline  -           0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always   -           0
  8 Seek_Time_Performance   0x0027   251   244   187    Pre-fail  Always   -           37038
  9 Power_On_Minutes        0x0032   191   191   000    Old_age   Always   -           1000h+26m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always   -           0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   248   248   000    Old_age   Always   -           2284
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always   -           0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always   -           0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always   -           51
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always   -           7592
196 Reallocated_Event_Count 0x0008   001   001   000    Old_age   Offline  -           757
197 Current_Pending_Sector  0x0008   253   164   000    Old_age   Offline  -           0
198 Offline_Uncorrectable   0x0008   001   001   000    Old_age   Offline  -           757
199 UDMA_CRC_Error_Count    0x0008   198   191   000    Old_age   Offline  -           10
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always   -           0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always   -           45
202 TA_Increase_Count       0x000a   253   239   000    Old_age   Always   -           0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always   -           6
204 Shock_Count_Write_Opern 0x000a   253   251   000    Old_age   Always   -           0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always   -           0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always   -           0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always   -           0
209 Offline_Seek_Performnce 0x0024   192   192   000    Old_age   Offline  -           0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline  -           0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline  -           0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline  -           0
SMART Error Log Version: 1
Warning: ATA error count 8249 inconsistent with error log pointer 5
ATA Error Count: 8249 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 8249 occurred at disk power-on lifetime: 18044 hours (751 days + 20 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 cd cf d2 e0 Error: ICRC, ABRT at LBA = 0x00d2cfcd = 13815757
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 cd cf d2 e0 00 28d+06:06:44.752 READ DMA EXT
25 00 00 cd cd d2 e0 00 28d+06:06:44.752 READ DMA EXT
25 00 00 cd c9 d2 e0 00 28d+06:06:44.736 READ DMA EXT
25 00 00 cd c5 d2 e0 00 28d+06:06:44.736 READ DMA EXT
25 00 80 4d c4 d2 e0 00 28d+06:06:44.736 READ DMA EXT
Error 8248 occurred at disk power-on lifetime: 12640 hours (526 days + 16 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 4d ae 7a e0 Error: ICRC, ABRT at LBA = 0x007aae4d = 8040013
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 4d ae 7a e0 00 00:13:51.008 READ DMA EXT
25 00 00 4d aa 7a e0 00 00:13:50.992 READ DMA EXT
25 00 80 cd a6 7a e0 00 00:13:50.992 READ DMA EXT
25 00 00 cd a2 7a e0 00 00:13:50.992 READ DMA EXT
25 00 00 cd 9e 7a e0 00 00:13:50.976 READ DMA EXT
Error 8247 occurred at disk power-on lifetime: 13072 hours (544 days + 16 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 e5 b7 04 e0 Error: ICRC, ABRT at LBA = 0x0004b7e5 = 309221
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 e5 b7 04 e0 00 21d+06:48:36.816 READ DMA
c8 00 08 2d b4 04 e0 00 21d+06:48:36.800 READ DMA
c8 00 08 15 f0 fa e0 00 21d+06:48:36.800 READ DMA
c8 00 08 d5 af 04 e0 00 21d+06:48:36.800 READ DMA
c8 00 10 ad af 04 e0 00 21d+06:48:36.800 READ DMA
Error 8246 occurred at disk power-on lifetime: 12580 hours (524 days + 4 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 dd 2f e3 ea Error: UNC 8 sectors at LBA = 0x0ae32fdd = 182661085
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 dd 2f e3 ea 00 37d+19:56:58.976 READ DMA
ec 03 46 00 00 00 a0 00 37d+19:56:58.960 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 37d+19:56:58.960 SET FEATURES [Set transfer mode]
ec 00 00 dd 2f e3 a0 00 37d+19:56:58.960 IDENTIFY DEVICE
c8 00 08 dd 2f e3 ea 00 37d+19:56:57.952 READ DMA
Error 8245 occurred at disk power-on lifetime: 12580 hours (524 days + 4 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 dd 2f e3 ea Error: UNC 8 sectors at LBA = 0x0ae32fdd = 182661085
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 dd 2f e3 ea 00 37d+19:56:57.952 READ DMA
ec 03 46 00 00 00 a0 00 37d+19:56:57.952 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 37d+19:56:57.952 SET FEATURES [Set transfer mode]
ec 00 00 dd 2f e3 a0 00 37d+19:56:57.952 IDENTIFY DEVICE
c8 00 08 dd 2f e3 ea 00 37d+19:56:56.944 READ DMA
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
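To pick out the rows of the attribute table above that look worrying, something like the following Python sketch might do. It rests on assumptions: the `WATCHED` set is my own short list of attributes often associated with failing media, not anything smartctl defines, and the "VALUE less than or equal to THRESH" check is the failure criterion as I understand it from the smartctl documentation. The function names are hypothetical.

```python
# Assumption: my own short list of attributes worth watching, not defined by smartctl.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

def parse_attribute_row(row):
    """Split one whitespace-separated attribute row into named fields."""
    parts = row.split()
    # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    if len(parts) < 10 or not parts[0].isdigit():
        return None
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
        "type": parts[6],
        "raw": parts[9],  # kept as text: some raw values look like "1000h+26m"
    }

def worth_a_look(attr):
    """Flag a row if it has reached its threshold or is a watched counter above zero."""
    at_threshold = attr["value"] <= attr["thresh"]
    raw = attr["raw"]
    nonzero_watched = attr["name"] in WATCHED and raw.isdigit() and int(raw) > 0
    return at_threshold or nonzero_watched

# Four rows copied from the table above.
rows = """
5 Reallocated_Sector_Ct 0x0033 251 162 063 Pre-fail Always - 24
9 Power_On_Minutes 0x0032 191 191 000 Old_age Always - 1000h+26m
197 Current_Pending_Sector 0x0008 253 164 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0008 001 001 000 Old_age Offline - 757
""".strip().splitlines()

parsed = [parse_attribute_row(r) for r in rows]
flagged = [a["name"] for a in parsed if a is not None and worth_a_look(a)]
print(flagged)
```

On these four rows the sketch flags Reallocated_Sector_Ct (raw 24) and Offline_Uncorrectable (raw 757). Neither has reached its threshold, which would be consistent with the overall self-assessment still reporting PASSED.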
Obviously I conclude that it would be wise to plan on replacing the disk, but is the situation really critical, or can I let it drag on? The logs show that this problem apparently did not start today (751 days + 20 hours). Is it serious, doctor?
I started reading this, but whoa, it gets too complicated for my very limited Linux knowledge. Any help or advice would be welcome.
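For anyone wanting to check the numbers in the error log entries quoted above, here is a minimal Python sketch of the arithmetic. It assumes 28-bit addressing, where the LBA is built from the low four bits of DH followed by CH, CL and SN. The function names are hypothetical. It reproduces the LBA that smartctl prints, but for the READ DMA EXT commands, which use 48-bit addressing, I am not sure this summary log carries the full address.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine hex register strings from the error log into a 28-bit LBA.

    Assumption: 28-bit LBA layout, i.e. DH low nibble, then CH, CL, SN.
    """
    return (
        ((int(dh, 16) & 0x0F) << 24)
        | (int(ch, 16) << 16)
        | (int(cl, 16) << 8)
        | int(sn, 16)
    )

def days_and_hours(power_on_hours):
    """Convert a power-on lifetime in hours into (days, remaining hours)."""
    return divmod(power_on_hours, 24)

# Error 8249: registers SN=cd CL=cf CH=d2 DH=e0
print(lba28_from_registers("cd", "cf", "d2", "e0"))  # 13815757
# Error 8246: registers SN=dd CL=2f CH=e3 DH=ea
print(lba28_from_registers("dd", "2f", "e3", "ea"))  # 182661085
# Error 8249 occurred at 18044 power-on hours
print(days_and_hours(18044))  # (751, 20)
```

Note that "751 days + 20 hours" is the drive's cumulative power-on time when error 8249 was logged. It is not how long ago the error happened.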