SMART error (OfflineUncorrectableSector)

by **shwing** » 12 Feb 2010 13:06

Hello,

I installed the Monitor Disk Health contrib to keep an eye on the health of my disks. Sure enough, I'm now getting an email that says this:

Subject: SMART error (OfflineUncorrectableSector) detected on host: sme
This email was generated by the smartd daemon running on:

host name: sme
DNS domain: tchoupi.no-ip.net
NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/sda, 757 Offline uncorrectable sectors

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
Another email message will be sent in 1 days if the problem persists

This command:

tail -f /var/log/messages | tai64nlocal | grep /dev/sd

gives me:

```
Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sda, 757 Offline uncorrectable sectors
Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 250 to 251
Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sda, SMART Usage Attribute: 199 UDMA_CRC_Error_Count changed from 197 to 194
Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sdb, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 55 to 54
Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sdb, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 55 to 54
```
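The smartd lines above follow a regular pattern, so they can be split into device, attribute ID, attribute name and old/new value, which makes it easier to see which disk and which attribute moved. This is a minimal Python sketch; the regular expressions are an assumption inferred only from the five quoted lines, not from a documented smartd log format. As far as I know, the "changed from X to Y" numbers are the normalized attribute values, not the raw counters.

```python
import re
from typing import NamedTuple, Optional

# Patterns inferred from the quoted syslog lines (assumption, not smartd's spec), e.g.
# "Device: /dev/sda, SMART Usage Attribute: 199 UDMA_CRC_Error_Count changed from 197 to 194"
CHANGE_RE = re.compile(
    r"Device: (?P<dev>/dev/\S+), SMART (?P<kind>Prefailure|Usage) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)
# e.g. "Device: /dev/sda, 757 Offline uncorrectable sectors"
OFFLINE_RE = re.compile(
    r"Device: (?P<dev>/dev/\S+), (?P<count>\d+) Offline uncorrectable sectors"
)


class AttrChange(NamedTuple):
    device: str
    kind: str      # "Prefailure" or "Usage"
    attr_id: int
    name: str
    old: int
    new: int


def parse_change(line: str) -> Optional[AttrChange]:
    """Return the attribute change described by a smartd syslog line, or None."""
    m = CHANGE_RE.search(line)
    if m is None:
        return None
    return AttrChange(m["dev"], m["kind"], int(m["id"]), m["name"],
                      int(m["old"]), int(m["new"]))


def parse_offline_uncorrectable(line: str) -> Optional[tuple[str, int]]:
    """Return (device, sector count) for an 'Offline uncorrectable sectors' line, or None."""
    m = OFFLINE_RE.search(line)
    return (m["dev"], int(m["count"])) if m else None


if __name__ == "__main__":
    log = [
        "Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sda, 757 Offline uncorrectable sectors",
        "Feb 12 11:19:12 sme smartd[6638]: Device: /dev/sda, SMART Usage Attribute: "
        "199 UDMA_CRC_Error_Count changed from 197 to 194",
    ]
    for line in log:
        # Try the sector-count pattern first, then the attribute-change pattern.
        print(parse_offline_uncorrectable(line) or parse_change(line))
```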

Without a second thought, I run the command

smartctl -a /dev/sda -h

which returns this:

```
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 6Y160M0
Serial Number:    Y44MLZBE
Firmware Version: YAR51EW0
User Capacity:    163,928,604,672 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Fri Feb 12 11:06:27 2010 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 0)   The previous self-test routine completed
                                        without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 302) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine recommended polling time:    ( 2) minutes.
Extended self-test routine recommended polling time: ( 72) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   200   199   063    Pre-fail  Always       -       15237
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       1619
  5 Reallocated_Sector_Ct   0x0033   251   162   063    Pre-fail  Always       -       24
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   251   244   187    Pre-fail  Always       -       37038
  9 Power_On_Minutes        0x0032   191   191   000    Old_age   Always       -       1000h+26m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   248   248   000    Old_age   Always       -       2284
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       51
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       7592
196 Reallocated_Event_Count 0x0008   001   001   000    Old_age   Offline      -       757
197 Current_Pending_Sector  0x0008   253   164   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   001   001   000    Old_age   Offline      -       757
199 UDMA_CRC_Error_Count    0x0008   198   191   000    Old_age   Offline      -       10
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       45
202 TA_Increase_Count       0x000a   253   239   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       6
204 Shock_Count_Write_Opern 0x000a   253   251   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   192   192   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
Warning: ATA error count 8249 inconsistent with error log pointer 5

ATA Error Count: 8249 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 8249 occurred at disk power-on lifetime: 18044 hours (751 days + 20 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 cd cf d2 e0  Error: ICRC, ABRT at LBA = 0x00d2cfcd = 13815757

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 cd cf d2 e0 00  28d+06:06:44.752  READ DMA EXT
  25 00 00 cd cd d2 e0 00  28d+06:06:44.752  READ DMA EXT
  25 00 00 cd c9 d2 e0 00  28d+06:06:44.736  READ DMA EXT
  25 00 00 cd c5 d2 e0 00  28d+06:06:44.736  READ DMA EXT
  25 00 80 4d c4 d2 e0 00  28d+06:06:44.736  READ DMA EXT

Error 8248 occurred at disk power-on lifetime: 12640 hours (526 days + 16 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 4d ae 7a e0  Error: ICRC, ABRT at LBA = 0x007aae4d = 8040013

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 4d ae 7a e0 00      00:13:51.008  READ DMA EXT
  25 00 00 4d aa 7a e0 00      00:13:50.992  READ DMA EXT
  25 00 80 cd a6 7a e0 00      00:13:50.992  READ DMA EXT
  25 00 00 cd a2 7a e0 00      00:13:50.992  READ DMA EXT
  25 00 00 cd 9e 7a e0 00      00:13:50.976  READ DMA EXT

Error 8247 occurred at disk power-on lifetime: 13072 hours (544 days + 16 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 e5 b7 04 e0  Error: ICRC, ABRT at LBA = 0x0004b7e5 = 309221

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 e5 b7 04 e0 00  21d+06:48:36.816  READ DMA
  c8 00 08 2d b4 04 e0 00  21d+06:48:36.800  READ DMA
  c8 00 08 15 f0 fa e0 00  21d+06:48:36.800  READ DMA
  c8 00 08 d5 af 04 e0 00  21d+06:48:36.800  READ DMA
  c8 00 10 ad af 04 e0 00  21d+06:48:36.800  READ DMA

Error 8246 occurred at disk power-on lifetime: 12580 hours (524 days + 4 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 dd 2f e3 ea  Error: UNC 8 sectors at LBA = 0x0ae32fdd = 182661085

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 dd 2f e3 ea 00  37d+19:56:58.976  READ DMA
  ec 03 46 00 00 00 a0 00  37d+19:56:58.960  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  37d+19:56:58.960  SET FEATURES [Set transfer mode]
  ec 00 00 dd 2f e3 a0 00  37d+19:56:58.960  IDENTIFY DEVICE
  c8 00 08 dd 2f e3 ea 00  37d+19:56:57.952  READ DMA

Error 8245 occurred at disk power-on lifetime: 12580 hours (524 days + 4 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 dd 2f e3 ea  Error: UNC 8 sectors at LBA = 0x0ae32fdd = 182661085

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 dd 2f e3 ea 00  37d+19:56:57.952  READ DMA
  ec 03 46 00 00 00 a0 00  37d+19:56:57.952  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  37d+19:56:57.952  SET FEATURES [Set transfer mode]
  ec 00 00 dd 2f e3 a0 00  37d+19:56:57.952  IDENTIFY DEVICE
  c8 00 08 dd 2f e3 ea 00  37d+19:56:56.944  READ DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```
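In the attribute table above, the rows most relevant to the warning are 5 Reallocated_Sector_Ct (raw 24), 196 Reallocated_Event_Count (raw 757), 197 Current_Pending_Sector (raw 0) and 198 Offline_Uncorrectable (raw 757). The overall verdict still reads PASSED because no normalized VALUE is at or below a non-zero THRESH (for 198, THRESH is 000) and WHEN_FAILED is "-" on every row. The Python sketch below parses attribute rows as smartctl prints them (one row per line; the column layout is assumed from the output above) and lists the sector-related attributes with a non-zero raw count. Watching IDs 5/196/197/198 is a common rule of thumb, not something smartctl itself enforces, and the names `parse_attributes`, `sector_warnings` and `failing_now` are hypothetical helpers.

```python
import re
from typing import NamedTuple

# Sector-related attribute IDs often watched for media problems (rule of thumb, assumption).
SECTOR_HEALTH_IDS = {5, 196, 197, 198}

# One data row of the "Vendor Specific SMART Attributes with Thresholds" table (layout assumed):
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]{4})\s+"
    r"(?P<value>\d{3})\s+(?P<worst>\d{3})\s+(?P<thresh>\d{3})\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+?)\s*$"
)


class Attribute(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    when_failed: str
    raw: str  # kept as text: some raw values look like "1000h+26m"


def parse_attributes(text: str) -> list[Attribute]:
    """Extract attribute rows, skipping headers and any other smartctl output."""
    attrs = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue
        attrs.append(Attribute(int(m["id"]), m["name"], int(m["value"]),
                               int(m["worst"]), int(m["thresh"]),
                               m["when_failed"], m["raw"]))
    return attrs


def sector_warnings(attrs: list[Attribute]) -> list[Attribute]:
    """Sector-related attributes whose raw value is a non-zero integer."""
    return [a for a in attrs
            if a.attr_id in SECTOR_HEALTH_IDS and a.raw.isdigit() and int(a.raw) > 0]


def failing_now(attrs: list[Attribute]) -> list[Attribute]:
    """Assumed rule: normalized VALUE at or below a non-zero THRESH counts as failing."""
    return [a for a in attrs if a.thresh > 0 and a.value <= a.thresh]


if __name__ == "__main__":
    sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   251   162   063    Pre-fail  Always       -       24
  9 Power_On_Minutes        0x0032   191   191   000    Old_age   Always       -       1000h+26m
196 Reallocated_Event_Count 0x0008   001   001   000    Old_age   Offline      -       757
197 Current_Pending_Sector  0x0008   253   164   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   001   001   000    Old_age   Offline      -       757
"""
    attrs = parse_attributes(sample)
    for a in sector_warnings(attrs):
        print(a.attr_id, a.name, a.raw)  # rows 5, 196 and 198 are listed
    print("failing now:", [a.name for a in failing_now(attrs)])  # empty list
```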

Obviously I take this to mean it would be wise to think about replacing the disk, but is the situation really critical, or can I leave things as they are for a while? The logs suggest the problem isn't new (751 days + 20 hours). Is it serious, doctor?
I started reading this, but whoa, it quickly gets far too complicated for my very limited Linux knowledge. Any help or advice is welcome.
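About the "751 days + 20 hours" figure: in the smartctl error log it is printed as "disk power-on lifetime", i.e. the disk's accumulated power-on time (18044 hours) when error 8249 was recorded, so on its own it does not say how long ago the error happened. The small Python sketch below checks that arithmetic and also shows how the LBA printed next to each error can be rebuilt from the register bytes (SN, CL, CH, DH). It assumes 28-bit LBA addressing, where the low nibble of DH supplies bits 24-27; with that assumption the results match the LBAs smartctl printed for errors 8249, 8247 and 8246. The helper names `lba28` and `hours_to_days` are hypothetical.

```python
def lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Assemble a 28-bit LBA from the SN, CL, CH and DH register bytes.

    Assumed LBA-mode layout: SN = bits 0-7, CL = bits 8-15, CH = bits 16-23,
    low nibble of DH = bits 24-27 (the high nibble carries mode/device flags).
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def hours_to_days(hours: int) -> tuple[int, int]:
    """Split a power-on lifetime in hours into (days, remaining hours)."""
    return divmod(hours, 24)


if __name__ == "__main__":
    # Error 8249: SN=cd CL=cf CH=d2 DH=e0
    print(lba28(0xCD, 0xCF, 0xD2, 0xE0))  # 13815757
    # Error 8246: SN=dd CL=2f CH=e3 DH=ea
    print(lba28(0xDD, 0x2F, 0xE3, 0xEA))  # 182661085
    print(hours_to_days(18044))           # (751, 20)
```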
