Store 2.0
STABILITY Store 2.0
Suspicion: the three SATA expansion cards occasionally crash the entire system.
Cards:
http://www.sybausa.com/productInfo.php?iid=537
Syba SY-PEX40008 4-port SATA II PCI-e Software RAID Controller Card--Bundle with Low Profile Bracket, SIL3124 Chipset
These are the identical cards that Backblaze still installs (Pod 2.0 AND Pod 3.0!).
They sit in three PCI-E 1x slots (the small port).
The RAID stays intact (luckily!), because the cards then block all access completely.
The Arch log files contain no entries at all about the crashes!
The remote log via syslogd (via Myth) shows as its last entries:
mdadm: sending ioctl 1261 to a partition (buggy entry, but harmless)
sata_sil24: IRQ status == 0xffffffff, PCI fault or device removal
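To see how often the remote log contains this controller fault, a small filter can pull the matching lines out of a syslog file. This is only a sketch: the function names `sil24_faults` and `sil24_fault_count` and the log path in the usage comment are assumptions, only the message text is taken from the log entry above.

```shell
#!/bin/sh
# Print every log line that carries the sata_sil24 PCI-fault message.
# Reads syslog text on stdin.
sil24_faults() {
    grep -F 'sata_sil24: IRQ status == 0xffffffff'
}

# Print only the number of matching lines.
sil24_fault_count() {
    grep -cF 'sata_sil24: IRQ status == 0xffffffff'
}

# Usage (hypothetical log path on the remote syslog host):
#   sil24_faults < /var/log/remote/store2.log
```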
sata-sil24:
https://ata.wiki.kernel.org/index.php/Sata_sil24
Spurious interrupts are expected on SiI3124 suffering from IRQ loss erratum on PCI-X
PATCH?
http://old.nabble.com/-PATCH-06-13--sata_sil24%3A-implement-loss-of-completion-interrupt-on-PCI-X-errta-fix-p3799674.html
Thread about accessing drives attached to the SIL3124 chip
http://www.linuxquestions.org/questions/linux-kernel-70/how-to-access-sata-drives-attached-to-sii3124-719408/
Test?
http://marc.info/?l=linux-ide&m=127228317404771&w=2
Open and mount the RAID after booting
To prevent auto-assembly at boot, the config file /etc/mdadm.conf must be empty (or at least completely commented out), and /etc/sysconfig/mdadm must contain "MDADM_SCAN=no".
1.) Check that all disks are present:
/root/bin/diskserial_sort2.sh
There must currently be 17 disks. The reference is the file disknum.txt in /root/bin.
2.) Find and assemble the RAIDs (no autostart):
mdadm --assemble --scan
3.) Cryptsetup:
cryptsetup luksOpen /dev/md125 cr_md125
4.) Mount:
mount /dev/mapper/cr_md125 /data
Closing would be:
cryptsetup luksClose cr_md125
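The four steps above can be chained into one script. The following is a sketch, not a tested procedure: the `DRY_RUN` switch, the `run` helper and the `open_store` function are assumptions added for illustration, and the exit status of diskserial_sort2.sh is not documented here, so the disk count still has to be checked by eye. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch of the manual open sequence. With DRY_RUN=1 (the default)
# the commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}
MD_DEV=${MD_DEV:-/dev/md125}
CRYPT_NAME=${CRYPT_NAME:-cr_md125}
MOUNT_POINT=${MOUNT_POINT:-/data}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop at the first failing step so nothing is mounted on a half-open stack.
open_store() {
    run /root/bin/diskserial_sort2.sh || return 1
    run mdadm --assemble --scan || return 1
    run cryptsetup luksOpen "$MD_DEV" "$CRYPT_NAME" || return 1
    run mount "/dev/mapper/$CRYPT_NAME" "$MOUNT_POINT"
}

open_store
```

Running it with DRY_RUN=0 would execute the commands for real; cryptsetup still prompts for the passphrase interactively.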
JD2
java -Xmx512m -jar /home/gagi/jd2/JDownloader.jar
VNC
dergagi.selfhost.bz:5901
Disk layout
3000GB Hitachi Deskstar 5K3000 HDS5C3030ALA630 CoolSpin 32MB 3.5" (8.9cm) SATA 6Gb/s
3000GB Western Digital WD30EZRX 3TB internal hard drive (8.9 cm (3.5 inch), 5400 rpm, 2ms, 64MB cache, SATA III)
Problem with WD disks and LCC (load cycle count)
http://idle3-tools.sourceforge.net/
http://koitsu.wordpress.com/2012/05/30/wd30ezrx-and-aggressive-head-parking/
Get idle3 timer raw value
idle3ctl -g /dev/sdh
Disable idle3 timer:
idle3ctl -d /dev/sdh
Read the serial number with:
udevadm info --query=all --name=/dev/sdi | grep ID_SERIAL_SHORT
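To collect the serials of several disks in one go, the grep above can be wrapped in a loop. A minimal sketch, assuming udevadm prints properties as `E: KEY=value` lines; the function names `serial_from_udev` and `list_serials` are made up for this example.

```shell
#!/bin/sh
# Extract the ID_SERIAL_SHORT value from `udevadm info --query=all`
# output on stdin.
serial_from_udev() {
    sed -n 's/^E: ID_SERIAL_SHORT=//p'
}

# Print "device serial" for each device node given as an argument.
list_serials() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" \
            "$(udevadm info --query=all --name="$dev" | serial_from_udev)"
    done
}

# Usage:
#   list_serials /dev/sd?
```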
Serial of the 160GB system disk:
JC0150HT0J7TPC
Serials of the data disks
00 : 00000000000000 (1TB System, Samsung)
works
01 :
02 :
03 :
04 :
05 :
works now too, Molex contact problem fixed
06 :
07 : MJ1311YNG4J48A (3TB)
08 : WD-WCC070299387 (3TB WD)
09 : MJ1311YNG3UUPA (3TB)
10 :
works
11 : MJ1311YNG3SAMA (3TB)
12 : 13V9WK9AS NEW, hot spare data
13 : MJ1311YNG09EDA (3TB) warranty replacement, HOT SPARE data2
14 :
15 : MCE9215Q0AUYTW (3TB Toshiba, new)
works
16 : MJ0351YNGA02YA (3TB)
17 :
18 :
19 :
20 : WD-WCAWZ1881335 (3TB WD)
works now too, Molex contact problem fixed
21 :
22 : WD-WCAWZ2279670 (3TB WD)
23 : MJ1311YNG3SSLA (3TB)
24 : MJ1311YNG25Z6A (3TB)
25 :
works
26 : MJ1311YNG3RM5A (3TB)
27 : MJ1311YNG3NZ3A (3TB)
28 : MJ1311YNG3NT5A (3TB)
29 :
30 : MCM9215Q0B9LSY (3TB Toshiba, new)
works
31 : 234BGY0GS (3TB Toshiba, new)
32 :
33 : MJ1311YNG3WZVA (3TB)
34 : MJ1311YNG3Y4SA (3TB)
35 : MJ1311YNG3SYKA (3TB)
works
36 :
37 : WD-WCC070198169 (3TB WD)
38 :
39 : MJ1311YNG3RZTA (3TB)
40 :
works
41 : MJ1311YNG3LTRA (3TB)
42 :
43 : MJ1311YNG38VGA (3TB)
44 :
45 :
TOTAL: 25 (18 + 4 + 2 hot spares + one system disk) of 46 possible
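The total can be cross-checked by counting the slots that carry a serial. This is a sketch that assumes the slot list has been saved to a text file with one `NN : SERIAL` entry per line; the function name `populated_slots` and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Count slot lines of the form "NN : SERIAL ..." that have a non-empty
# serial. Status lines and empty slots ("NN :") are ignored.
populated_slots() {
    awk -F: '$1 ~ /^[0-9]+ *$/ && $2 ~ /[^ ]/ { n++ } END { print n + 0 }'
}

# Usage (slots.txt is a hypothetical copy of the list above):
#   populated_slots < slots.txt
```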
RAID build command
in the screen session mdadm
mdadm --create /dev/md125 --chunk=64 --level=raid6 --layout=ls --raid-devices=15 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1
Re-Create 2014-01-31:
NEW, CORRECT RE-CREATE COMMAND using mdadm from git:
./mdadm --create --assume-clean /dev/md125 --chunk=64 --level=raid6 --layout=ls --raid-devices=18 /dev/sdb1:1024 /dev/sdd1:1024 /dev/sdf1:1024 /dev/sdg1:1024 /dev/sdh1:1024 /dev/sdc1:1024 /dev/sdt1:1024 /dev/sdn1:1024 /dev/sdo1:1024 /dev/sdq1:1024 /dev/sdm1:1024 /dev/sdp1:1024 /dev/sdu1:1024 /dev/sdv1:1024 /dev/sda1:1024 /dev/sds1:1024 /dev/sdl1:1024 /dev/sdw1:1024
Build command for the second RAID
in the screen session mdadm
mdadm --create /dev/md126 --chunk=64 --level=raid6 --layout=ls --raid-devices=4 /dev/sdv1 /dev/sdw1 /dev/sdj1 /dev/sdh1
Encrypt with special parameters for hardware encryption:
cryptsetup -v luksFormat --cipher aes-cbc-essiv:sha256 --key-size 256 /dev/md126
Open:
cryptsetup luksOpen /dev/md126 cr_md126
Put an XFS filesystem on it:
mkfs.xfs /dev/mapper/cr_md126
Set up a spare group
Write the current config to mdadm.conf
mdadm -D -s >> /etc/mdadm.conf
Add the spare-group
nano /etc/mdadm.conf
At the very end of the ARRAY line, append spare-group=shared
ARRAY /dev/md/126 metadata=1.2 spares=1 name=store2:126 UUID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx spare-group=shared
RAID build status
cat /proc/mdstat
Refreshed automatically every second
watch -n 1 cat /proc/mdstat
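For logging or scripting, the progress figure can be pulled out of /proc/mdstat instead of watching it. A minimal sketch, assuming the usual mdstat progress line of the form `reshape =  0.5% (...)`; the function name `md_progress` is made up for this example.

```shell
#!/bin/sh
# Print "<array> <operation> <percent>" for every array in mdstat-style
# text on stdin that is currently resyncing, recovering, reshaping or
# being checked.
md_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }     # remember the current array name
        match($0, /(resync|recovery|reshape|check) *= *[0-9.]+%/) {
            s = substr($0, RSTART, RLENGTH)
            op = s;  sub(/ *=.*/, "", op)                    # operation name
            pct = s; sub(/.*= */, "", pct); sub(/%/, "", pct) # bare number
            print array, op, pct
        }'
}

# Usage:
#   md_progress < /proc/mdstat
```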
Encrypt by hand (without YaST2)
Encrypt:
cryptsetup -v --key-size 256 luksFormat /dev/md125
With special parameters for hardware encryption:
cryptsetup -v luksFormat --cipher aes-cbc-essiv:sha256 --key-size 256 /dev/md125
Open:
cryptsetup luksOpen /dev/md125 cr_md125
Put a filesystem on it:
mkfs.xfs /dev/mapper/cr_md125
store2:~ # mkfs.xfs /dev/mapper/cr_md125
meta-data=/dev/mapper/cr_md125   isize=256    agcount=36, agsize=268435424 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=9523357168, imaxpct=5
         =                       sunit=16     swidth=208 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
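The stripe geometry in this output can be checked against the RAID build command with a little arithmetic. The sketch below only uses numbers from the mkfs.xfs output above and assumes that sunit and swidth are given in filesystem blocks, as the "blks" suffix indicates; the variable names are chosen for this example.

```shell
#!/bin/sh
# Values copied from the mkfs.xfs output above.
bsize=4096          # filesystem block size in bytes
sunit=16            # stripe unit in filesystem blocks
swidth=208          # stripe width in filesystem blocks
blocks=9523357168   # number of data blocks

chunk_kib=$(( sunit * bsize / 1024 ))   # stripe unit in KiB
data_disks=$(( swidth / sunit ))        # data-bearing members per stripe
size_bytes=$(( blocks * bsize ))        # filesystem size in bytes

echo "stripe unit: ${chunk_kib} KiB"
echo "data disks:  ${data_disks}"
echo "filesystem:  ${size_bytes} bytes"
```

A stripe unit of 64 KiB matches --chunk=64, and 13 data disks match a RAID6 with 15 devices (15 minus 2 for parity).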
Status:
cryptsetup luksDump /dev/md125
Grown
Prepare the disk
Open gdisk with the first hard drive:
$ gdisk /dev/sda
and type the following commands at the prompt:
Add a new partition: n
Select the default partition number: Enter
Use the default for the first sector: Enter
For sda1 and sda2 type the appropriate size in MB (i.e. +100MB and +2048M). For sda3 just hit Enter to select the remainder of the disk.
Select Linux RAID as the partition type: fd00
Write the table to disk and exit: w
Bad Blocks
Start a screen session
screen -S bb
badblocks -vs -o sdy-badblock-test /dev/sdy
-v = verbose, -s = show progress, -o = output file (log)
Called this way, badblocks only searches for bad blocks and does not destroy any data.
Detach
Ctrl-a d = detach
Reattach
screen -r bb
OR go back in
screen -x bb
Add a device to the RAID
mdadm --add /dev/md125 /dev/sdc1
Reshape the RAID with the additional device (takes about 3 full days)
mdadm --grow --raid-devices=18 /dev/md125 --backup-file=/home/gagi/mda125backup
To see who or what is currently accessing it:
lsof /data
Stop the Samba service:
rcsmb stop
systemctl stop smbd
Unmount:
umount /data
XFS (Data)
Check XFS (unmounted)
xfs_repair -n -o bhash=1024 /dev/mapper/cr_md125
Grow the crypt container
cryptsetup --verbose resize cr_md125
Mount:
mount /dev/mapper/cr_md125 /data
Grow XFS
xfs_growfs /data
Check XFS (unmounted)
xfs_repair -n -o bhash=1024 /dev/mapper/cr_md125
Mount read-only:
mount -o ro /dev/mapper/cr_md125 /data
Samba share
There's a bug in Samba in openSUSE 11.4. Here's the workaround:
go to Yast --> AppArmor --> Control Panel (on) --> Configure Profile Modes --> usr.sbin.smbd = complain
go to Yast --> system --> runlevels --> smb=on + nmb=on
reboot
Direct network connection Store1 <-> Store 2.0
You can also take a look at what is in /etc/udev/rules.d/70-persistent-net... (or whatever the file is called).
That file assigns each MAC address to a specific network interface name (eth0, eth1, ...).
You can also delete or move the file; it is recreated on the next reboot.
Things sometimes get mixed up there, for example after a kernel update or a BIOS update.
WORKS! Different subnets (192.168.2.100 and 192.168.2.102)
Fast-Copy
1.) Receiver (Store2.0)
cd <target directory>
netcat -l -p 4323 | gunzip | cpio -i -d -m
2.) Sender (Store)
cd <source directory>
find . -type f | cpio -o | gzip -1 | netcat 192.168.2.102 4323
1.) Receiver (Store2.0)
socat tcp4-listen:4323 stdout | tar xvpf - /data/eBooks
2.) Sender (Store)
tar cvf - /data/eBooks | socat stdin tcp4:192.168.2.102:4323
Test with a progress display for a known data size:
1.) Receiver (Store2.0)
cd <target directory>
socat tcp4-listen:4323 stdout | pv -s 93G | tar xvpf -
2.) Sender (Store)
cd <source directory>
tar cvf - * | pv -s 93G | socat stdin tcp4:192.168.2.102:4323
dd if=/dev/sdl | bar -s 1.5T | dd of=/dev/sdw
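The size passed to `pv -s` in the tar test has to be known in advance. One way to get it is to sum the file sizes on the sender first. This is a sketch that relies on GNU find's -printf; the function name `tree_bytes` is made up, and the tar stream is slightly larger than this sum because of tar headers, so the progress bar is only approximate.

```shell
#!/bin/sh
# Sum the sizes of all regular files below a directory, in bytes.
# Requires GNU find (-printf).
tree_bytes() {
    find "$1" -type f -printf '%s\n' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Usage on the sender, inside the source directory:
#   tar cf - . | pv -s "$(tree_bytes .)" | socat stdin tcp4:192.168.2.102:4323
```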
FileBot Renamer Linux
filebot -rename -get-subtitles -non-strict /data/Downloads/Fertig/ --output /data/Downloads/Fertig/FileBot/ --format "{n}/Season {s}/{n}.{s00e00}.{t}" --db TheTVDB 
filebot -get-missing-subtitles -non-strict -r --lang en /data/Downloads/Fertig/FileBot/
filebot -script fn:replace --conflict override --def "e=.eng.srt" "r=.srt" /data/Downloads/Fertig/FileBot/
RemoteDesktop ArchLinux Client
e.g. to Busfahrer:
rdesktop -g 1440x900 -P -z -x l -r sound:off -u gagi 192.168.1.149
Backplane rotation for fault diagnosis
Original state with 24 disks, 2013-11-12
/dev/sdx -> 00 : 00000000000000 (1TB) 31°C
/dev/sdp -> 01 : WD-WCC070299387 (3TB WD) 31°C
/dev/sdq -> 03 : MJ1311YNG3SSLA (3TB) 33°C
/dev/sdr -> 05 : MJ1311YNG3NZ3A (3TB) 32°C
/dev/sds -> 07 : MJ1311YNG4J48A (3TB) 32°C
/dev/sdt -> 09 : MJ1311YNG3UUPA (3TB) 33°C
/dev/sdu -> 11 : MJ1311YNG3SAMA (3TB) 32°C
/dev/sdv -> 13 : MJ1311YNG3SU1A (3TB) 34°C
/dev/sdw -> 15 : MCE9215Q0AUYTW (3TB Toshiba, new) 31°C
/dev/sdh -> 16 : MJ0351YNGA02YA (3TB) not in use, bb-check 2013-08-28 37°C
/dev/sdi -> 18 : MJ1311YNG3Y4SA (3TB) 40°C
/dev/sdj -> 20 : WD-WCAWZ1881335 (3TB WD) hot spare 38°C
/dev/sdk -> 22 : WD-WCAWZ2279670 (3TB WD) 41°C
/dev/sdl -> 24 : MJ1311YNG25Z6A (3TB) 39°C
/dev/sdm -> 26 : MJ1311YNG3RM5A (3TB) 39°C
/dev/sdn -> 28 : MJ1311YNG3NT5A (3TB) 40°C
/dev/sdo -> 30 : MCM9215Q0B9LSY (3TB Toshiba, new) 38°C
/dev/sda -> 31 : 234BGY0GS (3TB Toshiba, new) 40°C
/dev/sdb -> 33 : MJ1311YNG3WZVA (3TB) 43°C
/dev/sdc -> 35 : MJ1311YNG3SYKA (3TB) 42°C
/dev/sdd -> 37 : WD-WCC070198169 (3TB WD) 41°C
/dev/sde -> 39 : MJ1311YNG3RZTA (3TB) 39°C
/dev/sdf -> 41 : MJ1311YNG3LTRA (3TB) 39°C
/dev/sdg -> 43 : MJ1311YNG38VGA (3TB) 39°C
24 disks found in total.
Crashes
ArchLinux 2011-09-09
Sep 9 18:20:04 localhost kernel: [156439.479947] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Sep 9 18:20:04 localhost kernel: [156439.480035] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Sep 9 18:20:04 localhost kernel: [156439.486612] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Sep 9 18:20:04 localhost kernel: [156439.503656] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Sep 9 18:20:04 localhost kernel: [156439.504562] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Sep 9 18:34:11 localhost -- MARK --
Sep 9 18:42:42 localhost kernel: [157797.911330] r8169: eth0: link up
Sep 9 18:54:11 localhost -- MARK --
Sep 9 19:14:11 localhost -- MARK --
Sep 9 19:34:11 localhost -- MARK --
Sep 9 19:54:11 localhost -- MARK --
Sep 9 20:14:11 localhost -- MARK --
Sep 9 20:27:32 localhost kernel: [164086.971566] r8169: eth0: link up
Sep 9 20:27:42 localhost kernel: [164097.580071] r8169: eth0: link up
Sep 9 20:27:50 localhost kernel: [164105.391755] r8169: eth0: link up
Sep 9 20:27:51 localhost kernel: [164106.272019] r8169: eth0: link up
Sep 9 20:28:12 localhost kernel: [164127.150062] r8169: eth0: link up
Sep 9 20:28:22 localhost kernel: [164137.941304] r8169: eth0: link up
Sep 9 20:28:33 localhost kernel: [164148.890097] r8169: eth0: link up
Sep 9 20:28:38 localhost kernel: [164153.080536] r8169: eth0: link up
Sep 9 20:28:58 localhost kernel: [164173.790064] r8169: eth0: link up
Sep 9 20:42:19 localhost kernel: [ 0.000000] Initializing cgroup subsys cpuset
Sep 9 20:42:19 localhost kernel: [ 0.000000] Initializing cgroup subsys cpu
Sep 9 20:42:19 localhost kernel: [ 0.000000] Linux version 2.6.32-lts (tobias@T-POWA-LX) (gcc version 4.6.1 20110819 (prerelease) (GCC) ) #1 SMP Tue Aug 30 08:59:44 CEST 2011
Sep 9 20:42:19 localhost kernel: [ 0.000000] Command line: root=/dev/disk/by-uuid/ba47ea9a-c24c-4dc6-a9a2-ca3b442bdbfc ro vga=0x31B
Sep 9 20:42:19 localhost kernel: [ 0.000000] KERNEL supported cpus:
Sep 9 20:42:19 localhost kernel: [ 0.000000] Intel GenuineIntel
Sep 9 20:42:19 localhost kernel: [ 0.000000] AMD AuthenticAMD
Sep 9 20:42:19 localhost kernel: [ 0.000000] Centaur CentaurHauls
OpenSuse 2011-09-26
Sep 26 23:15:59 store2 su: (to nobody) root on none
Sep 26 23:17:17 su: last message repeated 2 times
Sep 26 23:25:23 store2 smartd[4617]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 166 to 162
Sep 26 23:25:26 store2 smartd[4617]: Device: /dev/sde [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 171 to 166
Sep 26 23:25:29 store2 smartd[4617]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 171 to 166
Sep 26 23:25:36 store2 smartd[4617]: Device: /dev/sdk [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 171 to 166
Sep 26 23:25:37 store2 smartd[4617]: Device: /dev/sdl [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 181 to 187
Sep 26 23:55:22 store2 smartd[4617]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 96
Sep 26 23:55:23 store2 smartd[4617]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 162 to 166
Sep 26 23:55:26 store2 smartd[4617]: Device: /dev/sde [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 26 23:55:26 store2 smartd[4617]: Device: /dev/sde [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 166 to 171
Sep 26 23:55:29 store2 smartd[4617]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 166 to 171
Sep 26 23:55:32 store2 smartd[4617]: Device: /dev/sdi [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 98
Sep 27 00:55:26 store2 kernel: imklog 5.6.5, log source = /proc/kmsg started.
OpenSuse 2011-09-27
Sep 27 16:35:17 store2 smbd[29588]: [2011/09/27 16:35:17.391212, 0] param/loadparm.c:8445(check_usershare_stat)
Sep 27 16:35:17 store2 smbd[29588]: check_usershare_stat: file /var/lib/samba/usershares/ owned by uid 0 is not a regular file
Sep 27 16:44:06 store2 smbd[29163]: [2011/09/27 16:44:06.795153, 0] lib/util_sock.c:474(read_fd_with_timeout)
Sep 27 16:44:06 store2 smbd[29163]: [2011/09/27 16:44:06.795341, 0] lib/util_sock.c:1441(get_peer_addr_internal)
Sep 27 16:44:06 store2 smbd[29597]: [2011/09/27 16:44:06.795323, 0] lib/util_sock.c:474(read_fd_with_timeout)
Sep 27 16:44:06 store2 smbd[29163]: getpeername failed. Error was Der Socket ist nicht verbunden
Sep 27 16:44:06 store2 smbd[29592]: [2011/09/27 16:44:06.795368, 0] lib/util_sock.c:474(read_fd_with_timeout)
Sep 27 16:44:06 store2 smbd[29163]: read_fd_with_timeout: client 0.0.0.0 read error = Die Verbindung wurde vom Kommunikationspartner zurückgesetzt.
Sep 27 16:44:06 store2 smbd[29597]: [2011/09/27 16:44:06.795422, 0] lib/util_sock.c:1441(get_peer_addr_internal)
Sep 27 16:44:06 store2 smbd[29597]: getpeername failed. Error was Der Socket ist nicht verbunden
Sep 27 16:44:06 store2 smbd[29597]: read_fd_with_timeout: client 0.0.0.0 read error = Die Verbindung wurde vom Kommunikationspartner zurückgesetzt.
Sep 27 16:44:06 store2 smbd[29592]: [2011/09/27 16:44:06.795468, 0] lib/util_sock.c:1441(get_peer_addr_internal)
Sep 27 16:44:06 store2 smbd[29592]: getpeername failed. Error was Der Socket ist nicht verbunden
Sep 27 16:44:06 store2 smbd[29592]: read_fd_with_timeout: client 0.0.0.0 read error = Die Verbindung wurde vom Kommunikationspartner zurückgesetzt.
Sep 27 16:45:42 store2 smbd[29585]: [2011/09/27 16:45:42.499038, 0] lib/util_sock.c:474(read_fd_with_timeout)
Sep 27 16:45:42 store2 smbd[29593]: [2011/09/27 16:45:42.499082, 0] lib/util_sock.c:474(read_fd_with_timeout)
Sep 27 16:45:42 store2 smbd[29593]: [2011/09/27 16:45:42.499174, 0] lib/util_sock.c:1441(get_peer_addr_internal)
Sep 27 16:45:42 store2 smbd[29585]: [2011/09/27 16:45:42.499174, 0] lib/util_sock.c:1441(get_peer_addr_internal)
Sep 27 16:45:42 store2 smbd[29593]: getpeername failed. Error was Der Socket ist nicht verbunden
Sep 27 16:45:42 store2 smbd[29585]: getpeername failed. Error was Der Socket ist nicht verbunden
Sep 27 16:45:42 store2 smbd[29593]: read_fd_with_timeout: client 0.0.0.0 read error = Die Verbindung wurde vom Kommunikationspartner zurückgesetzt.
Sep 27 16:45:42 store2 smbd[29585]: read_fd_with_timeout: client 0.0.0.0 read error = Die Verbindung wurde vom Kommunikationspartner zurückgesetzt.
Sep 27 19:35:14 store2 kernel: imklog 5.6.5, log source = /proc/kmsg started.
OpenSuse 2011-09-29
During a heavy copy job from Store
Sep 29 23:16:19 su: last message repeated 2 times
Sep 29 23:28:41 store2 smartd[4624]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 157 to 162
Sep 29 23:28:44 store2 smartd[4624]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 153 to 157
Sep 29 23:28:49 store2 smartd[4624]: Device: /dev/sdh [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 29 23:28:53 store2 smartd[4624]: Device: /dev/sdk [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 162 to 157
Sep 29 23:28:57 store2 smartd[4624]: Device: /dev/sdo [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 176 to 181
Sep 29 23:58:44 store2 smartd[4624]: Device: /dev/sdd [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 29 23:58:49 store2 smartd[4624]: Device: /dev/sdh [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
Sep 29 23:58:53 store2 smartd[4624]: Device: /dev/sdk [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 157 to 162
Sep 29 23:58:57 store2 smartd[4624]: Device: /dev/sdn [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 187 to 193
Sep 29 23:58:58 store2 smartd[4624]: Device: /dev/sdo [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 181 to 176
Sep 29 23:59:02 store2 smartd[4624]: Device: /dev/sdq [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 176 to 181
Sep 30 00:28:41 store2 smartd[4624]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 187 to 193
Sep 30 00:28:43 store2 smartd[4624]: Device: /dev/sdd [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
Sep 30 00:28:49 store2 smartd[4624]: Device: /dev/sdh [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 30 00:28:58 store2 smartd[4624]: Device: /dev/sdo [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 176 to 181
Sep 30 00:58:47 store2 smartd[4624]: Device: /dev/sdf [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 30 00:58:49 store2 smartd[4624]: Device: /dev/sdh [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
Sep 30 00:58:59 store2 smartd[4624]: Device: /dev/sdp [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 30 01:28:47 store2 smartd[4624]: Device: /dev/sdf [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
Sep 30 01:28:47 store2 smartd[4624]: Device: /dev/sdf [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 157 to 162
Sep 30 01:28:50 store2 smartd[4624]: Device: /dev/sdi [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 30 01:58:47 store2 smartd[4624]: Device: /dev/sdf [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 162 to 157
Sep 30 01:59:00 store2 smartd[4624]: Device: /dev/sdp [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
Sep 30 02:28:45 store2 smartd[4624]: Device: /dev/sdd [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 30 02:28:46 store2 smartd[4624]: Device: /dev/sde [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 30 02:28:46 store2 smartd[4624]: Device: /dev/sde [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 157 to 162
Sep 30 02:28:48 store2 smartd[4624]: Device: /dev/sdf [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 30 02:28:52 store2 smartd[4624]: Device: /dev/sdi [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
Sep 30 02:58:45 store2 smartd[4624]: Device: /dev/sdd [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
Sep 30 02:58:46 store2 smartd[4624]: Device: /dev/sde [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
Sep 30 02:58:46 store2 smartd[4624]: Device: /dev/sde [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 162 to 157
Sep 30 02:58:47 store2 smartd[4624]: Device: /dev/sdf [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
Sep 30 02:58:49 store2 smartd[4624]: Device: /dev/sdh [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Sep 30 09:39:22 store2 kernel: imklog 5.6.5, log source = /proc/kmsg started.
What you are seeing are the Normalized Attribute values changing.
For example when the Raw_Read_Error_Rate changed from 99 to 100, the increase in Normalized value from 99 to 100 means that the disk now thinks it is a bit LESS likely to fail than before, because this Normalized value is moving further above the (low) Threshold value.
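To make these value changes easier to scan, the smartd lines can be reduced to device, attribute, old value, new value and direction. This is a sketch that assumes the smartd message layout seen in the logs on this page; the function name `smart_changes` and the log path in the usage comment are made up for this example.

```shell
#!/bin/sh
# Reduce smartd "changed from X to Y" log lines on stdin to
# "<device> <attribute> <old> <new> <up|down|same>".
# "up" means the normalized value rose, i.e. moved away from the
# (low) failure threshold.
smart_changes() {
    awk '
        / smartd\[[0-9]+\]: Device: / && / changed from / {
            dev = ""
            for (i = 1; i < NF; i++)
                if ($i == "Device:") dev = $(i + 1)
            name = $(NF - 5)          # attribute name precedes "changed from"
            old = $(NF - 2) + 0
            cur = $NF + 0
            if (cur > old) trend = "up"
            else if (cur < old) trend = "down"
            else trend = "same"
            print dev, name, old, cur, trend
        }'
}

# Usage (hypothetical log path):
#   smart_changes < /var/log/messages | grep Raw_Read_Error_Rate
```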
ArchLinux 2011-10-17
Oct 17 21:21:35 localhost smartd[1941]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 181 to 176
Oct 17 21:21:37 localhost smartd[1941]: Device: /dev/sde [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 181 to 176
Oct 17 21:21:45 localhost smartd[1941]: Device: /dev/sdm [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 200 to 206
Oct 17 21:30:03 localhost -- MARK --
Oct 17 21:50:03 localhost -- MARK --
Oct 17 21:51:37 localhost smartd[1941]: Device: /dev/sde [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 176 to 181
Oct 17 21:51:41 localhost smartd[1941]: Device: /dev/sdi [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 181 to 187
Oct 17 22:10:03 localhost -- MARK --
Oct 17 22:21:34 localhost smartd[1941]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 176 to 181
Oct 17 22:21:47 localhost smartd[1941]: Device: /dev/sdo [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 206 to 214
Oct 17 22:21:49 localhost smartd[1941]: Device: /dev/sdq [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 206 to 214
Oct 17 22:30:03 localhost -- MARK --
Oct 17 22:50:03 localhost -- MARK --
Oct 17 23:11:18 localhost kernel: [ 0.000000] Initializing cgroup subsys cpuset
Oct 17 23:11:18 localhost kernel: [ 0.000000] Initializing cgroup subsys cpu
ArchLinux 2011-11-06
Nov 6 12:39:05 localhost -- MARK --
Nov 6 12:42:18 localhost smartd[1927]: Device: /dev/sdi [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 193 to 200
Nov 6 12:42:20 localhost smartd[1927]: Device: /dev/sdj [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 193 to 187
Nov 6 12:42:24 localhost smartd[1927]: Device: /dev/sdn [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 222 to 214
Nov 6 12:42:25 localhost smartd[1927]: Device: /dev/sdo [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 230 to 222
Nov 6 12:42:26 localhost smartd[1927]: Device: /dev/sdp [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 214 to 222
Nov 6 12:59:05 localhost -- MARK --
Nov 6 14:29:21 localhost kernel: [ 0.000000] Initializing cgroup subsys cpuset
Nov 6 14:29:21 localhost kernel: [ 0.000000] Initializing cgroup subsys cpu
Nov 6 14:29:21 localhost kernel: [ 0.000000] Linux version 3.0-ARCH (tobias@T-POWA-LX) (gcc version 4.6.1 20110819 (prerelease) (GCC) ) #1 SMP PREEMPT Wed Oct$
Nov 6 14:29:21 localhost kernel: [ 0.000000] Command line: root=/dev/disk/by-id/ata-Hitachi_HCS5C1016CLA382_JC0150HT0J7TPC-part3 ro
Nov 6 14:29:21 localhost kernel: [ 0.000000] BIOS-provided physical RAM map:
JUST LIKE THAT, with no warning in the log!
ArchLinux 2011-11-21
A system update (including a new kernel) had run beforehand, but the machine had not yet been rebooted.
Nov 21 09:30:27 localhost smartd[2208]: Device: /dev/sdj [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 200 to 206
Nov 21 09:30:30 localhost smartd[2208]: Device: /dev/sdl [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 240 to 250
Nov 21 09:30:31 localhost smartd[2208]: Device: /dev/sdm [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 250 to 240
Nov 21 09:30:35 localhost smartd[2208]: Device: /dev/sdp [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 230 to 240
Nov 21 09:43:32 localhost kernel: [1280595.864622] ------------[ cut here ]------------
Nov 21 09:43:32 localhost kernel: [1280595.864636] WARNING: at drivers/gpu/drm/i915/i915_irq.c:649 ironlake_irq_handler+0x1102/0x1110 [i915]()
Nov 21 09:43:32 localhost kernel: [1280595.864638] Hardware name: H61M-S2V-B3
Nov 21 09:43:32 localhost kernel: [1280595.864639] Missed a PM interrupt
Nov 21 09:43:32 localhost kernel: [1280595.864640] Modules linked in: xfs sha256_generic dm_crypt dm_mod raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod coretemp nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 ext2 sr_mod cdrom snd_hda_codec_realtek usb_storage uas sg evdev snd_hda_intel snd_hda_codec iTCO_wdt snd_hwdep snd_pcm snd_timer i915 snd drm_kms_helper drm pcspkr i2c_algo_bit r8169 ppdev i2c_i801 shp chp parport_pc intel_agp i2c_core pci_hotplug parport intel_gtt mei(C) soundcore snd_page_alloc processor button mii iTCO_vendor_support video aesni_intel cryptd aes_x86_64 aes_generic ext4 mbcache jbd2 crc16 usbhid hid sd_mod sata_sil24 ahci libahci libata scsi_mod ehci_hcd usbcore
Nov 21 09:43:32 localhost kernel: [1280595.864674] Pid: 0, comm: swapper Tainted: G C 3.0-ARCH #1
Nov 21 09:43:32 localhost kernel: [1280595.864675] Call Trace:
Nov 21 09:43:32 localhost kernel: [1280595.864676] <IRQ> [<ffffffff8105c76f>] warn_slowpath_common+0x7f/0xc0
Nov 21 09:43:32 localhost kernel: [1280595.864684] [<ffffffff8105c866>] warn_slowpath_fmt+0x46/0x50
Nov 21 09:43:32 localhost kernel: [1280595.864688] [<ffffffff81078f7d>] ? queue_work+0x5d/0x70
Nov 21 09:43:32 localhost kernel: [1280595.864693] [<ffffffffa0235a22>] ironlake_irq_handler+0x1102/0x1110 [i915]
Nov 21 09:43:32 localhost kernel: [1280595.864696] [<ffffffff812a4bc5>] ? dma_issue_pending_all+0x95/0xa0
Nov 21 09:43:32 localhost kernel: [1280595.864699] [<ffffffff81333db1>] ? net_rx_action+0x131/0x300
Nov 21 09:43:32 localhost kernel: [1280595.864702] [<ffffffff810bf835>] handle_irq_event_percpu+0x75/0x2a0
Nov 21 09:43:32 localhost kernel: [1280595.864705] [<ffffffff810bfaa5>] handle_irq_event+0x45/0x70
Nov 21 09:43:32 localhost kernel: [1280595.864707] [<ffffffff810c21af>] handle_edge_irq+0x6f/0x120
Nov 21 09:43:32 localhost kernel: [1280595.864710] [<ffffffff8100d9f2>] handle_irq+0x22/0x40
Nov 21 09:43:32 localhost kernel: [1280595.864712] [<ffffffff813f66aa>] do_IRQ+0x5a/0xe0
Nov 21 09:43:32 localhost kernel: [1280595.864715] [<ffffffff813f4393>] common_interrupt+0x13/0x13
Nov 21 09:43:32 localhost kernel: [1280595.864716] <EOI> [<ffffffff81273cdb>] ? intel_idle+0xcb/0x120
Nov 21 09:43:32 localhost kernel: [1280595.864720] [<ffffffff81273cbd>] ? intel_idle+0xad/0x120
Nov 21 09:43:32 localhost kernel: [1280595.864723] [<ffffffff81313d9d>] cpuidle_idle_call+0x9d/0x350
Nov 21 09:43:32 localhost kernel: [1280595.864726] [<ffffffff8100a21a>] cpu_idle+0xba/0x100
Nov 21 09:43:32 localhost kernel: [1280595.864729] [<ffffffff813d1eb2>] rest_init+0x96/0xa4
Nov 21 09:43:32 localhost kernel: [1280595.864731] [<ffffffff81748c23>] start_kernel+0x3de/0x3eb
Nov 21 09:43:32 localhost kernel: [1280595.864733] [<ffffffff81748347>] x86_64_start_reservations+0x132/0x136
Nov 21 09:43:32 localhost kernel: [1280595.864735] [<ffffffff81748140>] ? early_idt_handlers+0x140/0x140
Nov 21 09:43:32 localhost kernel: [1280595.864737] [<ffffffff8174844d>] x86_64_start_kernel+0x102/0x111
Nov 21 09:43:32 localhost kernel: [1280595.864738] ---[ end trace 01037f4ec3ec4ee5 ]---
Nov 21 10:00:16 localhost smartd[2208]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 206 to 200
Nov 21 10:00:18 localhost smartd[2208]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 200 to 193
Nov 21 10:00:19 localhost smartd[2208]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 206 to 200
Nov 21 10:00:23 localhost smartd[2208]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 206 to 200
Nov 21 10:00:29 localhost smartd[2208]: Device: /dev/sdl [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 250 to 240
Nov 21 10:00:30 localhost smartd[2208]: Device: /dev/sdm [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 240 to 250
Nov 21 10:00:33 localhost smartd[2208]: Device: /dev/sdp [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 240 to 230
Nov 21 10:00:34 localhost smartd[2208]: Device: /dev/sdq [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 240 to 250
Nov 21 11:52:01 localhost kernel: [ 0.000000] Initializing cgroup subsys cpuset
Nov 21 11:52:01 localhost kernel: [ 0.000000] Initializing cgroup subsys cpu