Welcome

All the steps described here were tested in production environments.

Monday, June 13, 2011

I/O Error on External Storage

I got a call saying that some filesystems were returning I/O errors. This is the error that showed up in /var/adm/messages:
Aug 17 16:29:17 coneja fctl: [ID 517869 kern.warning] WARNING: 942=>fp(3)::GPN_ID for D_ID=64a200 failed
Aug 17 16:29:17 coneja fctl: [ID 517869 kern.warning] WARNING: 943=>fp(3)::N_x Port with D_ID=64a200, PWWN=5005076801403680 disappeared from fabric
Aug 17 16:29:17 coneja fctl: [ID 517869 kern.warning] WARNING: 952=>fp(3)::GPN_ID for D_ID=64e200 failed
Aug 17 16:29:17 coneja fctl: [ID 517869 kern.warning] WARNING: 953=>fp(3)::N_x Port with D_ID=64e200, PWWN=5005076801303680 disappeared from fabric
Aug 17 16:29:18 coneja fctl: [ID 517869 kern.warning] WARNING: 964=>fp(1)::GPN_ID for D_ID=64a900 failed
Aug 17 16:29:18 coneja fctl: [ID 517869 kern.warning] WARNING: 965=>fp(1)::N_x Port with D_ID=64a900, PWWN=5005076801103680 disappeared from fabric
Aug 17 16:29:18 coneja fctl: [ID 517869 kern.warning] WARNING: 974=>fp(1)::GPN_ID for D_ID=64e900 failed
Aug 17 16:29:18 coneja fctl: [ID 517869 kern.warning] WARNING: 975=>fp(1)::N_x Port with D_ID=64e900, PWWN=5005076801203680 disappeared from fabric
Aug 17 17:03:28 coneja mpxio: [ID 669396 kern.info] /scsi_vhci/ssd@g60050768019901b4000000000000002a (ssd107) multipath status: optimal, path /pci@8,600000/SUNW,qlc@2/fp@0,0 (fp3) to target address: 5005076801403680,0 is offline. Load balancing: round-robin
Aug 17 17:03:28 coneja scsi: [ID 243001 kern.info]    Target 0x64e200: Device type=0x1f Nonzero pqual=0x1


The external storage had problems and the filesystems were left returning I/O errors.
I check with metastat | grep -i Errored.
The filesystems end up broken. What you have to do is unmount the filesystem and mount it again; if it doesn't complain, I leave it at that.
If it throws an I/O error, I run fsck on it and then mount it.
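The `metastat | grep -i Errored` check only shows the State lines, not which metadevice they belong to. A small sketch that remembers the last `dNN:` header can name them. It assumes the metastat layout shown further down (a `dNN: ...` line followed by an indented `State:` line) and a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris); `errored_md` is a hypothetical name.

```shell
# errored_md (hypothetical helper): read full `metastat` output on stdin and
# print each top-level metadevice whose block contains "State: Errored".
errored_md() {
    awk '
        # Top-level headers look like "d41: Soft Partition"; keep the name
        # without the trailing colon.
        /^d[0-9]+:/ { dev = substr($1, 1, length($1) - 1) }
        # A State line reporting Errored flags the current device.
        /State: Errored/ { print dev }
    ' | sort -u
}
```

Usage: `metastat | errored_md` gives the list of metadevices whose filesystems need the umount/mount treatment.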

[coneja] /var/adm # umount /u18
[coneja] /var/adm # mount /u18
mount: I/O error
mount: cannot mount /dev/md/dsk/d58

[coneja] /var/adm # fsck /dev/md/dsk/d58
** /dev/md/rdsk/d58
** Last Mounted on /u18
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups

CORRECT BAD CG SUMMARIES FOR CG 0? y

CORRECTED SUPERBLOCK SUMMARIES FOR CG 0
CORRECTED SUMMARIES FOR CG 0
FRAG BITMAP WRONG
FIX? y

CORRECT GLOBAL SUMMARY
SALVAGE? y

Log was discarded, updating cyl groups
10 files, 12582971 used, 5358831 free (7 frags, 669853 blocks, 0.0% fragmentation)

***** FILE SYSTEM WAS MODIFIED *****

[coneja] /var/adm #
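The umount, mount, fsck-if-needed sequence from the session above can be wrapped in a function. This is a sketch, not a tested procedure: `remount_fs` is a hypothetical name, and it assumes the mount point has an /etc/vfstab entry so that `mount /u18` works on its own. It refuses to run fsck when the unmount fails, because running fsck against a mounted filesystem can damage it.

```shell
# remount_fs MOUNTPOINT BLOCKDEVICE (hypothetical helper), e.g.
#   remount_fs /u18 /dev/md/dsk/d58
# Assumes MOUNTPOINT is listed in /etc/vfstab so "mount MOUNTPOINT" works.
remount_fs() {
    mnt=$1
    dev=$2
    # Stop if the unmount fails (for example "busy"): fsck must never be
    # run against a mounted filesystem.
    umount "$mnt" || { echo "$mnt: umount failed, not running fsck" >&2; return 1; }
    # Plain remount first; undamaged filesystems come back here.
    if mount "$mnt"; then
        echo "$mnt: remounted"
        return 0
    fi
    # Mount failed (I/O error): repair with fsck, answering yes to every
    # prompt as was done by hand above, then try once more. fsck's exit
    # status is not checked; the second mount decides success.
    echo "$mnt: mount failed, running fsck on $dev"
    fsck -y "$dev"
    if mount "$mnt"; then
        echo "$mnt: remounted after fsck"
        return 0
    fi
    echo "$mnt: still not mountable after fsck" >&2
    return 1
}
```

Answering yes to everything is what was done interactively here; on a filesystem with real damage it may be safer to run fsck by hand and read each prompt.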

Some filesystems did NOT throw an I/O error but did show Errored in the metastat status. Those didn't need fsck, just umount and mount (umount -f when the plain umount said busy):
[coneja] /var/adm # umount /u01
umount: /u01 busy
[coneja] /var/adm # metastat d41
d41: Soft Partition
    Device: d40
    State: Errored
    Size: 41943040 blocks (20 GB)
        Extent              Start Block              Block count
             0                     2080                 41943040

[coneja] /var/adm # umount -f /u01
[coneja] /var/adm # mount  /u01
[coneja] /var/adm # metastat d41
d41: Soft Partition
    Device: d40
    State: Okay
    Size: 41943040 blocks (20 GB)
        Extent              Start Block              Block count
             0                     2080                 41943040

Device Relocation Information:
Device                                  Reloc   Device ID
c6t60050768019901B4000000000000002Ad0   Yes     id1,ssd@w60050768019901b4000000000000002a
c6t60050768019901B4000000000000002Bd0   Yes     id1,ssd@w60050768019901b4000000000000002b
c6t60050768019901B4000000000000002Cd0   Yes     id1,ssd@w60050768019901b4000000000000002c
[coneja] /var/adm #
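For filesystems that were only marked Errored, the steps above (umount, `umount -f` when it says busy, mount, then re-check metastat) can be sketched as follows. `force_remount` and `md_state` are hypothetical names; the sketch assumes a POSIX shell such as ksh (it uses `$(...)`) and that the first `State:` line printed by `metastat dNN` belongs to that device, as in the output above. A forced unmount leaves processes that still have files open on the filesystem getting errors, so it is worth checking first with `fuser -c /u01`.

```shell
# md_state (hypothetical helper): print the first "State:" value found in
# `metastat dNN` output read on stdin, e.g. "Okay" or "Errored".
md_state() {
    awk '$1 == "State:" { print $2; exit }'
}

# force_remount MOUNTPOINT METADEVICE (hypothetical helper), e.g.
#   force_remount /u01 d41
# Unmounts (forcing it if the plain umount fails, e.g. "busy"), mounts again
# from /etc/vfstab and reports the metadevice state; returns 0 only if the
# state is back to Okay.
force_remount() {
    mnt=$1
    md=$2
    umount "$mnt" 2>/dev/null || umount -f "$mnt" || return 1
    mount "$mnt" || return 1
    state=$(metastat "$md" | md_state)
    echo "$md: $state"
    [ "$state" = "Okay" ]
}
```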

From what I saw on SunSolve, there is a newer SAN patch, 113039-25 (requires a REBOOT), which might fix this.
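Before scheduling the reboot, one way to see which revision of 113039 is already installed is to parse `showrev -p`. A minimal sketch, assuming the usual `Patch: NNNNNN-RR Obsoletes: ...` line format and a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris); `patch_rev` is a hypothetical name.

```shell
# patch_rev PATCHID (hypothetical helper): read `showrev -p` output on stdin
# and print the highest installed revision of PATCHID (nothing if absent).
patch_rev() {
    awk -v id="$1" '
        $1 == "Patch:" {
            # $2 looks like "113039-25": split into base id and revision.
            split($2, p, "-")
            if (p[1] == id && (!found || p[2] + 0 > max + 0)) {
                max = p[2]
                found = 1
            }
        }
        END { if (found) print max }
    '
}
```

Usage: `showrev -p | patch_rev 113039`. A result lower than 25, or no output at all, would mean 113039-25 is not installed yet.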
