Purpose of this document:
Starting from an M5000 with two domains assigned and Solaris installed, we need to build a single domain that contains all the hardware components available.
Hardware used:
1 M5000 with 64 GB of RAM and 4 quad-core CPUs (2 strands per core, so 8 hardware threads per chip)
Description of the M5000 components
XSB, eXtended System Board: can be configured in two modes, Uni-XSB and Quad-XSB. To be used in a domain, each XSB must have an associated LSB.
PSB, Physical System Board: each PSB is made up of CPU, memory and I/O.
CPUM, CPU Module
MEMB, Memory Board
LSB, Logical System Board
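The relationship between a PSB and its XSBs can be sketched in a few lines. This is only an illustration, assuming that a PSB in Uni-XSB mode exposes a single XSB (xx-0) and in Quad-XSB mode exposes four (xx-0 to xx-3); the function name xsb_names is hypothetical and is not an XSCF command. It helps explain why, further down, 00-1 and 01-1 are reported as "not installed" while the boards are in Uni mode.

```python
from typing import List


def xsb_names(psb: int, mode: str) -> List[str]:
    """Return the XSB identifiers that one PSB exposes in the given mode.

    Assumption: "uni" yields one XSB (xx-0), "quad" yields four (xx-0..xx-3).
    """
    if mode == "uni":
        count = 1
    elif mode == "quad":
        count = 4
    else:
        raise ValueError(f"unknown XSB mode: {mode!r}")
    # XSB names are the two-digit PSB number, a dash, and the XSB index.
    return [f"{psb:02d}-{index}" for index in range(count)]


if __name__ == "__main__":
    print(xsb_names(0, "uni"))
    print(xsb_names(1, "quad"))
```

With both boards in Uni mode, as in this document, only 00-0 and 01-0 exist.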
The following shows the hardware configuration before the final setup.
We can see that there are two domains configured, each running an operating system.
XSCF> showboards -a
XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- -------- ----------- ---- ---- ---- ------- --------
00-0 00(00)   Assigned    y    y    y    Passed  Normal
01-0 01(00)   Assigned    y    y    y    Passed  Normal
XSCF> showdomainstatus -a
DID Domain Status
00 Running
01 Running
02 -
03 -
XSCF> showdomainstatus -d0
DID Domain Status
00 Running
XSCF> showdscp
DSCP Configuration:
Network: 192.168.224.0
Netmask: 255.255.255.0
Location Address
---------- ---------
XSCF 192.168.224.1
Domain #00 192.168.224.2
Domain #01 192.168.224.3
Domain #02 192.168.224.4
Domain #03 192.168.224.5
XSCF>
XSCF> showfru -a sb
Device Location XSB Mode Memory Mirror Mode
sb 00 Uni no
sb 01 Uni no
XSCF> showhardconf
SPARC Enterprise M5000;
+ Serial:BDF1245599; Operator_Panel_Switch:Locked;
+ Power_Supply_System:Single; SCF-ID:XSCF#0;
+ System_Power:On; System_Phase:Cabinet Power On;
Domain#0 Domain_Status:Running;
Domain#1 Domain_Status:Running;
MBU_B Status:Normal; Ver:4401h; Serial:BD124500AG ;
+ FRU-Part-Number:CF00541-4360 01 /541-4360-01 ;
+ Memory_Size:64 GB;
+ Type:2;
CPUM#0-CHIP#0 Status:Normal; Ver:0601h; Serial:PP124200D2 ;
+ FRU-Part-Number:CA06761-D205 C3 /371-4932-03 ;
+ Freq:2.660 GHz; Type:48;
+ Core:4; Strand:2;
CPUM#0-CHIP#1 Status:Normal; Ver:0601h; Serial:PP124200D2 ;
+ FRU-Part-Number:CA06761-D205 C3 /371-4932-03 ;
+ Freq:2.660 GHz; Type:48;
+ Core:4; Strand:2;
CPUM#2-CHIP#0 Status:Normal; Ver:0601h; Serial:PP124101TJ ;
+ FRU-Part-Number:CA06761-D205 C3 /371-4932-03 ;
+ Freq:2.660 GHz; Type:48;
+ Core:4; Strand:2;
CPUM#2-CHIP#1 Status:Normal; Ver:0601h; Serial:PP124101TJ ;
+ FRU-Part-Number:CA06761-D205 C3 /371-4932-03 ;
+ Freq:2.660 GHz; Type:48;
+ Core:4; Strand:2;
MEMB#0 Status:Normal; Ver:0101h; Serial:NN1242F7UL ;
+ FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;
MEM#0A Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2343f93f;
+ Type:4B; Size:4 GB;
MEM#0B Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2343f935;
+ Type:4B; Size:4 GB;
MEM#1A Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2343f93e;
+ Type:4B; Size:4 GB;
MEM#1B Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2343f940;
+ Type:4B; Size:4 GB;
MEM#2A Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2343f999;
+ Type:4B; Size:4 GB;
MEM#2B Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2343f92a;
+ Type:4B; Size:4 GB;
MEM#3A Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2343f937;
+ Type:4B; Size:4 GB;
MEM#3B Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2343f93a;
+ Type:4B; Size:4 GB;
MEMB#4 Status:Normal; Ver:0101h; Serial:NN1242F7V2 ;
+ FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;
MEM#0A Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2344045b;
+ Type:4B; Size:4 GB;
MEM#0B Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-234403d6;
+ Type:4B; Size:4 GB;
MEM#1A Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-234403e1;
+ Type:4B; Size:4 GB;
MEM#1B Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2343035a;
+ Type:4B; Size:4 GB;
* MEM#2A Status:Degraded;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2344047f;
+ Type:4B; Size:4 GB;
MEM#2B Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2344045d;
+ Type:4B; Size:4 GB;
MEM#3A Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-23440340;
+ Type:4B; Size:4 GB;
MEM#3B Status:Normal;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-23440459;
+ Type:4B; Size:4 GB;
DDC_A#0 Status:Normal;
DDC_A#1 Status:Normal;
DDC_A#2 Status:Normal;
DDC_A#3 Status:Normal;
DDC_B#0 Status:Normal;
DDC_B#1 Status:Normal;
IOU#0 Status:Normal; Ver:0101h; Serial:NN1235ETAK ;
+ FRU-Part-Number:CF00541-2240 05 /541-2240-05 ;
+ Type:1;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
PCI#1 Name_Property:SUNW,qlc; Card_Type:Other;
PCI#2 Name_Property:network; Card_Type:Other;
PCI#3 Name_Property:SUNW,qlc; Card_Type:Other;
PCI#4 Name_Property:SUNW,qlc; Card_Type:Other;
IOU#1 Status:Normal; Ver:0101h; Serial:NN1234EGTL ;
+ FRU-Part-Number:CF00541-2240 05 /541-2240-05 ;
+ Type:1;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
PCI#1 Name_Property:SUNW,qlc; Card_Type:Other;
PCI#2 Name_Property:network; Card_Type:Other;
PCI#3 Name_Property:SUNW,qlc; Card_Type:Other;
PCI#4 Name_Property:SUNW,qlc; Card_Type:Other;
XSCFU Status:Normal,Active; Ver:0101h; Serial:NN1239F0NH ;
+ FRU-Part-Number:CF00541-0481 05 /541-0481-05 ;
OPNL Status:Normal; Ver:0101h; Serial:NN1235EMT1 ;
+ FRU-Part-Number:CF00541-0850 06 /541-0850-06 ;
PSU#0 Status:Normal; Serial:476856F+1142AD0055;
+ FRU-Part-Number:CF00300-2311 0101 /300-2311-01-01;
+ Power_Status:On; AC:200 V;
PSU#1 Status:Normal; Serial:476856F+1153AD00M6;
+ FRU-Part-Number:CF00300-2311 0101 /300-2311-01-01;
+ Power_Status:On; AC:200 V;
PSU#2 Status:Normal; Serial:1357FYG-1047AD003J;
+ FRU-Part-Number:CF00300-2311 0101 /300-2311-01-01;
+ Power_Status:On; AC:200 V;
PSU#3 Status:Normal; Serial:476856F+1141AD0019;
+ FRU-Part-Number:CF00300-2311 0101 /300-2311-01-01;
+ Power_Status:On; AC:200 V;
FANBP_C Status:Normal; Ver:0501h; Serial:NN1235ER95;
+ FRU-Part-Number:CF00541-3099 01 /541-3099-01 ;
FAN_A#0 Status:Normal;
FAN_A#1 Status:Normal;
FAN_A#2 Status:Normal;
FAN_A#3 Status:Normal;
XSCF>
Now we begin the reconfiguration tasks.
XSCF>
XSCF> showfru sb 00
Device Location XSB Mode Memory Mirror Mode
sb 00 Uni no
XSCF> showdcl -a
DID LSB XSB Status
00 Running
00 00-0
---------------------------
01 Running
00 01-0
XSCF> showboards -a -v
XSB R DID(LSB) Assignment Pwr Conn Conf Test Fault COD
---- - -------- ----------- ---- ---- ---- ------- -------- ----
00-0 00(00) Assigned y y y Passed Normal n
01-0 01(00) Assigned y y y Passed Normal n
XSCF> showfru sb 0
Device Location XSB Mode Memory Mirror Mode
sb 00 Uni no
XSCF> showfru sb 1
Device Location XSB Mode Memory Mirror Mode
sb 01 Uni no
Before unassigning an XSB, the domains must be powered off; otherwise the following error occurs:
XSCF> deleteboard -c unassign 00-0
XSB#00-0 will be unassigned from domain immediately. Continue?[y|n] :y
XSB#00-0 is the last LSB for DomainID 0, and this domain is still running. Operation failed.
Now we power off the domains.
XSCF> poweroff -d 0
DomainIDs to power off:00
Continue? [y|n] :y
00 :Powering off
*Note*
This command only issues the instruction to power-off.
The result of the instruction can be checked by the "showlogs power".
XSCF> showdcl -a
DID LSB XSB Status
00 Running (Waiting for OS Shutdown)
00 00-0
---------------------------
01 Running
00 01-0
XSCF> poweroff -d 1
DomainIDs to power off:01
Continue? [y|n] :y
01 :Powering off
*Note*
This command only issues the instruction to power-off.
The result of the instruction can be checked by the "showlogs power".
XSCF> showdcl -a
DID LSB XSB Status
00 Running (Waiting for OS Shutdown)
00 00-0
---------------------------
01 Running (Waiting for OS Shutdown)
00 01-0
XSCF> showdcl -a
DID LSB XSB Status
00 Shutdown Started
00 00-0
---------------------------
01 Running (Waiting for OS Shutdown)
00 01-0
XSCF> showdcl -a
DID LSB XSB Status
00 Powered Off
00 00-0
---------------------------
01 Running (Waiting for OS Shutdown)
00 01-0
XSCF> showdcl -a
DID LSB XSB Status
00 Powered Off
00 00-0
---------------------------
01 Shutdown Started
00 01-0
XSCF> showdcl -a
DID LSB XSB Status
00 Powered Off
00 00-0
---------------------------
01 Powered Off
00 01-0
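Re-running showdcl -a by hand until both domains report "Powered Off" works, but the check can also be done on captured output. The following is a minimal sketch, assuming the plain (non -v) showdcl -a layout exactly as pasted above; the names domain_states and all_powered_off are hypothetical helpers, not part of XSCF.

```python
import re

# An LSB-to-XSB mapping row such as "00 00-0".
LSB_LINE = re.compile(r"^\s*\d{2}\s+\d{2}-\d\s*$")
# A domain row such as "01 Running (Waiting for OS Shutdown)".
DOMAIN_LINE = re.compile(r"^\s*(\d{2})\s+(\S.*?)\s*$")


def domain_states(showdcl_text):
    """Map each domain ID to its status string from `showdcl -a` output."""
    states = {}
    for line in showdcl_text.splitlines():
        # Skip the header, dashed separators and blank lines.
        if line.startswith("DID") or set(line.strip()) <= {"-"}:
            continue
        if LSB_LINE.match(line):
            continue
        match = DOMAIN_LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2)
    return states


def all_powered_off(showdcl_text):
    """True only if at least one domain is listed and all are Powered Off."""
    states = domain_states(showdcl_text)
    return bool(states) and all(s == "Powered Off" for s in states.values())
```

Only when all_powered_off returns True for the latest capture is it safe to continue with deleteboard.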
Now that both domains are powered off, we proceed to unassign boards 0 and 1.
XSCF> deleteboard -c unassign 00-0
XSB#00-0 will be unassigned from domain immediately. Continue?[y|n] :y
XSCF> deleteboard -c unassign 00-0
XSCF> setdcl -d 0 -r 00
XSCF> deleteboard -c unassign 01-0
XSB#01-0 will be unassigned from domain immediately. Continue?[y|n] :y
XSCF>
XSCF> setdcl -d 0 -r 01
XSCF> showboards -av
XSB R DID(LSB) Assignment Pwr Conn Conf Test Fault COD
---- - -------- ----------- ---- ---- ---- ------- -------- ----
00-0 SP Available n n n Passed Normal n
01-0 SP Available n n n Passed Normal n
XSCF> showdomainstatus -a
DID Domain Status
00 -
01 -
02 -
03 -
XSCF> setupfru -x 1 sb 0
XSCF> showfru -a sb
Device Location XSB Mode Memory Mirror Mode
sb 00 Uni no
sb 01 Uni no
XSCF> showfru sb 0
Device Location XSB Mode Memory Mirror Mode
sb 00 Uni no
XSCF> showfru sb 1
Device Location XSB Mode Memory Mirror Mode
sb 01 Uni no
XSCF> setupfru -x 1 sb 1
XSCF> showfru sb 1
Device Location XSB Mode Memory Mirror Mode
sb 01 Uni no
XSCF> setdcl -d 0 -a 0=00-0
XSCF> setdcl -d 0 -a 1=00-1
XSCF> setdcl -d 0 -a 0=01-0
LSB#00 is already registered in DCL.
XSCF> setdcl -d 0 -a 2=01-0
XSCF> setdcl -d 0 -a 3=01-1
XSCF> addboard -c assign -d 0 00-0
XSB#00-0 will be assigned to DomainID 0. Continue?[y|n] :y
XSCF> addboard -c assign -d 0 00-1
XSB#00-1 will be assigned to DomainID 0. Continue?[y|n] :y
XSB#00-1 is not installed.
XSCF> addboard -c assign -d 0 01-0
XSB#01-0 will be assigned to DomainID 0. Continue?[y|n] :y
XSCF> addboard -c assign -d 0 01-1
XSB#01-1 will be assigned to DomainID 0. Continue?[y|n] :y
XSB#01-1 is not installed.
XSCF> showdcl -v -d 0
DID LSB XSB Status No-Mem No-IO Float Cfg-policy
00 Powered Off FRU
00 00-0 False False False
01 00-1 False False False
02 01-0 False False False
03 01-1 False False False
04 -
05 -
06 -
07 -
08 -
09 -
10 -
11 -
12 -
13 -
14 -
15 -
XSCF> showboards -v -a
XSB R DID(LSB) Assignment Pwr Conn Conf Test Fault COD
---- - -------- ----------- ---- ---- ---- ------- -------- ----
00-0 * 00(00) Assigned n n n Unknown Normal n
01-0 * 00(02) Assigned n n n Unknown Normal n
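At this point both XSBs should belong to domain 0. A small check over the captured showboards output can confirm it. This is a sketch only, assuming the row layout shown above (XSB, an optional "*" in the R column, then either DID(LSB) or "SP" for boards that are not assigned, then the Assignment column); parse_showboards and all_in_domain are hypothetical names.

```python
import re

# One data row of `showboards -a -v` as pasted in this document.
ROW = re.compile(
    r"^\s*(?P<xsb>\d{2}-\d)\s+(?:\*\s+)?"
    r"(?:(?P<did>\d{2})\((?P<lsb>\d{2})\)|SP)\s+(?P<assignment>\S+)"
)


def parse_showboards(text):
    """Map each XSB to a (DID, LSB, Assignment) tuple; DID/LSB are None for SP."""
    boards = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            boards[match.group("xsb")] = (
                match.group("did"),
                match.group("lsb"),
                match.group("assignment"),
            )
    return boards


def all_in_domain(text, did):
    """True if every listed XSB is Assigned to the given domain ID."""
    boards = parse_showboards(text)
    return bool(boards) and all(
        board_did == did and state == "Assigned"
        for board_did, _, state in boards.values()
    )
```

For the output above, all_in_domain(output, "00") is expected to hold, with 00-0 on LSB 00 and 01-0 on LSB 02.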
XSCF> showdcl -v -a
DID LSB XSB Status No-Mem No-IO Float Cfg-policy
00 Powered Off FRU
00 00-0 False False False
01 00-1 False False False
02 01-0 False False False
03 01-1 False False False
04 -
05 -
06 -
07 -
08 -
09 -
10 -
11 -
12 -
13 -
14 -
15 -
---------------------------------------------------------------
01 Powered Off FRU
00 01-0 False False False
01 -
02 -
03 -
04 -
05 -
06 -
07 -
08 -
09 -
10 -
11 -
12 -
13 -
14 -
15 -
XSCF> showdcl -a
DID LSB XSB Status
00 Powered Off
00 00-0
01 00-1
02 01-0
03 01-1
---------------------------
01 Powered Off
00 01-0
Domain 1 still has XSB 01-0 registered on its LSB 00, because the earlier setdcl -d 0 -r 01 acted on domain 0. We remove that leftover entry:
XSCF> setdcl -d 1 -r 00
XSCF> showdcl -a
DID LSB XSB Status
00 Powered Off
00 00-0
01 00-1
02 01-0
03 01-1
XSCF> addboard -c assign -d 0 00-1
XSB#00-1 will be assigned to DomainID 0. Continue?[y|n] :y
XSB#00-1 is not installed.
XSCF> addboard -c assign -d 0 01-1
XSB#01-1 will be assigned to DomainID 0. Continue?[y|n] :y
XSB#01-1 is not installed.
XSCF> poweron -d 0
DomainIDs to power on:00
Continue? [y|n] :y
00 :Powering on
*Note*
This command only issues the instruction to power-on.
The result of the instruction can be checked by the "showlogs power".
XSCF> console -d 0
Console contents may be logged.
Connect to DomainID 0?[y|n] :y
POST Sequence 01 CPU Check
LSB#02 (XSB#01-0): POST 2.17.0 (2011/11/17 10:29)
POST Sequence 02 Banner
LSB#00 (XSB#00-0): POST 2.17.0 (2011/11/17 10:29)
POST Sequence 03 Fatal Check
POST Sequence 04 CPU Register
POST Sequence 05 STICK
POST Sequence 06 MMU
POST Sequence 07 Memory Initialize
POST Sequence 08 Memory
POST Sequence 09 Raw UE In Cache
POST Sequence 0A Floating Point Unit
POST Sequence 0B SC
POST Sequence 0C Cacheable Instruction
POST Sequence 0D Softint
POST Sequence 0E CPU Cross Call
POST Sequence 0F CMU-CH
POST Sequence 10 PCI-CH
POST Sequence 11 Master Device
POST Sequence 12 DSCP
POST Sequence 13 SC Check Before STICK Diag
POST Sequence 14 STICK Stop
POST Sequence 15 STICK Start
POST Sequence 16 Error CPU Check
POST Sequence 17 System Configuration
POST Sequence 18 System Status Check
POST Sequence 19 System Status Check After Sync
POST Sequence 1A OpenBoot Start...
POST Sequence Complete.
SPARC Enterprise M5000 Server, using Domain console
Copyright (c) 1998, 2012, Oracle and/or its affiliates. All rights reserved.
Copyright (c) 2012, Oracle and/or its affiliates and Fujitsu Limited. All rights reserved.
OpenBoot 4.33.5.d, 65536 MB memory installed, Serial #102844532.
Ethernet address 0:10:e0:21:48:74, Host ID: 86214874.
Aborting auto-boot sequence.
{0} ok
root@m5kd0 # prtdiag -v
System Configuration: Oracle Corporation sun4u SPARC Enterprise M5000 Server
System clock frequency: 1012 MHz
Memory size: 65536 Megabytes
==================================== CPUs ====================================
CPU CPU Run L2$ CPU CPU
LSB Chip ID MHz MB Impl. Mask
--- ---- ---------------------------------------- ---- --- ----- ----
00 0 0, 1, 2, 3, 4, 5, 6, 7 2660 11.0 7 193
00 1 8, 9, 10, 11, 12, 13, 14, 15 2660 11.0 7 193
02 0 64, 65, 66, 67, 68, 69, 70, 71 2660 11.0 7 193
02 1 72, 73, 74, 75, 76, 77, 78, 79 2660 11.0 7 193
============================ Memory Configuration ============================
Memory Available Memory DIMM # of Mirror Interleave
LSB Group Size Status Size DIMMs Mode Factor
--- ------ ------------------ ------- ------ ----- ------- ----------
00 A 16384MB okay 4096MB 4 no 2-way
00 B 16384MB okay 4096MB 4 no 2-way
02 A 16384MB okay 4096MB 4 no 2-way
02 B 16384MB okay 4096MB 4 no 2-way
picl_initialize failed: Daemon not responding
==================== Hardware Revisions ====================
System PROM revisions:
----------------------
OBP 4.33.5.d 2012/07/18 06:55
=================== Environmental Status ===================
Mode switch is in LOCK mode
picl_initialize failed: Daemon not responding
In the first showhardconf output we can see that one memory DIMM has status Degraded and is marked with an asterisk (*):
* MEM#2A Status:Degraded;
+ Code:ce0000000000000001M3 93T5160FBA-CE6 4146-2344047f;
+ Type:4B; Size:4 GB;
This means the memory DIMM is faulty.
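In a long showhardconf listing the asterisk is easy to miss. The sketch below scans captured output for flagged DIMMs and reports which memory board they sit on. It assumes the line shapes seen in this document ("MEMB#n Status:..." for boards and "* MEM#nX Status:...;" for flagged DIMMs); flagged_dimms is a hypothetical helper name.

```python
import re

# A memory board line, optionally flagged with "*".
BOARD = re.compile(r"^\s*\*?\s*(MEMB#\d+)\s+Status:")
# A DIMM line that carries the "*" fault marker.
FLAGGED_DIMM = re.compile(r"^\s*\*\s*(MEM#\w+)\s+Status:([^;]+);")


def flagged_dimms(showhardconf_text):
    """Return (board, dimm, status) for every DIMM line marked with '*'."""
    current_board = None
    found = []
    for line in showhardconf_text.splitlines():
        board = BOARD.match(line)
        if board:
            # Remember the most recent memory board seen above the DIMMs.
            current_board = board.group(1)
            continue
        dimm = FLAGGED_DIMM.match(line)
        if dimm:
            found.append((current_board, dimm.group(1), dimm.group(2).strip()))
    return found
```

Run against the listing in this document, it should single out MEM#2A on MEMB#4.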
From the XSCF we ran the command snapshot -L F -t user@milinux:/tmp
This generated a .zip file on my Linux machine; once unpacked, its contents are similar to the output of Solaris Explorer.
Inside the file @scf@log@monitor.log the location of the faulty DIMM is clearly visible:
Feb 15 06:26:37 m5k Warning: /MBU_B/MEMB#4/MEM#2A:DOMAIN:DIMM permanent correctable error
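To pull the FRU path out of monitor.log without reading it by eye, a regular expression is enough. This is a rough sketch: the line format is inferred only from the single entry shown above, so other entries in the file may not match, and parse_monitor_line is a hypothetical name.

```python
import re

# Timestamp, host, severity, then a FRU path that ends at the first colon.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<severity>\w+):\s+"
    r"(?P<fru>/[^:\s]+):(?P<message>.*)$"
)


def parse_monitor_line(line):
    """Return the fields of one monitor.log entry, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    entry = parse_monitor_line(
        "Feb 15 06:26:37 m5k Warning: /MBU_B/MEMB#4/MEM#2A:DOMAIN:"
        "DIMM permanent correctable error"
    )
    # The last two path components are the memory board and the DIMM slot.
    print(entry["fru"].split("/")[-2:])
```

The path components line up with the showhardconf hierarchy: motherboard unit, memory board, DIMM slot.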
Running this command requires a network connection. What we did was connect a crossover cable between the server and my Linux machine, which was acting as the terminal (with minicom), configure one IP address on the server and another on the Linux machine, and send the snapshot output to the Linux machine.