AMMRL: Varian Unity console problem

From: Letitia Yao <letitia_at_nmr.chem.umn.edu>
Date: Thu, 9 Jan 2014 15:54:26 -0600 (CST)

We have an old Unity console, connected to a Sun Ultra 10 running Solaris 8/Vnmr
6.1C.  It has always been cranky when starting up after shutting off power--we
sometimes get it back by switching the cables to another diff box, trying there
and failing, then putting them back on the original one and starting up again. 
Nonsense, of course, but it has worked several times in the past.  Yesterday
after shutting down to replace a fan, nothing will bring it back.  The typical
Unix messages are:

14:45:50  got SCSI bus reset
14:45:50 genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
14:45:50 genunix: [ID 611667 kern.info] NOTICE: glm0: got SCSI bus reset
14:46:21 scsi: [ID 365881 kern.info] /pci@1f,0/pci@1/scsi@3 (glm0):
14:46:21  Cmd (0x70793dc0) dump for Target 3 Lun 0:
14:46:21 scsi: [ID 365881 kern.info] /pci@1f,0/pci@1/scsi@3 (glm0):
14:46:21          cdb=[ 0xa 0x1 0x0 0x0 0x1 0x0 ]
14:46:21 scsi: [ID 365881 kern.info] /pci@1f,0/pci@1/scsi@3 (glm0):
14:46:21  pkt_flags=0x0 pkt_statistics=0x60 pkt_state=0x7
14:46:21 scsi: [ID 365881 kern.info] /pci@1f,0/pci@1/scsi@3 (glm0):
14:46:21  pkt_scbp=0x0 cmd_flags=0x1860
14:46:21 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1/scsi@3 (glm0):
14:46:21  Connected command timeout for Target 3.0
14:46:21 genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
14:46:21 genunix: [ID 611667 kern.info] NOTICE: glm0: Connected command timeout for Target 3.0
14:46:21 glm: [ID 280919 kern.warning] WARNING: ID[SUNWpd.glm.cmd_timeout.6017]
14:46:21 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1/scsi@3 (glm0):
14:46:21  got SCSI bus reset

I have tried replacing the diff box (I have three spares), replacing the HAL
board (also three spares), replacing cables to computer and to console,
rebooting, power cycling in various orders, running setacq when needed, all to no
avail.  The computer has no problem talking over the SCSI bus to an old disk
drive I tried connecting.  Are any other boards likely to be causing problems
with communication?  I can't say for certain that the other diff boxes and the
other HAL boards are actually working, since they are not in running systems, but
they all behave similarly, and it would be strange if all of them were bad.

Running su acqproc to start Acqproc produces no output, but after a few seconds
warnings like the above start appearing in messages.  Running su acqproc to stop
it fails to kill Acqproc, but resetting the console after the failure does kill it.

Any suggestions (aside from joining the 21st century)?

Steve Philson
University of Minnesota
Received on Thu Jan 09 2014 - 11:54:23 MST
