Dear AMMRL Members,
This is a robust discussion group. Many thanks to Jack Howarth, Patrick
Wheeler, Rich Shoemaker, Christopher Robosky, Wei Wycoff, Tom Kalisker,
Charles G. Fry, Kirk Marat, Martha Morton, Guillermo Mendoza-Díaz, Ron
Nieman, Sara Basiaga, Kimberly Yach, Brian Myers, Gary, Michael Strain,
Pat Hays, Jaroslav Zajicek, Robert Harker, and Jeff Simpson.
--> The most straightforward solution, suggested by Jack Howarth based on
his experience, is to upgrade to VnmrJ 1.1C or 1.1D and stick with the new
Java interface.
Note: A couple of emails report similar problems with VnmrJ 1.1C/1.1D as
well, but without indicating whether VnmrJ was being used in "Classic"
mode.
Apparently, according to some of the answers, Varian is working on a patch
for this problem.
It is now clear to me that this is a widespread occurrence of what seems
to be a bug in 6.1C, although it has not been formally acknowledged by
Varian. If an upgrade to VnmrJ would indeed solve the problem, then Varian,
together with its users, must find a way to make this new version easier
to acquire.
There are a couple of things worth trying:
1) Check the 5V power supply to the digital card cage and adjust it
(according to the procedure described in email #5);
2) Wherever possible, revert Vnmr 6.1C to patch 108 or earlier (this
seems virtually impossible for those with cold probes).
General: As pointed out in email #20, a relatively recent Varian NMR News
item reports that a similar communication problem has been traced to the
Motorola acquisition CPU board (on Mercury systems only).
Once more, my sincere thanks to all of you.
Cheers,
Carlos
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Carlos R. N. Pacheco, Ph.D. - NMR Spectroscopist
Princeton University/Chemistry
Frick Lab, Washington Road
Princeton, NJ 08544-1009
Phone: (609)258-1633; Fax: (609)258-6746
E-mail: cpacheco_at_Princeton.edu; http://www.princeton.edu/~nmr
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--------------------------------------------------------------
1) Which version of VNMR are you using? I noticed that both VNMR 6.1C
(after applying the 20x series of patches) and the classic VNMR interface
of VNMR J 1.1C/1.1D exhibited this problem. VNMR 6.1C with patch level 108
or earlier didn't seem to have this problem. As I see it, there are two
solutions:
1) Revert your machine to VNMR 6.1C with patch 108.
2) Use the VNMR J interface of VNMR J 1.1C or 1.1D (NOT the classic VNMR
interface).
That has been my experience anyway.
Jack
PS: We have Inova consoles from around 1996, which have 680x0-based
CPUs.
---------------------------------------
2) We've certainly encountered apparently random "Inactive" events on both
of our Varian Inova systems, one running Vnmr 6.1C and the other VnmrJ
1.1D. On the older system, we seem to find it most often when conducting
operations with the Gilson attached for VAST usage. We've never been able
to determine a real root cause.
Usually, it can be fixed with "su acqproc", but occasionally a console
reset is required.
---------------------------------------
3) Although Varian has not formally acknowledged this problem, we have had
intermittent problems like you describe, with the console going inactive
for no apparent reason. It's not related to logging in/out on our end,
but it has happened in the middle of long experiments (e.g., 2D and 3D
experiments). In all cases, the systems are running VNMR 6.1C with a 20X
(e.g., 203 or 205) patch. The patches above 200 add VNMRJ functionality to
VNMR 6.1C, and nearly convert VNMR to VNMRJ without the new Java
interface.
Others have reported similar situations when running VNMRJ 1.1C or 1.1D in
"Classic" mode, yet their problems seemed to go away when they ran the
instruments using the newer Java interface.
We had to upgrade to the 203-205 patches with our Inova 600 when we
installed the cryogenic (cold) probe, as most of the items in these
patches relate to the operation of these probes. We started getting
several strange errors in the console after this, including the mysterious
"Acquisition Error Code 900" and the spontaneous "Inactive" problems that
you describe.
If you look at the archives over the last couple of months, you'll see
that this has been discussed (Jack Howarth at the U. of Cincinnati
Medical School initiated this discussion, if my memory is accurate).
Jack seemed to believe that if the instruments were operated in the full
VNMRJ mode, the "inactive" problem goes away.
This can be very frustrating, but you're not alone.
--------------------------------------------
4) Varian is having problems with acqproc hangups. They are working on a
patch. I beta tested a patch for them a while back, but it fixed only one
of my two 600s that hang up. I remember that Jack Howarth at the
University of Cincinnati is also experiencing this hang-up.
---------------------------------------
5) Our Inova 600 has the same kind of problem. When it happens we hear a
non-stop beeping sound with a message in the terminal window warning us
about "automation card has not booted in console". It does not affect a
long experiment that is already running; however, once the experiment
finishes we have no connection to the console. We have to run "su
acqproc", reset the console's CCU board, and run "su acqproc" again to
reboot. After that, "su" is needed to get everything working again. This
happens about once a week. I sent email via Varian's website and got some
suggestions from them. Please see below: 1 and 2 describe how to measure
the 5V supply to the digital card cage, and 3 describes how to adjust it.
Our measurement was 5.04-5.05 V, which is high, but we have not done any
adjustment yet to see if it solves the problem.
Varian's Suggestions (in a few email replies):
1. It sounds more like a communication issue between the console and the
Sun: the MSR board somehow lost communication. One recommendation is to
check the 5V supply to the digital card cage. Try opening the left side
panel of the console to expose the digital card cage. Loosen the two
screws on the digital back plane. Use a voltmeter to measure the 5V at the
back plane. It should be in the range 5.00 +/- 0.02 V. If it is, we should
look for other reasons for the dropped communications, from cabling and
hardware (MSR board) to software.
2. The digital back plane is the vertical back plane behind the digital
card cage into which the digital boards plug. You will need to open the
side panel (a screw at the top middle holds the panel) to get access to
the digital card cage. To access the digital back plane, loosen the two
thumb screws above the cooling fans; the panel will drop down and expose
the digital back plane. You will be able to check the voltage (red and
black wire) at the top panel. The voltage should be 5.00 +/- 0.01 V.
The monitor ports at the front of the system power supply show the
voltage at the output of the individual power supply. There will be a
voltage drop between the supply and the back plane, so it is essential to
measure the voltage at the back plane.
3. It is certainly one area that we would like to set correctly. The
problem is that you need to adjust the 5V power supply in the system power
supply. That translates into removing the system power supply from the
console; removing the cover; reconnecting everything; finding the 5V power
supply (PS201); locating the voltage adjustment pot; and adjusting the pot
while monitoring the 5V at the back of the digital back plane.
In the earlier Inova consoles, we found that the digital boards are very
sensitive to the 5V line. That is why we specify a fairly strict range for
the 5V line.
I forgot to mention that our Inova 600 runs Solaris 9/VNMRJ1.1C. It was
installed in 2000.
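Varian's replies quote two different tolerance bands for the back-plane
5V line (+/- 0.02 V in suggestion 1 and +/- 0.01 V in suggestion 2). A
minimal Python sketch for checking a voltmeter reading against both bands;
the function names are hypothetical and not part of any Varian tool:

```python
# Hypothetical helper: compare a back-plane 5 V reading against the two
# tolerance bands quoted in Varian's replies (+/- 0.02 V and +/- 0.01 V).
NOMINAL_V = 5.00


def within_tolerance(measured_v: float, tol_v: float, nominal_v: float = NOMINAL_V) -> bool:
    """Return True if measured_v lies inside nominal_v +/- tol_v (inclusive)."""
    # Round the deviation so floating-point noise (e.g. 5.02 - 5.00 ->
    # 0.0199999...) does not misclassify readings exactly on a band edge.
    return round(abs(measured_v - nominal_v), 6) <= tol_v


def check_reading(measured_v: float) -> dict:
    """Report the deviation from nominal and the result for each quoted band."""
    return {
        "deviation_v": round(measured_v - NOMINAL_V, 3),
        "within_0.02": within_tolerance(measured_v, 0.02),
        "within_0.01": within_tolerance(measured_v, 0.01),
    }


if __name__ == "__main__":
    # The two readings reported in reply 5.
    for reading in (5.04, 5.05):
        print(reading, check_reading(reading))
```

The 5.04-5.05 V readings reported above fall outside both bands,
consistent with the observation that the reading is high.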
---------------------------------------
6) We have 15 Varian systems here. I occasionally have to do resets, but
on only one system is it a regularly recurring issue.
One of our Inova 500's (that was upgraded from U+) shows a problem when
the acqi display in lock mode is open for more than 208 seconds. When this
happens the lock fails to update regularly in the acqi window, the system
then freezes up and an abortallacqs has to be issued. This problem has
been shown to be caused by the vnmrs patches. On an unpatched version of
Vnmr the problem goes away.
I know this sounds silly, but is it possible that static discharge occurs
when the user sits down and is somehow transferred to the console? This
may also occur when a sample is placed in the magnet. We had one user who
wore nylons with a wool dress, and this was a recurring issue.
You may be able to pinpoint what is happening to the communication between
the host and the acquisition computer. Depending on the second Ethernet
port (e.g., le1 or hme1), issue the following as root:
snoop -d hme1
This shows the packet traffic between the Sun and the INOVA. If the
communication is down, as indicated by the inactive status, you will not
be able to ping inova, wormhole or inovaauto.
Running snoop in a shelltool may give some insight into what is
happening.
On a normal system with just the acqstat window open and otherwise idle
there should be some communication every 5 seconds as the lock level,
spinner speed, VT Temp, etc are updated into the acqstatus window.
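If the snoop output is saved and the packet timestamps (in seconds) are
extracted from it, silences in this roughly 5-second status traffic can be
flagged automatically. A Python sketch under stated assumptions: the
timestamp-extraction step is not shown (it depends on the snoop options
used), and the function name, the 5-second default and the factor-of-two
slack are illustrative choices, not part of snoop or VNMR:

```python
from typing import List, Tuple


def find_gaps(timestamps: List[float],
              expected_interval_s: float = 5.0,
              slack: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive packets are further apart
    than expected_interval_s * slack (assumed threshold, adjust as needed)."""
    limit = expected_interval_s * slack
    ordered = sorted(timestamps)
    gaps = []
    # Compare each packet time with the next one in sequence.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > limit:
            gaps.append((earlier, later))
    return gaps


if __name__ == "__main__":
    # Made-up timestamps: regular 5 s traffic, then a 25 s silence.
    print(find_gaps([0, 5, 10, 15, 40, 45]))
```

A gap that begins just before acqstat switches to "inactive" would point
toward the network path rather than the host software.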
It may also help to set up one of the afflicted systems as a standalone
and see if the problem goes away when it is isolated from the building
Ethernet.
---------------------------------------
7) We have to assume that you are using vnmr6.1c with up-to-date patches
on Sun workstations. You definitely should update the group if this is
not the case. It would help people out there to know what kind of Sun
workstations you have, as that may be affecting the problem.
We see similar problems on both our INOVA spectrometers (both vnmr6.1c
patched on Sun Ultra1 workstations), but with only occasional frequency
(maybe a couple times a month).
We did see considerable problems with this a while ago with our INOVA-600;
it took a complete replacement of the back-plane to reduce the frequency
of the problems. We went through a _lot_ of board swaps before Varian
finally decided the back-plane had to be swapped. Damn good thing we were
still under warranty. It was coupled with a problem with our DSP, so it is
not certain that your problem is related.
Will be very interested in what you find out, as even at a couple times a
month, this is a serious problem for us.
---------------------------------------
8) We have similar problems with our INOVA 600. Frequently we require
console re-boots or the use of the "setacq" program. Varian have not been
helpful in finding the problem.
I should note that we DO NOT have any similar problems with our Bruker
Avance system.
---------------------------------------
9) I have the same problem with an INOVA 600 that was installed in
February 2004. People use only one account, which is always open. The
problem happens 1-2 times per week. I haven't investigated the cause.
With 4 other instruments, this is only a nuisance.
---------------------------------------
10) Some time ago I had a similar problem with our Gemini 2000, but it
became very common when we changed the workstation to a Sun Blade and the
operating system was updated to 7.
I noticed that if I keep the lock power below 13 on this machine, the
hangs stop. I also used to check the stability of the system with
periodic maintenance using the fsck utility.
------------------------------------------
11) We see this as well, mostly with our Inova-500. We cannot isolate the
problem.
------------------------------------------
12) I had a similar problem on my Gemini 2000 a while back. It probably
isn't the same cause as yours but I was similarly frustrated. My console
ethernet card uses BNC, but my Sun ethernet card was RJ45. I had to link
them together using a hub, and it turned out that my hub had gone bad.
If you're using a hub, it may be acting up. My other suspicion would be
the workstation's ethernet card itself, if there is no hub.
------------------------------------------
13) Are you receiving any Broadcast message in the terminal window at the
time it is inactive? I made Varian aware of a communication problem with
the automation board, since it was happening to me 2 times a month. They
claim they have seen it before on other INOVAs and have not been able to
pinpoint it to fix the problem. If you get a message in the terminal
saying you need to reboot, that may be the same situation I have.
---------------------------------------
14) I was losing connectivity on my new Mercury 200 console. It turns out
there was a known problem with the CPU board that was shipped; disconnects
are said to be erratic with the "bad" CPU board. We also had a problem
where one of the jumpers was not set properly at the factory on the ADC
board and this was causing disconnection every time there was an ADC
overflow.
---------------------------------------
15) You have probably already received a zillion responses.
As far as I know, and I just talked with Varian about it a few weeks ago,
it results from a bug in the VNMR software. Last I heard, they were not
able to track it down, although I wonder if they are just not trying, and
hoping that we will all switch to VNMRJ.
Anyway, the only thing you can do is recycle "su acqproc" and perhaps
reboot the MSR or CPU boards.
---------------------------------------
16) It happens all the time on our three Inova systems
300 1995-vintage, VNMR 6.1b
500 2001-vintage, VNMR 6.1c
600 1996-vintage, VNMR 6.1c
We have the reset procedure posted prominently next to the consoles. The
600, which has the least number of logins, seems to have the fewest
console crashes... which would seem to support your observation.
---------------------------------------
17) My Mercury 400 did that a lot after installation, and Varian said it
was the console computer. A patch was needed and a board replaced.
Apparently the manufacturer of the board changed specs on a chip and thus
set up situations where the Inactive Bug surfaces. You cannot just
download the patch, since it is made to go with the board replacement. I
was not aware that Inova could have this issue as well. We have an Inova
600 (since Jan 2004) and have had no problems.
Varian will ask you whether the system is "killed" or "started" when you
type "su acqproc" in a terminal. That is very diagnostic in
troubleshooting this problem.
---------------------------------------
18) We are experiencing the same problem with our Inova 500. It seems, at
least in our case, that the acquisition window becomes inactive if it is
left open for an "extended" time period, when a user changes a sample or
adjusts shims. The communication between Sun host computer and acquisition
computer is lost. The only solution to this problem is to reboot the
acquisition computer by pressing the reset button and after booting
process is finished one must issue the su acqproc command within UNIX
shell. We informed Varian about this problem but so far no solution has
been found (in our case). We also run Solaris 8/VNMR 6.1C and installed
the latest patches (205).
---------------------------------------
19) In my estimation, almost everyone using Varian NMR experiences similar
problems at one time or another. This is from our experiences with VXR,
Gemini, Mercury, Mercury VX Plus and Inova. Every one of the instruments
has this behavior sometimes.
There are a host of hardware difficulties that can cause it. The MSR can
be the cause on the Inova, and the CPU card on the Mercury, as can the
spin controller. We are assuming you are using a fully patched software
version. I personally believe some people are more prone to causing
inactivity, especially after we had problems with one instrument that
always occurred after a certain user. It may have something to do with
him moving quickly between experiments and taxing the resources of the
host computer, etc.
Also, a complete cold reboot can help where the reset button does not,
because it resets all cards to their defaults. We used to have a Mercury
that would ONLY reset this way.
Because there are no air filters on the Mercury systems, all components
are likely to collect dust and need periodic cleaning.
Remember to su to root and then run /vnmr/bin/setacq after removing cards.
Even with filters, cards can still accumulate dust which changes the RF
characteristics of the devices. This can cause intermittency especially
in high speed logic not to mention the high power RF sections.
---------------------------------------
20) How old is your spectrometer? Have you swapped out the (Motorola?)
communications board in your system?
See the following:
ACQUISITION COMMUNICATION PROBLEMS ON MERCURY-Vx AND MERCURYplus SYSTEMS:
During the lifetime of a spectrometer, there is some degree of continuous
hardware evolution, as technology evolves, and sometimes also due to parts
obsolescence. Some of this evolution is apparent to the user (e.g., when
on the UNITY INOVA we changed the I.F. from 10.5 MHz to 20 MHz, see
Varian NMR News 2003-02-14), in other cases such changes are handled as
"board revision levels", i.e., they can only be seen when comparing board
labels or part numbers. One such "silent upgrade" happened with the
acquisition CPU on MERCURY-Vx and MERCURYplus systems:
- The original MERCURY-Vx shipped with a Motorola MVME162-222 CPU (Varian
p/n 01-905942-03).
- At some point (about 2 years ago) we switched to a newer version of that
CPU, called MVME162P-242E (Varian p/n 01-905942-01).
- More recently, we switched to yet a newer model, called MVME162PA-252SE
(Varian p/n 01-905942-00).
These changes were meant to be totally transparent to the user, i.e.,
based on the specification of the board (and internal testing) we did not
expect any functional changes from these upgrades. Yet, SOME MERCURY-Vx
and MERCURYplus systems experience random failures due to an apparent
acquisition communications failure, indicated by "Acqstat" suddenly
showing "inactive", requiring the acquisition CPU to be rebooted, and
Expproc to be restarted with "su acqproc". Such failures can occur at
random intervals and are of course very disruptive.
With Motorola's help we were able to trace this down to an acquisition
computer compatibility issue on systems with the MVME162P-242E board. That
CPU uses a chip named "PETRA-1" which seems to cause these hardware
failures. The original CPU (MVME162-222) did not have that chip, and the
newest model (MVME162PA-252SE) uses a chip named "PETRA-2". Neither of
these CPUs causes such failures; it only happens with some systems that
use the MVME162P-242E acquisition CPU.
The fix for this problem consists of two elements: for one, you must
install the latest patch, 6.1CallSOLmvx202 for MERCURY-Vx, and
6.1CallSOLmpl202 for MERCURYplus systems. This patch makes the system more
reliable, but does NOT fix the issue completely. The second step involves
upgrading the MVME162P-242E CPU to the PETRA-2 chip. This upgrade is done
by Motorola (as the upgrade involves changing several other components as
well), and the revised board then is named MVME162PA-242E (Varian p/n
01-905942-02).
If your MERCURY-Vx or MERCURYplus exhibits such acquisition "communication
failures" and if it is equipped with a MVME162P-242E CPU, then please
contact your local service office and refer them to service bulletin
MP200317.
The acquisition CPU is the leftmost board in the digital card cage in the
rear of the MERCURY console. There is a name tag on the upper "handle"
(board extractor) of the board. In the case of the faulty board, that name
tag shows "VME162P-242E". We will handle these board upgrades as quickly
as possible. Note that neither the VNMR patch nor the board upgrade ALONE
really fixes the problem; you need BOTH the board upgrade AND the VNMR
"202" patch for a complete fix. Download the patch from the Varian NMR
software patch page at
http://www.varianinc.com/products/nmr/software/patches/ and install it
using the current version of "patchinstall", which can be downloaded from
the same page. No special precautions are required for the installation
of these patches (even an active acquisition would not be affected).
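The bulletin's decision logic (which CPU board is affected, and which
patch goes with which system) can be summarized as a small lookup. A
hedged Python sketch built only from the board models, part numbers and
patch names quoted above; the table and function names are hypothetical,
and this is no substitute for confirming with your local Varian service
office:

```python
# Hypothetical helper summarizing the MERCURY CPU bulletin quoted above.
# Board models, Varian part numbers and patch names are copied from it.

CPU_BOARDS = {
    # model: (Varian p/n, PETRA chip, affected by the communication failures?)
    "MVME162-222": ("01-905942-03", None, False),
    "MVME162P-242E": ("01-905942-01", "PETRA-1", True),
    "MVME162PA-242E": ("01-905942-02", "PETRA-2", False),
    "MVME162PA-252SE": ("01-905942-00", "PETRA-2", False),
}

PATCH_FOR_SYSTEM = {
    "MERCURY-Vx": "6.1CallSOLmvx202",
    "MERCURYplus": "6.1CallSOLmpl202",
}

UPGRADE_ACTION = "have the CPU upgraded to MVME162PA-242E (service bulletin MP200317)"


def normalize_model(tag: str) -> str:
    """Map a board name tag to a model key. The bulletin says the faulty
    board's tag reads 'VME162P-242E', i.e. without the leading 'M'."""
    model = tag.strip().upper()
    if model.startswith("VME"):
        model = "M" + model
    return model


def recommended_actions(system: str, board_tag: str, patch_202_installed: bool) -> list:
    """Return the steps the bulletin calls for, in the order it lists them."""
    if system not in PATCH_FOR_SYSTEM:
        raise ValueError(f"unknown system: {system!r}")
    model = normalize_model(board_tag)
    if model not in CPU_BOARDS:
        raise ValueError(f"unknown CPU board: {board_tag!r}")
    actions = []
    # The 202 patch is listed as the first element of the fix.
    if not patch_202_installed:
        actions.append(f"install patch {PATCH_FOR_SYSTEM[system]}")
    # Only the PETRA-1 board needs the hardware upgrade.
    _pn, _chip, affected = CPU_BOARDS[model]
    if affected:
        actions.append(UPGRADE_ACTION)
    return actions


if __name__ == "__main__":
    print(recommended_actions("MERCURYplus", "VME162P-242E", patch_202_installed=False))
```

Recommending the patch even for unaffected boards is a design choice
here, based on the bulletin's remark that the patch makes the system more
reliable; the bulletin itself only requires it as part of the fix for the
MVME162P-242E case.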
Received on Wed Nov 10 2004 - 18:17:24 MST