SCSI bus not terminated | SCSI errors appear in the log file | Each SCSI bus must be terminated only
at the beginning and end of the bus. Depending on the bus
configuration, it might be necessary to enable or disable
termination in host bus adapters, RAID controllers, and
storage enclosures. To support hot plugging, the SCSI bus
must use external termination. | In addition, be sure that no devices are connected to
a SCSI bus using a stub that is longer than 0.1
meter. | Refer to Section 2.4.4 Configuring Shared Disk Storage and Section D.3 SCSI Bus Termination for information about
terminating different types of SCSI buses. |
|
SCSI bus length greater than maximum limit | SCSI errors appear in the log file | Each type of SCSI bus must adhere to
restrictions on length, as described in Section D.4 SCSI Bus Length. | In addition, ensure that no single-ended devices are
connected to an LVD SCSI bus, because a single-ended device
causes the entire bus to revert to single-ended operation, which has
more severe length restrictions than a differential bus. |
|
SCSI identification numbers not unique | SCSI errors appear in the log file | Each device on a SCSI bus must have a unique identification
number. Refer to Section D.5 SCSI Identification Numbers for more
information. |
SCSI commands timing out before completion | SCSI errors appear in the log file | The prioritized arbitration scheme on
a SCSI bus can result in low-priority devices being locked
out for some period of time. This may cause commands to time
out if a low-priority storage device, such as a disk, is
unable to win arbitration and complete a command that a host
has queued to it. For some workloads, this problem can be
avoided by assigning low-priority SCSI identification
numbers to the host bus adapters. | Refer to Section D.5 SCSI Identification Numbers for more
information. |
|
Mounted quorum partition | Messages indicating checksum errors on a quorum partition
appear in the log file | Be sure that the quorum partition raw
devices are used only for cluster state information. They
cannot be used for cluster services or for non-cluster
purposes, and cannot contain a file system. Refer to Section 2.4.4.3 Configuring Shared Cluster Partitions for more information. | These messages could also indicate that the underlying
block device special file for the quorum partition has been
erroneously used for non-cluster
purposes. |
|
Service file system is unclean | A disabled service cannot be enabled | Manually run a checking program such
as fsck. Then, enable the
service. | Note that, by default, the cluster infrastructure
runs fsck with the -p
option to automatically repair file system
inconsistencies. For particularly severe errors, you
may need to run a file system repair
manually. |
|
Quorum partitions not set up correctly | Messages indicating that a quorum partition cannot be
accessed appear in the log file | Run the /sbin/shutil -t command to check
that the quorum partitions are accessible. If the command
succeeds, run the shutil -p command
on both cluster systems. If the output is different on the
systems, the quorum partitions do not point to the same
devices on both systems. Check to make sure that the raw
devices exist and are correctly specified in the
/etc/sysconfig/rawdevices file. Refer to
Section 2.4.4.3 Configuring Shared Cluster Partitions for more
information. |
Cluster service operation fails | Messages indicating the operation failed appear on the
console or in the log file | There are many different reasons for the failure of a
service operation (for example, a service stop or start). To
help identify the cause of the problem, set the severity level
for the cluster daemons to DEBUG to log
descriptive messages. Then, retry the operation and examine the
log file. Refer to Section 8.6 Modifying Cluster Event Logging for more
information. |
Cluster service stop fails because a file system cannot be
unmounted | Messages indicating the operation failed appear on the
console or in the log file | Use the fuser and
ps commands to identify the processes that
are accessing the file system, and use the kill
command to stop them. Alternatively, use the lsof -t
file_system command to
display only the identification numbers of the processes that are
accessing the specified file system. If needed, pass those
numbers to the kill command as arguments, because kill does not
read process identification numbers from standard input. | To avoid this problem, be sure that only
cluster-related processes can access shared storage data. In
addition, modify the service and enable forced unmount for
the file system. This enables the cluster service to unmount
a file system even if it is being accessed by an application
or user. |
|
Incorrect entry in the cluster database | Cluster operation is impaired | The Cluster Status Tool can be
used to examine and modify service configuration. The
Cluster Configuration Tool is used to modify
cluster parameters. |
Incorrect Ethernet heartbeat entry in the cluster database
or /etc/hosts file | Cluster status indicates that an Ethernet heartbeat channel
is OFFLINE even though the interface is
valid | Examine and modify the cluster
configuration by running the
Cluster Configuration Tool, as specified in
Section 8.4 Modifying the Cluster Configuration, and correct the
problem. | In addition, be sure to use the
ping command to send a packet to all
network interfaces used in the cluster. |
|
Loose cable connection to power switch | Power switch status using clufence
returns an error or hangs | Check the serial cable connection. |
Power switch serial port incorrectly specified in the
cluster database | Power switch status using clufence
indicates a problem | Examine the current settings and modify the cluster
configuration by running the
Cluster Configuration Tool, as specified in
Section 8.4 Modifying the Cluster Configuration, and correct the
problem. |
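
Two of the checks in the table above can be sketched as small shell functions. This is a minimal sketch under stated assumptions, not part of the cluster software: the function names compare_quorum_output and stop_fs_users are hypothetical, and the first function assumes that the output of shutil -p was saved to a file on each cluster system and that both files were copied to one host. The second function passes the process identification numbers from lsof -t to kill as arguments.

```shell
#!/bin/sh

# Hypothetical helper: compare the `shutil -p` output captured on each
# cluster system. Assumes the output was saved beforehand, for example:
#   /sbin/shutil -p > quorum.member1    (on the first system)
#   /sbin/shutil -p > quorum.member2    (on the second system)
# Prints "match" when the files are identical; otherwise prints
# "mismatch" and returns a non-zero status.
compare_quorum_output() {
    if cmp -s "$1" "$2"; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}

# Hypothetical helper: send the default signal (TERM) to every process
# that has files open on the file system given as $1 (a mount point).
# lsof -t prints only process identification numbers, one per line.
stop_fs_users() {
    pids=$(lsof -t "$1")
    # Nothing to do when no process is using the file system.
    [ -n "$pids" ] || return 0
    # Left unquoted on purpose so each number becomes its own argument.
    kill $pids
}
```

For example, compare_quorum_output quorum.member1 quorum.member2 reports a mismatch when the quorum partitions do not point to the same devices on both systems. Review the list from lsof -t before calling stop_fs_users, because it signals every process found, cluster-related or not.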