Troubleshooting

This chapter describes some GlusterFS troubleshooting methods.

Identifying Locked Files and Clearing Locks

You can use the statedump command to list the locks held on files. For each lock, the statedump output reports details such as its range, basename, and the PID of the application holding it. Analyze the output to find locks whose owner application is no longer running or no longer interested in the lock. After ensuring that no application is using the file, clear the lock using the following clear-locks command:

# gluster volume clear-locks VOLNAME path kind {blocked | granted | all} {inode range | entry basename | posix range}

For more information on performing statedump, see Performing Statedump on a Volume.
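The clear-locks syntax above takes a kind (blocked, granted, or all) followed by a lock type (inode, entry, or posix) and its argument. As a minimal sketch, the hypothetical helper below (clear_locks_cmd is not part of GlusterFS) checks those two keywords and prints the resulting command for review instead of running it:

```shell
# Hypothetical dry-run helper: validates the keywords from the syntax line
# and prints the clear-locks command rather than executing it.
clear_locks_cmd() {
  # usage: clear_locks_cmd VOLNAME PATH KIND TYPE ARG
  if [ "$#" -ne 5 ]; then
    echo "usage: clear_locks_cmd VOLNAME PATH KIND TYPE ARG" >&2
    return 2
  fi
  case "$3" in
    blocked|granted|all) ;;
    *) echo "kind must be blocked, granted, or all" >&2; return 2 ;;
  esac
  case "$4" in
    inode|entry|posix) ;;
    *) echo "lock type must be inode, entry, or posix" >&2; return 2 ;;
  esac
  printf 'gluster volume clear-locks %s %s kind %s %s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example:
#   clear_locks_cmd test-volume / granted entry file1
#   prints: gluster volume clear-locks test-volume / kind granted entry file1
```

Printing the command first gives you a chance to confirm the path and lock type against the statedump output before clearing anything.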

To identify locked files and clear locks:

  1. Perform statedump on the volume to view the files that are locked using the following command:

    # gluster volume statedump VOLNAME

    For example, to display statedump of test-volume:

    # gluster volume statedump test-volume
    Volume statedump successful

    The statedump files are created on the brick servers in the /tmp directory or in the directory set using the server.statedump-path volume option. The dump files are named brick-path.brick-pid.dump.

  2. Clear the entry lock using the following command:

    # gluster volume clear-locks VOLNAME path kind granted entry basename

    The following are the sample contents of the statedump file indicating that there is an entry lock (entrylk). Ensure that those are stale locks and no resources own them.

    [xlator.features.locks.vol-locks.inode]
    path=/
    mandatory=0
    entrylk-count=1
    lock-dump.domain.domain=vol-replicate-0
    xlator.feature.locks.lock-dump.domain.entrylk.entrylk[0](ACTIVE)=type=ENTRYLK_WRLCK on basename=file1, pid = 714782904, owner=ffffff2a3c7f0000, transport=0x20e0670, , granted at Mon Feb 27 16:01:01 2012
    
    conn.2.bound_xl./rhgs/brick1.hashsize=14057
    conn.2.bound_xl./rhgs/brick1.name=/gfs/brick1/inode
    conn.2.bound_xl./rhgs/brick1.lru_limit=16384
    conn.2.bound_xl./rhgs/brick1.active_size=2
    conn.2.bound_xl./rhgs/brick1.lru_size=0
    conn.2.bound_xl./rhgs/brick1.purge_size=0

    For example, to clear the entry lock on file1 of test-volume:

    # gluster volume clear-locks test-volume / kind granted entry file1
    Volume clear-locks successful
    test-volume-locks: entry blocked locks=0 granted locks=1
  3. Clear the inode lock using the following command:

    # gluster volume clear-locks VOLNAME path kind granted inode range

    The following are the sample contents of the statedump file indicating there is an inode lock (inodelk). Ensure that those are stale locks and no resources own them.

    [conn.2.bound_xl./rhgs/brick1.active.1]
    gfid=538a3d4a-01b0-4d03-9dc9-843cd8704d07
    nlookup=1
    ref=2
    ia_type=1
    [xlator.features.locks.vol-locks.inode]
    path=/file1
    mandatory=0
    inodelk-count=1
    lock-dump.domain.domain=vol-replicate-0
    inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 714787072, owner=00ffff2a3c7f0000, transport=0x20e0670, , granted at Mon Feb 27 16:01:01 2012

    For example, to clear the inode lock on file1 of test-volume:

    # gluster volume clear-locks test-volume /file1 kind granted inode 0,0-0
    Volume clear-locks successful
    test-volume-locks: inode blocked locks=0 granted locks=1
  4. Clear the granted POSIX lock using the following command:

    # gluster volume clear-locks VOLNAME path kind granted posix range

    The following are the sample contents of the statedump file indicating there is a granted POSIX lock. Ensure that those are stale locks and no resources own them.

    [xlator.features.locks.vol1-locks.inode]
    path=/file1
    mandatory=0
    posixlk-count=15
    posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=8, len=1, pid = 23848, owner=d824f04c60c3c73c, transport=0x120b370, , blocked at Mon Feb 27 16:01:01 2012
    , granted at Mon Feb 27 16:01:01 2012
    
    posixlk.posixlk[1](ACTIVE)=type=WRITE, whence=0, start=7, len=1, pid = 1, owner=30404152462d436c-69656e7431, transport=0x11eb4f0, , granted at Mon Feb 27 16:01:01 2012
    
    posixlk.posixlk[2](BLOCKED)=type=WRITE, whence=0, start=8, len=1, pid = 1, owner=30404152462d436c-69656e7431, transport=0x11eb4f0, , blocked at Mon Feb 27 16:01:01 2012
    
    posixlk.posixlk[3](ACTIVE)=type=WRITE, whence=0, start=6, len=1, pid = 12776, owner=a36bb0aea0258969, transport=0x120a4e0, , granted at Mon Feb 27 16:01:01 2012
    ...

    For example, to clear the granted POSIX lock on file1 of test-volume:

    # gluster volume clear-locks test-volume /file1 kind granted posix 0,8-1
    Volume clear-locks successful
    test-volume-locks: posix blocked locks=0 granted locks=1
    test-volume-locks: posix blocked locks=0 granted locks=1
    test-volume-locks: posix blocked locks=0 granted locks=1
  5. Clear the blocked POSIX lock using the following command:

    # gluster volume clear-locks VOLNAME path kind blocked posix range

    The following are the sample contents of the statedump file indicating there is a blocked POSIX lock. Ensure that those are stale locks and no resources own them.

    [xlator.features.locks.vol1-locks.inode]
    path=/file1
    mandatory=0
    posixlk-count=30
    posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=1, pid = 23848, owner=d824f04c60c3c73c, transport=0x120b370, , blocked at Mon Feb 27 16:01:01 2012
    , granted at Mon Feb 27 16:01:01
    
    posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=1, pid = 1, owner=30404146522d436c-69656e7432, transport=0x1206980, , blocked at Mon Feb 27 16:01:01 2012
    
    posixlk.posixlk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=1, pid = 1, owner=30404146522d436c-69656e7432, transport=0x1206980, , blocked at Mon Feb 27 16:01:01 2012
    
    posixlk.posixlk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=1, pid = 1, owner=30404146522d436c-69656e7432, transport=0x1206980, , blocked at Mon Feb 27 16:01:01 2012
    
    posixlk.posixlk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=1, pid = 1, owner=30404146522d436c-69656e7432, transport=0x1206980, , blocked at Mon Feb 27 16:01:01 2012
    
    ...

    For example, to clear the blocked POSIX lock on file1 of test-volume:

    # gluster volume clear-locks test-volume /file1 kind blocked posix 0,0-1
    Volume clear-locks successful
    test-volume-locks: posix blocked locks=28 granted locks=0
    test-volume-locks: posix blocked locks=1 granted locks=0
    No locks cleared.
  6. Clear all POSIX locks using the following command:

    # gluster volume clear-locks VOLNAME path kind all posix range

    The following are the sample contents of the statedump file indicating that there are POSIX locks. Ensure that those are stale locks and no resources own them.

    [xlator.features.locks.vol1-locks.inode]
    path=/file1
    mandatory=0
    posixlk-count=11
    posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=8, len=1, pid = 12776, owner=a36bb0aea0258969, transport=0x120a4e0, , blocked at Mon Feb 27 16:01:01 2012
    , granted at Mon Feb 27 16:01:01 2012
    
    posixlk.posixlk[1](ACTIVE)=type=WRITE, whence=0, start=0, len=1, pid = 12776, owner=a36bb0aea0258969, transport=0x120a4e0, , granted at Mon Feb 27 16:01:01 2012
    
    posixlk.posixlk[2](ACTIVE)=type=WRITE, whence=0, start=7, len=1, pid = 23848, owner=d824f04c60c3c73c, transport=0x120b370, , granted at Mon Feb 27 16:01:01 2012
    
    posixlk.posixlk[3](ACTIVE)=type=WRITE, whence=0, start=6, len=1, pid = 1, owner=30404152462d436c-69656e7431, transport=0x11eb4f0, , granted at Mon Feb 27 16:01:01 2012
    
    posixlk.posixlk[4](BLOCKED)=type=WRITE, whence=0, start=8, len=1, pid = 23848, owner=d824f04c60c3c73c, transport=0x120b370, , blocked at Mon Feb 27 16:01:01 2012
    ...

    For example, to clear all POSIX locks on file1 of test-volume:

    # gluster volume clear-locks test-volume /file1 kind all posix 0,0-1
    Volume clear-locks successful
    test-volume-locks: posix blocked locks=1 granted locks=0
    No locks cleared.
    test-volume-locks: posix blocked locks=4 granted locks=1

You can perform statedump on test-volume again to verify that all the above locks are cleared.
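In the examples above, the range argument appears to follow the form whence,start-len (for instance, start=8 and len=1 become 0,8-1). Assuming that reading is correct, the following sketch reads statedump text on standard input and prints one line per lock in the order the clear-locks command expects after the kind keyword. The function name list_locks is hypothetical, and the parsing relies only on the fields shown in the sample dumps:

```shell
# Hypothetical helper: summarize the locks found in statedump text.
# Output per lock: <path> <granted|blocked> <entry|inode|posix> <basename-or-range>
list_locks() {
  awk '
    # Return the text that follows "key=" (or "key = ") up to the next comma.
    function field(line, key,    rest) {
      if (!match(line, key " ?= ?")) return ""
      rest = substr(line, RSTART + RLENGTH)
      sub(/,.*/, "", rest)
      return rest
    }
    # Remember the most recent path= line; the lock lines that follow belong to it.
    /^[ \t]*path=/ { path = $0; sub(/^[ \t]*path=/, "", path); next }
    /(entrylk|inodelk|posixlk)\[[0-9]+\]\((ACTIVE|BLOCKED)\)/ {
      # ACTIVE locks are the granted ones; BLOCKED locks are still waiting.
      kind = ($0 ~ /\(ACTIVE\)/) ? "granted" : "blocked"
      if ($0 ~ /entrylk\[/) {
        type = "entry"
        arg = field($0, "basename")
      } else {
        type = ($0 ~ /inodelk\[/) ? "inode" : "posix"
        # Assumed range format, inferred from the examples: whence,start-len
        arg = field($0, "whence") "," field($0, "start") "-" field($0, "len")
      }
      print path, kind, type, arg
    }
  '
}

# Example (the dump file name is a placeholder):
#   list_locks < /tmp/brick-path.brick-pid.dump
```

Running the same helper on a fresh statedump after clearing is one way to check the result: locks that were cleared should no longer appear in its output.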

Retrieving File Path from the Gluster Volume

The heal info command lists the GFIDs of the files that need to be healed. To find the paths of the files associated with those GFIDs, use the getfattr utility, which lets you locate a file residing on a gluster volume brick. You can retrieve the path of a file even if the file name is unknown.

Retrieving Known File Name

To retrieve a file path when the file name is known, execute the following command in the FUSE mount directory:

# getfattr -n trusted.glusterfs.pathinfo -e text <path_to_fuse_mount/filename>

Where,

path_to_fuse_mount: The FUSE mount where the gluster volume is mounted.

filename: The name of the file for which the path information is to be retrieved.

For example:

# getfattr -n trusted.glusterfs.pathinfo -e text /mnt/fuse_mnt/File1
getfattr: Removing leading '/' from absolute path names
# file: mnt/fuse_mnt/File1
trusted.glusterfs.pathinfo="(<DISTRIBUTE:testvol-dht> (<REPLICATE:testvol-replicate-0>
<POSIX(/rhgs/brick1):tuxpad:/rhgs/brick1/File1>
<POSIX(/rhgs/brick2):tuxpad:/rhgs/brick2/File1>))"

The command output displays the brick pathinfo under the <POSIX> tag. In this example output, two paths are displayed as the file is replicated twice and resides on a two-way replicated volume.
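If you need the brick locations in a script, the POSIX entries can be pulled out of the pathinfo value with standard text tools. The sketch below assumes the output has the shape shown in the example, that is, entries of the form <POSIX(brick):host:path>. The function name brick_paths is hypothetical:

```shell
# Hypothetical helper: read getfattr pathinfo output on stdin and print
# one host:path pair per <POSIX(...)> entry.
brick_paths() {
  # grep -o emits each <POSIX(...):host:path> entry on its own line;
  # sed then strips the <POSIX(brick): prefix and the closing >.
  grep -o '<POSIX([^)]*):[^>]*>' | sed -E 's/^<POSIX\([^)]*\)://; s/>$//'
}

# Example:
#   getfattr -n trusted.glusterfs.pathinfo -e text /mnt/fuse_mnt/File1 | brick_paths
```

For the example output above, this would print tuxpad:/rhgs/brick1/File1 and tuxpad:/rhgs/brick2/File1 on separate lines.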

Retrieving Unknown File Name

You can retrieve the file path of an unknown file using its gfid string. The gfid string is the hyphenated version of the trusted.gfid attribute. For example, if the gfid is 80b0b1642ea4478ba4cda9f76c1e6efd, then the gfid string will be 80b0b164-2ea4-478b-a4cd-a9f76c1e6efd.

Note

To obtain the gfid of a file, run the following command:

# getfattr -d -m. -e hex /path/to/file/on/the/brick
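With -e hex, getfattr prints the attribute value as a 0x-prefixed hex string. A small sketch of the conversion to the hyphenated gfid string (8-4-4-4-12 hex digits) follows. The function name gfid_to_string is hypothetical:

```shell
# Hypothetical helper: convert a raw trusted.gfid hex value into the
# hyphenated gfid string.
gfid_to_string() {
  gfid_hex="${1#0x}"                                      # drop the 0x prefix, if present
  gfid_hex="$(printf '%s' "$gfid_hex" | tr 'A-F' 'a-f')"  # normalize to lowercase
  case "$gfid_hex" in
    *[!0-9a-f]*) echo "not a hex gfid: $1" >&2; return 1 ;;
  esac
  if [ "${#gfid_hex}" -ne 32 ]; then
    echo "expected 32 hex digits: $1" >&2
    return 1
  fi
  # Insert hyphens after the 8th, 12th, 16th, and 20th digits.
  printf '%s\n' "$gfid_hex" |
    sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

# Example:
#   gfid_to_string 0x80b0b1642ea4478ba4cda9f76c1e6efd
#   prints: 80b0b164-2ea4-478b-a4cd-a9f76c1e6efd
```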

Retrieving File Path using gfid String

To retrieve the file path using the gfid string, follow these steps:

  1. FUSE mount the volume with the aux-gfid-mount option enabled.

    # mount -t glusterfs -o aux-gfid-mount hostname:volume-name <path_to_fuse_mount>

    Where,

    path_to_fuse_mount: The FUSE mount where the gluster volume is mounted.

    For example:

    # mount -t glusterfs -o aux-gfid-mount 127.0.0.2:testvol /mnt/aux_mount
  2. After mounting the volume, execute the following command:

    # getfattr -n trusted.glusterfs.pathinfo -e text <path_to_fuse_mount>/.gfid/<GFID string>

    Where,

    path_to_fuse_mount: The FUSE mount where the gluster volume is mounted.

    GFID string: The GFID string.

    For example:

    # getfattr -n trusted.glusterfs.pathinfo -e text /mnt/aux_mount/.gfid/80b0b164-2ea4-478b-a4cd-a9f76c1e6efd
    getfattr: Removing leading '/' from absolute path names
    # file: mnt/aux_mount/.gfid/80b0b164-2ea4-478b-a4cd-a9f76c1e6efd
    trusted.glusterfs.pathinfo="(<DISTRIBUTE:testvol-dht> (<REPLICATE:testvol-replicate-0>
    <POSIX(/rhgs/brick2):tuxpad:/rhgs/brick2/File1>
    <POSIX(/rhgs/brick1):tuxpad:/rhgs/brick1/File1>))"

    The command output displays the brick pathinfo under the <POSIX> tag. In this example output, two paths are displayed as the file is replicated twice and resides on a two-way replicated volume.
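The lookup in step 2 depends on the gfid string being in its hyphenated form. As a rough sketch, the hypothetical helper below (gfid_lookup_cmd is not a GlusterFS command) checks that form and prints the getfattr command for a given aux-gfid mount point, without running it:

```shell
# Hypothetical helper: print the pathinfo lookup command for a gfid string.
gfid_lookup_cmd() {
  # usage: gfid_lookup_cmd MOUNT_POINT GFID_STRING
  # Accept only the hyphenated 8-4-4-4-12 form of the gfid.
  if ! printf '%s\n' "$2" |
      grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
    echo "not a hyphenated gfid string: $2" >&2
    return 1
  fi
  # ${1%/} drops one trailing slash so the path does not contain "//".
  printf 'getfattr -n trusted.glusterfs.pathinfo -e text %s/.gfid/%s\n' "${1%/}" "$2"
}

# Example:
#   gfid_lookup_cmd /mnt/aux_mount 80b0b164-2ea4-478b-a4cd-a9f76c1e6efd
```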
