There is an important performance bug fix specific to NetBSD here,
which disable gfid2path by default. This features causes a huge
amount of different extended attributes to be created, and the
NetBSD implementation does not scale well with it.
In order to recover a server after the feature is disabled, stop
glusterfs daemones, disable extended attributes using extattrctl,
remove ${BRICK_ROOT}/.attribute/system/trusted.gfid2path.*
re-enable extended attributes and restart glusterfs.
From http://blog.gluster.org/2016/06/glusterfs-3-8-released/
Gluster.org announces the release of 3.8 on June 14, 2016, marking
a decade of active development.
The 3.8 release focuses on:
- containers with inclusion of Heketi
- hyperconvergence
- ecosystem integration
- protocol improvements with NFS Ganesha
Contributed features are marked with the supporting organizations.
Automatic conflict resolution, self-healing improvements (Facebook)
Synchronous Replication receives a major boost with features
contributed from Facebook. Multi-threaded self-healing makes
self-heal perform at a faster rate than before. Automatic
Conflict resolution ensures that conflicts due to network
partitions are handled without the need for administrative
intervention
NFSv4.1 (Ganesha) - protocol
Gluster's native NFSv3 server is disabled by default with this
release. Gluster's integration with NFS Ganesha provides NFS
v3, v4 and v4.1 accesses to data stored in Gluster volume.
BareOS - backup / data protection
Gluster 3.8 is ready for integration with BareOS 16.2. BareOS
16.2 leverages glusterfind for intelligently backing up objects
stored in a Gluster volume.
"Next generation" tiering and sharding - VM images
Sharding is now stable for VM image storage. Geo-replication
has been enhanced to integrate with sharding for offsite
backup/disaster recovery of VM images. Self-healing and data
tiering with sharding makes it an excellent candidate for
hyperconverged virtual machine image storage.
block device & iSCSI with LIO - containers
File backed block devices are usable from Gluster through iSCSI.
This release of Gluster integrates with tcmu-runner
[https://github.com/agrover/tcmu-runner] to access block devices
natively through libgfapi.
Heketi - containers, dynamic provisioning
Heketi provides the ability to dynamically provision Gluster
volumes without administrative intervention. Heketi can manage
multiple Gluster clusters and will be the cornerstone for
integration with Container and Storage as a Service management
ecosystems.
glusterfs-coreutils (Facebook) - containers
Native coreutils for Gluster developed by Facebook that uses
libgfapi to interact with gluster volumes. Useful for systems
and containers that do not have FUSE.
For more details, our release notes are included:
https://github.com/gluster/glusterfs/blob/release-3.8/doc/release-notes/3.8.0.md
The release of 3.8 also marks the end of life for GlusterFS 3.5,
there will no further updates for this version.
Complete list of changes since 3.7.1:
- doc: add 1233044, 1232179 in 3.7.2 release-notes
- features/bitrot: fix fd leak in truncate (stub)
- doc: add release notes for 3.7.2
- libgfchangelog: Fix crash in gf_changelog_process
- glusterd: Fix snapshot of a volume with geo-rep
- cluster/ec: Avoid parallel executions of the same state machine
- quota: fix double accounting with rename operation
- cluster/dht: Prevent use after free bug
- cluster/ec: Wind unlock fops at all cost
- glusterd: Buffer overflow causing crash for glusterd
- NFS-Ganesha: Automatically export vol that was exported before vol restart
- common-ha: cluster HA setup sometimes fails
- cluster/ec: Prevent double unwind
- quota/glusterd: porting to new logging framework.
- bitrot/glusterd: gluster volume set command for bitrot should not supported
- tests: fix spurious failure in bug-857330/xml.t
- features/bitrot: tuanble object signing waiting time value for bitrot
- quota: don't log error when disk quota exceeded
- protocol/client : porting log messages to new framework
- cluster/afr: Do not attempt entry self-heal if the last lookup on entry
failed on src
- changetimerecorder : port log messages to a new framework
- tier/volume set: Validate volume set option for tier
- glusterd/tier: glusterd crashed with detach-tier commit force
- rebalance,store,glusterd/glusterd: porting to new logging framework.
- libglusterfs: Enabling the fini() in cleanup_and_exit()
- sm/glusterd: Porting messages to new logging framework
- nfs: Authentication performance improvements
- common-ha: cluster HA setup sometimes fails
- glusterd: subvol_count value for replicate volume should be calculate
correctly
- common-ha : Clean up cib state completely
- NFS-Ganesha : Return correct return value
- glusterd: Porting messages to new logging framework.
- glusterd: Stop tcp/ip listeners during glusterd exit
- storage/posix: Handle MAKE_INODE_HANDLE failures
- cluster/ec: Prevent Null dereference in dht-rename
- doc: fix markdown formatting
- upcall: prevent busy loop in reaper thread
- protocol/server : port log messages to a new framework
- nfs.c nfs3.c: port log messages to a new framework
- logging: log "Stale filehandle" on the client as Debug
- snapshot/scheduler: Modified main() function to take arguments.
- tools/glusterfind: print message for good cases
- geo-rep: ignore symlink and harlink errors in geo-rep
- tools/glusterfind: ignoring deleted files
- spec/geo-rep: Add rsync as dependency for georeplication rpm
- features/changelog: Do htime setxattr without XATTR_REPLACE flag
- tools/glusterfind: Cleanup glusterfind dir after a volume delete
- tools/glusterfind: Cleanup session dir after delete
- geo-rep: Validate use_meta_volume option
- spec: correct the vendor string in spec file
- tools/glusterfind: Fix GFID to Path conversion for dir
- libglusterfs: update glfs-message header for reserved segments
- features/qemu-block: Don't unref root inode
- features/changelog: Avoid setattr fop logging during rename
- common-ha: handle long node names and node names with '-' and '.' in them
- features/marker : Pass along xdata to lower translator
- tools/glusterfind: verifying volume is online
- build: fix compiling on older distributions
- snapshot/scheduler: Handle OSError in os. callbacks
- snapshot/scheduler: Check if GCRON_TASKS exists before
- features/quota: Fix ref-leak
- tools/glusterfind: verifying volume presence
- stripe: fix use-after-free
- Upcall/cache-invalidation: Ignore fops with frame->root->client not set
- rpm: correct date and order of entries in the %changelog
- nfs: allocate and return the hashkey for the auth_cache_entry
- doc: add release notes for 3.7.1
- snapshot: Fix finding brick mount path logic
- glusterd/snapshot: Return correct errno in events of failure - PATCH 2
- rpc: call transport_unref only on non-NULL transport
- heal : Do not invoke glfs_fini for glfs-heal commands
- Changing log level from Warning to Debug
- features/shard: Handle symlinks appropriately in fops
- cluster/ec: EC_XATTR_DIRTY doesn't come in response
- worm: Let lock, zero xattrop calls succeed
- bitrot/glusterd: scrub option should be disabled once bitrot option is
reset
- glusterd/shared_storage: Provide a volume set option to create and mount
the shared storage
- dht: Add lookup-optimize configuration option for DHT
- glusterfs.spec.in: move libgf{db,changelog}.pc from -api-devel to -devel
- fuse: squash 64-bit inodes in readdirp when enable-ino32 is set
- glusterd: do not show pid of brick in volume status if brick is down.
- cluster/dht: fix incorrect dst subvol info in inode_ctx
- common-ha: fix race between setting grace and virt IP fail-over
- heal: Do not call glfs_fini in final builds
- dht/rebalance : Fixed rebalance failure
- cluster/dht: Fix dht_setxattr to follow files under migration
- meta: implement fsync(dir)
- socket: throttle only connected transport
- contrib/timer-wheel: fix deadlock in del_timer()
- snapshot/scheduler: Return proper error code in case of failure
- quota: retry connecting to quotad on ENOTCONN error
- features/quota: prevent statfs frame loss when an error happens during
ancestry
- features/quota : Make "quota-deem-statfs" option "on" by default, when
quota is enabled
- cluster/dht: pass a destination subvol to fop2 variants to avoid races.
- cli: Fix incorrect parse logic for volume heal commands
- glusterd: Bump op version and max op version for 3.7.2
- cluster/dht: Don't rely on linkto xattr to find destination subvol
- afr: honour selfheal enable/disable volume set options
- features/shard: Fix incorrect parameter to get_lowest_block()
- libglusterfs: Copy d_len and dict as well into dst dirent
- features/quota : Do unwind if postbuf is NULL
- cluster/ec: Fix incorrect check for iatt differences
- features/shard: Fix issue with readdir(p) fop
- glusterfs.spec.in: python-gluster should be 'noarch'
- glusterd: Bump op version and max op version for 3.7.1
- glusterd: fix repeated connection to nfssvc failed msgs
Bitrot detection is a technique used to identify an ?insidious?
type of disk error where data is silently corrupted with no indication
from the disk to the storage software layer that an error has
occurred. When bitrot detection is enabled on a volume, gluster
performs signing of all files/objects in the volume and scrubs data
periodically for signature verification. All anomalies observed
will be noted in log files.
* Multi threaded epoll for performance improvements
Gluster 3.7 introduces multiple threads to dequeue and process more
requests from epoll queues. This improves performance by processing
more I/O requests. Workloads that involve read/write operations on
a lot of small files can benefit from this enhancement.
* Volume Tiering [Experimental]
Policy based tiering for placement of files. This feature will serve
as a foundational piece for building support for data classification.
Volume Tiering is marked as an experimental feature for this release.
It is expected to be fully supported in a 3.7.x minor release.
Trashcan
This feature will enable administrators to temporarily store deleted
files from Gluster volumes for a specified time period.
* Efficient Object Count and Inode Quota Support
This improvement enables an easy mechanism to retrieve the number
of objects per directory or volume. Count of objects/files within
a directory hierarchy is stored as an extended attribute of a
directory. The extended attribute can be queried to retrieve the
count.
This feature has been utilized to add support for inode quotas.
* Pro-active Self healing for Erasure Coding
Gluster 3.7 adds pro-active self healing support for erasure coded
volumes.
* Exports and Netgroups Authentication for NFS
This feature adds Linux-style exports & netgroups authentication
to the native NFS server. This enables administrators to restrict
access to specific clients & netgroups for volume/sub-directory
NFSv3 exports.
* GlusterFind
GlusterFind is a new tool that provides a mechanism to monitor data
events within a volume. Detection of events like modified files is
made easier without having to traverse the entire volume.
* Rebalance Performance Improvements
Rebalance and remove brick operations in Gluster get a performance
boost by speeding up identification of files needing movement and
a multi-threaded mechanism to move all such files.
* NFSv4 and pNFS support
Gluster 3.7 supports export of volumes through NFSv4, NFSv4.1 and
pNFS. This support is enabled via NFS Ganesha. Infrastructure changes
done in Gluster 3.7 to support this feature include:
- Addition of upcall infrastructure for cache invalidation.
- Support for lease locks and delegations.
- Support for enabling Ganesha through Gluster CLI.
- Corosync and pacemaker based implementation providing resource
monitoring and failover to accomplish NFS HA.
pNFS support for Gluster volumes and NFSv4 delegations are in beta
for this release. Infrastructure changes to support Lease locks and
NFSv4 delegations are targeted for a 3.7.x minor release.
* Snapshot Scheduling
With this enhancement, administrators can schedule volume snapshots.
* Snapshot Cloning
Volume snapshots can now be cloned to create a new writeable volume.
* Sharding [Experimental]
Sharding addresses the problem of fragmentation of space within a
volume. This feature adds support for files that are larger than
the size of an individual brick. Sharding works by chunking files
to blobs of a configurabe size.
Sharding is an experimental feature for this release. It is expected
to be fully supported in a 3.7.x minor release.
* RCU in glusterd
Thread synchronization and critical section access has been improved
by introducing userspace RCU in glusterd
* Arbiter Volumes
Arbiter volumes are 3 way replicated volumes where the 3rd brick
of the replica is automatically configured as an arbiter. The 3rd
brick contains only metadata which provides network partition
tolerance and prevents split-brains from happening.
Update to GlusterFS 3.7.1
* Better split-brain resolution
split-brain resolutions can now be also driven by users without
administrative intervention.
* Geo-replication improvements
There have been several improvements in geo-replication for stability
and performance.
* Minor Improvements
- Message ID based logging has been added for several translators.
- Quorum support for reads.
- Snapshot names contain timestamps by default.Subsequent access
to the snapshots should be done by the name listed in gluster
snapshot list
- Support for gluster volume get <volname> added.
- libgfapi has added handle based functions to get/set POSIX ACLs
based on common libacl structures.
New features:
- Volume Snapshots
Distributed lvm thin-pool based snapshots for backing up volumes
in a Gluster Trusted Storage Pool. Apart from providing cluster
wide co-ordination to trigger a consistent snapshot, several
improvements have been performed through the GlusterFS stack to
make translators more crash consistent. Snapshotting of volumes is
tightly coupled with lvm today but one could also enhance the same
framework to integrate with a backend storage technology like btrfs
that can perform snapshots.
- Erasure Coding
Xavier Hernandez from Datalab added support to perform erasure
coding of data in a GlusterFS volume across nodes in a Trusted
Storage Pool. Erasure Coding requires fewer nodes to provide better
redundancy than a n-way replicated volume and can help in reducing
the overall deployment cost. We look forward to build on this
foundation and deliver more enhancememnts in upcoming releases.
- Better SSL support
Multiple improvements to SSL support in GlusterFS. The GlusterFS
driver in OpenStack Manila that provides certificate based access
to tenants relies on these improvements.
- Meta translator
This translator provides a /proc like view for examining internal
state of translators on the client stack of a GlusterFS volume and
certainly looks like an interface that I would be heavily consuming
for introspection of GlusterFS.
- Automatic File Replication (AFR) v2
A significant re-factor of the synchronous replication translator,
provides granular entry self-healing and reduced resource consumption
with entry self-heals.
- NetBSD, OSX and FreeBSD ports
Lot of fixes on the portability front. The NetBSD port passes most
regression tests as of 3.6.0. At this point, none of these ports
are ready to be deployed in production. However, with dedicated
maintainers for each of these ports, we expect to reach production
readiness on these platforms in a future release.
Complete releases notes are available at
https://github.com/gluster/glusterfs/blob/release-3.6/doc/release-notes/3.6.0.md
* Improvements for Virtual Machine Image Storage
A number of improvements have been performed to let Gluster volumes provide
storage for Virtual Machine Images. Some of them include:
- qemu / libgfapi integration.
- Causal ordering in write-behind translator.
- Tunables for a gluster volume in group-virt.example.
The above results in significant improvements in performance for VM image
hosting.
* Synchronous Replication Improvements
GlusterFS 3.4 features significant improvements in performance for
the replication (AFR) translator. This is in addition to bug fixes
for volumes that used replica 3.
* Open Cluster Framework compliant Resource Agents
Resource Agents (RA) plug glusterd into Open Cluster Framework
(OCF) compliant cluster resource managers, like Pacemaker.
The glusterd RA manages the glusterd daemon like any upstart or
systemd job would, except that Pacemaker can do it in a cluster-aware
fashion.
The volume RA starts a volume and monitors individual brick?s
daemons in a cluster aware fashion, recovering bricks when their
processes fail.
* POSIX ACL support over NFSv3
setfacl and getfacl commands now can be used on a nfs mount that
exports a gluster volume to set or read posix ACLs.
* 3.3.x compatibility
The new op-version infrastructure provides compatibility with 3.3.x
release of GlusterFS. 3.3.x clients can talk to 3.4.x servers and
the vice-versa is also possible.
If a volume option that corresponds to 3.4 is enabled, then 3.3
clients cannot mount the volume.
* Packaging changes
New RPMs for libgfapi and OCF RA are present with 3.4.0.
* Experimental Features
- RDMA-connection manager (RDMA-CM)
- New Block Device translator
- Support for NUFA
As experimental features, we don?t expect them to work perfectly
for this release, but you can expect them to improve dramatically
as we make successive 3.4.x releases.
* Minor Improvements:
- The Ext4 file system change which affected readdir workloads for
Gluster volumes has been addressed.
- More options for selecting read-child with afr available now.
- Custom layouts possible with distribute translator.
- No 32-aux-gid limit
- SSL support for socket connections.
- Known issues with replica count greater than 2 addressed.
- quick-read and md-cache translators have been refactored.
- open-behind translator introduced.
- Ability to avoid glusterfs bind to reserved ports.
- statedumps are now created in /var/run/gluster instead of /tmp by default.
Major new features according to http://www.gluster.org/
- Elastic Volume Management
- New Gluster Console Manager
- Dynamic Storage for the data center and cloud