=========================================================
Source:      Linux-2.6.29/Documentation/filesystems/dentry-locking.txt
Translation: JF Project < http://www.linux.or.jp/JF/ >
Updated:     2009/07/09
Translator:  Seiji Kaneko < skaneko at mbn dot or dot jp >
=========================================================
RCU-based dcache locking model
==============================

On many workloads, the most common dcache operation is looking up a
dentry given its parent dentry and the name of the child. Typically,
for every open(), stat() and similar call, the dentry corresponding to
the pathname is found by walking the tree: the walk starts with the
first component of the pathname, then uses the resulting dentry
together with the next component to look up the next level, and so on.
Because this operation is so frequent on workloads such as multiuser
environments and web servers, it is important to optimize this path.

Prior to 2.5.10, dcache_lock was acquired in d_lookup() and therefore
once for every component during a path look-up. From 2.5.10 onwards,
the fast-walk algorithm changed this by taking dcache_lock at the
beginning and walking as many cached path component dentries as
possible. This significantly decreased the number of dcache_lock
acquisitions. However, it also significantly increased the lock hold
time, which hurt performance on large SMP machines. Since the 2.5.62
kernel, dcache has used a new locking model that relies on RCU to make
dcache look-up lock-free.

The current dcache locking model is not very different from the
earlier one. Prior to the 2.5.62 kernel, dcache_lock protected the
hash chain and the d_child, d_alias and d_lru lists, as well as
d_inode and several other things such as mount look-up. The RCU-based
changes affect only the way the hash chain is protected. For
everything else, dcache_lock must still be taken both for traversal
and for updates. Hash chain updates also take dcache_lock. The
significant change is in how d_lookup() traverses the hash chain: it
does not acquire dcache_lock for this, and relies on RCU to ensure
that the dentry has not been *freed*.


Dcache locking details
======================

For many multi-user workloads, open() and stat() on files are very
frequent operations. Both involve walking path names to find the
dentry corresponding to the file concerned. In the 2.4 kernel,
dcache_lock was held during the look-up of each path component.
Contention on this global lock, and cache-line bouncing (the cache
line holding the lock being passed back and forth between processors),
caused significant scalability problems. With the introduction of RCU
in the Linux kernel, this was worked around by making the look-up of
path components during path walking lock-free.

Safe lock-free look-up of dcache hash table
===========================================

Dcache is a complex data structure in which the hash table entries are
also linked together in other lists. In the 2.4 kernel, dcache_lock
protected all of these lists. RCU is applied only to hash chain
walking; the rest of the lists are still protected by dcache_lock.
Some of the important changes are:

1. Deletion from a hash chain is done using the hlist_del_rcu() macro,
   which does not reinitialize the next pointer of the deleted dentry.
   This allows a walker to traverse the chain safely and lock-free
   while a deletion is in progress.

2. Insertion of a dentry into the hash table is done using
   hlist_add_head_rcu(), which takes care of ordering the writes: the
   writes to the dentry must be visible before the dentry is
   inserted. This works in conjunction with hlist_for_each_rcu() while
   walking the hash chain. The only requirement is that all
   initialization of the dentry must be done before
   hlist_add_head_rcu() is called, since there is no dcache_lock
   protection while traversing the hash chain. This is no different
   from the existing code.

3. A dentry looked up without holding dcache_lock cannot be returned
   for walking if it is unhashed. Such a dentry may have a NULL
   d_inode or be otherwise bogus, since RCU does not protect the other
   fields of the dentry. A flag, DCACHE_UNHASHED, is therefore used to
   mark unhashed dentries, in conjunction with a per-dentry lock
   (d_lock). Once a dentry has been found without dcache_lock, the
   look-up acquires d_lock and checks whether the dentry is unhashed.
   If it is, the look-up fails. If not, the reference count of the
   dentry is incremented and the dentry is returned.

4. Once a dentry has been looked up, it must be guaranteed not to go
   away during the path walk for that component. In pre-2.5.10 code,
   this was done by holding a reference to the dentry. dcache_rcu does
   the same. In some sense, dcache_rcu path walking looks like the
   pre-2.5.10 version.

5. All dentry hash chain updates must take dcache_lock and then the
   per-dentry lock, in that order. dput() does this to ensure that a
   dentry that has just been looked up on another CPU is not deleted
   before dget() can be done on it.

6. There are several ways to do reference counting of RCU-protected
   objects. One example is the ipv4 route cache, where deferred
   freeing (using call_rcu()) is started as soon as the reference
   count drops to zero. This cannot be done for dentries, because
   tearing down a dentry requires blocking (dentry_iput()), which is
   not supported from RCU callbacks. Instead, dentries are torn down
   synchronously in dput(), but the actual freeing happens later, once
   the RCU grace period is over. This allows safe lock-free walking of
   the hash chains, but a matched dentry may already have been
   partially torn down. Checking the DCACHE_UNHASHED flag with d_lock
   held detects such dentries and prevents them from being returned
   from look-up.

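The check described in items 3 to 5 can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not
the kernel's __d_lookup(): struct model_dentry, model_lookup() and the
spin helpers are hypothetical names, d_lock is modelled with a C11
atomic_flag, and RCU itself (rcu_read_lock(), grace periods) is left
out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MODEL_UNHASHED 0x0010u  /* stands in for DCACHE_UNHASHED */

/* Hypothetical, heavily simplified stand-in for struct dentry. */
struct model_dentry {
    struct model_dentry *next;    /* hash chain link, NULL-terminated */
    struct model_dentry *parent;  /* models d_parent */
    const char *name;             /* models d_name */
    unsigned int flags;           /* models d_flags, protected by lock */
    int count;                    /* models d_count, protected by lock */
    atomic_flag lock;             /* models the per-dentry d_lock */
};

void model_dentry_init(struct model_dentry *d,
                       struct model_dentry *parent, const char *name)
{
    d->next = NULL;
    d->parent = parent;
    d->name = name;
    d->flags = 0;
    d->count = 0;
    atomic_flag_clear(&d->lock);
}

static void model_lock(struct model_dentry *d)
{
    while (atomic_flag_test_and_set_explicit(&d->lock,
                                             memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

static void model_unlock(struct model_dentry *d)
{
    atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/*
 * Walk one hash chain without any global lock. A candidate is only
 * trusted after taking its own lock and re-checking the parent, the
 * unhashed flag and the name; on success a reference is taken before
 * the lock is dropped.
 */
struct model_dentry *model_lookup(struct model_dentry *chain,
                                  struct model_dentry *parent,
                                  const char *name)
{
    struct model_dentry *found = NULL;

    for (struct model_dentry *d = chain; d != NULL; d = d->next) {
        if (d->parent != parent)
            continue;             /* cheap unlocked filter */
        model_lock(d);
        if (d->parent == parent &&
            !(d->flags & MODEL_UNHASHED) &&
            strcmp(d->name, name) == 0) {
            d->count++;           /* take the reference under the lock */
            found = d;
        }
        model_unlock(d);
        if (found)
            break;
    }
    return found;
}
```

In this model, a dentry whose flags include MODEL_UNHASHED is never
returned and its count is left untouched, which mirrors how a
partially torn-down dentry is kept out of look-up results.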

Maintaining POSIX rename semantics
==================================

Since look-up of dentries is lock-free, it can race against a
concurrent rename operation. For example, during a rename of file A to
B, a look-up of either A or B must succeed. However, if the look-up of
B happens after A has been removed from its hash chain but before it
has been added to the new hash chain, the look-up may fail. Also, a
name comparison performed while the name is being rewritten by a
concurrent rename may produce false positive matches, violating rename
semantics. Issues related to races with rename are handled as
described below:

1. Look-up can be done in two ways - d_lookup(), which is safe against
   simultaneous renames, and __d_lookup(), which is not. If
   __d_lookup() fails, it must be followed up by a d_lookup() to
   determine correctly whether a dentry is in the hash table or
   not. d_lookup() protects look-ups using a sequence lock
   (rename_lock).

2. The name associated with a dentry (d_name) may change if a rename
   is allowed to happen simultaneously. To prevent memcmp() in
   __d_lookup() from running out of bounds because of a rename, and to
   avoid false positive comparisons, the name comparison is done while
   holding the per-dentry lock. This prevents concurrent renames
   during the comparison.

3. Hash table walking during look-up may wander into a different
   bucket when the current dentry is moved to a different bucket by a
   rename. However, the dcache hash table uses hlists, which are
   null-terminated, so even if a dentry moves to a different bucket
   the hash chain walk will terminate. [With a list_head list it might
   not, since termination happens only when the list_head of the
   original bucket is reached.] Because the d_parent check and the
   name comparison are redone while holding d_lock, lock-free look-up
   does not race against d_move().

4. There is a theoretical race when a dentry keeps coming back to its
   original bucket because of double moves. In that case the look-up
   may conclude that the dentry has never moved and can end up in an
   infinite loop. This is, however, no worse than the theoretical
   livelocks that already exist in the kernel.

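The retry described in item 1 can be sketched in userspace as follows.
The sketch assumes a simplified sequence counter in place of the
kernel's seqlock; struct model_seqlock, the model_* functions, struct
race_ctx and racing_fast_lookup() are hypothetical names used only for
illustration. The slow path repeats the lock-free look-up only when
the look-up missed and a rename ran in the meantime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a sequence lock such as rename_lock. */
struct model_seqlock {
    atomic_uint sequence;         /* odd while a writer is active */
};

void model_write_begin(struct model_seqlock *sl)
{
    atomic_fetch_add(&sl->sequence, 1u);   /* becomes odd */
}

void model_write_end(struct model_seqlock *sl)
{
    atomic_fetch_add(&sl->sequence, 1u);   /* becomes even again */
}

unsigned int model_read_begin(struct model_seqlock *sl)
{
    unsigned int seq;

    /* Wait until no writer is in progress, then remember the value. */
    while ((seq = atomic_load(&sl->sequence)) & 1u)
        ;
    return seq;
}

bool model_read_retry(struct model_seqlock *sl, unsigned int start)
{
    return atomic_load(&sl->sequence) != start;
}

/* Stand-in for the lock-free look-up; returns NULL on a miss. */
typedef void *(*fast_lookup_fn)(void *ctx);

/*
 * Rename-safe wrapper: a hit is returned immediately, a miss is only
 * believed if no rename happened while the fast look-up was running.
 */
void *model_d_lookup(struct model_seqlock *rename_lock,
                     fast_lookup_fn fast, void *ctx)
{
    void *dentry;
    unsigned int seq;

    do {
        seq = model_read_begin(rename_lock);
        dentry = fast(ctx);
        if (dentry)
            break;
    } while (model_read_retry(rename_lock, seq));
    return dentry;
}

/* Demo only: the first call misses because a rename runs under it. */
struct race_ctx {
    struct model_seqlock *lock;
    int calls;
    void *result;
};

void *racing_fast_lookup(void *p)
{
    struct race_ctx *ctx = p;

    if (ctx->calls++ == 0) {
        model_write_begin(ctx->lock);   /* simulated concurrent rename */
        model_write_end(ctx->lock);
        return NULL;
    }
    return ctx->result;
}
```

With racing_fast_lookup() as the fast path, the first attempt misses,
the sequence check notices the simulated rename, and the second
attempt returns the entry. A miss with an unchanged sequence is
reported as a genuine miss after a single attempt.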

Important guidelines for filesystem developers related to dcache_rcu
====================================================================

1. The existing dcache interfaces (pre-2.5.62) exported to filesystems
   do not change; only the internal dcache implementation changes.
   However, filesystems *must not* delete entries from the dentry hash
   chains directly using the list macros, as was allowed earlier. They
   must use dcache APIs such as d_drop() or __d_drop(), depending on
   the situation.

2. d_flags is now protected by a per-dentry lock (d_lock). All access
   to d_flags must be done with this lock held.

3. For a hashed dentry, checking d_count must be done with d_lock
   held.

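Guidelines 1 and 2 can be illustrated with a userspace model of an
unhash helper. This is a sketch, not the kernel's d_drop(): struct
model_node, struct model_dentry and the model_* functions are
hypothetical, both locks are modelled with C11 atomic_flag spinlocks,
and memory barriers and RCU grace periods are ignored. It shows the
dcache_lock then d_lock ordering, the flag being changed only under
d_lock, and a removal that leaves the next pointer intact for
concurrent walkers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_UNHASHED 0x0010u  /* stands in for DCACHE_UNHASHED */

/* hlist-style node: forward pointer plus pointer to the previous link. */
struct model_node {
    struct model_node *next, **pprev;
};

struct model_dentry {
    struct model_node hash;       /* models d_hash */
    unsigned int flags;           /* models d_flags, protected by d_lock */
    atomic_flag d_lock;           /* models the per-dentry d_lock */
};

atomic_flag model_dcache_lock = ATOMIC_FLAG_INIT;  /* global lock */

static void model_spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void model_spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

void model_dentry_init(struct model_dentry *d)
{
    d->hash.next = NULL;
    d->hash.pprev = NULL;
    d->flags = MODEL_UNHASHED;    /* not on any chain yet */
    atomic_flag_clear(&d->d_lock);
}

/* Single-threaded setup helper; real insertion needs write ordering. */
void model_hash_add(struct model_node **head, struct model_dentry *d)
{
    d->hash.next = *head;
    d->hash.pprev = head;
    if (*head)
        (*head)->pprev = &d->hash.next;
    *head = &d->hash;
    d->flags &= ~MODEL_UNHASHED;
}

/* Caller holds both locks. The node's own next pointer is kept. */
void model___d_drop(struct model_dentry *d)
{
    if (!(d->flags & MODEL_UNHASHED)) {
        d->flags |= MODEL_UNHASHED;
        *d->hash.pprev = d->hash.next;
        if (d->hash.next)
            d->hash.next->pprev = d->hash.pprev;
        d->hash.pprev = NULL;
    }
}

/* Lock order: global lock first, then the per-dentry lock. */
void model_d_drop(struct model_dentry *d)
{
    model_spin_lock(&model_dcache_lock);
    model_spin_lock(&d->d_lock);
    model___d_drop(d);
    model_spin_unlock(&d->d_lock);
    model_spin_unlock(&model_dcache_lock);
}

/* Reading the flag also goes through the per-dentry lock. */
bool model_d_unhashed(struct model_dentry *d)
{
    bool unhashed;

    model_spin_lock(&d->d_lock);
    unhashed = (d->flags & MODEL_UNHASHED) != 0;
    model_spin_unlock(&d->d_lock);
    return unhashed;
}
```

A filesystem written against this model would call model_d_drop()
rather than unlinking the node itself, so the flag and the chain can
never disagree. Dropping an already unhashed entry is a no-op.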

Papers and other documentation on dcache locking
================================================

1. Scaling dcache with RCU (http://linuxjournal.com/article.php?sid=7124).

2. http://lse.sourceforge.net/locking/dcache/dcache.html



