.gitignore +1 −0
@@ -33,6 +33,7 @@
 *.lzo
 *.patch
 *.gcno
+*.ll
 modules.builtin
 Module.symvers
 *.dwo

.mailmap +2 −0
@@ -146,6 +146,8 @@ Santosh Shilimkar <ssantosh@kernel.org>
 Santosh Shilimkar <santosh.shilimkar@oracle.org>
 Sascha Hauer <s.hauer@pengutronix.de>
 S.Çağlar Onur <caglar@pardus.org.tr>
+Sebastian Reichel <sre@kernel.org> <sre@debian.org>
+Sebastian Reichel <sre@kernel.org> <sebastian.reichel@collabora.co.uk>
 Shiraz Hashim <shiraz.linux.kernel@gmail.com> <shiraz.hashim@st.com>
 Shuah Khan <shuah@kernel.org> <shuahkhan@gmail.com>
 Shuah Khan <shuah@kernel.org> <shuah.khan@hp.com>

Documentation/00-INDEX +2 −0
@@ -412,6 +412,8 @@ sysctl/
 	- directory with info on the /proc/sys/* files.
 target/
 	- directory with info on generating TCM v4 fabric .ko modules
+tee.txt
+	- info on the TEE subsystem and drivers
 this_cpu_ops.txt
 	- List rationale behind and the way to use this_cpu operations.
 thermal/

Documentation/RCU/00-INDEX +1 −1
@@ -17,7 +17,7 @@ rcu_dereference.txt
 rcubarrier.txt
 	- RCU and Unloadable Modules
 rculist_nulls.txt
-	- RCU list primitives for use with SLAB_DESTROY_BY_RCU
+	- RCU list primitives for use with SLAB_TYPESAFE_BY_RCU
 rcuref.txt
 	- Reference-count design for elements of lists/arrays protected by RCU
 rcu.txt

Documentation/RCU/Design/Data-Structures/Data-Structures.html +169 −64
@@ -19,6 +19,8 @@ to each other.
 The <tt>rcu_state</tt> Structure</a>
 <li> <a href="#The rcu_node Structure">
 The <tt>rcu_node</tt> Structure</a>
+<li> <a href="#The rcu_segcblist Structure">
+The <tt>rcu_segcblist</tt> Structure</a>
 <li> <a href="#The rcu_data Structure">
 The <tt>rcu_data</tt> Structure</a>
 <li> <a href="#The rcu_dynticks Structure">
@@ -841,6 +843,134 @@ for lockdep lock-class names.
 Finally, lines 64-66 produce an error if the maximum number of
 CPUs is too large for the specified fanout.
+<h3><a name="The rcu_segcblist Structure">
+The <tt>rcu_segcblist</tt> Structure</a></h3>
+The <tt>rcu_segcblist</tt> structure maintains a segmented list of
+callbacks as follows:
+<pre>
+ 1 #define RCU_DONE_TAIL        0
+ 2 #define RCU_WAIT_TAIL        1
+ 3 #define RCU_NEXT_READY_TAIL  2
+ 4 #define RCU_NEXT_TAIL        3
+ 5 #define RCU_CBLIST_NSEGS     4
+ 6
+ 7 struct rcu_segcblist {
+ 8   struct rcu_head *head;
+ 9   struct rcu_head **tails[RCU_CBLIST_NSEGS];
+10   unsigned long gp_seq[RCU_CBLIST_NSEGS];
+11   long len;
+12   long len_lazy;
+13 };
+</pre>
+<p>
+The segments are as follows:
+<ol>
+<li> <tt>RCU_DONE_TAIL</tt>: Callbacks whose grace periods have elapsed.
+These callbacks are ready to be invoked.
+<li> <tt>RCU_WAIT_TAIL</tt>: Callbacks that are waiting for the
+current grace period.
+Note that different CPUs can have different ideas about which
+grace period is current, hence the <tt>->gp_seq</tt> field.
+<li> <tt>RCU_NEXT_READY_TAIL</tt>: Callbacks waiting for the next
+grace period to start.
+<li> <tt>RCU_NEXT_TAIL</tt>: Callbacks that have not yet been
+associated with a grace period.
+</ol>
+<p>
+The <tt>->head</tt> pointer references the first callback or
+is <tt>NULL</tt> if the list contains no callbacks (which is
+<i>not</i> the same as being empty).
+Each element of the <tt>->tails[]</tt> array references the
+<tt>->next</tt> pointer of the last callback in the corresponding
+segment of the list, or the list's <tt>->head</tt> pointer if
+that segment and all previous segments are empty.
+If the corresponding segment is empty but some previous segment is
+not empty, then the array element is identical to its predecessor.
+Older callbacks are closer to the head of the list, and new callbacks
+are added at the tail.
+This relationship between the <tt>->head</tt> pointer, the
+<tt>->tails[]</tt> array, and the callbacks is shown in this
+diagram:
+</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%">
+</p><p>In this figure, the <tt>->head</tt> pointer references the
+first RCU callback in the list.
+The <tt>->tails[RCU_DONE_TAIL]</tt> array element references
+the <tt>->head</tt> pointer itself, indicating that none
+of the callbacks is ready to invoke.
+The <tt>->tails[RCU_WAIT_TAIL]</tt> array element references callback
+CB 2's <tt>->next</tt> pointer, which indicates that
+CB 1 and CB 2 are both waiting on the current grace period,
+give or take possible disagreements about exactly which grace period
+is the current one.
+The <tt>->tails[RCU_NEXT_READY_TAIL]</tt> array element
+references the same RCU callback that <tt>->tails[RCU_WAIT_TAIL]</tt>
+does, which indicates that there are no callbacks waiting on the next
+RCU grace period.
+The <tt>->tails[RCU_NEXT_TAIL]</tt> array element references
+CB 4's <tt>->next</tt> pointer, indicating that all the
+remaining RCU callbacks have not yet been assigned to an RCU grace
+period.
+Note that the <tt>->tails[RCU_NEXT_TAIL]</tt> array element
+always references the last RCU callback's <tt>->next</tt> pointer
+unless the callback list is empty, in which case it references
+the <tt>->head</tt> pointer.
+<p>
+There is one additional important special case for the
+<tt>->tails[RCU_NEXT_TAIL]</tt> array element: It can be <tt>NULL</tt>
+when this list is <i>disabled</i>.
+Lists are disabled when the corresponding CPU is offline or when
+the corresponding CPU's callbacks are offloaded to a kthread,
+both of which are described elsewhere.
+</p><p>CPUs advance their callbacks from the
+<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the
+<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments
+as grace periods advance.
+</p><p>The <tt>->gp_seq[]</tt> array records grace-period
+numbers corresponding to the list segments.
+This is what allows different CPUs to have different ideas as to
+which is the current grace period while still avoiding premature
+invocation of their callbacks.
+In particular, this allows CPUs that go idle for extended periods
+to determine which of their callbacks are ready to be invoked after
+reawakening.
+</p><p>The <tt>->len</tt> counter contains the number of
+callbacks in <tt>->head</tt>, and the
+<tt>->len_lazy</tt> contains the number of those callbacks that
+are known to only free memory, and whose invocation can therefore
+be safely deferred.
+<p><b>Important note</b>: It is the <tt>->len</tt> field that
+determines whether or not there are callbacks associated with
+this <tt>rcu_segcblist</tt> structure, <i>not</i> the <tt>->head</tt>
+pointer.
+The reason for this is that all the ready-to-invoke callbacks
+(that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted
+all at once at callback-invocation time.
+If callback invocation must be postponed, for example, because a
+high-priority process just woke up on this CPU, then the remaining
+callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment.
+Either way, the <tt>->len</tt> and <tt>->len_lazy</tt> counts
+are adjusted after the corresponding callbacks have been invoked, and so
+again it is the <tt>->len</tt> count that accurately reflects whether
+or not there are callbacks associated with this <tt>rcu_segcblist</tt>
+structure.
+Of course, off-CPU sampling of the <tt>->len</tt> count requires
+the use of appropriate synchronization, for example, memory barriers.
+This synchronization can be a bit subtle, particularly in the case
+of <tt>rcu_barrier()</tt>.
 <h3><a name="The rcu_data Structure">
 The <tt>rcu_data</tt> Structure</a></h3>
@@ -983,62 +1113,18 @@ choice.
 as follows:
 <pre>
- 1 struct rcu_head *nxtlist;
- 2 struct rcu_head **nxttail[RCU_NEXT_SIZE];
- 3 unsigned long nxtcompleted[RCU_NEXT_SIZE];
- 4 long qlen_lazy;
- 5 long qlen;
- 6 long qlen_last_fqs_check;
+ 1 struct rcu_segcblist cblist;
+ 2 long qlen_last_fqs_check;
+ 3 unsigned long n_cbs_invoked;
+ 4 unsigned long n_nocbs_invoked;
+ 5 unsigned long n_cbs_orphaned;
+ 6 unsigned long n_cbs_adopted;
  7 unsigned long n_force_qs_snap;
- 8 unsigned long n_cbs_invoked;
- 9 unsigned long n_cbs_orphaned;
-10 unsigned long n_cbs_adopted;
-11 long blimit;
+ 8 long blimit;
 </pre>
-<p>The <tt>->nxtlist</tt> pointer and the
-<tt>->nxttail[]</tt> array form a four-segment list with
+<p>The <tt>->cblist</tt> structure is the segmented callback list
+described earlier.
-older callbacks near the head and newer ones near the tail.
-Each segment contains callbacks with the corresponding relationship
-to the current grace period.
-The pointer out of the end of each of the four segments is referenced
-by the element of the <tt>->nxttail[]</tt> array indexed by
-<tt>RCU_DONE_TAIL</tt> (for callbacks handled by a prior grace period),
-<tt>RCU_WAIT_TAIL</tt> (for callbacks waiting on the current grace period),
-<tt>RCU_NEXT_READY_TAIL</tt> (for callbacks that will wait on the next
-grace period), and
-<tt>RCU_NEXT_TAIL</tt> (for callbacks that are not yet associated
-with a specific grace period)
-respectively, as shown in the following figure.
-</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%">
-</p><p>In this figure, the <tt>->nxtlist</tt> pointer references the
-first RCU callback in the list.
-The <tt>->nxttail[RCU_DONE_TAIL]</tt> array element references
-the <tt>->nxtlist</tt> pointer itself, indicating that none
-of the callbacks is ready to invoke.
-The <tt>->nxttail[RCU_WAIT_TAIL]</tt> array element references callback
-CB 2's <tt>->next</tt> pointer, which indicates that
-CB 1 and CB 2 are both waiting on the current grace period.
-The <tt>->nxttail[RCU_NEXT_READY_TAIL]</tt> array element
-references the same RCU callback that <tt>->nxttail[RCU_WAIT_TAIL]</tt>
-does, which indicates that there are no callbacks waiting on the next
-RCU grace period.
-The <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element references
-CB 4's <tt>->next</tt> pointer, indicating that all the
-remaining RCU callbacks have not yet been assigned to an RCU grace
-period.
-Note that the <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element
-always references the last RCU callback's <tt>->next</tt> pointer
-unless the callback list is empty, in which case it references
-the <tt>->nxtlist</tt> pointer.
-</p><p>CPUs advance their callbacks from the
-<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the
-<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments
-as grace periods advance.
 The CPU advances the callbacks in its <tt>rcu_data</tt> structure
 whenever it notices that another RCU grace period has completed.
 The CPU detects the completion of an RCU grace period by noticing
@@ -1049,16 +1135,7 @@ Recall that each <tt>rcu_node</tt> structure's
 <tt>->completed</tt> field is updated at the end of each
 grace period.
-</p><p>The <tt>->nxtcompleted[]</tt> array records grace-period
-numbers corresponding to the list segments.
-This allows CPUs that go idle for extended periods to determine
-which of their callbacks are ready to be invoked after reawakening.
-</p><p>The <tt>->qlen</tt> counter contains the number of
-callbacks in <tt>->nxtlist</tt>, and the
-<tt>->qlen_lazy</tt> contains the number of those callbacks that
-are known to only free memory, and whose invocation can therefore
-be safely deferred.
+<p>
 The <tt>->qlen_last_fqs_check</tt> and
 <tt>->n_force_qs_snap</tt> coordinate the forcing of quiescent
 states from <tt>call_rcu()</tt> and friends when callback
@@ -1069,6 +1146,10 @@ lists grow excessively long.
 fields count the number of callbacks invoked,
 sent to other CPUs when this CPU goes offline,
 and received from other CPUs when those other CPUs go offline.
+The <tt>->n_nocbs_invoked</tt> is used when the CPU's callbacks
+are offloaded to a kthread.
+<p>
 Finally, the <tt>->blimit</tt> counter is the maximum number of
 RCU callbacks that may be invoked at a given time.
@@ -1104,6 +1185,9 @@ Its fields are as follows:
  1 int dynticks_nesting;
  2 int dynticks_nmi_nesting;
  3 atomic_t dynticks;
+ 4 bool rcu_need_heavy_qs;
+ 5 unsigned long rcu_qs_ctr;
+ 6 bool rcu_urgent_qs;
 </pre>
 <p>The <tt>->dynticks_nesting</tt> field counts the
@@ -1117,11 +1201,32 @@ NMIs are counted by the <tt>->dynticks_nmi_nesting</tt>
 field, except that NMIs that interrupt non-dyntick-idle execution
 are not counted.
-</p><p>Finally, the <tt>->dynticks</tt> field counts the corresponding
+</p><p>The <tt>->dynticks</tt> field counts the corresponding
 CPU's transitions to and from dyntick-idle mode, so that this counter
 has an even value when the CPU is in dyntick-idle mode and an odd
 value otherwise.
+</p><p>The <tt>->rcu_need_heavy_qs</tt> field is used
+to record the fact that the RCU core code would really like to
+see a quiescent state from the corresponding CPU, so much so that
+it is willing to call for heavy-weight dyntick-counter operations.
+This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
+code, which provide a momentary idle sojourn in response.
+</p><p>The <tt>->rcu_qs_ctr</tt> field is used to record
+quiescent states from <tt>cond_resched()</tt>.
+Because <tt>cond_resched()</tt> can execute quite frequently, this
+must be quite lightweight, as in a non-atomic increment of this
+per-CPU field.
+</p><p>Finally, the <tt>->rcu_urgent_qs</tt> field is used to record
+the fact that the RCU core code would really like to see a quiescent
+state from the corresponding CPU, with the various other fields indicating
+just how badly RCU wants this quiescent state.
+This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
+code, which, if nothing else, non-atomically increment <tt>->rcu_qs_ctr</tt>
+in response.
 <table>
 <tr><th> </th></tr>
 <tr><th align="left">Quick Quiz:</th></tr>
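The pointer conventions in the new <tt>rcu_segcblist</tt> text (an empty list points every <tt>->tails[]</tt> element at <tt>->head</tt>, an empty segment shares its predecessor's tail pointer, and <tt>->len</tt> rather than <tt>->head</tt> says whether callbacks are pending) can be exercised with a small user-space model. This is a sketch under stated assumptions, not the kernel's code: <tt>struct cb</tt>, <tt>struct segcblist</tt>, <tt>print_cb()</tt> and all <tt>segcb_*()</tt> helpers are hypothetical names; <tt>->gp_seq[]</tt>, <tt>->len_lazy</tt>, locking and memory ordering are omitted; and <tt>segcb_advance_one_gp()</tt> simply assumes that exactly one grace period ends per call.

```c
#include <assert.h>
#include <stdio.h>

/* Segment indices, mirroring the macros added by the patch. */
#define RCU_DONE_TAIL       0
#define RCU_WAIT_TAIL       1
#define RCU_NEXT_READY_TAIL 2
#define RCU_NEXT_TAIL       3
#define RCU_CBLIST_NSEGS    4

/* Hypothetical stand-in for struct rcu_head: a link plus an id. */
struct cb {
	struct cb *next;
	int id;
};

/* Simplified model of struct rcu_segcblist (no ->gp_seq[], no ->len_lazy). */
struct segcblist {
	struct cb *head;
	struct cb **tails[RCU_CBLIST_NSEGS];
	long len;
};

/* Empty list: every ->tails[] element references ->head itself. */
static void segcb_init(struct segcblist *l)
{
	l->head = NULL;
	for (int i = 0; i < RCU_CBLIST_NSEGS; i++)
		l->tails[i] = &l->head;
	l->len = 0;
}

/* New callbacks are appended to the RCU_NEXT_TAIL segment. */
static void segcb_enqueue(struct segcblist *l, struct cb *cb)
{
	assert(l->tails[RCU_NEXT_TAIL] != NULL); /* NULL means "disabled" */
	cb->next = NULL;
	*l->tails[RCU_NEXT_TAIL] = cb;
	l->tails[RCU_NEXT_TAIL] = &cb->next;
	l->len++;
}

/* Model assumption: one grace period ended, so each segment moves one step. */
static void segcb_advance_one_gp(struct segcblist *l)
{
	for (int i = RCU_DONE_TAIL; i < RCU_NEXT_TAIL; i++)
		l->tails[i] = l->tails[i + 1];
}

/* Detach the RCU_DONE_TAIL segment; ->len is deliberately left alone. */
static struct cb *segcb_extract_done(struct segcblist *l)
{
	struct cb **done_tail = l->tails[RCU_DONE_TAIL];
	struct cb *done = l->head;

	if (done_tail == &l->head)
		return NULL;                  /* nothing is ready */
	l->head = *done_tail;                 /* callbacks still waiting */
	*done_tail = NULL;                    /* terminate the sublist */
	for (int i = 0; i < RCU_CBLIST_NSEGS; i++)
		if (l->tails[i] == done_tail) /* segment is now empty */
			l->tails[i] = &l->head;
	return done;
}

/* Put postponed callbacks back at the front, in RCU_DONE_TAIL. */
static void segcb_insert_done(struct segcblist *l, struct cb *first)
{
	struct cb **last_next;

	if (!first)
		return;
	for (last_next = &first->next; *last_next; last_next = &(*last_next)->next)
		;
	*last_next = l->head;
	l->head = first;
	for (int i = RCU_DONE_TAIL; i < RCU_CBLIST_NSEGS && l->tails[i] == &l->head; i++)
		l->tails[i] = last_next;
}

static void print_cb(struct cb *cb) { printf("invoking CB %d\n", cb->id); }

/* Invoke at most 'limit' ready callbacks.  While they are detached,
 * ->head can be NULL with ->len nonzero; ->len drops only afterwards. */
static long segcb_invoke(struct segcblist *l, long limit, void (*fn)(struct cb *))
{
	struct cb *list = segcb_extract_done(l);
	long n = 0;

	while (list && n < limit) {
		struct cb *next = list->next; /* fn() may free the callback */
		fn(list);
		list = next;
		n++;
	}
	segcb_insert_done(l, list);           /* postponed ones go back */
	l->len -= n;
	return n;
}
```

For example, after four <tt>segcb_enqueue()</tt> calls and three <tt>segcb_advance_one_gp()</tt> calls, <tt>segcb_invoke(&l, 3, print_cb)</tt> invokes CB 1 through CB 3, returns 3, and leaves CB 4 at the head of the <tt>RCU_DONE_TAIL</tt> segment with <tt>->len</tt> equal to 1. Reinserting postponed callbacks at the front of the list keeps them ahead of everything still waiting, so invocation order is preserved.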
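The parity rule for <tt>->dynticks</tt> and the handshake described for <tt>->rcu_need_heavy_qs</tt>, <tt>->rcu_qs_ctr</tt> and <tt>->rcu_urgent_qs</tt> can be pictured with a single-threaded user-space model. It is a sketch, not kernel code: <tt>struct dynticks_model</tt> and the <tt>dt_*()</tt> helpers are hypothetical; a C11 <tt>atomic_int</tt> stands in for <tt>atomic_t</tt>; the model follows the even-means-idle convention stated in the text (the kernel's actual counter encoding may differ); and clearing each flag once the CPU has responded is a modelling assumption. Real cross-CPU sampling of these fields needs appropriate synchronization, which the model ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU model of the fields described above. */
struct dynticks_model {
	atomic_int dynticks;      /* even: dyntick-idle, odd: not idle */
	bool rcu_need_heavy_qs;   /* RCU wants a momentary idle sojourn */
	unsigned long rcu_qs_ctr; /* lightweight QS count from cond_resched() */
	bool rcu_urgent_qs;       /* RCU wants some quiescent state soon */
};

static void dt_init(struct dynticks_model *d)
{
	atomic_init(&d->dynticks, 1); /* CPU starts out running (odd) */
	d->rcu_need_heavy_qs = false;
	d->rcu_qs_ctr = 0;
	d->rcu_urgent_qs = false;
}

/* Each transition into or out of dyntick-idle flips the parity. */
static void dt_idle_enter(struct dynticks_model *d) { atomic_fetch_add(&d->dynticks, 1); }
static void dt_idle_exit(struct dynticks_model *d) { atomic_fetch_add(&d->dynticks, 1); }

/* Snapshot of the counter, as the RCU core would take it. */
static int dt_snapshot(struct dynticks_model *d) { return atomic_load(&d->dynticks); }

static bool dt_snap_in_idle(int snap) { return (snap & 1) == 0; }

/*
 * The CPU has passed through a quiescent state since 'snap' if it was
 * idle when the snapshot was taken or the counter has moved since then.
 */
static bool dt_qs_since(struct dynticks_model *d, int snap)
{
	return dt_snap_in_idle(snap) || dt_snapshot(d) != snap;
}

/*
 * Model of the cond_resched() check: do nothing unless RCU asked
 * urgently; if RCU also asked for a heavy-weight quiescent state, take a
 * momentary idle sojourn (two transitions, so the parity is odd again);
 * in either case do the cheap non-atomic ->rcu_qs_ctr increment.
 */
static void dt_cond_resched(struct dynticks_model *d)
{
	if (!d->rcu_urgent_qs)
		return;
	d->rcu_urgent_qs = false;
	if (d->rcu_need_heavy_qs) {
		d->rcu_need_heavy_qs = false;
		dt_idle_enter(d);
		dt_idle_exit(d);
	}
	d->rcu_qs_ctr++;
}
```

For example, a snapshot taken while the CPU is running (odd) followed by <tt>dt_idle_enter()</tt> and <tt>dt_idle_exit()</tt> makes <tt>dt_qs_since()</tt> return true. A <tt>dt_cond_resched()</tt> call with both flags set bumps <tt>->rcu_qs_ctr</tt> and advances <tt>->dynticks</tt> by two, so the parity is odd again but the counter no longer matches an earlier snapshot.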