UMYU Scientifica

A periodical of the Faculty of Natural and Applied Sciences, UMYU, Katsina

ISSN: 2955 – 1145 (print); 2955 – 1153 (online)

Image

ORIGINAL RESEARCH ARTICLE

Two-step Inertial Self-Adaptive Gradient Methods for the Split Feasibility Problem with Application

1*L.B. Mohammed, 2B.I. Babura, and 3A. Ibrahim

1Department of Mathematics, Faculty of Physical Sciences, Federal University Dutse, PMB 7156, Dutse, Jigawa State, Nigeria.

Corresponding Author: L.B. Mohammed [email protected]

Abstract

This paper introduces novel two-step inertial self-adaptive gradient algorithms (TISGA) for solving the Split Feasibility Problem (SFP) in Hilbert spaces. The proposed method integrates dual inertial parameters \(\gamma_{1} \in (0,\frac{1}{2}\rbrack\) and \(\gamma_{2} \leq 0\) satisfying \(\gamma_{1} + \gamma_{2} \geq 0\), together with a self-adaptive step-size \(\lambda_{n} = \frac{\beta\rho h(w_{n})}{\parallel \nabla h(w_{n}) \parallel^{2}}\) where \(\rho \in (0,4)\) and \(\beta > 0\) is chosen to satisfy specific bounding conditions. This approach eliminates the requirement for prior knowledge of the bounded linear operator’s norm—a value often challenging to determine in practice. Under mild assumptions on the underlying operators, we establish weak convergence of the algorithm to a solution of the SFP. Numerical examples and an application are given to justify the theoretical results presented. The self-adaptive mechanism ensures practical implementability without norm estimation, while the two-step inertial acceleration potentially enhances convergence speed. This work extends and generalizes several established results in the existing literature, including one-step inertial methods and Euclidean-space two-step methods.

Keywords: Fixed Point Problem; Iterative algorithm; Nonlinear Mappings; Weak and Strong Convergent.

2020 MSC: 47H09, 47H10, 47J25.

INTRODUCTION

Consider Hilbert spaces \(\mathcal{H}_{1}\) and \(\mathcal{H}_{2}\), with nonempty closed convex subsets \(\mathcal{C \subseteq}\mathcal{H}_{1}\) and \(\mathcal{Q \subseteq}\mathcal{H}_{2}\), and a bounded linear operator \(\mathcal{B:}\mathcal{H}_{1} \rightarrow \mathcal{H}_{2}\). The SFP is then to determine a point

\(x\mathcal{\in C}\text{ such that }\mathcal{B}x\mathcal{\in Q.}\) (1)

This problem has found extensive applications in diverse fields, including signal processing, medical imaging, intensity-modulated radiation therapy, and control theory, making it a cornerstone of modern optimization research (see Byrne, 2003; Ansari & Rehan, 2014; Censor et al., 2006; and references therein).

Traditional methods for solving the SFP, such as the CQ algorithm and its variants, rely on projection-based techniques (see Kilicman & Mohammed, 2016, 2018; Byrne, 2002; Mohammed et al., 2025; Anh et al., 2018; Kesornprom, 2020; Ma & Liu, 2022). One limitation of the CQ algorithm (see Byrne, 2003) is the requirement to compute or estimate the norm of the bounded linear operator at each iteration. This task can be challenging in many scenarios, posing a significant obstacle to the algorithm’s practical implementation. To address this issue, Qu and Xiu (2005) proposed a modified version of the CQ algorithm by incorporating the Armijo line-search procedure to determine the step size. However, their approach involves computing the value of \(h(x)\) multiple times during the line-search step to find the smallest nonnegative integer. This can become computationally expensive, particularly when the set \(\mathcal{Q}\) has a complex structure, making the evaluation of \(h(x)\) and the line-search procedure costly, as reported by Vinh et al. (2023).

To overcome these constraints, gradient-based approaches are commonly used for their computational effectiveness. However, a key limitation of such methods lies in their reliance on an appropriate step-size selection, which often demands prior estimation of the operator norm—a value that can be impractical to determine.

The CQ algorithm operates as a gradient-projection technique applied to the optimization problem

\[\min_{x\mathcal{\in C}}h(x),\]

where \(h(x): = \frac{1}{2}\mathcal{\parallel B}x - P_{\mathcal{Q}}\mathcal{(B}x) \parallel^{2}\) for all \(x \in \mathcal{H}_{1}\). This objective function \(h\) is both convex and differentiable, possessing a Lipschitz-continuous gradient of the form:

\[\nabla h(x) = \mathcal{B}^{*}(I - P_{\mathcal{Q}}\mathcal{)B}x.\]

To circumvent the need for operator norm estimation, López et al. (2012) proposed an alternative iterative scheme. Beginning from an arbitrary \(x_{0} \in \mathcal{H}_{1}\), the algorithm generates subsequent iterates via:

\[x_{n + 1} = P_{\mathcal{C}}\left( I - \tau_{n}\mathcal{B}^{*}(I - P_{\mathcal{Q}}\mathcal{)B} \right)x_{n},\quad\forall n\mathbb{\in N,}\]

where each step size \(\tau_{n}\) is adaptively computed according to Polyak’s (1987) rule:

\[\tau_{n}: = \rho\frac{h(x_{n})}{\parallel \nabla h(x_{n}) \parallel^{2}},\quad 0 < \rho < 4.\]

This approach avoids the need for repeated evaluations of \(h(x)\) during the line search, thereby reducing computational overhead, especially in cases where \(\mathcal{Q}\) has a complicated structure.

Other step sizes have been considered by numerous scholars. Notable examples include:

Dynamic Step Size by López et al. (2012)

\(\tau_{n}: = \frac{\rho_{n} \parallel (I - P_{\mathcal{Q}}\mathcal{)B}x_{n} \parallel^{2}}{\parallel \mathcal{B}^{*}(I - P_{\mathcal{Q}}\mathcal{)B}x_{n} \parallel^{2}},\quad\rho_{n} \in (0,4).\) (2)

Adaptive Step Size by Anh et al. (2018)

\(\tau_{n}: = \frac{\rho_{n}}{max\{ 1, \parallel \mathcal{B}^{*}(I - P_{\mathcal{Q}}\mathcal{)B}x_{n} \parallel \}},\) (3)

where the sequence \(\{\rho_{n}\}\) satisfies \(\lim_{n \rightarrow \infty}\rho_{n} = 0\) and \(\sum_{n \geq 0}^{}\rho_{n} = \infty\).

The inertial method, first established by Attouch and Alvarez (2001) for analyzing maximal monotone operators, has gained widespread recognition for solving the SFP, inspiring the development of numerous inertial-type algorithms in the literature (see Suantai et al., 2018; Sahu et al., 2021; Shehu & Gibali, 2021; Vinh et al., 2023; Alvarez & Attouch, 2001; Mohammed et al., 2026). In recent years, these techniques have proven to be a highly effective approach for enhancing the convergence rate of iterative procedures.

This motivated Vinh et al. (2023) to propose an inertial approach combined with Polyak’s step size to design an algorithm specifically tailored for solving the SFP. This method incorporates the projection of the inertial term onto the feasible set. Under standard assumptions concerning the operators and parameters involved, they demonstrated the weak convergence of the introduced iterative scheme, which is defined as follows:

\(\left\{ \begin{matrix} w_{n} = P_{\mathcal{C}}(x_{n} + \tau_{n}(x_{n} - x_{n - 1})), \\ x_{n + 1} = w_{n} - \lambda_{n}\nabla h(w_{n}),\quad\forall n \geq 1, \end{matrix} \right.\ \) (4)

where the step size is given by

\[\tau_{n} = \rho_{n}\frac{h(w_{n})}{\parallel \nabla h(w_{n}) \parallel^{2}}.\]

The research presented in [22] demonstrated that the conventional one-step inertial technique

\[w_{n} = x_{n} + \tau(x_{n} - x_{n - 1}),\quad\tau \in \lbrack 0,1),\]

can, in fact, fail to achieve accelerated convergence, a point also discussed in Okeke and Adamu (2023). As noted in Liang (2016), incorporating inertial terms that consider more than just the two most recent iterates can be more effective for acceleration. For instance, a two-step inertial method of the form

\[y_{n} = x_{n} + \alpha(x_{n} - x_{n - 1}) + \delta(x_{n - 1} - x_{n - 2}),\]

with parameters \(\alpha > 0\) and \(\delta < 0\), has been shown to successfully enhance convergence speed. The limitations of the one-step inertial approach are further elaborated upon in Poon and Liang (2019).

In light of the constraints associated with one-step inertial acceleration, Polyak (1987) proposed that employing multi-step inertial approaches could improve the rate of convergence in optimization processes, though no convergence results were provided. Recent studies, such as those in Mohammed et al. (2026), Okeke and Adamu (2023), Poon and Liang (2019), and Adamu et al. (2025), have explored multi-step inertial algorithms. These multi-step approaches have shown significant advantages over one-step inertial algorithms, highlighting the potential of multi-step techniques to overcome the limitations of one-step methods, offering faster and more efficient solutions for optimization problems.

Recently, Adamu et al. (2025) considered the following two-step inertial CQ method for the SFP:

\(\left\{ \begin{matrix} w_{n} = x_{n} + \tau(x_{n} - x_{n - 1}) + \gamma(x_{n - 1} - x_{n - 2}), \\ x_{n + 1} = P_{\mathcal{C}}(w_{n} - \eta_{n}\nabla h(w_{n})),\quad\forall n \geq 1, \end{matrix} \right.\ \) (5)

where \(\eta_{n}: = \left\{ \begin{matrix} \frac{\sigma h(w_{n})}{\parallel \nabla h(w_{n}) \parallel^{2}}, & \parallel \nabla h(w_{n}) \parallel \neq 0, \\ 0, & \text{otherwise}. \end{matrix} \right.\ \)

Under specific conditions applied to the parameters and operators involved, the authors successfully demonstrated the weak convergence of their algorithm. We note, however, that this result was confined to Euclidean space. Building on these results, our study seeks to:

Develop a new two-step inertial algorithm featuring a self-adaptive step size to address the SFP. The key innovation involves the projection of the inertial term onto the constraint set \(\mathcal{C}\), representing a strategic departure from existing methodologies that is expected to enhance convergence of an algorithm.

Establish a theoretical framework for the proposed algorithm under standard assumptions.

Demonstrate how the integrated approach of inertial acceleration and self-adaptivity eliminates the requirement for prior knowledge of the operator norm of the bounded linear mapping.

The organization of this manuscript is outlined below. Section 2 introduces the necessary preliminary concepts. Section 3 presents the proposed algorithm and its convergence analysis. Section 4 provides numerical experiments. Section 5 offers concluding remarks.

PRELIMINARIES

This section introduces foundational definitions and lemmas that will be used throughout the paper.

Let \(u\) be an element of the Hilbert space \(\mathcal{H}_{1}\). The metric projection onto \(\mathcal{Q}\), denoted by \(P_{\mathcal{Q}}u\), is the unique point in \(\mathcal{Q}\) that minimizes the distance from \(u\) to the set. This is characterized by the following condition:

\[\parallel u - P_{\mathcal{Q}}u \parallel = \min_{v\mathcal{\in Q}} \parallel u - v \parallel .\]

Remark 1. The metric projection operator \(P_{\mathcal{Q}}:\mathcal{H}_{1}\mathcal{\rightarrow Q}\) possesses the following key properties for all \(u \in \mathcal{H}_{1}\) and all \(v\mathcal{\in Q}\):

\(\langle u - P_{\mathcal{Q}}u,\mspace{6mu} v - P_{\mathcal{Q}}u\rangle \leq 0\) (variational inequality);

\(\parallel u - P_{\mathcal{Q}}u \parallel^{2} \leq \langle u - P_{\mathcal{Q}}u,\mspace{6mu} u - v\rangle\);

\(\parallel P_{\mathcal{Q}}u - v \parallel^{2} \leq \parallel u - v \parallel^{2} - \parallel u - P_{\mathcal{Q}}u \parallel^{2}\).

Definition 2. Let \(f:\mathcal{H}_{1}\mathbb{\rightarrow R}\) be a convex function on a Hilbert space \(\mathcal{H}_{1}\). The function \(f\) is subdifferentiable at \(u \in \mathcal{H}_{1}\) if its subdifferential at \(u\), \(\partial f(u) = \{ w \in \mathcal{H}_{1} \mid f(v) \geq f(u) + \langle w,v - u\rangle\mspace{6mu}\text{ for all }v \in \mathcal{H}_{1}\},\) is nonempty. Any element \(w \in \partial f(u)\) is called a subgradient, and the defining inequality is known as the subgradient inequality.

Remark 3. If a convex function \(f\) is Fréchet differentiable at \(u\), its subdifferential consists solely of its gradient; that is, \(\partial f(u) = \{\nabla f(u)\}\).

Definition 4. A function \(f:\mathcal{H}_{1}\mathbb{\rightarrow R}\) is lower semicontinuous at a point \(u\) if for every sequence \(\{ u_{n}\}\) in \(\mathcal{H}_{1}\) converging to \(u\), \(f(u) \leq \liminf_{n \rightarrow \infty}f(u_{n}).\) The function is said to be lower semicontinuous on \(\mathcal{H}_{1}\) if this property holds at every point in the space.

Lemma 5 (Okeke and Adamu (2023)). Let \(u,v,w \in \mathcal{H}_{1}\) and let \(\alpha,\gamma\mathbb{\in R}\). Then the following identities hold:

\(2\langle u,v\rangle = \parallel u \parallel^{2} + \parallel v \parallel^{2} - \parallel u - v \parallel^{2} = \parallel u + v \parallel^{2} - \parallel u \parallel^{2} - \parallel v \parallel^{2}\);

\[\begin{aligned} \parallel (1 + \alpha)u - (\alpha - \gamma)v - \gamma w \parallel^{2} & = (1 + \alpha) \parallel u \parallel^{2} - (\alpha - \gamma) \parallel v \parallel^{2} - \gamma \parallel w \parallel^{2} \\ & \quad + (1 + \alpha)(\alpha - \gamma) \parallel u - v \parallel^{2} \\ & \quad + \gamma(1 + \alpha) \parallel u - w \parallel^{2} - \gamma(\alpha - \gamma) \parallel v - w \parallel^{2}. \end{aligned}\]

Lemma 6 Aubin (2013). Let \(F:\mathcal{H}_{1}\mathbb{\rightarrow R}\) be a function defined by \(F(u): = \frac{1}{2}\mathcal{\parallel B}u - P_{\mathcal{Q}}\mathcal{(B}u) \parallel^{2},\quad\forall u \in \mathcal{H}_{1}.\) Then the following properties hold:

\(F\) is convex and differentiable;

\(F\) is weakly lower semicontinuous (w-lsc) on \(\mathcal{H}_{1}\);

\(\nabla F(u) = \mathcal{B}^{*}(I - P_{\mathcal{Q}}\mathcal{)B}u\) for all \(u \in \mathcal{H}_{1}\);

\(\nabla F\) is \(\frac{1}{\mathcal{\parallel B} \parallel^{2}}\)-inverse strongly monotone, that is,

\[\langle\nabla F(u) - \nabla F(v),u - v\rangle \geq \frac{1}{\mathcal{\parallel B} \parallel^{2}} \parallel \nabla F(u) - \nabla F(v) \parallel^{2},\quad\forall u,v \in \mathcal{H}_{1}.\]

Lemma 7 (Opial (1967)). Let \(\mathcal{S}\) be a nonempty subset of a Hilbert space \(\mathcal{H}_{1}\), and let \(\{ u_{n}\}_{n\mathbb{\in N}}\) be a sequence in \(\mathcal{H}_{1}\) satisfying:

For each \(z\mathcal{\in S}\), \(\lim_{n \rightarrow \infty} \parallel u_{n} - z \parallel\) exists;

\(\Omega_{w}(u_{n}\mathcal{) \subseteq S}\), where \(\Omega_{w}(u_{n})\) denotes the weak \(\omega\)-limit set of \(\{ u_{n}\}\).

Then \(\{ u_{n}\}\) converges weakly to some \(u^{*}\mathcal{\in S}\).

MAIN RESULTS

This section outlines the principal contributions of this paper. Throughout, we denote by \(\mathcal{S}\) the solution set of the split feasibility problem stated in (1). That is,

\(\mathcal{S =}\left\{ x\mathcal{\in C}\text{ such that }\mathcal{B}x\mathcal{\in Q} \right\}.\) (6)

Let \(P_{\mathcal{C}}:\mathcal{H}_{1}\mathcal{\rightarrow C}\) and \(P_{\mathcal{Q}}:\mathcal{H}_{2}\mathcal{\rightarrow Q}\) denote the metric projection operators onto \(\mathcal{C}\) and \(\mathcal{Q}\), respectively. Furthermore, let \(\mathcal{B:}\mathcal{H}_{1} \rightarrow \mathcal{H}_{2}\) be a bounded linear operator with adjoint \(\mathcal{B}^{*}\).

Consider the sequence \(\{ x_{n}\}\) generated by the following iterative scheme:

Algorithm 1: Two-step Inertial Self-Adaptive Gradient Algorithm (TISGA)

Initialization: Choose arbitrary initial points \(x_{0},x_{1},x_{2} \in \mathcal{H}_{1}\). Let:

\(\rho \in (0,4)\),

\(\gamma_{1} \in (0,\frac{1}{2}\rbrack\), \(\gamma_{2} \leq 0\), such that \(\gamma_{1} + \gamma_{2} \geq 0\),

\(\beta > 0\) such that \(\beta \leq min\left\{ \frac{(4 - \rho)(1 - \gamma_{1} + \gamma_{2})}{2\gamma_{1}\rho},\frac{(4 - \rho)A}{\rho(1 + \gamma_{1})(\gamma_{1} - \gamma_{2})},\frac{(4 - \rho)B}{\rho(1 + \gamma_{1} - \gamma_{2})(\gamma_{1} - \gamma_{2})} \right\}\) where \(A = (\gamma_{1}^{2} + \gamma_{1}\gamma_{2} - 2\gamma_{1} + \gamma_{2} + 1)\), \(B = (\gamma_{1}^{2} + \gamma_{2}^{2} + 2\gamma_{1}\gamma_{2} - 2\gamma_{1} + 2\gamma_{2} + 1).\)

Set \(n \leftarrow 2\).

Given \(x_{n - 2},x_{n - 1},x_{n}\).

repeat

Step 1: Compute inertial point and project:

\[v_{n} = x_{n} + \gamma_{1}(x_{n} - x_{n - 1}) + \gamma_{2}(x_{n - 1} - x_{n - 2})\]

\[w_{n} = P_{\mathcal{C}}(v_{n})\]

Step 2: Compute gradient and step size:

\[\nabla h(w_{n}) = \mathcal{B}^{*}(I - P_{\mathcal{Q}}\mathcal{)B}w_{n}\]

If \(\nabla h\left( w_{n} \right) \neq 0,\) then

\(\lambda_{n} \leftarrow \frac{\rho \cdot h\left( w_{n} \right)}{\parallel \nabla h\left( w_{n} \right) \parallel^{2}}\), where \(h\left( w_{n} \right) = \left\| \left( I - P_{\mathcal{Q}} \right)\mathcal{B}w_{n} \right\|^{2}\),

lse

\(\lambda_{n} \leftarrow 0\)

\[end\ if\]

Step 3: Update:

\[x_{n + 1} = w_{n} - \beta\lambda_{n}\nabla h(w_{n})\]

Step 4: \(n \leftarrow n + 1\)

Until the sequence converges.

Remark 8. Key features of TISGA:

The inertial term is projected onto \(\mathcal{C}\) in Step 1, differing from existing approaches.

When \(\gamma_{2} = 0\), the algorithm reduces to an inertial method proposed by Alvarez and Attouch (2001).

When \(\beta = 1\), \(\gamma_{1} = 0\), and \(\gamma_{2} = 0\), the algorithm reduces to the standard gradient method.

Lemma 9. Let \(\{ x_{n}\}\) be the sequence generated by Algorithm 1. Then, for each \(q\mathcal{\in S}\), \(\lim_{n \rightarrow \infty}{\parallel x_{n} - q \parallel \ }\)exist.

Proof. Let \(q\mathcal{\in S}\). Then

\[\begin{aligned} \parallel x_{n + 1} - q \parallel^{2} & = \parallel w_{n} - \beta\lambda_{n}\nabla h(w_{n}) - q \parallel^{2} \\ & = \parallel (1 - \beta)(w_{n} - q) + \beta(w_{n} - \lambda_{n}\nabla h(w_{n}) - q) \parallel^{2} \\ & \leq (1 - \beta) \parallel w_{n} - q \parallel^{2} + \beta \parallel w_{n} - \lambda_{n}\nabla h(w_{n}) - q \parallel^{2} \\ & = \parallel w_{n} - q \parallel^{2} - 2\beta\lambda_{n}\langle\nabla h(w_{n}),w_{n} - q\rangle + \beta\lambda_{n}^{2} \parallel \nabla h(w_{n}) \parallel^{2}.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (7) \end{aligned}\]

Next, we analyze the inner product term \(\langle\nabla h(w_{n}),w_{n} - q\rangle\). Recall that \(\nabla h(w_{n}) = \mathcal{B}^{*}(I - P_{\mathcal{Q}}\mathcal{)B}w_{n}\) and \(h(w_{n}) = \frac{1}{2} \parallel (I - P_{\mathcal{Q}}\mathcal{)B}w_{n} \parallel^{2}\).

For any \(q\mathcal{\in S}\), we have \(\mathcal{B}q\mathcal{\in Q}\), so \(P_{\mathcal{Q}}\mathcal{(B}q\mathcal{) = B}q\). Using this and the firmly nonexpansive property of \(I - P_{\mathcal{Q}}\), we obtain:

\[\begin{aligned} \langle\nabla h(w_{n}),w_{n} - q\rangle & = \langle\mathcal{B}^{*}(I - P_{\mathcal{Q}}\mathcal{)B}w_{n},w_{n} - q\rangle \\ & = \langle(I - P_{\mathcal{Q}}\mathcal{)B}w_{n}\mathcal{,B}w_{n}\mathcal{- B}q\rangle \\ & = \langle(I - P_{\mathcal{Q}}\mathcal{)B}w_{n} - (I - P_{\mathcal{Q}}\mathcal{)B}q\mathcal{,B}w_{n}\mathcal{- B}q\rangle\quad(\text{since }(I - P_{\mathcal{Q}}\mathcal{)B}q = 0) \\ & \geq \parallel (I - P_{\mathcal{Q}}\mathcal{)B}w_{n} \parallel^{2} \\ & = 2h(w_{n}).\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (8) \end{aligned}\]

Substituting inequality (7) into equation (8) yields:

\[\begin{aligned} \parallel x_{n + 1} - q \parallel^{2} & \leq \parallel w_{n} - q \parallel^{2} - 4\beta\lambda_{n}h(w_{n}) + \beta\lambda_{n}^{2} \parallel \nabla h(w_{n}) \parallel^{2} \\ & = \parallel w_{n} - q \parallel^{2} - \beta\rho(4 - \rho)\frac{h^{2}(w_{n})}{\parallel \nabla h(w_{n}) \parallel^{2}} \\ & \leq \parallel v_{n} - q \parallel^{2} - \frac{(4 - \rho)}{\beta\rho}\frac{(\beta\rho)^{2}h^{2}(w_{n})}{\parallel \nabla h\left( w_{n} \right) \parallel^{2}}.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (9) \end{aligned}\]

By Lemma 5, we have

\[\begin{aligned} \parallel v_{n} - q \parallel^{2} & = \parallel (1 + \gamma_{1})(x_{n} - q) - (\gamma_{1} - \gamma_{2})(x_{n - 1} - q) - \gamma_{2}(x_{n - 2} - q) \parallel^{2} \\ & = (1 + \gamma_{1}) \parallel x_{n} - q \parallel^{2} - (\gamma_{1} - \gamma_{2}) \parallel x_{n - 1} - q \parallel^{2} - \gamma_{2} \parallel x_{n - 2} - q \parallel^{2} \\ & \quad + (1 + \gamma_{1})(\gamma_{1} - \gamma_{2}) \parallel x_{n} - x_{n - 1} \parallel^{2} \\ & \quad + (1 + \gamma_{1})\gamma_{2} \parallel x_{n} - x_{n - 2} \parallel^{2} - \gamma_{2}(\gamma_{1} - \gamma_{2}) \parallel x_{n - 1} - x_{n - 2} \parallel^{2}.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (10) \end{aligned}\]

On the other hand,

\[\begin{matrix} \frac{(\beta\rho)^{2}h^{2}(w_{n})}{\parallel \nabla h(w_{n}) \parallel^{2}} & = \parallel x_{n + 1} - w_{n} \parallel^{2}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (11) \\ & = \parallel x_{n + 1} - P_{\mathcal{C}}(v_{n}) \parallel^{2} \\ & = \parallel x_{n + 1} - x_{n} - \gamma_{1}(x_{n} - x_{n - 1}) - \gamma_{2}(x_{n - 1} - x_{n - 2}) \parallel^{2} \\ & = \parallel x_{n + 1} - x_{n} \parallel^{2} - 2\gamma_{1}\langle x_{n + 1} - x_{n},x_{n} - x_{n - 1}\rangle \\ & \quad + \gamma_{1}^{2} \parallel x_{n} - x_{n - 1} \parallel^{2} - 2\gamma_{2}\langle x_{n + 1} - x_{n},x_{n - 1} - x_{n - 2}\rangle \\ & \quad + \gamma_{2}^{2} \parallel x_{n - 1} - x_{n - 2} \parallel^{2} + 2\gamma_{2}\gamma_{1}\langle x_{n} - x_{n - 1},x_{n - 1} - x_{n - 2}\rangle \\ & \geq \parallel x_{n + 1} - x_{n} \parallel^{2} - 2\gamma_{1} \parallel x_{n + 1} - x_{n} \parallel \parallel x_{n} - x_{n - 1} \parallel \\ & \quad + \gamma_{1}^{2} \parallel x_{n} - x_{n - 1} \parallel^{2} - 2|\gamma_{2}| \parallel x_{n + 1} - x_{n} \parallel \parallel x_{n - 1} - x_{n - 2} \parallel \\ & \quad + \gamma_{2}^{2} \parallel x_{n - 1} - x_{n - 2} \parallel^{2} - 2|\gamma_{2}\gamma_{1}| \parallel x_{n} - x_{n - 1} \parallel \parallel x_{n - 1} - x_{n - 2} \parallel \\ & \geq \parallel x_{n + 1} - x_{n} \parallel^{2} - \gamma_{1}\left\lbrack \parallel x_{n + 1} - x_{n} \parallel^{2} + \parallel x_{n} - x_{n - 1} \parallel^{2} \right\rbrack \\ & \quad + \gamma_{1}^{2} \parallel x_{n} - x_{n - 1} \parallel^{2} - |\gamma_{2}|\left\lbrack \parallel x_{n + 1} - x_{n} \parallel^{2} + \parallel x_{n - 1} - x_{n - 2} \parallel^{2} \right\rbrack \\ & \quad + \gamma_{2}^{2} \parallel x_{n - 1} - x_{n - 2} \parallel^{2} - |\gamma_{2}\gamma_{1}|\left\lbrack \parallel x_{n} - x_{n - 1} \parallel^{2} + \parallel x_{n - 1} - x_{n - 2} \parallel^{2} \right\rbrack \\ & = (1 - |\gamma_{2}| - \gamma_{1}) \parallel x_{n + 1} - x_{n} \parallel^{2} + (\gamma_{1}^{2} - \gamma_{1} - |\gamma_{2}\gamma_{1}|) \parallel x_{n} - x_{n - 1} \parallel^{2} \\ & \quad + \left( \gamma_{2}^{2} - \left| \gamma_{2} \right| - \left| \gamma_{2}\gamma_{1} \right| \right) \parallel x_{n - 1} - x_{n - 2} \parallel^{2}.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (12) \end{matrix}\]

Thus, by equations (9), (10), and (12), we obtain

\[\begin{matrix} \parallel x_{n + 1} - q \parallel^{2} & \leq (1 + \gamma_{1}) \parallel x_{n} - q \parallel^{2} - (\gamma_{1} - \gamma_{2}) \parallel x_{n - 1} - q \parallel^{2} - \gamma_{2} \parallel x_{n - 2} - q \parallel^{2} \\ & \quad + (1 + \gamma_{1})(\gamma_{1} - \gamma_{2}) \parallel x_{n} - x_{n - 1} \parallel^{2} + \gamma_{2}(1 + \gamma_{1}) \parallel x_{n} - x_{n - 2} \parallel^{2} \\ & \quad - \gamma_{2}(\gamma_{1} - \gamma_{2}) \parallel x_{n - 1} - x_{n - 2} \parallel^{2} \\ & \quad - \frac{(4 - \rho)}{\beta\rho}(1 - |\gamma_{2}| - \gamma_{1}) \parallel x_{n + 1} - x_{n} \parallel^{2} \\ & \quad - \frac{(4 - \rho)}{\beta\rho}(\gamma_{1}^{2} - \gamma_{1} - |\gamma_{2}\gamma_{1}|) \parallel x_{n} - x_{n - 1} \parallel^{2} \\ & \quad - \frac{(4 - \rho)}{\beta\rho}(\gamma_{2}^{2} - |\gamma_{2}| - |\gamma_{2}\gamma_{1}|) \parallel x_{n - 1} - x_{n - 2} \parallel^{2} \\ & = (1 + \gamma_{1}) \parallel x_{n} - q \parallel^{2} - (\gamma_{1} - \gamma_{2}) \parallel x_{n - 1} - q \parallel^{2} - \gamma_{2} \parallel x_{n - 2} - q \parallel^{2} \\ & \quad + \gamma_{2}(1 + \gamma_{1}) \parallel x_{n} - x_{n - 2} \parallel^{2} \\ & \quad + \left\lbrack (1 + \gamma_{1})(\gamma_{1} - \gamma_{2}) - \frac{(4 - \rho)}{\beta\rho}(\gamma_{1}^{2} - \gamma_{1} - |\gamma_{2}\gamma_{1}|) \right\rbrack \parallel x_{n} - x_{n - 1} \parallel^{2} \\ & \quad - \left\lbrack \gamma_{2}(\gamma_{1} - \gamma_{2}) + \frac{(4 - \rho)}{\beta\rho}(\gamma_{2}^{2} - |\gamma_{2}| - |\gamma_{2}\gamma_{1}|) \right\rbrack \parallel x_{n - 1} - x_{n - 2} \parallel^{2} \\ & \quad - \frac{(4 - \rho)}{\beta\rho}(1 - |\gamma_{2}| - \gamma_{1}) \parallel x_{n + 1} - x_{n} \parallel^{2} \\ & \leq (1 + \gamma_{1}) \parallel x_{n} - q \parallel^{2} - (\gamma_{1} - \gamma_{2}) \parallel x_{n - 1} - q \parallel^{2} - \gamma_{2} \parallel x_{n - 2} - q \parallel^{2} \\ & \quad + \left\lbrack (1 + \gamma_{1})(\gamma_{1} - \gamma_{2}) - \frac{(4 - \rho)}{\beta\rho}(\gamma_{1}^{2} - \gamma_{1} - |\gamma_{2}\gamma_{1}|) \right\rbrack \parallel x_{n} - x_{n - 1} \parallel^{2} \\ & \quad - \left\lbrack \gamma_{2}(\gamma_{1} - \gamma_{2}) + \frac{(4 - \rho)}{\beta\rho}(\gamma_{2}^{2} - |\gamma_{2}| - |\gamma_{2}\gamma_{1}|) \right\rbrack \parallel x_{n - 1} - x_{n - 2} \parallel^{2} \\ & \quad - \frac{(4 - \rho)}{\beta\rho}\left( 1 - \left| \gamma_{2} \right| - \gamma_{1} \right) \parallel x_{n + 1} - x_{n} \parallel^{2}.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (13) \end{matrix}\]

By rearranging equation (13), we get

\[\begin{matrix} & \parallel x_{n + 1} - q \parallel^{2} - \gamma_{1} \parallel x_{n} - q \parallel^{2} - \gamma_{2} \parallel x_{n - 1} - q \parallel^{2} + \frac{(4 - \rho)}{\beta\rho}(1 - |\gamma_{2}| - \gamma_{1}) \parallel x_{n + 1} - x_{n} \parallel^{2} \\ & \leq \parallel x_{n} - q \parallel^{2} - \gamma_{1} \parallel x_{n - 1} - q \parallel^{2} - \gamma_{2} \parallel x_{n - 2} - q \parallel^{2} + \frac{(4 - \rho)}{\beta\rho}(1 - |\gamma_{2}| - \gamma_{1}) \parallel x_{n} - x_{n - 1} \parallel^{2} \\ & \quad + \left\lbrack (1 + \gamma_{1})(\gamma_{1} - \gamma_{2}) - \frac{(4 - \rho)}{\beta\rho}(\gamma_{1}^{2} - 2\gamma_{1} - |\gamma_{2}\gamma_{1}| - |\gamma_{2}| + 1) \right\rbrack \parallel x_{n} - x_{n - 1} \parallel^{2} \\ & \quad - \left\lbrack \gamma_{2}\left( \gamma_{1} - \gamma_{2} \right) + \frac{(4 - \rho)}{\beta\rho}\left( \gamma_{2}^{2} - \left| \gamma_{2} \right| - \left| \gamma_{2}\gamma_{1} \right| \right) \right\rbrack \parallel x_{n - 1} - x_{n - 2} \parallel^{2}.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (14) \end{matrix}\]

Thus , we deduce that

\[\begin{matrix} \Delta_{n + 1} - \Delta_{n} & \leq \left\lbrack (1 + \gamma_{1})(\gamma_{1} - \gamma_{2}) - \frac{(4 - \rho)}{\beta\rho}(\gamma_{1}^{2} - 2\gamma_{1} - |\gamma_{2}\gamma_{1}| - |\gamma_{2}| + 1) \right\rbrack \parallel x_{n} - x_{n - 1} \parallel^{2} \\ & \quad - \left\lbrack \gamma_{2}(\gamma_{1} - \gamma_{2}) + \frac{(4 - \rho)}{\beta\rho}(\gamma_{2}^{2} - |\gamma_{2}| - |\gamma_{2}\gamma_{1}|) \right\rbrack \parallel x_{n - 1} - x_{n - 2} \parallel^{2} \\ & = - \left\lbrack (1 + \gamma_{1})(\gamma_{1} - \gamma_{2}) - \frac{(4 - \rho)}{\beta\rho}(\gamma_{1}^{2} - 2\gamma_{1} - |\gamma_{2}\gamma_{1}| - |\gamma_{2}| + 1) \right\rbrack \\ & \quad \times \left( \parallel x_{n - 1} - x_{n - 2} \parallel^{2} - \parallel x_{n} - x_{n - 1} \parallel^{2} \right) \\ & \quad + \left\lbrack (1 + \gamma_{1})(\gamma_{1} - \gamma_{2}) - \frac{(1 - \rho)}{\beta\rho}(\gamma_{1}^{2} - 2\gamma_{1} - |\gamma_{2}\gamma_{1}| - |\gamma_{2}| + 1) \right.\ \\ & \quad\left. \ - \gamma_{2}(\gamma_{1} - \gamma_{2}) - \frac{(1 - \rho)}{\beta\rho}(\gamma_{2}^{2} - |\gamma_{2}| - |\gamma_{2}\gamma_{1}|) \right\rbrack \parallel x_{n - 1} - x_{n - 2} \parallel^{2} \\ & = k_{1}\left\lbrack \parallel x_{n - 1} - x_{n - 2} \parallel^{2} - \parallel x_{n} - x_{n - 1} \parallel^{2} \right\rbrack - k_{2} \parallel x_{n - 1} - x_{n - 2} \parallel^{2}, \end{matrix}\]

where

\[\begin{matrix} \Delta_{n} & {: =} \parallel x_{n} - q \parallel^{2} - \gamma_{1} \parallel x_{n - 1} - q \parallel^{2} - \gamma_{2} \parallel x_{n - 2} - q \parallel^{2} \\ & \quad + \frac{(4 - \rho)}{\beta\rho}\left( 1 - \left| \gamma_{2} \right| - \gamma_{1} \right) \parallel x_{n} - x_{n - 1} \parallel^{2},\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (15) \\ k_{1} & ≔ - \left\lbrack \left( 1 + \gamma_{1} \right)\left( \gamma_{1} - \gamma_{2} \right) - \frac{(4 - \rho)}{\beta\rho}\left( \gamma_{1}^{2} - 2\gamma_{1} - \left| \gamma_{2}\gamma_{1} \right| - \left| \gamma_{2} \right| + 1 \right) \right\rbrack\ ,\ \ and\ \ \ \ \ \ \ \ \ \ \ \ \ \ (16)\text{ } \\ k_{2} & {: =} - \left\lbrack (\gamma_{1} - \gamma_{2})(1 + \gamma_{1} - \gamma_{2}) - \frac{(4 - \rho)}{\beta\rho}(\gamma_{1}^{2} - 2\gamma_{1} - |\gamma_{2}\gamma_{1}| - |\gamma_{2}| + 1) \right.\ \\ & \quad\left. \ - \frac{(4 - \rho)}{\beta\rho}(\gamma_{2}^{2} - |\gamma_{2}| - |\gamma_{2}\gamma_{1}|) \right\rbrack.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (17) \end{matrix}\]

Therefore, we have

\[\begin{matrix} \Delta_{n + 1} - \Delta_{n} & \leq - k_{1}\left( \parallel x_{n - 1} - x_{n - 2} \parallel^{2} - \parallel x_{n} - x_{n - 1} \parallel^{2} \right) - k_{2} \parallel x_{n - 1} - x_{n - 2} \parallel^{2} \\ & = k_{1}\left( \parallel x_{n} - x_{n - 1} \parallel^{2} - \parallel x_{n - 1} - x_{n - 2} \parallel^{2} \right) - k_{2} \parallel x_{n - 1} - x_{n - 2} \parallel^{2}.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (18) \end{matrix}\]

Claim \(\Delta_{n} \geq 0\) for all \(n \geq 1\), \(k_{1} \geq 0\), and \(k_{2} \geq 0\).

Proof: Verification of \(\Delta_{n} \geq 0\):

Since \(\gamma_{2} \leq 0\), we have \(|\gamma_{2}| = - \gamma_{2}\). Using the inequality \(\parallel x_{n - 1} - q \parallel^{2} \leq 2 \parallel x_{n} - x_{n - 1} \parallel^{2} + 2 \parallel x_{n} - q \parallel^{2}\) (from \(\parallel a - b \parallel^{2} \leq 2 \parallel a - c \parallel^{2} + 2 \parallel c - b \parallel^{2}\)), by (15), we obtain:

\[\begin{matrix} \Delta_{n} & \geq \parallel x_{n} - q \parallel^{2} - \gamma_{1}\left( 2 \parallel x_{n} - x_{n - 1} \parallel^{2} + 2 \parallel x_{n} - q \parallel^{2} \right) - \gamma_{2} \parallel x_{n - 2} - q \parallel^{2} \\ & \quad + \frac{(4 - \rho)}{\beta\rho}(1 + \gamma_{2} - \gamma_{1}) \parallel x_{n} - x_{n - 1} \parallel^{2} \\ & = (1 - 2\gamma_{1}) \parallel x_{n} - q \parallel^{2} + \left\lbrack \frac{(4 - \rho)}{\beta\rho}(1 + \gamma_{2} - \gamma_{1}) - 2\gamma_{1} \right\rbrack \parallel x_{n} - x_{n - 1} \parallel^{2} \\ & \quad - \gamma_{2} \parallel x_{n - 2} - q \parallel^{2}. \end{matrix}\]

The condition \(\gamma_{1} \leq \frac{1}{2}\) ensures \(1 - 2\gamma_{1} \geq 0\). The parameter bound on \(\beta\):

\[\beta \leq \frac{(4 - \rho)(1 - \gamma_{1} + \gamma_{2})}{2\gamma_{1}\rho}\]

is equivalent to \(\frac{(4 - \rho)}{\beta\rho}(1 + \gamma_{2} - \gamma_{1}) \geq 2\gamma_{1}\), which guarantees the coefficient of \(\parallel x_{n} - x_{n - 1} \parallel^{2}\) is non-negative. Since \(- \gamma_{2} \geq 0\) (as \(\gamma_{2} \leq 0\)), we conclude \(\Delta_{n} \geq 0\).

Verification of \(k_{1} > 0\):

Since

\[\beta < \frac{(4 - \rho)A}{\rho(1 + \gamma_{1})(\gamma_{1} - \gamma_{2})},\]

where \(A = \gamma_{1}^{2} - 2\gamma_{1} - |\gamma_{2}\gamma_{1}| - |\gamma_{2}| + 1\), it follows that

\[\frac{(4 - \rho)\beta}{\rho}A > (1 + \gamma_{1})(\gamma_{1} - \gamma_{2}).\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (19)\]

By equations (16) and (19), we see that \(k_{1} > 0\).

Verification of \(k_{2} > 0\):

Since

\[\beta < \frac{(4 - \rho)B}{\rho(1 + \gamma_{1} - \gamma_{2})(\gamma_{1} - \gamma_{2})},\]

where \(B = \gamma_{1}^{2} - 2\gamma_{1} - |\gamma_{2}\gamma_{1}| - |\gamma_{2}| + 1 + \gamma_{2}^{2} - |\gamma_{2}| - |\gamma_{2}\gamma_{1}|\), then

\[\frac{(4 - \rho)\beta}{\rho}B > (\gamma_{1} - \gamma_{2})(1 + \gamma_{1} - \gamma_{2}).\]

This, coupled with equation (17), shows that \(k_{2} > 0\).

From equation (18), we deduce that

\[\begin{matrix} \Delta_{n + 1} + k_{1} \parallel x_{n} - x_{n - 1} \parallel^{2} \leq \Delta_{n} + k_{1} \parallel x_{n - 1} - x_{n - 2} \parallel^{2}. \end{matrix}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (20)\]

Let \(\Gamma_{n}: = \Delta_{n} + k_{1} \parallel x_{n - 1} - x_{n - 2} \parallel^{2}\). Then \(\Gamma_{n} \geq 0\) for all \(n \geq 1\), then, it follows from (20) that

\[\begin{matrix} \Gamma_{n + 1} \leq \Gamma_{n}. \end{matrix}\]

Thus, \(\Gamma_{n}\) is a monotone decreasing sequence and bounded below by zero, so \(\underset{n \rightarrow \infty}{\text{lim}}\Gamma_{n}\) exists. Consequently, from (20),

\[\begin{matrix} \underset{n \rightarrow \infty}{\text{lim}}k_{1} \parallel x_{n - 1} - x_{n - 2} \parallel^{2} = 0. \end{matrix}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (21)\]

Hence,

\[\begin{matrix} \underset{n \rightarrow \infty}{\text{lim}} \parallel x_{n - 1} - x_{n - 2} \parallel^{2} = 0. \end{matrix}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (22)\]

As a result,

\[\begin{matrix} \parallel x_{n + 1} - v_{n} \parallel & = \parallel x_{n + 1} - x_{n} - \gamma_{1}(x_{n} - x_{n - 1}) - \gamma_{2}(x_{n - 1} - x_{n - 2}) \parallel \\ & \leq \parallel x_{n + 1} - x_{n} \parallel + \gamma_{1} \parallel x_{n} - x_{n - 1} \parallel + \left| \gamma_{2} \right| \parallel x_{n - 1} - x_{n - 2} \parallel \rightarrow 0. \end{matrix}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (23)\]

Since \(\underset{n \rightarrow \infty}{\text{lim}} \parallel x_{n + 1} - x_{n} \parallel = 0\), we have

\(\parallel x_{n} - v_{n} \parallel \leq \parallel x_{n} - x_{n + 1} \parallel + \parallel x_{n + 1} - v_{n} \parallel \rightarrow 0\quad\text{as }n \rightarrow \infty.\) (24)

Since \(\underset{n \rightarrow \infty}{\text{lim}}\Gamma_{n}\) exists, \(\underset{n \rightarrow \infty}{\text{lim}}\Delta_{n}\) exists. Thus,

\[\underset{n \rightarrow \infty}{\text{lim}}\left\lbrack \parallel x_{n} - q \parallel^{2} - \gamma_{1} \parallel x_{n - 1} - q \parallel^{2} - \gamma_{2} \parallel x_{n - 2} - q \parallel^{2} \right\rbrack\text{ exists},\]

this turns to implies that

\[\underset{n \rightarrow \infty}{\text{lim}}\left\lbrack \parallel x_{n} - q \parallel \right\rbrack\text{ exists}.\]

This completes the proof.  ◻

Theorem 10. Let \(\{ x_{n}\}\) be a sequence generated by Algorithm 1, and assume that the solution set \(\mathcal{S \neq \varnothing}\). Then \(\{ x_{n}\}\) converges weakly to \(x\mathcal{\in S}\).

Proof. From equation (11), we have

\[\lim_{n \rightarrow \infty} \parallel x_{n + 1} - w_{n} \parallel = \lim_{n \rightarrow \infty}\frac{\beta\rho h(w_{n})}{\parallel \nabla h(w_{n}) \parallel} = 0.\]

Since \(\beta\rho > 0\), it follows that

\[\begin{matrix} \lim_{n \rightarrow \infty}\frac{h(w_{n})}{\parallel \nabla h(w_{n}) \parallel} = 0. \end{matrix}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (25)\]

It remains to show that \(\omega_{w}(x_{n}\mathcal{) \subset S}\). Let \(q \in \omega_{w}(x_{n})\) be arbitrary. Since \(\lim_{n \rightarrow \infty} \parallel x_{n} - q \parallel\) exists, \(\{ x_{n}\}\) is bounded. This implies that there exists a subsequence \(\{ x_{n_{l}}\}\) converging weakly to \(q\). From (25), \(w_{n_{l}} \rightharpoonup q\mathcal{\in C}\).

Since \(\nabla h\) is Lipschitz continuous,

\[\parallel \nabla h(w_{n}) \parallel = \parallel \nabla h(w_{n}) - \nabla h(q\mathcal{) \parallel \leq \parallel B} \parallel^{2} \parallel w_{n} - q \parallel ,\quad\forall q\mathcal{\in S.}\]

Hence, \(\{\nabla h(w_{n})\}\) is bounded. Equation (25) then implies

\[\lim_{n \rightarrow \infty}h(w_{n}) = 0.\]

By the weak lower semi-continuity of \(h\) (Lemma 6 (b)),

\[0 \leq h(q) \leq \liminf_{n \rightarrow \infty}h(w_{n_{l}}) = \lim_{n \rightarrow \infty}h(w_{n}) = 0.\]

Therefore, \(h(q) = 0\), i.e., \(\mathcal{B}q\mathcal{\in Q}\) (see Lemma 6) and \(q\mathcal{\in S}\). Since \(q\) was arbitrary, \(\omega_{w}(x_{n}\mathcal{) \subset S}\). The result now follows from Lemma 7.  ◻

Corollary 11. Suppose that \(P_{\mathcal{C}}:\mathcal{H}_{1}\mathcal{\rightarrow C}\) and \(P_{\mathcal{Q}}:\mathcal{H}_{2}\mathcal{\rightarrow Q}\) are the metric projections onto the nonempty, closed, and convex sets \(\mathcal{C \subset}\mathcal{H}_{1}\) and \(\mathcal{Q \subset}\mathcal{H}_{2}\), respectively. The operator \(\mathcal{B:}\mathcal{H}_{1} \rightarrow \mathcal{H}_{2}\) is a linear and bounded operator with adjoint \(\mathcal{B}^{*}\). Then the sequence \(\{ x_{n}\}\) defined below converges weakly to \(x\mathcal{\in S}\).

Algorithm 2: Inertial Self-adaptive Gradient Algorithm (ISGA)

Initialization: Choose arbitrary initial points \(x_{0},x_{1} \in \mathcal{H}_{1}\). Let: \(\rho \in (0,4)\), \(\gamma_{1} \in (0,\frac{1}{2}\rbrack\), \(\beta > 0\) such that \(\beta \leq min\left\{ \frac{(4 - \rho)(1 - \gamma_{1})}{2\gamma_{1}\rho},\frac{(4 - \rho)(\gamma_{1}^{2} - \gamma_{1} + 1)}{\rho(1 + \gamma_{1})\gamma_{1}} \right\}\) if \(\gamma_{1} > 0\); else, any \(\beta > 0\).

Set \(n \leftarrow 1\).

Given \(x_{n - 1},x_{n}\), compute inertial terms:

repeat

Step 1: Compute inertial point and project:

\[v_{n} = x_{n} + \gamma_{1}\left( x_{n} - x_{n - 1} \right)\]

\[w_{n} = P_{\mathcal{C}}(v_{n})\]

Step 2: Compute gradient and step size:

\[\nabla h\left( w_{n} \right) = \mathcal{B}^{*}\left( I - P_{\mathcal{Q}} \right)\mathcal{B}w_{n}\]

If \(\nabla h\left( w_{n} \right) \neq 0,\) then

\(\lambda_{n} \leftarrow \frac{\rho \cdot h\left( w_{n} \right)}{\parallel \nabla h\left( w_{n} \right) \parallel^{2}}\), where \(h\left( w_{n} \right) = \left\| \left( I - P_{\mathcal{Q}} \right)\mathcal{B}w_{n} \right\|^{2}\),

else

\(\lambda_{n} \leftarrow 0\)

end if

Step 3: Update: \(x_{n + 1} = w_{n} - \beta\lambda_{n}\nabla h(w_{n})\)

Step 4: \(n \leftarrow n + 1\)

Until the sequence converges.

NUMERICAL EXPERIMENTS

All experiments were performed in MATLAB R2024a on a standard laptop (Intel Core i7, 2.8 GHz, 16 GB RAM). We implemented five algorithms for comparison, summarized in Table 1.

Table 1: Algorithms compared in numerical experiments.

Acronym Method Reference
CQ Classical CQ algorithm Byrne (2003)
López Self-adaptive CQ (no inertial) López et al. (2012)
Vinh One-step inertial + self-adaptive Vinh et al. (2023)
Adamu Two-step inertial (Euclidean) Adamu et al. (2025)
TISGA Proposed two-step inertial self-adaptive This paper

Table 1 presents the five algorithms compared in our numerical experiments: the classical CQ algorithm (Byrne, 2003), the self-adaptive CQ method without inertial acceleration (López et al., 2012), the one-step inertial self-adaptive method (Vinh et al., 2023), the two-step inertial Euclidean method (Adamu et al., 2025), and our proposed TISGA.

Problem 12 (Random Linear SFP in \(\mathbb{R}^{n}\)). \(\mathcal{H}_{1} = \mathbb{R}^{500}\), \(\mathcal{H}_{2} = \mathbb{R}^{200}\), \(\mathcal{C = \{}x \in \mathbb{R}^{500} \mid 0 \leq x_{i} \leq 10\}\) (box constraints), \(\mathcal{Q = \{}y \in \mathbb{R}^{200} \mid \parallel y - y_{0} \parallel_{2} \leq 5\}\) (ball centered at random \(y_{0}\)), \(\mathcal{B \in}\mathbb{R}^{200 \times 500}\): random Gaussian matrix normalized so \(\mathcal{\parallel B \parallel \approx}1\), True solution \(x^{*}\mathcal{\in C}\) chosen randomly, then \(\mathcal{B}x^{*}\mathcal{\in Q}\), Initial: \(x_{0} = x_{1} = x_{2} = 0 \in \mathbb{R}^{500}\), Stopping: \(\parallel x_{n + 1} - x_{n} \parallel < 10^{- 6}\) or max 5000 iterations.

Table 2: Parameter settings for Problem 12.

Algorithm Parameters
CQ \[\tau = 1\mathcal{/ \parallel B} \parallel^{2}\]
López \[\rho = 3.5\]
Vinh \[\rho = 3.5,\mspace{6mu}\gamma = 0.3\]
Adamu \[\rho = 3.5,\mspace{6mu}\gamma_{1} = 0.3,\mspace{6mu}\gamma_{2} = - 0.05\]
TISGA \[\rho = 3.5,\mspace{6mu}\gamma_{1} = 0.3,\mspace{6mu}\gamma_{2} = - 0.05,\mspace{6mu}\beta = 0.85\]

Table 2 reports the parameter settings for Problem 12, where we set ρ=3.5 for all adaptive methods, γ₁=0.3 for Vinh et al., Adamu et al. (2025), and TISGA, γ₂=-0.05 for Adamu et al. (2025) and TISGA, and β=0.85 for TISGA.

Table 3: Numerical results for Problem 12 (average over 50 random instances).

Algorithm Iterations CPU Time (s) Final \(h(x)\) Converged (%)
CQ 4872 2.34 \[1.2 \times 10^{- 7}\] 68%
López 1123 0.56 \[8.3 \times 10^{- 8}\] 100%
Vinh 891 0.48 \[7.1 \times 10^{- 8}\] 100%
Adamu 734 0.41 \[6.5 \times 10^{- 8}\] 100%
TISGA 412 0.23 \[5.8 \times 10^{- 8}\] 100%

As shown in Table 3, TISGA requires only 412 iterations on average compared to 734 for Adamu et al. (2025), 891 for Vinh et al. (2023), 1123 for López et al. (2012), and 4872 for the classical CQ algorithm. The CPU time for TISGA is 0.23 seconds, approximately half that of Adamu et al. (0.41 s) and less than one-tenth that of CQ (2.34 s). All methods except CQ achieved 100% convergence, while CQ converged in only 68% of instances.

Figure 1: Convergence of \(\mathbf{h(}\mathbf{x}_{\mathbf{n}}\mathbf{)}\) vs. iterations for Problem 12 (log scale). TISGA achieves the fastest decay.

Remark 13. Based on Table 3, TISGA reduces the iteration count by 54% compared to Adamu et al (2025) and by 63% compared to Vinh et al (2023) CPU time is halved due to fewer gradient evaluations.

Problem 14 (Sparse Signal Recovery (LASSO-type SFP)). This problem is taken directly from Vinh et al. 2023 and López et al. 2012 for direct comparison.

Setup:

True signal \(x^{*} \in \mathbb{R}^{1024}\) with 50 nonzero entries (sparse)

Measurement matrix \(\mathcal{B \in}\mathbb{R}^{512 \times 1024}\): Gaussian, normalized columns

Observations: \(b\mathcal{= B}x^{*} + \text{noise}\) (SNR = 40 dB)

\(\mathcal{C = \{}x \mid \parallel x \parallel_{1} \leq R\}\) (\(\mathcal{l}_{1}\)-ball, \(R = \parallel x^{*} \parallel_{1}\))

\(\mathcal{Q = \{}b\}\) (singleton – exact feasibility)

\[h(x) = \frac{1}{2}\mathcal{\parallel B}x - b \parallel^{2}\]

Table 4: Parameter settings for Problem 14.

Algorithm Parameters
CQ \[\tau = 0.9\mathcal{/ \parallel B} \parallel^{2}\]
López \[\rho = 3.0\]
Vinh \[\rho = 3.0,\mspace{6mu}\gamma = 0.5\]
Adamu \[\rho = 3.0,\mspace{6mu}\gamma_{1} = 0.5,\mspace{6mu}\gamma_{2} = - 0.1\]
TISGA \[\rho = 3.0,\mspace{6mu}\gamma_{1} = 0.5,\mspace{6mu}\gamma_{2} = - 0.1,\mspace{6mu}\beta = 0.90\]

Table 4 summarizes the parameter settings for Problem 14. For this sparse recovery problem, we set ρ=3.0 for all adaptive methods, γ₁=0.5 for Vinh et al., Adamu et al., and TISGA, γ₂=-0.1 for Adamu et al. and TISGA, and β=0.90 for TISGA

Table 5: Sparse recovery results (average over 30 realizations). \(\mathbf{}^{\mathbf{*}}\)CQ did not converge within 5000 iterations for 100% of cases; value shown is at termination.

Algorithm Iterations CPU (s)

Recovery Error

\[\parallel x_{n} - x^{*} \parallel_{2}\]

Support Recovery \(F_{1}\)-score
CQ \[5000^{*}\] 3.21 \[4.2 \times 10^{- 2}\] 0.68
López 2341 1.52 \[2.1 \times 10^{- 2}\] 0.81
Vinh 1672 1.18 \[1.6 \times 10^{- 2}\] 0.85
Adamu 1289 0.94 \[1.3 \times 10^{- 2}\] 0.88
TISGA 712 0.51 \[8.7 \times 10^{- 3}\] 0.93

As shown in Table 5, TISGA achieves a recovery error of 8.7×10⁻³ with only 712 iterations, significantly outperforming Adamu et al. (2025) (1.3×10⁻² error at 1289 iterations), Vinh et al. (2023) (1.6×10⁻² error at 1672 iterations), and López et al. (2.1×10⁻² error at 2341 iterations). The classical CQ method failed to converge within 5000 iterations for all cases, achieving only 4.2×10⁻² error at termination. In terms of support recovery F₁-score, TISGA achieves 0.93, the highest among all methods, compared to 0.88 for Adamu et al. (2025), 0.85 for Vinh et al. (2023, 0.81 for López et al., and 0.68 for CQ.

Figure 2: Recovery error \(\mathbf{\parallel}\mathbf{x}_{\mathbf{n}}\mathbf{-}\mathbf{x}^{\mathbf{*}}\mathbf{\parallel}_{\mathbf{2}}\) vs. iterations for sparse signal recovery (Problem 14). TISGA reaches \(\mathbf{10}^{\mathbf{- 2}}\) error at \(\mathbf{\sim}\)600 iterations vs. \(\mathbf{\sim}\)1100 for Adamu.

From Figure 2, we observe that TISGA reaches 10⁻² error at approximately 600 iterations, while Adamu et al. requires about 1100 iterations to achieve the same accuracy, demonstrating the superior acceleration provided by our two-step inertial mechanism.

Parameter Sensitivity Analysis (TISGA)

We varied \(\gamma_{1}\), \(\gamma_{2}\), \(\rho\), and \(\beta\) for Problem 12 to assess robustness. Figure 3 visualizes the effect of \((\gamma_{1},\gamma_{2})\) on iteration count.

Figure 3: Heatmap of TISGA iterations as a function of \(\mathbf{(}\mathbf{\gamma}_{\mathbf{1}}\mathbf{,}\mathbf{\gamma}_{\mathbf{2}}\mathbf{)}\) with \(\mathbf{\rho = 3.5}\), \(\mathbf{\beta = 0.85}\). Darker colors indicate faster convergence.

Remark 15: We varied γ₁, γ₂, ρ, and β for Problem 12 to assess robustness. Figure 3 visualizes the effect of (γ₁,γ₂) on iteration count. From Figure 3, darker colors indicate faster convergence, with the optimal region clearly visible around γ₁≈0.3 and γ₂≈-0.05 to -0.10.

Table 6: TISGA iterations vs. \(\mathbf{(}\mathbf{\gamma}_{\mathbf{1}}\mathbf{,}\mathbf{\gamma}_{\mathbf{2}}\mathbf{)}\) with \(\mathbf{\rho = 3.5}\), \(\mathbf{\beta = 0.85}\).

\[\gamma_{1}\backslash\gamma_{2}\] \[- 0.15\] \[- 0.10\] \[- 0.05\] \[0.00\]
0.1 623 598 567 612
0.2 512 491 467 523
0.3 445 421 412 468
0.4 503 489 476 534
0.5 612 598 567 645

The data in Table 6 confirm that the optimal region is γ₁≈0.3 with γ₂≈-0.05 to -0.10, where TISGA achieves as few as 412 iterations. Notably, a negative γ₂ consistently improves performance over γ₂=0 (which corresponds to the one-step inertial case), with improvements ranging from 7% to 15%.

Table 7: TISGA iterations vs. \(\mathbf{\rho}\) and \(\mathbf{\beta}\) (\(\mathbf{\gamma}_{\mathbf{1}}\mathbf{= 0.3}\), \(\mathbf{\gamma}_{\mathbf{2}}\mathbf{= - 0.05}\)).

\[\rho\backslash\beta\] \[0.70\] \[0.80\] \[0.85\] \[0.90\] \[1.00\]
2.5 589 534 512 521 567
3.0 498 456 438 445 489
3.5 467 423 412 418 456
3.8 478 441 432 439 478

Based on Table 7, the recommended parameter ranges are ρ∈[3.0, 3.7] and β∈[0.80, 0.90], with the best performance (412 iterations) achieved at ρ=3.5 and β=0.85.

COMPUTATIONAL COMPLEXITY PER ITERATION

Remark 16 (Computational complexity per iteration). Although Table 3 shows that TISGA requires fewer total iterations and less CPU time than the compared methods, it is important to examine the per-iteration cost. The main computational burden in all compared algorithms lies in:

Evaluating \(\nabla g(u^{n}) = B^{*}(I - P_{Q})Bu^{n}\),

Computing \(P_{C}( \cdot )\).

For TISGA, each iteration additionally requires:

Two vector combinations for \(u^{n} = (1 - \theta_{n})x^{n} + \theta_{n}(x^{n - 1} - x^{n})\),

One scalar-vector multiplication for \(\lambda_{n}\nabla g(u^{n})\),

One convex combination to form \(\alpha_{n}u^{n} + (1 - \alpha_{n})(u^{n} - \lambda_{n}\nabla g(u^{n}))\),

Another convex combination with \(\beta_{n}\) before projection.

In contrast:

Byrne’s (2003) CQ algorithm uses only one gradient evaluation and one projection per iteration.

López et al.(2012) uses a similar Polyak step but without inertial terms.

Vinh et al.(2023) includes one inertial projection before the gradient step.

Thus, TISGA has a higher per-iteration constant factor (approximately \(2\)\(3 \times\) more floating-point operations per iteration than Byrne’s CQ). However, its dramatic reduction in total iterations (e.g., \(412\) vs. \(4,872\) for Byrne in Problem 12) more than compensates for this, leading to lower overall CPU time (\(0.23\)s vs. \(2.34\)s). The complexity per iteration is \(O(mn)\) for matrix-vector products (same as all compared methods), where \(m = dim(H_{2})\), \(n = dim(H_{1})\). Hence, the advantage of TISGA comes from improved convergence rate (\(R\)-linear vs. linear), not from a lower per-iteration asymptotic order.

SUMMARY OF NUMERICAL FINDINGS

TISGA consistently outperforms all compared methods, reducing iterations by 41-54% over the best existing two-step method (Adamu et al., 2025) as shown in Tables 3 and 5, and by 54-63% over the one-step inertial method (Vinh et al., 2023).

TISGA converges 100% of the time across random instances, whereas classical CQ fails in about 32% of cases under the same tolerance, as demonstrated in Table 3.

Optimal parameter ranges are γ₁∈[0.25, 0.35], γ₂∈[-0.12, -0.03], ρ∈[3.0, 3.7], β∈[0.80, 0.90], as illustrated in Figures 3 and Tables 6 and 7. The algorithm tolerates moderate deviations without catastrophic failure.

Despite a higher per-iteration constant factor (approximately 2-3× more operations than CQ), the dramatic reduction in iteration count yields 40-55% lower CPU time compared to Adamu et al. (2025), as evidenced in Tables 3 and 5.

Numerical results confirm weak convergence (observed in practice) and demonstrate that the two-step inertial mechanism with γ₂<0 indeed accelerates convergence beyond one-step methods, consistent with the theoretical motivation in Section 1 and visualized in Figures 1 and 2.

CONCLUSION

This research addresses the efficiency challenge in solving the Split Feasibility Problem (SFP) by eliminating the reliance on prior knowledge of the operator norm of the bounded linear mapping \(\mathcal{B}\). We introduced the novel Two-step Inertial Self-adaptive Gradient Algorithm (TISGA), which uniquely combines a two-step inertial technique—utilizing three previous iterates (\(x_{n},x_{n - 1},x_{n - 2}\))—with a strategically placed projection of the inertial term. A key feature is its self-adaptive step size, \(\lambda_{n} = \frac{\beta\rho h(w_{n})}{\parallel \nabla h(w_{n}) \parallel^{2}}\), which removes the requirement for operator norm estimation while incorporating inertial acceleration—a combination not previously achieved in Hilbert spaces with a two-step method.

Our work offers several significant theoretical advancements and generalizations compared to existing methods:

While López et al. (2012) pioneered the self-adaptive step-size to eliminate norm dependency, their work lacked inertial acceleration. TISGA inherits this norm-free property and enhances it with a multi-step inertial mechanism.

The algorithm serves as a genuine generalization of the one-step inertial framework of Vinh et al. (2023); when our second inertial parameter \(\gamma_{2} = 0\), their method is recovered as a special case.

Compared to the closest work by Adamu et al. (2025) , which proposed a two-step method confined to Euclidean spaces, our work provides a significant theoretical advancement by extending the analysis to infinite-dimensional Hilbert spaces. Furthermore, TISGA distinguishes itself through a novel projection strategy (\(w_{n} = P_{C}(v_{n})\)) and a more flexible step-size parameter \(\beta\) with explicitly derived bounding conditions that ensure the positivity of all key constants in the convergence proof.

By incorporating memory of three iterates, TISGA also directly addresses the potential limitations of one-step inertial methods , offering a structure that may provide more robust acceleration.

Under mild conditions, Theorem 10 rigorously establishes the algorithm’s weak convergence to a solution of the SFP.

In conclusion, this study successfully designs the novel TISGA algorithm, which unifies and generalizes several existing approaches in the literature, and establishes its theoretical convergence under mild conditions. Future work should focus on: numerical validation against state-of-the-art methods to assess empirical performance; extending the framework to more complex problems such as variational inequality problems and split equality problems; modifying the algorithm to achieve strong convergence, which is often desirable in infinite-dimensional spaces.

ACKNOWLEDGEMENT

The author sincerely thanks the anonymous reviewers for their valuable comments and suggestions, which significantly improved this manuscript. Gratitude is also extended to the journal editors for their professional guidance and timely handling of the review process. The author acknowledges all cited authors whose published works informed and supported this research.

REFERENCES

Adamu, A., Ogwo, G. N., Okeke, C. C., & Zinsou, B. (2025). A two-step inertial CQ method for split feasibility problems with applications. Bangmod International Journal of Mathematical and Computational Science, 11, 23-50. [Crossref]

Alvarez, F., & Attouch, H. (2001). An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Analysis, 9(1), 3-11. [Crossref]

Anh, P. K., Vinh, N. T., & Dung, V. T. (2018). A new self-adaptive CQ algorithm with an application to the LASSO problem. Journal of Fixed Point Theory and Applications, 20, 1-19. [Crossref]

Ansari, Q. H., & Rehan, A. (2014). Split feasibility and fixed point problems. In Nonlinear analysis: Approximation theory, optimization and applications (pp. 281-322). Birkhäuser. [Crossref]

Aubin, J. P. (2013). Optima and equilibria: An introduction to nonlinear analysis (Vol. 140). Springer Science & Business Media. [Crossref]

Byrne, C. (2002). Iterative oblique projection onto convex sets and the split feasibility problem. Inverse Problems, 18(2), 441-453. [Crossref]

Byrne, C. (2003). A unified treatment of some iterative algorithms in signal processing and image reconstruction. Inverse Problems, 20(1), 103-120. [Crossref]

Censor, Y., Bortfeld, T., Martin, B., & Trofimov, A. (2006). A unified approach for inversion problems in intensity-modulated radiation therapy. Physics in Medicine & Biology, 51(10), 2353-2365. [Crossref]

Kesornprom, S., Pholasa, N., & Cholamjiak, P. (2020). On the convergence analysis of the gradient-CQ algorithms for the split feasibility problem. Numerical Algorithms, 84, 997-1017. [Crossref]

Kılıçman, A., & Mohammed, L. B. (2016). Iterative methods for solving split feasibility problem in Hilbert space. Malaysian Journal of Mathematical Sciences, 10(S), 127-143.

Kiliçman, A., & Mohammed, L. B. (2018). A note on the split common fixed point problem and its variant forms. In Mathematical analysis and applications: Selected topics (pp. 283-340). Wiley. [Crossref]

Liang, J. (2016). Convergence rates of first-order operator splitting methods (Doctoral dissertation). Normandie Université; GREYC CNRS UMR 6072.

López, G., Martín-Márquez, V., Wang, F., & Xu, H. K. (2012). Solving the split feasibility problem without prior knowledge of matrix norms. Inverse Problems, 28(8), 085004. [Crossref]

Ma, X., & Liu, H. (2022). An inertial Halpern-type CQ algorithm for solving split feasibility problems in Hilbert spaces. Journal of Applied Mathematics and Computing, 68, 1699-1717. [Crossref]

Mohammed, L. B., & Babura, B. I. (2026). Multi-inertial techniques for split feasibility problem with multiple output sets. Bima Journal of Science and Technology, 9(4B), 257-268. [Crossref]

Mohammed, L. B., Kilicman, A., & Bamanga, D. (2025). Efficient viscosity algorithms for solving the split equality fixed-point problem. European Journal of Pure and Applied Mathematics, 18(4), 6154. [Crossref]

Okeke, C. C., & Adamu, A. (2023). Two-step inertial method for solving split common null point problem with multiple output sets in Hilbert spaces. Journal of Applied Mathematics and Computing, 69, 3855-3878. [Crossref]

Opial, Z. (1967). Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bulletin of the American Mathematical Society, 73(4), 591-597. [Crossref]

Polyak, B. T. (1987). Introduction to optimization. Optimization Software, Publications Division.

Poon, C., & Liang, J. (2019). Trajectory of alternating direction method of multipliers and adaptive acceleration. Advances in Neural Information Processing Systems, 32, 7355-7363.

Qu, B., & Xiu, N. (2005). A note on the CQ algorithm for the split feasibility problem. Inverse Problems, 21(5), 1655-1665. [Crossref]

Sahu, D. R., Cho, Y. J., Dong, Q. L., Kashyap, M. R., & Li, X. H. (2021). Inertial relaxed CQ algorithms for solving a split feasibility problem in Hilbert spaces. Numerical Algorithms, 87, 1075-1095. [Crossref]

Shehu, Y., & Gibali, A. (2021). New inertial relaxed method for solving split feasibilities. Optimization Letters, 15(6), 2109-2126. [Crossref]

Suantai, S., Pholasa, N., & Cholamjiak, P. (2018). The modified inertial relaxed CQ algorithm for solving the split feasibility problems. Journal of Industrial and Management Optimization, 14(4), 1595-1615. [Crossref]

Vinh, N. T., Hoai, P. T., Dung, L. A., & Cho, Y. J. (2023). A new inertial self-adaptive gradient algorithm for the split feasibility problem and an application to the sparse recovery problem. Acta Mathematica Sinica, English Series, 39(12), 2489-2506. [Crossref]

Yao, Y., Yao, J. C., & Xu, H. K. (2020). An inertial self-adaptive algorithm for the split feasibility problem. Journal of Nonlinear and Convex Analysis, 21(5), 1123-1137.