Feedback-based access schemes in CR networks: A reinforcement learning approach
In this paper, we propose a Reinforcement Learning-based MAC layer protocol for cognitive radio (CR) networks that exploits the feedback of the Primary User (PU). The proposed model relies on two pillars: an infinite-state Partially Observable Markov Decision Process (POMDP) that models the system dynamics, and a queuing-theoretic model of the PU queue. The states represent whether a packet is delivered from the PU's queue and the PU channel state. The quality of service (QoS) of the PU is guaranteed by imposing a stability constraint on the PU queue. To this end, three Reinforcement Learning approaches are studied, namely Q-Learning, Deep Q-Network (DQN), and Deep Deterministic Policy Gradient (DDPG). Our ultimate objective is to enhance the channel access techniques in MAC protocols by solving the POMDP without any prior knowledge of the environment. © 2021 IEEE.
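
As an illustration of the simplest of the three approaches, the following is a minimal sketch of a tabular Q-Learning update driven by overheard PU feedback. It is not the protocol proposed in the paper: the feedback symbols (ACK, NACK, IDLE), the two-action set, the reward value, and the hyperparameters alpha, gamma, and epsilon are all assumptions made for illustration. The sketch also treats the last observed feedback as the state, which is only a crude approximation of the POMDP.

```python
import random
from collections import defaultdict

# Hypothetical feedback symbols overheard from the PU link (assumed for illustration).
FEEDBACK = ("ACK", "NACK", "IDLE")
# Hypothetical action set: 0 = stay silent, 1 = access the channel.
ACTIONS = (0, 1)


def make_q_table():
    """Return a Q-table mapping each observation to one value per action."""
    return defaultdict(lambda: [0.0 for _ in ACTIONS])


def choose_action(q, obs, epsilon, rng):
    """Epsilon-greedy selection: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q[obs]
    return max(ACTIONS, key=lambda a: values[a])


def q_update(q, obs, action, reward, next_obs, alpha=0.1, gamma=0.9):
    """Standard Q-Learning update; returns the new value of Q(obs, action)."""
    target = reward + gamma * max(q[next_obs])
    q[obs][action] += alpha * (target - q[obs][action])
    return q[obs][action]


if __name__ == "__main__":
    rng = random.Random(0)
    q = make_q_table()
    # One illustrative step: the channel looked idle, the SU transmitted,
    # received an assumed reward of 1.0, and then overheard a PU ACK.
    q_update(q, "IDLE", 1, reward=1.0, next_obs="ACK")
    print(choose_action(q, "IDLE", epsilon=0.0, rng=rng))
```

In a fuller treatment, the reward would have to reflect the PU queue stability constraint, for example by penalizing accesses that are followed by a PU NACK. That choice is left open here.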