In this paper, we propose a Reinforcement Learning-based MAC layer protocol for cognitive radio networks, based on exploiting the feedback of the Primary User (PU). Our proposed model relies on two pillars, namely an infinite-state Partially Observable Markov Decision Process (POMDP) to model the system dynamics besides a queuing-theoretic model for the PU queue, where the states represent whether