
Q-Learning Reinforcement Agent Simulation

Role: AI Engineer

AI

• Implemented a Q-learning algorithm to train an autonomous explorer to reach a treasure in a discrete one-dimensional environment, learning through iterative state–action–reward updates.
• Built a Q-table using Pandas to map state–action pairs to value estimates and progressively refined those Q-values via temporal-difference learning.
• Designed the agent’s ε-greedy exploration strategy, dynamically balancing exploration and exploitation to improve convergence efficiency.
• Modeled environmental feedback loops where the agent received state transitions and rewards; implemented terminal-state detection and reset mechanics for episodic training.
• Tuned learning parameters (α, γ, ε) to optimize performance, reducing the average episode length from 92 steps to 9 once training converged.
• Visualized real-time environment updates and action selection across 13 training episodes, demonstrating clear policy improvement through learned behavior.
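
The training loop described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the corridor length (N_STATES), the hyperparameter values for α, γ, and ε, the fixed (non-decaying) ε, the reward of 1.0 at the treasure, and the helper names choose_action, step, and train are all assumptions. To stay dependency-free, the sketch stores the Q-table as a plain list of lists rather than the Pandas table the project used.

```python
import random

# All constants below are illustrative assumptions, not the project's tuned values.
N_STATES = 6                  # corridor length; the treasure sits at the right end
ACTIONS = ["left", "right"]
ALPHA = 0.1                   # learning rate (alpha)
GAMMA = 0.9                   # discount factor (gamma)
EPSILON = 0.1                 # exploration probability (epsilon), held constant here
EPISODES = 13                 # the 13 training runs mentioned above


def choose_action(q_table, state, rng):
    """Epsilon-greedy: explore with probability EPSILON or when the row is untrained."""
    row = q_table[state]
    if rng.random() < EPSILON or all(v == 0.0 for v in row):
        return rng.randrange(len(ACTIONS))
    return row.index(max(row))


def step(state, action):
    """Return (next_state, reward, done) for the one-dimensional corridor."""
    if ACTIONS[action] == "right":
        if state == N_STATES - 2:
            return N_STATES - 1, 1.0, True    # stepped onto the treasure
        return state + 1, 0.0, False
    return max(state - 1, 0), 0.0, False      # a wall blocks the left edge


def train(seed=0):
    rng = random.Random(seed)
    q_table = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    steps_per_episode = []
    for _ in range(EPISODES):
        state, done, steps = 0, False, 0      # reset at the start of each episode
        while not done:
            action = choose_action(q_table, state, rng)
            next_state, reward, done = step(state, action)
            # TD target: do not bootstrap from a terminal state
            target = reward if done else reward + GAMMA * max(q_table[next_state])
            q_table[state][action] += ALPHA * (target - q_table[state][action])
            state = next_state
            steps += 1
        steps_per_episode.append(steps)
    return q_table, steps_per_episode


if __name__ == "__main__":
    q_table, steps = train()
    print("steps per episode:", steps)
```

Printing the per-episode step counts gives a quick check on whether the policy is improving; the exact numbers depend on the seed and the assumed settings, so they will not necessarily match the 92-to-9 figure reported above.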
