Q-Learning Formula

Q(s,a) ← Q(s,a) + α [R + γ max_a' Q(s',a') - Q(s,a)]

The Q-learning update rule, derived from the Bellman optimality equation
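As a minimal sketch of how this update could look in code, the function below applies one step of the rule to a dictionary-based Q-table. The function name, the action names, and the default values of α and γ are assumptions for illustration, not the playground's actual implementation. The -5 reward is the obstacle penalty mentioned in the FAQ.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference error: target minus current estimate.
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]

ACTIONS = ["up", "down", "left", "right"]  # assumed action set
q_table = defaultdict(float)               # unseen (state, action) pairs start at 0.0

# The robot tries to move right from (0, 0), hits an obstacle, and stays put.
new_value = q_update(q_table, (0, 0), "right", -5.0, (0, 0), ACTIONS)
print(new_value)
```

With α = 0.1, a single collision moves the estimate only one tenth of the way toward the -5 target, which is why the robot needs several episodes before it reliably avoids a cell.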

How to Use the AI Robot Reinforcement Learning Playground

1 Understand the Environment

The robot (blue circle) starts in the bottom-left corner. The goal (green star) is in the top-right corner. Red squares are obstacles. The robot must learn to reach the goal without hitting any of them.
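One way to picture this environment is as a small grid. The sketch below assumes a 5x5 grid, made-up obstacle cells, and the convention that a blocked move leaves the robot where it was. None of these details are stated by the playground, so treat them as placeholders.

```python
GRID_SIZE = 5                            # assumed; the real grid size is not stated
START = (0, 0)                           # bottom-left, as (column, row), row 0 at the bottom
GOAL = (GRID_SIZE - 1, GRID_SIZE - 1)    # top-right
OBSTACLES = {(1, 1), (2, 3), (3, 1)}     # hypothetical obstacle cells

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(position, action):
    """Return (new_position, event); event is 'goal', 'obstacle', 'wall', or 'move'."""
    dx, dy = MOVES[action]
    x, y = position[0] + dx, position[1] + dy
    if not (0 <= x < GRID_SIZE and 0 <= y < GRID_SIZE):
        return position, "wall"          # assumed: the robot stays put at the boundary
    if (x, y) in OBSTACLES:
        return position, "obstacle"      # assumed: a collision leaves the robot in place
    if (x, y) == GOAL:
        return (x, y), "goal"
    return (x, y), "move"

print(step(START, "up"))
print(step(START, "left"))
print(step((1, 0), "up"))
```

Moving the goal, as the tip below suggests, would correspond to changing GOAL in this sketch. Values learned for the old goal would then have to be relearned.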

💡 Pro Tip: Click anywhere on the canvas to move the goal and create new learning challenges.

2 Adjust Reward Rules

Check or uncheck reward rules to shape the robot's behavior. Assign higher rewards to actions you want to encourage and penalties to actions you want to discourage.
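A set of toggleable reward rules could be modeled as a small lookup table, as in the sketch below. Only the -5 obstacle penalty comes from this page (see the FAQ). The goal bonus, the step cost, and all names are assumptions.

```python
# Each rule maps an event to (enabled, reward).
reward_rules = {
    "goal":     (True, 10.0),   # assumed bonus for reaching the goal
    "obstacle": (True, -5.0),   # penalty stated in the FAQ
    "move":     (True, -0.1),   # assumed small cost per step
}

def reward_for(event, rules):
    """Return the reward for an event, or 0.0 if its rule is missing or unchecked."""
    enabled, value = rules.get(event, (False, 0.0))
    return value if enabled else 0.0

print(reward_for("obstacle", reward_rules))   # rule checked
reward_rules["obstacle"] = (False, -5.0)      # simulate unchecking the rule
print(reward_for("obstacle", reward_rules))   # rule unchecked
```

With the obstacle rule unchecked, collisions cost nothing in this sketch, so the robot has no reason to learn to avoid them.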

3 Set Training Parameters

Learning Rate (α): How quickly the robot adapts to new information. Higher values overwrite old estimates faster.
Exploration Rate (ε): Probability of trying a random action instead of the best-known one.
Discount Factor (γ): How much the robot values future rewards relative to immediate ones.
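The exploration rate is usually applied through an epsilon-greedy rule. The sketch below shows one common form. The function name and the dictionary-based Q-table are assumptions, and the playground may select actions differently.

```python
import random

ACTIONS = ["up", "down", "left", "right"]  # assumed action set

def choose_action(q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)         # explore: any action, uniformly at random
    # Exploit: the action with the highest Q-value (unseen pairs count as 0.0).
    return max(actions, key=lambda a: q.get((state, a), 0.0))

q_table = {((0, 0), "up"): 1.0, ((0, 0), "right"): 2.0}
print(choose_action(q_table, (0, 0), ACTIONS, epsilon=0.0))   # always greedy
print(choose_action(q_table, (0, 0), ACTIONS, epsilon=1.0))   # always random
```

Setting ε to 0 makes the robot purely greedy, and setting it to 1 makes every move random. Values in between trade off the two.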

4 Start Training

Click "Start Training" to begin, then watch the robot improve over time. The progress bars show how the total reward per episode changes as training proceeds.
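To give a rough idea of what a training run involves, here is a compact tabular Q-learning loop on a tiny grid. It is a sketch under stated assumptions, not the playground's code. The grid size, obstacle cells, step limit, and every reward except the -5 obstacle penalty are invented for illustration.

```python
import random

SIZE = 4                                     # assumed grid size
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)   # bottom-left start, top-right goal
OBSTACLES = {(1, 1), (2, 1)}                 # hypothetical obstacle cells
MOVES = [(0, 1), (0, -1), (-1, 0), (1, 0)]   # up, down, left, right

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = {}                                   # (state, action_index) -> value
    totals = []                              # total reward per episode
    for _ in range(episodes):
        state, total = START, 0.0
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(len(MOVES))
            else:
                action = max(range(len(MOVES)), key=lambda a: q.get((state, a), 0.0))
            nxt = (state[0] + MOVES[action][0], state[1] + MOVES[action][1])
            if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE):
                nxt, reward = state, -1.0    # assumed wall penalty, robot stays put
            elif nxt in OBSTACLES:
                nxt, reward = state, -5.0    # obstacle penalty from the FAQ
            elif nxt == GOAL:
                reward = 10.0                # assumed goal reward
            else:
                reward = -0.1                # assumed step cost
            done = nxt == GOAL
            # No future value after the goal, since the episode ends there.
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in range(len(MOVES)))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state, total = nxt, total + reward
            if done:
                break
        totals.append(total)
    return q, totals

q_table, totals = train()
print("mean reward, first 50 episodes:", sum(totals[:50]) / 50)
print("mean reward, last 50 episodes: ", sum(totals[-50:]) / 50)
```

Comparing the two printed means is the text equivalent of watching the progress bars: early episodes collect penalties while the robot explores, and later ones should score higher as it settles on a short path.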

Learning Concept

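Q(s,a) ← Q(s,a) + α [R + γ max_a' Q(s',a') - Q(s,a)]. This is the Q-learning update rule, derived from the Bellman optimality equation: after taking action a in state s, the robot nudges its estimate Q(s,a) toward the reward R plus the discounted value of the best action a' available in the next state s'.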

Frequently Asked Questions

What is reinforcement learning?
Reinforcement learning (RL) is a type of machine learning in which an agent learns to make decisions by taking actions in an environment to maximize cumulative reward. It's like training a dog with treats: good actions get rewards, bad actions don't.
How does the robot learn to avoid obstacles?
Through trial and error! When the robot hits an obstacle, it receives a negative reward (-5). Over many episodes, it learns which actions lead to obstacles and avoids them. The Q-learning algorithm updates its "knowledge" (Q-values) based on these experiences.
What do the training parameters mean?
Learning Rate (α): How much new information overrides old information.
Exploration Rate (ε): Probability of taking a random action.
Discount Factor (γ): Importance of future rewards.
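The effect of the discount factor is easiest to see on a concrete reward sequence. The sketch below computes the discounted return of a short path for several values of γ. The reward values are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Four small step costs followed by an assumed goal reward of 10.
rewards = [-0.1, -0.1, -0.1, -0.1, 10.0]
for gamma in (0.0, 0.5, 0.9, 1.0):
    print(gamma, round(discounted_return(rewards, gamma), 3))
```

With γ = 0 only the first step's cost counts, so the distant goal is invisible. As γ approaches 1, the goal reward dominates and longer-term planning pays off.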
Can I see the robot's decision-making process?
Yes! The progress bars show how the total reward per episode improves over time. The "States Explored" counter shows how many different positions the robot has learned about.
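Statistics like these can be derived from the Q-table and a log of episode rewards. The sketch below is one possible way to compute them, assuming a Q-table keyed by (state, action). The window size of 100 and the sample numbers are illustrative only.

```python
from collections import deque

def states_explored(q):
    """Count distinct states that appear in a Q-table keyed by (state, action)."""
    return len({state for state, _action in q})

recent = deque(maxlen=100)                       # keeps only the most recent 100 rewards
for episode_reward in [-12.0, -3.5, 4.0, 9.5]:   # made-up episode rewards
    recent.append(episode_reward)
average_reward = sum(recent) / len(recent)

q_table = {((0, 0), "up"): -0.1, ((0, 0), "right"): 0.2, ((0, 1), "up"): 0.0}
print(states_explored(q_table), average_reward)
```

A bounded deque drops the oldest reward automatically once the window is full, so the average always reflects recent performance rather than the whole run.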
How is this used in real robotics?
Real robots use related algorithms for navigation, manipulation, and task learning, usually with neural networks in place of a lookup table. Companies such as Boston Dynamics have applied RL to legged locomotion, autonomous-vehicle developers use it to learn driving policies, and warehouse robots use it for efficient item picking.
Why does the robot sometimes take weird paths?
That's exploration! Early in training, the robot tries random actions to discover the environment. As it learns, it exploits known good paths but still explores occasionally (controlled by ε) to find potentially better routes.