DESIGN


System Architecture

(Image: system architecture diagram)

How do we know where the tennis ball is?

For each image we receive from the camera, we locate the ball by converting the RGB image into the HSV color space and filtering out all non-green colors. With the help of OpenCV, we can then read the pixel position of the ball in the camera frame; following this position across frames gives the ball's trajectory.
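A minimal, dependency-free sketch of the color filter and position readout follows. It uses Python's standard `colorsys` module instead of OpenCV (in OpenCV the same steps would typically be `cv2.cvtColor` with `cv2.COLOR_BGR2HSV`, then `cv2.inRange` and `cv2.moments`; note that OpenCV stores 8-bit hue as 0-179, while `colorsys` returns hue in [0, 1)). The HSV bounds are hypothetical values that would have to be tuned on real camera frames, the image is assumed to be a list of rows of 8-bit `(r, g, b)` tuples, and the ball position is taken as the centroid of the pixels that pass the filter.

```python
import colorsys

# Hypothetical HSV thresholds for "green" (hue ~50-150 degrees); tune on real frames.
HUE_RANGE = (50 / 360, 150 / 360)
MIN_SAT = 0.4
MIN_VAL = 0.3


def is_ball_color(r, g, b):
    """True if an 8-bit RGB pixel falls inside the assumed green band in HSV."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return HUE_RANGE[0] <= h <= HUE_RANGE[1] and s >= MIN_SAT and v >= MIN_VAL


def ball_pixel_position(image):
    """Centroid (x, y) of the pixels that pass the HSV filter, or None if none do.

    `image` is a list of rows; each row is a list of (r, g, b) tuples.
    x is the column index and y is the row index, as in image coordinates.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            if is_ball_color(r, g, b):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)
```

Averaging all passing pixels is the simplest choice; with background clutter, keeping only the largest connected blob (for example via OpenCV contours) is usually more robust.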

What was our initial approach?

Like most previous work, we first tried to use the classic dynamic model to predict where the ball would land.

The trajectory of a thrown ball, whether in 2-dimensional or 3-dimensional space, can be modeled as a parabola (ignoring air resistance). Given several sampled points and their timestamps, we can fit the whole trajectory.
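The sketch below shows one way to turn timestamped samples into a full trajectory, assuming each coordinate is fit independently as a quadratic in time by least squares. For a projectile without drag, the horizontal coordinates are linear in t and the vertical one is quadratic, so a quadratic per axis covers all of them. The function names are hypothetical and only the standard library is used.

```python
def fit_quadratic(ts, values):
    """Least-squares fit of v(t) = c0 + c1*t + c2*t**2; returns (c0, c1, c2)."""
    if len(ts) < 3:
        raise ValueError("need at least 3 samples for a quadratic fit")
    # Normal equations (A^T A) c = A^T v for the basis [1, t, t^2],
    # stored as a 3x4 augmented matrix built from power sums of t.
    power_sums = [sum(t ** k for t in ts) for k in range(5)]
    aug = [
        [power_sums[i + j] for j in range(3)]
        + [sum(v * t ** i for t, v in zip(ts, values))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (aug[i][3] - known) / aug[i][i]
    return tuple(coeffs)


def fit_trajectory(ts, points):
    """Fit every coordinate of the sampled points; return a function t -> fitted point."""
    per_axis = [fit_quadratic(ts, axis) for axis in zip(*points)]

    def position(t):
        return tuple(c0 + c1 * t + c2 * t * t for c0, c1, c2 in per_axis)

    return position
```

Evaluating the returned function at a later time extrapolates the ball's position; with noisy detections, more samples make the fit more stable.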

There are several challenges in this design. Most importantly, the robot arm needs enough time to move to the desired position. Therefore, we chose as the predicted point the mirror image of the first point captured by the camera with respect to the parabola's axis of symmetry; this point lies at the same height as the first captured point. The image below shows this method in 2-dimensional space.
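In the 2D case, the mirror point follows directly from a fitted parabola y = a x^2 + b x + c: its axis of symmetry is x = -b/(2a), so a first observed point at x_1 reflects to x = 2(-b/(2a)) - x_1 = -b/a - x_1, at the same height. A minimal sketch with a hypothetical function name, assuming the parabola coefficients have already been fitted:

```python
def mirror_point(a, b, c, x_first):
    """Reflect the first observed point across the axis of y = a*x**2 + b*x + c.

    Returns (x_pred, y_pred); y_pred equals the height of the first point.
    """
    if a == 0:
        raise ValueError("a must be nonzero for a parabola")
    x_axis = -b / (2 * a)            # axis of symmetry
    x_pred = 2 * x_axis - x_first    # reflection of the first point's x
    y_pred = a * x_pred ** 2 + b * x_pred + c
    return x_pred, y_pred
```

For example, for y = -x^2 + 4x + 1 (axis at x = 2), a first point at x = 0.5 maps to x = 3.5, and both points have y = 2.75.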

Why machine learning and how?

A physical model works well only when the hardware performs exceptionally well. Besides, every part of the system must be adjusted until it reflects the real world and matches the physics. For example, we need to calibrate the camera, measure the mapping between the ball's 3D coordinates and its pixel coordinates, etc. Our question was: can we avoid all this heavy work?

The answer is yes! That is where the idea for our machine learning approach came from.

This problem can be formulated as a supervised learning problem. We decompose the training process into three steps:

Determine the reachable set of the Baxter arm.
Grid the reachable set and collect data.
Fit a particular model using the training data.
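Steps 1 and 2 can be sketched as building a regular grid over a bounding box and keeping only the points the arm can reach. In this sketch the box, the step size and the reachability test are all hypothetical: on the real Baxter, the test would be an inverse-kinematics solvability check rather than the sphere used here.

```python
import itertools


def frange(lo, hi, step):
    """Inclusive list of floats from lo to hi with the given step."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


def grid_reachable_set(bounds, step, is_reachable):
    """Return the grid points inside `bounds` that pass `is_reachable`.

    bounds: ((x_min, x_max), (y_min, y_max), (z_min, z_max)), a hypothetical box.
    is_reachable: callable (x, y, z) -> bool supplied by the caller; a stand-in
    for an inverse-kinematics check on the real robot.
    """
    axes = [frange(lo, hi, step) for lo, hi in bounds]
    return [p for p in itertools.product(*axes) if is_reachable(*p)]


def within_sphere(x, y, z, radius=1.0):
    """Hypothetical reachability model: a sphere around the shoulder at the origin."""
    return x * x + y * y + z * z <= radius * radius
```

Each grid point would then be a target where a ball is thrown and the corresponding camera observations are recorded as one training sample.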

Our two candidate algorithms are:

Standard fully connected neural network.
K-NN (K nearest neighbors).
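As an illustration of the second candidate, a K-NN regressor predicts by averaging the targets of the k closest training samples. This sketch assumes Euclidean distance, uniform (unweighted) averaging, and hypothetical feature and target encodings, since the exact model inputs and outputs are not specified here.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training inputs closest to `query`.

    train_x: list of feature vectors (e.g. hypothetical ball pixel positions)
    train_y: list of target vectors (e.g. hypothetical arm catch positions)
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Indices of training samples sorted by Euclidean distance to the query.
    by_distance = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = by_distance[:k]
    dim = len(train_y[0])
    # Component-wise mean of the k nearest targets.
    return tuple(sum(train_y[i][d] for i in nearest) / k for d in range(dim))
```

A small k follows the training data closely but is sensitive to noise; a larger k smooths predictions at the cost of blurring nearby targets.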

Each candidate algorithm has hyperparameters that we need to tune:

For the neural network, we need to tune the number of layers and the number of nodes in each layer.
For K-NN, we need to tune the number of nearest neighbors to use.
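For the neural network, both hyperparameters map onto a single list of layer sizes. The sketch below only builds and evaluates such a network (random initialization and a forward pass); training is not shown. The tanh activation, the initialization scale, the linear output layer and the function names are assumptions for illustration.

```python
import math
import random


def init_mlp(layer_sizes, seed=0):
    """Random weights for a fully connected network.

    layer_sizes = [n_inputs, hidden_1, ..., n_outputs]; its length sets the
    number of layers and its entries set the number of nodes per layer.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Forward pass: tanh on hidden layers, linear output layer (for regression)."""
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x
```

Tuning then amounts to training networks with different `layer_sizes` lists and comparing their errors on held-out data.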

Fully Connected Neural Network
(Image from Wikipedia)

K-NN
(Image from Wikipedia)

How do we move the arm?

After some evaluation, we found that the ball's flight time is approximately 0.6 seconds. After subtracting the time spent on detection, there is not much time left for the robot arm to move. Catching the ball also demands accuracy. Therefore, we need a stable and accurate controller that can move the robot arm very fast.

We decided to use a PD controller with a decaying proportional gain Kp. In our design, Kp is initially very large so that the arm moves quickly; it then decays over time so that the arm settles stably at the predicted point. Below is the flow chart of our controller.
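The decaying-gain idea can be sketched on a simplified model: a single unit-mass joint with x'' = u, controlled by u = Kp(t)(target - x) - Kd x'. The exponential decay law, the floor `kp_min` that keeps some stiffness after the decay, and all numeric values are assumptions for illustration, not the parameters used on Baxter.

```python
import math


def kp_schedule(t, kp0=400.0, kp_min=25.0, tau=0.1):
    """Proportional gain that starts at kp0 and decays exponentially toward kp_min.

    The exponential form, the floor kp_min and the time constant tau are assumptions.
    """
    return kp_min + (kp0 - kp_min) * math.exp(-t / tau)


def simulate_pd(target, x0=0.0, kd=10.0, dt=1e-3, duration=3.0):
    """Drive a unit-mass 1-DOF joint (x'' = u) to `target` with a decaying-Kp PD law.

    Returns a list of (t, x) samples, integrated with semi-implicit Euler.
    """
    x, v, t = x0, 0.0, 0.0
    samples = [(t, x)]
    for _ in range(int(round(duration / dt))):
        error = target - x
        # P term on the position error; D term on the error derivative,
        # which is -v for a fixed target.
        u = kp_schedule(t) * error - kd * v
        v += u * dt
        x += v * dt
        t += dt
        samples.append((t, x))
    return samples
```

With these assumed values, the large initial gain accelerates the joint hard toward the target, and once Kp has decayed to about 25, the pair Kp = 25, Kd = 10 is critically damped, so the joint settles without oscillating.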