Working with Mediapipe’s pre-made Calculators in Hand Tracking

Arian Alavi
Oct 2, 2020


This article describes much of what I’ve learned about Mediapipe’s pre-made calculators over the past year of using it. I’ve worked exclusively with the hand tracking example, so that is what I will be detailing here. Documentation for these calculators exists within the source files, but I’ve found it insufficient. Here I will break down the calculators and classes commonly used by hand tracking. This article is intended for those with intermediate knowledge of how Mediapipe functions.

How to put a .tflite file into Mediapipe

The “TfLiteInferenceCalculator” is used to run .tflite files. Set up the .pbtxt file as follows:

node {
  calculator: "TfLiteInferenceCalculator"
  input_stream: "TENSORS:input_tensors"
  output_stream: "TENSORS:output_tensors"
  node_options: {
    [type.googleapis.com/mediapipe.TfLiteInferenceCalculatorOptions] {
      model_path: "mediapipe/models/filename.tflite"
    }
  }
}

Make sure that your .tflite model is in the mediapipe/models folder. If you plan to compile for Android, make sure the model file is also listed in the Android app’s build target, as shown in the sketch below.
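As an illustration, in the Android hand tracking example the models are listed as assets of the app’s android_binary target. A rough sketch, assuming your model is named filename.tflite and has an exported target in mediapipe/models/BUILD (attribute layout is from my memory of the example’s BUILD file and may differ in your version):

# BUILD file of your Android example app (sketch; most attributes omitted)
android_binary(
    name = "handtrackinggpu",
    assets = [
        "//mediapipe/graphs/hand_tracking:hand_tracking_mobile_gpu_binary_graph",
        "//mediapipe/models:hand_landmark.tflite",
        "//mediapipe/models:palm_detection.tflite",
        "//mediapipe/models:filename.tflite",  # your custom model
    ],
    assets_dir = "",
    # srcs, manifest, deps, etc. omitted
)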

Note that the input for the TfLiteInferenceCalculator must have the tag TENSORS. To get TENSORS, you will need to feed matrices into the “TfLiteConverterCalculator”. Add the following to the .pbtxt file:

node {
  calculator: "TfLiteConverterCalculator"
  input_stream: "MATRIX:input_matrix"
  output_stream: "TENSORS:input_tensors"
}

To make use of a data stream of MATRIX, the calculator preceding the TfLiteConverterCalculator must have an output stream of MATRIX. Mediapipe’s Matrix type is built on the Eigen C++ library. To use it, include the following header:

#include "mediapipe/framework/formats/matrix.h"

Declare a matrix with the following code:

Matrix test;          // mediapipe::Matrix is an Eigen matrix of floats
test.resize(1, 42);   // 1 row, 42 columns
test(0, 1) = 3;       // set row 0, column 1

Don’t forget to add the necessary dependencies to your BUILD file as well.
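To tie these pieces together, here is a minimal sketch of a hypothetical custom calculator that flattens the 21 hand landmarks into a 1x42 Matrix and emits it on a MATRIX output stream, which can then feed the TfLiteConverterCalculator. The calculator name, stream tags, and column layout are my own choices, not something taken from the hand tracking graph, and depending on your Mediapipe version the status type may be mediapipe::Status or absl::Status.

#include <memory>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/formats/matrix.h"

namespace mediapipe {

// Hypothetical calculator: flattens 21 hand landmarks (x and y only)
// into a 1x42 Matrix so the result can be fed to TfLiteConverterCalculator.
class LandmarksToMatrixCalculator : public CalculatorBase {
 public:
  static ::mediapipe::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("LANDMARKS").Set<NormalizedLandmarkList>();
    cc->Outputs().Tag("MATRIX").Set<Matrix>();
    return ::mediapipe::OkStatus();
  }

  ::mediapipe::Status Process(CalculatorContext* cc) override {
    const auto& landmarks =
        cc->Inputs().Tag("LANDMARKS").Get<NormalizedLandmarkList>();
    auto matrix = std::make_unique<Matrix>();
    matrix->resize(1, 42);  // assumes exactly 21 landmarks
    for (int i = 0; i < landmarks.landmark_size(); ++i) {
      (*matrix)(0, i * 2) = landmarks.landmark(i).x();
      (*matrix)(0, i * 2 + 1) = landmarks.landmark(i).y();
    }
    cc->Outputs().Tag("MATRIX").Add(matrix.release(), cc->InputTimestamp());
    return ::mediapipe::OkStatus();
  }
};
REGISTER_CALCULATOR(LandmarksToMatrixCalculator);

}  // namespace mediapipe

The matching BUILD entry might look something like this (target names are placeholders):

cc_library(
    name = "landmarks_to_matrix_calculator",
    srcs = ["landmarks_to_matrix_calculator.cc"],
    deps = [
        "//mediapipe/framework:calculator_framework",
        "//mediapipe/framework/formats:landmark_cc_proto",
        "//mediapipe/framework/formats:matrix",  # provides matrix.h
    ],
    alwayslink = 1,  # so REGISTER_CALCULATOR is not stripped at link time
)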

How to work with class NormalizedLandmarkList

NormalizedLandmarkList is the data type that stores normalized hand coordinates (normalized to the image resolution). It keeps track of 21 landmarks, each with an x, y, and z value. In previous versions of Mediapipe there was only x and y, and it is still possible to remove the z coordinate, which may improve performance. In our experience the z coordinate tends to be inaccurate; according to Mediapipe, it was trained purely on synthetic data, while the x and y coordinates were trained on a mix of real and synthetic data. The example below shows a hand signing the ASL letter A together with the coordinates from the NormalizedLandmarkList; each value corresponds to a landmark index.

[Image: ASL letter “A” as seen through Mediapipe’s NormalizedLandmarkList]

To access each landmark, use NormalizedLandmarkList.landmark(index), which returns a constant NormalizedLandmark. To read the x, y, and z coordinates, use .x(), .y(), and .z().

A new NormalizedLandmarkList must be built from scratch, as the NormalizedLandmarks inside an existing list are constant. Declare a new NormalizedLandmarkList and use NormalizedLandmarkList.add_landmark() to add a set of x, y, z coordinates. The first landmark you add will be at index 0, with further landmarks following in order. add_landmark() returns a non-constant NormalizedLandmark pointer, so you can call NormalizedLandmark->set_x() to set a float/double value; set_y() and set_z() work the same way. A short sketch of both reading and writing follows.
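Here is a minimal sketch of both operations, assuming the landmark proto header from the hand tracking example; the function and variable names are mine:

#include <cstdio>

#include "mediapipe/framework/formats/landmark.pb.h"

// Reading: print every landmark of an existing list.
void PrintLandmarks(const mediapipe::NormalizedLandmarkList& list) {
  for (int i = 0; i < list.landmark_size(); ++i) {
    const mediapipe::NormalizedLandmark& point = list.landmark(i);
    printf("landmark %d: x=%f y=%f z=%f\n", i, point.x(), point.y(), point.z());
  }
}

// Writing: build a new list from scratch.
mediapipe::NormalizedLandmarkList MakeList() {
  mediapipe::NormalizedLandmarkList list;
  mediapipe::NormalizedLandmark* point = list.add_landmark();  // index 0
  point->set_x(0.5f);
  point->set_y(0.25f);
  point->set_z(0.0f);
  return list;
}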

You can find a reference to a stream of NormalizedLandmarkList in “/mediapipe/graphs/hand_tracking/hand_tracking_desktop” where it will have the tag “LANDMARKS:hand_landmarks”.

How to use class RenderData to display text

After declaring a RenderData variable, call RenderData.add_render_annotations() and store the result in a variable, here called “label_annotation”. From there you can create multiple pieces of text by calling label_annotation->mutable_text(), which returns a RenderAnnotation_Text pointer. You may set its properties with the following functions (a short sketch follows the list).

set_display_text(std::string): set the text of the mutable text object

set_baseline(int): set the up/down position of the text

set_left(int): set the left/right position of the text

set_font_height(int): set the height of the text

set_normalized(bool): I have not had much luck with this value being true

set_thickness(int): set how thick the text should be

mutable_color()->set_r(int): set the red component of the text’s RGB color

mutable_color()->set_g(int): set the green component of the text’s RGB color

mutable_color()->set_b(int): set the blue component of the text’s RGB color

set_font_face(int): changes the font; accepts values 0–7 for pre-made fonts
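Putting this together, here is a minimal sketch of building a RenderData label in C++. Note that in the render_data.proto version I have worked with, color and thickness are fields on the RenderAnnotation itself rather than on the Text message, so the sketch sets them on label_annotation; the include path, positions, and values are illustrative only.

#include "mediapipe/util/render_data.pb.h"

// Build RenderData that draws the string "A" near the top-left of the frame.
mediapipe::RenderData MakeLabel() {
  mediapipe::RenderData render_data;
  auto* label_annotation = render_data.add_render_annotations();

  // Color and thickness live on the annotation in the proto I have used.
  label_annotation->mutable_color()->set_r(255);
  label_annotation->mutable_color()->set_g(0);
  label_annotation->mutable_color()->set_b(0);
  label_annotation->set_thickness(2);

  // Text-specific fields live on the Text message.
  auto* text = label_annotation->mutable_text();
  text->set_display_text("A");
  text->set_left(20);       // pixels from the left edge
  text->set_baseline(40);   // pixels from the top edge
  text->set_font_height(30);
  text->set_font_face(0);   // one of the pre-made fonts (0-7)
  return render_data;
}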

After you have RenderData, feed it into the “AnnotationOverlayCalculator”. This calculator is referenced in “/mediapipe/graphs/hand_tracking/subgraphs/renderer_gpu.pbtxt” and in “/mediapipe/graphs/hand_tracking/subgraphs/renderer_cpu.pbtxt”. The calculator takes indexed (untagged) inputs, so simply add your render data stream’s name (the part after the RENDER_DATA: tag) as another input_stream on the node and it will automatically be rendered.
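For illustration, here is a sketch of what the CPU renderer node might look like after adding a hypothetical label_render_data stream. The other stream names shown are my recollection of the hand tracking renderer subgraph; check your own renderer_cpu.pbtxt for the exact names:

node {
  calculator: "AnnotationOverlayCalculator"
  input_stream: "IMAGE:input_image"
  input_stream: "detection_render_data"
  input_stream: "landmark_render_data"
  input_stream: "label_render_data"  # your new render data stream
  output_stream: "IMAGE:output_image"
}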

Who am I and how did I learn Mediapipe?

I am Arian Alavi, an applied math student at the University of California, Santa Barbara. I’ve been working with Mediapipe on and off for the past year to create a live ASL alphabet interpreter. Most of what I know about Mediapipe I learned by scouring the hand tracking example and modifying its code.

With the help of many of my friends, we’ve managed to deploy the ASL interpreter to the Google Play Store. To learn more about our project, visit our GitHub page: www.github.com/AriAlavi/SigNN, and try our app: https://play.google.com/store/apps/details?id=com.signn.mediapipe.apps.handtrackinggpu
