AISpy — a research-driven app concept using TensorFlow Lite + Flutter
Introduction
AISpy is an experimental app concept built using Flutter, a cross-platform toolkit made by Google. The app uses TensorFlow Lite, a deep learning framework for on-device inference.
The app combines artificial intelligence with gamification to collect user-submitted source images, which are used to enhance the underlying machine learning object recognition dataset.
The app prototype uses a fairly limited pre-trained model with just under a hundred recognisable classes of object, ranging from a computer mouse to a fridge, a person, or even a toilet!
To help improve the number of recognisable objects, the app asks the user to share photographs of objects it could not guess during the game.
I Spy — the original version
The app is based on the popular game I Spy, in which players take turns guessing the identity of a nearby object. The challenger provides the first letter of an object as a clue to help the other players. If nobody guesses correctly, the challenger wins the round.
AISpy — a ghost in the machine
At the heart of the app’s functionality is the ability to ‘recognise’ objects in the real world and remember them throughout the game. Although the classic version of I Spy is a kids’ game, this app is aimed at grown-ups, too.
It uses a machine learning platform called TensorFlow — “a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.”
AISpy uses the device’s camera to stream images, capturing frames as the user moves the device around the environment. Each frame is then processed, and the app keeps an in-memory list of the objects it has recognised, along with each object’s type and its position within the frame as a bounding box.
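To make this concrete, here is a minimal sketch of the frame-handling step in Dart. The post does not tie the prototype to a specific plugin, so this assumes the community tflite plugin’s detectObjectOnFrame API with an SSD MobileNet detector; the SpiedObject class and inventory list are illustrative names, not the app’s actual code.

```dart
import 'dart:ui' show Rect;

import 'package:camera/camera.dart';
import 'package:tflite/tflite.dart';

/// One remembered detection: the object's class, the model's confidence
/// and its position within the frame as a relative bounding box (0.0-1.0).
class SpiedObject {
  final String label;
  final double confidence;
  final Rect box;
  const SpiedObject(this.label, this.confidence, this.box);
}

/// The app's in-memory inventory of everything it has spied so far.
final List<SpiedObject> inventory = [];

/// Runs the detector over a single camera frame and records the results.
Future<void> processFrame(CameraImage frame) async {
  final results = await Tflite.detectObjectOnFrame(
    bytesList: frame.planes.map((p) => p.bytes).toList(),
    model: 'SSDMobileNet',
    imageHeight: frame.height,
    imageWidth: frame.width,
    imageMean: 127.5,
    imageStd: 127.5,
    threshold: 0.3, // deliberately low: more guesses, more comedy
    numResultsPerClass: 1,
  );
  for (final r in results ?? const []) {
    final rect = r['rect']; // x, y, w, h as fractions of the frame
    inventory.add(SpiedObject(
      r['detectedClass'] as String,
      (r['confidenceInClass'] as num).toDouble(),
      Rect.fromLTWH(rect['x'], rect['y'], rect['w'], rect['h']),
    ));
  }
}
```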
From this data, the app is able to challenge the human player to guess a spied object. When it is the human player’s turn to challenge, the app draws upon its recent inventory of recognised objects from local storage, guessing the human player’s object from the first letter provided as a clue.
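Reusing the illustrative inventory from the sketch above, the first-letter lookup amounts to a simple filter:

```dart
/// When the human challenges with "something beginning with C…", the AI
/// player searches its recent inventory for matching candidates.
Iterable<SpiedObject> candidatesFor(String firstLetter) {
  final letter = firstLetter.toLowerCase();
  return inventory.where((o) => o.label.toLowerCase().startsWith(letter));
}
```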
The interaction between the AI player (the device!) and the human player is carried out almost entirely by voice: the app listens to the human player through speech recognition and replies using the device’s built-in speech capabilities.
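The post does not name the plugins involved; one plausible pairing is the speech_to_text and flutter_tts packages, so a minimal sketch of the speak-then-listen exchange might look like this:

```dart
import 'package:flutter_tts/flutter_tts.dart';
import 'package:speech_to_text/speech_to_text.dart';

final FlutterTts _tts = FlutterTts();
final SpeechToText _speech = SpeechToText();

/// Speaks a challenge aloud, then listens for the player's spoken answer.
Future<void> challenge(String prompt) async {
  await _tts.speak(prompt); // e.g. "I spy, with my little eye…"
  if (await _speech.initialize()) {
    _speech.listen(onResult: (result) {
      // result.recognizedWords holds the transcribed answer.
      print('Player said: ${result.recognizedWords}');
    });
  }
}
```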
Computer Vision
Computer vision provides an exciting tool for creative app developers. However, artificial intelligence is still a long way from matching human levels of visual perception.
“It is amazing that humans and animals do this so effortlessly, while computer vision algorithms are so error prone”
- Richard Szeliski (Microsoft Research)
Machine learning mishaps in the real world would be cause for alarm. Within AISpy, though, they provide an element of lighthearted humour, especially when the app mistakes an object for something utterly absurd!
I have played upon this humorous aspect further by giving the AI player quirky, charismatic responses and utterances throughout the game.
Microsoft COCO Dataset
The current version of AISpy uses a model pre-trained on the Microsoft COCO (Common Objects in Context) dataset. The process of building upon a pre-trained model in this way is called transfer learning.
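The detection sketch earlier assumed a model had already been loaded. With the same hypothetical tflite plugin, bundling and loading a COCO-trained SSD MobileNet is a one-off step (the asset paths are placeholders):

```dart
import 'package:tflite/tflite.dart';

/// Loads the SSD MobileNet model pre-trained on COCO, shipped as Flutter
/// assets alongside its label file.
Future<void> loadDetector() async {
  await Tflite.loadModel(
    model: 'assets/ssd_mobilenet.tflite',
    labels: 'assets/ssd_mobilenet.txt',
  );
}
```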
The COCO dataset includes 80 object types derived from over 200,000 labelled images.
The associated research paper describes COCO as a dataset created with the goal of advancing the state of the art in object recognition by placing object recognition in the context of the broader question of scene understanding.
AISpy seeks to build upon the COCO dataset’s 80 classified object types by asking users to submit a photograph of an object from a game when the AI player fails to guess it correctly. The app saves two images — a contextual image of the object in situ as well as a clipped version. The bounding box identifies the object within the contextual image, so each submission can contribute to the overall dataset and eventually allow new object types to be added to the app’s trained model.
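As a sketch of that two-image step, the clipped copy can be produced by scaling the relative bounding box back to pixel coordinates and cropping, for example with the image package (the positional copyCrop signature and the file paths here are assumptions for illustration):

```dart
import 'dart:io';
import 'dart:ui' show Rect;

import 'package:image/image.dart' as img;

/// Saves the full contextual photo plus a clipped copy of the object,
/// using the detector's relative bounding box.
Future<void> saveSubmission(File photo, Rect box) async {
  final original = img.decodeImage(await photo.readAsBytes())!;
  // Scale the fractional box back up to pixel coordinates before cropping.
  final clip = img.copyCrop(
    original,
    (box.left * original.width).round(),
    (box.top * original.height).round(),
    (box.width * original.width).round(),
    (box.height * original.height).round(),
  );
  await File('submission_context.jpg').writeAsBytes(img.encodeJpg(original));
  await File('submission_clip.jpg').writeAsBytes(img.encodeJpg(clip));
}
```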
Limitations
The object recognition functionality is limited, due to the small number of object classes covered by readily available pre-trained models. So, I have purposely lowered the accuracy threshold to ensure the game identifies more objects, and I have sought to play upon the humour of mistaken identifications alongside the playful nature of the AI character’s vocal feedback during gameplay.
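One way to lean into that humour in code is to let the confidence score shape how cocky the AI sounds, rather than hiding uncertain guesses. This reuses the illustrative SpiedObject class from earlier, and the phrases are invented for effect:

```dart
import 'dart:math';

final _random = Random();

/// Picks a playful utterance, hamming up low-confidence guesses instead
/// of suppressing them.
String utteranceFor(SpiedObject guess) {
  final phrases = guess.confidence < 0.5
      ? [
          'I am going out on a limb here… is it a ${guess.label}?',
          'My circuits are fizzing, but I spy a ${guess.label}!',
        ]
      : [
          'Elementary! It is clearly a ${guess.label}.',
          'I spy, with my robotic eye, a ${guess.label}!',
        ];
  return phrases[_random.nextInt(phrases.length)];
}
```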
Conclusion
Overall, it has been an interesting experience working with TensorFlow and Flutter. I am hoping to expand the trained model with images collected through this app, making it useful for future projects that require real-time object recognition with a greater number of recognisable objects.
Originally published at https://blog.matwright.dev on September 3, 2020.