Feb 1, 2024, Innovations, Mobile

Gemini – dedicated AI for Android?

Łukasz Szczepański, Android Developer
We’ve all heard that AI will change our lives, enslave us and use us to make electricity, or simply send its Terminators. ;) Whatever the future holds, we have to decide what we’re going to do with it. Honestly, the main problem with finding a specific use for AI may lie in the fact that it entered our everyday lives so recently that we simply haven’t figured out how to use it yet.

So far, as Android developers, we’ve had the opportunity to integrate some ML models and find them helpful for running through client-provided data. AR has also been with us for some time now, displaying 3D models in our phone’s camera preview. However, general-purpose artificial intelligence still seems to be a fairly new concept in Android development. Since the integration of ChatGPT into the Bing search engine, we’ve seen one of the possible outcomes – an assistant for the user. And if that application proved useful, there must be a place for AI on our mobile devices.

Introducing Gemini

We’ve known for some time now that development of a general-purpose AI is in progress on Google’s side, and announcing Bard was just the first step. Just recently we were introduced to Gemini – according to the giant from Mountain View – their most capable AI model. What is worth noting – it comes in three sizes:

  • Nano version to be run on-device for offline usage

  • Pro version that can be tested right now – a ChatGPT 4 equivalent

  • Ultra – advertised as the most capable Google model (still in development)

The device-localized Nano version is announced for the Pixel 8 series phones, so we will focus on the Pro version, since it can already be integrated into Android apps. More detailed information about this model can be found here.

Exploring Use Cases for Mobile General-Purpose AI

Before we focus on the technicalities, let’s quickly dive into possible use cases of general-purpose AI on mobile devices. Most of them will be focused on interaction between the AI and a user. Sounds pretty straightforward? Well, it is. There’s no problem with translating texts, analyzing them, or summarizing them – this is where Gemini shines. You can use it as a search engine and ask about a historic event – its knowledge reaches April 2023. Gemini also possesses the ability to analyze image data. It can describe pictures, make suggestions, or simply extract text from a photographed document. Basically, all of the possibilities (and drawbacks) we’ve come to expect from an AI such as ChatGPT.

How easy is it to integrate Gemini into your Android app?

It turns out it’s very easy! First of all, as of the time of writing this article, the API is accessible only to US users, so if your location is different, a VPN will be required. Secondly, there are two Gemini models to choose from:

  • Pro – a text model

  • Pro-vision – a model equipped with image analysis

The second one won’t accept a request until a bitmap is added to it.

OK, so starting with the documentation here, we can see that integrating AI into an Android app looks like adding any other dependency to the project. Also, a future release of Android Studio will include a special template for creating such a project from scratch. So, after providing our dependencies in Hilt:

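A minimal sketch of such a Hilt module, assuming Google’s generativeai client SDK; the module name and the BuildConfig field holding the API key are placeholders for however your project stores them:

```kotlin
// build.gradle.kts dependency (Google's client SDK for Gemini):
// implementation("com.google.ai.client.generativeai:generativeai:0.1.1")

import com.google.ai.client.generativeai.GenerativeModel
import dagger.Module
import dagger.Provides
import dagger.hilt.InstallIn
import dagger.hilt.components.SingletonComponent
import javax.inject.Singleton

@Module
@InstallIn(SingletonComponent::class)
object GenerativeModelModule {

    // "gemini-pro-vision" accepts both text and image prompts;
    // use "gemini-pro" for text-only requests.
    @Provides
    @Singleton
    fun provideGenerativeModel(): GenerativeModel = GenerativeModel(
        modelName = "gemini-pro-vision",
        apiKey = BuildConfig.GEMINI_API_KEY // placeholder for your key storage
    )
}
```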

We can start querying our model by providing it with a Content object. This can contain a text prompt, a bitmap as an image prompt, or a BLOB (Binary Large OBject) that contains a MIME type and a byte array. Connecting it with a screen state object, this could look something like this:

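A sketch of that mapping, assuming a hypothetical ScreenState holder for the user’s prompt and picked image; the content builder with its text() and image() functions comes from the Gemini SDK:

```kotlin
import android.graphics.Bitmap
import com.google.ai.client.generativeai.type.Content
import com.google.ai.client.generativeai.type.content

// Hypothetical screen-state holder: the text typed by the user
// plus an optional image they picked.
data class ScreenState(
    val prompt: String = "",
    val image: Bitmap? = null
)

// Build a Content object from the current screen state.
fun ScreenState.toContent(): Content = content {
    image?.let { image(it) } // required by the pro-vision model
    text(prompt)
}
```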

And that content is used in the suspend function generateContent() from the GenerativeModel class. It is really that easy!
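Putting it together, a sketch of a ViewModel that calls generateContent() from a coroutine (the class and function names here are my own, not part of the SDK):

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.Content
import dagger.hilt.android.lifecycle.HiltViewModel
import kotlinx.coroutines.launch
import javax.inject.Inject

@HiltViewModel
class GeminiViewModel @Inject constructor(
    private val model: GenerativeModel
) : ViewModel() {

    // generateContent() is a suspend function, so launch it in a coroutine.
    fun ask(content: Content) {
        viewModelScope.launch {
            val response = model.generateContent(content)
            // response.text holds the generated answer (null on failure)
            println(response.text)
        }
    }
}
```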

So after creating a simple app with a text input and two buttons – one for picking an image and one for generating a response – I asked a simple question: how can I make the UI of that app more user-friendly? To do that, I entered a text prompt and picked a screenshot of that UI. Here’s what I got:

[Screenshot: Gemini’s response with suggestions for improving the app’s UI]

Well, can’t argue with that assessment, right? 

Summing up

Gemini looks like another big, revolutionary Large Language Model, with the power to analyze image data. What will differentiate it from the competition? At this time it’s too early to say. It is very easy to use, and it will probably become the default AI to integrate into your Android app. It will probably play the role of an assistant in the system or in your application. How exactly its future will look – nobody knows. Maybe we will discover its potential in the upcoming months? In the meantime: be nice to it, and don’t forget to say “please” – you know, just to be on the safe side. 😉