Google Releases Nano Banana 2 Lite and Gemini Omni Flash to Enhance Developer Workflows through High Speed Image Generation and Conversational Video Editing
Google has released two media generation models aimed at speeding up developer workflows and cutting costs: Nano Banana 2 Lite, a fast image generator, and Gemini Omni Flash, a conversational video generation and editing system now launching for developers. According to Google, developers can combine the two to build complete pipelines that run from rapid image drafts to animated video sequences.
For comprehensive benchmarking information, see Google DeepMind's Gemini Omni webpage.
Nano Banana 2 Lite is the newest member of Google's image generation family and uses the model name gemini 3.1 flash lite image. It is designed for high-throughput pipelines where speed and cost matter most. Google advises developers still using the legacy gemini 2.5 flash image model to upgrade to gemini 3.1 flash lite image, which it says is both faster and higher in quality.
According to Google's benchmark data, Nano Banana 2 Lite produces a text-to-image output in four seconds and costs $0.034 per 1K-resolution image. That low latency makes it well suited to volume testing and interactive prototyping. The model also offers consistent prompt adherence and character consistency, and it renders legible text. It joins the base Nano Banana 2, positioned for mid-range performance, and Nano Banana Pro, positioned for complex reasoning tasks. Beyond the API, Google is also deploying the lightweight model in user-facing products such as Search AI Mode, NotebookLM, Google Photos, and Google Ads.
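To make the quoted figures concrete, the sketch below estimates the cost and rough wall-clock time of a batch of drafts. It uses only the two numbers given above ($0.034 per image and about four seconds per image); the function name and the `concurrency` parameter are hypothetical, and the timing assumes requests run in equal-sized waves with no rate limiting or queueing, which a real deployment may not match.

```python
from decimal import Decimal

# Figures quoted in the article; everything else here is illustrative.
PRICE_PER_IMAGE_USD = Decimal("0.034")  # per 1K-resolution image
SECONDS_PER_IMAGE = 4                   # approximate text-to-image latency


def estimate_image_batch(num_images: int, concurrency: int = 1) -> tuple:
    """Return (total cost in USD, rough wall-clock seconds) for a batch.

    Assumes `concurrency` requests run at once and every request takes
    SECONDS_PER_IMAGE, so the batch completes in ceil(n / concurrency) waves.
    """
    if num_images < 0 or concurrency < 1:
        raise ValueError("num_images must be >= 0 and concurrency >= 1")
    cost = PRICE_PER_IMAGE_USD * num_images
    waves = -(-num_images // concurrency)  # ceiling division
    return cost, waves * SECONDS_PER_IMAGE


cost, seconds = estimate_image_batch(1000, concurrency=10)
print(f"1000 drafts: ${cost:.2f}, roughly {seconds} s at 10 concurrent requests")
```

Using `Decimal` rather than floats keeps the per-image price exact when it is multiplied across large batches.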
First introduced at Google I/O, Gemini Omni Flash is now available to developers through the Gemini API and Google AI Studio as gemini omni flash preview. In addition to video synthesis, the model supports conversational editing from a mix of text, image, and video inputs. Google has priced it at $0.10 per second of generated video, the same as the Veo 3.1 Fast model.
The system lets users edit video conversationally, requesting changes to a scene with natural language prompts. Multimodal referencing through image and text prompts helps the scene stay visually consistent, and Google DeepMind's real-world knowledge base helps the model construct scenes in a logical manner. Developers can also tie text and graphical elements to a particular physical feature within the video through basic prompting. Currently, the model supports only 10-second video generations.
Video inputs of 3 seconds or less are not yet supported by the current API schema.
Audio references are also not yet supported.
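The three limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Google SDK: the `VideoRequest` class, its field names, and `validate` are all hypothetical, and it reads "only 10 second video generations" literally as a fixed clip length, which is an assumption about how the API behaves.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits as stated in the article.
OUTPUT_SECONDS = 10                      # only 10-second generations
MIN_INPUT_VIDEO_SECONDS = 3              # inputs of 3 s or less are rejected
UNSUPPORTED_REFERENCE_TYPES = {"audio"}  # audio references not yet supported


@dataclass
class VideoRequest:
    """Hypothetical request description; not a real SDK type."""
    prompt: str
    output_seconds: float = OUTPUT_SECONDS
    input_video_seconds: Optional[float] = None  # None means no video input
    reference_types: List[str] = field(default_factory=list)


def validate(request: VideoRequest) -> List[str]:
    """Return a list of human-readable problems; empty means the request passes."""
    problems = []
    if request.output_seconds != OUTPUT_SECONDS:
        problems.append(f"output must be {OUTPUT_SECONDS} s, got {request.output_seconds} s")
    if (request.input_video_seconds is not None
            and request.input_video_seconds <= MIN_INPUT_VIDEO_SECONDS):
        problems.append(f"input video must be longer than {MIN_INPUT_VIDEO_SECONDS} s")
    for ref in request.reference_types:
        if ref in UNSUPPORTED_REFERENCE_TYPES:
            problems.append(f"reference type '{ref}' is not supported")
    return problems


request = VideoRequest(prompt="pan across the room",
                       input_video_seconds=2.5,
                       reference_types=["image", "audio"])
for problem in validate(request):
    print(problem)
```

Returning every problem at once, rather than raising on the first, lets an application show the user all the fixes needed in a single pass.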
The main benefit for developers is the ability to combine both models in one application. A user can generate an image asset with Nano Banana 2 Lite and then immediately pass that image to Gemini Omni Flash to animate it. Using the Interactions API, developers can preserve session history so that users can stack up to three sequential video edits in one session. To help prevent misuse of synthetic media, both models have SynthID watermarking built in, so users can check content and verify its authenticity in the Gemini app.
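A rough way to budget such a combined workflow is to track what a session has produced and price it with the figures quoted earlier ($0.034 per image, $0.10 per second of video, 10-second clips, at most three edits). The sketch below is local bookkeeping only and makes no API calls; the `PipelineSession` class is hypothetical, and it assumes each edit re-renders a full 10-second clip billed at the generation rate, which the article does not confirm.

```python
from decimal import Decimal

# Figures quoted in the article.
IMAGE_PRICE_USD = Decimal("0.034")
VIDEO_PRICE_PER_SECOND_USD = Decimal("0.10")
CLIP_SECONDS = 10
MAX_EDITS_PER_SESSION = 3


class PipelineSession:
    """Hypothetical cost tracker for one image-to-video session."""

    def __init__(self) -> None:
        self.images = 0
        self.video_seconds = 0
        self.edits = 0

    def add_image(self) -> None:
        self.images += 1

    def add_clip(self) -> None:
        self.video_seconds += CLIP_SECONDS

    def add_edit(self) -> None:
        if self.edits >= MAX_EDITS_PER_SESSION:
            raise RuntimeError("edit limit reached; start a new session")
        self.edits += 1
        # Assumption: an edit is billed like a fresh 10-second generation.
        self.video_seconds += CLIP_SECONDS

    def cost(self) -> Decimal:
        return (IMAGE_PRICE_USD * self.images
                + VIDEO_PRICE_PER_SECOND_USD * self.video_seconds)


session = PipelineSession()
session.add_image()   # draft the asset
session.add_clip()    # animate it
session.add_edit()    # one conversational edit
print(f"Estimated session cost: ${session.cost()}")
```

Under these assumptions the video seconds dominate the bill, so iterating on the still image before animating it is the cheaper place to experiment.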
Google has also released three demo applications to illustrate these integrated workflows:
- The Anywhere application: takes a photo, places the subject in a new location with Nano Banana 2 Lite, and then animates the background with Omni Flash.
- The Space Lift application: produces interior designs from photographs of rooms and animates each design as a cinematic walkthrough.
- The Omni Product Studio application: converts 2D photographs into high-quality ecommerce video advertisements using fast image-to-video rendering.


