Remember Sakana AI? Almost a year ago, the Tokyo-based startup made a striking appearance on the AI scene with its high-profile founders from Google and a novel automated merging-based approach to developing high-performing models. Today, the company announced two new image-generation models: Evo-Ukiyoe and Evo-Nishikie.
Available on Hugging Face, the models have been designed to generate images from text and image prompts. However, there’s an interesting and unique catch: instead of handling regular image generation in different styles, these models are laser-focused on Japan’s popular historic art form ukiyo-e. It flourished between the 17th and 19th centuries, and Sakana hopes to bring it back to modern content consumers using the power of AI.
The move comes as the latest localization effort in the AI space — something that has grown over the past year, with companies in countries like South Korea, India and China building models tailored to their respective cultures and dialects.
What to expect from the new Sakana AI models?
Dating back to the early 1600s, ukiyo-e, or “pictures of the floating world,” evolved as a popular art form in Japan, focusing on subjects like historical scenes, landscapes and sumo wrestlers. The genre initially revolved around monochrome woodblock prints but eventually graduated to full-color prints, or “nishiki-e,” made with multiple woodblocks. Its popularity declined in the 19th century due to multiple factors, including the rise of photography.
Now, with the release of the two image-generation models, Sakana wants to bring the historic artwork back into popular culture. The first one – Evo-Ukiyoe – is a text-to-image offering that generates images closely resembling ukiyo-e, especially when prompted with text inputs describing elements commonly found in ukiyo-e art such as cherry blossoms, kimono or birds. It can even generate ukiyo-e-style art with things that did not exist back then, like a hamburger or laptop, but the company points out that sometimes the results may veer off track — not resembling ukiyo-e at all.
The model is based on Evo-SDXL-JP, which Sakana developed using its novel evolutionary model merging technique on top of Stability AI’s SDXL and other open diffusion models. The company said it used LoRA (Low-Rank Adaptation) to fine-tune Evo-SDXL-JP on a dataset of over 24,000 carefully-captioned ukiyo-e artworks acquired through a partnership with the Art Research Center (ARC) of Ritsumeikan University in Kyoto.
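For readers unfamiliar with LoRA, the idea can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not Sakana's training code: the matrix sizes, the `alpha` value and the helper names `matmul` and `lora_merge` are all assumptions made for the example. LoRA keeps the pretrained weight matrix frozen and trains two much smaller matrices whose product is added to it.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_merge(w, a, b, alpha):
    """Return w + (alpha / r) * (b @ a), where r is the rank of the update."""
    r = len(a)  # A has shape (r x d_in), so its row count is the rank
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with the same shape as w
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

random.seed(0)
d_out, d_in, rank = 6, 8, 2  # illustrative sizes, not taken from the article

# Frozen pretrained weight (d_out x d_in); it is never updated during fine-tuning.
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

# Trainable low-rank factors: A is (rank x d_in), B is (d_out x rank).
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # B starts at zero, so the update starts at zero

merged = lora_merge(w, a, b, alpha=4.0)

# Parameter counts: full fine-tuning versus the low-rank factors only.
full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(full_params, lora_params)
```

In this toy case the low-rank factors hold 28 trainable values against 48 in the full matrix. The savings grow quickly with layer size, which is why the approach is attractive for adapting large diffusion models on a comparatively small dataset.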
“We curated this data with a wide range of subjects, covering including whole art and face-centered ones, from the digital images of ukiyo-e in the ARC collection. We also focused on multi-colored nishiki-e with beautiful colors while considering diversity,” the company wrote in a blog post.
The second model, Evo-Nishikie, is an image-to-image offering that colorizes monochrome ukiyo-e prints. Sakana says it can add color to historical book illustrations that were printed in a single ink color, or give entirely new looks to existing multi-colored nishiki-e prints. All the user has to do is provide the source image and, optionally, pair it with instructions describing the elements to be colored.
Sakana said it brought this model to life by performing ControlNet training on Evo-Ukiyoe, using fixed prompts and condition images.
Goal for further research and development
While the models only support prompting in Japanese and are in the very early stages, Sakana hopes the work to teach AI traditional “Japanese beauty” will spread the appeal of the country’s culture worldwide and find applications in education and new ways of enjoying classical literature.
Currently, the company is providing both models and the associated code to get started on Hugging Face. The Python script included in the repository and LoRA weights are available under the Apache 2.0 license.
“This model is provided for research and development purposes only and should be considered as an experimental prototype. It is not intended for commercial use or deployment in mission-critical environments. Use of this model is at the user’s own risk, and its performance and outcomes are not guaranteed,” the company notes on Hugging Face.
So far, Sakana AI has raised $30 million in funding from multiple investors, including Lux Capital, which has invested in pioneering AI companies like Hugging Face, and Khosla Ventures, known for investing in OpenAI back in 2019.