AI OMG: Strengths and weaknesses of OpenAI’s Sora text-to-video for the public sector

Every week I’m reading, listening and updating my knowledge on AI tools that public sector comms people can use.

Up till now I’ve not been that impressed by the video production tools I’ve come across.

They can be clunky and tend to miss the point.

However, OpenAI’s new tool Sora looks truly astonishing.

It takes text prompts and turns them into video.

First, I’d like to show you some examples and then I’d like to weigh up the pros and cons.

Example 1: a Tokyo street

In this clip, the prompt is quite detailed.

Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

It’s amazing, isn’t it?

Example 2: A spaceman in a knitted motorbike helmet

While the first example hung back from the subject, the second goes in close.

Prompt: A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.

Again, astounding.

Example 3: Reflections in a train window

I’m sure that some things are easier to produce than others. Replicating reflections in a window is, I’d imagine, towards the top of the list of the hardest.

Prompt: Reflections in the window of a train traveling through the Tokyo suburbs.

And it achieves the look beautifully.

Example 4: Grandma’s birthday

While the other examples have dealt with people in different ways this looks at a group.

Prompt: A grandmother with neatly combed grey hair stands behind a colorful birthday cake with numerous candles at a wood dining room table, expression is one of pure joy and happiness, with a happy glow in her eye. She leans forward and blows out the candles with a gentle puff, the cake has pink frosting and sprinkles and the candles cease to flicker, the grandmother wears a light blue blouse adorned with floral patterns, several happy friends and family sitting at the table can be seen celebrating, out of focus. The scene is beautifully captured, cinematic, showing a 3/4 view of the grandmother and the dining room. Warm color tones and soft lighting enhance the mood.

Interestingly, OpenAI have also set out the weaknesses of such an approach.

Weakness: Simulating complex interactions between objects and multiple characters is often challenging for the model, sometimes resulting in humorous generations.

Conclusion

The quality of the images is astounding. They look like video, whereas previous tools didn’t quite ring true.

The visual clues you may look for, like reflections on windows, easily confound the brain.

That’s real, isn’t it?

Only, it isn’t.

Right now OpenAI are pulling a blinder by teasing amazing content while restricting use of the product. People are talking about it but can’t use it yet, although this will change.

As we can’t use it, we can’t see how hard it is to experiment and produce good content.

The pitfalls of Sora AI video

For the public sector, the flaw isn’t yet cost or even the lack of a pathway to start using it. The UK Government has released some guidelines to encourage the use of AI.

I feel like I’m looking a gift horse in the mouth when I say this, but the issue for the public sector may be that right now the content is too generic.

A commercial campaign could do something enlightening with it. Filmmakers, I suspect, will make something useful with this.

I’ve seen an AI how-to video made by a council near me with a generic English accent and I hated it for its insincerity. I was left feeling played.

One issue with web content for a council, NHS Trust, police force or fire and rescue service is that generic content doesn’t do so well. As I’ve blogged this week, pictures of people work really well. They are both real and of people. They also capture the area. So, generic shots of Tokyo, yes. Shots of Dudley in the West Midlands, probably not.

Right now, shooting your own content of people and landmarks tops it. But can AI-made content be used to supplement it? We already see futuristic artist’s impressions of new developments. Will we go for AI-made content of a new town centre development? Or what a new hospital ward would look like? I’m guessing yes.

Of course, such is the onward pace of AI that this hurdle may well become surmountable. What I’ve just written may seem laughable quite quickly, I accept that. An interface with Google Street View and Google Photos could be one way to do that, I’m speculating. But wouldn’t Google be building their own equivalent?

Oh, heck, my head hurts.
