The 'K-Virtual Human' chosen by NVIDIA... who is she? (What does a 'more-than-human' AI singer make?)

'Studio MetaK,' a K-virtual-human startup heading for the global market. An entertainment-tech company built around entertainment content rather than technology.

HONG

2024-08-25 - 19 min read

This video explores the innovative realm of virtual humans, focusing on the creation of AI-generated artists and the intricate processes involved in producing their music videos.

Viewers will gain insight into how AI is reshaping entertainment, particularly in developing virtual idols that can perform and engage with audiences, exemplified by the creation of a virtual artist named 'Suvi.'

The main theme emphasizes the intersection of AI technology and artistic production, illustrating both the technical advancements and the potential commercial applications in the entertainment industry.

Key topics

The creation and evolution of AI-generated virtual artists demonstrate remarkable advancements in technology and production techniques.

AI-generated virtual artists are created using a combination of deepfake, Unreal Engine, and 3D modeling techniques, allowing them to appear seamlessly alongside real people.
These techniques include maintaining facial continuity and incorporating dynamic expressions, which enhances realism.
The shift from mostly static images to a majority of scenes featuring singing shows significant progress in music video production.
Virtual artists like 'Suvi' can perform without actual location visits thanks to AI-generated visuals.

Gen AI plays a crucial role in enhancing music video production, making it more dynamic and efficient.

Gen AI replaces the entire face of the shadow actor and creates backgrounds and elements, significantly reducing production time.
This approach allows for the creation of complex scenes, such as those showcasing close-ups, which are typically resource-intensive.
The technology allows for rapid adaptation to emerging trends and innovative content creation.
Face tracking errors during fast movements or when the face is covered still have to be identified and corrected by hand.

The development and management of virtual humans are being treated similarly to traditional celebrities, aiming for high commercial value.

Virtual humans undergo intensive work in aspects like hair and attire adjustments to closely resemble actual celebrities.
The company aims to integrate AI technologies into broader IP businesses, including movies, dramas, and advertisements.
Unlike traditional celebrities, virtual humans can be continuously invested in and managed without the constraints of contract limits.
The creation of virtual humans like 'Suvi' is expected to result in multiple entertainment ventures and international collaboration.

1. 🌟 AI-Generated Virtual Artists and Their Production (00:00:00)

A newly created virtual artist's music video was filmed not in a real location, but using AI-generated visuals.
The virtual artist dances and sings seamlessly alongside real people, showing no signs of artificiality.
Studio MetaK is the startup behind the creation of virtual artists like 'Suvi' and 'Iya.'
The studio was selected as a partner in NVIDIA's Inception program for producing virtual humans with AI technology.
The department in charge primarily focuses on planning and creating virtual idols.
Few companies in Korea produce virtual humans using engine-based 3D files and technology.

View Script

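(Video) So, what do you think? You're probably thinking some rookie singer went to Santorini, Greece, to shoot a music video, right?

Neither the singer nor Santorini is real. What do I mean by that?

A virtual artist created by AI made a music video against a backdrop drawn by AI. Shall we open our eyes wide and look again? She dances with real people, and it doesn't look the least bit awkward.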

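She even sings along, and the camera closes in until her face fills the screen. What is going on here? How are music videos and virtual humans like this made, and how can you make money with a virtual human? We'll find out directly at Studio MetaK, the startup that created Iya and Suvi.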

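Hello, sir. Hello. So this is where Suvi and Iya were born?

Yes. This really caught my eye. Isn't that NVIDIA, the hottest company right now?

Yes, we were recently selected as a partner in the NVIDIA Inception program. We were selected for producing virtual humans with AI technology.

This department plans virtual humans, mainly virtual idols. Few companies in Korea both create virtual humans and handle the engine-based work and the 3D files as well.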

2. 🎨 Techniques Behind Creating Virtual Humans (00:01:48)

The process incorporates both 2D and 3D elements to create virtual humans.
The team uses Unreal Engine and 3D modeling techniques simultaneously.
Specific expressions and angles that are challenging to depict with deepfake technology are achieved using 3D modeling.
A compositing stage after the deepfake pass refines the result for a more natural look; the footage is in effect a sequence of individually composited frames.
Training jumps between frames in random order (for example frames 1, 10, 9, 8) to maintain facial continuity.
A wide range of expressions and emotions is created and combined as data for the virtual human.
Each expression and motion is meticulously crafted to enhance the realism of the virtual human.
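The random frame switching mentioned in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the interview says training jumps between frames at random but gives no implementation details, so the function names `epoch_frame_order` and `batches` are hypothetical, and shuffling frame indices once per training pass is one common way to get that behavior, not necessarily the studio's method.

```python
import random


def epoch_frame_order(num_frames, seed=None):
    """Return the frame indices 0..num_frames-1 in a random order.

    Hypothetical helper: visiting frames out of sequence (e.g. 1, 10, 9, 8)
    means each training step sees a different moment of the clip.
    """
    rng = random.Random(seed)
    order = list(range(num_frames))
    rng.shuffle(order)
    return order


def batches(order, batch_size):
    """Split a frame order into consecutive batches; the last may be shorter."""
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


if __name__ == "__main__":
    # One training pass over a 10-frame clip, 4 frames per step.
    for batch in batches(epoch_frame_order(num_frames=10, seed=0), batch_size=4):
        print(batch)
```

Each pass still covers every frame exactly once; only the visiting order changes.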

View Script

Most (other companies) work on a 2D deepfake basis, whereas we are working on a 3D basis as well.

There are several ways to produce virtual humans; think of us as using deepfake, Unreal, and 3D modeling in combination. For Suvi as well, what you see right now is the model being revised.

For angles, expressions, or distinctive features that deepfake generally cannot express, we rely heavily on 3D modeling in Unreal. We then run the deepfake on top of that.

What you see now is the comp (compositing) stage, which we run after the deepfake to make the result look more natural.

This looks like video, but really it's individual photos put together, right? Yes, that's right. Each individual photo is composited and shown in sequence. We are building up the face through the deepfake process.

The reason it wobbles like this is that we need to maintain facial continuity, so training first proceeds by jumping between frames at random: frame 1, frame 10, frame 9, frame 8, and so on.

So it keeps training like this... We create a very wide range of expressions, somewhat like making a set of emotion emoticons, and once they are all combined they are used as data. So each of these expressions is adjusted one by one.

3. 🎤 The Evolution of AI-generated Artist 'Suvi' (00:03:30)

Precise Image Generation: Generative AI produces images that are converted into video, enabling 'Suvi' to perform without actual location visits.
Music Video Innovations: The earlier music video mainly used static images, while the recent one has 'Suvi' singing in about 80% of the scenes.
Technological Improvements: Stabilizing internal processes, alongside advances in the technology itself, improved 'Suvi' within a few months.
Competitive Edge: The interviewer remarks that competitors would find the results hard to believe.
NVIDIA Partnership: Collaboration with NVIDIA is underway, and NVIDIA has expressed interest in pursuing a range of joint projects.
Responsive to Technology: The team emphasizes rapidly adopting emerging technologies and integrating them with its current processes.

View Script

So once you work through the awkward parts this precisely, train on them here, and generate, Suvi sings in Santorini and Santorini appears on screen, all without going there...

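That's right. You could say this is a case where we used generative AI to create images, converted them into video, and used that as source footage. Iya was amazing to see too, but Iya didn't actually sing along much in her music video.

So I assumed this must be difficult technology, but just a few months later Suvi is singing a great deal. Well, the technology has certainly advanced a lot, but I would say we also stabilized our internal process further.

With Iya, rather than singing we used a lot of still images, with a few singing cuts in between. In this music video she sings in nearly all of it, about 80%, so I would say the result looks more natural than previous virtual humans.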

So a competitor looking at this would think, "How is this even possible?" I felt that way myself, so I imagine NVIDIA looked at it and thought, "Working with them was a good call."

That's right. We are currently working with NVIDIA through the partnership, and NVIDIA has said it would like to pursue a variety of things with us, so I have high hopes.

Internally, we respond very nimbly to new technology.

We try to apply newly released tools as quickly as possible. Developing our own technology matters, but above all we focus on how to apply what is already available to content and how to merge it with our existing process.

4. 📹 Challenges and Innovations in Virtual Human Content Creation (00:05:29)

Existing virtual human content often struggles with close-up shots because they demand a lot of resources and time.
As a result, many creators choose to produce other cuts instead of close-ups.
Studio MetaK's video, however, includes a significant number of close-ups.
The team moved from a single-program, single-process approach to distributing the work across multiple programs, which made close-ups feasible.
The close-up capability for this project was developed in-house.
Shots where a hand covers more than half of the face remain especially hard to generate accurately.
These are fixed one by one or caught more easily through the team's internal process.

View Script

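One problem that most existing virtual human content shares is close-up shots. That has certainly been the case.

They take too many resources and too much time, and the prevailing attitude has been that you might as well produce other cuts instead. Our video, though, contains quite a few close-up shots.

What used to be finished in a single program as a single process, we split across several programs and distributed the work, and that led us to judge that close-ups were quite feasible.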

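The close-ups this time were a work of art... and that was developed in-house, right? In shots like these, more than half of the face is covered.

Faces like that are very hard to generate accurately because the hand is covering them. We either fix those parts one by one or use our internal process to catch them more easily...

(People naturally wonder which parts were actually filmed and which parts were overlaid.)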

5. 🎬 Generative AI in Music Videos (00:06:38)

The primary goal was to make the music video more dynamic than other projects.
The second goal involved maximizing the use of generative AI.
For the parts featuring the virtual human, the entire face of the shadow actor was replaced.
The background and certain elements were created using generative AI, significantly reducing the production time.
Face tracking is challenging with fast movements or covered faces, leading to errors.
Errors in face tracking must be identified and corrected.
The design, hairstyling, and costumes were created to closely resemble those of actual celebrities.
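The error-finding step in the list above (tracking that breaks down on fast movement or a covered face) can be illustrated with a simple screening pass. This is a hedged sketch, not the studio's tooling: the `FaceTrack` record, the `flag_suspect_frames` function, and both thresholds are hypothetical, and it assumes a tracker that reports a face center and a confidence score per frame.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass
class FaceTrack:
    """Hypothetical per-frame tracker output."""
    frame: int
    center: Optional[Tuple[float, float]]  # None when no face was found
    confidence: float                      # 0.0 (no trust) to 1.0 (full trust)


def flag_suspect_frames(tracks: List[FaceTrack],
                        min_confidence: float = 0.6,
                        max_jump: float = 40.0) -> List[int]:
    """Return frame numbers that an artist should review by hand.

    A frame is flagged when the face is missing or low-confidence (for
    example, covered by a hand) or when the face center moves more than
    max_jump pixels from the previous good frame (fast movement).
    """
    suspects = []
    prev_center = None
    for t in tracks:
        if t.center is None or t.confidence < min_confidence:
            suspects.append(t.frame)
            prev_center = None  # do not measure a jump across a lost face
            continue
        if prev_center is not None:
            jump = hypot(t.center[0] - prev_center[0],
                         t.center[1] - prev_center[1])
            if jump > max_jump:
                suspects.append(t.frame)
        prev_center = t.center
    return suspects
```

A pass like this only narrows down where to look; per the interview, the erroneous faces themselves are still found and removed by the team.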

View Script

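Our first goal was to make it somewhat more dynamic than other music videos or what other companies have done, and our second goal was to use generative AI as much as possible. In the parts where the virtual human appears, the entire face was replaced, while the body of the actual shadow actor was kept as is.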

As for the face, it was replaced in every shot of the music video. For the generative AI inserts and the backgrounds, we did not do any separate set work but applied what generative AI produced as is, and I can say the working time dropped to roughly [unclear] of what it was.

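We have to run face tracking on all of this, like so, and when the subject moves quickly or the face is covered, the tracking often breaks down.

That's how these erroneous faces come out. Ah, so you find those and remove them. It's really detailed work. In practice, the design, hair styling, costumes, and so on are handled exactly as they would be for a real celebrity.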

6. 🌟 Insights into AI-Generated Virtual Humans and Their Business Models (00:07:46)

Hair and attire adjustments: Reworking existing hairstyles and outfits in Unreal for the virtual humans involves intensive and meticulous work.
Beyond simple AI: The company doesn't focus on AI technology alone but aims to build it into a broader IP business, such as IP-based dramas.
Global outreach: Working with outside companies that produce documentary programs, with a focus on reaching international audiences.
Innovative use of AI: Using generative AI to create large-scale visuals, such as scenes that look as if they were shot in Santorini, for commercial purposes.
Expansion potential: Exploring expansion into music video production and commercial advertising.
Celebrity-like management: Treating virtual humans like traditional celebrities, aiming for them to succeed as IPs with commercial value in movies, dramas, and advertisements.
Sustained investment: Unlike traditional celebrities, virtual humans do not have contract limits, allowing for continuous investment and management.

View Script

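So think of this as the process of changing the existing hairstyle when the hairstyles and outfits from the earlier stage are rendered in Unreal. I've heard hair is truly grueling manual work. We are grinding through it stitch by stitch. So the hard, painstaking work continues.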

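This isn't simply an AI human company, is it? Right. We make good use of existing technologies, and whether it is IP dramas or something else, we believe the right direction is ultimately the IP business; we do not focus on technology alone. We are also working with outside companies that produce documentary programs [partly unclear], and going forward we intend to focus a bit more on going global.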

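What we have found regrettable while attending and covering conventions overseas is that Korea has sparkling companies like this, Studio MetaK being a prime example, and yet we have not managed to publicize them properly. The virtual human side matters for us, but what sets this apart from earlier work is the sense of scale: it looks as if it was shot in Santorini or somewhere overseas, yet all of that was actually made with generative AI.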

So we think it may be one of the few cases where this is actually used commercially, and we see a possibility, one we had not anticipated, of expanding into music video production or advertising.

You said the goal is to build IP. Besides Suvi and Iya, two more characters are on the way; how do you plan to develop them? Honestly, I see it as exactly the same as the celebrity business.

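The technology we have built matters, but the ultimate goal is for Suvi, the IP built with that technology, to succeed like a celebrity and gain name value. If she appears in commercial films, dramas, and advertisements rather than just being posted on Instagram or YouTube, she can grow like a celebrity. And while management contracts with human talent have a fixed term, a virtual human has no contract, so she can stay with us indefinitely, which lets us keep investing.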

7. 🎬 Future Prospects of Virtual Humans in Media and Education (00:10:02)

Suvi is expected to show visible results from next year through various projects including movies and dramas.
The company aims to build on its track record of a virtual human that appears in films and wins awards.
They are inspired by the way the technology behind James Cameron's works, like Avatar and Terminator, influenced the wider film and CG markets.
Continuous tech development may lead to unexpected business ventures such as generative AI video creation, extending beyond the initial goal of creating virtual humans.
There is a significant demand for entertainment-based education businesses, especially in international markets.
Suvi is expected to take on a curator-style role in an education museum project with EBS and the Korea Water Resources Corporation.
The company's global outreach and collaboration efforts are crucial for growth in the virtual human and AI markets.

View Script

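She has appeared as an assistant MC on a variety show, and films and dramas are being shot now, so I expect visible results from around the first half or the mid-to-late part of next year. As far as we know, ours is the first case of a virtual human appearing in a film and winning multiple awards, so we intend to keep building up that track record.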

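We want to be a tech company that can actually deliver results. Look at director James Cameron: he made Avatar, Terminator, and others, and he knows the technology better than the engineers do. He holds the technology himself. I think that is what we are aiming for.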

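Just as the technologies they built ended up influencing the global film market and the CG market, if we keep developing and applying technology, we may end up somewhere we never expected. As I said, we set out to make virtual humans, and at some point we found ourselves making music videos with generative AI, so we realized we could do business in that market too.

Yes, and education is a very big business too, Korean education in particular. When we work overseas, there is a lot of demand for entertainment-based education businesses. In fact, with EBS and the Korea Water Resources Corporation, Suvi will soon be unveiled as a kind of curator in something like an education museum.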

Yes, we'll work on it together so that MetaK keeps building up, and we look forward to your support on the global stage as well.