Tech | Published: 7 April 2024 | Updated: 7 April 2024

OpenAI transcribed over a million hours of YouTube videos to train GPT-4

Photo illustration of the shape of a brain on a circuitboard.
Cath Virginia / The Verge | Photos from Getty Images

Earlier this week, The Wall Street Journal reported that AI companies were running into a wall when it comes to gathering high-quality training data. Today, The New York Times detailed some of the ways companies have dealt with this. Unsurprisingly, those methods fall into the hazy gray area of AI copyright law.

The story opens on OpenAI, which, desperate for training data, reportedly developed its Whisper audio transcription model to get over the hump and used it to transcribe over a million hours of YouTube videos to train GPT-4, its most advanced large language model. That’s according to The New York Times, which reports that the company knew this was legally questionable but believed it to be fair use. OpenAI president Greg…


Author:
Flavius Trica, Creatif Agency