Ep. 515 w/ Alan Bovik, Professor, University of Texas at Austin

His work broadly focuses on creating new theories and algorithms that allow for the perceptually optimized streaming and sharing of visual media.

Intro / Outro: Welcome to Building the Future, hosted by Kevin Horek. With millions of listeners a month, Building the Future has quickly become one of the fastest-rising programs, with a focus on interviewing startups, entrepreneurs, investors, CEOs, and more. The radio and TV show airs in 15 markets across the globe, including Silicon Valley. For full show times, past episodes, or to sponsor the show, please visit buildingthefutureshow.com.

Kevin Horek: Welcome back to the show today. We have Alan Bovik. He's a professor. Welcome to the show.

Alan Bovik: Thank you. Nice to be here.

Kevin Horek: I'm really excited to have you on the show. I think what you're doing is really innovative and cool, and selfishly, I really want to learn more about it. But maybe before we get into all that, let's get to know you better and start off with where you grew up.

Alan Bovik: Well, I grew up in the Midwest. Most of my childhood was north of Chicago, on the north side, in Evanston. I went to the state school, the University of Illinois at Urbana-Champaign, and got all my degrees there. I knew I wanted to work with students and be a professor and do research, and I took a job in Austin, where it was a lot warmer.

Kevin Horek: Okay. What made you know that at such an early age?

Alan Bovik: What I wanted to do? Golly, to be honest, I didn't. When I was in high school, I was one of those kids doing well at math. I really wasn't a great student for the first couple of years, but then I buckled down and kind of sped through. My dad said, you should be an engineer. To be honest, in high school, it's not like today, where students know everything; high school students now are probably already coding in Python. I'm like, okay, what does an engineer even do? I didn't really know. Back then it was the space age and the nuclear age, so I said, okay, I'll be an engineer, and I signed up for nuclear engineering. But I got bored really quickly. It was too much chemistry for me. I'm sure chemistry is great.

Alan Bovik: It's just not great for Al Bovik, as far as being fascinating. But then I started seeing some computer science stuff, so I transferred into computer engineering and that sort of thing. I liked the program, it was fun, but I wasn't really that motivated, though I got great grades toward the end of undergraduate. I said, well, this is great. I liked the math, I liked theory, I liked control system theory, which was kind of where I was headed. I went and worked for a fellow in grad school on stepper motors, and I'm like, well, I like this, but it's really boring. At that time I took a course from the great Thomas Huang, who was the greatest image processor of his generation, for sure. And I just loved that class. He came in wearing a sweater, carrying a mug, and he showed all these amazing image processing algorithm results.

Alan Bovik: I mean, this was back in the 1980s; nobody was doing digital image processing in the world yet. But there was such potential and promise. And I realized that I'm a very visual guy. I like artwork, I love the movies; if I don't go see a movie every week, I go into withdrawal. I just fell in love with it immediately, and I've been doing it ever since, till today.

Kevin Horek: Fascinating. Okay. Very cool. Walk us through getting a job at the University of Texas at Austin and what you're doing there.

Alan Bovik: Oh, sure. Well, I graduated with my PhD in 1984, and it was the best job market you can imagine. It was the beginning of the high-tech revolution. That was right when Microsoft was starting to get hot, Apple was hot, there was just so much going on. I literally had 18 job interviews. It's like the AI guys today, right? I interviewed at a lot of universities and a lot of companies all over the country, but I didn't really feel like I wanted to go to the coasts, because of the population and all that. I wanted to go someplace warm after slushing around in Chicago my whole life, with that weather. And I heard from Tom Huang, my advisor, that Austin, both the university and the town, had an incredible future.

Alan Bovik: And as in everything he was right. I focused on that interview and it went really well. Well, I ended up there 38 years ago, and I've been at UT Austin since then. What am I doing at the highest level? Well, I'm teaching classes and what I love image processing digital video. I get big classes these days, because instead of being a top, nobody using it now everybody's using video and digital images, right. Your phone, your iPad television, you go into a store and there's walls with digital image. It's just everywhere. Everything is streaming. Now television is all digital and streaming and social media. It's just such a great hot space that, w there was never a boring moment.

Kevin Horek: Very cool. Walk us through kind of what that means and what you teach, and then let's get into the technology that you're developing, because that to me is really fascinating.

Alan Bovik: Yeah, sure. I think today people are pretty sophisticated about what digital images are. They know, for example, the resolution of the display on their television, oh, it's 4K, and they may even know what those numbers mean, or on their phone, if it's an iPhone, that it's a Retina display and that sort of thing. So, broadly speaking, what I teach is algorithms for processing images to do all kinds of things: to make them look better; to make them easier to transmit by compressing them into a tiny space compared to their original size, so you can send them to a friend; and for finding things in images, like faces. Finding faces is really important because we're primates and we want to know where faces are, and that's why face recognizers are so important. Or maybe you're recognizing somebody's face, or recognizing them by the iris of their eye. I mean, images... let me put it this way.

Alan Bovik: Vision is utilizes half of our brain, half the neurons in our brain, that was a hundred billion neurons in some way or other okay. Meaning we're extremely visual creatures, all of us, some of us more than others, not everybody likes to go see movies every weekend, but, we're very visual creatures. It's our main mode of communication between each other, especially today. It's really important in a lot of ways. When my other classes video, of course, this is when you introduce the time dimension. A video is basically a, a motion picture, just like, movies. And now they're digital on your television. They're digital at theater now. There's a lot of other interesting questions that arise to make those videos, smooth and continuous. Again, they take up enormously more space and bandwidth than just single pictures do they are, in fact, if videos today occupy about 80% of all internet traffic.

Alan Bovik: Okay, 80%. Actually, I think it may be up to 82% recently.

Kevin Horek: I guess that makes sense, with YouTube and TikTok being kind of the top apps, right?

Alan Bovik: Yeah. YouTube, TikTok, Netflix, Amazon Prime Video, all of that. You can make a long list, and Facebook is in there too. So yeah, that's where the bandwidth is. It's super important to the people who are viewing all of these. If you're watching Netflix, you're paying, I don't know what it's up to, $15 a month or something, and you want that experience to be as realistic and high quality as possible, and you don't want delays or that little spinning circle in the middle, or any of that kind of stuff. And if you're Netflix or Amazon, you want to be able to send those videos at a reduced cost. You don't want to pay for a lot of cloud space, or spend too much time processing the video, all that kind of stuff. It's important to them because they spend a lot of money streaming those videos to your homes.

Alan Bovik: Of course, it's a competitive landscape, because you already mentioned several of these and they're increasingly in competition with each other, and there's Disney, there's Apple, and so on. All of them are pushing those video bits to hundreds of millions of people every day.

Kevin Horek: Okay. Talk about the technology that you've developed, because you're solving this problem for these big companies, or helping solve it, by making everything smaller and therefore faster. Walk us through that.

Alan Bovik: Absolutely. So you've got the basic idea of what had been developed for many years before I became a serious researcher in this area. They knew that you want to send pictures faster and in a smaller form, because if we didn't have something called video compression, we could never send all these videos around the world. We can compress them today maybe a hundred to one, which means you only send 1% as many bits as are in the original video. You send that 1%, and at the other end they get 1% of the video, but they're able to decompress it, and, oh, there it is, it's the same video, at least to your eyes. That's the background. Video compression has been around for a long time; I certainly didn't invent it, although my advisor, Tom Huang, was one of the inventors of video compression.
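As a rough illustration of the hundred-to-one figure Alan mentions, here is the back-of-the-envelope arithmetic. All numbers are illustrative assumptions (1080p, 30 frames per second, 8-bit RGB, no chroma subsampling), not anything stated in the interview:

```python
# Back-of-the-envelope: what 100:1 video compression means for bandwidth.
# Illustrative assumptions: 1080p, 30 frames/s, 8-bit RGB, raw pixels.
width, height, fps, bytes_per_pixel = 1920, 1080, 30, 3

raw_bps = width * height * bytes_per_pixel * 8 * fps   # raw bits per second
compressed_bps = raw_bps / 100                         # ~100:1 compression

print(f"raw:        {raw_bps / 1e6:.0f} Mb/s")         # ~1493 Mb/s
print(f"compressed: {compressed_bps / 1e6:.1f} Mb/s")  # ~14.9 Mb/s
```

Roughly 1.5 gigabits per second uncompressed against about 15 megabits per second compressed, which is why streaming is impossible without compression.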

Alan Bovik: I learned a lot about it from him. What we have done differently in my lab, this is myself and many brilliant graduate students. I've been fortunate enough to have over the, especially over the last 20 years, what we've done differently. Well, when I came to university, I realized not only am I interested in digital pictures and videos, I began to meet and encounter visual neuroscientist and visual psychologist. I became very interested in that and started doing research into how we see how we process visual information in the brain. Why we look where we look, we move our eyes around all the time. Why don't we look over there and that thing. I learned, I really became a self-trained visual neuroscientist and my graduate students when they joined my lab, I also began to push them towards, becoming also visual neuroscientists at least, in terms of having the coursework and that thing.

Alan Bovik: What we've done differently, which really nobody in the image or video community was doing much at all, was to begin to really take how we see the visual brain into account. In other words, if we understand how we see at some level, and we do understand it at low levels, I mean, we don't understand completely why you recognize, an old friend's face from 30 years ago or something, but we do know how the visual brain takes what it senses at the retina and begins to process it, using some algorithms that crunch it into a very efficient form and we can model those algorithms, and that's a good word algorithms very well, very accurately. In other words, we can create mathematical models of what's happening in the retina and in the primary visual cortex, which is in the back of your head, in the back of your brain.

Alan Bovik: We can model what's going on there very accurately and also some other brain centers as well, which I won't go into the point is we can create math models of how we see at least at a low level that are very accurate, that fit the scientific data. If you make electrophysiological measurements, obviously not on humans, but they've made them on other animals with very similar vision systems like cats and monkeys. What we do is we try to take these mathematical models and we bring them into image and video processing algorithms to improve things that are related to the perception of those. One of the big things that we've done is to create algorithms that can accurately predict what a human will say is the quality of a picture or a video when they observe it. Now at first they say, oh, well, that's easy. You don't, you just look at the number of pixels and the answer is emphatically.

Alan Bovik: No, you can have, a picture with, 50 megapixels. That's a very poor quality, okay. Because the sensor was poor or your hand was moving when you took the picture or was low light. And so the picture was noisy. There was literally an infinity of different kinds of distortions that can affect that picture, making it, less desirable to view. Okay.

Kevin Horek: Sorry, just to cut you off. I don't know if this is true, and I'm curious, and I think this is a good time to ask: is it true that the human eye can only see up to a certain standard, like 4K or 8K or whatever the number is, and we can't see beyond it? Is that correct? Does that play into what you're talking about, or not really at all?

Alan Bovik: Oh, it does, yes. What you're really bringing up is the concept of viewing distance. Your brain, your visual brain, your eye, the whole apparatus has a finite bandwidth. A good example: if you're reading a book from two feet away, no problem. You walk 20 feet away, well, it's all blurry now, you can't read it anymore. That's because of the finite bandwidth of your visual system.
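The viewing-distance effect can be made concrete with a pixels-per-degree calculation. The screen size and distances below are made-up examples, and the commonly cited limit of foveal acuity, around 60 pixels per degree, is a general figure, not one from the interview:

```python
import math

# How many display pixels land in one degree of visual angle.
# Example numbers: a 65-inch 4K TV (about 1.43 m wide) at two distances.
def pixels_per_degree(h_pixels, screen_width_m, distance_m):
    # visual angle subtended by the full screen width
    angle_deg = 2 * math.degrees(math.atan(screen_width_m / (2 * distance_m)))
    return h_pixels / angle_deg

width_m = 1.43
near = pixels_per_degree(3840, width_m, 2.0)
far = pixels_per_degree(3840, width_m, 6.0)
print(f"{near:.0f} px/deg at 2 m, {far:.0f} px/deg at 6 m")
```

Once the display packs more pixels into a degree of visual angle than the eye can resolve, extra resolution is invisible, and moving farther away only pushes the display further past that limit.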

Kevin Horek: That factors in then too, like how good my eyes aren't into what you're talking about as well.

Alan Bovik: Yeah. Take the old HD televisions of, say, the 2010 time period, what we call 1080p. If you get up close and you watch a basketball game, you're like, man, this is full of artifacts. Every player running around has all these little compression artifacts; it looks like little mosquitoes running around them, little blocking artifacts. It's terrible. But if you're lying in bed or on your couch 20 feet away, you don't really see those; the distance effect removes that. Today we have very high resolution televisions, and of course the content from a provider like Netflix or Hulu is captured with the finest cameras, by the finest cinematographers, very high quality content. Before it is streamed, it looks great; it really doesn't have distortions. But when you compress it, then it starts to have distortions.

Alan Bovik: In fact, the way that a typical streaming provider operates is that they'll we'll have for each video, they will have maybe 15 to 20 compressed versions of it, ready to go in the cloud for when you order that video. Okay. Each one, how is it decided which one is downloaded to your television at any given time? Well, it's because they can actually measure the bandwidth conditions because, especially if you're on a mobile, imagine if you're on a mobile device, your phone, your iPad and you're watching your favorite cooking show. All right. The provider can measure, if you're in an urban conditions where it's high traffic, meaning high internet traffic, then the band was conditions are difficult. They will send a version of the video that has fewer bits now, because it has fewer bits it's of lower quality because it's compressed more, the more you compress.

Alan Bovik: Well, what does compression do? As I already said, you're throwing away information, never to be seen again. Right. Okay. Although you can throw away a lot and still reconstruct it and it still looks good. This is one place where our algorithms come in. We have developed a couple of algorithms in our laboratory, which are used by pretty much everybody to control. For example, the definition of those say 15 versions of each video. Okay. So I've simplified it a little bit. They actually usually do this on a per shot basis or a per scene, imagine, different scenes of a movie say, okay, so they will compress usually on a whole scene basis. Basically it's a file transfer when they send it to you, but they'll create like 15 versions each one at a different quality as measured by one of our quality prediction algorithms, our perceptual quality prediction algorithms, which will say, well, a human being would say that this scene the way you've compressed, it would get a score of an average human with very high reliability of say 0.95, which would be very good.

Alan Bovik: Okay. Another version of the same scene might have, might have a reading of only 0.9. Another one might be 0.8, five. These are just examples I'm giving. The ones that have lower scores will typically also be much smaller files so that they can send it to you in those difficult traffic conditions. Okay. Obviously if they're sending fewer bits, it makes it through better and there's less likelihood that there'll be one of those spinning circles, which everybody, doesn't like,

Kevin Horek: Right. You're giving, but then, so that's basically what happens when you start streaming something, the quality might be like an eight, like you mentioned. After a few minutes it kicks into that 9.5, it just sends you a different file. Or how does that work?

Alan Bovik: Well, the client, but what I mean by the client is your actual device, your television, or your iPad or iPhone, whatever you're watching on will know that it has a selection of files to pick from. Okay. Okay. Based on those bandwidth conditions and also based on, well, how much bandwidth will this file take to transfer and what is the perceptual quality? It will take those into account and then ask for the optimal file. Okay. The optimal file amongst those. Your television asked for the best one and typically it will say, well, I want something that has a quality at these at least this level, if possible, as measured by one of our algorithms. It'll find the one that has amongst those maybe that has, a low bandwidth. It was a smaller file to transfer.

Kevin Horek: Got it. Interesting. That then part of that loading period, it's deciding which one to send me or what's happening there or it's just connecting or,

Alan Bovik: Well, I think that, there is a loading because, you have to transfer the files and that takes some time. What most people don't know is that, while you're watching your television, there are files loading and there's something in your television and in any device you're watching on called a buffer. Okay. That buffer is basically, holds a few seconds of the video in it. All right. That way, I mean, for what Netflix, the bandwidth gets so bad that for a split second, Netflix is transmitting nothing to you. Okay. It's stream case, but it does happen, in downtown Manhattan or something for maybe for a couple of seconds. Well, the TV will still keep playing. Cause it's got a, it's already buffered, meaning stored two seconds of the video. We'll, if the bandwidth conditions improve, then what fill up the buffer again, ? And, and it's just, you never know that the available track, the bandwidth, what to zero, when that happens.

Alan Bovik: However, sometimes, it cuts off for too long. The bandwidth gets too low, or it goes to zero for too long. If the buffer empties out, that's called, a rebuffering event and you get a little spinny in the middle of your screen, that's when you get the Spinney and everybody goes, and it goes back to sleep if they're watching at night, so that's the compression scenario of what we do. That's what some of our algorithms do. Those algorithms, as I mentioned, are based on models of, how we see, but more specifically, well, let me just back up people have known about distortions for a long time. You know, they've known about blur. They've known about, you know, noise. They've learned about, jitter or shake. If your hand is unsteady, all that kind of stuff. Every type of distortion really has an infinite number of variations. You can blur a picture infinite number of ways.

Alan Bovik: It can be, there's different lenses that have different blur functions. If you're out of focus in different ways, depending on how you're moving the camera accidentally, cause you have shaky hands that blurs in different ways. If the light is low, then the sensors then give a blurry. I mean, it just goes on and on. There's just really an infinity. People haven't been able to adequately modeled distortions well enough. Okay. That's what, but nevertheless, that was 30 years of research trying to model all the distortions and then predict them, but it never worked. What we did that was different is we said, we're not going to try to model the distortions instead, we're going to go inside the brain. Okay. We're going to try to statistically model the responses of neurons to pictures that are distorted instead of, pictures that are not distorted. We found that if you do that, there are very predictable differences between the statistics of distorted pictures, no matter why and how they're distorted and pictures that aren't distorted.

Alan Bovik: Okay. It turns out to be very accurate predictors. It's all, internal to the brain kind of prediction. And that's why our algorithms worked. They were far better than anything that came before. It, became noticed by television people and the, around the 2006, 2007 time period, which gave me an opportunity to be a little opportunistic and reach back. Pretty soon it was being used throughout the television space to control the quality of the, the content that people are receiving. And now it's pretty much everywhere.

Kevin Horek: Very cool. No, that's awesome. Okay. How does, what you and the team have developed play into some of the newer technologies that are probably going to make their way to the web more and more coming kind of from a VR AR metaverse type technology?

Alan Bovik: Absolutely. Let me just back up because there's other providers that don't send high quality content. There's other people than the Netflix is and all that. And that means social media. If you go on YouTube, you play a video half the time. It's, well, I don't know what percent of the time, but you see videos that are terrible, right. It's really important to YouTube and to Facebook and to Tik TOK and the companies like that to understand the quality of those videos. Those videos are afflicted by a much wider range of distortion. Netflix is just compression pretty much in some scaling. Sometimes they make the video smaller than expanded, but YouTube is just huge number of possible distortions because people have, so many different kinds of cameras, so many different kinds of lighting conditions, so many different skillsets. Sometimes the videos are old, you'll get, 1940s, videos, television from the fifties or six days, just everything in the indie studios, which, videos that have been processed like a hundred times over the years, so crazy.

Alan Bovik: There's just no way to model that. But again, these brain models, they work. Okay. Companies use these to assess, they go on by content. If you're a provider, they use our algorithms to examine the videos they're considering by, because there's too many videos for humans to look at it, their company, they buy lots and lots of videos. Another big application is, algorithms, which can, just take a video and decide, what's the quality right there. That's another, series of algorithms that we created that are very widely used. Back to your question, it's a great one, VR, okay. Immersive AR XR. The whole idea there is it's true of television too, is a more immersive experience where you're there. Right. You're in the middle of it. That's part of the reason why we go to the movies because you feel like you're there. So, VR is kind of the ultimate of that or has the potential because you don't visually, you don't have any other experience.

Alan Bovik: There is no boundaries to the screen. It's a three S another term is 360 video, which you're watching through a head-mounted display. We're doing very interesting work there with some companies, including Metta labs, who was, big in this space. Sure. When you put on a helmet and play a game, all right, that game is pretty much graphical computer graphics, which means that it's rendered and rendered content. Well, it's pretty high bandwidth, but it's not as high bandwidth is cinema. Cinema is much more detailed and not as predictable because you don't create it. Right. If you can create something, then it should be easier to compress, right. Cause heck you created it, everything about it, but cinema or television, it's much higher bandwidth. You notice that if you ever, played with a, an Oculus, you don't really look at much content that is like, super high quality movies, but that is absolutely a whole holy grail of the whole VR industry, not AR is a little different, but the VR industry has to put you there.

Alan Bovik: So, you're watching the Avengers, and it's so realistic and high resolution that, oh, what at Sonos, is sitting there sadly you can pat them on the hand or something. You feel like you're there. Right? Yeah. The problem with that is it's so high bandwidth. So, we have televisions to that are 4k. 4k is totally inadequate for virtual reality. First of all, well, think about it. The screens are just, an inch away from your eyes. Right. So first of all, it's very close. Secondly, it's complete surround view. Okay. Thirdly, you're moving your eyes around and your head. Okay. So, it's a whole huge 360, so the videos really need to be, eight K at least 10 K preferably. Okay. That means, that's like four to eight times as much data right off the top and you want it to be high quality because boy are, they're going to notice, I mean, I think people are, would be a little forgiving now, new technology or it's cool enough, that thing, but they want it to be perfect.

Alan Bovik: And so what does that mean? Gigantic bandwidths Ganek amounts of data. There's another thing they don't want that big wire coming off the back of your head, the tether, that's the, you can buy VR helmets that have a big tether to your PC, that carries that bandwidth pretty well. You want to be able to walk around in your yard or out in the park or wherever you're going to do your VR experience. That means, somehow it's gotta be, all of the helmet somehow, which is pretty darn hard right now because you don't want it to, it's just something on your face. You don't want a huge form factor or a wifi or something equivalent. Okay. Wifi is the best solution right now, but still too much data. So what are we doing? That's kind of a big lead-up okay. What are we doing? Well, back in the nineties, one of the things that were doing as I already mentioned was, predicting where people look where they look okay.

Alan Bovik: One, yeah, one reason we wanted to do that is because of the way the retina of every human being's eyes is built. So it's basically a sphere. The back half of the sphere is covered with, rods and cones. We'll just talk about cones, which are, what you use what to watching videos and daylight and all that. The other rods are for nighttime. The cones have a very high density right in the middle, right? Where the optical axis of the lens strikes the retina, very high density of hundreds, of thousands of cones per millimeter squared. Away from that, it falls off very fast, becomes many, much lower density of columns. The reason why the brain does that, it's an immediate form of data compression. The human visual system is an incredible feedback control system, where somehow you choose where to put your eyes based on like activity or color or action, or is a face or whatever the eye moves around.

Alan Bovik: What's really happening is that the brain is allocating the highest density area of, your photo receptors to the area of interest around it. Everything's blurry. If you ever reading a book, the word you're looking at is sharp, clear, but all the words around are blurry. That's because they're being sensed at a much lower resolution by these surrounding areas of the retina called the periphery. Okay. Okay. So why is this important? I know it's two or three steps, but if we know where a person is looking, we can compress those gigantic VR videos in a different way. Okay. By making them, the compressed versions, very high resolution at the point of gaze where we know the person is looking and then have the resolution follow more. So we compressing normally more. We get, another 10 times factor maybe of compression in that way, which is enough to be able to maybe do VR, in, with cinematic quality at say, 10 K inside, a head-mounted display.

Alan Bovik: That kind of Fulvia added compression, we're doing again, uses brain models. We are both looking at, predicting where people look as well as using visual eye trackers, which sit in we've have helmets had head-mounted displays that have visual eye trackers inside that very accurately actually measure where you're looking. These, you know, are commercially available too. So, it's not in every Oculus you buy or anything like that, but it might be in the future is, cinema really becomes a big thing in VR. And so w that's what we're doing. We're using principles of visual neuroscience. Again, how the mapping of the retina is very high resolution in the middle and then lower and lower as you get away from the center of the retina to create, VR, specific ways of compressing that content. We were doing that in the 1990s, we just thought it was exciting and cool.

Alan Bovik: Everybody was just yawning at us, and nobody's cited our papers and Fulvia compression much, but suddenly VR comes and now it's a hot topic.

Kevin Horek: Fascinating. No, very cool. Just to go back to what you said, if he's not going to go that some of these sensors might actually be in headsets in the future. Are you suggesting to companies like meadow or other companies you work with, look, we can do compression to this point, but if you want better compression, you guys need to put in these actual pieces of hardware into your headset or whatever it is. Are you guys there yet? Or it's just, that's the hope down the road.

Alan Bovik: Oh, you can put the, you can put eye trackers in now for sure. And they do. I mean, I, typically it's not sold it cause there isn't a big reason yet because the fovea, the compression algorithms, this is current research because the old stuff we did in the nineties was, first baby steps, what we're doing today is we're using well, because it's gotta be really good, right. We're using, deep learning to learn, foveated video compression, that thing. The algorithms and they, hardware, or let's just say the solution to knowing where a person looks are evolving together. I mentioned that maybe we can predict what I look, we studied that back in the night we wrote, we had, big grants from national science foundation for, figuring out where people look and why, and we made good predictors, but there weren't good enough just not good enough because it's complicated.

Alan Bovik: People are, people are complicated, what I mean, three people watching a TV show, depending on the content, people are looking at different places. Right, right. So it's complicated. Maybe some people want to see the shiny new car, the handsome guy, the pretty girl, or the action in the background, who knows what, right. So, the algorithms are too low level, not smart enough, but it's maybe possible that deep learning engines based on massive datasets of, where people look as measured by trackers, which is a great ground truth. If we can, develop deep learning algorithms that are accurate enough, then we won't need the eye trackers. It'll just be more processing in the helmets as well for this content. These are the two or three places people are most likely to look probably not just one but two or three. You can have, multiple places where you allocate, higher resolution compression.

Alan Bovik: So that's a possibility too. Both are possible. It might end up being the eye-tracker solution, because the predicting-where-people-look problem is still very hard. Some people think deep learning can do anything. It's not really true. Maybe it's a maybe on that, but we know eye trackers are accurate and pretty cheap today and can be used for this.

Kevin Horek: Interesting. Could you take it a step further then? Because you say, okay, people traditionally look at these three parts, but if I was willing to give you my personal information, could you say Kevin would probably look at this one of the three, because we know his personality type? Like, could you take it that step further and personalize it to me?

Alan Bovik: I would say there's no reason not to be able to do that. Okay. I mean, anytime we're talking about personalization information, I just want to say I don't do any work like that. I'm really a video guy, I'm just brains and videos, but people could do that. And naturally, anytime you talk about personal information, it's a little sensitive a topic too. Right? I mean, assuming some, let's just say innocuous, personal information is available, such as, oh, you like basketball, or your favorite player, you wrote somewhere, is Shaquille O'Neal, stuff like that. Even that, I don't know if that's giving away too much, or should be private, or gender, age, that kind of thing. I'd say it's conceivable, of course, not that I'm suggesting it, because I'm not sure how much it would add. But notwithstanding what I just said, I think it's conceivable it could help determine with greater reliability which direction somebody's going to look.

Alan Bovik: Then again, there are an awful lot of assumptions. There are guys who like looking at guys, for example. So it's just too hard to know, necessarily. But people see in very much the same way. People's visual systems, setting aside those who are visually impaired, are remarkably similar. We actually tend to largely look at the same things. If you look at an image that shows where a bunch of people pointed their gaze, little red dots or something at the points of gaze, people largely agree, just like people agree incredibly well about picture quality. I mean, if you show a thousand people a picture and ask what's the quality, and then you divide those people into two groups, those two groups will have a very high correlation, above 0.9 typically, in their agreement about the quality of that picture.
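The split-group agreement he describes is easy to simulate. Below is a sketch with synthetic scores (the panel size, noise level, and rating scale are assumptions, not from any actual study): each subject rates every image as its true quality plus personal noise, and the mean scores of two random halves of the panel come out highly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_images = 1000, 50

# Hypothetical data: each rating = true quality (1-5 scale) + subject noise.
true_quality = rng.uniform(1.0, 5.0, n_images)
scores = true_quality + rng.normal(0.0, 0.7, (n_subjects, n_images))

# Split the panel into two random halves and correlate their mean opinion scores.
perm = rng.permutation(n_subjects)
half_a = scores[perm[:500]].mean(axis=0)
half_b = scores[perm[500:]].mean(axis=0)
r = np.corrcoef(half_a, half_b)[0, 1]   # well above 0.9
```

Averaging over hundreds of raters shrinks the individual noise, which is why the two halves agree so strongly even though individual ratings are noisy.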

Alan Bovik: So quality, where you look, all of that: to some degree there's a lot of repeatability, or predictability, among people. But predicting where the general population is looking is still a difficult problem.

Kevin Horek: No, that makes a lot of sense. Okay. I'm curious, because I don't think it happens very often and it's probably really challenging, but you seem to have bridged the gap between research and actually making this a real business. What advice do you give to other people like yourself who have technology they could actually leverage to companies out there? Because I know at least the university here puts a ton of money into trying to actually productize research.

Alan Bovik: Yeah. Well, my approach has been a little different, because I entered as an engineer. I was very much a theoretician, proving theorems, that kind of thing, about pictures and videos. But then I became interested in the science side. A big key thing there is that I crossed disciplines. If I give any promising young research engineer or scientist any advice, the first thing I'll say is: it's the problems between the spaces that are often the most interesting and often have the greatest impact, because as your typical engineer or scientist, you have to leave your comfort zone. You've already been to school once, but then go learn visual neuroscience? Forget it, you think, I'm already doing something interesting. So it's hard to do. I encourage crossing those disciplines so that you can find the problems that are in between. Now, when you do that, you find interesting things.

Alan Bovik: One thing I didn't mention is that we don't just do theoretical modeling of the brain; we put in the perspiration. That's the second thing, the huge perspiration we put in. Not just a lot of thinking about theories and all that: we conduct very large-scale human studies, and have been doing it pretty much every semester for the last 20 years. What we do is sit people down in front of monitors, show them pictures and videos and VR and 3D movies and all that kind of stuff, and ask them to rate them in various ways: what's their quality, what kind of distortions do you see? Or maybe we record their eye tracking, where they're looking. We've literally collected tens of millions of human judgments of picture and video quality: 2D, 3D, streaming video, high frame rate, HDR, ultra HD, every aspect of video and pictures, over the years.

Alan Bovik: We've even gone on the internet to do crowdsourcing, which is really hard to do. It takes months and months to design a crowdsourced study of people looking at pictures all over the world and recording the scores, quality and that kind of thing. Probably the biggest thing is the logistics and all that: it costs a lot of money, and we have to get grants, from industry typically. And there are a lot of cheaters out there. There's a tool called Amazon Mechanical Turk where people get on and make money, and because the people are making money, 90% of them are honest, but that 10% can really foul up your human study, because they literally write computer programs to just do your study automatically. Everything that's entered is just garbage. Things like that.

Alan Bovik: Or they just tap the button over and over again, as it goes from video events. Were able to detect all those things at this point, and just throw them out of the study and saw, but it's a lot of work, but to get to your point about industry, and I know I'm flying along verbally, I hope, if you have any questions. No, no.

Kevin Horek: I've been interrupting you. It's good.

Alan Bovik: Yeah. Okay. So, I mean, I have to tell you, they noticed us first, to the credit of industry. We've just been doing what we've been interested in. However...

Kevin Horek: Sorry, can I cut you off there? Yeah. They reached out to you, but you must have, well, you mentioned you were publishing papers and whatnot, but was there anything else you were doing to put yourself out there, to potentially get found by somebody in industry?

Alan Bovik: Well, yeah. I mean, mostly, as you say, publishing papers. I've never really been the entrepreneurial type, at least not in those days. Recently I started thinking about it, just for something else to do. There are some people I'm talking to, but I won't get into all that. I'm actually involved in a couple of things, but not talking about that today. One of the biggest things we did was create algorithms and just put them online. We created a website. We said anybody who wants to use these can, and that includes all of the results from every human study we've ever done.

Kevin Horek: Okay.

Alan Bovik: We put the data out there and we put the software out there. And then people started to use it. People in industry who were perspicacious noticed it and tried it. They said, wow, this works, we can actually predict quality, was kind of the reaction. So yeah, we got calls. There was one company called Video Clarity, a very early one, run by a guy named Blake Homan, a very small company at the time. He just said, this is great, we'd love to work with you on this. My response was, look, we will help you out here, and this code is already public. The intellectual property has vanished in terms of dollars, right? We'll help you in every way, for you to help market your Video Clarity products. Pretty soon they were selling one of our algorithms, called the Structural Similarity Index, or SSIM, all over the world in their systems.
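For readers curious what SSIM actually computes, here is a single-window sketch of the formula. The published index applies this over a local sliding window and averages the local values; this whole-image version only illustrates the statistic, using the common K1=0.01, K2=0.03 constants.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Whole-image SSIM statistic: compares luminance (means), contrast
    (variances), and structure (covariance) of two images. NOTE: the real
    index computes this locally in a sliding window and averages."""
    C1 = (0.01 * data_range) ** 2   # stabilizing constants
    C2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

ref = np.tile(np.arange(64, dtype=float), (64, 1))   # toy "image": a ramp
score = ssim_global(ref, ref)                        # identical images score 1.0
```

The score falls toward zero (or below) as structure diverges, which is why it tracks perceived quality far better than raw pixel error.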

Alan Bovik: And, being very successful with that later on, Netflix approached us. They were the first of the real biggies to approach us. And, we're, I'm just sitting there doing my research and they were using one of our algorithms in a larger algorithm, their system, which is called V math. Okay. It's mostly our algorithm and that, and they were experimenting with that and finding that it was, working well for streaming Netflix content. I said, well, we'd like to work with you on other problems too. They have funded us for, a better part of a decade on all aspects of perceptual streaming video, in their space, far more than I can describe right now, but I can say it, evolves things like, high frame rates or, deep learning, based video compression, all kinds of things that, can happen there in their workflows. Same thing happened with YouTube. Same thing happened with Facebook and I'll call it Metta labs.

Alan Bovik: Same thing happened with Amazon Prime. They saw that we were kind of unique in this crossover between science and engineering, and they came to us. I'd just say we were very lucky: right place at the right time, doing the kind of work that fascinated us and that we love doing. A lot of fortune there in terms of becoming relevant.

Kevin Horek: Very cool. Are you actively trying to recruit more companies now? Or is it still kind of, when they come to us, we'll try to accommodate? Or a bit of both?

Alan Bovik: Well, I mean, I meet people and so on, but I honestly haven't recruited a company to fund our laboratory in more than a decade. Well...

Kevin Horek: It sounds like you don't need to,

Alan Bovik: I don't need to. I did add, this year, two or three more companies, including a company called ShareChat India, who reached out to us; they're funding one of my students. So it's just happened that way. That's very unusual. If I'm advising a young engineer, I'd say don't be that way, go out and be very proactive. I used to be that way. Ten or so years ago, I would contact companies, I'd meet people, and I'd say, look, I've got this great student, and you've got an interesting problem. So I used to do that. It's just that we've had such a unique position in this space that we haven't had to do it anymore. We're funded by like nine or ten different high-tech streaming or social media companies right now.

Kevin Horek: That's awesome. Congrats on that. That's huge.

Alan Bovik: Yeah. It's been great, but what it's really been great for is the students. When companies come to me, I tell them two things. I say, number one, the students here are number one. The reason I'm talking to you guys is that I want to match up fantastic engineering problems in the video space with fantastic students, and you are going to become a collaborator in education with me if we have this relationship. And it's worked out that way. With every company I work with, we agree that we're going to meet once a month at least, although usually we meet much more than that in one way or another, and the student will present what they've done, with slides and all that, and the company gives feedback. Just think about how wonderful it is for a graduate student to be getting feedback from the video team at Netflix.

Kevin Horek: Totally. You can't really get bigger than that.

Alan Bovik: Right? Yeah. Or Facebook, or the others. We're talking to the core video team at Amazon at least once a month. I was on with them this morning with two of my students. That's what has worked really well for me, and the students get to work on the best problems and do relevant research that actually goes into these globe-spanning workflows.

Kevin Horek: Sure. What was the second thing you tell them?

Alan Bovik: Oh, well, I may have balled that all up. Oh, there is a second thing. Of course, for the benefit of the students, everything we do is publishable and goes into the open literature. You might think, well, how would companies ever agree to that? Naturally companies are competitive and want to be proprietary and so on, but they agree. One reason is that they all realize that if we succeed on their problem, well, yeah, others learn too, but it raises the whole space, right? It's like all ships rising with the tide. So if we create an algorithm, which we've done recently, that helps with high-frame-rate videos, meaning faster than the 60 frames per second of your current television, like 90 or 120, it can be used to control and monitor the quality of those videos.

Alan Bovik: Boy, is that going to be important for what's coming because live sports, that's where they want that. So everybody wants to go into that. So, yeah, they could, if we develop it, everybody benefits is kind of the thing. The internet, isn't used as much. W we haven't really talked about that too much, but the internet is even with 4g 5g and, fiber and all that, the internet is stressed, data stressed. Okay. Video is continuing to increase exponentially because the videos keep getting bigger. The televisions are getting bigger. They get, they're getting, deeper, meaning more bits like HDR, and a high frame rate, a richer colors, everything is, keeps getting bigger. The more people everywhere are using it, watching videos. So, it's, it's really an ongoing problem that has to be continuously addressed.

Kevin Horek: No, fascinating. I feel like we've just scratched the surface, but sadly we're out of time. How about we close with mentioning where people can get more information about yourself, the program, the students, and what you guys are working on?

Alan Bovik: Well, we've got a website, the Laboratory for Image and Video Engineering at the University of Texas at Austin. If you go there, you can find, first of all, all of our datasets, completely free. There are like 35 terabytes of video data and human opinion scores on all of them, every scenario you can imagine involving pictures and videos, 2D, 3D, all that. Every algorithm we've created, we put right out there; you can download the code and play with it, try to apply it to your own situation. Maybe you've got a camera you want to put it in, or something like that. There are descriptions of what we do. Also my course notes. I confess I don't update the online course notes on the website often enough relative to what's in my class, but you can see what kind of stuff I teach, and there's a pretty good description of everything too.

Kevin Horek: Very cool. Well, I really appreciate you taking the time on your day to be on the show, and I look forward to keeping in touch with you and have a good rest of your day, man.

Alan Bovik: Thank you so much. And thanks for having us.

Kevin Horek: Thank you. Okay. Bye.

Alan Bovik: Bye.

Intro / Outro: Thanks for listening. Please visit our website at buildingthefutureshow.com to join the free community, sign up for our newsletter, or to sponsor the show. The music is by Electric Mantra. You can check him out at electricmantra.com, and keep building the future.
