Over the past few years, a growing number of universities have posted videos and podcasts of lectures online, giving students instant access to classes. While the content is useful, locating specific information within a lecture can be difficult and frustrating. To solve this problem, the lectures featured in the Lecture Browser have been divided into smaller, more manageable sections. In addition, the browser provides a transcript of each lecture. Users can search the transcripts for keywords and are taken directly to the points in the video where those words are spoken.
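To illustrate the idea of keyword search over a transcript, here is a minimal sketch in Python. It is not the Lecture Browser's actual code: the `Segment` class, the `find_keyword` function, and the sample transcript are all hypothetical, and the sketch assumes each transcript segment is stored together with its start time in the video.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: float  # assumed: offset of the segment from the start of the video
    text: str             # assumed: transcribed words spoken during the segment


def find_keyword(segments, keyword):
    """Return the start time of every segment that contains the keyword as a whole word."""
    needle = keyword.lower()
    hits = []
    for segment in segments:
        # Lowercase and split into words so punctuation and capitalization do not matter.
        words = re.findall(r"[a-z0-9']+", segment.text.lower())
        if needle in words:
            hits.append(segment.start_seconds)
    return hits


# Made-up transcript used only for illustration.
segments = [
    Segment(0.0, "Welcome to the lecture on linear algebra."),
    Segment(42.5, "An eigenvalue describes how a matrix scales a vector."),
    Segment(97.0, "Next we compute each eigenvalue of the matrix."),
]

print(find_keyword(segments, "Eigenvalue"))
```

A video player could then jump to any of the returned offsets. Because the match is made word by word, a transcript with many errors can still return useful hits as long as the keyword itself was recognized correctly.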
While several companies, such as the Blinkx and EveryZing online audio and video search engines, use software that converts speech into searchable text, the MIT researchers faced new challenges because they were dealing with academic lectures. First, many lecturers are not native English speakers, making automatic transcription tricky for systems trained to identify only American English accents. Second, the vocabulary of scientific lectures is often obscure. Third, lectures do not necessarily have a discernible structure, making it difficult to break them into sections and organize the text for easy searching. When the speaker is a non-native English speaker with an accent and vocabulary that the browser hasn’t been trained to recognize, the system’s transcription accuracy can drop to 50 percent. Although such a low-accuracy transcript is of little use for reading, it can still support keyword searches.
Although much work remains, the browser is proving popular, averaging 21,000 hits a day. Within the next few months, the MIT researchers would like to add a feature that attaches a text outline to each lecture, enabling users to skip to a desired section. In the future, MIT plans to let users correct the transcripts, much as users are welcome to contribute to Wikipedia. By bringing people into the transcription loop, the researchers hope to improve both the text’s accuracy and the user experience.
TFOT recently covered “StupidFilter”, open-source filtering software intended to detect “rampant stupidity” in web content written in English.
More information about the MIT Lecture Browser can be found on the university’s website.