Democratic Underground

Can someone please figure out how to use this on C-SPAN footage?

General Discussion (1/22/2007 thru 12/14/2010)
Ian David, Donating Member (1000+ posts), Wed Feb-20-08 08:22 AM
Original message
Edited on Wed Feb-20-08 08:24 AM by IanDB1
Speech Activity Detection on Multichannels of Meeting Recordings
Book Series: Lecture Notes in Computer Science
Publisher: Springer Berlin / Heidelberg
ISSN: 0302-9743 (Print), 1611-3349 (Online)
Volume: 3869/2006
Book: Machine Learning for Multimodal Interaction
Book DOI: 10.1007/11677482
Copyright: 2006
ISBN: 978-3-540-32549-9
Category: VIII NIST Meeting Recognition Evaluation
Chapter DOI: 10.1007/11677482_35
Pages: 415-427
Subject Collection: Computer Science
SpringerLink Date: Wednesday, February 15, 2006

Zhongqiang Huang (1) and Mary P. Harper (1)
(1) Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907-1285
Abstract
The Purdue SAD system was originally designed to identify speech regions in multichannel meeting recordings with the goal of focusing transcription effort on regions containing speech. In the NIST RT-05S evaluation, this system was evaluated in the ihm condition of the speech activity detection task. The goal for this task condition is to separate the voice of the speaker on each channel from silence and crosstalk. Our system consists of several steps and does not require a training set. It starts with a simple silence detection algorithm that utilizes pitch and energy to roughly separate silence from speech and crosstalk. A global Bayesian Information Criterion (BIC) is integrated with a Viterbi segmentation algorithm that divides the concatenated stream of local speech and crosstalk into homogeneous portions, which allows an energy-based clustering process to then separate local speech and crosstalk. The second step makes use of the obtained segment information to iteratively train a Gaussian mixture model for each speech activity category and decode the whole sequence over an ergodic network to refine the segmentation. The final step first uses a cross-correlation analysis to eliminate crosstalk, and then applies a batch of post-processing operations to adjust the segments to the evaluation scenario. In this paper, we describe our system and discuss various issues related to its evaluation.
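As a possible starting point for C-SPAN audio, here is a rough Python sketch of only the energy half of the first (silence detection) step described above. It is not the Purdue system: it ignores pitch, the BIC/Viterbi segmentation, the GMM refinement and the cross-correlation crosstalk filter. The function names, the 16 kHz sample rate, the 400-sample (25 ms) frames with a 160-sample (10 ms) hop, the -30 dB relative threshold and the -60 dB absolute floor are all my own hypothetical choices, not values taken from the paper.

```python
import math
from typing import List, Tuple

# Hypothetical defaults (not from the paper): 25 ms frames and a 10 ms hop at 16 kHz.
FRAME_LEN = 400
HOP = 160


def frame_energies_db(samples: List[float], frame_len: int, hop: int) -> List[float]:
    """Log energy in dB of each full frame; a partial frame at the end is dropped."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # The tiny offset avoids log10(0) on digital silence (gives -120 dB).
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def detect_speech_segments(
    samples: List[float],
    frame_len: int = FRAME_LEN,
    hop: int = HOP,
    relative_db: float = -30.0,
    floor_db: float = -60.0,
) -> List[Tuple[int, int]]:
    """Energy-only non-silence detection (the paper also uses pitch).

    A frame counts as non-silent if its energy is within relative_db of the
    loudest frame and also above the absolute floor_db. Returns
    (start, end) frame index pairs, with end exclusive.
    """
    energies = frame_energies_db(samples, frame_len, hop)
    if not energies:
        return []
    threshold = max(max(energies) + relative_db, floor_db)
    segments: List[Tuple[int, int]] = []
    start = None
    for i, energy in enumerate(energies):
        active = energy >= threshold
        if active and start is None:
            start = i  # a non-silent run begins at this frame
        elif not active and start is not None:
            segments.append((start, i))  # the run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(energies)))  # the run reaches the last frame
    return segments


if __name__ == "__main__":
    sr = 16000
    # 0.1 s of silence, 0.1 s of a 440 Hz tone, 0.1 s of silence.
    tone = [0.5 * math.sin(2 * math.pi * 440 * k / sr) for k in range(1600)]
    signal = [0.0] * 1600 + tone + [0.0] * 1600
    print(detect_speech_segments(signal))  # prints [(8, 20)]
```

Frame indices convert to seconds by multiplying by hop / sample rate (0.01 s with these defaults). On a real C-SPAN track you would first have to decode the audio to mono samples scaled to the range -1 to 1, and the thresholds would almost certainly need tuning; separating the speaker from crosstalk, which is the hard part of the paper, is not attempted here.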

More:
http://www.springerlink.com/content/q16uh1k0703005tp/


Hat tip to WikiLeaks:
http://www.wikileaks.be/wiki/On_the_take_and_loving_it







See prior thread:


How to solve a vast majority of our problems, and answer many of our most important questions
Topic started by IanDB1 on Nov-02-07 04:51 PM (0 replies)
http://www.democraticunderground.com/discuss/duboard.php?az=show_topic&forum=389&topic_id=2193159


© 2001 - 2011 Democratic Underground, LLC