http://stat.case.edu/~pillar/PRL/PhysRevLett_95_230202.pdf
--------------------------------------------------------------------------------
Source: Case Western Reserve University
Date: 2005-12-06
URL: http://www.sciencedaily.com/releases/2005/12/051205161956.htm
--------------------------------------------------------------------------------
Case Researchers Discover Methods To Find 'Needles In Haystack' In Data
A Case Western Reserve University research team drawn from physics and statistics has developed new statistical techniques that improve the chances of detecting a signal in large data sets. The techniques can be used not only to search for the "needle in the haystack" in particle physics, but also to discover a new galaxy, monitor transactions for fraud and security risks, identify the carrier of a virulent disease among millions of people, or detect cancerous tissue in a mammogram.
Case faculty members Ramani Pilla and Catherine Loader (statistics) and Cyrus Taylor (physics) report their findings in the article "A New Technique for Finding Needles in Haystacks: A Geometric Approach to Distinguishing between a New Source and Random Fluctuations," published December 2 in the journal Physical Review Letters. <snip>
The Case team developed a technique built on the principle of comparing a set of summary characteristics for any subregion of the observations with the background variation. Using these characteristics, the method looks for small regions that differ significantly from the background, that is, by more than random chance alone can explain.
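As a rough illustration of that principle (not the authors' actual method), the sketch below uses the simplest possible summary characteristic, the event count in each window of adjacent histogram bins, and compares it with the count a known background model predicts. The Poisson tail probability measures how surprising each window's excess would be under background alone. The function names, the flat background, and the example counts are all hypothetical.

```python
import math


def poisson_upper_tail(k: int, mu: float) -> float:
    """Return P(N >= k) for N ~ Poisson(mu).

    Sums the lower tail P(N <= k - 1) term by term and subtracts it from 1;
    adequate for the moderate means used here, not for extreme tails.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(N = 0)
    lower = term
    for i in range(1, k):
        term *= mu / i  # P(N = i) obtained from P(N = i - 1)
        lower += term
    return max(0.0, 1.0 - lower)


def scan_windows(counts, background, width):
    """Compare every window of `width` adjacent bins with the background.

    Returns (start, observed, expected, p_value) tuples, most surprising first.
    """
    results = []
    for start in range(len(counts) - width + 1):
        observed = sum(counts[start:start + width])
        expected = sum(background[start:start + width])
        p_value = poisson_upper_tail(observed, expected)
        results.append((start, observed, expected, p_value))
    results.sort(key=lambda r: r[3])
    return results


# Hypothetical data: a flat background of 5 events per bin, with an excess in bins 5-6.
counts = [5, 4, 6, 5, 4, 12, 14, 5, 6, 4]
background = [5.0] * len(counts)
start, observed, expected, p = scan_windows(counts, background, width=2)[0]
print(f"window starting at bin {start}: {observed} observed vs {expected} expected, p = {p:.2g}")
```

Because many windows are examined, the smallest of these per-window p-values overstates the evidence; deciding whether such an excess is real is the question the article's hypothesis-testing framing is meant to answer.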
"Methods used in high-energy particle physics problems traditionally have searched for any departure from a background model; that is, anything that is not a haystack," said Pilla, the project leader. "Our method efficiently incorporates information about the type of disorder expected, thereby enabling us to find the signal of interest more accurately."
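Pilla's contrast between searching for "anything that is not a haystack" and using the expected type of disorder can be illustrated with two textbook statistics computed on the same binned data. This is a sketch of the general idea, not the paper's method. A generic Pearson chi-square statistic responds to any departure from the background, while a score (template-matching) statistic for a signal of known shape weights each bin by how much of the signal it is expected to contain. The Gaussian signal shape, its width and location, the flat background of 100 events per bin, and the noise-free example counts are hypothetical choices made for illustration.

```python
import math


def gaussian_template(n_bins, center, width):
    """Hypothetical expected signal shape: a Gaussian bump over the bins."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n_bins)]


def generic_z(counts, background):
    """Pearson chi-square against background only, converted to an approximate z.

    Under the background-only hypothesis the statistic has mean ~k and
    variance ~2k for k bins, so (X2 - k) / sqrt(2k) is roughly standard normal.
    """
    k = len(counts)
    x2 = sum((n - b) ** 2 / b for n, b in zip(counts, background))
    return (x2 - k) / math.sqrt(2 * k)


def targeted_z(counts, background, template):
    """Score statistic for 'background + mu * template', tested at mu = 0.

    u is the derivative of the Poisson log-likelihood in mu at mu = 0 and
    info is its Fisher information, so u / sqrt(info) is roughly standard normal.
    """
    u = sum(s * (n - b) / b for n, b, s in zip(counts, background, template))
    info = sum(s * s / b for b, s in zip(background, template))
    return u / math.sqrt(info)


n_bins = 20
background = [100.0] * n_bins
template = gaussian_template(n_bins, center=10, width=1.5)
# Noise-free hypothetical data: background plus a small bump of the expected shape.
counts = [b + round(30 * s) for b, s in zip(background, template)]

print(f"generic z  = {generic_z(counts, background):.2f}")
print(f"targeted z = {targeted_z(counts, background, template):.2f}")
```

With these numbers the targeted statistic comes out several times larger than the generic one, because the chi-square spreads its attention over all 20 bins, most of which contain no signal at all.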
At the core of the breakthrough is the idea of posing the problem in terms of a "hypothesis-based testing" paradigm to detect statistical disorder in the data. The method further exploits the flexibility of a long-established geometric formula to significantly enhance the ability to distinguish a signal from random fluctuations. <snip>
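The hypothesis-testing framing can be sketched as follows. The null hypothesis is background only, the alternative is background plus a bump of known shape at an unknown position, and the test statistic is the largest template score over all candidate positions. Because that maximum is taken over many positions, its null distribution is not standard normal. Calibrating it is the step for which the article credits a geometric formula, which the press release does not name. The sketch below does not reproduce that formula. Instead it substitutes brute-force Monte Carlo simulation of background-only data, which is slower but easy to check. The Gaussian template, its width, the flat background of 25 events per bin, the example counts, and all function names are hypothetical.

```python
import math
import random


def gaussian_template(n_bins, center, width):
    """Hypothetical signal shape: a Gaussian bump centred on bin `center`."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n_bins)]


def score_z(counts, background, template):
    """Score statistic for a signal of shape `template` on top of `background`."""
    u = sum(s * (n - b) / b for n, b, s in zip(counts, background, template))
    info = sum(s * s / b for b, s in zip(background, template))
    return u / math.sqrt(info)


def max_scan(counts, background, templates):
    """Largest score over all candidate signal positions, and where it occurs."""
    scores = [score_z(counts, background, t) for t in templates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


def poisson_sample(mu, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def global_p_value(observed_max, background, templates, n_sims=1000, seed=0):
    """Fraction of background-only simulations whose maximum score is at least as large."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sims):
        sim = [poisson_sample(b, rng) for b in background]
        if max_scan(sim, background, templates)[0] >= observed_max:
            exceed += 1
    return (exceed + 1) / (n_sims + 1)  # add-one keeps the estimate above zero


n_bins = 20
background = [25.0] * n_bins
templates = [gaussian_template(n_bins, c, width=1.5) for c in range(n_bins)]
# Hypothetical observation: flat background with a small excess in bins 11-13.
counts = [25] * n_bins
counts[11:14] = [35, 39, 35]

z_max, where = max_scan(counts, background, templates)
local_p = 0.5 * math.erfc(z_max / math.sqrt(2))  # as if the position were fixed in advance
global_p = global_p_value(z_max, background, templates)
print(f"max z = {z_max:.2f} at bin {where}; local p = {local_p:.1e}, global p = {global_p:.3f}")
```

The local p-value pretends the bump's position was known in advance. The simulated global p-value accounts for the scan, and the gap between them grows with the number of positions examined. Simulation cost grows with the number of simulated data sets needed to resolve very small p-values, which is one reason an analytic approximation is attractive for this step.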