Information retrieval (IR) is the science of searching and organizing data and/or metadata. A search engine provides a good example of using IR.
A simple search engine might find documents by keyword search: it finds every document that contains the specified keyword and then sorts the results by how frequently the keyword appears in each document. The problem with this approach is that it ignores context, so it can produce inaccurate results because of semantic ambiguity. Content providers could also game the system by flooding their documents with irrelevant keywords so that those documents appear higher in the search results. This was the state of the early web; there had to be a better way.
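The keyword approach described above can be sketched in a few lines of Python. This is a minimal illustration, not how any real search engine is implemented; the function names and the three toy documents are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(documents, keyword):
    """Return (doc_id, count) pairs for documents containing the keyword,
    sorted so the document with the most occurrences comes first."""
    keyword = keyword.lower()
    hits = []
    for doc_id, text in documents.items():
        count = Counter(tokenize(text))[keyword]
        if count > 0:
            hits.append((doc_id, count))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy corpus (hypothetical): one genuine document, one keyword-stuffed
# document, and one unrelated document.
documents = {
    "review": "A jazz record with a classic jazz trio.",
    "spam": "jazz jazz jazz jazz buy cheap watches jazz",
    "news": "The city council met on Tuesday.",
}

results = keyword_search(documents, "jazz")
print(results)  # [('spam', 5), ('review', 2)]
```

The keyword-stuffed document outranks the genuine review, which is exactly the weakness noted above: raw term counts say nothing about relevance.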
Another way of searching for data might be to relate documents by how they link to each other and how closely their content is related. Essentially, we would group related documents together. This would provide more relevant search results with less error. This is (approximately) how the Google search engine works. This type of data structuring is often important in MIR.
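To make the link-based idea more concrete, here is a rough sketch of scoring documents by who links to them, using a simplified PageRank-style iteration. It covers only the linking half of the idea (not content relatedness), and the function name, the damping value of 0.85, the iteration count, and the four-page link graph are all assumptions chosen for illustration rather than a description of any production system.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links with a simplified PageRank-style iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "c", and "c" links to "a".
links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

scores = link_rank(links)
best = max(scores, key=scores.get)
print(best)  # c
```

Unlike keyword counts, a page cannot raise this score by editing its own text; it has to be linked to by other pages, which is harder to game.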
Applications of MIR
So what is MIR good for? What can we do with all of this data? Although there is academic research into a wide array of MIR applications, the following have become ubiquitous:
- Recommender systems - find similar (or dissimilar) artists or songs based on one or both of the following methods:
  - Playlist / Purchase history - use metadata from a user's playlist or purchase history to find related metadata. For this approach to work, the metadata has to be correct. This system is used, for example, in the iTunes Genius sidebar.
  - Audio similarity - use the actual audio data to find similar artists or songs. This is more difficult to implement because it must analyze the audio itself rather than its labels, but the results are generally more accurate.
- Fingerprinting - identify a given song or artist from only a small fragment of a song. This approach can be used by listeners to identify unknown songs / artists or to find a song by singing / humming it, and by publishers to identify songs that are used in broadcasts.
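As a rough illustration of the audio similarity method listed above, the sketch below compares songs by the cosine similarity of their feature vectors. It assumes each song has already been reduced to a short numeric vector by some audio analysis step that is not shown; the song names and feature values are invented for the example, and cosine similarity is just one plausible choice of measure.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat an all-zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def most_similar(query_id, features):
    """Return the id of the song whose features are closest to the query's."""
    query = features[query_id]
    candidates = [
        (cosine_similarity(query, vector), song_id)
        for song_id, vector in features.items()
        if song_id != query_id
    ]
    return max(candidates)[1]


# Hypothetical feature vectors; real ones would come from audio analysis.
features = {
    "song_a": [0.9, 0.1, 0.3],
    "song_b": [0.8, 0.2, 0.4],
    "song_c": [0.1, 0.9, 0.7],
}

print(most_similar("song_a", features))  # song_b
```

Because the comparison uses only the audio-derived vectors, it still works when a song's metadata is missing or wrong, which is the main advantage over the playlist-based method.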