Balancing precision and recall in search results with a good controlled vocabulary is key for an excellent user experience.

Why is this important?

You want to be able to find, within your collection, not only exactly what you are looking for but also what is similar, so that you can browse a well-rounded collection of search results. It isn’t always about the first item returned in a search.

There is a balance that must be struck between precision and recall. In practice, as one goes up, the other tends to go down.

Using a controlled vocabulary has been shown to dramatically increase the precision in a search over free-text searching alone. When it comes to assets that are non-text based, like audio, image and video files, a controlled vocabulary is essential for adding key metadata to the asset to create precise searching for the user.

What is PRECISION? – It is the fraction of retrieved assets that are relevant to the search

credit: Ferrari.com

This is represented as:

Precision = number of items retrieved and relevant / total number of items retrieved

This equation represents an ideal. You would always like to return only relevant assets each time a user searches your collection. In practice, though, each time your collection is searched there will be some assets returned that are irrelevant.
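As a small worked example of the precision formula, here is a minimal Python sketch. The asset IDs and the `precision` function name are hypothetical, chosen only for illustration, and returning 0 for an empty result set is an assumed convention.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved assets that are relevant."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    if not retrieved:
        return 0.0  # assumed convention: no results means precision 0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical search: 5 assets returned, 4 of them are what the user wanted.
retrieved_ids = ["img01", "img02", "img03", "img04", "img05"]
relevant_ids = ["img01", "img02", "img03", "img04", "img09"]

print(precision(retrieved_ids, relevant_ids))  # 0.8
```

Here four of the five returned assets are relevant, so precision is 4 / 5.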

To increase precision, use more specific terms (Ferrari rather than Sports Cars) to find the specific Ferrari assets.
So in a hypothetical controlled vocabulary you would find the following hierarchy with the number of results for each term:

VEHICLES (would return 400 images)

CARS (would return 276 images)

SPORTS CARS (would return 113 images)

FERRARIS (would return 15 images)
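To see how a narrower term raises precision, the hypothetical counts above can be plugged into the precision formula. This sketch rests on assumptions that do not come from a real collection: the user wants Ferrari images, all 15 relevant images are tagged FERRARIS, and each broader term returns everything under its narrower terms. The function name is also hypothetical.

```python
# Hypothetical result counts from the hierarchy above, broadest to narrowest.
result_counts = {
    "VEHICLES": 400,
    "CARS": 276,
    "SPORTS CARS": 113,
    "FERRARIS": 15,
}

# Assumption: every relevant (Ferrari) image is tagged FERRARIS.
RELEVANT_FERRARI_IMAGES = 15


def precision_by_term(counts, relevant):
    """Precision of a search for Ferrari images at each level of the hierarchy."""
    return {term: relevant / total for term, total in counts.items()}


for term, p in precision_by_term(result_counts, RELEVANT_FERRARI_IMAGES).items():
    print(f"{term:<12} {p:.1%}")
```

Under these assumptions, precision climbs from under 4 percent for VEHICLES to 100 percent for FERRARIS, while every level still contains all 15 Ferrari images.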

When a search is conducted you need to evaluate:

Are users getting too many results, or too many irrelevant results?

Are users not finding what they are looking for, or are their searches returning no results?

What is RECALL? – It is the fraction of the relevant assets in the collection that are returned in a search

This is represented as:

Recall = number of items retrieved and relevant / total number of relevant items in collection

High recall means that there is a comprehensive set of assets returned in a search, though there will be a number of assets that are irrelevant to what the user was looking for.
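A minimal Python sketch of the recall formula, shown next to precision so the trade-off is visible. The asset IDs and function names are hypothetical, and returning 0 when a set is empty is an assumed convention.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant assets in the collection that were retrieved."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumed convention: nothing relevant exists
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved assets that are relevant."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    if not retrieved:
        return 0.0  # assumed convention: no results means precision 0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical broad search: 10 assets returned, including all 4 relevant ones.
retrieved_ids = [f"img{n:02d}" for n in range(1, 11)]  # img01 .. img10
relevant_ids = ["img01", "img02", "img03", "img04"]

print(recall(retrieved_ids, relevant_ids))     # 1.0
print(precision(retrieved_ids, relevant_ids))  # 0.4
```

The broad search finds every relevant asset (recall of 1.0), but six of the ten results are irrelevant, so precision drops to 0.4.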

High recall mainly comes from free-text searching, which requires that the term the user types into the search field is spelled the same way as it appears in the asset’s text fields.

So searching for FERRARI will chiefly return assets with this spelling, and possibly the variation FERRARIS, but only if one of these words has been entered in a text field belonging to the asset’s metadata.

But if an asset’s metadata misspells FERRARI as FERRERI, that asset would not be returned in a search for FERRARI.
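The spelling problem can be sketched in a few lines of Python. The asset records, the field names (`caption`, `terms`), and both search functions are hypothetical. Real systems often add stemming or fuzzy matching, which this sketch deliberately leaves out.

```python
# Hypothetical asset records: a free-text caption plus controlled vocabulary terms.
assets = [
    {"id": "img01", "caption": "Red Ferrari at the track", "terms": ["FERRARIS"]},
    {"id": "img02", "caption": "Two Ferraris parked outside", "terms": ["FERRARIS"]},
    # The caption below is misspelled, but the vocabulary term is still correct.
    {"id": "img03", "caption": "Vintage Ferreri convertible", "terms": ["FERRARIS"]},
    {"id": "img04", "caption": "Blue pickup truck", "terms": ["TRUCKS"]},
]


def free_text_search(query, assets):
    """IDs of assets whose caption contains the query (case-insensitive substring)."""
    q = query.lower()
    return [a["id"] for a in assets if q in a["caption"].lower()]


def vocabulary_search(term, assets):
    """IDs of assets tagged with the given controlled vocabulary term."""
    return [a["id"] for a in assets if term in a["terms"]]


print(free_text_search("Ferrari", assets))    # ['img01', 'img02']
print(vocabulary_search("FERRARIS", assets))  # ['img01', 'img02', 'img03']
```

The free-text search matches "Ferrari" and "Ferraris" but misses the asset captioned "Ferreri", while the search on the controlled term returns all three because the tag does not depend on how the caption was typed.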

When a search is conducted you need to evaluate:

How many relevant assets did the search return for the user?

Or did the search return NO relevant assets for the user?