Django PostgreSQL Full-Text Search
Django offers powerful tools for PostgreSQL full-text search. Let's explore them one by one.
Instead of relying on lookups such as icontains or iexact, you can use the database functions in the django.contrib.postgres.search module, which make PostgreSQL's full-text search engine easy to use from the ORM.
The Search Lookup
You can perform a basic full-text search with the search lookup, which matches a single term against a single field.
>>> Blog.objects.filter(category__title__search='Programming')
[<Blog: New Features on Python 3.11>, <Blog: NextJS 13>]
To use the search lookup, 'django.contrib.postgres' must be in your INSTALLED_APPS.
SearchVector
A SearchVector lets you search against multiple fields at once, including fields on related models. In the example below, the search runs over Entry objects, each of which belongs to a Blog. Blog has a tagline field, and the SearchVector covers both the body_text field on Entry and the tagline field on the related Blog, so a match in either one returns the entry.
>>> from django.contrib.postgres.search import SearchVector
>>> Entry.objects.annotate(
... search=SearchVector('body_text', 'blog__tagline'),
... ).filter(search='Cheese')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
The arguments passed to SearchVector can be any Expression or the name of a field. They specify what goes into the search vector. When you pass multiple arguments, they are concatenated with a space into a single search document, so a match in any of the specified fields counts. SearchVector objects can also be combined with +, which lets you reuse them:
>>> Entry.objects.annotate(
... search=SearchVector('body_text') + SearchVector('blog__tagline'),
... ).filter(search='Cheese')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
SearchQuery
SearchQuery takes the user's search terms and converts them into a search query object that is compared against a search vector. By default, the terms are stemmed, meaning that each one is reduced to its basic form, and then matches are looked for on all of the resulting terms. The search_type argument changes how the terms are treated: 'plain' (the default) treats them as separate keywords, 'phrase' treats them as a single phrase, 'raw' accepts a formatted query with terms and operators, and 'websearch' accepts a format similar to the one used by web search engines.
>>> from django.contrib.postgres.search import SearchQuery
>>> SearchQuery('red tomato') # two keywords
>>> SearchQuery('tomato red') # same results as above
>>> SearchQuery('red tomato', search_type='phrase') # a phrase
>>> SearchQuery('tomato red', search_type='phrase') # a different phrase
>>> SearchQuery("'tomato' & ('red' | 'green')", search_type='raw') # boolean operators
>>> SearchQuery("'tomato' ('red' OR 'green')", search_type='websearch') # websearch operators
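To see why 'red tomato' and 'tomato red' give the same results as keywords but not as phrases, here is a small pure-Python sketch. It is only a mental model under simplifying assumptions: the helper names (tokens, matches_plain, matches_phrase) are made up for illustration and are not part of Django, and there is no stemming or stop-word handling as PostgreSQL would apply.

```python
import re


def tokens(text):
    # Lowercase the text and split it into words (no stemming here).
    return re.findall(r"\w+", text.lower())


def matches_plain(document, query):
    # 'plain'-style matching: every term must appear somewhere, in any order.
    doc = set(tokens(document))
    return all(term in doc for term in tokens(query))


def matches_phrase(document, query):
    # 'phrase'-style matching: the terms must appear next to each other, in order.
    doc, phrase = tokens(document), tokens(query)
    n = len(phrase)
    return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


doc = "A salad with red tomato and basil"
print(matches_plain(doc, "red tomato"))   # True
print(matches_plain(doc, "tomato red"))   # True
print(matches_phrase(doc, "red tomato"))  # True
print(matches_phrase(doc, "tomato red"))  # False
```

The order of the terms only matters once they are treated as a phrase.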
SearchQuery terms can be combined logically to provide more flexibility:
>>> from django.contrib.postgres.search import SearchQuery
>>> SearchQuery('meat') & SearchQuery('cheese') # AND
>>> SearchQuery('meat') | SearchQuery('cheese') # OR
>>> ~SearchQuery('meat') # NOT
SearchRank
So far, the results are simply those where the query matches the vector. PostgreSQL also provides a ranking function, so you can order results by relevance. It considers how often the query terms appear in the document, how close together they are, and how important the part of the document where they occur is. The better the match, the higher the rank value. To order results by relevance:
>>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
>>> vector = SearchVector('body_text')
>>> query = SearchQuery('cheese')
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
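The idea of "more matches, higher rank" can be sketched in a few lines of plain Python. This is a deliberately simplified stand-in that only counts term frequency. It is not the formula PostgreSQL uses, and the toy_rank function and the sample body texts are invented for illustration.

```python
import re
from collections import Counter


def toy_rank(document, terms):
    # Score a document by how many times the query terms occur in it.
    counts = Counter(re.findall(r"\w+", document.lower()))
    return sum(counts[term.lower()] for term in terms)


# Hypothetical entries: title -> body text.
entries = {
    "Pizza recipes": "Pizza dough with cheese.",
    "Cheese on Toast recipes": "Cheese, more cheese and toast.",
}

# Highest score first, like order_by('-rank').
ranked = sorted(
    entries,
    key=lambda title: toy_rank(entries[title], ["cheese"]),
    reverse=True,
)
print(ranked)  # ['Cheese on Toast recipes', 'Pizza recipes']
```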
SearchHeadline
Syntax:
class SearchHeadline(expression, query, config=None, start_sel=None, stop_sel=None, max_words=None, min_words=None, short_word=None, highlight_all=None, max_fragments=None, fragment_delimiter=None)
SearchHeadline accepts a single text field or expression, a query, a config, and a set of options, and returns the search results with the matching terms highlighted. The start_sel and stop_sel parameters set the strings that wrap each highlighted query term. The max_words and min_words parameters set the longest and shortest headline lengths. The short_word parameter takes an integer; words of that length or less are discarded in each headline. Setting highlight_all to True uses the whole document instead of a fragment and ignores max_words, min_words, and short_word. Setting max_fragments to a non-zero integer sets the maximum number of fragments to display, and fragment_delimiter sets the string placed between fragments. For example:
>>> from django.contrib.postgres.search import SearchHeadline, SearchQuery
>>> query = SearchQuery('red tomato')
>>> entry = Entry.objects.annotate(
... headline=SearchHeadline(
... 'body_text',
... query,
... start_sel='<span>',
... stop_sel='</span>',
... ),
... ).get()
>>> print(entry.headline)
Sandwich with <span>tomato</span> and <span>red</span> cheese.
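As a rough mental model of what start_sel and stop_sel do, here is a pure-Python sketch. The highlight_terms function is hypothetical and not part of Django. It assumes whole-word, case-insensitive matching and does no stemming or fragment selection, so treat it as an illustration only. The default markers mirror PostgreSQL's defaults of <b> and </b>.

```python
import re


def highlight_terms(text, terms, start_sel="<b>", stop_sel="</b>"):
    # Wrap every whole-word occurrence of a query term in the given markers.
    if not terms:
        return text
    pattern = r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b"
    return re.sub(
        pattern,
        lambda match: start_sel + match.group(0) + stop_sel,
        text,
        flags=re.IGNORECASE,
    )


print(highlight_terms(
    "Sandwich with tomato and red cheese.",
    ["red", "tomato"],
    start_sel="<span>",
    stop_sel="</span>",
))
# Sandwich with <span>tomato</span> and <span>red</span> cheese.
```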
SearchVectorField
If searching against multiple fields this way is too slow, you can improve performance by adding a SearchVectorField to the model. This is a separate column that stores a pre-computed search vector, so PostgreSQL does not have to build the vector for every row on every query. Note that the field is not updated automatically; you need to keep it populated yourself, for example with database triggers. For example:
>>> Entry.objects.update(search_vector=SearchVector('body_text'))
>>> Entry.objects.filter(search_vector='cheese')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
Trigram similarity
Trigram similarity is another approach to searching. It compares strings by the groups of three consecutive characters, called trigrams, that they contain. This is useful when the data is not an exact match for the search term but is still similar to it, as with misspelled names. In addition to the trigram_similar and trigram_word_similar lookups, a few other expressions are available. To use them, the pg_trgm extension must be activated on PostgreSQL.
Syntax:
class TrigramSimilarity(expression, string, **extra)
It accepts a field name or expression, and a string or expression, and returns the trigram similarity between the two arguments.
>>> from django.contrib.postgres.search import TrigramSimilarity
>>> Author.objects.create(name='Katy Stevens')
>>> Author.objects.create(name='Stephen Keats')
>>> test = 'Katie Stephens'
>>> Author.objects.annotate(
... similarity=TrigramSimilarity('name', test),
... ).filter(similarity__gt=0.3).order_by('-similarity')
[<Author: Katy Stevens>, <Author: Stephen Keats>]
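To get a feel for where those similarity values come from, here is a pure-Python sketch of the idea, following the rules described in the pg_trgm documentation: lowercase the text, ignore non-alphanumeric characters, pad each word with two spaces in front and one behind, then divide the number of shared trigrams by the number of distinct trigrams in both strings. The function names are made up for illustration, the sketch only handles ASCII letters and digits, and the numbers it prints are not guaranteed to match PostgreSQL's in every case.

```python
import re


def trigrams(text):
    # Collect the set of padded three-character groups for each word.
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    # Shared trigrams divided by all distinct trigrams in either string.
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


test = "Katie Stephens"
print(round(similarity("Katy Stevens", test), 3))   # 0.4
print(round(similarity("Stephen Keats", test), 3))  # 0.381
```

Both names score above the 0.3 threshold used in the filter above, and 'Katy Stevens' scores slightly higher, which is consistent with the ordering of the results.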
Django 4.0 offers more search features than the ones covered here. We'll discuss those later. I hope you found this useful.