Full text search
The database functions in the django.contrib.postgres.search module ease the use of PostgreSQL's full text search engine.
For the examples in this document, we'll use the models defined in Making queries.
The search lookup
A common way to use full text search is to search a single term against a single column in the database. For example:
>>> Entry.objects.filter(body_text__search="Cheese")
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
This creates a to_tsvector in the database from the body_text field and a plainto_tsquery from the search term 'Cheese', both using the default database search configuration. The results are obtained by matching the query and the vector.
To use the search lookup, 'django.contrib.postgres' must be in your INSTALLED_APPS.
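As a minimal sketch, the relevant settings fragment might look like the following. The placeholder comment stands in for whatever other apps your project already lists.

```python
# settings.py (fragment): a sketch, the rest of the app list is assumed.
INSTALLED_APPS = [
    # ... your project's other apps go here ...
    "django.contrib.postgres",  # makes the search lookup available
]
```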
SearchVector
Searching against a single field is great but rather limiting. The Entry instances we're searching belong to a Blog, which has a tagline field.
To query against both fields, use a SearchVector:
>>> from django.contrib.postgres.search import SearchVector
>>> Entry.objects.annotate(
...     search=SearchVector("body_text", "blog__tagline"),
... ).filter(search="Cheese")
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
Arguments to SearchVector can be any Expression or the name of a field. Multiple arguments will be concatenated together using a space so that the search document includes them all.
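The concatenation step can be pictured with a small pure-Python sketch. The helper name build_search_document is hypothetical, and treating None as an empty string is an assumption of this sketch. The real work happens in the database, where to_tsvector also normalizes the words.

```python
def build_search_document(*field_values):
    """Join field values with a single space (illustration only).

    None is treated as an empty string here; that is an assumption of
    this sketch, not a statement about the SQL that gets generated.
    """
    return " ".join("" if value is None else str(value) for value in field_values)


# The body text and the blog tagline end up in one searchable document.
document = build_search_document("Cheese on Toast recipes", "All about food")
```

Because the values are joined into one document, a term that appears in either field can match.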
SearchVector objects can be combined together, allowing you to reuse them. For example:
>>> Entry.objects.annotate(
...     search=SearchVector("body_text") + SearchVector("blog__tagline"),
... ).filter(search="Cheese")
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
See Changing the search configuration and Weighting queries for an explanation of the config and weight parameters.
SearchQuery
SearchQuery translates the terms the user provides into a search query object that the database compares to a search vector. By default, all the words the user provides are passed through the stemming algorithms, and then it looks for matches for all of the resulting terms.
If search_type is 'plain', which is the default, the terms are treated as separate keywords. If search_type is 'phrase', the terms are treated as a single phrase. If search_type is 'raw', then you can provide a formatted search query with terms and operators. If search_type is 'websearch', then you can provide a formatted search query, similar to the one used by web search engines. 'websearch' requires PostgreSQL ≥ 11. Read PostgreSQL's Full Text Search docs to learn about differences and syntax.
Examples:
>>> from django.contrib.postgres.search import SearchQuery
>>> SearchQuery("red tomato") # two keywords
>>> SearchQuery("tomato red") # same results as above
>>> SearchQuery("red tomato", search_type="phrase") # a phrase
>>> SearchQuery("tomato red", search_type="phrase") # a different phrase
>>> SearchQuery("'tomato' & ('red' | 'green')", search_type="raw") # boolean operators
>>> SearchQuery(
...     "'tomato' ('red' OR 'green')", search_type="websearch"
... )  # websearch operators
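To relate the four search_type values to PostgreSQL, the following sketch maps each one to the query-building function it corresponds to, as far as I understand it. The dictionary and helper names are hypothetical and only meant as a lookup aid.

```python
# PostgreSQL function that corresponds to each search_type value.
SEARCH_TYPE_FUNCTIONS = {
    "plain": "plainto_tsquery",
    "phrase": "phraseto_tsquery",
    "raw": "to_tsquery",
    "websearch": "websearch_to_tsquery",
}


def tsquery_function(search_type="plain"):
    """Return the PostgreSQL function name for a search_type (sketch)."""
    try:
        return SEARCH_TYPE_FUNCTIONS[search_type]
    except KeyError:
        raise ValueError(f"Unknown search_type: {search_type!r}") from None
```

Looking up the function name this way makes it easier to find the matching section of the PostgreSQL documentation for each query syntax.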
SearchQuery terms can be combined logically to provide more flexibility:
>>> from django.contrib.postgres.search import SearchQuery
>>> SearchQuery("meat") & SearchQuery("cheese") # AND
>>> SearchQuery("meat") | SearchQuery("cheese") # OR
>>> ~SearchQuery("meat") # NOT
See Changing the search configuration for an explanation of the config parameter.
SearchRank
So far, we've returned the results for which any match between the vector and the query is possible. You will likely want to order the results by some sort of relevancy. PostgreSQL provides a ranking function which takes into account how often the query terms appear in the document, how close together the terms are in the document, and how important the part of the document is where they occur. The better the match, the higher the value of the rank. To order by relevancy:
>>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
>>> vector = SearchVector("body_text")
>>> query = SearchQuery("cheese")
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by("-rank")
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
See Weighting queries for an explanation of the weights parameter.
Set the cover_density parameter to True to enable the cover density ranking, which means that the proximity of matching query terms is taken into account.
Provide an integer to the normalization parameter to control rank normalization. This integer is a bit mask, so you can combine multiple behaviors:
>>> from django.db.models import Value
>>> Entry.objects.annotate(
...     rank=SearchRank(
...         vector,
...         query,
...         normalization=Value(2).bitor(Value(4)),
...     )
... )
The PostgreSQL documentation has more details about different rank normalization options.
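The bit mask can be spelled out with named flags. This is a sketch: the flag values follow the options listed in the PostgreSQL documentation, while the class and member names are hypothetical and chosen only for readability.

```python
from enum import IntFlag


class RankNormalization(IntFlag):
    """Rank normalization flags (names are hypothetical, values per PostgreSQL)."""

    NONE = 0                     # ignore document length (the default)
    LOG_LENGTH = 1               # divide by 1 + log(document length)
    LENGTH = 2                   # divide by document length
    MEAN_HARMONIC_DISTANCE = 4   # divide by mean harmonic distance between extents
    UNIQUE_WORDS = 8             # divide by number of unique words
    LOG_UNIQUE_WORDS = 16        # divide by 1 + log(number of unique words)
    RANK_PLUS_ONE = 32           # divide the rank by itself + 1


# Same combination as Value(2).bitor(Value(4)) in the example above.
mask = RankNormalization.LENGTH | RankNormalization.MEAN_HARMONIC_DISTANCE
combined = int(mask)
```

Here combined is 6, the integer that the bitwise OR of 2 and 4 produces.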
SearchHeadline
class SearchHeadline(expression, query, config=None, start_sel=None, stop_sel=None, max_words=None, min_words=None, short_word=None, highlight_all=None, max_fragments=None, fragment_delimiter=None)
Accepts a single text field or an expression, a query, a config, and a set of options. Returns highlighted search results.
Set the start_sel and stop_sel parameters to the string values to be used to wrap highlighted query terms in the document. PostgreSQL's defaults are <b> and </b>.
Provide integer values to the max_words and min_words parameters to determine the longest and shortest headlines. PostgreSQL's defaults are 35 and 15.
Provide an integer value to the short_word parameter to discard words of this length or less in each headline. PostgreSQL's default is 3.
Set the highlight_all parameter to True to use the whole document in place of a fragment and ignore max_words, min_words, and short_word parameters. That's disabled by default in PostgreSQL.
Provide a non-zero integer value to the max_fragments to set the maximum number of fragments to display. That's disabled by default in PostgreSQL.
Set the fragment_delimiter string parameter to configure the delimiter between fragments. PostgreSQL's default is " ... ".
The PostgreSQL documentation has more details on highlighting search results.
Usage example:
>>> from django.contrib.postgres.search import SearchHeadline, SearchQuery
>>> query = SearchQuery("red tomato")
>>> entry = Entry.objects.annotate(
...     headline=SearchHeadline(
...         "body_text",
...         query,
...         start_sel="<span>",
...         stop_sel="</span>",
...     ),
... ).get()
>>> print(entry.headline)
Sandwich with <span>tomato</span> and <span>red</span> cheese.
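To make the role of start_sel and stop_sel concrete, here is a deliberately simplified pure-Python imitation. The highlight helper is hypothetical; it matches whole words literally and ignores stemming, fragment selection, and the other options, all of which PostgreSQL handles in the real function.

```python
import re


def highlight(text, terms, start_sel="<b>", stop_sel="</b>"):
    """Wrap each literal occurrence of a term in start_sel/stop_sel (sketch)."""
    if not terms:
        return text
    # Match any of the terms as a whole word, ignoring case.
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(rf"\b({alternatives})\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{start_sel}{m.group(0)}{stop_sel}", text)


headline = highlight(
    "Sandwich with tomato and red cheese.",
    ["red", "tomato"],
    start_sel="<span>",
    stop_sel="</span>",
)
```

For this sentence the sketch yields the same string as the example output above, but only because no stemming is needed to match the terms.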
See Changing the search configuration for an explanation of the config parameter.
Changing the search configuration
You can specify the config attribute to a SearchVector and SearchQuery to use a different search configuration. This allows using different language parsers and dictionaries as defined by the database:
>>> from django.contrib.postgres.search import SearchQuery, SearchVector
>>> Entry.objects.annotate(
...     search=SearchVector("body_text", config="french"),
... ).filter(search=SearchQuery("œuf", config="french"))
[<Entry: Pain perdu>]
The value of config could also be stored in another column:
>>> from django.db.models import F
>>> Entry.objects.annotate(
...     search=SearchVector("body_text", config=F("blog__language")),
... ).filter(search=SearchQuery("œuf", config=F("blog__language")))
[<Entry: Pain perdu>]
Weighting queries
Every field may not have the same relevance in a query, so you can set weights of various vectors before you combine them:
>>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
>>> vector = SearchVector("body_text", weight="A") + SearchVector(
...     "blog__tagline", weight="B"
... )
>>> query = SearchQuery("cheese")
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by(
...     "rank"
... )
The weight should be one of the following letters: D, C, B, A. By default, these weights refer to the numbers 0.1, 0.2, 0.4, and 1.0, respectively. If you wish to weight them differently, pass a list of four floats to SearchRank as weights in the same order as above:
>>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
>>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by("-rank")
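Since the order of the four floats is easy to get backwards, the following sketch pairs each letter with its number. The helper name weights_by_letter is hypothetical; the order D, C, B, A and the defaults are the ones stated above.

```python
WEIGHT_LETTERS = ("D", "C", "B", "A")      # order expected by the weights list
DEFAULT_WEIGHTS = (0.1, 0.2, 0.4, 1.0)     # defaults for D, C, B, A


def weights_by_letter(weights=DEFAULT_WEIGHTS):
    """Pair each weight letter with its numeric value (sketch)."""
    if len(weights) != len(WEIGHT_LETTERS):
        raise ValueError("Expected exactly four weights, ordered D, C, B, A.")
    return dict(zip(WEIGHT_LETTERS, weights))


# The custom list from the example above: "A" now counts 0.8 and "D" 0.2.
custom = weights_by_letter([0.2, 0.4, 0.6, 0.8])
```

Reading the list this way shows that the last element, not the first, applies to vectors weighted "A".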
Performance
Special database configuration isn't necessary to use any of these functions. However, if you're searching more than a few hundred records, you're likely to run into performance problems. Full text search is a more intensive process than comparing the size of an integer, for example.
In the event that all the fields you're querying on are contained within one particular model, you can create a functional GIN or GiST index which matches the search vector you wish to use. For example:
GinIndex(
    SearchVector("body_text", "headline", config="english"),
    name="search_vector_idx",
)
The PostgreSQL documentation has details on creating indexes for full text search.
SearchVectorField
If this approach becomes too slow, you can add a SearchVectorField to your model. You'll need to keep it populated with triggers, for example, as described in the PostgreSQL documentation. You can then query the field as if it were an annotated SearchVector:
>>> Entry.objects.update(search_vector=SearchVector("body_text"))
>>> Entry.objects.filter(search_vector="cheese")
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
Trigram similarity
Another approach to searching is trigram similarity. A trigram is a group of three consecutive characters. In addition to the trigram_similar, trigram_word_similar, and trigram_strict_word_similar lookups, you can use a couple of other expressions.
To use them, you need to activate the pg_trgm extension on PostgreSQL. You can install it using the TrigramExtension migration operation.
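To give a feel for what a trigram comparison measures, here is a pure-Python approximation. It follows the behavior described in the pg_trgm documentation (lowercasing, alphanumeric words only, each word padded with two leading spaces and one trailing space), but the function names are hypothetical and details such as locale handling are not reproduced.

```python
import re


def trigrams(text):
    """Return the set of trigrams for a string, pg_trgm style (approximation)."""
    result = set()
    # Only alphanumeric runs count as words; everything else separates them.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    first, second = trigrams(a), trigrams(b)
    if not first or not second:
        return 0.0
    return len(first & second) / len(first | second)
```

For instance, trigrams("cat") gives the four trigrams "  c", " ca", "cat", and "at ", and two strings that share most of their trigrams score close to 1.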
TrigramSimilarity
Accepts a field name or expression, and a string or expression. Returns the trigram similarity between the two arguments.
Usage example:
>>> from django.contrib.postgres.search import TrigramSimilarity
>>> Author.objects.create(name="Katy Stevens")
>>> Author.objects.create(name="Stephen Keats")
>>> test = "Katie Stephens"
>>> Author.objects.annotate(
...     similarity=TrigramSimilarity("name", test),
... ).filter(
...     similarity__gt=0.3
... ).order_by("-similarity")
[<Author: Katy Stevens>, <Author: Stephen Keats>]
TrigramWordSimilarity
Accepts a string or expression, and a field name or expression. Returns the trigram word similarity between the two arguments.
Usage example:
>>> from django.contrib.postgres.search import TrigramWordSimilarity
>>> Author.objects.create(name="Katy Stevens")
>>> Author.objects.create(name="Stephen Keats")
>>> test = "Kat"
>>> Author.objects.annotate(
...     similarity=TrigramWordSimilarity(test, "name"),
... ).filter(
...     similarity__gt=0.3
... ).order_by("-similarity")
[<Author: Katy Stevens>]
TrigramStrictWordSimilarity
Accepts a string or expression, and a field name or expression. Returns the trigram strict word similarity between the two arguments. Similar to TrigramWordSimilarity(), except that it forces extent boundaries to match word boundaries.
TrigramDistance
Accepts a field name or expression, and a string or expression. Returns the trigram distance between the two arguments.
Usage example:
>>> from django.contrib.postgres.search import TrigramDistance
>>> Author.objects.create(name="Katy Stevens")
>>> Author.objects.create(name="Stephen Keats")
>>> test = "Katie Stephens"
>>> Author.objects.annotate(
...     distance=TrigramDistance("name", test),
... ).filter(
...     distance__lte=0.7
... ).order_by("distance")
[<Author: Katy Stevens>, <Author: Stephen Keats>]
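In pg_trgm the distance is one minus the similarity, so the threshold used above can be read either way. The helper below is a hypothetical sketch of that relationship, not part of Django or PostgreSQL.

```python
def trigram_distance(similarity):
    """Convert a trigram similarity in [0, 1] to a distance (sketch)."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    return 1.0 - similarity


# A similarity of 0.3 corresponds to a distance of 0.7, the cutoff used above.
cutoff = trigram_distance(0.3)
```

Identical strings have a distance of 0, and ordering by ascending distance puts the closest matches first.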
TrigramWordDistance
Accepts a string or expression, and a field name or expression. Returns the trigram word distance between the two arguments.
Usage example:
>>> from django.contrib.postgres.search import TrigramWordDistance
>>> Author.objects.create(name="Katy Stevens")
>>> Author.objects.create(name="Stephen Keats")
>>> test = "Kat"
>>> Author.objects.annotate(
...     distance=TrigramWordDistance(test, "name"),
... ).filter(
...     distance__lte=0.7
... ).order_by("distance")
[<Author: Katy Stevens>]
TrigramStrictWordDistance
Accepts a string or expression, and a field name or expression. Returns the trigram strict word distance between the two arguments.