More info: Code snippets in the documentation reflect the latest client library and Weaviate Database version. Check the Release notes for specific versions.
If a snippet doesn't work or you have feedback, please open a GitHub issue.
# Python
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.bm25(
    query="food",
    limit=3
)
for o in response.objects:
    print(o.properties)

// JavaScript/TypeScript
const jeopardy = client.collections.use('JeopardyQuestion');
const result = await jeopardy.query.bm25('food', {
  limit: 3,
})
for (let object of result.objects) {
  console.log(JSON.stringify(object.properties, null, 2));
}

// C#
var jeopardy = client.Collections.Use("JeopardyQuestion");
var response = await jeopardy.Query.BM25(
    "food",
    limit: 3
);
foreach (var o in response.Objects)
{
    Console.WriteLine(JsonSerializer.Serialize(o.Properties));
}
# Python
from weaviate.classes.query import BM25Operator

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.bm25(
    query="Australian mammal cute",
    operator=BM25Operator.or_(minimum_match=1),  # Each result must include at least one of the tokens
    limit=3,
)
for o in response.objects:
    print(o.properties)
# Python
from weaviate.classes.query import BM25Operator

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.bm25(
    query="Australian mammal cute",
    operator=BM25Operator.and_(),  # Each result must include all tokens (e.g. "australian", "mammal", "cute")
    limit=3,
)
for o in response.objects:
    print(o.properties)
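The difference between the two operators can be illustrated without a running Weaviate instance. The following is a minimal sketch in plain Python, not Weaviate's implementation: the `tokenize` and `matches` helpers are hypothetical names introduced here, and the tokenizer is only a rough stand-in for word tokenization.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its alphanumeric tokens (illustrative only)."""
    return set(re.findall(r"[^\W_]+", text.lower()))


def matches(query: str, document: str, operator: str = "or", minimum_match: int = 1) -> bool:
    """Check whether a document satisfies the operator for the query tokens.

    Hypothetical helper: "and" requires every query token to be present,
    "or" requires at least `minimum_match` of them.
    """
    query_tokens = tokenize(query)
    hits = len(query_tokens & tokenize(document))
    if operator == "and":
        return hits == len(query_tokens)
    return hits >= minimum_match


query = "Australian mammal cute"
doc_one = "The koala is a cute Australian mammal"
doc_two = "The platypus is an egg-laying mammal"

print(matches(query, doc_one, operator="and"))                 # all three tokens present
print(matches(query, doc_two, operator="and"))                 # only "mammal" present
print(matches(query, doc_two, operator="or", minimum_match=1))
print(matches(query, doc_two, operator="or", minimum_match=2))
```

Raising `minimum_match` moves the `or` behavior closer to `and`: with three query tokens, `minimum_match=3` accepts the same documents as the `and` operator in this sketch.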
# Python
from weaviate.classes.query import MetadataQuery

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.bm25(
    query="food",
    return_metadata=MetadataQuery(score=True),
    limit=3
)
for o in response.objects:
    print(o.properties)
    print(o.metadata.score)

// JavaScript/TypeScript
const jeopardy = client.collections.use('JeopardyQuestion');
const result = await jeopardy.query.bm25('food', {
  returnMetadata: ['score'],
  limit: 3
})
for (let object of result.objects) {
  console.log(JSON.stringify(object.properties, null, 2));
  console.log(object.metadata?.score);
}
A keyword search can be directed to only search a subset of object properties. In this example, the BM25 search only uses the question property to produce the BM25F score.
# Python
from weaviate.classes.query import MetadataQuery

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.bm25(
    query="safety",
    query_properties=["question"],
    return_metadata=MetadataQuery(score=True),
    limit=3
)
for o in response.objects:
    print(o.properties)
    print(o.metadata.score)

// JavaScript/TypeScript
const jeopardy = client.collections.use('JeopardyQuestion');
const result = await jeopardy.query.bm25('safety', {
  queryProperties: ['question'],
  returnMetadata: ['score'],
  limit: 3
})
for (let object of result.objects) {
  console.log(JSON.stringify(object.properties, null, 2));
  console.log(object.metadata?.score);
}
You can weight how much each property affects the overall BM25F score. This example boosts the question property by a factor of 2, while the answer property keeps its default weight.
# Python
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.bm25(
    query="food",
    query_properties=["question^2", "answer"],
    limit=3
)
for o in response.objects:
    print(o.properties)

// JavaScript/TypeScript
const jeopardy = client.collections.use('JeopardyQuestion');
const result = await jeopardy.query.bm25('food', {
  queryProperties: ['question^2', 'answer'],
  returnMetadata: ['score'],
  limit: 3
})
for (let object of result.objects) {
  console.log(JSON.stringify(object.properties, null, 2));
  console.log(object.metadata?.score);
}

// C#
var jeopardy = client.Collections.Use("JeopardyQuestion");
var response = await jeopardy.Query.BM25(
    "food",
    searchFields: ["question^2", "answer"],
    limit: 3
);
foreach (var o in response.Objects)
{
    Console.WriteLine(JsonSerializer.Serialize(o.Properties));
}
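To make the `^` boost syntax concrete, here is a small sketch in plain Python. It is an illustration under stated assumptions, not the BM25F formula Weaviate uses: `parse_boost` and `weighted_term_count` are hypothetical helpers, and the "score" is just a weighted count of term occurrences per property.

```python
def parse_boost(spec: str) -> tuple[str, float]:
    """Split a 'name^weight' string into (name, weight); the weight defaults to 1.0."""
    name, sep, weight = spec.partition("^")
    return name, (float(weight) if sep else 1.0)


def weighted_term_count(term: str, obj: dict[str, str], query_properties: list[str]) -> float:
    """Sum occurrences of `term` across properties, scaled by each property's weight.

    Simplified stand-in for scoring: real BM25F also accounts for document
    length and how rare the term is across the collection.
    """
    total = 0.0
    for spec in query_properties:
        name, weight = parse_boost(spec)
        tokens = obj.get(name, "").lower().split()
        total += weight * tokens.count(term.lower())
    return total


obj = {"question": "This food is made by bees", "answer": "honey"}

print(parse_boost("question^2"))
print(weighted_term_count("food", obj, ["question^2", "answer"]))
print(weighted_term_count("food", obj, ["question", "answer"]))
```

In this sketch a match in the boosted `question` property contributes twice as much as it would without the `^2` suffix, which is the intuition behind property weighting.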
# Python
from weaviate.classes.config import Configure, Property, DataType, Tokenization

client.collections.create(
    "Article",
    vector_config=Configure.Vectors.text2vec_cohere(),
    properties=[
        Property(
            name="title",
            data_type=DataType.TEXT,
            vectorize_property_name=True,  # Use "title" as part of the value to vectorize
            tokenization=Tokenization.LOWERCASE,  # Use "lowercase" tokenization
            description="The title of the article.",  # Optional description
        ),
        Property(
            name="body",
            data_type=DataType.TEXT,
            skip_vectorization=True,  # Don't vectorize this property
            tokenization=Tokenization.WHITESPACE,  # Use "whitespace" tokenization
        ),
    ],
)

// Java
client.collections.create("Article", col -> col.properties(
    Property.text("title", p -> p.description("The title of the article.")
        .tokenization(Tokenization.LOWERCASE)
        .vectorizePropertyName(false)),
    Property.text("body", p -> p.skipVectorization(true)
        .tokenization(Tokenization.WHITESPACE))));

// Java (builder-style client API)
Property titleProperty = Property.builder()
    .name("title")
    .description("title of the article")
    .dataType(Arrays.asList(DataType.TEXT))
    .tokenization(Tokenization.LOWERCASE)
    .build();

Property bodyProperty = Property.builder()
    .name("body")
    .description("body of the article")
    .dataType(Arrays.asList(DataType.TEXT))
    .tokenization(Tokenization.WHITESPACE)
    .build();

// Add the defined properties to the class
WeaviateClass articleClass = WeaviateClass.builder()
    .className("Article")
    .description("Article Class Description...")
    .properties(Arrays.asList(titleProperty, bodyProperty))
    .build();

Result<Boolean> result = client.schema().classCreator()
    .withClass(articleClass)
    .run();
Tokenization and fuzzy matching
For fuzzy matching and typo tolerance, use trigram tokenization. See the fuzzy matching section below for details.
# Python
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.bm25(
    query="safety",
    limit=3,
    offset=1
)
for o in response.objects:
    print(o.properties)

// JavaScript/TypeScript
const jeopardy = client.collections.use('JeopardyQuestion');
const result = await jeopardy.query.bm25('safety', {
  limit: 3,
  offset: 1
})
for (let object of result.objects) {
  console.log(JSON.stringify(object.properties, null, 2));
}

// C#
var jeopardy = client.Collections.Use("JeopardyQuestion");
var response = await jeopardy.Query.BM25(
    "safety",
    limit: 3,
    offset: 1
);
foreach (var o in response.Objects)
{
    Console.WriteLine(JsonSerializer.Serialize(o.Properties));
}
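The effect of combining `limit` and `offset` can be pictured as slicing a ranked result list. The sketch below is plain Python with a hypothetical `paginate` helper and made-up result labels; it only illustrates the windowing, not how Weaviate retrieves results internally.

```python
def paginate(ranked_results: list[str], limit: int, offset: int = 0) -> list[str]:
    """Skip the first `offset` results, then return at most `limit` results."""
    return ranked_results[offset:offset + limit]


# Hypothetical results, already sorted from best to worst score
ranked = ["result_1", "result_2", "result_3", "result_4", "result_5"]

print(paginate(ranked, limit=3, offset=1))  # ['result_2', 'result_3', 'result_4']
print(paginate(ranked, limit=3, offset=4))  # ['result_5']
```

With `limit=3, offset=1`, as in the queries above, the top-ranked result is skipped and the next three are returned.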
# Python
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.bm25(
    query="safety",
    auto_limit=1
)
for o in response.objects:
    print(o.properties)

// JavaScript/TypeScript
const jeopardy = client.collections.use('JeopardyQuestion');
const result = await jeopardy.query.bm25('safety', {
  autoLimit: 1,
})
for (let object of result.objects) {
  console.log(JSON.stringify(object.properties, null, 2));
}

// C#
var jeopardy = client.Collections.Use("JeopardyQuestion");
var response = await jeopardy.Query.BM25(
    "safety",
    autoCut: 1
);
foreach (var o in response.Objects)
{
    Console.WriteLine(JsonSerializer.Serialize(o.Properties));
}
{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "_additional": {
            "score": "2.6768136"
          },
          "answer": "OSHA (Occupational Safety and Health Administration)",
          "question": "The government admin. was created in 1971 to ensure occupational health & safety standards"
        }
      ]
    }
  }
}
# Python
from weaviate.classes.query import GroupBy

jeopardy = client.collections.use("JeopardyQuestion")

# Grouping parameters
group_by = GroupBy(
    prop="round",  # group by this property
    objects_per_group=3,  # maximum objects per group
    number_of_groups=2,  # maximum number of groups
)

# Query
response = jeopardy.query.bm25(
    query="California",
    group_by=group_by
)
for grp_name, grp_content in response.groups.items():
    print(grp_name, grp_content.objects)

// JavaScript/TypeScript
const jeopardy = client.collections.use('JeopardyQuestion');

// Grouping parameters
const groupByProperties = {
  property: "round", // group by this property
  objectsPerGroup: 3, // maximum objects per group
  numberOfGroups: 2 // maximum number of groups
}

// Query
const response = await jeopardy.query.bm25("California", {
  groupBy: groupByProperties
})
for (let groupName in response.groups) {
  console.log(groupName)
  // Uncomment to view group objects
  // console.log(response.groups[groupName].objects)
}

// Java
CollectionHandle<Map<String, Object>> jeopardy = client.collections.use("JeopardyQuestion");
var response = jeopardy.query.bm25("California",
    q -> q, // No query options needed for this example
    GroupBy.property("round", // group by this property
        2, // maximum number of groups
        3 // maximum objects per group
    ));
response.groups().forEach((groupName, group) -> {
  System.out.println(group.name() + " " + group.objects());
});

// C#
var jeopardy = client.Collections.Use("JeopardyQuestion");
var response = await jeopardy.Query.BM25(
    "California",
    groupBy: new GroupByRequest
    {
        PropertyName = "round", // group by this property
        NumberOfGroups = 2, // maximum number of groups
        ObjectsPerGroup = 3, // maximum objects per group
    }
);
foreach (var group in response.Groups.Values)
{
    Console.WriteLine($"{group.Name}{JsonSerializer.Serialize(group.Objects)}");
}
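The two caps in the grouping parameters can be illustrated on a ranked list in plain Python. This is a sketch under assumptions, not Weaviate's grouping logic: the `group_results` helper and the sample data are hypothetical, and the choice to keep the groups that appear first in rank order is an assumption made for the example.

```python
def group_results(
    ranked: list[dict],
    prop: str,
    number_of_groups: int,
    objects_per_group: int,
) -> dict[str, list[dict]]:
    """Group ranked objects by `prop`, capping both the group count and the group size."""
    groups: dict[str, list[dict]] = {}
    for obj in ranked:
        key = obj[prop]
        if key not in groups:
            if len(groups) == number_of_groups:
                continue  # Assumption: groups beyond the cap are dropped
            groups[key] = []
        if len(groups[key]) < objects_per_group:
            groups[key].append(obj)
    return groups


# Hypothetical ranked results (best score first)
ranked = [
    {"round": "Jeopardy!", "id": 1},
    {"round": "Double Jeopardy!", "id": 2},
    {"round": "Jeopardy!", "id": 3},
    {"round": "Final Jeopardy!", "id": 4},
    {"round": "Jeopardy!", "id": 5},
    {"round": "Jeopardy!", "id": 6},
]

groups = group_results(ranked, prop="round", number_of_groups=2, objects_per_group=3)
for name, members in groups.items():
    print(name, [m["id"] for m in members])
```

Here the third distinct `round` value is ignored because of `number_of_groups=2`, and the fourth "Jeopardy!" object is ignored because of `objects_per_group=3`.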
# Python
from weaviate.classes.query import Filter

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.bm25(
    query="food",
    filters=Filter.by_property("round").equal("Double Jeopardy!"),
    return_properties=["answer", "question", "round"],  # return these properties
    limit=3
)
for o in response.objects:
    print(o.properties)

// C#
var jeopardy = client.Collections.Use("JeopardyQuestion");
var response = await jeopardy.Query.BM25(
    "food",
    filters: Filter.Property("round").Equal("Double Jeopardy!"),
    returnProperties: ["answer", "question", "round"], // return these properties
    limit: 3
);
foreach (var o in response.Objects)
{
    Console.WriteLine(JsonSerializer.Serialize(o.Properties));
}
{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "_additional": {
            "score": "3.0140665"
          },
          "answer": "food stores (supermarkets)",
          "question": "This type of retail store sells more shampoo & makeup than any other",
          "round": "Double Jeopardy!"
        },
        {
          "_additional": {
            "score": "1.9633813"
          },
          "answer": "honey",
          "question": "The primary source of this food is the Apis mellifera",
          "round": "Double Jeopardy!"
        },
        {
          "_additional": {
            "score": "1.6719631"
          },
          "answer": "pseudopods",
          "question": "Amoebas use temporary extensions called these to move or to surround & engulf food",
          "round": "Double Jeopardy!"
        }
      ]
    }
  }
}
Weaviate converts filter terms into tokens. The default tokenization is word. The word tokenizer keeps alphanumeric characters, lowercases them, and splits on whitespace. It converts a string like "Test_domain_weaviate" into "test", "domain", and "weaviate".
For details and additional tokenization methods, see Tokenization.
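As a rough illustration of word tokenization, the following plain-Python sketch reproduces the example above. It is an approximation, not Weaviate's tokenizer: `word_tokenize` is a hypothetical helper, and it assumes that every non-alphanumeric character (including the underscore) acts as a separator, which is what the "Test_domain_weaviate" example suggests.

```python
import re


def word_tokenize(text: str) -> list[str]:
    """Lowercase the text and return runs of alphanumeric characters.

    Assumption: any non-alphanumeric character, including "_", separates tokens.
    """
    return re.findall(r"[^\W_]+", text.lower())


print(word_tokenize("Test_domain_weaviate"))  # ['test', 'domain', 'weaviate']
print(word_tokenize("Double Jeopardy!"))      # ['double', 'jeopardy']
```

Because filter terms are tokenized the same way, a filter value such as "Double Jeopardy!" is compared token by token rather than as one exact string under this tokenization.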
You can enable fuzzy matching and typo tolerance in BM25 searches by using trigram tokenization. This technique breaks text into overlapping 3-character sequences, allowing BM25 to find matches even when there are spelling errors or variations.
This enables matching between similar but not identical strings because they share trigrams:
"Morgn" and "Morgan" share the trigrams "mor" and "org"
Set the tokenization method to trigram at the property level when creating your collection:
// Java
client.collections.create("Article", col -> col
    .vectorConfig(VectorConfig.text2vecTransformers())
    .properties(
        Property.text("title", p -> p.tokenization(Tokenization.TRIGRAM))));
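To see why trigram tokenization tolerates typos, it helps to list the trigrams of two similar strings and compare them. The sketch below is plain Python and only illustrates the idea: `trigrams` and `shared_trigrams` are hypothetical helpers, and the assumption that text is lowercased before splitting is made for the example.

```python
def trigrams(text: str) -> list[str]:
    """Return the overlapping 3-character sequences of the lowercased text."""
    lowered = text.lower()
    return [lowered[i:i + 3] for i in range(len(lowered) - 2)]


def shared_trigrams(a: str, b: str) -> set[str]:
    """Return the trigrams that both strings have in common."""
    return set(trigrams(a)) & set(trigrams(b))


print(trigrams("Morgn"))                            # ['mor', 'org', 'rgn']
print(trigrams("Morgan"))                           # ['mor', 'org', 'rga', 'gan']
print(sorted(shared_trigrams("Morgn", "Morgan")))   # ['mor', 'org']
```

A misspelled query still produces some of the same tokens as the correctly spelled text, so BM25 can score the object even though no whole word matches.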
Best practices
Use trigram tokenization selectively, only on fields that need fuzzy matching. It changes filtering behavior significantly, because text filters then operate on trigram tokens instead of whole words.
Keep exact-match fields with word or field tokenization for precision.