Part 2 of Full Text Search with MongoDB, MySQL and SQLite
This article covers full text search in 3 databases – MongoDB, MySQL and SQLite. In previous article of Full Text Search with MongoDB, MySQL and SQLite – Part 1, loading data into tables and creating full text indexes.
Our search requirements are:
- How many people live in Mapletown
- Of those living in Mapletown, how many are in California
- How many people live in Mapletown (city) or the state of California
- How many folks live in city starting with Maple and the state of California
- How many 27 year old people live in Mapletown in California
- Switch this a bit – How many people live 27-30 year old people live in Mapletown in California
- How many people live 27-30 year old people live in Mapletown in California whose first or last name is Jerry
Server is running Ubuntu 16.04 with 1GB memory, 1 core i5 2.7GHz and around 30GB disk space. MySQL and MongoDB are running with default configuration.
Let’s begin searching!
1. People in Mapletown – Matching a single full text search keyword
Let’s find how many people live in Mapletown.
SQLite version – Matching a single word is FAST! Notice the syntax where <tablename> match '<keyword>'
. This syntax is a little unique.
sqlite> select count(*) from test where test match 'mapletown'; 2204 Run Time: real 0.048 user 0.008000 sys 0.000000
MySQL version – Matching a single word is FAST here as well! Notice that the syntax is easier to understand with where match(<fts field>) against ('<keyword>')
. MySQL has 2 modes of comparison: in natural language mode
and in boolean mode
.
mysql> select count(*) from test where match(raw_json) against('mapletown' in boolean mode); +----------+ | count(*) | +----------+ | 2204 | +----------+ 1 row in set (0.03 sec)
MongoDB version – Matching a single word is pretty slow in comparison with SQLite and MySQL
> db.test.count( {$text: {$search: "mapletown"}} ) 2204 > d = new Date(); db.test.count( {$text: {$search: "mapletown"}} ); print(new Date - d + ' ms') 31 ms
2. People in Mapletown, CA – Matching two full text search keywords with AND
Let’s find how many people live in Mapletown in California!
SQLite version – Searching for both words is also FAST! Notice that there are two ways to write the same query. One with capitalized AND and one without any reserved word. By default, SQLite will use AND.
sqlite> select count(*) from test where test match 'mapletown AND california' 38 Run Time: real 0.037 user 0.000000 sys 0.008000 -- alternate way of writing sqlite> select count(*) from test where test match 'mapletown california'
MySQL version – Notice the operator +
before mapletown and california. That sign indicates an AND. We need records that contain both mapletown and california – that’s the indication MySQL gets with +
mysql> select count(*) from test where match(raw_json) against('+mapletown +california' in boolean mode); +----------+ | count(*) | +----------+ | 38 | +----------+ 1 row in set (0.15 sec)
MongoDB version – When both mapletown and california are quoted, an AND search is conducted
> db.test.count( {$text: {$search: "\"mapletown\" \"california\""}} ) 38 > d = new Date(); db.test.count( {$text: {$search: "\"mapletown\" \"california\""}} ); print(new Date - d + ' ms') 2658 ms
3. People in Mapletown or CA – Matching two full text search keywords with OR
Let’s find how many people live in Mapletown city or in the state of California.
SQLite version – Searching for either mapletown or california is ridiculously FAST. Notice how quickly results are returned.
sqlite> select count(*) from test where test match 'mapletown OR california'; 38441 Run Time: real 0.005 user 0.004000 sys 0.004000
MySQL version – When no operator is provided, an OR search is performed. MySQL is also very fast at retrieving data from full text index.
mysql> select count(*) from test where match(raw_json) against('mapletown california' in boolean mode); +----------+ | count(*) | +----------+ | 38441 | +----------+ 1 row in set (0.15 sec)
MongoDB version – In comparison with SQLite and MySQL, MongoDB is really slow. As far as an application is concerned, MongoDB can perform satisfactorily.
> db.test.count( {$text: {$search: "mapletown california"}} ) 38441 > d = new Date(); db.test.count( {$text: {$search: "mapletown california"}} ); print(new Date - d + ' ms') 2484 ms
4. People in CA with city starting with Maple – Matching a partial full text search keyword with AND
Let’s find how many people live in a city starting with Maple in California!
SQLite version – Blazing fast search with partial match continues with SQLite. Notice the wildcard *
behind the first keyword maple
.
sqlite> select count(*) from test where test match 'maple* AND california'; 138 Run Time: real 0.027 user 0.008000 sys 0.000000
MySQL version – MySQL is also blazing fast. Like SQLite, MySQL also uses *
wildcard.
mysql> select count(*) from test where match(raw_json) against('+maple* +california' in boolean mode); +----------+ | count(*) | +----------+ | 138 | +----------+ 1 row in set (0.08 sec)
MongoDB version – Comparatively, much slower! Not only is the syntax usually convoluted, results are also slower. Notice the lack of wildcard. By default, Mongo is doing partial matches.
> db.test.count( {$text: {$search: "\"maple\" \"california\""}} ) 138 > d = new Date(); db.test.count( {$text: {$search: "\"maple\" \"california\""}} ); print(new Date - d + ' ms') 3093 ms
5. 27 year olds in Mapletown, CA – Matching multiple fields and keywords in full text search
Let’s find how many 27 year olds live in Mapletown in California!
SQLite version – Blazing fast results. Notice age:27
provided in the keyword. That tells SQLite to search for 27 in age field. Mapletown and California can be searched in all fields.
sqlite> select count(*) from test where test match 'age:27 mapletown california'; 1 Run Time: real 0.108 user 0.008000 sys 0.020000
MySQL version – Notice how MySQL allows for full text search and regular search. If age was indexed, we could have hinted MySQL to use index on age. SQLite won’t be able do this kind of a match, but its own FTS4 match is insanely fast anyway.
mysql> select count(*) from test where match(raw_json) against('+mapletown +california' in boolean mode) and age = 27; +----------+ | count(*) | +----------+ | 1 | +----------+ 1 row in set (0.07 sec)
MongoDB version – As queries are becoming somewhat complex, MongoDB is taking longer to extract results.
> db.test.count( {$text: {$search: "\"mapletown\" \"california\" \"27\""}} ) 1 > d = new Date(); db.test.count( {$text: {$search: "\"mapletown\" \"california\" \"27\""}} ); print(new Date() - d + ' ms') 3388 ms
This same query written differently is much faster as we are searching for age of 27 in a different field. Age field is not indexed, yet the speed is better than the above version. The speed is still nowhere close to SQLite and MySQL.
> db.test.count( {$and: [ {$text: {$search: "\"mapletown\" \"california\""}}, {age: 27} ]} ) 1 > d = new Date(); db.test.count( {$and: [ {$text: {$search: "\"mapletown\" \"california\""}}, {age: 27} ]} ); print(new Date() - d + ' ms') 160 ms
6. 27-30 year olds in Mapletown, CA – Matching multiple fields, ranges and keywords in full text search
Let’s find how many 27-30 year olds live in Mapletown in California!
SQLite version – Speed is not impacted at all. However, query is getting a little longer and un-pretty.
sqlite> select count(*) from test where test match '(age:27 OR age:28 OR age:29 OR age:30) mapletown california'; 5 Run Time: real 0.231 user 0.056000 sys 0.032000
MySQL version – Notice that the query is still fast and syntax is still quite familiar.
mysql> select count(*) from test where match(raw_json) against('+mapletown +california' in boolean mode) and age between 27 and 30; +----------+ | count(*) | +----------+ | 5 | +----------+ 1 row in set (0.09 sec)
MongoDB version – Look at the query now. This is looking like GraphQL and becoming difficult to type. Also notice that MongoDB is looking for age in its own separate column, which is not indexed – just like we have in MySQL. Query performance is above average in comparison with previous queries.
db.test.count( {$and: [ {$text: {$search: "\"mapletown\" \"california\""}}, {age: {$gt:26, $lt:31}} ]} ) 5 > d = new Date(); db.test.count( {$and: [ {$text: {$search: "\"mapletown\" \"california\""}}, {age: {$gt:26, $lt:31}} ]} ); print(new Date() - d + ' ms') 1636 ms
7. 27-30 year old named Jerry in Mapletown, CA – Matching multiple fields, ranges, exact word and keywords in full text search
Let’s find how many 27-30 year old people named Jerry live in Mapletown in California!
SQLite version – Speed is not impacted at all. However, query is getting a little longer and un-pretty.
sqlite> select count(*) from test where test match '(age:27 OR age:28 OR age:29 OR age:30) AND firstname:jerry AND mapletown AND california'; 1 Run Time: real 0.065 user 0.048000 sys 0.004000
MySQL version – Notice that the query is still fast and syntax is still quite familiar.
mysql> select count(*) from test where match(raw_json) against('+mapletown +california' in boolean mode) and age between 27 and 30 and firstname = 'jerry'; +----------+ | count(*) | +----------+ | 1 | +----------+ 1 row in set (0.10 sec)
MongoDB version – Note the first version where jerry is lower case and we find no match. The next query is the right case and we get a match, but it takes 1.6s to get the result.
> db.test.count( {$and: [ {$text: {$search: "\"mapletown\" \"california\""}}, {age: {$gt:26, $lt:31}}, {firstname:"jerry"} ]} ) 0 > db.test.count( {$and:[ {$text: {$search: "\"mapletown\" \"california\""}}, {age: {$gt:26, $lt:31}}, {firstname:"Jerry"} ]} ) 1 > d = new Date(); db.test.count( {$and: [ {$text: {$search: "\"mapletown\" \"california\""}}, {age: {$gt:26, $lt:31}}, {firstname:"Jerry"} ]} ); print(new Date() - d + ' ms') 1608 ms
Summary
There are some observations from this unscientific, quick study:
Property | SQLite | MySQL | MongoDB |
All columns FTS indexed by default | Yes | No | No |
Speed of all searches in the article | Blazing fast | Blazing fast | Reasonable |
Default configuration | Yes | Yes | Yes |
Service needed | No | Yes, mysqld | Yes, mongod |
Immediate FTS upon insert | Yes | No, but so fast it is barely noticeable | No, but so fast it is barely noticeable |
Can FTS table have other regular indexes | No, whole table is FTS indexed | Yes | Yes |
Speed of creating full text index on all columns | Fast | Slow | Slow |
Stop words | Not available. Create your own C tokenizer | Yes | Yes |
Inflection - search for feet and find foot too | No | Yes, with expanded query option | Don't know enough |
Defaults to AND when 2 or more tokens | Yes | No | No |
Keywords for OR, AND, NOT | AND default, OR works. For NOT - subquery | Easy with Boolean Search | |
Wildcard support | Yes, some | Yes, some | Don't know enough |
What would I use? That choice depends on a lot of factors. However, the easy answer is: I’ll use SQLite for read-heavy queries where a single writer is sufficient. Areas where multiple processes are writing to the database, I’ll use MySQL or PostgreSQL. PostgreSQL also has full text search capabilities; I’ll cover that some other time.