Part 1 of Full Text Search with MongoDB, MySQL and SQLite
This article covers the installation of the three databases: MongoDB, MySQL and SQLite. Following that, we will cover how to load data and create the full text indexes used for full text search. Full Text Search with MongoDB, MySQL and SQLite – Part 2 will cover full text searching.
Creating Full Text Search (FTS)
MongoDB, MySQL, SQLite, PostgreSQL, SQL Server and Oracle all offer many capabilities, including Full Text Search. This article presents a simple comparison of full text search in MongoDB, MySQL and SQLite on Ubuntu 16.04 LTS.
Collect data for FTS
Full Text Search and full text indexing require some data to begin. Let’s see how we can collect sample data. There are numerous places we can use to collect data. Some of them are:
- https://www.mockaroo.com/
- https://www.generatedata.com/
- http://www.freedatagenerator.com/csv-data-generator
- https://www.json-generator.com/
I generated some JSON to test FTS, taking motivation from the https://www.json-generator.com/ site. My JSON had the following fields:
- _id – this was a GUID, especially because I wanted to use this with MongoDB and _id can serve as the primary key
- idx – this was a number (kind of auto incremented, but not quite)
- balance – this was a decimal value such as 10.5
- age – this was an integer
- eyeColor – this was just one of 10 colors I chose to use
- firstname
- lastname
- company
- phone
- address – completely bogus and made-up address
- about – bogus text
You can choose to directly generate JSON from the sites above. However, it might get cumbersome after a couple of copy-pastes. If you are a developer, you can whip up some simple code to build yourself a dataset.
I created a couple million JSON records in the above format and was ready to import.
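If you go the code route, a throwaway generator along these lines will do. This is a minimal Python sketch; the value ranges, name pools and filler strings here are made up for illustration, only the field names come from the list above.

```python
import json
import random
import uuid

COLORS = ["blue", "brown", "green", "gray", "hazel",
          "amber", "black", "red", "violet", "teal"]

def make_record(idx):
    """Build one fake record matching the field list above."""
    return {
        "_id": str(uuid.uuid4()),          # GUID, doubles as MongoDB's primary key
        "idx": idx,                        # roughly auto-incremented number
        "balance": round(random.uniform(0, 10000), 2),
        "age": random.randint(18, 90),
        "eyeColor": random.choice(COLORS),
        "firstname": random.choice(["Ada", "Linus", "Grace", "Ken"]),
        "lastname": random.choice(["Smith", "Jones", "Patel", "Kim"]),
        "company": "ACME Corp",
        "phone": "+1 (555) 010-%04d" % random.randint(0, 9999),
        "address": "%d Imaginary Lane" % random.randint(1, 999),
        "about": "completely bogus filler text",
    }

if __name__ == "__main__":
    # Emit a handful of records; crank the range up for millions.
    records = [make_record(i) for i in range(5)]
    print(json.dumps(records[0]))
```

Loop it a couple million times, write one record per line, and split the output across a dozen files to mirror the dataset used below.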
Installing SQLite
Installing SQLite is straightforward. We’ll update and upgrade the system before we begin. Then, we’ll install SQLite. At the time of this writing, sqlite3 gets installed with the commands below.
sudo apt-get update
time sudo apt-get upgrade
sudo apt-get install sqlite3 -y
Note that time is used in the middle command just to measure how long the upgrade took. You can choose to just type sudo apt-get upgrade.
Create SQLite DB and load data
Let’s create a SQLite DB and load data into it. Creating a SQLite database is actually super easy.
sqlite3 fts.db
create virtual table test using fts4(
  idx, id, firstname, lastname, age, phone, company,
  eyecolor, address, about, balance, raw_json
);
Awesome. So, we’ve just created ourselves a table capable of full text indexing using SQLite’s FTS4 engine. Notice that we didn’t give any datatypes for our columns. That’s completely fine. In the simplest of words, SQLite treats all FTS columns as text. To perform full text search, SQLite has to break the data into words anyway.
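As a quick illustration of that everything-is-text behavior, here is a minimal sketch using Python’s bundled sqlite3 module (most builds ship with FTS4 compiled in; the table name and trimmed-down column list are just for this demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
# No datatypes anywhere: FTS4 tokenizes every column as text.
conn.execute("CREATE VIRTUAL TABLE demo USING fts4(firstname, lastname, about)")
conn.execute("INSERT INTO demo VALUES ('Ada', 'Lovelace', 'wrote the first program')")
conn.execute("INSERT INTO demo VALUES ('Alan', 'Turing', 'broke ciphers at Bletchley')")

# MATCH tokenizes the query and searches every indexed column.
rows = conn.execute("SELECT firstname FROM demo WHERE demo MATCH 'ciphers'").fetchall()
print(rows)  # [('Alan',)]
```

The MATCH operator is what we will lean on for searching in Part 2.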
Then we’ll load the data, which is relatively easy. I had 12 files named collected_data1.txt.csv, collected_data2.txt.csv and so on. So, my script looked like this:
cat import_sqlite.sh
#!/bin/bash
for i in {1..12}; do
  echo `date` Importing ${i} -----------------
  sqlite3 fts.db '.separator "\t"' '.import /home/me/data/collected_data'${i}'.txt.csv test'
done
date
2.13 million records were loaded into SQLite in 3 minutes and 20 seconds, with full text indexing included.
Installing Percona MySQL 5.7
MySQL comes in multiple flavors/forks. I chose the MySQL distribution from Percona, which adds some good tooling and performance enhancement/monitoring to MySQL. I like that. Installing Percona is straightforward.
wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo apt-get update
sudo apt-get install percona-server-server-5.7 -y
Enter a password for the root user when asked.
Create MySQL DB
Let’s log on to MySQL as user root with the password you entered during installation. Then, let’s create a database/schema called fts. Thereafter, we’ll create a table called test.
mysql -u root -p
<Type your password when asked>
create database fts;
use fts;
CREATE TABLE test (
  rowid int(11) NOT NULL AUTO_INCREMENT,
  idx int(11) DEFAULT NULL,
  id varchar(100) DEFAULT NULL,
  firstname varchar(100) DEFAULT NULL,
  lastname varchar(100) DEFAULT NULL,
  age int(11) DEFAULT NULL,
  phone varchar(100) DEFAULT NULL,
  company varchar(100) DEFAULT NULL,
  eyecolor varchar(100) DEFAULT NULL,
  address varchar(100) DEFAULT NULL,
  about text,
  balance decimal(10,2) DEFAULT NULL,
  raw_json varchar(2000),
  raw_json_real json, -- this will be used to JSON'ify raw json
  PRIMARY KEY (rowid)
);
Excellent. So, now we have our database/schema created, and our table as well. Notice the raw_json column that will hold the raw JSON we have in our text files. Also notice the raw_json_real column. We will keep it NULL to begin. Then, after populating the table, we will update this field. After that, we will create a full text index for full text search. Let’s begin.
Password-less login to MySQL
First, in order to run the mysql command line utility without being prompted for a password, I created a file like so:
cat ~/.my.cnf
[client]
user=root
password=<password I gave to MySQL during installation>
database=fts
Load data into MySQL
I created a script to load data into MySQL. Here’s the script:
cat import_mysql.sh
#!/bin/bash
for i in {1..12}; do
  echo `date` Importing ${i} -----------------
  mysql -e "load data local infile '/home/me/data/collected_data${i}.txt.csv' into table test fields terminated by '\t' ignore 1 lines (idx, id, firstname, lastname, age, phone, company, eyecolor, address, about, balance, raw_json);"
done
date
2.13 million records were loaded into MySQL in 1 minute and 14 seconds, without full text indexing. Let’s make a full text index.
Create Full Text Index in MySQL
The full text index is going to be created on just the raw_json field. Unlike SQLite, which creates full text search capability over the entire table, MySQL can create both regular and full text indexes, the latter of which are used for full text search.
mysql -e "create fulltext index fts_test_raw_json on test (raw_json);"
MySQL’s journaled InnoDB engine took 9 minutes and 14 seconds to create a full text index. In comparison, MySQL’s MyISAM engine took 6 minutes and 47 seconds to create a full text index. In this exercise, we will continue to use InnoDB.
JSON-ify raw json data in MySQL
Raw JSON from the text files was imported as-is into the raw_json column. Let’s store this data as true JSON in the raw_json_real column.
update test
set raw_json_real = cast(
  replace(
    replace(raw_json, '"', ''),
    '''', '"'
  ) as json
);
The above statement will set the raw_json_real field to a real JSON object.
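To see why that double replace works, here is the same transformation sketched in Python; the sample raw_json value is hypothetical, but it has the shape the generated files use (single-quoted pseudo-JSON):

```python
import json

# Hypothetical raw_json value as it sits in the text files: pseudo-JSON
# quoted with single quotes.
raw = "{'firstname': 'Ada', 'age': 36}"

# Same two steps as the SQL above: strip stray double quotes, then turn
# single quotes into double quotes so the string parses as real JSON.
fixed = raw.replace('"', '').replace("'", '"')
obj = json.loads(fixed)
print(obj["firstname"])  # Ada
```

Note that this trick, whether in SQL or Python, breaks if a value contains a literal apostrophe, so it is only safe for well-behaved generated data like ours.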
Installing MongoDB
Installing MongoDB is really easy as well. I executed the following statements to install MongoDB.
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
sudo apt-get update
sudo apt-get install mongodb-org -y
# start it
sudo systemctl start mongod.service
That’s all we need to do to install MongoDB and start its service.
Preparing data for import
My text dataset contained raw_json as a field. I wanted to extract just that field so that it is easy to load into MongoDB. So, I wrote a script for that.
cat create_mongo_import_file.sh
#!/bin/bash
rm /home/me/data/mongofile.txt.csv
for i in {1..12}; do
  awk -F'\t' '{print $12}' /home/me/data/collected_data${i}.txt.csv | tr "'" '"' >> /home/me/data/mongofile.txt.csv
done
# remove lines containing raw_json since that's the header line from each of the 12 files
grep -v raw_json /home/me/data/mongofile.txt.csv > temp && mv temp /home/me/data/mongofile.txt.csv
Let’s run this file. In 1 minute, the raw_json field from all the text files is written to mongofile.txt.csv. This is the file we can import into MongoDB.
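For the curious, the per-line extraction the awk/tr pipeline performs can be sketched in pure Python as well (the sample row below is made up; the function name is just for this illustration):

```python
# The raw_json column is the 12th tab-separated field; awk's $12 is
# Python's index 11. tr "'" '"' becomes str.replace.
def extract_raw_json(line):
    fields = line.rstrip("\n").split("\t")
    return fields[11].replace("'", '"')

# A made-up row: eleven filler fields followed by the raw_json field.
row = "\t".join(["x"] * 11 + ["{'age': 30}"])
print(extract_raw_json(row))  # {"age": 30}
```

The shell pipeline is the better tool for a 12-file batch job, but the Python version makes the field arithmetic explicit.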
Load data into MongoDB
Let’s load data from mongofile.txt.csv. This file was around 1.2 GB.
mongoimport --db fts --collection test --drop --file /home/me/data/mongofile.txt.csv
2.13 million records were loaded into MongoDB in 1 minute and 43 seconds, without full text indexing. Let’s make a full text index.
Create Full Text Index in MongoDB
In Mongo, we will create a compound index: a text index on firstname, lastname and address, plus a regular ascending index on age. That’s it. Let’s get started with building it.
mongo fts
db.test.createIndex({"firstname":"text", "lastname":"text", "address":"text", "age":1})
Easy. Now, in MongoDB, we have a full text index. It took MongoDB 6 minutes and 20 seconds to create.
This concludes the installation and loading of data into SQLite, MySQL and MongoDB. In the next article, we will do light full text searching using the full text indexes we built on a dataset of roughly 2 million records.
Summary
Let’s review timing of our work:
| Property | SQLite | MySQL | MongoDB |
|---|---|---|---|
| Installation | Easiest | Easy | Easy |
| Data load time | 3 min 20 s | 1 min 14 s | 1 min 43 s |
| Full text indexing | Included in load time | 9 min 14 s | 6 min 20 s |
| Total time | 3 min 20 s | 10 min 28 s | 8 min 3 s |
| Indexed what? | All columns | raw_json only | firstname, lastname, address, age only |
| JSON-ify data | No support | 3 min 30 s | Already JSON |
| Prepping import-friendly data | None needed | < 10 s | < 1 min |
See Full Text Search with MongoDB, MySQL and SQLite – Part 2.