Full Text Search with MongoDB, MySQL and SQLite – Part 1

This article covers the installation of the three databases: MongoDB, MySQL and SQLite. Following that, we will cover how to load data and create the full text indexes used for full text search. Full Text Search with MongoDB, MySQL and SQLite – Part 2 will cover full text searching.

Creating Full Text Search (FTS)

MongoDB, MySQL, SQLite, PostgreSQL, SQL Server and Oracle all have many capabilities, including Full Text Search. This article describes a simple comparison of full text search in MongoDB, MySQL and SQLite on Ubuntu 16.04 LTS.

Collect data for FTS

Full text search and full text indexing require some data to begin, and there are numerous places to collect sample data from.

I generated some JSON to test FTS, taking inspiration from the https://www.json-generator.com/ site. My JSON had the following fields:

  1. _id – this was a GUID, chosen because I wanted to use the data with MongoDB, where _id can serve as the primary key
  2. idx – this was a number (kind of auto incremented, but not quite)
  3. balance – this was a decimal value such as 10.5
  4. age – this was an integer
  5. eyeColor – this was just one of 10 colors I chose to use
  6. firstname
  7. lastname
  8. company
  9. phone
  10. address – a completely bogus, made-up address
  11. about – bogus text

You can choose to generate JSON directly from the site above. However, it might get cumbersome after a couple of copy-pastes. If you are a developer, you can whip up a simple script to build yourself a dataset, along the lines of the sketch below.
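Here is a minimal, hypothetical sketch of such a generator in bash. The field values (names, phone, company, address, about) are deliberately simplistic placeholders, uuidgen is assumed to be installed, and a real generator in a faster language would finish much sooner:

#!/bin/bash

# generate 12 tab-separated files matching the field list above;
# 12 x 177,500 rows comes to roughly 2.13 million records
colors=(blue brown green gray amber hazel red violet black white)

for f in {1..12}; do
  out="collected_data${f}.txt.csv"
  # header line; the import steps below skip or strip it
  printf 'idx\tid\tfirstname\tlastname\tage\tphone\tcompany\teyecolor\taddress\tabout\tbalance\traw_json\n' > "$out"
  for i in $(seq 1 177500); do
    idx=$(( (f - 1) * 177500 + i ))
    id=$(uuidgen)                      # the GUID used as _id in MongoDB
    age=$(( RANDOM % 60 + 18 ))
    color=${colors[$(( RANDOM % 10 ))]}
    balance="$(( RANDOM % 1000 )).$(( RANDOM % 100 ))"
    # raw_json uses single quotes as string delimiters on purpose;
    # the MySQL JSON-ify step later converts them to double quotes
    json="{'idx': ${idx}, '_id': '${id}', 'age': ${age}, 'eyeColor': '${color}', 'balance': ${balance}}"
    printf '%s\t%s\tJohn\tDoe\t%s\t555-0100\tAcme\t%s\t1 Main St\tlorem ipsum\t%s\t%s\n' \
      "$idx" "$id" "$age" "$color" "$balance" "$json" >> "$out"
  done
done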

I created a couple million JSON records in the above format and was ready to import.

Installing SQLite

Installing SQLite is straightforward. We'll update and upgrade the system before we begin. Then, we'll install SQLite. At the time of this writing, the commands below install sqlite3.

sudo apt-get update
time sudo apt-get upgrade
sudo apt-get install sqlite3 -y

Note that time is used in the middle command just to measure how long the upgrade takes. You can choose to just type sudo apt-get upgrade.

Create SQLite DB and load data

Let’s create a SQLite database and load data into it. Creating a SQLite database is super easy.

sqlite3 fts.db

create virtual table test using fts4(
  idx,   id,      firstname, lastname, age, 
  phone, company, eyecolor,  address,  about, 
  balance, raw_json
);

Awesome. So, we’ve just created ourselves a table capable of full text indexing using SQLite’s FTS4 engine. Notice that we didn’t give any datatypes for our columns. That’s completely fine. In the simplest of words, SQLite treats all columns as text. To perform full text search, SQLite has to break the data into words anyway.

Then, we’ll load the data. Loading the data is relatively easy. I had 12 files named collected_data1.txt.csv, collected_data2.txt.csv and so on. So, my script looked like this:

cat import_sqlite.sh

#!/bin/bash

for i in {1..12}; do
  echo `date`	Importing ${i} -----------------
  sqlite3 fts.db '.separator "\t"' '.import /home/me/data/collected_data'${i}'.txt.csv test'
done

date

2.13 million records loaded into SQLite in 3 minutes and 20 seconds, full text indexing included.
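To sanity-check the load (actual searching waits until Part 2), a quick row count works:

sqlite3 fts.db 'select count(*) from test;'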

Installing Percona MySQL 5.7

MySQL comes in multiple flavors/forks. I chose to use the MySQL distribution from Percona, which adds some good tooling and performance enhancements/monitoring on top of MySQL. I like that. Installing Percona Server is straightforward.

wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo apt-get update
sudo apt-get install percona-server-server-5.7 -y

Enter a password for the root user when asked.

Create MySQL DB

Let’s log on to MySQL with user root and the password you entered during installation. Then, let’s create a database/schema called fts. Thereafter, we’ll create a table called test.

mysql -u root -p
<Type your password when asked>

create database fts;

use fts;

CREATE TABLE test (
  rowid int(11) NOT NULL AUTO_INCREMENT,
  idx int(11) DEFAULT NULL,
  id varchar(100) DEFAULT NULL,
  firstname varchar(100) DEFAULT NULL,
  lastname varchar(100) DEFAULT NULL,
  age int(11) DEFAULT NULL,
  phone varchar(100) DEFAULT NULL,
  company varchar(100) DEFAULT NULL,
  eyecolor varchar(100) DEFAULT NULL,
  address varchar(100) DEFAULT NULL,
  about text,
  balance decimal(10,2) DEFAULT NULL,
  raw_json varchar(2000),
  raw_json_real json, -- this will be used to JSON'ify raw json
  PRIMARY KEY (rowid)
);

Excellent. So, now we have our database/schema created, and our table as well. Notice the raw_json column that will hold the raw JSON from our text files. Also notice the raw_json_real column. We will leave it NULL to begin with. Then, after populating the table, we will update this field. After that, we will create a full text index for full text search. Let’s begin.

Password-less login to MySQL

First, in order to run the mysql command line utility without being prompted for a password, I created a file like so:

cat ~/.my.cnf

[client]
user=root
password=<password I gave to MySQL during installation>
database=fts
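Since this file holds the root password in plain text, it is worth restricting its permissions to your own user:

chmod 600 ~/.my.cnf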

Load data into MySQL

I created a script to load data into MySQL. Here’s the script:

cat import_mysql.sh

#!/bin/bash

for i in {1..12}; do
  echo `date`	Importing ${i} -----------------
  mysql -e "load data local infile '/home/me/data/collected_data${i}.txt.csv' into table test fields terminated by '\t' ignore 1 lines (idx, id, firstname, lastname, age, phone, company, eyecolor, address, about, balance, raw_json);"
done

date

2.13 million records loaded into MySQL in 1 minute and 14 seconds, without full text indexing. Let’s make a full text index.
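Thanks to the ~/.my.cnf file above, a quick sanity check runs without a password prompt:

mysql -e "select count(*) from test;"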

Create Full Text Index in MySQL

The full text index is going to be created on just the raw_json field. Unlike SQLite, which builds full text search capability over the entire table, MySQL can create both regular and full text indexes, the latter of which are used for full text search.

mysql -e "create fulltext index fts_test_raw_json on test (raw_json);"

MySQL’s journaled InnoDB engine took 9 minutes and 14 seconds to create a full text index. In comparison, MySQL’s MyISAM engine took 6 minutes and 47 seconds to create a full text index. In this exercise, we will continue to use InnoDB.
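For the curious, the MyISAM number came from repeating the index build on a MyISAM copy of the table. A rough sketch of that comparison follows; the test_myisam name is illustrative, not part of the original setup:

# create table ... like copies the indexes too, so drop the copied
# full text index while switching the engine to MyISAM
mysql -e "create table test_myisam like test;"
mysql -e "alter table test_myisam engine=MyISAM, drop index fts_test_raw_json;"
mysql -e "insert into test_myisam select * from test;"
# time the full text index build on the MyISAM copy
time mysql -e "create fulltext index fts_test_raw_json on test_myisam (raw_json);"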

JSON-ify raw json data in MySQL

The raw JSON from the text files was imported as-is into the raw_json column. Let’s store this data as true JSON in the raw_json_real column. The generated data used single quotes as its string delimiters, so the statement below strips any stray double quotes, converts the single quotes to double quotes, and then casts the result.

update test
set raw_json_real = cast(
  replace(  replace(raw_json, '"', ''), '''', '"'  )
  as json
);

The above statement sets the raw_json_real field to a real JSON object; the cast also makes MySQL validate each document as JSON.
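To spot-check the conversion, we can pull a field back out with MySQL 5.7’s JSON functions. This assumes, as in my dataset, that the documents contain an age key:

mysql -e 'select json_extract(raw_json_real, "$.age") from test limit 3;'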

Installing MongoDB

Installing MongoDB is really easy as well. I executed the following commands to install it.

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
sudo apt-get update
sudo apt-get install mongodb-org -y

# start it
sudo systemctl start mongod.service

That’s all we need to do to install MongoDB and start its service.
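A quick way to confirm the service came up is to ask the shell for the server version:

mongo --eval 'db.version()'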

Preparing data for import

My text dataset contained raw_json as a field. I wanted to extract just that field so that it would be easy to load into MongoDB. So, I wrote a script for that.

cat create_mongo_import_file.sh

#!/bin/bash

rm /home/me/data/mongofile.txt.csv

for i in {1..12}; do
	awk -F'\t' '{print $12}' /home/me/data/collected_data${i}.txt.csv | tr "'" '"' >> /home/me/data/mongofile.txt.csv
done

# remove lines containing raw_json since that's the header line from each of the 12 files
grep -v raw_json /home/me/data/mongofile.txt.csv > temp && mv temp /home/me/data/mongofile.txt.csv

Let’s run this script. In about a minute, the raw_json field from all the text files is written to mongofile.txt.csv. This is the file we can import into MongoDB.
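A quick line count on the output should match the expected 2.13 million records, one JSON document per line:

wc -l /home/me/data/mongofile.txt.csv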

Load data into MongoDB

Let’s load data from mongofile.txt.csv. This file was around 1.2 GB.

mongoimport --db fts --collection test --drop --file /home/me/data/mongofile.txt.csv

2.13 million records loaded into MongoDB in 1 minute and 43 seconds, without full text indexing. Let’s make a full text index.
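Again, a quick sanity check on the count:

mongo fts --eval 'print(db.test.count())'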

Create Full Text Index in MongoDB

In Mongo, we will create a single compound index covering firstname, lastname, address and age. The first three are text keys, while age goes in as a regular ascending key. That’s it. Let’s get started with building it.

mongo fts

db.test.createIndex({"firstname":"text", "lastname":"text", "address":"text", "age":1})

Easy. Now we have a full text index in MongoDB. It took 6 minutes and 20 seconds to create.
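To confirm, listing the indexes should show the text index alongside the default _id index:

mongo fts --eval 'printjson(db.test.getIndexes())'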

This concludes the installation and loading of data into SQLite, MySQL and MongoDB. In the next article, we will do light full text searching using the full text indexes we built on a dataset of roughly 2 million records.

Summary

Let’s review the timing of our work:

Property                 SQLite                   MySQL            MongoDB
Installation             Easiest                  Easy             Easy
Data load time           3 min 20 s               1 min 14 s       1 min 43 s
Full text indexing       Included in load time    9 min 14 s       6 min 20 s
Total time               3 min 20 s               10 min 28 s      8 min 3 s
Indexed what?            All columns              raw_json only    firstname, lastname, address, age
JSON-ify data            No support               3 min 30 s       Already JSON
Prepping friendly data   None needed              < 10 s           < 1 min

See Full Text Search with MongoDB, MySQL and SQLite – Part 2