Full Text Search with MongoDB, MySQL and SQLite – Part 1

Part 1 of Full Text Search with MongoDB, MySQL and SQLite

This article covers the installation of the 3 databases – MongoDB, MySQL and SQLite. Following that, we will cover how to load data and create full text indexes used for full text search. Full Text SEarch with MongoDB, MySQL and SQLite – Part 2 will cover full text searching.

Creating Full Text Search (FTS)

MongoDB, MySQL, SQLite, PostgreSQL, SQL Server and Oracle have many capabilities including Full Text Search. This article describes simple comparison of full text search in MongoDB, MySQL and SQLite on Ubuntu 16.04 LTS.

Full text search

Collect data for FTS

Full Text Search and full text indexing requires some data to begin. Let’s see how were can collect sample data. There are numerous places we can use to collect data. Some of them are:

I generate some JSON to test FTS taking the motivation from https://www.json-generator.com/ site. My JSON had the following field:

  1. _id – this was a GUID, especially because I wanted to use this with MongoDB and _id can serve as the primary key
  2. idx – this was a number (kind of auto incremented, but not quite)
  3. balance – this was a decimal value such as 10.5
  4. age – this was an integer
  5. eyeColor – this was just one of 10 colors I chose to use
  6. firstname
  7. lastname
  8. company
  9. phone
  10. address – completely bogus and made-up address
  11. about – bogus text

You can choose to directly generate JSON from the site above. However, it might get cumbersome after a couple of copy-pastes. If you are a developer, you can whip up a simple code to build yourself a dataset.

I created a couple million JSON with the above format and I was ready to import.

Installing SQLite

Installing SQLite is straight-forward. We’ll update and upgrade the system before we begin. Then, we’ll install SQLite. At the time of this writing, sqlite3 gets installed with the commands below.

Note that time is used in the middle command just to measure how long it took to do upgrade. You can choose to just type sudo apt-get upgrade.

Create SQLite DB and load data

Let’s create SQLite DB and load data in it. Actually, creating a SQLite database is super easy.

Awesome. So, we’ve just created ourselves an table capable of doing full text indexing using FTS4 engine of SQLite. Notice that we didn’t give any datatypes of our columns. That’s completely fine. In the simplest of words, SQLite assumes all text columns. To perform full text search, SQLite has to break data into words anyway.

Then, we’ll load data. Loading the data is relatively easy. I had 12 files named collected_data1.txt.csv, collected_data2.txt.csv etc. So, my script looked like this:

2.13 million records loaded in 3 minutes and 20 seconds in SQLite along with Full Text Indexing.

Installing Percona MySQL 5.7

MySQL comes in multiple flavors/forks. I chose to use MySQL distributed by Percona. Percona adds some good tooling and performance enhancement/monitoring with MySQL. I like that. Installing Percona is straight-forward.

Enter a password for root user when asked.

Create MySQL DB

Let’s log on to MySQL with user root and the password you entered during installation. Then, let’s create a database/schema called fts. Thereafter, we’ll create a table called test.

Excellent. So, now we have our database/schema created. We have our table created also. Notice that we have a raw_json column that will hold the raw JSON we have in our text files. Also notice that we have raw_json_real column. We will keep it as NULL to begin. Then, after populating the table, we will update this field. After that, we will create a full text index for full text search. Let’s begin.

Password-less login to MySQL

First, in order to run mysql command line utility without requiring password, I created a file like so:

Load data into MySQL

I created a script to load data into MySQL. Here’s the script:

2.13 million records loaded in 1 minute and 14 seconds in MySQL without Full Text Indexing. Let’s make a full text index.

Create Full Text Index in MySQL

Full text index is going to be created on just the raw_json field. Unlike SQLite, which creates full text search capability on the entire table, MySQL can create a regular and full text indexes, latter of which can be used for full text search.

MySQL’s journaled InnoDB engine took 9 minutes and 14 seconds to create a full text index. In comparison, MySQL’s MyISAM engine took 6 minutes and 47 seconds to create a full text index. In this exercise, we will continue to use InnoDB.

JSON-ify raw json data in MySQL

Raw JSON from text file was imported as-is in raw_json column. Let’s send this data as true JSON into raw_json_real column.

The above statement will set raw_json_real field to a real JSON object.

Installing MongoDB

Installing MongoDB is really easy as well. I executed the following statements to install MongoDB.

That’s all we need to do to install MongoDB and start its service.

Preparing data for import

My text dataset contained raw_json as a field. I wanted to extract just that field so that it is easy to load into MongoDB. So, I wrote a script for that.

Let’s run this file. In 1 minute, raw_json field from all the text files is written to mongofile.txt.csv. This is the file we can import into MongoDB.

Load data into MongoDB

Let’s load data from mongofile.txt.csv. This file was around 1.2 GB.

2.13 million records loaded in 1 minute and 43 seconds in MongoDB without Full Text Indexing. Let’s make a full text index.

Create Full Text Index in MongoDB

In Mongo, we will create an index only on a composite of firstname, lastname, address and age. That’s it. Let’s get started with building it.

Easy. Now, in MongoDB, we have a full text index. MongoDB took 6 minutes and 20 seconds to create a full text index

This concludes the installation and loading of data into SQLite, MySQL and MongoDB. In the next article we will do light full text search using Full Text Indexes we built on a dataset of roughly 2 million records.

Summary

Let’s review timing of our work:

[table id=1 /]

See Full Text Search with MongoDB, MySQL and SQLite – Part 2