#include <dastrie.h>
A key type can be either char*
or std::string
for your convenience. The string class std::string
is usually more convenient than char*
, but some may prefer char*
for efficiency, e.g., allocating a single memory block that can store all of keys and reading keys from a file at a time.
If you would like dastrie::trie to behave like std::set
, use dastrie::empty_type as a value type, which is a dummy value type that serializes nothing with a tail array. This is an example of a trie builder without values in which keys are represented by std::string
,
typedef dastrie::builder<std::string, dastrie::empty_type> builder_type;
If you would like to use dastrie::trie in a similar manner as std::set
, specify a value type to the second template argument. This is an example of a trie builder whose keys are char*
and values are double
,
typedef dastrie::builder<char*, double> builder_type;
DASTrie supports the following types as a value type : bool
, short
, unsigned
short
, int
, unsigned
int
, long
, unsigned
long
, float
, double
, long
double
, char*
, std::string
. In addition, it is possible to use a user-defined type as a value type for a trie. In order to do this, you must define operator<<()
and operator>>()
for serializing values with a tail array. This is an example of a value type (string_array
) that holds an array (vector) of strings and reads/writes the number of strings and their actual strings from/to a tail array,
class string_array : public std::vector<std::string> { public: friend dastrie::itail& operator>>(dastrie::itail& is, string_array& obj) { obj.clear(); uint32_t n; is >> n; for (uint32_t i = 0;i < n;++i) { std::string dst; is >> dst; obj.push_back(dst); } return is; } friend dastrie::otail& operator<<(dastrie::otail& os, const string_array& obj) { os << (uint32_t)obj.size(); for (size_t i = 0;i < obj.size();++i) { os << obj[i]; } return os; } }; typedef dastrie::builder<std::string, string_array> builder_type;
The third template argument (traits_type
) of a builder class is to customize the size of double-array elements. By default, DASTrie uses 5 bytes per double-array element, which can store 0x7FFFFFFF elements in a trie. This footprint is already smaller than other double-array implementations, you can choose 4 bytes per element and save 1 byte per element if your data is small enough to be stored with no longer than 0x007FFFFF elements (note that the number of elements is different from the number of records). Specify dastrie::doublearray4_traits at the third argument for implementing a double array with 4 bytes per element.
typedef builder_type::record_type record_type;
An instance of dastrie::builder::record_type consists of two public member variables, dastrie::builder::record_type::key and dastrie::builder::record_type::value. Records can be represented by an array of records, e.g.,
record_type records[10] = { {"eight", 8}, {"five", 5}, {"four", 4}, {"nine", 9}, {"one", 1}, {"seven", 7}, {"six", 6}, {"ten", 10}, {"three", 3}, {"two", 2}, };
std::vector<record_type> records;
Make sure that records are sorted by dictionary order of keys. It may be a good idea to use STL's sort()
function if you have unsorted records.
Now you are ready to build a trie. Instantiate the builder class,
builder_type builder;
Call dastrie::builder::build to build a trie from records. The first argument is the iterator (pointer) addressing to the first record, and the second argument is the iterator (pointer) addressing one past the final record.
builder.build(records, records + 10);
You can store the newly-built trie to a file by using dastrie::builder::write. This method outputs the trie to a binary stream (std::ostream
).
std::ofstream ofs("sample.db", std::ios::binary);
builder.write(ofs);
This example defines a trie class trie_type
that can access records whose values are represented by int
.
typedef dastrie::trie<int> trie_type;
The class dastrie::trie can read a trie structure from an input stream; the dastrie::trie::read() function prepares a trie from an input stream (std::istream
). This example reads a trie from an input stream.
std::ifstream ifs("sample.db", std::ios::binary);
trie_type trie;
trie.read(ifs);
Alternatively, you can read a trie structure from a memory block by using the dastrie::trie::assign() function. This function may be useful when you would like to use mmap() API for reading a trie.
Now you are ready to access the trie. Please refer to the sample code for retrieving a record (dastrie::trie::get() and dastrie::trie::find()), checking the existence of a record (dastrie::trie::in()), and retrieving records that are prefixes of keys (dastrie::trie::prefix()).