當前位置：首頁 > 编程语言 > c/c++ >内容正文

c/c++

C++ STL : 模拟实现STL中的关联式容器unordered_map/unordered_set

發布時間：2024/4/11 c/c++ 28 豆豆

生活随笔收集整理的這篇文章主要介紹了 C++ STL : 模拟实现STL中的关联式容器unordered_map/unordered_set 小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

unordered_map/unordered_set

C++ STL : 模擬實現STL中的關聯式容器map和set
這次實現的unordered_map/unordered_set的具體思路和實現的map/set思路差不多，只不過是將底層的數據結構從紅黑樹修改為哈希桶，因為數據結構的修改，其特性也發生了變化。

unordered_map/unordered_set與map/set的區別

相同點：

都是關聯型容器，存儲的是以<key, value>的鍵值對pair

不同點：

底層數據結構。unordered_map/unordered_set的底層容器是哈希桶，map和set底層是紅黑樹

數據是否有序。因為unordered_map/unordered_set的底層容器是哈希桶，數據都是通過映射關系來存儲，這就導致了數據是無序的。而map和set底層是紅黑樹，所以可以通過中序遍歷來獲取有序的數據。

數據訪問時間。unordered_map/unordered_set因為其是通過映射關系來存儲，所以效率為O（1）。map/set為O（logN）

底層哈希桶的改造

仿函數

在模板中提供了4個模板參數，分別是key值類型、數據類型、key值的獲取方法、hash（key）的獲取方法

template<class K, class T, class KeyofT, class Hash>

Key值的獲取方法

因為需要考慮到代碼復用的問題，用一個哈希桶來實現K模型的unordered_set和KV模型的unordered_map。并且對于某些自定義類型作為參數，也需要考慮如何從他傳的參數中獲取key值，就需要增加一個模板參數，來讓使用者自行提供從數據中獲取key值的方法。

下面就拿map和set的舉例子。
map

struct MapKeyOfValue {const K& operator()(const std::pair<K, V>& kv){return kv.first;} };

set

struct SetKeyOfValue {const K& operator()(const K& key){return key;} };

hash（key）的轉換方法

如果key的類型為整型的話還好說，可以直接用來進行哈希函數的映射。但是如果是其他的一些無法進行整型算數運算的類型或者極為龐大的數據，如常用的string或者大數等類型，就需要一種方法來將其轉換為可以計算的整型值，但是對于自定義類型我們并不能知道他的轉換方法，所以就需要提供一個仿函數，讓使用者自行提供轉換的方法。

因為常用的key一般都是string和int，這里我就給了默認的整型處理方法以及string的特化方法

template<class K> struct _Hash {const K& operator()(const K& key){return key;} };template<> struct _Hash<std::string> {size_t operator()(const std::string& key){//BKDR字符串哈希函數size_t hash = 0;for (size_t i = 0; i < key.size(); i++){hash *= 131;hash += key[i];}return hash;} };

迭代器

迭代器的實現非常容易，值得一提的就是++的重載。

如果迭代器當前所在的桶中的下一個位置不為空，則直接返回下一個位置。而下一個位置為空，則說明當前桶為空，就需要去到下一個桶中遍歷數據。但是光光依靠迭代器是無法獲取下一個桶的位置的，所以我就加入了一個哈希桶指針，這樣就可以通過指針獲取桶的哈希函數來計算出當前映射位置，再通過訪問映射位置來找到下一個存有數據的桶，就可以計算出下一個位置。

template<class K, class T, class KeyOfT, class Hash> struct __HashTableIterator {typedef HashNode<T> Node;typedef HashBucket<K, T, KeyOfT, Hash> HB;typedef __HashTableIterator<K, T, KeyOfT, Hash> Self;__HashTableIterator(Node* node, HB* hb): _node(node), _phb(hb){}T& operator*(){return _node->_data;}T* operator->(){return &_node->_data;}Self& operator++(){//如果下一個節點不為空，直接返回下一個if (_node->_next){_node = _node->_next;}//如果下一個為空，則走到下一個表中else{//通過獲取當前數據的key來判斷下一個數據的位置KeyOfT koft;size_t pos = _phb->HashFunc(koft(_node->_data));++pos;for (; pos < _phb->_table.size(); pos++){Node* cur = _phb->_table[pos];//如果下一個桶的數據不為空，則返回桶的第一個節點if (cur != nullptr){_node = cur;return *this;}}//剩下的桶都沒有數據_node = nullptr;}return *this;}Self operator++(int){Self temp = *this;++this;return temp;}bool operator != (const Self& s){return _node != s._node;}bool operator == (const Self& s){return _node == s._node;}Node* _node;HB* _phb; };

完整代碼

#pragma once #include<vector> #include<string>namespace lee {//算法科學家總結出的一個增容質數表，按照這樣增容的效率更高const int PRIMECOUNT = 28;const size_t primeList[PRIMECOUNT] = {53ul, 97ul, 193ul, 389ul, 769ul,1543ul, 3079ul, 6151ul, 12289ul, 24593ul,49157ul, 98317ul, 196613ul, 393241ul, 786433ul,1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,1610612741ul, 3221225473ul, 4294967291ul};/*因為哈希函數的常用方法如直接定地、除留余數、平方取中等方法需要用的key值為整型，而大部分時候我們的key都是string，或者某些自定義類型，這個時候就可以提供一個仿函數的接口給外部，讓他自己處理如何將key轉換成我們需要的整型*/template<class K>struct _Hash{const K& operator()(const K& key){return key;}};template<>struct _Hash<std::string>{const size_t & operator()(const std::string& key){//BKDR字符串哈希函數size_t hash = 0;for (size_t i = 0; i < key.size(); i++){hash *= 131;hash += key[i];}return hash;}};template<class K>struct SetKeyOfT{const K& operator()(const K& key){return key;}};template<class T>struct HashNode{HashNode(const T& data = T()): _data(data), _next(nullptr){}T _data;HashNode<T>* _next;};template<class K, class T, class KeyofT, class Hash>class HashBucket;template<class K, class T, class KeyOfT, class Hash>struct __HashTableIterator{typedef HashNode<T> Node;typedef HashBucket<K, T, KeyOfT, Hash> HB;typedef __HashTableIterator<K, T, KeyOfT, Hash> Self;__HashTableIterator(Node* node, HB* hb) : _node(node), _phb(hb){}T& operator*(){return _node->_data;}T* operator->(){return &_node->_data;}Self& operator++(){//如果下一個節點不為空，直接返回下一個if (_node->_next){_node = _node->_next;}//如果下一個為空，則走到下一個表中else{//通過獲取當前數據的key來判斷下一個數據的位置KeyOfT koft;size_t pos = _phb->HashFunc(koft(_node->_data));++pos;for (; pos < _phb->_table.size(); pos++){Node* cur = _phb->_table[pos];//如果下一個桶的數據不為空，則返回桶的第一個節點if (cur != nullptr){_node = cur;return *this;}}//剩下的桶都沒有數據_node = nullptr;}return *this;}Self operator++(int){Self temp = *this;++this;return temp;}bool operator != (const Self& s){return _node != s._node;}bool operator == (const Self& s){return _node == s._node;}Node* _node;HB* _phb;};template<class K, class T, class KeyofT = SetKeyOfT<T>, class Hash = _Hash<K>>class HashBucket{public:typedef __HashTableIterator<K, T, KeyofT, Hash> iterator;typedef HashNode<T> Node;friend struct iterator;HashBucket(size_t capacity = 10): _table(capacity), _size(0){}~HashBucket(){Clear();}iterator begin(){//找到第一個節點for (size_t i = 0; i < _table.size(); i++){//如果節點不為空則返回if (_table[i]){return iterator(_table[i], this);}}return iterator(nullptr, this);}//因為在STL中哈希桶的底層是單鏈表的結構，所以不支持--操作，end就直接給一個空即可iterator end(){return iterator(nullptr, this);}size_t getNextPrime(size_t num){size_t i = 0;for (i = 0; i < PRIMECOUNT; i++){//返回比那個數大的下一個質數 if (primeList[i] > num){return primeList[i];}}//如果比所有都大，還是返回最后一個，因為最后一個已經是32位最大容量return primeList[PRIMECOUNT - 1];}size_t HashFunc(const K& key){Hash hash;return hash(key) % _table.size();}std::pair<iterator, bool> Insert(const T& data){KeyofT koft;/*因為哈希桶是開散列的鏈式結構，發生了哈希沖突是直接在對應位置位置進行頭插，而桶的個數是固定的，而插入的數據會不斷增多，隨著數據的增多，就可能會導致某一個桶過重，使得效率過低。所以最理想的情況，就是每個桶都有一個數據。這種情況下，如果往任何一個地方插入，都會產生哈希沖突，所以當數據個數與桶的個數相同時，也就是負載因子為1時就需要進行擴容。*/if (_size == _table.size()){//按照素數表來增容size_t newSize = getNextPrime(_table.size());size_t oldSize = _table.size();std::vector<Node*> newTable(newSize);_table.resize(newSize);//接著將數據重新映射過去for (size_t i = 0; i < oldSize; i++){Node* cur = _table[i];while (cur){//重新計算映射的位置size_t pos = HashFunc(koft(cur->_data));//找到位置后頭插進對應位置Node* next = cur->_next;cur->_next = newTable[pos];newTable[pos] = cur;cur = next;}//原數據置空_table[i] = nullptr;}//直接和新表交換，交換過去的舊表會和函數棧幀一塊銷毀。_table.swap(newTable);}size_t pos = HashFunc(koft(data));Node* cur = _table[pos];//因為哈希桶key值唯一，如果已經在桶中則返回falsewhile (cur){if (koft(cur->_data) == koft(data)){return std::make_pair(iterator(cur, this), false);}else{cur = cur->_next;}}//檢查完成，此時開始插入，這里選擇的是頭插，這樣就可以減少數據遍歷的次數。Node* newNode = new Node(data);newNode->_next = _table[pos];_table[pos] = newNode;++_size;return std::make_pair(iterator(newNode, this), true);}iterator Erase(const K& key){KeyofT koft;size_t pos = HashFunc(key);Node* cur = _table[pos];Node* prev = nullptr;while (cur){if (koft(cur->_data) == key){iterator ret(cur, this);++ret;//如果要刪除的是第一個節點，就讓下一個節點成為新的頭節點，否則直接刪除。if (prev == nullptr){_table[pos] = cur->_next;}else{prev->_next = cur->_next;}delete cur;--_size;return ret;}else{prev = cur;cur = cur->_next;}}return end();}iterator Find(const K& key){KeyofT koft;size_t pos = HashFunc(key);Node* cur = _table[pos];while (cur){if (koft(cur->_data) == key){return iterator(cur, this);}else{cur = cur->_next;}}return end();}void Clear(){//刪除所有節點for (size_t i = 0; i < _table.size(); i++){Node* cur = _table[i];while (cur){Node* next = cur->_next;delete cur;cur = next;}_table[i] = nullptr;}}private:std::vector<Node*> _table;size_t _size;}; };

unordered_set

文檔介紹

unordered_set是不按特定順序存儲唯一元素的容器，允許根據元素的值快速檢索單個元素。

在unordered_set中，元素的值同時也是其唯一標識它的key。key是不可變的，因此，unordered_set中的元素不能在容器中修改,但是可以插入和刪除它們。

在內部,unordered_set中的元素不按照任何特定的順序排序, 而是根據它們的哈希值組織成bucket，以便直接通過元素的值快速訪問單個元素

unordered_set容器通過key訪問單個元素要比set快，但它通常在遍歷元素子集的范圍迭代方面效率較低。

它的迭代器至少是前向迭代器。

代碼實現

這里封裝的思路和之前的map和set基本一樣。
C++ STL : 模擬實現STL中的關聯式容器map和set

#include"HashBucket.hpp"namespace lee {template<class K, class Hash = lee::_Hash<K>>class unordered_set{public:struct SetKeyOfValue{const K& operator()(const K& key){return key;}};typedef typename HashBucket<K, K, SetKeyOfValue, Hash>::iterator iterator;iterator begin(){return _hb.begin();}iterator end(){return _hb.end();}iterator find(const K& key){return _hb.Find(key);}iterator erase(const K& key){return _hb.Erase(key);}std::pair<iterator, bool> insert(const K& key){return _hb.Insert(key);}private:HashBucket<K, K, SetKeyOfValue, Hash> _hb;};};

unordered_map

文檔介紹

unordered_map是存儲<key, value>鍵值對的關聯式容器，其允許通過keys快速的索引到與其對應的 value。

在unordered_map中，鍵值通常用于惟一地標識元素，而映射值是一個對象，其內容與此鍵關聯。鍵和映射值的類型可能不同。

在內部,unordered_map沒有對<kye, value>按照任何特定的順序排序, 為了能在常數范圍內找到key所對應的value，unordered_map將相同哈希值的鍵值對放在相同的桶中。

unordered_map容器通過key訪問單個元素要比map快，但它通常在遍歷元素子集的范圍迭代方面效率較低。

unordered_maps實現了直接訪問操作符(operator[])，它允許使用key作為參數直接訪問value。

它的迭代器至少是前向迭代器。

代碼實現

#include"HashBucket.hpp"namespace lee {template<class K, class V, class Hash = _Hash<K>>class unordered_map{public:struct MapKeyOfValue{const K& operator()(const std::pair<K, V>& kv){return kv.first;}};typedef typename HashBucket<K, std::pair<K, V>, MapKeyOfValue, Hash>::iterator iterator;iterator begin(){return _hb.begin();}iterator end(){return _hb.end();}iterator find(const K& key){return _hb.Find(key);}iterator erase(const K& key){return _hb.Erase(key);}std::pair<iterator, bool> insert(const std::pair<K, V>& data){return _hb.Insert(data);}V& operator[](const K& key){std::pair<iterator, bool> ret = _hb.Insert(make_pair(key, V()));return ret.first->second;}private:HashBucket<K, std::pair<K, V>, MapKeyOfValue, Hash> _hb;}; };

總結

以上是生活随笔為你收集整理的C++ STL : 模拟实现STL中的关联式容器unordered_map/unordered_set的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：高级数据结构与算法 | 哈希：哈希冲突
下一篇：海量数据处理（一）：位图与布隆过滤器的

c/c++

C++ STL : 模拟实现STL中的关联式容器unordered_map/unordered_set

目錄

unordered_map/unordered_set

unordered_map/unordered_set與map/set的區別

底層哈希桶的改造

仿函數

Key值的獲取方法

hash（key）的轉換方法

迭代器

完整代碼

unordered_set

文檔介紹

代碼實現

unordered_map

文檔介紹

代碼實現

總結