ZF-4253: Memcache should support tags

Description

I have extended the Memcached backed to implement a tagging system. I have not yet load tested or extensively bug tested this. My improvement is for a specific project, but I've adapted my extended class for submission as a potential improvement. Only a couple of methods are actually over-ridden here.

The basic idea is that you use a cached array to track the keys. When you save keys you retrieve the cached key array, add the ID being saved as another entry into each tag it is being associated with, and save the updated tag array. To remove by tag you look up the tag in the global tags array and loop through the entries deleting each one from memcache and updating the cached tag array.

Here is my example implementation:

class KeyEnabled_Memcached extends Zend_Cache_Backend_Memcached
{

    private function getTagListId()
    {
        return "MyTagArrayCacheKey";
    }
    
    private function getTags()
    {
        if(!$tags = $this->_memcache->get($this->getTagListId()))
        {
            $tags = array();
        }
        return $tags;
    }
    
    private function saveTags($id, $tags)
    {
        // First get the tags
        $siteTags = $this->getTags();
        
        foreach($tags as $tag)
        {
            $siteTags[$tag][] = $id;
        }
        $this->_memcache->set($this->getTagListId(), $siteTags);        
    }
    
    private function getItemsByTag($tag)
    {
        $siteTags = $this->_memcache->get($this->getTagListId());
        return isset($siteTags[$tag]) ? $siteTags[$tag] : false;
    }

    /**
     * Save some string datas into a cache record
     *
     * Note : $data is always "string" (serialization is done by the
     * core not by the backend)
     *
     * @param  string $data             Datas to cache
     * @param  string $id               Cache id
     * @param  array  $tags             Array of strings, the cache record will be tagged by each string entry
     * @param  int    $specificLifetime If != false, set a specific lifetime for this cache record (null => infinite lifetime)
     * @return boolean True if no problem
     */
    public function save($data, $id, $tags = array(), $specificLifetime = false)
    {
        $lifetime = $this->getLifetime($specificLifetime);
        if ($this->_options['compression']) {
            $flag = MEMCACHE_COMPRESSED;
        } else {
            $flag = 0;
        }
        $result = $this->_memcache->set($id, array($data, time()), $flag, $lifetime);
        if (count($tags) > 0) {
            $this->saveTags($id, $tags);
        }
        return $result;
    }

    /**
     * Clean some cache records
     *
     * Available modes are :
     * 'all' (default)  => remove all cache entries ($tags is not used)
     * 'old'            => remove too old cache entries ($tags is not used)
     * 'matchingTag'    => remove cache entries matching all given tags
     *                     ($tags can be an array of strings or a single string)
     * 'notMatchingTag' => remove cache entries not matching one of the given tags
     *                     ($tags can be an array of strings or a single string)
     *
     * @param  string $mode Clean mode
     * @param  array  $tags Array of tags
     * @return boolean True if no problem
     */
    public function clean($mode = Zend_Cache::CLEANING_MODE_ALL, $tags = array())
    {
        if ($mode==Zend_Cache::CLEANING_MODE_ALL) {
            return $this->_memcache->flush();
        }
        if ($mode==Zend_Cache::CLEANING_MODE_OLD) {
            $this->_log("Zend_Cache_Backend_Memcached::clean() : CLEANING_MODE_OLD is unsupported by the Memcached backend");
        }
        if ($mode==Zend_Cache::CLEANING_MODE_MATCHING_TAG) {
            $siteTags = $newTags = $this->getTags();
            if(count($siteTags))
            {
                foreach($tags as $tag)
                {
                    if(isset($siteTags[$tag]))
                    {
                        foreach($siteTags[$tag] as $item)
                        {
                            // We call delete directly here because the ID in the cache is already specific for this site
                            $this->_memcache->delete($item);
                        }
                        unset($newTags[$tag]);
                    }
                }
                $this->_memcache->set($this->getTagListId(),$newTags);
            }
        }
        if ($mode==Zend_Cache::CLEANING_MODE_NOT_MATCHING_TAG) {
            $siteTags = $newTags = $this->getTags();
            if(count($siteTags))
            {
                foreach($siteTags as $siteTag => $items)
                {
                    if(array_search($siteTag,$tags) === false)
                    {
                        foreach($items as $item)
                        {
                            $this->_memcache->delete($item);
                        }
                        unset($newTags[$siteTag]);
                    }
                }
                $this->_memcache->set($this->getTagListId(),$newTags);
            }
        }
    }
}

Comments

tags system for memcache, APC or xcache backends are not reliable because the system can drop any of your record to make some space for other records

so your proposal is good but not reliable

if memcache drop your MyTagArrayCacheKey record, what about CLEANING_MODE_NOT_MATCHING_TAG or CLEANING_MODE_MATCHING_TAG ?

in SVN trunk, there is a completly new backend called "TwoLevels" : - the fast level can be memcache backend - the slow level (but reliable) can be the file backend

This virtual backend has plenty of advantages and of course support tags (throw the reliable backend)

Please have a look at this.

The lack of memcached tags support missing should atleast be documented somewhere.

Furthermore, it's a "hack" so to speak to make work for file cache aswell, so i don't see any reason why not to implement for Memcached aswell. However this solution, while rather simple, does not offer good performance.

This proposes one big array for all the keys and their tags, which does not work very well if you use millions upon millions of keys (enough to make /dev/shm/cache based "filecache" to crawl due to huge directory sizes), and would translate to huge amounts of downtime on such systems, due to constant memory over consumption.

Plus, when saving a new key you would need to load that, and save it there, not working therefore.

I propose it should work something along these lines: - tag based index keys, ie. tag1, tag2 have: internal-metadata-tag1, internal-metadata-tag2 index arrays cached - index array for different tags, ie. containing all the tags currently being employed. works if relatively small amount of different tags.

As for the memcached possibly cleaning them away: I doubt because these are constantly accessed, therefore constantly fresh, therefore not going to be emptied. To be certain of this, a temporary directory as a backup for these could be used, or solely for these, ie. file cache. However, that option neither is optimal, unless used against /dev/shm (What's similar for Windows?). If this data disappears, flush all in memcache (make this optional)

there is an alternative way to handle this

  1. create a tag record as a key in memcache and set its value (version) initially to '0'

  2. for each tagged key you store/retrieve in memcache, get the value of the tag to create a key extension of something like [tagname][version] ........ .

  3. if you want to invalidate all the keys for a particular tag, just increment the value stored in its tag value.

it means that you have to access memcache to get the tag version number for each tagged key read or write, but since its just a single integer it should be very fast. it only needs to be read once for each tag, so reads/writes of keys in the same tag could use a locally cached version number.

example: if you create a tag called "fred" and store a key called "smith" against it, the key generated would be fred=>0 for the tag and fred_0_smith=>data for the key/value

incrementing the key would result in reads against fred_1_smith which would be the same as invalidating the "smith" key and all others associated with that tag.

Unused values would eventually evict from memcache by lru action. You may have to take special action to stop the tag records from evicting.

This proposal also has issues with write concurrency.

ie: 2 threads are saving at the same time and get the current value of the MyTagArrayCacheKey array. Each thread adds tag=>id entries to the array. Thread 1 writes the modified MyTagArrayCacheKey array, then thread 2 writes its modified MyTagArrayCacheKey array. MyTagArrayCacheKey does not have tag entries from thread 1.

What about the memcached-tag project?