
Wiera Policy Guide

Wiera is a policy-driven key-value storage system for geo-distributed cloud environments. A Wiera policy is represented as a JSON file, in which an application must define its local instances. The hierarchy of the JSON file is shown below.

  id
  host_list
    local_instance_1
      storage_tiers
        tier_1
        tier_2
        ...
      events
        event_1:
          response_1
          response_2
          ...
        event_2:
          response_1
          response_2
          ...
        ...
    local_instance_2
      ...

            

A Wiera server can manage an arbitrary number of local instances. Each local instance consists of one or more storage tiers and a set of events that handle data manipulation tasks, such as storing and forwarding.
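As an illustration of this hierarchy, the Python sketch below loads a small policy and flattens it into an indented outline of instances, tiers, events, and responses. The policy text, the names local_instance_1 and tier_1, and the outline() helper are hypothetical; only the element names (id, host_list, storage_tiers, events, responses, and their fields) come from this guide.

```python
import json

# Hypothetical minimal policy following the hierarchy above.
POLICY_TEXT = """
{
  "id": "example",
  "host_list": {
    "local_instance_1": {
      "storage_tiers": [{"tier_name": "tier_1"}, {"tier_name": "tier_2"}],
      "events": [
        {"event_type": "ActionPut", "event_trigger": ["ActionPut"],
         "event_conditions": {},
         "responses": [{"response_type": "Store",
                        "response_parameters": {"to": ":tier_1"}}]}
      ]
    }
  }
}
"""


def outline(policy: dict) -> list[str]:
    """Flatten a policy into indented lines that mirror the hierarchy above."""
    lines = [f"id: {policy['id']}"]
    for host, instance in policy["host_list"].items():
        lines.append(f"  {host}")
        for tier in instance.get("storage_tiers", []):
            lines.append(f"    tier: {tier['tier_name']}")
        for event in instance.get("events", []):
            lines.append(f"    event: {event['event_type']}")
            for response in event.get("responses", []):
                lines.append(f"      response: {response['response_type']}")
    return lines


if __name__ == "__main__":
    print("\n".join(outline(json.loads(POLICY_TEXT))))
```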

The policies described in the Tiera and Wiera papers are implemented, and potential users and developers can download these policy files from here. This guide uses a primary-backup consistency model as its example. Primary-backup is a well-known protocol that designates a single instance as the primary data store; all other instances forward data requests to this primary. Our example employs two instances, aws-us-east and aws-us-east-2, with aws-us-east chosen as the primary. Whenever the primary receives data, it stores the data locally and broadcasts it to all other instances, here aws-us-east-2. When handling a data retrieval request, the primary looks for the data only in its own storage tiers. In contrast, aws-us-east-2 forwards both data retrieval and data storage requests to aws-us-east.

Example: Primary-backup consistency

  1. The policy's id and hosts

    {
      "id": "primary_backup",
      "host_list": {
                        

    The id and host_list are the two root elements of a Wiera policy. Every policy must have an id element whose value is a string, usually describing the function of the policy. The host_list element holds the configuration of every local instance, together with its events and responses.
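    A policy loader would presumably reject a file that lacks either root element. The check below is a minimal sketch of that rule, assuming, as in this example, that host_list is a JSON object keyed by hostname; check_root() is a hypothetical helper, not part of Wiera.

```python
import json


def check_root(policy: dict) -> list[str]:
    """Return problems with the two root elements of a policy (empty list if none)."""
    problems = []
    if not isinstance(policy.get("id"), str):
        problems.append("'id' must be present and must be a string")
    # Assumption: host_list maps each hostname to its instance configuration.
    if not isinstance(policy.get("host_list"), dict):
        problems.append("'host_list' must be present and must be an object")
    return problems


if __name__ == "__main__":
    print(check_root(json.loads('{"id": "primary_backup", "host_list": {}}')))  # []
    print(check_root(json.loads('{"host_list": []}')))  # two problems
```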


  2. Primary instance

        "aws-us-east": {
                        

    Inside host_list, the application gives the hostname of each local instance as a key. In our example, the local instances are aws-us-east and aws-us-east-2, both hosted on Amazon Web Services (AWS).


  3. Tier configuration

          "storage_tiers": [
            {
              "tier_name": "ebs-st1",
              "tier_size": "5GB",
              "tier_type": 2,
              "tier_location": "ebs-location",
              "tier_expected_latency":  10,
              "default":  true,
              "primary":  true
            }
          ],
                        

    For each local instance, the application must also specify the tiers it uses and their configuration. In our case, aws-us-east uses a single tier.

    • tier_name: A string used to identify this tier. The tier_name should be unique within a single host.
    • tier_size: The initial size of this tier. See info about the unit.
    • tier_type: The type of storage media backing this tier. In our case, we use a local disk, indicated by 2 (HDD). See info about the storage.
    • tier_location: A string indicating the storage location within this tier. For example, when the tier type is SSD or HDD, this value is a directory name.
    • tier_expected_latency: Reserved for future development.
    • default: A boolean value. True indicates that this tier is the default storage when the local instance has more than one tier.
    • primary: A boolean value. True indicates that this local instance is the primary storage. This value is optional.
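    The rules above can be checked mechanically. This sketch validates one instance's storage_tiers list: unique tier_name values, a tier_type drawn from the Tier types table in the Policy Reference, and, as an assumption not stated in this guide, exactly one default tier when an instance has several. check_tiers() and TIER_TYPES are hypothetical names.

```python
# Tier type values, taken from the "Tier types" table in the Policy Reference.
TIER_TYPES = {0: "Memory", 1: "SSD", 2: "HDD", 3: "Cloud Storage",
              4: "Cloud Archival", 5: "Wiera Instance"}


def check_tiers(host: str, tiers: list[dict]) -> list[str]:
    """Sanity-check one instance's storage_tiers list; returns a list of problems."""
    problems = []
    names = [tier["tier_name"] for tier in tiers]
    # tier_name should be unique within a single host.
    for name in sorted({n for n in names if names.count(n) > 1}):
        problems.append(f"{host}: tier_name {name!r} is not unique")
    # tier_type should be one of the documented values.
    for tier in tiers:
        if tier.get("tier_type") not in TIER_TYPES:
            problems.append(f"{host}: unknown tier_type {tier.get('tier_type')!r}")
    # Assumption: with several tiers, exactly one should be marked as the default.
    defaults = [tier for tier in tiers if tier.get("default") is True]
    if len(tiers) > 1 and len(defaults) != 1:
        problems.append(f"{host}: expected one default tier, found {len(defaults)}")
    return problems


if __name__ == "__main__":
    ebs = {"tier_name": "ebs-st1", "tier_size": "5GB", "tier_type": 2,
           "tier_location": "ebs-location", "tier_expected_latency": 10,
           "default": True, "primary": True}
    print(check_tiers("aws-us-east", [ebs]))  # []
```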

  4. Local instance's events

          "events": [
            {
              "event_type": "ActionPut",
              "event_trigger": [
                "ActionPut"
              ],
              "event_conditions": {},
                        

    The events element contains events of different types, each handling a different condition. In our simple example, both the ActionPut and ActionGet events are required.

    • event_type: A string indicating the event type. ActionPut is usually used to handle storage. See all provided types of events.
    • event_trigger: An array of strings listing every action that can trigger this event. The ActionPut action is triggered when the client executes a set command, which stores data. See all provided event triggers.
    • event_conditions: An object specifying the arguments of an event. Because the ActionPut event does not require any arguments, it is left empty.
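    One way to read event_trigger is that a client action fires every event whose trigger array lists that action. The sketch below models only that lookup with a plain membership test; the EVENTS list and events_for() are hypothetical, and returning matches in policy order is an assumption, since this guide does not specify how multiple matching events are ordered.

```python
# Hypothetical events for one instance. MonitoringTierCapacity is included only to
# show an event with several triggers (it carries "period" because Timer is listed).
EVENTS = [
    {"event_type": "ActionPut", "event_trigger": ["ActionPut"], "event_conditions": {}},
    {"event_type": "ActionGet", "event_trigger": ["ActionGet"], "event_conditions": {}},
    {"event_type": "MonitoringTierCapacity", "event_trigger": ["ActionPut", "Timer"],
     "event_conditions": {"tier_name": "ebs-st1", "percent": 50, "period": 1000}},
]


def events_for(action: str, events: list[dict]) -> list[str]:
    """Return the event_type of every event whose event_trigger lists `action`."""
    return [event["event_type"] for event in events
            if action in event.get("event_trigger", [])]


if __name__ == "__main__":
    print(events_for("ActionPut", EVENTS))  # ['ActionPut', 'MonitoringTierCapacity']
    print(events_for("Timer", EVENTS))      # ['MonitoringTierCapacity']
```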

  5. Event's responses

              "responses": [
                {
                  "response_type": "Store",
                  "response_parameters": {
                    "to": ":ebs-st1"
                  }
                },
                {
                  "response_type": "Broadcast",
                  "response_parameters": {
                    "to": [
                      "all"
                    ]
                  }
                }
              ]
            },
                        

    Responses are the core of an event. The responses element contains a list of responses that tells Wiera how to react when the event is triggered. As introduced above, the primary stores the data locally and broadcasts it to all other instances.

    • response_type: A string indicating the type of the response.
    • response_parameters: An object specifying the arguments of a response. See all the responses and their parameters.

      The Store response requires an argument named to, whose value is a string split into two parts by a colon. The first part is the name of an instance; an empty first part, as in our case, indicates the local instance. The second part is the name of a storage tier. Taken together, the response_type and response_parameters tell Wiera to store the data into the local ebs-st1 tier when this event is triggered.

      The Broadcast response requires an argument with the same name, to, but of array type. The array specifies the destinations. The value all means all instances except the local one.
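      The two forms of to can be resolved as follows. This is a minimal sketch assuming the colon format and the all value behave exactly as described above; parse_location() and broadcast_targets() are hypothetical helpers, not Wiera functions.

```python
from typing import Optional


def parse_location(to: str, local_host: str) -> tuple[str, Optional[str]]:
    """Split a "hostname:tiername" string; an empty hostname means the local instance.

    Returns (hostname, tiername). tiername is None when left empty, meaning the
    destination instance chooses the tier.
    """
    host, sep, tier = to.partition(":")
    if not sep:
        raise ValueError(f"expected 'hostname:tiername', got {to!r}")
    return (host or local_host, tier or None)


def broadcast_targets(to: list[str], hosts: list[str], local_host: str) -> list[str]:
    """Resolve a Broadcast "to" array; only ["all"] is documented."""
    if to != ["all"]:
        raise ValueError(f"unsupported destination list {to!r}")
    return [host for host in hosts if host != local_host]


if __name__ == "__main__":
    print(parse_location(":ebs-st1", "aws-us-east"))        # ('aws-us-east', 'ebs-st1')
    print(parse_location("aws-us-east:", "aws-us-east-2"))  # ('aws-us-east', None)
    print(broadcast_targets(["all"], ["aws-us-east", "aws-us-east-2"], "aws-us-east"))
```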


  6. Data retrieval event

            {
              "event_type": "ActionGet",
              "event_trigger": [
                "ActionGet"
              ],
              "event_conditions": {},
              "responses": [
                {
                  "response_type": "Retrieve",
                  "response_parameters": {}
                }
              ]
            }
          ]
        },
                        

    The ActionGet event and Retrieve response above are commonly used and straightforward. ActionGet is triggered when the client executes a get command, and the application must supply the key of the data along with the command. The Retrieve response then finds the data by the given key.


  7. Non-primary instance

        "aws-us-east-2": {
          "storage_tiers": [
            {
              "tier_name": "ebs-st1",
              "tier_size": "5GB",
              "tier_type": 2,
              "tier_location": "ebs-location",
              "tier_expected_latency":  10,
              "default":  true,
              "primary":  false
            }
          ],
          "events": [
            {
              "event_type": "ActionPut",
              "event_trigger": [
                "ActionPut"
              ],
              "event_conditions": {},
              "responses": [
                {
                  "response_type": "ForwardPut",
                  "response_parameters": {
                    "to": "aws-us-east:"
                  }
                }
              ]
            },
            {
              "event_type": "ActionGet",
              "event_trigger": [
                "ActionGet"
              ],
              "event_conditions": {},
              "responses": [
                {
                  "response_type": "ForwardGet",
                  "response_parameters": {
                    "to": "aws-us-east:"
                  }
                }
              ]
            }
          ]
        }
      }
    }
                        

    The non-primary instance has the same structure as the primary instance except for its responses and its primary flag. In the non-primary instance, the ForwardPut and ForwardGet responses define Wiera's behavior. Both take the same to argument as the Store response. However, ForwardPut and ForwardGet behave like a client's set and get commands: they only need to connect to an instance, without knowing its underlying storage tiers. Therefore, only the destination instance's name is given in the to argument, and the tier part after the colon is left empty.
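    To see how the two instances' policies interact, the sketch below simulates them in memory: the non-primary forwards set and get requests, while the primary stores, broadcasts, and retrieves locally. The Instance class is hypothetical and ignores networking, versioning, and failures. It also assumes that a broadcast copy is written directly into the receiver's tier instead of re-triggering the receiver's ActionPut event, which would otherwise forward the copy back to the primary.

```python
from typing import Optional


class Instance:
    """Hypothetical in-memory stand-in for a local instance with one tier, "ebs-st1"."""

    def __init__(self, name: str, cluster: dict, primary: Optional[str] = None):
        self.name = name
        self.cluster = cluster          # hostname -> Instance, shared by all instances
        self.primary = primary          # None means this instance is the primary
        self.tier: dict[str, str] = {}  # contents of the "ebs-st1" tier
        cluster[name] = self

    def put(self, key: str, value: str) -> bool:
        if self.primary is not None:
            # Non-primary ActionPut: ForwardPut to "aws-us-east:".
            return self.cluster[self.primary].put(key, value)
        # Primary ActionPut: Store to ":ebs-st1", then Broadcast to ["all"].
        self.tier[key] = value
        for host, instance in self.cluster.items():
            if host != self.name:
                instance.apply_broadcast(key, value)
        return True

    def apply_broadcast(self, key: str, value: str) -> None:
        # Assumption: a broadcast copy goes straight into the receiver's tier
        # rather than re-triggering its ActionPut event.
        self.tier[key] = value

    def get(self, key: str) -> Optional[str]:
        if self.primary is not None:
            # Non-primary ActionGet: ForwardGet to "aws-us-east:".
            return self.cluster[self.primary].get(key)
        # Primary ActionGet: Retrieve from its own tier only.
        return self.tier.get(key)


if __name__ == "__main__":
    cluster: dict = {}
    primary = Instance("aws-us-east", cluster)
    backup = Instance("aws-us-east-2", cluster, primary="aws-us-east")
    backup.put("user:1", "alice")     # forwarded to the primary, then broadcast back
    print(primary.tier, backup.tier)  # both tiers now hold user:1
    print(backup.get("user:1"))       # alice (served by the primary)
```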

Policy Reference

  1. Policy Events

    Events are used to organize triggers and responses. There are four first-class events in Wiera. The ActionPut and ActionGet events are primarily used when an application stores or retrieves data. The Timer event can be employed when some actions need to be executed periodically. The set of Monitoring events is used to check different conditions of storage tiers and attributes of stored data, such as the used space of a tier; when an application-defined condition is satisfied, the corresponding event is triggered.

    When using an event, the application must also specify the event_trigger array in the policy, which lists the actions that trigger this event. Three actions are possible: ActionPut, ActionGet, and Timer.

    The requirements on event_conditions are determined by event_trigger and event_type. Whenever an event can be triggered by the Timer action, or is itself a Timer event, event_conditions must contain a period parameter: a whole number in milliseconds. The ActionPut and ActionGet events and actions do not require parameters. Each Monitoring event requires its own parameters.
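    These rules lend themselves to a small validator. The sketch below checks that every trigger is one of the three documented actions and that a period is present whenever Timer is involved; it does not check the Monitoring-specific parameters. check_conditions() is a hypothetical helper, and treating a period of zero or less as invalid is an assumption.

```python
VALID_TRIGGERS = {"ActionPut", "ActionGet", "Timer"}


def check_conditions(event: dict) -> list[str]:
    """Check event_trigger values and the Timer period rule; returns a list of problems."""
    problems = []
    triggers = event.get("event_trigger", [])
    conditions = event.get("event_conditions", {})
    for trigger in triggers:
        if trigger not in VALID_TRIGGERS:
            problems.append(f"unknown trigger {trigger!r}")
    if event.get("event_type") == "Timer" or "Timer" in triggers:
        period = conditions.get("period")
        # bool is a subclass of int in Python, so reject it explicitly.
        # Assumption: a period of 0 or less is invalid.
        if not isinstance(period, int) or isinstance(period, bool) or period <= 0:
            problems.append("period must be a positive whole number of milliseconds")
    return problems


if __name__ == "__main__":
    timer = {"event_type": "Timer", "event_trigger": ["Timer"],
             "event_conditions": {"period": 60000}}
    print(check_conditions(timer))  # []
```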

    All the following parameters and their types are given in terms of JSON.

    • ActionPut

      ActionPut is usually used to organize responses with an ActionPut trigger in its event_trigger array.

      event_conditions

      • None; leave event_conditions as an empty object ({}).

      event_triggers

      • ActionPut

    • ActionGet

      ActionGet is usually used to organize responses with an ActionGet trigger in its event_trigger array.

      event_conditions

      • None; leave event_conditions as an empty object ({}).

      event_triggers

      • ActionGet


    • Timer

      The Timer event is triggered periodically. It is usually used to back up data or to perform periodic checks.

      event_conditions

      • period: number. The interval between two triggers, as a whole number of milliseconds.
      • key: string. The key used by the storage interface to store the data, and by the Retrieve response to retrieve it.
      • value: string. The data to be stored.
      • to: string. The destination for storing this data. It must have the tiername portion. The value uses the pre-defined format "hostname:tiername", a string separated by a colon. If the hostname is empty, it is automatically replaced by the local hostname; in other words, the data goes to the local instance. If the tiername is empty, the destination instance chooses the tier.

      event_triggers

      • Timer [ActionGet] [ActionPut]

    • MonitoringColdData

      The MonitoringColdData event is used to check how long a key-value pair has gone unmodified within the scope of an instance. Applications must specify the time threshold that defines cold data.

      event_conditions

      • threshold: string. The value of threshold defines cold data. See the time format.
      • period: number. The checking period, as a whole number of milliseconds; required when Timer is among the event triggers.

      event_triggers

      • Timer | ActionGet | ActionPut

    • MonitoringTierCapacity

      The MonitoringTierCapacity event is used to check the storage capacity of a tier. If the used space exceeds a threshold, the responses of this event are run. The MonitoringTierCapacity event requires at least one event_trigger.

      event_conditions

      • tier_name: string. The tier_name is used to identify a tier that will be monitored.
      • percent: number. A whole number defining the threshold as a percentage of the tier's total capacity. For example, if the total capacity is 1GB and percent is 50, this event runs its responses once the used space exceeds 512MB.

      event_triggers

      • Timer | ActionGet | ActionPut
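    The two Monitoring conditions above reduce to simple comparisons. The sketch below is written under stated assumptions: last modification times are plain seconds, "beyond" means strictly greater, and 1GB is read as 1024MB, as the 512MB example implies. parse_threshold() handles only the two forms in the Time format table. All function names are hypothetical.

```python
def parse_threshold(text: str) -> int:
    """Convert a threshold string to seconds.

    Only the two documented forms are handled: "120h" (hours) and "120" (seconds).
    """
    if text.endswith("h"):
        return int(text[:-1]) * 3600
    return int(text)


def cold_keys(last_modified: dict[str, float], threshold: str, now: float) -> list[str]:
    """MonitoringColdData check: keys left unmodified for longer than `threshold`.

    Assumes last_modified maps each key to its last modification time in seconds.
    """
    limit = parse_threshold(threshold)
    return sorted(key for key, t in last_modified.items() if now - t > limit)


def over_capacity(used_bytes: int, total_bytes: int, percent: int) -> bool:
    """MonitoringTierCapacity check: is the used space beyond `percent` % of capacity?"""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return used_bytes * 100 > total_bytes * percent


if __name__ == "__main__":
    print(cold_keys({"a": 0.0, "b": 900.0}, "120", now=1000.0))  # ['a']
    GiB, MiB = 1024 ** 3, 1024 ** 2
    print(over_capacity(512 * MiB, GiB, 50))      # False: exactly at the threshold
    print(over_capacity(512 * MiB + 1, GiB, 50))  # True: beyond 512MB
```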

  2. Policy Responses

    Responses are the core of Wiera. Applications can combine basic responses to construct complex models as they desire. Understanding the data flow between responses is crucial to using them: applications must make sure the required parameters of a response are available before using it. A response can get parameters from three sources, which are merged together: the application, previous responses, and the policy.

    Parameters provided by applications are very limited. When an application calls the set key value command from the Wiera client, it supplies two parameters to the response: {"key": "app_key", "value": "app_value"}. When an application calls the get key command, it supplies only one parameter: {"key": "app_key"}.

    The policy and previous responses are the two important parameter sources. The policy can provide any parameter in the response_parameters element; for example, the Store response takes a to parameter from the policy that defines where the data is stored. The limitation of policy-specified parameters is that applications must be able to define them in advance; in other words, they are static. In contrast, each response can set an arbitrary number of parameters for the following responses at run time. For example, the Store response sets the last_modified_time parameter, which may be used by the following responses. In the implementation, parameters that come from previous responses have higher priority than those defined in the policy.
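    The parameter flow can be sketched as a loop that keeps a running context: each response receives its policy parameters overlaid with the runtime parameters, then adds its outputs to the context. This is a minimal model, not Wiera's implementation. The guide states only that outputs of previous responses override policy parameters; letting client-supplied parameters override policy ones as well is an assumption here. store(), STORAGE, and run_event() are hypothetical, and the lambda stands in for a Broadcast response.

```python
import time
from typing import Callable

# A response handler takes its merged input parameters and returns its outputs.
Handler = Callable[[dict], dict]

STORAGE: dict = {}  # hypothetical stand-in for the storage tiers: (tier, key) -> value


def store(params: dict) -> dict:
    """Toy Store response: writes to the tier named in "to" and reports outputs."""
    _host, _, tier = params["to"].partition(":")
    STORAGE[(tier, params["key"])] = params["value"]
    # Versioning is treated as disabled, so version is -2 as documented for Store.
    return {"tier_name": tier, "last_modified_time": time.time(), "version": -2}


def run_event(app_params: dict, responses: list[tuple[Handler, dict]]) -> dict:
    """Run responses in order, passing parameters from one to the next."""
    context = dict(app_params)  # e.g. {"key": ..., "value": ...} from a set command
    for handler, policy_params in responses:
        # Runtime parameters (client input and earlier outputs) override policy ones.
        params = {**policy_params, **context}
        context.update(handler(params))
    return context


if __name__ == "__main__":
    ctx = run_event({"key": "k", "value": "v"},
                    [(store, {"to": ":ebs-st1"}),
                     (lambda p: {"result": p["version"] == -2}, {"to": ["all"]})])
    print(ctx["tier_name"], ctx["result"])  # ebs-st1 True
```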

    • Store

      Store response is normally used inside an ActionPut event. It stores the key-value pair into the destination storage.

      • Input parameters
        • key: string. The key used by the storage interface to store the data, and by the Retrieve response to retrieve it.
        • value: string. The data to be stored.
        • to: string. The destination for storing this data. It must have the tiername portion. The value uses the pre-defined format "hostname:tiername", a string separated by a colon. If the hostname is empty, it is automatically replaced by the local hostname; in other words, the data goes to the local instance. If the tiername is empty, the destination instance chooses the tier.
      • Output parameters
        • version: number. If versioning is enabled, the version number is incremented by one each time data with the same key is stored. Otherwise, this number is always -2.
        • tier_name: string. The tiername portion of the destination.
        • last_modified_time: number. The time at which the store finished.
        • tag: string. If a previous response generated a tag parameter, this output carries the same value; otherwise, it is an empty string.

    • Retrieve

      The Retrieve response is used to obtain data from a storage tier. The key usually comes from the Wiera client side, and the source storage tier can be chosen based on the tiers' storage types, so the application may provide no parameters for this response in the policy. This response sets a value parameter as output.

      • Input parameters
        • from: string. Optional; its format is the same as the to parameter of the Store response. Without this parameter, Wiera chooses the default source tier according to the tiers' storage types.
        • key: string. The key used to obtain the data. The application rarely provides a static key in the policy; in most cases, the key comes from the Wiera client side.
      • Output parameters
        • value: string. The data that is associated with the key.

    • Queue

      Queue response is used to push a key-value pair into multiple destinations. The pushing process is done in the background. The application can also specify the number of worker threads used to push this data. The default number is 5. Moreover, the version of this key-value pair is required and used to coordinate the version between instances.

      • Input parameters
        • key: string. The key of the data. It usually comes from the Wiera client side.
        • value: string. The data usually comes from the Wiera client side.
        • version: number. The version is usually set by a previous response, such as Store; the application usually does not set a static version in the policy.
        • worker_cnt: number. The number of worker threads used for pushing. The default value is 5.
        • to: array of string. The only available value in this array is "all", which means all the storage instances except the local one. An array is used instead of a string to allow future extension.
      • Output parameters
        • result: boolean. True if successful, false otherwise.

    • Broadcast

      Same as the Queue response, except that Broadcast is done in the foreground and uses the default number of worker threads to push the key-value pair.

      • Input parameters
        • key: string. The key of the data. It usually comes from the Wiera client side.
        • value: string. The data usually comes from the Wiera client side.
        • version: number. The version is usually set by a previous response, such as Store; the application usually does not set a static version in the policy.
        • to: array of string. The only available value in this array is "all", which means all the storage instances except the local one. An array is used instead of a string to allow future extension.
      • Output parameters
        • result: boolean. True if successful, false otherwise.
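      The foreground/background distinction between Broadcast and Queue can be modeled with a thread pool. In this sketch, based on Python's concurrent.futures, background=True returns immediately, as Queue does, while background=False waits for every push, as Broadcast does. push_all() and the send hook are hypothetical; the guide does not describe Wiera's transport, and for Queue the returned True only means the pushes were submitted.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable

# Hypothetical hook: send(host, key, value, version) pushes one copy, returns success.
Sender = Callable[[str, str, str, int], bool]


def push_all(key: str, value: str, version: int, targets: list[str], send: Sender,
             worker_cnt: int = 5, background: bool = False) -> bool:
    """Push a key-value pair to every target host using a pool of worker threads.

    background=True mimics Queue (returns at once); False mimics Broadcast (waits).
    """
    pool = ThreadPoolExecutor(max_workers=worker_cnt)
    futures = [pool.submit(send, host, key, value, version) for host in targets]
    if background:
        # Queue: already-submitted pushes keep running after we return.
        pool.shutdown(wait=False)
        return True
    wait(futures)  # Broadcast: block until every push has finished
    pool.shutdown()
    # A push that raised counts as a failure instead of propagating the exception.
    return all(f.exception() is None and f.result() for f in futures)
```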

    • SearchKeys

      The SearchKeys response is used to find the keys of all data that share an attribute, such as being dirty.

      • Input parameters
        • target_locale: string. The keys are searched within a single tier, identified by its tier_name.
        • query_type: array of string. The value specifies the common attribute of the data whose keys are the search targets. See the query types currently supported. An array is used instead of a string to allow future extension.
      • Output parameters
        • key_list: array of string. The search result.

    • Copy

      The Copy response is used to copy one kind of data from one storage tier to another. Applications usually place a SearchKeys response before the Copy response to produce the key_list. Applications can also specify the rate of the copy process, which is useful when the user's server has limited network or disk I/O resources.

      • Input parameters
        • to: string. The to specifies the destination of the Copy response. The format of string is "hostname:tiername".
        • from: string. The from specifies the source of the Copy response. The format of string is "hostname:tiername".
        • rate: number. The value defines the copy rate. By default, it is set to 5.
        • key_list: array of string. The key_list contains the keys of all the data to be copied. Applications rarely provide a static list of keys in the policy; in most cases, the key_list is generated by the SearchKeys response.
      • Output parameters
        • result: boolean. True if successful, false otherwise.

    • Move

      The Move response is used to move one kind of data from one storage tier to another. Applications usually place a SearchKeys response before the Move response to produce the key_list. Applications can also specify the rate of the moving process, which is useful when the user's server has limited network or disk I/O resources.

      • Input parameters
        • to: string. The to specifies the destination of the Move response. The format of the string is "hostname:tiername".
        • from: string. The from specifies the source of the Move response. The format of the string is "hostname:tiername".
        • rate: number. The value defines the move rate. By default, it is set to 5.
        • key_list: array of string. The key_list contains the keys of all the data to be moved. Applications rarely provide a static list of keys in the policy; in most cases, the key_list is generated by the SearchKeys response.
      • Output parameters
        • result: boolean. True if successful, false otherwise.
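      Copy and Move differ only in whether the source copy survives. The sketch below models tiers as plain dictionaries and throttles with a sleep between keys. Reading rate as keys per second is an assumption, since the guide does not give its unit; transfer() and its sleep hook are hypothetical.

```python
import time
from typing import Callable


def transfer(src: dict, dst: dict, key_list: list[str], rate: float = 5,
             move: bool = False, sleep: Callable[[float], None] = time.sleep) -> bool:
    """Copy (or move) the listed keys from src to dst at roughly `rate` keys per second.

    Assumption: "rate" is read as keys per second; the guide does not state its unit.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    interval = 1.0 / rate
    for key in key_list:
        if key not in src:
            return False  # a listed key is missing from the source tier
        dst[key] = src[key]
        if move:
            del src[key]  # Move: the source copy is removed once written
        sleep(interval)   # throttle to spare limited network or disk I/O
    return True
```

Passing a stub for sleep, as in `transfer(src, dst, keys, sleep=lambda s: None)`, makes the function easy to exercise without real delays.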

    • ForwardGet

      The ForwardGet response is used to forward the client's get request from one instance to another. The forwarding is totally transparent: from the view of the destination instance, the received get request is exactly the same as a request sent directly from a Wiera client, and from the view of the client, the request is served directly by the destination instance.

      • Input parameters
        • to: string. The value identifies the destination instance. The format of the string is "hostname:". An application does not need to provide the tiername; the storage tier is chosen by the destination instance.
        • key: string. The key used to obtain the data. The application rarely provides a static key in the policy; in most cases, the key comes from the Wiera client side.
      • Output parameters
        • value: string. The returned data from the destination instance.
        • result: boolean. True if successful, false otherwise.

    • ForwardPut

      ForwardPut response is used to forward the client's set request from one instance to another instance.

      • Input parameters
        • to: string. The value identifies the destination instance. The format of the string is "hostname:". An application does not need to provide the tiername. The storage tier is chosen by the destination instance.
        • key: string. The key of the data to be forwarded. The application rarely provides a static key in the policy; in most cases, the key comes from the Wiera client side.
        • value: string. The data to be forwarded.
      • Output parameters
        • result: boolean. True if successful, false otherwise.

    • Shrink

      The Shrink response can be used to reduce the storage size of a tier at runtime. The response is often used inside a MonitoringTierCapacity event.

      • Input parameters
        • tier_name: string. The string indicates the tier whose storage size will be reduced.
        • percent: number. A whole number is used to specify the reduced percentage.
      • Output parameters
        • result: boolean. True if successful, false otherwise.

    • Grow

      The Grow response can be used to increase the storage size of a tier at runtime. The response is often used inside a MonitoringTierCapacity event.

      • Input parameters
        • tier_name: string. The string indicates the tier whose storage size will be increased.
        • percent: number. A whole number is used to specify the increased percentage.
      • Output parameters
        • result: boolean. True if successful, false otherwise.

    • Compress

      Compress data.

      • Input parameters
        • value: string. The value required to be compressed.
      • Output parameters
        • value: string. The compressed result.
        • result: boolean. True if successful, false otherwise.

    • UnCompress

      Decompress data.

      • Input parameters
        • value: string. The value required to be decompressed.
      • Output parameters
        • value: string. The decompressed result.
        • result: boolean. True if successful, false otherwise.
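      The guide does not name the compression algorithm. As an assumption, the sketch below uses zlib and base64-encodes the result so that the output value stays a string, matching the documented parameter type. compress() and uncompress() here are hypothetical Python helpers that mirror the Compress and UnCompress parameter names, not Wiera's implementation.

```python
import base64
import binascii
import zlib


def compress(value: str) -> dict:
    """Compress a string value; base64 keeps the result a string (an assumption)."""
    packed = base64.b64encode(zlib.compress(value.encode("utf-8"))).decode("ascii")
    return {"value": packed, "result": True}


def uncompress(value: str) -> dict:
    """Reverse compress(); on bad input, return the value unchanged with result False."""
    try:
        raw = zlib.decompress(base64.b64decode(value, validate=True))
        return {"value": raw.decode("utf-8"), "result": True}
    except (binascii.Error, zlib.error, UnicodeDecodeError):
        return {"value": value, "result": False}


if __name__ == "__main__":
    original = "hello " * 100
    packed = compress(original)
    print(len(original), len(packed["value"]))               # packed form is much shorter
    print(uncompress(packed["value"])["value"] == original)  # True
```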

    • Encrypt

      Encrypt data with the encryption algorithm specified in the instance's configuration file. See more about the configuration file.

      • Input parameters
        • value: string. The value to be encrypted.
      • Output parameters
        • value: string. The encrypted result.
        • result: boolean. True if successful, false otherwise.

    • Decrypt

      Decrypt data with the encryption algorithm specified in the instance's configuration file.

      • Input parameters
        • value: string. The value to be decrypted.
      • Output parameters
        • value: string. The decrypted result.
        • result: boolean. True if successful, false otherwise.

    • LockGlobalRead

      LockGlobalRead is a key-oriented lock mechanism. The responses between a LockGlobalRead or LockGlobalWrite response and the following UnLock response form a critical section. A key's global read lock can be held by multiple clients at the same time, but while it is held, attempts to acquire the global write lock on that key are blocked. A key's global write lock can be held by exactly one client at a time; while it is held, all other attempts to acquire the global read or write lock on the same key are blocked.

      • Input parameters
        • key: string. The identifier of the read-lock.
      • Output parameters
        • global_lock: object. The generated read-lock.
        • result: boolean. True if successful, false otherwise.

    • LockGlobalWrite

      Generate a write-lock for a specific key.

      • Input parameters
        • key: string. The identifier of the write-lock.
      • Output parameters
        • global_lock: object. The generated write-lock.
        • result: boolean. True if successful, false otherwise.

    • UnLock

      Release an acquired lock.

      • Input parameters
        • global_lock: object. The lock object to be released. Applications should never provide this parameter manually; a LockGlobalWrite or LockGlobalRead response must precede this response to produce it.
      • Output parameters
        • result: boolean. True if successful, false otherwise.
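      The lock semantics described under LockGlobalRead are those of a readers-writer lock kept per key. The single-process sketch below illustrates them with Python's threading.Condition; Wiera's locks are global across instances, which this sketch does not attempt. KeyLocks and the tuple used as the global_lock object are hypothetical. As written, a steady stream of readers can starve a waiting writer.

```python
import threading


class KeyLocks:
    """Per-key readers-writer locks within one process: many readers or one writer."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers: dict[str, int] = {}  # key -> number of read-lock holders
        self._writers: set[str] = set()     # keys currently write-locked

    def lock_read(self, key: str) -> tuple[str, str]:
        with self._cond:
            while key in self._writers:     # readers wait only for a writer
                self._cond.wait()
            self._readers[key] = self._readers.get(key, 0) + 1
        return ("read", key)

    def lock_write(self, key: str) -> tuple[str, str]:
        with self._cond:
            # A writer waits until nobody holds the key in either mode.
            while key in self._writers or self._readers.get(key, 0) > 0:
                self._cond.wait()
            self._writers.add(key)
        return ("write", key)

    def unlock(self, global_lock: tuple[str, str]) -> None:
        mode, key = global_lock
        with self._cond:
            if mode == "read":
                self._readers[key] -= 1
                if self._readers[key] == 0:
                    del self._readers[key]
            else:
                self._writers.discard(key)
            self._cond.notify_all()  # wake waiters so they can re-check their condition
```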

  3. Tier types

    The tier type indicates the storage media. The following tier types are provided.

    Value   Storage type
    0       Memory
    1       SSD
    2       HDD
    3       Cloud Storage
    4       Cloud Archival
    5       Wiera Instance

  4. Cloud Storage Interface

    The cloud storage type is used to indicate the cloud storage interface. Users can also define their own customized storage interfaces. See how to define the storage interface.

    Value   Cloud storage type
    0       Amazon Simple Storage Service (S3)
    1       Microsoft Azure Storage
    2       Google Cloud Storage

  5. Query Type

    The query type indicates the common attribute of data. The SearchKeys response finds all the keys of the data that satisfy the query condition.

    Query type   Description
    oldest       The key of the data with the oldest modification.
    newest       The key of the data with the newest modification.
    all          The keys of all the data.
    dirty        The keys of the data that differ between two tiers.
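    The query types can be illustrated on an in-memory tier that maps each key to a (value, last_modified_time) pair, a hypothetical layout. For dirty, the sketch assumes that a key is dirty when it is missing from, or holds a different value in, a second tier passed as other; the guide says only that dirty data differs between two tiers. search_keys() is a hypothetical helper.

```python
from typing import Optional

# Hypothetical tier layout: key -> (value, last_modified_time).
Tier = dict[str, tuple[str, float]]


def search_keys(tier: Tier, query_type: list[str], other: Optional[Tier] = None) -> list[str]:
    """Return the keys in `tier` that match the single query in `query_type`."""
    (query,) = query_type  # one query per array is documented so far
    if query == "all":
        return sorted(tier)
    if query in ("oldest", "newest"):
        if not tier:
            return []
        pick = min if query == "oldest" else max
        return [pick(tier, key=lambda k: tier[k][1])]
    if query == "dirty":
        # Assumption: dirty means missing from, or holding a different value in, `other`.
        if other is None:
            raise ValueError("dirty needs a second tier to compare against")
        return sorted(k for k in tier if k not in other or other[k][0] != tier[k][0])
    raise ValueError(f"unsupported query type {query!r}")
```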

  6. Time format

    Time format   Description
    "120h"        120 hours
    "120"         120 seconds