Valhalla Legends Forums Archive | Advanced Programming | Internal variable system.

AuthorMessageTime
EvilCheese
As part of the functionality of my current project, I would like to allow both the user and program itself to declare and use an arbitrary number of internal variables represented by ASCII strings.

These variables should be able to hold values of types Integer, Float, and String, and should be organised hierarchically, example usages are:

Somerootvariable.somesubvariable.someinteger = 5 (integer assign)
Somerootvariable.somesubvariable.somestring = "This is a string."

print Somerootvariable.somesubvariable.somestring
>This is a string.

Some of these values will be used in performance dependant areas of code, and as such should resolve to their values in the shortest time possible, whilst creating the minimum required overhead to manage.

What would be the best way to implement such a system?
June 26, 2003, 2:22 PM
Eibro
I'm not sure I understand what you're asking, but couldn't you use a union?

[code]union Variable
{
std::string varString;
int varInt;
double varDouble;
// etc ...
};[/code]
June 26, 2003, 5:00 PM
EvilCheese
A union on its own will not suffice, for a few reasons:

1- The variables need to be able to be manipulated via the application's internal console, hence they need to be both creatable and referencable via actual text-based names. The examples I included above are designed to be typed into the application's console.

Furthermore, it is not possible to use static pre-hashing, since the user should be able to create a variable with ANY name during program execution.

2- The variables need to support hierarchy, which means including the potential for each variable to reference multiple child variables as well as parent variables.

3- The variable values need to be both assigned and read via a common interface, and remain accessible throughout the entire run-time of the program. Implicit type-conversion of some kind is required.

June 26, 2003, 6:04 PM
Eibro
[code]#include <map>
#include <string>

union Variable;

typedef std::map<std::string, Variable> VarList;

union Variable
{
enum { VAR_STRING, VAR_INT, VAR_DOUBLE, VAR_ETC } MyType;
std::string varString;
int varInt;
double varDouble;
// etc ...

VarList childVars;
};[/code]What about that? Not sure if a union will work; that might have to be changed to a struct with an internal union. Also, childVars might have to be a pointer... not sure.

Actually, I don't think that will work. Perhaps something like this:
[code]#include <map>
#include <string>

struct Variable;

typedef std::map<std::string, Variable> VarList;

struct Variable
{
enum { VAR_STRING, VAR_INT, VAR_DOUBLE, VAR_ETC } MyType;
union
{
std::string varString;
int varInt;
double varDouble;
// etc ...
} VarCollection;
VarList childVars;
};[/code]
June 26, 2003, 6:40 PM
EvilCheese
Looks promising so far.

I have to say so far yours is looking a little neater than mine (though that could just be my STL phobia).

I'm using a class CVariable which offers various overloaded assignment operators and evaulators, as well as the ability to recursively search within a particular child tree.

I'm wrapping a dynamically sized list of those within a class CVariableManager, which allows transparent and type-safe use of the contained variables.

Hmmm, how would you access a specific variable from low on the tree using your implementation?

Say you get given "System.Camera1.FOV", what method would you use to resolve it to its value, and how would you implement situations where an entire tree needs to be created at once (a child variable is assigned on a currently non-existing tree)?
June 26, 2003, 8:08 PM
Eibro
Input = "System.Camera1.FOV"
So, we'd go something like this:
[code]VarList g_Base;

// ...
// Input = "System.Camera1.FOV"
// RHS = 1.25
// This would occour in a statement like System.Camera1.FOV = 1.25;
// ...

using std::string;
typedef string::size_type StringPos;

StringPos PreviousPeriod = 0, Period = Input.find(0, '.');
VarList& CurrentSub = g_Base;
while (Period != string::npos)
{
   string NextInputSub(Input.begin() + PreviosPeriod + 1, Input.begin() + Period);
   CurrentSub = CurrentSub.childVars[NextInputSub];

   Period = Input.find(PreviousPeriod + 1, '.');
}

switch (RHS.MyType)
{
case VAR_STRING:
   CurrentSub.VarCollection.varString = RHS.VarCollection.varString;
   break;
case VAR_INT:
   CurrentSub.VarCollection.varInt = RHS.VarCollection.varInt;
   break;
case VAR_DOUBLE:
   CurrentSub.VarCollection.varDouble = RHS.VarCollection.varDouble;
   break;
case VAR_ETC:
   // Etc ...
   break;
};[/code]You get the idea. You'll need to parse the RHS variable to see what type it is, and set MyType accordingly beforehand. Like, if (RHS.find('.') == string::npos && std::find_if(RHS.begin(), RHS.end(), isalpha) == RHS.end()) // RHS is probably type VAR_DOUBLE
June 26, 2003, 8:40 PM
Eibro
Thought of some more stuff. You might want to add VAR_NVAR to the MyType enum to signify that it's not a variable, or hasn't been assigned a value as of yet. Then, adding a default constructor you could initially create all variables as VAR_NVAR[code]struct Variable
{
Variable() : MyType(VAR_NVAR) {};

enum { VAR_NVAR, VAR_STRING, VAR_INT, VAR_DOUBLE, VAR_ETC } MyType;
union
{
std::string varString;
int varInt;
double varDouble;
// etc ...
} VarCollection;
VarList childVars;
};[/code]
In your assignment code you could check whether two variables are of the same type. If they're of differing types, then convert the RHS variable to the LHS variable type.
[code]if (CurrentSub.VarCollection.MyType != VAR_NVAR && CurrentSub.VarCollection.MyType != RHS.MyType)
{
// Variables are of differing types
// Convert RHS to CurrentSub.VarCollection.MyType and do assignment
}[/code]Another thought, you might want to make childVars a pointer to a VarList. This would compilcate assignments and querys though as you'd need to check CurrentSub.childVars for NULL-ness before traversing it. Also, you'd need to set childVars to NULL in the contstuctor, and delete it in the destructor.
June 26, 2003, 9:21 PM
Skywing
I'm not sure if the compiler will even let you declare objects which require a constructor/destructor in a union. It really doesn't make sense, because the compiler doesn't know when it should call constructors and destructors for each object. You could use pointers, but I still think it's an ugly idea ;)

My recommendation would be to use a base class for Variable and derive from it for each type, if you're set on that kind of model.
June 27, 2003, 12:47 AM
Eibro
[quote author=Skywing link=board=23;threadid=1704;start=0#msg13037 date=1056674863]
I'm not sure if the compiler will even let you declare objects which require a constructor/destructor in a union. It really doesn't make sense, because the compiler doesn't know when it should call constructors and destructors for each object. You could use pointers, but I still think it's an ugly idea ;)

My recommendation would be to use a base class for Variable and derive from it for each type, if you're set on that kind of model.
[/quote]Tis true, you can't declare any type which has a constructor/destructor inside a union. Also, my compiler complained about 'childVars' not being a pointer, so I guess it has to be a pointer rather than optionally implementing it as one to save memory.

As for the std::string problem, move it outside the union and treat it as a special case (ugly)
Could make it a string pointer (ugly)
Could make it a char array (a bit better, but imposes an upper-limit on string size)
Or, make it a char* (must allocate/reallocate in a transparent fashion, better than a char array though)

Infact, it might be a good idea to implement each variable as a pointer. That way you could have an array of any type desired: System.Camera[1].FOV instead of System.Camera1.FOV

I thought about an OO approach (base class Variable) with derived classes for each variable type, but how would you implement storage for each variable type? a void*?
June 27, 2003, 1:25 AM
Adron
Do you really need them to be hierarchical? Just by allowing dots in the names you could have them seem hierarchical but still store them all in the same list. Unless you're allowing references (pointers) to variables to be stored in variables, I don't see what difference actually storing them hierarchically would make.

Storage for each type of derived class is not a problem - a derived class can be (and normally is) larger than the base class. You just add the member variables you need for the particular variable type.

If variables can change types (say at an assignment) and you allow references to variables, I see trouble with using a base class and subclasses for each type, since you can't dynamically change between the subclasses without deleting and making a new variable. Which would invalidate the references.

I'd solve the "no class with constructor in union" problem by storing a std::string* instead. When you're working with strings you'll be doing allocations anyway, then you might as well allocate the string object too.


June 27, 2003, 2:03 PM
EvilCheese
[quote]
Do you really need them to be hierarchical? Just by allowing dots in the names you could have them seem hierarchical but still store them all in the same list. Unless you're allowing references (pointers) to variables to be stored in variables, I don't see what difference actually storing them hierarchically would make.
[/quote]

That's almost how I'm handling it, the CVariableManager class stores a flat dynamically sized array of pointers to variables which share a common interface for allocation/deallocation, data retrieval and evaluation.

The hierarchy effect is acheived through storing an array of pointers to children and a pointer to parent within each variable.

Essentially, the hierarchy system is more for user convenience and a conceptual aid.... though with the recursive search method I use for variable retrieval, organisation as a tree dramatically cuts evaluation time for a specific variable, since only one branch at any level needs to be searched to find it.

In terms of organisation, I use 3 "top level" node variables: User, System, and Templates. All other declared variables are underneath one of these, this is all managed virtually transparently by the manager class and a set of macros for assignment/retrieval.

I'll probably post code when the project is at least mostly complete. :)
June 27, 2003, 3:19 PM
iago
Use a hashtable, hash the entire variable string (including the periods). Of course, that's just faking the heiarchical part, but meh.
June 27, 2003, 11:13 PM
Brolly
You could use a linked list, give each symbol name an ID, and resolve it from the ID.
June 30, 2003, 2:34 AM
herzog_zwei
Store everything into a hash map of variants using a string as the key. Create a variant class that'll handle the usual variables as well as a hash map. While you're resolving your variable hierarchy and encounter a variant type of VT_HASHMAP, you shift to the next node in the hierarchy and recursively call the resolution function using that hash map.

Hash maps are good with the right hash functions. Storing everything as a hash map lets you reuse the same code and you wouldn't have to deal with so many special cases and varying data structures.

The reason you'd want a hash is for speed in locating a single item. The reason you want the hierarchy is for speed in locating groups of items. You shouldn't consider faking the hierarchy if the reason you want to have a hierarchical variable name system is not just cosmetics but for access speeds.
July 1, 2003, 6:27 AM
Adron
Note that the hierarchical storage won't improve speed other than in special cases: If you use a hash map - never, if you use a c++ stl map - only if you get a reference to a node a ways down into the hierarchy and use it to find several of its child variables.
July 1, 2003, 9:06 PM
herzog_zwei
That really depends, but it'll improve in more situations than that. If you were talking about a perfect hash then, yes, it will definitely be slower. However, that generally wastes lots of memory and isn't what you'd want to use if you do a lot of dynamic adds/deletes of keys. In real life scenarios where you have lots of keys and want it to be dynamic, a hierarchical hash map can generally be faster than a regular hash map. Regular hash maps have static settings that work well up to a certain point (and aren't set to hog up as much storage space as perfect hashes). Once you exceed that (or you have a bad hash function), you will generate more colllisions. The hierarchy would reduce the amount of potential collisions you will get. Other benefits of a hierarchy include being able to cache the hash map for a node (so they don't need to be looked from the top level) and being able to iterate through all members of a classification quickly rather than having to iterate through all members of all classes.
July 2, 2003, 1:31 AM
EvilCheese
In a situation using some exhaustive lookup methodology, then a hierarchical system will obviously convey a lot of benefits merely by reducing the number of lookups you have to do to reach a given node on your tree.

However, I would tend to agree with Adron that if you are using a hashtable as lookup, that hierarchical arrangement would convey little in the way of obvious benefits, and would likely be slower in many situations, especially for trees with many levels, assuming you use true hierarchy.
July 3, 2003, 12:05 AM
Arta
I've recently implemented something like this for TestBNCS. It works by hashing the variable (Key) name, and saving that in a hashtable. Data is all stored as VOID* within nodes in the hashtable, the GetKey() function just returns a VOID* that can be cast to whatever - the caller is expected to know what type the variable they're grabbing is. Addition of keys to the system is accomplished with a set of overloaded functions that all call an AddKey(Key, (LPCVOID)&Value, ValueLength) function.

I like this system so far, it's fast and efficient. Currently, I'm using a large hashtable and am not allowing collisions - for performance - but in the future I might make the hashtables entries into linked lists, depending on the performance hit. I don't think there will be many collisions anyway, since there are several different sets of data - not all data is lumped into the same hashtable - so it should work out well.

We shall see :)
July 3, 2003, 8:09 PM
herzog_zwei
[quote author=Arta[vL] link=board=23;threadid=1704;start=15#msg13747 date=1057262941]
I like this system so far, it's fast and efficient. Currently, I'm using a large hashtable and am not allowing collisions - for performance
[/quote]

Yeah, hash maps are one of the most flexible ways of storing dynamic data and it's very efficient if you use good hash functions. Is the hash table static, did you implement a perfect hash, or are you just disallowing insertions of keys that would cause collisions?

[quote]
- but in the future I might make the hashtables entries into linked lists, depending on the performance hit.
[/quote]

You might want to consider using an expandable/collpasible array instead of linked lists since it'll give better performance overall. Pick a reasonable growth size for your dynamic array based upon what you think would be the average number of collisions your hash function will produce. Expand/collapse the array only when needed (based upon how many entries are being used in the array and what your growth size is) and your insertions/deletions won't suffer much of a hit. You'll definitely fragment the heap less and you gain the speed/ease of using an array.

If possible, you might also want to consider using enumerations or pointers (to static strings) as the key instead of full-blown dynamic strings.

Once you have all that going, you might want to have it use real variants instead of VOID*, just so you can call it complete (and possibly to deal with automatic allocation/deallocation of memory for certain types, something that a simple VOID* system isn't good at). You don't suffer much of a hit from it but it does make life easier. I'd avoid having a single overloaded function to Set/Get all types of data (other than a variant type) so that you can do something like SetFloat(key, 0) and turn it into a float instead of using Set(key, 0), where it can be a interpreted to be a any of a string/number/pointer/etc (not to mention C++ doesn't allow overloading of return types so Get would pose a problem). Your Get functions can then be made to do automatically coerce data when possible. Of course, automatic coercion is a source of bugs in most people's hands, so you'd have to weigh the risks/benefits and consider how it'd be used by others.
July 5, 2003, 5:00 PM
Arta
Collisions are a concern. I'm not very pleased with my current workaround - which is to create a hash array with vastly more elements than I'll ever need - so I will very likely address this problem. The hashing function I'm using is reputedly quite good, testing it is another thing on my list. I'm currently disallowing keys that would cause collisions, and quitting with a fatal error. This is acceptable only because this is a development version and would obviously never be released or used until that's been changed.

However, most of this is moot. There are currently no dynamically named keys, they are all hard-coded, and this is unlikely to change. The only reason I'm using this system instead of a struct (or similar) is that not all values will be needed or used. Thus, this seems like a more efficient use of memory than to create a struct containing all possible members, most of which will be unused at any one time.

I am already automatically allocating and freeing memory. Since all of the overloaded functions call AddKey(Key, (LPCVOID)&Value, ValueLength), memory is simply allocated according to the value of ValueLength and is freed when the key is removed. The problem you pose with regards to floats and other ambiguous data is easily solved by casting the value to be added so that the appropriate overloaded function will be called:

[code]
SetFloat(key, (float)0);
[/code]

This is also unlikely to be a problem, though. The only types commonly used are bool, DWORD, and char*, with an occasional FILETIME. In the unlikely event that anything else is needed, it can just be added by calling AddKey explicitly.
July 7, 2003, 5:43 PM
Adron
[quote author=Arta[vL] link=board=23;threadid=1704;start=15#msg14085 date=1057599828]
Collisions are a concern. I'm not very pleased with my current workaround - which is to create a hash array with vastly more elements than I'll ever need - so I will very likely address this problem.
[/quote]

To address that problem... The simplest problem is to store the colliding elements in sequence in the hash table. When you lookup a key, you'll first look in the hash slot for the key, then in the following slots until you find an empty slot. This eventually causes clustering.

Once method #1 isn't efficient enough, you can have a hash function with two inputs to produce the slot for colliding inputs, or use some other method to store them nonsequentially.
July 7, 2003, 8:47 PM
Arta
[quote author=Adron link=board=23;threadid=1704;start=15#msg14097 date=1057610869]
To address that problem... The simplest problem is to store the colliding elements in sequence in the hash table. When you lookup a key, you'll first look in the hash slot for the key, then in the following slots until you find an empty slot. This eventually causes clustering.
[/quote]

Oh, I'm already doing that. When i say I'm not allowing collisions, I mean of actual key name hashes, not of table indexes. 2 keys that create the same hashtable index will both be stored - 2 keys that create the same hash will be rejected - the reason being that when retrieving keys, it's much faster to compare the hashes than it is to compare the names.

Sorry, didn't explain that very well before.

July 7, 2003, 11:45 PM

Search