Valhalla Legends Forums Archive | General Programming | Draft: Pointer Education

Author | Message | Time
JoeTheOdd
I just tossed this together over about an hour, and thought you guys might be interested.

OpenOffice
RTF
January 4, 2006, 1:20 AM
Myndfyr
A lot of this is factually inaccurate.

[quote]
Well, first things first. What is a pointer? In most simple terms, its a number.. a number that points. On 32-bit systems, a pointer is a 32-bit unsigned integer value, commonly referred to as a DWORD. On older, 16-bit systems, its a 16-bit unsigned integer, a WORD. On newer 64-bit systems, its a QWORD. A pointer points to a location in memory where the data it corresponds with is placed at.
[/quote]
This is inaccurate; pointer size is determined by the operating system with general limits based on the architecture of the system.  Windows 3.1, for example, ran on 32-bit machines such as the Intel 80386, but still used 16-bit pointers, as did all real-mode DOS applications. 
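
As a rough illustration of that point, the sketch below reports the pointer width of whatever target it is compiled for, rather than of the CPU it happens to run on. The helper name pointerBits is made up for this example.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Width, in bits, of a data pointer for the target this file was compiled for.
// pointerBits is a hypothetical helper name, not part of any library.
constexpr std::size_t pointerBits() {
  return sizeof(void*) * CHAR_BIT;
}
```

Built as a 32-bit program this typically yields 32, and built as a 64-bit program it typically yields 64, even when both builds run on the same machine.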

Furthermore, in older systems such as the Intel 8088, 8086, and 80186, pointers were not always limited to 16-bit; these systems had a 20-bit address bus, and in order to access the full megabyte of memory afforded to them, memory was segmented.  To access memory in segments beyond the code and data segments (CS and DS respectively), the processor used two 16-bit pointers to specify the segment and offset by left-shifting the segment register by four bits and adding it to the offset, effectively creating a 32-bit pointer system that only allowed access to 20 bits worth of memory.  32-bit pointers were also called "far" pointers in reference to the fact that they were used to access memory in another memory segment.  The notation for memory access in this manner was segment-register:offset, where offset could be a constant or register, and segment-register could be ES, DS, CS, or SS.  The location of the current instruction was specified at CS:IP.
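
To make the segment arithmetic concrete, here is a minimal sketch of the calculation described above. The function name linearAddress is an assumption for illustration, and the 20-bit mask models the wrap-around of the 8086/8088 address bus.

```cpp
#include <cstdint>

// Real-mode address calculation on the 8086/8088:
// shift the 16-bit segment left by four bits, add the 16-bit offset,
// and keep only the 20 bits that fit on the address bus.
// linearAddress is a hypothetical helper name.
std::uint32_t linearAddress(std::uint16_t segment, std::uint16_t offset) {
  std::uint32_t shifted = static_cast<std::uint32_t>(segment) << 4;
  return (shifted + offset) & 0xFFFFFu;
}
```

For instance, linearAddress(0xB800, 0x0000) gives 0xB8000, and both 0x1234:0x0005 and 0x1000:0x2345 give 0x12345, which shows that many segment:offset pairs name the same byte.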

[quote]
Although strings can be passed by value (VB: ByVal), they are almost always passed by reference (VB: ByRef), due to multiple reasons, including messing up WORD alignment on the stack. I haven't scoured the documents looking for disproof of this, but if I am correct, a string is never passed by reference in any Win32 API call.
[/quote]
Strings cannot be passed by value unless they are the size of a WORD, DWORD, or QWORD.  The string pointer is passed by value.

In calling conventions, passing by value means that the data is copied to the stack where it is accessed by the function; modifying the data via the name of the parameter within the function only modifies the data on the stack.  OTOH, passing by reference specifies passing an indirection on the stack, which is effectively passing a pointer.  Modifying the data of a reference means that the value of the original value will change also.

For example:
[code]
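#include <iostream>
using namespace std;

// Pass by value: the function gets its own copy of a.
void doNothing(int a) {
  a = 5;
}
// Pass by reference: a is an alias for the caller's variable.
void doSomething(int &a) {
  a = 10;
}
// Pass a pointer (by value): the caller's variable is reached by dereferencing.
void doSomethingElse(int *a) {
  *a = 20;
}
int main(void)
{
  int a;

  a = 2;
  cout << a << endl; // output: 2
  doNothing(a);
  cout << a << endl; // output: 2
  doSomething(a);    // a reference parameter takes the variable itself, not &a
  cout << a << endl; // output: 10
  doSomethingElse(&a);
  cout << a << endl; // output: 20

  return 0;
}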
[/code]

As you can see, the pointer and reference functions behave identically; both change the original data.  That brings us to the one distinction between pointers and references: the location in memory pointed to by a pointer can change, whereas the location referred to by a reference cannot.  In the reference example, we assigned the value 10 directly to the variable with no explicit indirection, yet indirection was implied.  OTOH, if we were to assign the value 20 to a when a was a pointer (as opposed to a reference), we would be changing *where* the pointer pointed instead of the value; a C++ compiler rejects that assignment without a cast, and forcing it through with one would likely produce a runtime error the next time the pointer was dereferenced.  This semantic difference is understood by the compiler, and in practice, both functions should generate effectively the same code (of course, with the constant differences).
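
The reseating distinction can be sketched as follows. The struct and function names are invented for this example; the point is only that assigning to a pointer moves it, while assigning to a reference writes through it.

```cpp
// Field and function names here are made up for the example.
struct ReseatResult {
  int x;
  int y;
  bool pointerMoved;        // did p end up pointing at y?
  bool referenceStillOnX;   // does r still alias x?
};

ReseatResult reseatDemo() {
  int x = 1;
  int y = 2;

  int* p = &x;
  p = &y;        // reseats the pointer: p now points at y, x is untouched
  *p = 20;       // writes through the pointer, so y becomes 20

  int& r = x;
  r = y;         // does NOT rebind r; it copies y's value (20) into x
  y = 30;        // x stays 20, which shows r never became an alias for y

  return { x, y, p == &y, &r == &x };
}
```

After the call, x is 20 and y is 30: the pointer followed the reassignment, and the reference stayed bound to x for its whole lifetime.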

[quote]Pointers are simply a location in memory, and as such, they don't point to a specific data type.[/quote]
That is false, depending on whom you ask.  In C, this practice is common; in C++, strong type-casting was added for stronger type checking by the compiler.  However, pointer variables are typed, and while they can be recast so that memory is interpreted differently, the variables themselves have type.
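
One place where the type of a pointer variable visibly matters is pointer arithmetic. The sketch below (strideOf is a hypothetical helper, not a standard function) measures how many bytes the address moves when 1 is added to a pointer of a given type.

```cpp
#include <cstddef>

// Number of bytes the address advances when 1 is added to a T*.
// strideOf is a made-up name used only for this illustration.
template <typename T>
std::size_t strideOf() {
  T items[2] = {};
  const char* first = reinterpret_cast<const char*>(&items[0]);
  const char* next  = reinterpret_cast<const char*>(&items[0] + 1);
  return static_cast<std::size_t>(next - first);
}
```

strideOf<char>() is 1, while strideOf<int>() equals sizeof(int); the raw address is just a number, but the compiler uses the pointer's type to decide how far to step and how many bytes a dereference reads.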

[quote]In the C and C++ languages, you can get the pointer of a variable by adding an ampersand to the beginning of its name, so that a variable named m_lMyLong would become &m_lMyLong[/quote]
That is not necessarily true.  To be precise, one adds the address-of operator to the expression.  The address-of operator only works on rvalues; when applied to an lvalue, it creates a reference.  For instance:
[code]
int &someReference = getMeAPointer;
[/code]
doesn't set someReference as a pointer to getMeAPointer (because you would have to dereference it with the dereference operator, *).  It rather sets someReference as a reference to getMeAPointer.  As stated earlier, a semantic difference, but still an important one.

[quote]For example, the StarCraft Installer's CD-Key verification process has the pointer in ecx, and loops 12 times, each time incrementing eax and doing some math stuff to the byte in the memory location [ecx+eax].[/quote]
I don't have the disassembly in front of me, but I doubt that this is the case.  ECX is, by convention and as defined by Intel, the counter register; the base address (the string pointer) is much more likely in EAX with ECX being the indexer.  As I said though, I do not have the disassembly in front of me, so I do not have a basis for factual dispute, except that this would violate convention.
January 5, 2006, 6:27 PM
JoeTheOdd
iago's thread on the assembly thing seems to have suddenly disappeared. Anyhow, I had that in front of me when I wrote this.

I'll have to contact you on AIM about byval string passing, because I don't quite understand that. =/
January 6, 2006, 12:01 AM
JoeTheOdd
Changes:

Revised second paragraph
[quote] Well, first things first. What is a pointer? In most simple terms, its a number.. a number that points. At the current point in time, most people use a 32-bit operating system, that is, NT kernels between 4.0 and 5.1 (Windows Whistler/XP), and non-NT kernels 95 through ME. Alternatively, you might be using a 64-bit edition of Windows NT 5.1 or Windows Longhorn/Vista. Chances are nobody uses DOS or Windows 3.11 anymore, but for completeness, they are 16-bit systems. The length of a pointer is the same number of bits as your system is described as. In fact, when you say you are using a 32-bit OS, you are saying that your OS uses 32-bit addressing. The 8080 had a feature called 20-bit addressing, which allowed the OS to use 20-bit memory addresses instead of 16-bits.

If you've ever wondered why you can't run Windows XP on an 8080 processor, and just assumed it was because it would be too slow (it would), you aren't totally right. An operating system isn't the only limiting factor in how long your addresses can be. The 8080 processor was limited to 16-bits (or 20-bits, with 20-bit addressing). The 80x86_32 (or just 80x86) is limited to 32. The 80x86_64 processor is the only one that supports 64-bit addressing.[/quote]

Properly labeled "Pointers in Assembly" as an appendix.
January 6, 2006, 12:12 AM
Myndfyr
[quote author=Joe link=topic=13767.msg140709#msg140709 date=1136505704]
I'll have to contact you on AIM about byval string passing, because I don't quite understand that. =/
[/quote]

"ByVal" string passing occurs when you copy the pointer to the string to the stack.  This means that the pointer to the string in the original scope will not change if the pointer to the string changes in the callee.  It is essentially double indirection.

For example, this is a ByVal string function:
[code]
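#include <iostream>
using namespace std;
// The pointer itself is copied onto the stack; the characters are not.
void printSomething(char* mystring) {
  cout << mystring << endl;
}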
[/code]
This is a ByRef example:
[code]
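#include <iostream>
using namespace std;

void changeStringByVal(char* stringData) {
  *stringData = 'G'; // writes through the copied pointer; the caller's pointer is unchanged
}
void changeStringByRef(char** stringData) {
  static char hello[] = "Hello, world!";
  *stringData = hello; // repoints the caller's pointer at a different array
}
int main()
{
  char buffer[] = "Blarg!"; // writable array; modifying a string literal is undefined behavior
  char* string1 = buffer;
  // Cast to void* so the stream prints the address held in string1, not the text.
  cout << static_cast<void*>(string1) << ": " << string1 << endl; // outputs some memory address: Blarg!
  changeStringByVal(string1);
  cout << static_cast<void*>(string1) << ": " << string1 << endl; // outputs same memory address: Glarg!
  changeStringByRef(&string1);
  cout << static_cast<void*>(string1) << ": " << string1 << endl; // outputs a different memory address: Hello, world!
  return 0;
}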
[/code]

Get it now?
January 6, 2006, 12:43 AM
Null
so why even bother posting all this crap when its not even correct? there are other ways to get kudo's you know....
January 6, 2006, 1:36 AM
JoeTheOdd
[quote author=effect link=topic=13767.msg140732#msg140732 date=1136511365]
so why even bother posting all this crap when its not even correct? there are other ways to get kudo's you know....
[/quote]Leave my topic. Now. Bye. =)

[quote]Get it now?[/quote]
ByVal string passing is a pointer to a pointer to a string.
ByRef string passing is a pointer to a string.
Right?
January 6, 2006, 1:39 AM
Myndfyr
[quote author=Joe link=topic=13767.msg140733#msg140733 date=1136511555]
[quote author=effect link=topic=13767.msg140732#msg140732 date=1136511365]
so why even bother posting all this crap when its not even correct? there are other ways to get kudo's you know....
[/quote]Leave my topic. Now. Bye. =)

[quote]Get it now?[/quote]
ByVal string passing is a pointer to a pointer to a string.
ByRef string passing is a pointer to a string.
Right?
[/quote]

No.  A pointer to a character array is a string.  This is ByVal, and the value of the pointer to the character array is copied to the stack.

ByRef is essentially a pointer to a string, which is a pointer to a pointer to a character array.  The value of a pointer to the original function's value is copied to the stack, which means that the original function's value can be modified by dereferencing 1 level.

[quote author=Joe link=topic=13767.msg140714#msg140714 date=1136506368]
Changes:

Revised second paragraph
[quote] Well, first things first. What is a pointer? In most simple terms, its a number.. a number that points. At the current point in time, most people use a 32-bit operating system, that is, NT kernels between 4.0 and 5.1 (Windows Whistler/XP), and non-NT kernels 95 through ME. Alternatively, you might be using a 64-bit edition of Windows NT 5.1 or Windows Longhorn/Vista. Chances are nobody uses DOS or Windows 3.11 anymore, but for completeness, they are 16-bit systems. The length of a pointer is the same number of bits as your system is described as. In fact, when you say you are using a 32-bit OS, you are saying that your OS uses 32-bit addressing. The 8080 had a feature called 20-bit addressing, which allowed the OS to use 20-bit memory addresses instead of 16-bits.
[/quote][/quote]
This is incorrect.  The 8086, 8088, and 80186 had 20-bit addressing based on segmentation.  Also, saying that Win9x-based operating systems are fully-32-bit is inaccurate; they still support real-mode 16-bit drivers.

[quote author=Joe link=topic=13767.msg140714#msg140714 date=1136506368][quote]
If you've ever wondered why you can't run Windows XP on an 8080 processor, and just assumed it was because it would be too slow (it would), you aren't totally right. An operating system isn't the only limiting factor in how long your addresses can be. The 8080 processor was limited to 16-bits (or 20-bits, with 20-bit addressing). The 80x86_32 (or just 80x86) is limited to 32. The 80x86_64 processor is the only one that supports 64-bit addressing.[/quote]

Properly labeled "Pointers in Assembly" as an appendix.
[/quote]
This is also incorrect.  First of all, I seriously doubt anyone has wondered why they can't run Windows XP on an 8080 processor.  Second, as stated before, an 8080 did not have 20 address lines.  Third, there is no such nomenclature as 80x86_32.  All 80x86 processors since the 80386 have been 32-bit.  The 64-bit architectures are called x64 (AMD's AMD64, which Intel also implements) and IA-64 (Intel's Itanium architecture).  x64 is an extension of the 80x86 instruction set, but IA-64 is incompatible with 80x86 at the instruction set level; therefore, it is incorrect to lump the two together as "80x86_64".
January 6, 2006, 7:16 AM
Quarantine
Okay, here's a clear way to put it:

ByVal passes the VALUE, not the VARIABLE, to the function
ByRef passes the VARIABLE to the function

Now say you pass a variable ByRef to a function; then whatever you do to that parameter affects the variable you passed as the argument.
January 6, 2006, 7:31 AM
iago
The tricky part is that a string is always a pointer.  So passing a string ByVal still passes the reference. 

[quote]
Well, first things first. What is a pointer? In most simple terms, its a number.. a number that points. On 32-bit systems, a pointer is a 32-bit unsigned integer value, commonly referred to as a DWORD. On older, 16-bit systems, its a 16-bit unsigned integer, a WORD. On newer 64-bit systems, its a QWORD. A pointer points to a location in memory where the data it corresponds with is placed at.
[/quote]
I really hate explaining this, because Windows has been using the wrong word and messed up a lot of people.  A "word" is NOT 16-bits.  A "word" is a variable type that has the same length as the processor supports, which is 16-bit on old systems, 32-bit on modern systems, and 64-bit on new systems.  Windows kept calling 16-bits a word on 32-bit platforms for reverse compatibility with software that wasn't written right to begin with.


Mynd -- you're being overly picky in some places. 
[quote author=MyndFyre link=topic=13767.msg140633#msg140633 date=1136485632]
[quote]Pointers are simply a location in memory, and as such, they don't point to a specific data type.[/quote]
That is false, depending on whom you ask.  In C, this practice is common; in C++, strong type-casting was added for stronger type checking by the compiler.  However, pointer variables are typed, and while they can be recast so that memory is interpreted differently, the variables themselves have type.
[/quote]
There's nothing false about that.  Pointers aren't, in general, strongly typed.  Data stored in memory obviously has type, everything stored in memory does, but who cares?  The pointers themselves don't have type.


[quote author=MyndFyre link=topic=13767.msg140633#msg140633 date=1136485632]
[quote]In the C and C++ languages, you can get the pointer of a variable by adding an ampersand to the beginning of its name, so that a variable named m_lMyLong would become &m_lMyLong[/quote]
That is not necessarily true.  To be precise, one adds the address-of operator to the expression.  The address-of operator only works on rvalues; when applied to an lvalue, it creates a reference.  For instance:
[code]
int &someReference = getMeAPointer;
[/code]
doesn't set someReference as a pointer to getMeAPointer (because you would have to dereference it with the dereference operator, *).  It rather sets someReference as a reference to getMeAPointer.  As stated earlier, a semantic difference, but still an important one.
[/quote]
First of all, the address-of operator IS &.  That's like saying, "1 + 5 doesn't use a plus to add values together, it's the addition operator!"  Pointing that out is just being picky. 

Second, the address-of operator DOES return a pointer.  It's perfectly valid to do this:
[code]int a = 5;
*&a = 6;
[/code]
In fact, this also works, if for some reason you wanted to:
[code]*&*&*&*&*&*&*&*&*&a = 6;[/code]
You can't assign an address with it, but nowhere does Joe say you can assign an address.  What he says is completely true; you're talking about an entirely different situation in your code, which just confuses the issue.

[quote author=MyndFyre link=topic=13767.msg140633#msg140633 date=1136485632]
[quote]For example, the StarCraft Installer's CD-Key verification process has the pointer in ecx, and loops 12 times, each time incrementing eax and doing some math stuff to the byte in the memory location [ecx+eax].[/quote]
I don't have the disassembly in front of me, but I doubt that this is the case.  ECX is, by convention and as defined by Intel, the counter register; the base address (the string pointer) is much more likely in EAX with ECX being the indexer.  As I said though, I do not have the disassembly in front of me, so I do not have a basis for factual dispute, except that this would violate convention.
[/quote]
I believe Joe made a mistake there.  Here is the code, although it should still be on the forum:
http://www.javaop.com/~iago/cdkey1.asm
But in any case, it doesn't matter.  What the variables are called has absolutely no relevance to the point of that section. 
January 7, 2006, 7:25 PM
Skywing
[quote author=iago link=topic=13767.msg140959#msg140959 date=1136661903]
The tricky part is that a string is always a pointer.  So passing a string ByVal still passes the reference. 
I really hate explaining this, because Windows has been using the wrong word and messed up a lot of people.  A "word" is NOT 16-bits.  A "word" is a variable type that has the same length as the processor supports, which is 16-bit on old systems, 32-bit on modern systems, and 64-bit on new systems.  Windows kept calling 16-bits a word on 32-bit platforms for reverse compatibility with software that wasn't written right to begin with.
[/quote]

Well, actually, at the time that name was correct.  Back in 16-bit Windows, WORD == unsigned short == the native integer size for the processor.  The names WORD/DWORD stick around because people depended on WORD being a 16-bit value and DWORD a 32-bit one (e.g. used them in structures written to files that had to be written by a 16-bit app and read by a 32-bit app, or something of that sort).  If the definition in the headers had been upped to 32 bits, then this would have broken.  Microsoft opted to go for ease of portability for developers porting 16-bit apps to 32-bit Windows.
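
The file-format concern can be sketched like this. Word16, Dword32, and RecordHeader are invented stand-ins (not the real Win32 typedefs), pinned to fixed widths so a record written by one build can be read by another. The sketch assumes 8-bit bytes and the usual alignment rules of mainstream compilers.

```cpp
#include <cstdint>

// Hypothetical stand-ins for WORD and DWORD, pinned to fixed widths so they
// mean the same thing no matter what the native integer size is.
typedef std::uint16_t Word16;
typedef std::uint32_t Dword32;

// A made-up on-disk record header; the field names are for illustration only.
struct RecordHeader {
  Word16  version;
  Word16  flags;
  Dword32 length;
};

// If any of these sizes changed between builds, old files would stop parsing.
static_assert(sizeof(Word16) == 2, "Word16 must be 16 bits");
static_assert(sizeof(Dword32) == 4, "Dword32 must be 32 bits");
static_assert(sizeof(RecordHeader) == 8, "header layout must stay 8 bytes");
```

Had the 16-bit field silently grown to 32 bits on a newer platform, the header would no longer be 8 bytes and every existing file would be misread, which is the breakage described above.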
January 7, 2006, 7:41 PM
iago
[quote author=Skywing link=topic=13767.msg140960#msg140960 date=1136662876]
[quote author=iago link=topic=13767.msg140959#msg140959 date=1136661903]
The tricky part is that a string is always a pointer.  So passing a string ByVal still passes the reference. 
I really hate explaining this, because Windows has been using the wrong word and messed up a lot of people.  A "word" is NOT 16-bits.  A "word" is a variable type that has the same length as the processor supports, which is 16-bit on old systems, 32-bit on modern systems, and 64-bit on new systems.  Windows kept calling 16-bits a word on 32-bit platforms for reverse compatibility with software that wasn't written right to begin with.
[/quote]

Well, actually, at the time that name was correct.  Back in 16-bit Windows, WORD == unsigned short == the native integer size for the processor.  The names WORD/DWORD stick around because people depended on WORD being a 16-bit value and DWORD a 32-bit one (e.g. used them in structures written to files that had to be written by a 16-bit app and read by a 32-bit app, or something of that sort).  If the definition in the headers had been upped to 32 bits, then this would have broken.  Microsoft opted to go for ease of portability for developers porting 16-bit apps to 32-bit Windows.
[/quote]

Yeah, but as a result, the name WORD refers to different (unpredictable) values on different systems. 
January 7, 2006, 8:34 PM
Myndfyr
[quote author=iago link=topic=13767.msg140959#msg140959 date=1136661903]
[quote author=MyndFyre link=topic=13767.msg140633#msg140633 date=1136485632]
That is not necessarily true.  To be precise, one adds the address-of operator to the expression.  The address-of operator only works on rvalues; when applied to an lvalue, it creates a reference.  For instance:
[code]
int &someReference = getMeAPointer;
[/code]
doesn't set someReference as a pointer to getMeAPointer (because you would have to dereference it with the dereference operator, *).  It rather sets someReference as a reference to getMeAPointer.  As stated earlier, a semantic difference, but still an important one.
[/quote]
First of all, the address-of operator IS &.  That's like saying, "1 + 5 doesn't use a plus to add values together, it's the addition operator!"  Pointing that out is just being picky. 
[/quote]
No, I don't mean to be picky.  & does double duty: in a declaration it marks a reference, and in an expression it is the address-of operator.  I was trying to be precise: Joe said this:
[quote]In the C and C++ languages, you can get the pointer of a variable by adding an ampersand to the beginning of its name, so that a variable named m_lMyLong would become &m_lMyLong[/quote]
By his statement, this code:
[code]
int &m_lMyLong;
[/code]
would create a reference variable.  As I said, I was not trying to be picky, but precise.
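
A minimal sketch of the two meanings side by side, assuming a local variable named after the one in the draft (the function name ampersandDemo is made up). Note that a reference has to be initialized where it is declared, so a bare declaration with no initializer would not compile on its own.

```cpp
// Both uses of '&' in one place; ampersandDemo is a hypothetical name.
bool ampersandDemo() {
  long m_lMyLong = 5;

  long* pointer = &m_lMyLong;    // '&' in an expression: address-of, yields a long*
  long& reference = m_lMyLong;   // '&' in a declaration: declares a reference

  *pointer = 6;                  // explicit dereference needed through the pointer
  reference += 1;                // no dereference syntax; m_lMyLong is now 7

  // Taking the address of the reference gives the address of m_lMyLong itself.
  return m_lMyLong == 7 && pointer == &reference;
}
```

Both names end up reaching the same storage; the difference is in the syntax needed to get there and in whether the name can later be pointed somewhere else.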

[quote author=iago link=topic=13767.msg140959#msg140959 date=1136661903]
What the variables are called has absolutely no relevance to the point of that section. 
[/quote]
Right, but it's supposed to be a reference document.  Would you trust a textbook that was essentially correct but factually inaccurate on details?
January 8, 2006, 10:14 AM
iago
The whole quote is turning into a huge headache, so screw it.

Well then, change my example from + to -.  The - operator could be used as -a or as b - a.  But I don't see anybody calling it the "binary subtraction operator" or "unary subtraction operator" when they're describing it. 

Yes, I would trust a book like that.  For example, in The Shellcoder's Handbook, they get a register backwards in one of their examples, and in another place they had a section about how the "epb" register is used.  And I still trust the book and consider it the best book ever written on the subject.  And that's an integral part of the book, unlike Joe's that was a passing reference. 
January 8, 2006, 10:46 PM
