Valhalla Legends Forums Archive | Visual Basic Programming | problem with removing html

Author | Message | Time
laurion
I downloaded a file from PSCode that removes HTML from text, and I'm using it to strip the HTML from my AIM messages. [ http://www.Planet-Source-Code.com/vb/scripts/ShowCode.asp?txtCodeId=40698&lngWId=1 ] I made two textboxes, one called txt7 and one called txt8. I then did
[code]
'sets the received message into the textbox
Form1.txt7.Text = "" & CStr(A2) & " - " & CStr(A6)
'removes the html from the textbox
Form1.txt8.Text = RemoveHTML(Form1.label7.Text)
[/code]
The only problem is that the output isn't right. For instance, if the input is something like
[code]Joe - <HTML><BODY BGCOLOR="#ffffff"><FONT LANG="0">hello</FONT></BODY></HTML>
[/code]
for whatever reason the outcome is just [code] Joe - [/code]
any help?
When I try this program the way it's supposed to be used, by opening the text file and then converting the HTML to text, it works fine. But when I just copy the code into my project, it messes up.
BTW, this is VB6.
December 19, 2004, 1:27 AM
Myndfyr
So-- to be clear, you got code somewhere else and you want us to support it?

It seems like PSC lets you post comments, support requests, and the like on their page. Why don't you try there -- or even e-mail the author?
December 19, 2004, 9:19 AM
Quarantine
I still don't see why you can't just use Replace
December 19, 2004, 3:05 PM
St0rm.iD
[quote author=Warrior link=topic=9952.msg92936#msg92936 date=1103468759]
I still don't see why you can't just use Replace
[/quote]

Because that obviously wouldn't accomplish what he's trying to do.

s/<.*?>/ /
December 19, 2004, 4:43 PM
Mr. Neo
[quote author=Banana fanna fo fanna link=topic=9952.msg92938#msg92938 date=1103474603]
s/<.*?>/ /
[/quote]

This will not work as regular expressions are greedy and will replace everything starting from <HTML> to </HTML>.  This expression was the first one I thought of using and it turned out to work just a tad too well.  So, I went and created a little RemoveHTML function.

[code]
Function RemoveHTML(source As String) As String
  Dim a As RegEx
  Dim d As Integer
 
  a = New RegEx
  a.Options.DotMatchAll = True   ' let "." match line breaks too
  a.ReplacementPattern = ""      ' replace each tag with nothing
  a.Options.Greedy = False       ' match the shortest possible tag
 
  d = 1
  ' Keep stripping tags until no "<" is left in the string
  While d <> 0
    d = InStr(0, source, "<")
   
    If d <> 0 Then
      a.SearchPattern = "<.+>"
      source = a.Replace(source,0)
    End If
  Wend

  Return source
End Function
[/code]

Please note, this was made in REALbasic so you will have to change some things around to the VB equivalents.  This code is tested and working on ordinary HTML.  I have not tested it with stray <'s scattered throughout the message, and I doubt it would work in that case.  You could improve upon that if you wish.

Edit:  It will also hang if the message does not contain the same number of > as <, since a leftover < never gets removed and the loop never ends.
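
One possible way around the hang, sketched in Python rather than REALbasic: stop as soon as a pass removes nothing. The name `remove_html` and the sample strings are made up for illustration, and this is only a rough equivalent of the loop above, not a port of it.

```python
import re

# Lazy counterpart of "<.+>" with Greedy = False; DOTALL lets "." span line breaks.
TAG = re.compile(r"<.+?>", re.DOTALL)

def remove_html(source):
    # Remove one tag per pass, but bail out when a pass changes nothing.
    # A lone "<" with no closing ">" then ends the loop instead of hanging.
    while "<" in source:
        stripped = TAG.sub("", source, count=1)
        if stripped == source:
            break
        source = stripped
    return source

print(remove_html("a <b>bold</b> move"))  # a bold move
print(remove_html("1 < 2"))               # 1 < 2
```

This only fixes the hang. A stray < that is followed later by a real tag still gets swallowed together with the text in between, so "1 < 2 and <b>x</b>" comes out as "1 x".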
December 19, 2004, 5:19 PM
St0rm.iD
Well, in Python doing .*? will not be greedy due to the question mark. I don't know if it's the same syntax in other implementations, though.
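
A quick sketch of the difference in Python, using the sample message from the first post. The function names `strip_tags_greedy` and `strip_tags_lazy` are made up for illustration; only the standard `re` module is assumed.

```python
import re

SAMPLE = 'Joe - <HTML><BODY BGCOLOR="#ffffff"><FONT LANG="0">hello</FONT></BODY></HTML>'

def strip_tags_greedy(text):
    # Greedy: ".*" grabs as much as it can, so a single match runs
    # from the first "<" to the last ">" in the string.
    return re.sub(r"<.*>", "", text, flags=re.DOTALL)

def strip_tags_lazy(text):
    # Lazy: ".*?" stops at the first ">" it reaches, so each tag
    # is matched and removed on its own.
    return re.sub(r"<.*?>", "", text, flags=re.DOTALL)

print(repr(strip_tags_greedy(SAMPLE)))  # 'Joe - '
print(repr(strip_tags_lazy(SAMPLE)))    # 'Joe - hello'
```

The greedy result happens to match the output reported in the first post, though that alone doesn't prove the downloaded code has this problem.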
December 19, 2004, 5:26 PM
