Étoilé - Bugs: bug #8584, bad html parsing

bug #8584: bad html parsing

Submitted by:  Nicolas Roard <rio>
Submitted on:  Tue Feb 27 02:24:08 2007
Category: Grr / RSSKit
Severity: 3 - Normal
Priority: 1 - Later
Status: None
Privacy: Public
Assigned to: Guenther Noack <guenther>
Open/Closed: Open
Operating System: None

Wed Feb 28 02:43:09 2007, comment #3:

I have a minimal part of hpricot under

It uses Ragel to generate code.
But once the code is generated, it is pure C without dependencies.
So there is no problem using it.
It compiles slowly, but runs fast.

Yen-Ju Chen <yjchen>
Project Member
Tue Feb 27 23:46:59 2007, comment #2:

I had the problem on the étoilé blog feed :-)

The strange thing is, now you tell me, I remember the html parser code in grr. I even remember it kinda working (apart from images). Yet the articles from this feed showed the html entities.

For hpricot, it's just an idea -- without having looked into it myself I won't say much more.

For the text loader idea in gnustep, the architecture is not appropriate for, say, a web browser. But it's more than enough for an RSS viewer imho.

Nicolas Roard <rio>
Project Administrator
Tue Feb 27 22:13:16 2007, comment #1:

Grr has an HTML parser. You can find it at Components/ArticleView/NSString+TolerantHTML.[mh]. When the HTML parser encounters any errors that cause it to throw an exception, the article view falls back to plain text display. Could you please give me a copy of the feed that caused problems with HTML parsing?
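For illustration, the fallback behaviour described above could be sketched like this. This is not the Grr code (which is Objective-C); it is a minimal Python sketch using the standard library's html.parser, and the names ParseFailure, BalancedTextExtractor and render_article are made up for the example. It assumes that a mismatched closing tag is the kind of error that triggers the fallback, which may not match what Grr actually treats as fatal.

```python
from html.parser import HTMLParser


class ParseFailure(Exception):
    """Hypothetical error raised when the markup cannot be handled."""


class BalancedTextExtractor(HTMLParser):
    """Collects text (entities decoded) and rejects mismatched closing tags."""

    VOID = {"br", "img"}  # tags that never get a closing tag

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.chunks.append("\n")
        if tag not in self.VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        if not self.open_tags or self.open_tags.pop() != tag:
            raise ParseFailure(f"unexpected </{tag}>")

    def handle_data(self, data):
        self.chunks.append(data)


def render_article(source):
    """Return (mode, text): 'html' on success, 'plain' after a fallback."""
    parser = BalancedTextExtractor()
    try:
        parser.feed(source)
        parser.close()
        return "html", "".join(parser.chunks)
    except ParseFailure:
        # Fall back to showing the raw article body unchanged.
        return "plain", source
```

A feed whose markup trips the parser would then show up with its raw tags and entities, which looks similar to the symptom reported in this item.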

Concerning hpricot:

I'll have a look if hpricot makes sense for the article view component. The current HTML parsing code can be roughly divided in two parts: the parser itself (which reads in the tags and the text) and the interpreter (which translates the tags into an attributed string). I hope to be able to replace the parser with hpricot here.
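To make the two-part split concrete, here is a rough Python sketch (again not the Grr code). The first stage reads tags and text into a flat list of events; the second stage translates those events into styled runs, standing in for an attributed string. The names Tokenizer, tokenize, interpret and STYLES are hypothetical, and only b, i and br are handled.

```python
from html.parser import HTMLParser


class Tokenizer(HTMLParser):
    """Stage 1 (parser): read tags and text into a flat list of events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def tokenize(source):
    tokenizer = Tokenizer()
    tokenizer.feed(source)
    tokenizer.close()
    return tokenizer.events


# Hypothetical mapping from tag names to style names.
STYLES = {"b": "bold", "i": "italic"}


def interpret(events):
    """Stage 2 (interpreter): turn events into (text, styles) runs."""
    active = []
    runs = []
    for kind, value in events:
        if kind == "text":
            runs.append((value, frozenset(active)))
        elif kind == "start":
            if value == "br":
                runs.append(("\n", frozenset(active)))
            elif value in STYLES:
                active.append(STYLES[value])
        elif kind == "end" and value in STYLES and STYLES[value] in active:
            active.remove(STYLES[value])
    return runs
```

Because interpret only sees the event list, the first stage could in principle be swapped for another parser that emits the same events, which is the kind of replacement considered here for hpricot.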

Concerning the GNUstep text loader idea:

The last time I looked at that, the text loader architecture did not look like it would fit seamlessly into the Grr HTML parsing code. But maybe I'm wrong there.

Another thing that comes to my mind concerning the text loader is that GNUstep has a pretty strict no-external-dependencies policy,
which means that hpricot is a no-no unless it is small enough to be put into GNUstep as well.

Guenther Noack <guenther>
Project Member
In charge of this item.
Tue Feb 27 02:24:08 2007, original submission:

grr has no html parser apparently, so feeds aren't rendered properly. Ideally it should support at least basic html tags (i,b,br,ul,li,ol,img,url). A possible solution is to use hpricot; and if we have an html parser that "outputs" an nsattributedstring, we could move that to a gnustep text loader :) (eg, fix that "bug" upstream).

Nicolas Roard <rio>
Project Administrator


No files currently attached


Depends on the following items: None found

Items that depend on this one: None found


Carbon-Copy List
  • -unavailable- added by yjchen (Posted a comment)
  • -unavailable- added by rio (Submitted the item)




    Follow 5 latest changes.

    Date                       Changed By  Updated Field  Previous Value => Replaced By
    Sat Nov 1 19:28:04 2008    guenther    Status         Wont Fix => None
    Sat Nov 1 19:27:50 2008    guenther    Status         In Progress => Wont Fix
    Thu Mar 1 09:44:25 2007    guenther    Status         Need Info => In Progress
    Wed Feb 28 12:19:20 2007   rio         Summary        No html parser => bad html parsing
    Tue Feb 27 22:13:16 2007   guenther    Status         None => Need Info