

There was a point where buggy or lazy implementations of HTML/XHTML renderers were more common than those that got it right. Many years ago, I regularly encountered rendering problems in mainstream browsers resulting from the use of unencoded quote chars in the regular text content of HTML/XHTML documents. Though the HTML spec has never disallowed the use of these chars in text content, it became fairly standard practice to encode them anyway, so that non-spec-compliant browsers and other processors would handle them more gracefully. As a result, many "old-timers" may still do this reflexively. It is not incorrect, though it is now probably unnecessary, unless you're targeting some very archaic platforms.

When HTML content is generated dynamically, for example by populating an HTML template with simple string values from a database, it's necessary to encode each value before embedding it in the generated content. Some common server-side languages provided a single function for this purpose, which simply encoded all chars that might be invalid in some context within an HTML document. PHP's htmlspecialchars() function is a notable example. Though there are optional arguments to htmlspecialchars() that will cause it to ignore quotes, those arguments were (and are) rarely used by authors of basic template-driven systems. The result is that all "special chars" are encoded everywhere they occur in the generated HTML, without regard for the context in which they occur. Again, this is not incorrect; it's simply unnecessary.

In my experience it may be the result of auto-generation by string-based tools, where the author did not understand the rules of HTML. When some developers generate HTML without the use of special XML-oriented tools, they may try to make sure the resulting HTML is valid by taking the approach that everything must be escaped. Referring to your example, the reason why every occurrence of " is represented by a character reference such as &quot; could be that, with that approach, you can safely use such "special" characters in both attribute values and element text.

Another motivation I've seen is where people believe, "We must explicitly show that our symbols are not part of the syntax." Valid HTML, however, can be created simply by using the proper string-manipulation tools, as noted above.

Here is some pseudo-code loosely based on C#, although it is preferable to use existing, well-tested methods and tools:

public class HtmlAndXmlWriter

It's really very common to see such approaches taken to generate HTML.
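
To make the difference between "escape everything everywhere" and context-aware escaping concrete, here is a minimal sketch in Python rather than PHP or C#. It relies on the standard library's html.escape(), whose quote argument plays roughly the same role as the optional quote-handling arguments of htmlspecialchars(). The helper names escape_text, escape_attribute and render_link, as well as the one-line template, are hypothetical and exist only for this illustration.

```python
from html import escape


def escape_text(value: str) -> str:
    """Escape a string for use as element text content.

    Only &, < and > matter in this context, so quotes are left untouched.
    """
    return escape(value, quote=False)


def escape_attribute(value: str) -> str:
    """Escape a string for use inside a quoted attribute value.

    Quotes are encoded as well, so they cannot end the attribute early.
    """
    return escape(value, quote=True)


def render_link(url: str, label: str) -> str:
    """Fill a tiny hypothetical template, escaping each value for its context."""
    return f'<a href="{escape_attribute(url)}">{escape_text(label)}</a>'


if __name__ == "__main__":
    print(render_link('/search?q="html"&page=2', 'Say "hello" & <wave>'))
```

Calling escape_attribute() on the label as well would still yield valid HTML, with every quote turned into &quot;. That is the "not incorrect, simply unnecessary" output described above.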
