Igor's Website - Articles - Text editing library for .NET

Text editing library for .NET

Link parsing, Latin to Cyrillic conversion, and HTML cleanup are small tasks that come up often as part of a bigger project. I’ve written this library to simplify them for myself and for anyone else who might find it useful.

Latin to Cyrillic and Cyrillic to Latin conversion

Some sites need to present their content in two scripts, which requires the text to be transliterated from one to the other. This can be a bit tricky if the conversion is done on raw HTML, where text can contain both Latin and Cyrillic letters and the context determines whether a piece of text should be converted or not. This library contains methods for converting plain text and HTML between Latin and Cyrillic while preserving the structure of the HTML (note that this is for the Serbian Cyrillic script).
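
To illustrate what "preserving the structure of the HTML" involves, here is a minimal sketch in Python (used for brevity; it is not the library's .NET code). It splits the markup into tags and the text between them and applies a converter only to the text. The names convert_text_nodes and DEMO_TABLE are hypothetical, and the sketch assumes simple markup: it does not treat comments, script blocks, or entities such as &nbsp; specially.

```python
import re

# Matches a single tag. The capture group makes re.split keep the tags,
# so the result alternates: text, tag, text, tag, ..., text.
TAG_PATTERN = re.compile(r"(<[^>]*>)")


def convert_text_nodes(markup, convert):
    """Apply convert() to the text between tags and leave the tags untouched."""
    pieces = TAG_PATTERN.split(markup)
    return "".join(
        piece if index % 2 else convert(piece)  # odd indices are tags
        for index, piece in enumerate(pieces)
    )


# Stand-in converter for the demonstration: it knows only six letters.
DEMO_TABLE = str.maketrans("abdnor", "абднор")

result = convert_text_nodes(
    '<p class="intro">dobar dan</p>',
    lambda text: text.translate(DEMO_TABLE),
)
print(result)  # <p class="intro">добар дан</p>
```

Note that the class attribute stays in Latin even though it contains letters the converter knows, because only the pieces outside the tags are passed to it.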

Sometimes, certain parts of the content (names, places, etc.) need to remain in Latin while the rest is converted to Cyrillic. This is why I added the {[lat]} tags. Everything between these tags remains in the Latin script after conversion.
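
The idea of protected regions can be sketched as follows, again in Python and not as the library's implementation. Two things here are assumptions: that the same {[lat]} marker both opens and closes a protected region, and that the markers are dropped from the output. The function names are hypothetical. The transliteration is a plain letter-by-letter mapping that always treats lj, nj and dž as single letters, so words where they are separate letters (such as injekcija) would need extra handling.

```python
# Serbian Latin -> Cyrillic. Digraphs must be checked before single letters.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLES = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}

# Assumption: the same marker opens and closes a protected region.
LAT_TAG = "{[lat]}"


def _match_case(source, cyrillic):
    """Upper-case the result when the source starts with a capital letter."""
    return cyrillic.upper() if source[0].isupper() else cyrillic


def latin_to_cyrillic(text):
    """Transliterate plain text, trying two-letter digraphs first."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            out.append(_match_case(pair, DIGRAPHS[pair.lower()]))
            i += 2
            continue
        ch = text[i]
        mapped = SINGLES.get(ch.lower())
        out.append(_match_case(ch, mapped) if mapped else ch)
        i += 1
    return "".join(out)


def convert_protected(text):
    """Convert everything except the parts enclosed in {[lat]} markers."""
    parts = text.split(LAT_TAG)
    # Even-indexed parts are outside the markers, odd-indexed parts inside.
    return "".join(
        part if index % 2 else latin_to_cyrillic(part)
        for index, part in enumerate(parts)
    )


print(convert_protected("Dobar dan, {[lat]}Windows{[lat]} korisnici"))
# Добар дан, Windows корисници
```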

The library provides methods for converting HTML or plain text to either Latin or Cyrillic.

HTML and link parsing

The library can also be used to strip HTML from text (the RemoveHtml() method does this) or to turn HTML into plain text that is formatted for display in an HTML page (the HtmlToText() method does this). The latter allows HTML to be presented as text in an HTML document without disturbing the markup of the document it is presented in.
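
The difference between the two operations is easy to see in a small sketch. The Python functions below are hypothetical analogues, not the library's RemoveHtml() and HtmlToText() methods; they assume that "stripping" means deleting tags and decoding entities, and that "presenting HTML as text" means escaping the markup characters.

```python
import html
import re

TAG_PATTERN = re.compile(r"<[^>]*>")


def remove_html(markup):
    """Delete the tags and decode entities, leaving only the readable text."""
    without_tags = TAG_PATTERN.sub("", markup)
    return html.unescape(without_tags)


def html_to_text(markup):
    """Escape &, < and > so a browser shows the markup instead of rendering it."""
    return html.escape(markup, quote=False)


print(remove_html("<p>Fish &amp; <b>chips</b></p>"))  # Fish & chips
print(html_to_text("<b>bold</b>"))                    # &lt;b&gt;bold&lt;/b&gt;
```

A regular expression is enough for a sketch like this, but it will misread a ">" inside an attribute value; a real parser is the safer choice for arbitrary input.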

Links found in plain text can be converted to standard a tags. Alternatively, the same method can be given a delegate, which is then applied to each link it finds.
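
A rough Python sketch of this pattern is shown below; a callable parameter plays the role of the delegate. The names parse_links, default_link and render are hypothetical, and the URL pattern is a deliberately simple assumption (only http and https links, ending at whitespace), not the rule the library uses.

```python
import html
import re

# Simplified assumption: a link starts with http:// or https:// and runs
# until whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def default_link(url):
    """Default behaviour: wrap the link in a standard a tag."""
    safe = html.escape(url, quote=True)
    return f'<a href="{safe}">{safe}</a>'


def parse_links(text, render=default_link):
    """Replace each link in text with the result of render(link)."""
    def replace(match):
        url = match.group(0)
        # Keep sentence punctuation that follows the link out of the link.
        trimmed = url.rstrip(".,;:!?)")
        return render(trimmed) + url[len(trimmed):]

    return URL_PATTERN.sub(replace, text)


print(parse_links("Visit http://example.com/a today."))
# Visit <a href="http://example.com/a">http://example.com/a</a> today.

print(parse_links("See https://example.org.", render=lambda url: f"[{url}]"))
# See [https://example.org].
```

Passing the rendering step in as a parameter keeps the link detection in one place while letting the caller decide what each link turns into.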

Formatting MSO tags

There is a somewhat annoying thing that occurs when pasting from Microsoft Office Word into some HTML editors: the resulting HTML contains tags with mso classes for formatting. In addition, most of the pasted paragraph tags contain style information, which usually needs to be removed or reformatted before the text can be used in an application.

The library contains a method, MsoToHtml(), which can be used to partially or completely reformat the pasted text (how completely depends on how complex the pasted formatting is). Basically, the method removes all formatting from paragraph tags, converts the font and span tags to u, i or b tags, and reformats the lists. Most of the alignment will be lost, however.
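
The first two of these steps can be sketched in Python with the standard html.parser module. This is only an illustration under stated assumptions, not the MsoToHtml() implementation: the class and function names are hypothetical, bold, italic and underline are detected only from inline style declarations, and list reformatting is left out entirely.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: formatting is recognised from these inline style declarations.
STYLE_TO_TAG = (
    ("font-weight:bold", "b"),
    ("font-style:italic", "i"),
    ("text-decoration:underline", "u"),
)


class MsoCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.replacements = []  # tags emitted in place of each open span/font

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.out.append("<p>")  # drop class and style attributes
        elif tag in ("span", "font"):
            style = (dict(attrs).get("style") or "").replace(" ", "").lower()
            tags = [new for needle, new in STYLE_TO_TAG if needle in style]
            self.out.extend(f"<{t}>" for t in tags)
            self.replacements.append(tags)
        else:
            self.out.append(self.get_starttag_text())  # keep other tags as-is

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag in ("span", "font"):
            if self.replacements:
                tags = self.replacements.pop()
                self.out.extend(f"</{t}>" for t in reversed(tags))
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def mso_to_html(markup):
    cleaner = MsoCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)


pasted = ('<p class="MsoNormal" style="margin:0cm">'
          '<span style="font-weight: bold">Hi</span> '
          '<span lang="EN-US">there</span></p>')
print(mso_to_html(pasted))  # <p><b>Hi</b> there</p>
```

Because the parser's comment handler is not overridden, comments in the pasted markup are dropped as a side effect. Spans that carry no recognised style are simply unwrapped, which is also where alignment and other layout information gets lost.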

