Material and methods

In this section we take a close look at the complexities involved in the recognition process.

Number of Character Shapes

In Arabic, each letter can have different shapes depending on its position in the word, i.e. initial, middle or ending.
Some letters join with other letters on both sides, some join on only one side, and some do not join at all. Each connected piece of characters is known as a ligature or sub word, so a word can consist of one or more sub words. In Urdu, the shape of a character depends not only on its position but also on the character to which it is joined: characters change their shape in accordance with their neighboring characters. This feature of Nastaliq is known as context sensitivity.
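The joining behaviour described above determines where a word breaks into sub words. The following is a minimal sketch, not part of the original method: the function name `split_sub_words` and the two letter sets are assumptions made for illustration, and the sets are partial lists based on the usual joining behaviour of these letters.

```python
# Assumed, partial list of letters that join to the preceding letter only,
# so they always close the current sub word.
RIGHT_JOINING = {
    "\u0627",  # alif
    "\u0622",  # alif madda
    "\u062F",  # daal
    "\u0688",  # ddaal
    "\u0630",  # zaal
    "\u0631",  # ray
    "\u0691",  # rray
    "\u0632",  # zay
    "\u0698",  # zhay
    "\u0648",  # wao
    "\u06D2",  # bari yay
}
# Hamza joins on neither side, so it forms a sub word by itself.
NON_JOINING = {"\u0621"}


def split_sub_words(word):
    """Split a word (in logical character order) into its sub words."""
    pieces, current = [], ""
    for ch in word:
        if ch in NON_JOINING:
            if current:
                pieces.append(current)
            pieces.append(ch)
            current = ""
        else:
            current += ch
            if ch in RIGHT_JOINING:
                # This letter cannot connect to the next one.
                pieces.append(current)
                current = ""
    if current:
        pieces.append(current)
    return pieces


pakistan = "\u067E\u0627\u06A9\u0633\u062A\u0627\u0646"  # pay alif kaf seen tay alif noon
print(len(split_sub_words(pakistan)))  # 3 sub words
```

This only finds sub-word boundaries from the character sequence. It says nothing about which of the many contextual shapes each letter takes inside a sub word, which is the harder problem in Nastaliq.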
Thus in Urdu the possible shapes of a single character are not limited to three; a character can take many more shapes depending on the preceding and following characters. Among these classes, the character hamza (ء) does not join on any side and has only one primary shape, while all other characters connect from either the right or both sides.

Different shapes of the character bay (ب) when joined with characters from different classes at different positions.

Sloping

The calligraphic nature of Nastaliq also introduces sloping in the text. Sloping means that as new letters are joined to previous letters, a slope is introduced in the text because the letters are written diagonally from top right to bottom left. One of the major advantages of sloping is that it conserves a lot of writing space. Sloping also means that characters no longer join with each other on the baseline, which is an important property of Naskh.
This baseline property is utilized in character segmentation algorithms for Arabic/Persian text, so those algorithms cannot be applied to Urdu text. The number of character shapes and sloping make Nastaliq character segmentation the most challenging task in the whole recognition process, and to our knowledge no algorithm yet exists that promises decent results in segmenting sub words into individual characters. This is also one of the main hurdles that keeps most researchers away from accepting the challenge of Nastaliq character recognition.

Stretching

Another very important property of the Nastaliq style is stretching. Stretching means that letters are replaced with longer versions instead of their standard versions.
Some characters even change their default shape when stretched, e.g. seen (س), whereas others only change their width. The purpose of stretching is not only to make the characters more beautiful; it also serves as a tool for justification. Justification means that the text meets the boundaries of the bounded area irrespective of the varying length of the sentences. However, it should be noted that not every character in Urdu can be stretched. For example, alif (ا), ray (ر) and daal (د) cannot be stretched, but bay (ب), seen (س) and fay (ف) can.
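To make the link between stretching and justification concrete, here is a toy sketch under stated assumptions. The names `stretch_candidates` and `justify_by_stretching`, the width units, and the even split of the missing width are all hypothetical. The letter set contains only the three stretchable letters named in the text, and the model deliberately ignores any dependence on neighbouring characters or position.

```python
# Only the letters the text names as stretchable (not a complete list).
STRETCHABLE = {
    "\u0628": "bay",
    "\u0633": "seen",
    "\u0641": "fay",
}


def stretch_candidates(word):
    """Return the indices of letters that this toy model allows to stretch."""
    return [i for i, ch in enumerate(word) if ch in STRETCHABLE]


def justify_by_stretching(word, natural_width, target_width):
    """Share the missing width evenly among the stretch candidates.

    Returns a mapping {letter index: extra width}. The mapping is empty
    when the word already fills the target or has no stretchable letter.
    """
    slack = target_width - natural_width
    candidates = stretch_candidates(word)
    if slack <= 0 or not candidates:
        return {}
    return {i: slack / len(candidates) for i in candidates}


seen_bay = "\u0633\u0628"          # seen + bay: both may stretch
daal_alif_ray = "\u062F\u0627\u0631"  # daal + alif + ray: none may stretch
print(justify_by_stretching(seen_bay, natural_width=10, target_width=16))
print(justify_by_stretching(daal_alif_ray, natural_width=10, target_width=16))
```

A recognizer faces the inverse problem: a stretched letter has a different width, and sometimes a different shape, from its standard form, so it cannot be matched on fixed dimensions.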
It should also be noted that stretching works closely with the context-sensitive property of Nastaliq: certain classes of characters can only be stretched when joined with a character of a certain class or written at a certain position (initial, medial or final). All these attributes show that stretching is a complex procedure, and it increases the complexity of machine recognition. Standard Nastaliq fonts used in print normally do not support stretching; however, it is commonly used in book titles and calligraphic art. So if we are dealing only with machine-printed Nastaliq text we normally do not need to worry about stretching, but if we are dealing with calligraphic or handwritten Nastaliq documents there is a strong possibility that we will have to deal with stretched versions of characters.

Positioning and Spacing

Like stretching, positioning and spacing are important tools for justification in Nastaliq and are also used for the beautification of text. Positioning means the placement of ligatures and sub words in Nastaliq, and spacing means the space between two consecutive ligatures. In normal situations each ligature is written next to the previous ligature with a small standard spacing.
Positioning, however, allows ligatures to be placed at different positions: a new ligature may start somewhere from the top of the previous ligature, or it can be placed right above it even if it is part of another word. Positioning will even overlap and connect two ligatures if the need arises. Unlike stretching, positioning is quite common and is used extensively in news headlines in the Urdu print media industry because of its power to accommodate long, large headings in small spaces on the page. All these flexibilities and strengths of Nastaliq make it a real challenge for machine recognition. On one hand, context sensitivity and sloping make character segmentation a very difficult task; on the other hand, positioning makes ligature and sub-word segmentation equally difficult.
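The effect of positioning on ligature segmentation can be illustrated on a toy binary image. This sketch is an assumption-laden illustration, not a method taken from the text: the helper names (`to_grid`, `connected_components`, `empty_columns`) and the tiny images are invented, and connected-component labelling is used only as one common way to isolate ligatures.

```python
from collections import deque


def to_grid(rows):
    """Turn strings of '#' (ink) and '.' (background) into a boolean grid."""
    return [[ch == "#" for ch in row] for row in rows]


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    n_rows, n_cols = len(img), len(img[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    boxes = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def empty_columns(img):
    """Columns without ink: the only places a vertical-projection cut can go."""
    return [c for c in range(len(img[0])) if not any(row[c] for row in img)]


# One ligature placed above another: their column ranges overlap.
stacked = to_grid(["...###..",
                   "........",
                   ".#####.."])
# The same two ligatures, now touching through a single ink pixel.
touching = to_grid(["...###..",
                    "...#....",
                    ".#####.."])

print(empty_columns(stacked))               # no empty column between the two
print(len(connected_components(stacked)))   # 2 regions
print(len(connected_components(touching)))  # 1 region
```

In the stacked image no empty column separates the two ligatures, so a cut based on column gaps cannot split them, although component labelling still can. Once the ligatures touch, labelling also merges them into a single region, which is the situation the text describes when positioning overlaps and connects two ligatures.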