Nuances of English Language & Analytics

Language was invented by humans, not by computers. The complexity of any language in context is unfathomable. Here are some sample sentences: “The bandage was wound around the wound”, “The farm was used to produce produce”, “We must polish the Polish furniture”, “The insurance was invalid for the invalid”, “Since there is no time like the present, he thought it was time to present the present”.

The English language has more than 500,000 words, and research shows that an average user knows around 25,000 of them. This gives an idea of the diversity of vocabulary and nuance available to users of English.

The English language has many misnomers, paradoxes, ambiguities, homophones, homonyms, diverse phonetics for the same vowel, distinct adjectives for positive and negative connotations, near synonyms, slang and regionalisms, and to make matters worse, the current generation has devised its own ‘SMS’ or ‘texting’ language.

English is a crazy language. If a vegetarian eats vegetables, what does a humanitarian eat? If peanut oil is extracted from peanuts, where does ‘baby oil’ come from? Why do we have noses that run and feet that smell? If one who paints is a painter, would a person who draws be a drawer? The unique lunacy of the English language makes it a herculean task for computers to analyze.

Natural Language Understanding (NLU) and Natural Language Processing (NLP), the branches of artificial intelligence that deal with machine reading comprehension, have been trying to analyze the language. As Wikipedia puts it, “The process of disassembling and parsing input is more complex than the reverse process of assembling output in natural language generation because of the occurrence of unknown and unexpected features in the input and the need to determine the appropriate syntactic and semantic schemes to apply to it, factors which are pre-determined when outputting language.”

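The heteronyms quoted earlier show why parsing input is so hard: the same surface form can play different grammatical roles, and only context decides which. Below is a minimal sketch, assuming the NLTK library and its standard tokenizer and part-of-speech tagger data are installed; the exact tags produced will depend on the tagger model.

```python
# Minimal sketch (assumes NLTK plus its tokenizer and POS tagger data
# are installed): a part-of-speech tagger must resolve the same word
# ("wound") into different grammatical roles purely from context.
import nltk

sentence = "The bandage was wound around the wound"

tokens = nltk.word_tokenize(sentence)   # split the sentence into words
tagged = nltk.pos_tag(tokens)           # assign POS tags using context

for word, tag in tagged:
    print(f"{word:>10}  {tag}")

# A good tagger should label the first "wound" as a verb form (e.g. VBN)
# and the second as a noun (NN); results vary with the model used.
```

Tagging is only the first step; resolving which sense of a word is meant (word sense disambiguation) is a further, harder problem.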

Natural Language Processing algorithms allow computers to process and understand human languages. The field draws on computational linguistics and applications in human language technology, covering areas such as sentence understanding, machine translation, probabilistic parsing and tagging, information extraction, grammar induction, word sense disambiguation and entity recognition.
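Several of these tasks are exposed by off-the-shelf toolkits. The sketch below assumes the spaCy library with its small English model (en_core_web_sm) is installed; it is one illustrative pipeline, not the only way to approach these tasks, and the example sentence is invented.

```python
# Hedged sketch (assumes spaCy and its small English model
# en_core_web_sm are installed): a single pass over the text yields
# tagging, dependency parsing and named entity recognition.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in London next year.")

# Probabilistic tagging and dependency parsing
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

# Entity recognition, a building block of information extraction
for ent in doc.ents:
    print(ent.text, ent.label_)
```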

With everyone talking about ‘Big Data’, there is considerable commercial interest in this field because of its applications to news gathering, text categorization, voice activation, archiving and large-scale content analysis. Enterprises need to take advantage of NLP to find intelligence in their metadata.