dCrypt is a text mining suite that solves business problems using sophisticated machine learning and natural language processing techniques.
What we offer:
The sentiment analysis module in dCrypt extracts user opinions from product reviews, blogs, tweets, Facebook posts and similar text. We have worked with several clients to extract sentiment polarity from Amazon product reviews, sales performance reviews, tweets and other sources, helping them understand how users feel about their products and services.
The sentiment analysis component of dCrypt works by identifying the sentiment-carrying words in a text. It uses NLP-based techniques to separate sentiment-carrying words from objects. Features are extracted from the text and then classified by the selected model, which is chosen from a range of advanced machine learning algorithms using a training sample provided by the user.
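The supervised flow (extract features, try several classifiers on the user's training sample, keep the best) can be sketched in plain Python. This is a minimal illustration, not dCrypt's implementation: the stopword filter is a crude stand-in for the NLP step, and the two candidate models (multinomial naive Bayes and a cosine nearest-centroid classifier), the names `select_model` and `cv_accuracy`, and the use of k-fold cross-validation as the selection criterion are all assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stopword list: a crude stand-in for NLP-based filtering of non-sentiment words.
STOPWORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "of", "to"}


def features(text):
    """Bag-of-words counts after stripping punctuation and stopwords."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return Counter(t for t in tokens if t and t not in STOPWORDS)


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for x, c in zip(X, y):
            self.counts[c].update(x)
        self.vocab = set().union(*X)
        self.totals = {c: sum(self.counts[c].values()) for c in self.labels}
        return self

    def predict(self, x):
        v = len(self.vocab)

        def log_score(c):
            # Words never seen in training are ignored.
            return self.log_prior[c] + sum(
                n * math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                for w, n in x.items() if w in self.vocab)

        return max(self.labels, key=log_score)


class NearestCentroid:
    """Predicts the label whose summed feature vector is most cosine-similar to the input."""

    def fit(self, X, y):
        self.centroids = defaultdict(Counter)
        for x, c in zip(X, y):
            self.centroids[c].update(x)  # summing gives the same direction as averaging
        return self

    def predict(self, x):
        def cosine(a, b):
            dot = sum(n * b[w] for w, n in a.items())
            na = math.sqrt(sum(n * n for n in a.values()))
            nb = math.sqrt(sum(n * n for n in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        return max(self.centroids, key=lambda c: cosine(x, self.centroids[c]))


def cv_accuracy(make_model, X, y, k=3):
    """Mean accuracy over k contiguous folds (assumes the sample is already shuffled)."""
    n, correct = len(X), 0
    for i in range(k):
        held_out = set(range(i * n // k, (i + 1) * n // k))
        train = [j for j in range(n) if j not in held_out]
        model = make_model().fit([X[j] for j in train], [y[j] for j in train])
        correct += sum(model.predict(X[j]) == y[j] for j in held_out)
    return correct / n


def select_model(texts, labels, candidates, k=3):
    """Score each candidate by cross-validation, then refit the best one on all data."""
    X = [features(t) for t in texts]
    scores = {name: cv_accuracy(make, X, labels, k) for name, make in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best]().fit(X, labels), scores
```

Cross-validation is used here so that the chosen model is judged on examples it was not trained on; a production system would typically compare stronger models and tune their hyperparameters the same way.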
We also have an algorithm that extracts sentiment in an unsupervised manner, in case the user does not have a training sample. It uses a part-of-speech (POS) tagger to tag each word in the document and queries the lexical resource SentiWordNet 3.0 for the positive and negative polarity of each word given its POS tag. The individual polarities are then aggregated into a document-level sentiment score.
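The lookup-and-aggregate step can be sketched as follows. To keep it self-contained, `TOY_LEXICON` is a hypothetical stand-in for SentiWordNet 3.0 with made-up scores, the input is assumed to be already POS-tagged with Penn Treebank tags and lemmatized, and averaging `pos - neg` over matched words is an assumed aggregation rule. With NLTK installed and its data downloaded, `nltk.pos_tag` and the `sentiwordnet` corpus reader (`senti_synsets(word, pos)` with `pos_score()` / `neg_score()`) could supply real tags and scores instead.

```python
# Hypothetical toy lexicon standing in for SentiWordNet 3.0:
# (lemma, WordNet POS letter) -> (positive score, negative score). Values are illustrative only.
TOY_LEXICON = {
    ("good", "a"): (0.75, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("love", "v"): (0.5, 0.0),
    ("break", "v"): (0.0, 0.25),
    ("slow", "a"): (0.0, 0.5),
}


def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if it has no counterpart."""
    for prefix, wn_pos in (("JJ", "a"), ("NN", "n"), ("VB", "v"), ("RB", "r")):
        if tag.startswith(prefix):
            return wn_pos
    return None


def document_sentiment(tagged_tokens, lexicon=TOY_LEXICON):
    """Average (positive - negative) over tokens found in the lexicon; 0.0 if none match."""
    scores = []
    for word, tag in tagged_tokens:
        wn_pos = penn_to_wordnet(tag)
        key = (word.lower(), wn_pos)
        if wn_pos is not None and key in lexicon:
            pos, neg = lexicon[key]
            scores.append(pos - neg)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, the tagged sentence `[("I", "PRP"), ("love", "VBP"), ("this", "DT"), ("good", "JJ"), ("phone", "NN")]` matches "love" (0.5) and "good" (0.75), giving a document score of 0.625.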
The matching module helps match data from different sources when a direct lookup is not feasible because of minor differences between the sources. Each sample string is processed and compared with the reference data by several matchers, and a heuristic we developed selects the best-matching string in the reference data.
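The idea of running several matchers and combining their scores can be illustrated with Python's standard `difflib`. The three matchers, the equal-weight average and the 0.6 acceptance threshold in `best_match` are hypothetical choices for illustration; they are not the heuristic described above.

```python
import difflib
import re


def normalize(s):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", s.lower()).split())


def sequence_matcher(a, b):
    """Character-level similarity in [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def token_jaccard(a, b):
    """Overlap of the word sets in [0, 1]."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def token_sort_matcher(a, b):
    """Character similarity after sorting words, so word order does not matter."""
    return sequence_matcher(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


MATCHERS = (sequence_matcher, token_jaccard, token_sort_matcher)


def best_match(sample, reference, threshold=0.6):
    """Hypothetical heuristic: average all matcher scores and keep the top reference
    string, returning (None, score) when even the best candidate is below threshold."""
    s = normalize(sample)
    best, best_score = None, 0.0
    for ref in reference:
        r = normalize(ref)
        score = sum(m(s, r) for m in MATCHERS) / len(MATCHERS)
        if score > best_score:
            best, best_score = ref, score
    return (best, best_score) if best_score >= threshold else (None, best_score)
```

With reference data `["Acme Corporation Ltd", "Globex Inc", "Initech LLC"]`, the sample `"ACME Corp. Ltd"` is matched to `"Acme Corporation Ltd"` even though a direct lookup would fail. Returning `None` below the threshold leaves weak matches for manual review instead of forcing a wrong mapping.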
Product mapping maps short text descriptions to output variables. It has been used in the retail and FMCG industries to predict category, subcategory, brand, sub-brand and other variables from the available product description. It is a supervised approach and requires training data to learn its parameters.
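A simple supervised approach to mapping one short description onto several output variables at once is a k-nearest-neighbour lookup over character n-grams, which tolerates abbreviations and spelling variants. The sketch below is an assumption for illustration (the class `ProductMapper`, trigram features and similarity-weighted voting are not taken from the source), and the toy catalogue used with it is invented.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-gram counts of the lowercased text, padded with spaces."""
    t = f" {text.lower()} "
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ProductMapper:
    """Hypothetical k-NN mapper: each output variable is decided by a
    similarity-weighted vote among the k most similar training descriptions."""

    def fit(self, descriptions, targets):
        # targets: one dict per description, e.g. {"category": ..., "brand": ...}
        self.index = [(char_ngrams(d), t) for d, t in zip(descriptions, targets)]
        return self

    def predict(self, description, k=3):
        q = char_ngrams(description)
        scored = sorted(((cosine(q, grams), target) for grams, target in self.index),
                        key=lambda item: item[0], reverse=True)[:k]
        result = {}
        for var in scored[0][1]:
            votes = Counter()
            for sim, target in scored:
                votes[target[var]] += sim
            result[var] = votes.most_common(1)[0][0]
        return result
```

Trained on a handful of labelled descriptions such as `"Coca Cola Zero 330ml can"` and `"Dove Beauty Bar soap 100g"`, the mapper assigns an unseen `"Coca Cola Zero 500ml bottle"` to the beverage category and the Coca-Cola brand. Per-variable voting lets the category be right even when the nearest neighbours disagree on a finer variable such as brand.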
Topic modelling is a statistical approach for discovering topics in a collection of text documents based on word statistics. It identifies clusters of words that tend to occur together and groups them into topics. We use an online learning algorithm for LDA (Latent Dirichlet Allocation) to identify the topics in a set of documents and the set of words that constitutes each topic. The LDA parameters are estimated from a large set of documents; they can later be used to predict the closest topic for a new document from a similar domain.
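The prediction step (reusing learned parameters to find a new document's closest topic) can be sketched by holding the topic-word distributions fixed and estimating only the document's topic proportions. This is a simplified EM "fold-in" with `alpha` acting as a smoothing pseudo-count, not the variational inference that online LDA itself uses, and the two topics in `TOPICS` are hypothetical values standing in for trained parameters. For training, gensim's `LdaModel` and scikit-learn's `LatentDirichletAllocation(learning_method="online")` provide online LDA implementations.

```python
# Hypothetical topic-word distributions standing in for parameters learned by online LDA.
TOPICS = {
    "sports": {"match": 0.3, "team": 0.3, "goal": 0.2, "price": 0.1, "market": 0.1},
    "finance": {"match": 0.05, "team": 0.05, "goal": 0.1, "price": 0.4, "market": 0.4},
}


def infer_topic_mixture(words, topics=TOPICS, alpha=0.1, iterations=50, floor=1e-6):
    """Estimate a document's topic proportions with the topic-word distributions held fixed."""
    names = list(topics)
    theta = {k: 1.0 / len(names) for k in names}
    # Words outside every topic's vocabulary carry no information and are dropped.
    words = [w for w in words if any(w in topics[k] for k in names)]
    if not words:
        return theta
    for _ in range(iterations):
        expected = {k: alpha for k in names}  # smoothing pseudo-counts
        for w in words:
            # E-step: responsibility of each topic for this word occurrence.
            weights = {k: theta[k] * topics[k].get(w, floor) for k in names}
            total = sum(weights.values())
            for k in names:
                expected[k] += weights[k] / total
        # M-step: renormalise expected counts into proportions.
        norm = sum(expected.values())
        theta = {k: expected[k] / norm for k in names}
    return theta


def closest_topic(words, topics=TOPICS):
    """Return the topic with the largest estimated proportion."""
    theta = infer_topic_mixture(words, topics)
    return max(theta, key=theta.get)
```

Usage: `closest_topic("the team scored a late goal in the match".split())` picks `"sports"`, because "team", "goal" and "match" are all more probable under that topic.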