In the digital age, information is the currency of kings. The abundance of content has put knowledge within reach at the touch of a screen. Social media is the epicenter of the information highway, with terabytes of data flying across the globe every day. From ‘see-me-having-coffee’ selfies to ‘let-me-show-you-how-to-bake’ videos, the range of choices and feeds is staggering. While social networking remains at the core, markers embedded unwittingly in this information can give industry crucial insights. With millions of consumers connected digitally, these markers are hard to ignore, yet finding them is like looking for a needle in the proverbial haystack.
The Bombay Stock Exchange (BSE), a client, presented an unusual surveillance problem. Any material news or rumor circulating on digital media (Twitter, Facebook, regional news websites, e-newspapers, blogs, etc.) can seriously sway investor sentiment, which in turn can affect the prices and trading volumes of securities on the exchange's platforms. While the BSE monitors such news, views and recommendations, manually scrutinizing such a large volume of data poses obvious challenges. Datametica proposed a systematic solution that uses machine learning techniques to track listed companies on digital media.
The objectives were to:
- Remove the dependency on manual scanning
- Broaden coverage of suspicious articles by identifying suspected rumors on digital media
- Build a robust Alert Generation Mechanism (AGM) for continuous, seamless updates
- Make the rumor verification process faster and more efficient
To meet these objectives, Datametica, a big data and analytics company, designed and built a robust, scalable Rumor Analytics solution that scans digital media, including social media, captures suspicious news feeds about BSE-listed securities and generates alerts for them. The solution uses machine learning techniques to track news about listed companies on Twitter, Facebook, e-newspapers, blogs, regional newspapers and other digital media.
The end-to-end solution was built on Hadoop, a distributed platform, to take advantage of economies of scale.
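The source does not describe the Hadoop jobs themselves. As a hedged illustration of why a distributed platform suits this workload, the sketch below counts mentions of watch-listed companies in the map, shuffle/sort, reduce style that Hadoop Streaming jobs follow. The watch list and company names are hypothetical, and the cluster's shuffle is simulated here with a local sort.

```python
import re
from itertools import groupby

# Hypothetical watch list of listed companies (illustrative names only).
WATCHLIST = {"acme", "globex", "initech"}


def mapper(lines):
    """Map phase: emit (company, 1) for every watch-listed name in each post."""
    for line in lines:
        for word in re.findall(r"[a-z0-9]+", line.lower()):
            if word in WATCHLIST:
                yield word, 1


def reducer(pairs):
    """Reduce phase: pairs must arrive sorted by key, as Hadoop's shuffle guarantees."""
    for company, group in groupby(pairs, key=lambda kv: kv[0]):
        yield company, sum(count for _, count in group)


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce on a single machine."""
    return dict(reducer(sorted(mapper(lines))))
```

Because the mapper looks at one post at a time and each reducer call only sees pairs that share a key, Hadoop can spread both phases across many machines, which is what lets such a pipeline grow with data volume.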
The work involved three steps:
- Developing a base framework for ingesting content from digital media (Twitter, Facebook, news websites, blogs, e-newspapers, etc.)
- Building a statistical model using machine learning algorithms to process content and identify suspicious items
- Building dashboards and alerts
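The third step can be pictured as a thin layer over the model's output. The sketch below assumes a hypothetical `ScoredItem` record for each piece of content and an illustrative threshold of 0.5; it keeps items above the threshold as alerts, ranks them and tallies them per company and source for a dashboard. The actual alerting rules used at the BSE are not described in the source.

```python
from collections import Counter
from typing import NamedTuple


class ScoredItem(NamedTuple):
    """One piece of ingested content after model scoring (hypothetical schema)."""
    source: str    # e.g. "twitter", "blog", "e-newspaper"
    company: str   # listed company the item mentions
    score: float   # model score; higher means more suspicious


def generate_alerts(items, threshold=0.5):
    """Keep items scoring above the threshold, most suspicious first."""
    flagged = [item for item in items if item.score > threshold]
    return sorted(flagged, key=lambda item: item.score, reverse=True)


def dashboard_counts(alerts):
    """Tally alerts per (company, source) pair for a dashboard view."""
    return Counter((alert.company, alert.source) for alert in alerts)
```

Keeping alerting separate from scoring means the threshold can be tuned by the surveillance team without retraining the model.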
Datametica automated the reading and tagging of content using machine learning, evaluating several types of algorithms. Weighing scalability against performance, the team found a Support Vector Machine (SVM) to be the best fit. Unstructured content from digital media was converted into tokens (words) for processing, and further noise was filtered out using the Natural Language Toolkit (NLTK). The SVM treats these tokens as the dimensions of a feature space and generates scores from weights learned from the data. The BSE provided keywords that are looked for in tagged articles. Datametica refined the untagged set based on its understanding of the surveillance team's business process, and trained the statistical model.
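The tokenize, filter and score steps described above can be sketched in plain Python. This is a minimal sketch under several assumptions: a small hard-coded stopword set stands in for NLTK's noise filtering, the linear SVM is trained with Pegasos (stochastic sub-gradient descent on the regularized hinge loss) rather than whatever solver the production system used, the bias term is dropped, and the keyword set in the usage is hypothetical. For an article with bag-of-words vector x, the score is the linear SVM decision value w · x, where the weights w are learned from labeled examples.

```python
import random
import re

# Hand-picked stopword list: an assumed stand-in for NLTK's stopword corpus.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "and", "for", "be"}


def tokenize(text):
    """Lower-case the text, split it into word tokens and drop stopwords (noise)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def featurize(tokens):
    """Bag of words: each distinct token is a dimension, valued by its count."""
    features = {}
    for token in tokens:
        features[token] = features.get(token, 0) + 1
    return features


def dot(weights, features):
    return sum(weights.get(k, 0.0) * v for k, v in features.items())


def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    samples are feature dicts; labels are +1 (suspicious) or -1 (benign).
    The bias term is omitted for brevity.
    """
    rng = random.Random(seed)
    data = list(zip(samples, labels))
    weights = {}
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = label * dot(weights, features)
            # Shrink every weight: the gradient of the L2 regularizer.
            for k in weights:
                weights[k] *= 1.0 - eta * lam
            # The hinge loss contributes only when the margin is violated.
            if margin < 1:
                for k, v in features.items():
                    weights[k] = weights.get(k, 0.0) + eta * label * v
    return weights


def score(weights, text):
    """Signed SVM score w . x; positive means the content looks suspicious."""
    return dot(weights, featurize(tokenize(text)))


def keyword_hits(text, keywords):
    """Return the (lower-case) surveillance keywords found among the article's tokens."""
    tokens = set(tokenize(text))
    return sorted(k for k in keywords if k in tokens)
```

In use, tagged articles would be featurized with `featurize(tokenize(text))`, labeled +1 or -1, passed to `train_linear_svm`, and new content ranked by `score` and checked with `keyword_hits`. A production system would more likely rely on a library implementation such as scikit-learn's `LinearSVC`.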
The solution improved the operational efficiency of the BSE surveillance team while laying a foundation that can withstand future rumor challenges. Key benefits include:
- Broader coverage of digital media
- Increased investor confidence
- Improved productivity & transparency
- Reduced processing time
- Minimal human intervention
“Digital media analytics solution implemented by DataMetica provided us a framework to mitigate the potential risks of stock manipulation and information asymmetry” - Ashish Chauhan, CEO, Bombay Stock Exchange