With the advent of alternative data across multiple industries, many companies are setting up a ‘Chief of Data’ role along with data science teams that develop use cases for alternative data. Despite this spotlight on the domain, the industry is not yet fully aware of the nature and opportunities of alternative data, nor does it appreciate the complexities of processing it.
A few of the key challenges that providers/suppliers, users and sources of alternative data face today and going forward include:
- Value in Diverse Sources: since any data apart from static or market data is classified as alternative data, it exists across formats: text, speech, images, multilingual content and even hard copies. This type of data can also be structured or unstructured. Even though vendors specialize in niche domains, it is difficult for end users to know what type of alternative data is available to them. With the entry of intermediaries and platforms, this challenge is slowly being resolved by matching the right vendors and users.
- Diverse Requirements: when analyzing data that relies on sentiment (news, social media, etc.), the algorithm that calculates the “score” is a black box. Information may be lost or diluted, and each user will have different requirements for what they need from a particular type of alternative data. In the case of hedge funds, the hidden alpha may be lost. Each use case and its different requirements pose a huge challenge for how data should be preprocessed. One size may not fit all, and it becomes even trickier to sell the data as an off-the-shelf product.
- Processing and Relevance: even though alternative data can be a game changer for its users, the data itself will not be reliable if it is not processed properly. Unstructured data may require Natural Language Processing (NLP), and sometimes the integrity of the data may not be verifiable. Purchasing the data may not be a sizeable challenge; knowing how to use it is key. Due to a lack of understanding and limited availability of technologies, many users may leverage only a small portion of the information.