Gaming SEC Filings Using Machine Learning to Detect Vectors and Sentiment in Reporting Language

Using FME, we build an API to collect and clean US Securities and Exchange Commission (SEC) quarterly filings from the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) website. With FME quickly pooling the filing data, we perform sentiment analysis on the cleaned, unstructured Management Discussion & Analysis (MD&A) text. We implement word-to-vector strategies to tokenize the fairly boilerplate text and assign companies to one of two groups: changers and non-changers. This is done mainly by graphing deltas in cosine similarity between the tokenized word vectors, and also by using word-count vector strategies to flag language that is unattractive for investment. The end goal of this analysis is to forecast abnormal returns and find diversification opportunities that align with our existing clients.
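
The grouping step can be illustrated with a minimal Python sketch. It is not the presenters' FME workflow: it assumes plain word-count vectors, compares each filing's MD&A text with the previous quarter's, and labels a company a changer when any quarter-over-quarter cosine similarity drops below a threshold. The function names, the 0.8 threshold, and the NEGATIVE_WORDS list are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical word list; a real analysis would use a curated financial lexicon.
NEGATIVE_WORDS = {"loss", "litigation", "impairment", "decline"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_series(filings):
    """Similarity of each filing to the one before it, in chronological order."""
    vectors = [Counter(tokenize(text)) for text in filings]
    return [cosine_similarity(prev, cur) for prev, cur in zip(vectors, vectors[1:])]


def classify(filings, threshold=0.8):
    """Label a company 'changer' if any consecutive similarity falls below threshold.

    The threshold is an assumed value, not one taken from the presentation.
    """
    sims = similarity_series(filings)
    return "changer" if any(s < threshold for s in sims) else "non-changer"


def negative_word_count(text):
    """Count tokens that appear in the (hypothetical) negative word list."""
    return sum(1 for w in tokenize(text) if w in NEGATIVE_WORDS)
```

Called on a chronological list of one company's MD&A texts, `classify` returns a single label, while `similarity_series` gives the per-quarter values that could be graphed to inspect the deltas directly.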
Presentation Details

Presenters:
Steven Cyphers

Presenter Company:
GHD

Event:
FME World Tour 2019

Industry:
AEC (Architecture Engineering and Construction)