AI & Neural Network Application An Intelligent Integrated Financial Online Platform

Neural Networks Application in Finance and Online Platforms

Nov 06, 2015

Using AI and machine learning algorithms in finance platforms to generate insights.

Transcript
  • AI & Neural Network Application

    An Intelligent Integrated Financial Online Platform

  • Intelligent Online Fund Raising Platform

    Objective: Creating an intelligent, integrated online finance platform which maps and matches Fund Seekers with Fund Providers.

    Using artificial intelligence, machine learning, and deep learning neural network protocols, we create a recommendation system.

    The tasks assigned to the AI system are:

    Recommending similar projects to Fund Providers based on their investment thesis and preferences.

    Matching the right Fund Seeker with the right Fund Provider.

    Constantly upgrading the investment parameters of Fund Providers by incorporating not only data provided by the users but also public news and announcements.

    The portal's primary job is to match the fund seekers with the right fund providers, be they equity players or debt providers such as banks.

    For example, in India, SME and MSME loans have a higher probability of being cleared by banks that have been mandated to focus on such sectors and sizes, like SIDBI. Here the AI sends the fund seeker to SIDBI if their fund requirement parameters match SIDBI's; but if the fund seeker is already servicing a loan at another bank, then the application goes to the bank which holds the loan seeker's existing debt portfolio, even if the loan parameters fit SIDBI (a small illustrative sketch of this routing rule follows).
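    As a minimal sketch of that routing rule, assuming a hypothetical seeker-profile dictionary (the field names, the segment labels and the fits_sidbi_parameters flag are placeholders, not the platform's actual schema):

    def route_application(seeker: dict) -> str:
        # Route a fund-seeking application, per the rule described above.
        # An existing lending relationship takes priority over a parameter match.
        if seeker.get("existing_lender"):
            return seeker["existing_lender"]
        # Otherwise, SME/MSME seekers whose parameters fit SIDBI's mandate go to SIDBI.
        if seeker.get("segment") in {"SME", "MSME"} and seeker.get("fits_sidbi_parameters"):
            return "SIDBI"
        return "general matching pool"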

  • Our algorithmic recommendation system is a multistage process:

    Stage 1:

    Machine learning application via deployment of an artificial neural network, whose task is to recognize patterns in investors' investment decisions and to match patterns and strategic fits between investors and investment seekers.

    Here we break down Stage 1 into two parallel processes:

    1A: Supervised learning using an ANN. The results generated here become, in part, the input parameters for the Stage 2 recommender system, which then overlays its recommendations on top of the output of 1A; the intersection of both is the final recommendation presented to the users.

    1B: Unsupervised learning using the Self-Organizing Map protocol, which is another type of ANN. This ANN is tasked with uncovering the undisclosed investment patterns of various investing bodies and providing them with recommendations on possible investments listed on the website which fit into their existing patterns.

    Stage 2:

    A hybrid approach to recommendation-system creation: combining collaborative filtering and content-based filtering, applied to the patterns derived by the ANN.

    Hence our approach uses a neural network to recognize implicit patterns between investor and investee profiles and items of interest, which are then further enhanced by hybrid/collaborative filtering into personalized suggestions (a minimal sketch of this hand-off follows below).
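    As a minimal sketch of the Stage 1A / Stage 2 hand-off, assuming hypothetical score dictionaries, a score threshold and a top-k cutoff (none of these names come from the platform itself):

    def stage1a_candidates(ann_scores: dict, threshold: float = 0.5) -> set:
        # Projects the supervised ANN (1A) scores above a cutoff.
        return {project for project, score in ann_scores.items() if score >= threshold}

    def stage2_candidates(hybrid_scores: dict, top_k: int = 10) -> set:
        # Top-k projects from the Stage 2 hybrid recommender.
        ranked = sorted(hybrid_scores, key=hybrid_scores.get, reverse=True)
        return set(ranked[:top_k])

    def final_recommendations(ann_scores: dict, hybrid_scores: dict) -> set:
        # The intersection of both outputs is what is presented to the user.
        return stage1a_candidates(ann_scores) & stage2_candidates(hybrid_scores)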

    In this white paper we will explain in detail our approach towards creating an intelligent online platform, based on:

    Artificial Neural Networks

    Hybrid Recommendation Systems

  • The Architectural Framework of the Platform (figure not reproduced in this transcript):

    The hybrid methodology consists of two categories of methods. One, collaborative filtering, is based on the hypothesis that similar users will demonstrate similar online behavior, and, therefore, what one is interested in will most probably be of interest to a similar user. The similarity of users is based upon user profiles. The other category, content-based methods, takes into account the similarity of items, rather than users, in order to propose to the user a closest match. Each set of methods has its own advantages and disadvantages, or to put it differently, provides better results under different circumstances. That's why combinations of methods have started appearing in the relevant literature.

    There are a few different ways of interconnecting neural networks with a personalization method or a set of them. In this paper, a neural network is used to recognize implicit patterns between user profiles and items of interest, which are then further enhanced by hybrid filtering into personalized suggestions. Our preliminary study indicates that this hybrid approach is particularly promising when compared to pure content-based or collaborative filtering methods.

    Public announcements will add data to the user profiles, especially the Fund Providers' profiles. These profiles will be upgraded by the addition of extra fields in the non-visible, i.e. non-public, profile of these fund providers.

    There will also be a host of separate special algorithms which map the requirements of the Fund Providers' existing portfolio firms and check whether any of the Fund Seekers can increase the turnover or improve the client offerings of these portfolio firms, making the Fund Providers more inclined to invest in the Fund Seeker's company.

    The Neural Network: 1A

    In general:

    Supervised learning

    In supervised learning, we are given a set of example pairs (x, y) and the aim is to find a function f in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain.

    A commonly used cost is the mean-squared error, which tries to minimize the average squared error between the network's output, f(x), and the target value y over all the example pairs. When one tries to minimize this cost using gradient descent for the class of neural networks called multilayer perceptrons, one obtains the common and well-known backpropagation algorithm for training neural networks.

  • Tasks that fall within the paradigm of supervised learning are pattern recognition (also known as classification) and regression (also known as function approximation). The supervised learning paradigm is also applicable to sequential data (e.g., for speech and gesture recognition). This can be thought of as learning with a "teacher," in the form of a function that provides continuous feedback on the quality of solutions obtained thus far.

    1A: A multilayer feed-forward architecture is adopted, with various numbers of hidden layers and distributions of units among layers. The connection from unit m to unit n is characterized by a real-number weight w_mn, with initial value positioned at random in the range [-1, 1].

    When a pattern is impressed on the input interface, the activities of the input units propagate through the entire network. Each unit n in a hidden layer or in the output layer receives a stimulus

    u_n = Σ_m w_nm a_m,

    where the a_m are the activities of the units in the immediately preceding layer. The activity of a generic unit m in the hidden or output layers is in general a nonlinear function of its stimulus,

    a_m = g(u_m).

    In our work, the unit activation functions g(u) are selected from among the logistic (sigmoid), hyperbolic tangent and linear forms. The system response may be decoded from the activities of the units of the output layer, while the dynamics is particularly simple: the states of all units within a given layer are updated successively, proceeding from input to output. Several training algorithms exist that seek to minimize the cost function with respect to the network weights. For the cost function we make the traditional choice of the sum of squared errors calculated over the learning set, or more specifically

    E = (1/2) Σ_p Σ_i (t_i^(p) - o_i^(p))²,

    where t_i^(p) and o_i^(p) denote, respectively, the target and actual activities of unit i of the output layer for input pattern (or example) p. The most familiar training algorithm is standard back-propagation (hereafter often denoted SB), according to which the weight update rule to be implemented upon presentation of pattern p is

    Δw_mn^(p) = -η ∂E^(p)/∂w_mn + μ Δw_mn^(p-1),

    where η is the learning rate, μ is the momentum parameter, and p - 1 is the pattern impressed on the input interface one training step earlier. The second term on the right-hand side, called the momentum term, serves to damp out the wild oscillations in weight space that might otherwise occur during the gradient-descent minimization process that underlies the back-propagation algorithm. Our artificial neural networks are trained with a modified version of the SB algorithm [1] that we have found empirically to be advantageous in the majority of problems. In this algorithm, denoted MB, the weight update prescription corresponding to the SB rule reads

    Δw_mn^(p) = -η ∂E^(p)/∂w_mn + μ S_mn^(p-1),

    the momentum term being modified through the epoch-dependent quantity S_mn^(p-1). Here e is the number of the current epoch, with e = 0, 1, 2, 3, ... The replacement of Δw_mn^(p-1) by S_mn^(p-1) in the update rule for the generic weight allows earlier patterns of the current epoch to have more influence on the training than is the case for standard back-propagation. By the time e becomes large, the modification is effectively zero. It can be shown, after rather lengthy algebra, that if a plateau region of the cost surface has been reached (i.e., E remains almost constant) and e is relatively large, then the MB update converges to the standard back-propagation rule.
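    To make the SB update rule concrete, here is a minimal Python/numpy sketch of a small multilayer perceptron trained by gradient descent on the sum-of-squared-errors cost with a momentum term, i.e. Δw = -η ∂E/∂w + μ Δw_prev. The toy data, layer sizes and hyperparameter values are illustrative assumptions, and the epoch-dependent MB modification is omitted:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    # Toy learning set: 4 input features, 1 output target.
    X = rng.uniform(-1, 1, size=(32, 4))
    y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

    # Weights initialized at random in [-1, 1], as specified above.
    W1 = rng.uniform(-1, 1, size=(4, 8))
    W2 = rng.uniform(-1, 1, size=(8, 1))
    dW1_prev, dW2_prev = np.zeros_like(W1), np.zeros_like(W2)

    eta, mu = 0.1, 0.9  # learning rate and momentum parameter

    for epoch in range(200):
        # Forward pass: u = sum of weighted activities, a = g(u).
        h = sigmoid(X @ W1)
        o = sigmoid(h @ W2)
        E = 0.5 * np.sum((y - o) ** 2)  # sum-of-squared-errors cost
        # Backward pass: gradients of E with respect to the weights.
        delta_o = (o - y) * o * (1 - o)
        delta_h = (delta_o @ W2.T) * h * (1 - h)
        # SB update: gradient step plus momentum term.
        dW2 = -eta * (h.T @ delta_o) + mu * dW2_prev
        dW1 = -eta * (X.T @ delta_h) + mu * dW1_prev
        W2 += dW2
        W1 += dW1
        dW1_prev, dW2_prev = dW1, dW2

    print(f"final cost E = {E:.4f}")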

  • 1B: A self-organizing map (SOM) is a type of ANN that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map.

    This means that you don't need to explicitly tell the SOM what to learn from the input data. It automatically learns the patterns in the input data and organizes the data into different groups. This is important for fishing out hidden patterns in the data and in the actions being taken by the investors.

    Self-organizing maps are different from other artificial neural networks in the sense that they

    use a neighborhood function to preserve the topological properties of the input space.

    Like most artificial neural networks, SOMs operate in two modes: training and mapping. "Training" builds the map using input examples (a competitive process, also called vector quantization), while "mapping" automatically classifies a new input vector.

    A self-organizing map consists of components called nodes or neurons. Associated with each node are a weight vector of the same dimension as the input data vectors, and a position in the map space. The usual arrangement of nodes is a two-dimensional regular spacing in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space. The procedure for placing a vector from data space onto the map is to find the node with the closest (smallest distance metric) weight vector to the data space vector.

    It has been shown that while self-organizing maps with a small number of nodes behave in a way that is similar to K-means, larger self-organizing maps rearrange data in a way that is fundamentally topological in character.

    It is also common to use the U-Matrix. The U-Matrix value of a particular node is the average distance between the node's weight vector and that of its closest neighbors. In a square grid, for instance, we might consider the closest 4 or 8 nodes (the Von Neumann and Moore neighborhoods, respectively), or six nodes in a hexagonal grid.
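    As an illustration, a minimal U-Matrix computation for a rectangular grid with the 4-node Von Neumann neighborhood could be sketched as follows (the array layout is an assumption for the example):

    import numpy as np

    def u_matrix(weights: np.ndarray) -> np.ndarray:
        # weights has shape (rows, cols, dim): one weight vector per node.
        rows, cols, _ = weights.shape
        umat = np.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                dists = []
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # Von Neumann neighbors
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
                umat[i, j] = np.mean(dists)  # average distance to the closest neighbors
        return umat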

    The goal of learning in the self-organizing map is to cause different parts of the network to

    respond similarly to certain input patterns. This is partly motivated by how visual, auditory or

    other sensory information is handled in separate parts of the cerebral cortex in the human

    brain.

    The weights of the neurons are initialized either to small random values or sampled evenly from the subspace spanned by the two largest principal component eigenvectors. With the latter alternative, learning is much faster because the initial weights already give a good approximation of SOM weights.

  • The network must be fed a large number of example vectors that represent, as close as possible, the kinds of vectors expected during mapping. The examples are usually administered several times as iterations.

    The training utilizes competitive learning. When a training example is fed to the network, its Euclidean distance to all weight vectors is computed. The neuron whose weight vector is most similar to the input is called the best matching unit (BMU). The weights of the BMU and neurons close to it in the SOM lattice are adjusted towards the input vector. The magnitude of the change decreases with time and with distance (within the lattice) from the BMU. The update formula for a neuron v with weight vector Wv(s) is

    Wv(s + 1) = Wv(s) + θ(u, v, s) · α(s) · (D(t) - Wv(s)),

    where s is the step index, t is an index into the training sample, u is the index of the BMU for the input vector D(t), and α(s) is a monotonically decreasing learning coefficient; θ(u, v, s) is the neighborhood function, which depends on the lattice distance between neuron u and neuron v at step s. Depending on the implementation, t can scan the training data set systematically (t is 0, 1, 2, ..., T-1, then repeat, T being the training sample's size), be randomly drawn from the data set, or follow some other sampling method.

    The neighborhood function θ(u, v, s) depends on the lattice distance between the BMU (neuron u) and neuron v. In the simplest form it is 1 for all neurons close enough to the BMU and 0 for others, but a Gaussian function is a common choice, too. Regardless of the functional form, the neighborhood function shrinks with time. At the beginning, when the neighborhood is broad, the self-organizing takes place on the global scale. When the neighborhood has shrunk to just a couple of neurons, the weights converge to local estimates. In some implementations the learning coefficient α and the neighborhood function θ decrease steadily with increasing s; in others (in particular those where t scans the training data set) they decrease in a step-wise fashion, once every T steps.

    This process is repeated for each input vector for a (usually large) number of cycles. The network winds up associating output nodes with groups or patterns in the input data set. If these patterns can be named, the names can be attached to the associated nodes in the trained net.

    During mapping, there will be one single winning neuron: the neuron whose weight vector lies closest to the input vector. This can be simply determined by calculating the Euclidean distance between input vector and weight vector.

    While representing input data as vectors has been emphasized in this article, it should be noted that any kind of object which can be represented digitally, which has an appropriate distance measure associated with it, and in which the necessary operations for training are possible can be used to construct a self-organizing map. This includes matrices, continuous functions or even other self-organizing maps.

  • Variables

    These are the variables needed, with vectors in bold:

    s is the current iteration
    λ is the iteration limit
    t is the index of the target input data vector in the input data set
    D(t) is a target input data vector
    v is the index of the node in the map
    Wv(s) is the current weight vector of node v
    u is the index of the best matching unit (BMU) in the map
    θ(u, v, s) is a restraint due to distance from the BMU, usually called the neighborhood function
    α(s) is a learning restraint due to iteration progress

    Algorithm

    1. Randomize the map's nodes' weight vectors.
    2. Grab an input vector D(t).
    3. Traverse each node in the map:
       1. Use the Euclidean distance formula to find the similarity between the input vector and the map node's weight vector.
       2. Track the node that produces the smallest distance (this node is the best matching unit, BMU).
    4. Update the nodes in the neighborhood of the BMU (including the BMU itself) by pulling them closer to the input vector:
       Wv(s + 1) = Wv(s) + θ(u, v, s) · α(s) · (D(t) - Wv(s))
    5. Increase s and repeat from step 2 while s < λ.

    A variant algorithm:

    1. Randomize the map's nodes' weight vectors.
    2. Traverse each input vector in the input data set:
       1. Traverse each node in the map:
          1. Use the Euclidean distance formula to find the similarity between the input vector and the map node's weight vector.
          2. Track the node that produces the smallest distance (this node is the best matching unit, BMU).
       2. Update the nodes in the neighborhood of the BMU (including the BMU itself) by pulling them closer to the input vector:
          Wv(s + 1) = Wv(s) + θ(u, v, s) · α(s) · (D(t) - Wv(s))
    3. Increase s and repeat from step 2 while s < λ.
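    A compact Python sketch of this training loop follows. The rectangular grid, Gaussian neighborhood function θ and linearly decaying learning coefficient α are illustrative implementation choices, not requirements of the algorithm above:

    import numpy as np

    def train_som(data, rows=10, cols=10, n_iter=1000, alpha0=0.5, sigma0=3.0, seed=0):
        # data: n_samples x dim array of input vectors.
        rng = np.random.default_rng(seed)
        dim = data.shape[1]
        # Step 1: randomize the map's nodes' weight vectors.
        W = rng.uniform(-1, 1, size=(rows, cols, dim))
        # Node coordinates, used for lattice distances in theta(u, v, s).
        grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
        for s in range(n_iter):
            # Step 2: grab an input vector (randomly drawn here).
            x = data[rng.integers(len(data))]
            # Step 3: the BMU is the node with the smallest Euclidean distance.
            u = np.unravel_index(np.argmin(np.linalg.norm(W - x, axis=2)), (rows, cols))
            # Monotonically decreasing learning coefficient and neighborhood width.
            frac = s / n_iter
            alpha = alpha0 * (1 - frac)
            sigma = sigma0 * (1 - frac) + 1e-3
            # Gaussian neighborhood theta(u, v, s) over lattice distance to the BMU.
            lat = np.linalg.norm(grid - np.array(u), axis=2)
            theta = np.exp(-(lat ** 2) / (2 * sigma ** 2))
            # Step 4: Wv(s+1) = Wv(s) + theta(u, v, s) * alpha(s) * (D(t) - Wv(s)).
            W += (theta * alpha)[:, :, None] * (x - W)
        return W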

    Neural Network Training Data Set Deployed on our Relational Database

    Each fund-seeking profile is a multi-point data set, and the responses to each profile also become part of the newly created extended data set.

    We record the reasons given by fund seekers for refusing a project.

    We will also record the projects clicked by our investors and service providers, which are prompted to them by the recommender system; as the number of clicks rises, the ANN will rapidly reduce its error rate.

    Recommender systems or recommendation systems (sometimes replacing "system" with a synonym such as platform or engine) are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item.

  • Recommender systems have become extremely common in recent years, and are applied in a variety of applications. The most popular ones are probably movies, music, news, books, research articles, search queries, social tags, and products in general. However, there are also recommender systems for experts, jokes, restaurants, financial services, life insurance, persons (online dating), etc.

    Recommender systems typically produce a list of recommendations in one of two ways: through collaborative or content-based filtering. Collaborative filtering approaches build a model from a user's past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users, and then use that model to predict items (or ratings for items) that the user may have an interest in. Content-based filtering approaches utilize a series of discrete characteristics of an item in order to recommend additional items with similar properties. These approaches are often combined (see Hybrid Recommender Systems).

    Hybrid Recommender Systems

    Recent research has demonstrated that a hybrid approach, combining collaborative filtering and content-based filtering, could be more effective in some cases. Hybrid approaches can be implemented in several ways: by making content-based and collaborative-based predictions separately and then combining them; by adding content-based capabilities to a collaborative-based approach (and vice versa); or by unifying the approaches into one model. Several studies empirically compare the performance of the hybrid with the pure collaborative and content-based methods and demonstrate that the hybrid methods can provide more accurate recommendations than pure approaches. These methods can also be used to overcome some of the common problems in recommender systems, such as cold start and the sparsity problem.

    Netflix and Amazon are good examples of hybrid systems. They make recommendations by comparing the watching and searching habits of similar users (i.e. collaborative filtering) as well as by offering movies that share characteristics with films that a user has rated highly (content-based filtering).

    A variety of techniques have been proposed as the basis for recommender systems: collaborative, content-based, knowledge-based, and demographic techniques. Each of these techniques has known shortcomings, such as the well known cold-start problem for collaborative and content-based systems (what to do with new users with few ratings) and the knowledge engineering bottleneck in knowledge-based approaches. A hybrid recommender system is one that combines multiple techniques together to achieve some synergy between them.

    Collaborative: The system generates recommendations using only information about rating profiles for different users. Collaborative systems locate peer users with a rating history similar to the current user and generate recommendations using this neighborhood.

    Content-based: The system generates recommendations from two sources: the features associated with products and the ratings that a user has given them. Content-based recommenders treat recommendation as a user-specific classification problem and learn a classifier for the user's likes and dislikes based on product features.

    Demographic: A demographic recommender provides recommendations based on a demographic profile of the user. Recommended products can be produced for different demographic niches, by combining the ratings of users in those niches.

    Knowledge-based: A knowledge-based recommender suggests products based on inferences about a user's needs and preferences. This knowledge will sometimes contain explicit functional knowledge about how certain product features meet user needs.

    The term hybrid recommender system is used here to describe any recommender system that combines multiple recommendation techniques together to produce its output. There is no reason why several different techniques of the same type could not be hybridized; for example, two different content-based recommenders could work together, and a number of projects have investigated this type of hybrid. NewsDude, which uses both naive Bayes and kNN classifiers in its news recommendations, is just one example.

    Seven hybridization techniques:

    Weighted: The scores of different recommendation components are combined numerically.

    Switching: The system chooses among recommendation components and applies the selected one.

    Mixed: Recommendations from different recommenders are presented together.

    Feature Combination: Features derived from different knowledge sources are combined together and given to a single recommendation algorithm.

    Feature Augmentation: One recommendation technique is used to compute a feature or set of features, which is then part of the input to the next technique.

    Cascade: Recommenders are given strict priority, with the lower-priority ones breaking ties in the scoring of the higher ones.

    Meta-level: One recommendation technique is applied and produces some sort of model, which is then the input used by the next technique.

    Our second phase: implementing a hybrid recommender system after the data/pattern crunch by the ANN.

    Both content-based filtering and collaborative filtering have their strengths and weaknesses. Three specific problems can be distinguished for content-based filtering:

    o Content description. In some domains, generating a useful description of the content can be very difficult. In domains where the items consist of music or video, for example, a representation of the content is not always possible with today's technology.

    o Over-specialization. A content-based filtering system will not select items if the previous user behavior does not provide evidence for this. Additional techniques have to be added to give the system the capability to make suggestions outside the scope of what the user has already shown interest in.

    o Subjective domain problem. Content-based filtering techniques have difficulty distinguishing between subjective information such as points of view and humor.

    A collaborative filtering system doesn't have these shortcomings. Because there is no need for a description of the items being recommended, the system can deal with any kind of information. Furthermore, the system is able to recommend items to the user which may have very different content from what the user has previously indicated interest in. Finally, because recommendations are based on the opinions of others, it is well suited for subjective domains like art. However, collaborative filtering does introduce certain problems of its own:

    o Early rater problem. Collaborative filtering systems cannot provide recommendations

    for new items since there are no user ratings on which to base a prediction. Even if

    users start rating the item it will take some time before the item has received enough

    ratings in order to make accurate recommendations. Similarly, recommendations will

    also be inaccurate for new users who have rated few items.

    o Sparsity problem. In many information domains the existing number of items far exceeds the amount a person is able (and willing) to explore. This makes it hard to find items that are rated by enough people on which to base predictions.

    o Gray sheep. Groups of users are needed with overlapping characteristics. Even if such

    groups exist, individuals who do not consistently agree or disagree with any group of

    people will receive inaccurate recommendations.

    A system that combines content-based filtering and collaborative filtering could take advantage of both the representation of the content and the similarities among users. Although there are several ways in which to combine the two techniques, a distinction can be made between two basic approaches: a hybrid approach combines the two types of information, while it is also possible to use the recommendations of the two filtering techniques independently.

    Collaborative filtering looks for the correlation between user ratings to make predictions. Such

    correlation is most meaningful when users have many rated items in common. As stated earlier,

    in large domains with many items this is not always the case. Furthermore, the lack of access to

    the content of the items prevents similar users from being matched unless they have rated the exact same item. For example, if one fund manager liked an 'E-commerce Project' and another liked an 'Intranet-based E-commerce Portal', they would not necessarily be matched together. A

    hybrid approach called collaboration via content deals with these issues by incorporating both

    the information used by content-based filtering and by collaborative filtering.

    In collaboration via content, both the rated items and the content of the items are used to construct a user profile. The selection of terms which describe the content of the items is done using content-based techniques. The weights of terms indicate how important they are to the user. The table below shows an example of the kind of information that is available to make a prediction about Project Delta for 'Venture Cap 2' with collaboration via content. Five terms are shown which describe the sort of projects a user is interested in.

    Five terms and project ratings (rows are users; the final column is each user's +/- rating of Project Delta):

    User            Break Even Timeline   Domain Specific   Asset Based   Technology Based   Scalability   Delta
    Bank1           1                     0                 1.2           0.2                0.2           -
    NBFC3           2.1                   0                 0.5           3                  2.2           +
    PE1             1.3                   1.5               0.2           3.2                1.9           +
    VC1             1.1                   2                 2.8           0.8                0             -
    Venture Cap 2   0.8                   1.1               0             2                  1.2           ?

  • Just as with collaborative filtering, the Pearson correlation coefficient can be used to compute the correlation between users. Instead of determining the correlation with user ratings, however, term weights are used. Because this method has a greater number of items from which to determine similarity than collaborative filtering, the problem of users not having enough rated items in common is no longer an issue. Furthermore, unlike content-based filtering, predictions are based on the impressions of other users, which can lead to recommendations outside the normal environment of a user. However, to make recommendations about items it is still necessary that enough users have rated the item. Just as with collaborative filtering, new items cannot be recommended as long as there aren't any users who have rated the new item.
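    To make the mechanics concrete, the sketch below computes Pearson correlations between 'Venture Cap 2' and the other users from the term-weight table above, then combines their Delta votes weighted by correlation. The +1/-1 encoding of the +/- ratings is an assumption made for illustration:

    import numpy as np

    # Term-weight vectors from the table (Break Even Timeline, Domain Specific,
    # Asset Based, Technology Based, Scalability).
    profiles = {
        "Bank1":         np.array([1.0, 0.0, 1.2, 0.2, 0.2]),
        "NBFC3":         np.array([2.1, 0.0, 0.5, 3.0, 2.2]),
        "PE1":           np.array([1.3, 1.5, 0.2, 3.2, 1.9]),
        "VC1":           np.array([1.1, 2.0, 2.8, 0.8, 0.0]),
        "Venture Cap 2": np.array([0.8, 1.1, 0.0, 2.0, 1.2]),
    }
    delta_ratings = {"Bank1": -1, "NBFC3": +1, "PE1": +1, "VC1": -1}

    active = profiles["Venture Cap 2"]
    score = 0.0
    for user, vote in delta_ratings.items():
        corr = np.corrcoef(active, profiles[user])[0, 1]  # Pearson on term weights
        score += corr * vote

    print("Predicted Delta preference for Venture Cap 2:", "+" if score > 0 else "-")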

    Another approach to combining collaborative and content-based filtering is to make predictions based on a weighted average of the content-based recommendation and the collaborative recommendation. The rank of each item being recommended could serve as the weight; in this way the highest-ranked recommendation receives the highest weight (a minimal sketch follows).
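    A minimal sketch of that weighted blend (the example scores and the equal weights are assumptions for illustration):

    def weighted_hybrid(content_scores: dict, collab_scores: dict,
                        w_content: float = 0.5, w_collab: float = 0.5) -> dict:
        # Blend the two recommenders' scores into one ranking by weighted average.
        items = set(content_scores) | set(collab_scores)
        return {item: w_content * content_scores.get(item, 0.0)
                      + w_collab * collab_scores.get(item, 0.0)
                for item in items}

    # Example with hypothetical scores for three projects:
    blended = weighted_hybrid({"A": 0.9, "B": 0.4}, {"A": 0.2, "B": 0.8, "C": 0.6})
    print(sorted(blended, key=blended.get, reverse=True))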

    Let us now integrate the collaborative module with the training set of the ANN:

    Collaborative Filtering

    We implemented a pure collaborative filtering component that uses a neighborhood-based algorithm. In neighborhood-based algorithms, a subset of users is chosen based on their similarity to the active user, and a weighted combination of their ratings is used to produce predictions for the active user. The algorithm we use can be summarized in the following steps:

    1. Weight all users with respect to similarity with the active user. Similarity between users is computed over their ratings vectors.
    2. Select the n users that have the highest similarity with the active user. These users form the neighborhood.
    3. Compute a prediction from a weighted combination of the selected neighbors' ratings.

    In step 1, similarity between two users is computed using the Pearson correlation coefficient, defined below:

    P(a,u) = Σ_i (r(a,i) - r̄_a)(r(u,i) - r̄_u) / sqrt( Σ_i (r(a,i) - r̄_a)² · Σ_i (r(u,i) - r̄_u)² ),

    where r(a,i) is the rating given to item i by user a, r̄_a is the mean rating given by user a, and m is the total number of items (the sums run over i = 1, ..., m).

  • In step 3, predictions are computed as the weighted average of deviations from the neighbors' means:

    p(a,i) = r̄_a + Σ_u (r(u,i) - r̄_u) · P(a,u) / Σ_u |P(a,u)|,

    where p(a,i) is the prediction for the active user a for item i, P(a,u) is the similarity between users a and u, and n is the number of users in the neighborhood. It is common for the active user to have highly correlated neighbors that are based on very few co-rated (overlapping) items. These neighbors based on a small number of overlapping items tend to be bad predictors. To devalue the correlations based on few co-rated items, we multiply the correlation by a significance weighting factor. If two users have fewer than 50 co-rated items, we multiply their correlation by a factor sg(a,u) = n/50, where n is the number of co-rated items. If the number of overlapping items is greater than 50, then we leave the correlation unchanged, i.e. sg(a,u) = 1.
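    A self-contained sketch of steps 1-3, including the significance weighting just described, might look as follows (the np.nan encoding of missing votes and the neighborhood size are assumptions for the example):

    import numpy as np

    def predict(ratings: np.ndarray, a: int, i: int, n_neighbors: int = 20) -> float:
        # ratings: users x items matrix with np.nan marking missing votes.
        means = np.nanmean(ratings, axis=1)  # mean rating per user
        sims = []
        for u in range(ratings.shape[0]):
            if u == a or np.isnan(ratings[u, i]):
                continue
            co = ~np.isnan(ratings[a]) & ~np.isnan(ratings[u])  # co-rated items
            n_co = int(co.sum())
            if n_co < 2:
                continue
            da, du = ratings[a, co] - means[a], ratings[u, co] - means[u]
            denom = np.sqrt((da ** 2).sum() * (du ** 2).sum())
            if denom == 0:
                continue
            p_au = (da * du).sum() / denom  # Pearson correlation P(a,u)
            p_au *= min(n_co / 50.0, 1.0)   # significance weighting sg(a,u)
            sims.append((p_au, u))
        # Step 2: the n most similar users form the neighborhood.
        sims.sort(key=lambda t: t[0], reverse=True)
        sims = sims[:n_neighbors]
        # Step 3: weighted average of deviations from the neighbors' means.
        num = sum(p * (ratings[u, i] - means[u]) for p, u in sims)
        den = sum(abs(p) for p, _ in sims)
        return float(means[a] + (num / den if den else 0.0))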

    Connecting Neural Network and Collaborative Filtering Algorithms

    In the proposed hybrid combination of neural network and collaborative filtering algorithms, we first create a pseudo user-ratings vector for every user u in the database. The pseudo user-ratings vector, v_u, consists of the item ratings provided by user u, where available, and those predicted by the neural network predictor algorithm otherwise:

    v(u,i) = r(u,i) if user u rated item i, and c(u,i) otherwise,

    where r(u,i) denotes the actual rating provided by user u for item i, while c(u,i) is the rating predicted by the neural network system. The pseudo user-ratings vectors of all users put together give the dense pseudo ratings matrix V. We now perform collaborative filtering using this dense matrix. The similarity between the active user a and another user u is computed using the Pearson correlation coefficient described above. Instead of the original user votes, we substitute the votes provided by the pseudo user-ratings vectors v_a and v_u.
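    Under these definitions, constructing the dense pseudo ratings matrix V is a one-liner (np.nan again marks missing votes; ann_predictions is assumed to come from the trained network):

    import numpy as np

    def pseudo_ratings(ratings: np.ndarray, ann_predictions: np.ndarray) -> np.ndarray:
        # Keep each actual rating r(u,i); fill the gaps with the ANN's prediction c(u,i).
        return np.where(np.isnan(ratings), ann_predictions, ratings)

    Collaborative filtering (for example, the predict sketch above) can then be run on this dense matrix, since it no longer has missing entries.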

    We aimed to bring forward the need to combine collaborative filtering techniques for personalization with neural networks, which possess the ability to learn and adapt.

    Conclusion: The more data the ANN crunches, the smaller the recommendation error becomes and the better our AI system performs.