Big Data Science Methods

Management Summary

In this series I look at data science as it is understood at e-dialog and at the applications we develop for our customers. To understand the usefulness of data science, you also need to look at the most common methods, and there are many of them. What all of these methods have in common is that they scale very well and can therefore also be used on extremely large datasets. The most common methods and their applications are outlined here. Of course, this description can only scratch the surface; there is much more.

Clustering & Classification

Clustering and classification are two related methods that aim to identify groups in data. One application example is the identification of customer segments from online behavior. Clustering is completely data-driven: the data is fed into a suitable algorithm, which autonomously forms groups whose members are as homogeneous as possible. If a group allocation already exists for some customers, classification algorithms can be used to identify the underlying structure. These insights can then be applied to customers who are not yet classified, assigning each to the most appropriate group. Group membership can be either hard (yes/no) or fuzzy (probability 0-100%).

Visualization of a fuzzy clustering into two groups; the strength of the color indicates the probability of cluster membership.

The prerequisite for both methods is a correspondingly comprehensive database, in terms of both observations and variables. Conceivable data sources include customer journeys, shopping cart contents and volumes, and campaign reports.
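
To make the idea of fuzzy membership concrete, here is a minimal sketch of fuzzy c-means, one of several algorithms that produce membership degrees between 0 and 1. It is not necessarily the method used for the figure above. The customer features (sessions per month, average basket value), the starting centers and the fuzziness parameter m are assumptions chosen for illustration.

```python
import math

def memberships_for(points, centers, m=2.0):
    """Degree (0..1) to which each point belongs to each center; rows sum to 1."""
    result = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0 for d in dists):
            # A point sitting exactly on a center belongs fully to that center.
            row = [1.0 if d == 0 else 0.0 for d in dists]
            n_zero = sum(row)
            row = [r / n_zero for r in row]
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            row = [v / total for v in inv]
        result.append(row)
    return result

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate between membership updates and weighted center updates."""
    dims = len(points[0])
    for _ in range(iterations):
        u = memberships_for(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ))
        centers = new_centers
    return centers, memberships_for(points, centers, m)

# Hypothetical customers: (sessions per month, average basket value in tens of euros)
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (9, 9), (8.5, 9.5), (5, 5)]
centers, memberships = fuzzy_c_means(points, centers=[points[0], points[3]])

for p, row in zip(points, memberships):
    print(p, [round(v, 2) for v in row])
```

The customer at (5, 5) lies between the two groups and therefore receives a membership close to one half for each, whereas a hard clustering would have to force it into one group.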

Scenario simulation, portfolio optimization

Visualization of the results of a portfolio optimization of search keywords in the dimensions of sales achieved and costs incurred. Each keyword has a unique position in this space. Keywords that cost more than they bring in can be identified and specifically excluded. An angle bisector is drawn in as a guide.

Our customers spend large amounts of money on advertising campaigns, and these resources should be used optimally. Targeted analysis of conversions, costs and conversion revenues per keyword and per publisher lays the foundation for scenario simulations and portfolio optimization. Combined with interactive tools, customers can then try out for themselves what would happen if they dropped this or that keyword, or booked more impressions with Publisher X.
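
A very simple scenario simulation along these lines could look as follows. This is a sketch under a strong assumption: dropping a keyword removes exactly its own cost and revenue and leaves all other keywords unaffected. The keyword names and figures are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Keyword:
    name: str
    cost: float      # advertising spend attributed to the keyword
    revenue: float   # conversion revenue attributed to the keyword

def unprofitable(keywords):
    """Keywords that cost more than they bring in."""
    return [k.name for k in keywords if k.revenue < k.cost]

def simulate(keywords, excluded=()):
    """Totals for the portfolio if the excluded keywords were no longer booked.

    Assumption: keywords are independent of each other.
    """
    active = [k for k in keywords if k.name not in excluded]
    cost = sum(k.cost for k in active)
    revenue = sum(k.revenue for k in active)
    return {"cost": cost, "revenue": revenue, "profit": revenue - cost}

# Hypothetical keyword report
portfolio = [
    Keyword("running shoes", 120.0, 300.0),
    Keyword("cheap shoes", 200.0, 150.0),
    Keyword("brand sneakers", 80.0, 240.0),
    Keyword("shoe repair", 50.0, 20.0),
]

baseline = simulate(portfolio)
to_drop = unprofitable(portfolio)
scenario = simulate(portfolio, excluded=to_drop)

print("baseline:", baseline)
print("without", to_drop, ":", scenario)
```

In practice keywords do influence each other, so a result like this is a starting point for discussion rather than a forecast.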

Shopping Cart Analyses & Association Rules (ARs)

Association rules were developed to analyze shopping baskets in supermarkets. The central question is which products are sold together. Special offers (discounting only one product from a combination) and thematic advertising content (as many products as possible from frequent combinations, for example everything for the grill) are then derived from the answer.
These analyses can be carried out either in total or grouped by customer segment, seasonality or product group. Floodlight reports from online shops, which contain all purchased products, can serve as the data basis, for example. The analysis is also possible with data from in-store purchases, provided the POS devices are appropriately networked. It becomes particularly interesting when individual transactions can be assigned to customers, as is possible online but also with loyalty cards. In any case, a large number of transactions is required; the lower limit is probably several tens of thousands.
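
The three standard measures behind association rules (support, confidence and lift) can be sketched in a few lines. The example below only considers rules between pairs of products and uses five invented baskets; a real analysis would run a dedicated algorithm on tens of thousands of transactions. The threshold `min_support` is an assumed parameter.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3):
    """Rules 'lhs -> rhs' between two products with support, confidence and lift."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                        # share of baskets containing both
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)

# Hypothetical baskets
baskets = [
    ["sausages", "charcoal", "buns"],
    ["sausages", "charcoal"],
    ["charcoal", "lighter"],
    ["buns", "butter"],
    ["sausages", "buns", "mustard"],
]

rules = pair_rules(baskets)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two products appear together more often than would be expected if they were bought independently.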

Predictive Analytics & Recommendation Engines

Based on the findings from the shopping cart analysis and customer data, cross-selling and up-selling potential can be exploited here. Examples include Amazon (customers who bought X were also interested in Y) and Netflix’s film recommendations. Customer information is fed into an algorithm, which then returns how likely the customer is to be interested in which other products.
In any case, the shopping basket analysis described above is a prerequisite. Personalized recommendations additionally require a customer account or at least basic personal data (gender, address, possibly bank details). Automatic integration with creditworthiness checks is also possible, so that certain payment conditions can be taken into account during the shopping process.

Dynamic pricing

Dynamic pricing refers to the idea of adjusting prices so that the best possible conversion and revenue are achieved. In other words: rather than lose a sale, I reduce the price (without falling below my contribution margin). Conversely, customers with an affinity for premium products may be shown a higher price than bargain hunters. The basis for this procedure is a description of the customer that is as accurate as possible. The device used is one example: customers who visit the shop with the latest iPhone model are assumed to be prepared to dig deeper into their pockets for the right performance. In contrast, customers who arrive with an older feature phone can probably only be persuaded to buy with discounts. Whether and how well this hypothesis holds needs to be determined in systematic tests. This illustrates the importance of the science in data science.
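
One possible form of such a systematic test is a two-proportion z-test: within one customer segment, visitors are randomly shown either the regular or the higher price, and the conversion rates are compared. The sketch below is one standard way to do this, not necessarily the procedure used at e-dialog, and all numbers are invented.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical test among visitors using the latest smartphone model:
# group A saw the regular price, group B a higher price.
z, p_value = two_proportion_z_test(conv_a=120, n_a=2000, conv_b=110, n_b=2000)

print(f"z = {z:.2f}, p = {p_value:.2f}")
if p_value >= 0.05:
    print("No significant drop in conversion at the higher price.")
else:
    print("Conversion differs significantly between the two prices.")
```

A test like this only looks at conversion rates. Whether the higher price pays off also depends on the revenue and margin per order, which would need to be evaluated separately.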

Social Media Intelligence & Natural Language Processing (NLP)

In keeping with the brand approach in marketing, how a brand is presented is of great importance to companies. On social media, however, they have only limited influence over it. This makes it all the more important to recognize when your own brand is being portrayed poorly. The large number of social media channels and online forums means that manual monitoring is possible only to a limited extent. Thanks to the great advances in natural language processing, especially sentiment analysis, it is nevertheless possible to analyze individual texts, such as forum or social media posts, for emotions and sentiments (positive/negative). With a sufficient number of posts, the central topics of the discussions can also be identified. For example, it then becomes clear that features X and Y of cell phone Z are the ones that generate the most positive reports. The advertising can be adapted accordingly.

A word cloud based on the terms in this post, created automatically taking into account the peculiarities of the German language such as stop words, cases and capitalization.

The data requirements can only be roughly defined and depend heavily on the question. At the maximum, access to the corresponding social media streams is required, which is associated with high costs. If only a few forums or company assets are to be monitored, solutions are significantly more cost-effective to implement.
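
The simplest form of sentiment analysis counts words from a positive and a negative word list. The sketch below illustrates only this basic principle; the word lists and posts are invented, and production systems typically rely on trained language models that also handle negation, irony and language-specific inflection.

```python
import re

# Tiny hypothetical word lists; real lexicons contain thousands of entries.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "poor", "terrible", "broken"}

def sentiment_score(text):
    """Score between -1 (only negative words) and +1 (only positive words)."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return 0.0  # no sentiment-bearing words found
    return (pos - neg) / (pos + neg)

# Hypothetical forum posts about a product
posts = [
    "I love the camera, great battery",
    "Terrible display, broken after a week",
    "Arrived on Tuesday",
]

scores = [sentiment_score(p) for p in posts]
for post, score in zip(posts, scores):
    print(f"{score:+.1f}  {post}")
```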

Predictive analytics in a social media context

A representation of the interactions of actors in a social network, where the thickness of the edges indicates the frequency of interactions.

Related to social media intelligence in the brand context, social media data can of course also be used to analyze upcoming trends: Which DVD releases are already eagerly awaited? What will be the winter trend of the coming season? These questions can be answered through the targeted analysis of social media activities. It is important to pay appropriate attention to opinion leaders. This also keeps costs down because you don’t have to subscribe to the entire stream.

Price Monitoring

A long-running issue in the retail sector is the desire to automatically know the competition’s prices in order to adapt one’s own pricing strategy accordingly. While this is not really technically challenging, it raises legal questions. Many websites specifically exclude automated or commercial use in their terms of use. Here it is important to find appropriate alternatives. In this respect, implementation is less a technical challenge than a research-intensive one.

Linking advertising channels & time

A central question in advertising is the interaction effects between advertising channels. With the appropriate technology, the analysis of these channels can be expanded to include TV and radio. The focus is on the simultaneity of spot broadcasting and online activity. In addition to the classic channel analyses, the precise media plan is required as a data basis. Large amounts of data must also be handled efficiently; appropriate solutions such as Google BigQuery are essential.
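
The core comparison between spot broadcasting and online activity can be sketched as follows: sessions in the minutes after a spot are compared with the minutes immediately before it. This is a deliberately simple in-memory illustration with invented numbers. The window length is an assumed parameter, and at realistic data volumes the aggregation would run in a data warehouse rather than in a Python list.

```python
def spot_uplift(sessions_per_minute, spot_minute, window=5):
    """Average sessions per minute after a spot minus the average before it.

    sessions_per_minute: list of session counts, one entry per minute
    spot_minute: index of the minute in which the spot aired (from the media plan)
    window: number of minutes compared before and after (assumed value)
    """
    before = sessions_per_minute[max(0, spot_minute - window):spot_minute]
    after = sessions_per_minute[spot_minute:spot_minute + window]
    if not before or not after:
        raise ValueError("not enough data around the spot")
    baseline = sum(before) / len(before)
    return sum(after) / len(after) - baseline

# Hypothetical website sessions per minute; the spot airs at index 5
sessions = [10, 12, 11, 9, 10, 25, 30, 22, 15, 12, 11, 10]
uplift = spot_uplift(sessions, spot_minute=5)

print(f"additional sessions per minute after the spot: {uplift:.1f}")
```

A simple before/after comparison like this ignores time-of-day trends and overlapping spots, which a full channel analysis would have to account for.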

Geostatistics

Geostatistical models can be used, on the one hand, to answer questions with a regional reference (What is the catchment area of my branches? What effect do campaigns with regional targeting have?) and, on the other hand, to include poster advertising as a channel in the channel analysis. This draws on customers’ movement profiles, which of course require a corresponding app and informed consent.

Big Data (Management)

Many companies still have problems dealing with large amounts of data. Consider that daily keyword reports from DoubleClick Search already amount to 30 MB per day: over the course of a year, these reports (around 10 GB) represent amounts of data that most companies can no longer handle internally. As a way out, aggregated reports are often used, but these often hide patterns and thus lead to wrong decisions. This is where we need to start, giving the individual departments tools with which they can gain insights from large amounts of data directly in the browser.
Interactive dashboards that run on corresponding Google services are conceivable. Of course, more precise cost estimates are only possible once there are specific use cases.

Summary

In this second part of the data science series, I examined which methods actually characterize data science. From these, I derived common applications in web analytics, online marketing and e-commerce and showed how e-dialog uses them for its customers.

If you’ve gotten a taste for it and want to get more out of your data, we’ll be happy to advise and support you at every step of your data science process. Contact us at kontakt@e-dialog.group
