Miso, Meltano and Setting Up Real-Time, Continuous Personalization
As a real-time personalization data platform (PDP), Miso relies on just two first-party data inputs to generate real-time, continuous, dynamic, in-session personalization. The first input is your user clickstream logs, which reflect the key actions and events each individual user drives on your site or app, and which should be provided in real time. The second input is your product catalog, which ideally is also updated in real time, especially if you're a marketplace or user-generated content site, or at least updated continuously on a regular cron job schedule.
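For a sense of what the clickstream input looks like, here is a rough sketch of a single interaction event. Treat the field names as illustrative; the exact schema lives in our Interactions API docs:

{
  "type": "product_detail_page_view",
  "user_id": "user-123",
  "product_ids": ["sku-456"],
  "timestamp": "2022-06-01T12:34:56Z"
}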
As a team that built personalization platforms for a living before Miso, we're very sympathetic to the lift this entails. Not only have we built out a range of integrations to streamline getting started, but we also make sure we're available 24/7 for our partners, over Slack and Zoom, to help debug and to support each of them on best practices every step of the way.
Since joining the Miso Shiba Squad a year ago as a DevX Engineer, I've gotten the opportunity to talk with many teams, across different domains and, of course, different infrastructures, about their process of integrating Miso into their tech stack. At its core, Miso is an API-first platform, which by its nature allows for a lot of implementation flexibility and creative data integration solutions. One interesting implementation pattern that I've seen teams use repeatedly is combining Miso's client-side SDK, for streaming in their real-time clickstream logs, with a reverse-ETL (or ELT) tool like Meltano, for near real-time catalog updates plus the initial bulk uploads of historical data from their clickstream warehouse.
We've talked about using Miso's client-side SDK in a previous article, so now let's take a closer look at Meltano.
Meltano is an open-source ELT platform whose development is sponsored by GitLab (another amazing open-source DevOps platform that we use here at Miso.ai). The folks at Meltano believe in the commoditization of data integration, which is evident in the hundreds of data plug-ins supported by their open-source community. Miso participates in the Meltano ecosystem via target-miso, a Singer-based data loader that can push data to Miso's personalization platform.
We're of course not suggesting that you have to use Meltano to integrate with Miso, but it is a great open-source solution that plays well with Miso. There are other, similar products in this space, like Hightouch, Meroxa, and Apache Airflow; stay tuned for posts on those as well. Our engineering team is constantly building out new tooling for the wider ecosystem of databases, data warehouses, and ETL/ELT systems out there.
Miso Meets Meltano
Miso's search and recommendation engines are trained and tuned on your product catalog and user clickstream data. This allows our predictive models to perform multivariate, embedding-based clustering (which maps users and products into the same vector space) and output a ranked list of the products most likely aligned with the behavioral intent of a given user.
While the initial, historical data dump to kick off engine training is pretty trivial, orchestrating a production-ready data pipeline for future product updates can be tricky, especially if you're the one tasked with maintaining a data extractor.
This is where Meltano can fit right into your data stack. At the most basic level, you choose a data source (or tap) and a destination (target-miso), and Meltano, with the help of Singer, an open-source data exchange format, takes care of the software glue in between. Here's a good introduction to Singer.io.
An important caveat is that Meltano is not real-time. It can be scheduled to refresh as often as you'd like, even on a near real-time basis, but there's no native way to trigger Meltano as soon as data is generated. That's why we recommend using Meltano only for sending catalog updates (where near real-time is sufficient) or optional user demographic or preference data (for instance, if you have quizlets in your app or purchase flow).
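As a quick sketch of what that scheduling looks like in practice (assuming a recent Meltano CLI; double-check the syntax for your version), an hourly catalog refresh could be registered like this:

meltano schedule add catalog-sync --extractor tap-postgres --loader target-miso --interval @hourly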
Let's look at a concrete example. Suppose your product catalog lives in a Postgres database. Your merchandising and (physical) product team adds new SKUs every week, and you want to make sure that Miso's engines are receiving those updates and recommending new products to customers (fun fact: we excel at this type of product cold-start problem). There should be a pipeline in place to reliably extract the data from Postgres and load it into Miso. You could build it from scratch, especially given our API-first approach and the suite of Data APIs we offer, but there's value in not reinventing the wheel when a vetted open-source solution like Meltano is available.
BTW, what the heck is Singer?
Meltano is built on top of an open-source framework called Singer.
Some of the best things in life are singers. From legendary sewing machines that never die, to resto-modded Porsche 911s, to The Masked Singer (depends on the singer), it's no surprise that Singer.io is the open-source standard for writing scripts that move data.
Singer defines two types of scripts: extractors (called taps) and loaders (called targets). Taps and targets are the things that do the moving. For example, a simple CLI pipeline like "tap-salesforce | target-csv" moves data from Salesforce into a CSV file. The best part is that Singer is open-source and continuously getting updates from the community. Here's a look at some of the taps and targets that Singer (and, by extension, Meltano) currently supports.
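Under the hood, each tap and target reads its credentials and settings from a JSON config file, so a standalone Singer run of that example would look roughly like this (config file names are made up for illustration):

pip install tap-salesforce target-csv
tap-salesforce --config salesforce-config.json | target-csv --config csv-config.json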
The thing that Singer purports to do, the moving-data thing, it does really well. But it's barebones otherwise, missing the testing, orchestration, and deployment features that organizations need before they can comfortably productionize a data pipeline. That's where Meltano comes in.
For more information on Singer, here's a useful getting started guide.
Setting up a data pipeline using Meltano
Installing the Meltano framework
The official Meltano guide does a great job of walking you through the framework installation process. One requirement is that you'll need Python 3 installed. The example I'll be providing is based on UNIX commands, but the official guide also has Windows-specific instructions.
First, let's create a virtual Python environment, activate it, and install the Meltano package.
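On a typical UNIX setup (assuming python3 and pip are on your PATH), that looks like:

python3 -m venv .venv
source .venv/bin/activate
pip install meltano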
Initializing a Meltano project
Next, we'll create and navigate to a new directory that will hold our Meltano project, and initialize it. We'll call the project "postgres-demo". This will create three environments: dev, test, and prod, as well as a meltano.yml file that will hold the details of our tap and target.
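In the shell, that's just the following (meltano init scaffolds the project directory for you, so the parent directory is up to you):

meltano init postgres-demo
cd postgres-demo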
Choosing an extractor and loader
As I mentioned earlier, our example moves data from a Postgres database to Miso. Therefore, our extractor is tap-postgres and our loader is target-miso. We need to add these details to the meltano.yml file that was automatically created when we initialized the Meltano project in the previous step.
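As an aside, plugins published on MeltanoHub can also be added from the CLI rather than hand-editing the file; something like the sketch below (verify the variant and Hub availability for your Meltano version):

meltano add extractor tap-postgres --variant transferwise
meltano add loader target-miso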
Meltano provides a great overview of all the intricacies and nuances of configuring the meltano.yml file, so we won't get into the details here, but at the most basic level, it should look something like this:
version: 1
default_environment: dev
project_id: d347016d-0d0d-4110-a0ad-cf7e516f76a2
plugins:
  extractors:
    - name: tap-postgres
      executable: tap-postgres
      variant: transferwise
      pip_url: pipelinewise-tap-postgres
      config:
        host: $PG_HOST
        port: $PG_PORT
        user: $PG_USER
        password: $PG_PASSWORD
        dbname: $PG_DBNAME
      select:
        - public-users.*
      metadata:
        public-users:
          replication-method: INCREMENTAL
          replication-key: create_time
  loaders:
    - name: target-miso
      namespace: target_miso
      pip_url: target-miso
      executable: target-miso
      config:
        template_folder: template
        api_server: $MISO_API_SERVER
        api_key: $MISO_API_KEY
        use_async: false
environments:
  - name: dev
Note that the connection settings should be stored in a separate environment file.
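For example, a .env file at the project root, which Meltano loads automatically (all values below are placeholders):

PG_HOST=localhost
PG_PORT=5432
PG_USER=demo_user
PG_PASSWORD=change-me
PG_DBNAME=demo
MISO_API_SERVER=https://api.askmiso.com
MISO_API_KEY=your-private-api-key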
Whoops… we don't actually have any data yet.
Let's fix that by creating some dummy user data inside a Postgres database. If you want to follow along, I'm using this SQL file and this shell script to create 100 random users, along with some demographic information.
Note: Not to distract from this walkthrough… but demographic data isn't a key thing to ever send Miso. In fact, we ask teams never to send us any PII, and when demographic data is sent to Miso, it's usually hashed. But I digress.
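For reference, the users table those scripts build looks roughly like this; treat it as a hypothetical schema that matches the fields our template will map, not the verbatim SQL file:

CREATE TABLE public.users (
  id            SERIAL PRIMARY KEY,
  username      TEXT NOT NULL,
  email         TEXT,
  first_name    TEXT,
  last_name     TEXT,
  city          TEXT,
  state         TEXT,
  country       TEXT,
  date_of_birth DATE,
  create_time   TIMESTAMP DEFAULT now()
);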
Mapping the data using a template
So now that we have user data in a SQL table, we're really close to sending it over to Miso. The final preparatory step is to define a template that maps the SQL schema to a JSON structure that Miso can accept as an API payload. To do this, create a directory called "template" in your project folder and create a .jinja file inside it that looks like this:
{
  "user_id": "{{ data.id }}",
  "city": "{{ data.city }}",
  "state": "{{ data.state }}",
  "country": "{{ data.country }}",
  "name": "{{ data.first_name }} {{ data.last_name }}",
  "custom_attributes": {
    "username": "{{ data.username }}",
    "email": "{{ data.email if data.email }}",
    "date_of_birth": "{{ data.date_of_birth | datetime_format }}",
    "create_time": "{{ data.create_time | datetime_format }}"
  }
}
With this template, we're telling Meltano which fields in the users table map to which JSON fields in the payload. It should match the structure of our Users API.
Let's see if it works
Using the command "meltano run tap-postgres target-miso", we sent the data from our local Postgres database to Miso. Let's check our data in Dojo:
Looks like all 100 users made it over! A small note: while this example used a mock user dataset, Miso doesn't actually require separate user data at all; user profiles are optional at this point. We instead rely on interaction and product catalog data, both of which are just as easy to send using Meltano. The only thing that would need to change is the template file.
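For instance, a catalog template might look roughly like the sketch below; the left-hand keys follow our Products API, while the data.* columns stand in for whatever your catalog table exposes, so treat both sides as illustrative:

{
  "product_id": "{{ data.sku }}",
  "title": "{{ data.title }}",
  "description": "{{ data.description }}",
  "url": "{{ data.url }}",
  "custom_attributes": {
    "brand": "{{ data.brand }}"
  }
}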
And although we didn't get into the specifics in this post, Meltano has a ton of cool features to check out, like orchestration, containerization, and discovery mode.
Deep Thoughts with Taha
If open source is more your style, Meltano is a great option for an ELT pipeline. Admittedly, this integration does take a fair bit of upfront configuration, especially since you need to map the schema from your source to a JSON destination format that Miso can ingest. On the flip side, that manual configuration means there's a ton of flexibility and plenty of options to explore. If you're not super comfortable working via the CLI, there's also a browser-based Meltano UI, though it's not as comprehensive. As I mentioned earlier, Meltano on its own, feeding your catalog and clickstream events, will get you to static, 1:1 personalization; to get to the next level, with real-time personalization that's optimized for cold-start visitors and in-session personalization for all users, you'll also need to pipeline real-time clickstream data. A great option for that is our client-side SDK for JavaScript.
Before I close out, I'd like to give a shoutout to Lucky Gunasekara and Lan Le for their editing expertise and to Stian Johannessen for his design work. If there's an integration (or any other topic, really) that you're curious about and want us to unpack, drop us a line on our contact form or at hello@askmiso.com. We'd love to hear about what you're working on and how we can help!