Improve your internal links using Python string-matching

Andreas Voniatis

To round out our internal linking odyssey, Andreas Voniatis from Artios explains how Python can do much more for your internal links than the humble spreadsheet you might be used to.

@andreasvoniatis  

Andreas says: “Use Python’s string-matching functions to increase the relevance of your internal links on your website.”

Can you give a brief explanation of the value of using Python for SEO?

“What I love about Python is that it can scale SEO really well. A lot of SEOs will be working in spreadsheets and there are obviously restrictions or limitations in terms of what a spreadsheet can do. They are limited in the scale of the data they can handle, like the number of rows, but also in the complexity of the functions and calculations that they can perform with that data.

For example, if you're optimizing a high-traffic website with tons of pages, like Amazon, then you're going to find scalable SEO analysis in Excel or Google Sheets pretty limiting.

Instead, you can use a Jupyter notebook (formerly known as the IPython Notebook), which lets you run Python code interactively. If you import string-matching functions, you can take a target keyword and compare it to the title tags of your site's pages to find the best page to send internal links to.”
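A minimal sketch of that comparison, assuming a crawl export reduced to a mapping of URL to title tag. It uses `difflib.SequenceMatcher` from the standard library as the string-matching function; the interview does not name a specific library, and the URLs, titles and the `rank_pages` helper are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_pages(target_keyword, titles_by_url):
    """Score every title against the target keyword, most similar first."""
    scored = [
        (url, similarity(target_keyword, title))
        for url, title in titles_by_url.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical crawl export: URL -> title tag
titles = {
    "/blog/running-shoes-guide": "Running shoes guide",
    "/blog/trail-running-shoes": "Trail running shoes",
    "/blog/office-chairs": "Office chairs for small spaces",
}

ranked = rank_pages("running shoes", titles)
for url, score in ranked:
    print(f"{url}\t{score:.2f}")
```

Other similarity measures (token-based or edit-distance ratios, for example) would slot into `similarity` without changing the rest of the sketch.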

Are you using this to determine whether a page or a piece of content is sufficiently optimized or just to find the most appropriate internal page to link to?

“You could also use it for measuring how optimized your content is, which is a different use case for Python. Python has many use cases for scalable and data-driven SEO. In this case, though, we're trying to find content like blog posts where you can place internal links that will help reshape the importance of your target content for Google and other search engines.”

What content elements are you looking for?

“The great thing about doing this is that there are so many different ways to approach it. On a basic level, you could take your target keyword and the title tags of all of your content, and then simply use a string-matching function to calculate the similarity between them. Based on that similarity metric, you could use a quick rule of thumb to say that anything that's 60% or above would be considered suitable pages to place internal links on, for example.

You could do it at the body content level, but that's a bit more complex because you need to ingest that content into the Python equivalent of a spreadsheet (what we call a DataFrame) to do that kind of calculation. That’s possible thanks to Python.

If you don’t know what a good rule of thumb is, you can go even deeper. You can say, ‘I want to model the median’ or ‘I want to model the 95th percentile of what's considered relevant.’ You can determine your rule of thumb on a statistical basis rather than on something that you pulled out of thin air.”
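The two ways of picking a cut-off can be compared side by side. This is a sketch under stated assumptions: the similarity scores are invented, the helper names are hypothetical, and `statistics.quantiles` (Python 3.8+) with the "inclusive" method is only one of several reasonable ways to compute a percentile.

```python
import statistics


def fixed_cutoff(scores, threshold=0.6):
    """Keep pages whose similarity meets a rule-of-thumb threshold."""
    return {url: s for url, s in scores.items() if s >= threshold}


def percentile_cutoff(scores, percentile=95):
    """Derive the threshold from the score distribution (percentile 1-99)."""
    # quantiles(n=100) returns the 99 cut points between percentiles
    cut_points = statistics.quantiles(scores.values(), n=100, method="inclusive")
    threshold = cut_points[percentile - 1]
    return threshold, {url: s for url, s in scores.items() if s >= threshold}


# Hypothetical similarity scores between one target keyword and ten titles
scores = {
    "/page-1": 0.12, "/page-2": 0.18, "/page-3": 0.25, "/page-4": 0.31,
    "/page-5": 0.33, "/page-6": 0.40, "/page-7": 0.47, "/page-8": 0.58,
    "/page-9": 0.71, "/page-10": 0.86,
}

by_rule_of_thumb = fixed_cutoff(scores)
median_score = statistics.median(scores.values())
p95_threshold, by_percentile = percentile_cutoff(scores, 95)

print(sorted(by_rule_of_thumb))
print(median_score, p95_threshold, sorted(by_percentile))
```

With a fixed 60% rule the number of qualifying pages depends entirely on how the scores happen to fall; a percentile-based threshold adapts to the site's own distribution.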

Would you be able to incorporate intent into what you're looking for?

“You absolutely can. If you have a target keyword for each piece of site content, then you could create a separate column recording whether that keyword and the keyword you are building links for share the same search intent.”

What data sources are required for this?

“If you wanted to do this at a basic level, you could rely on crawl data alone. If you want to get search intent involved, then you'll need SERP data so that you can compare the search intent of your target keyword with that of the focus keyword of each content page. If you wanted to check whether Google was actually crawling a page, you would use server logs.”
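One common way to turn SERP data into a shared-intent flag is to measure how much the ranking URLs for two keywords overlap. The sketch below assumes that approach; the interview does not specify a method, and the 0.4 overlap threshold, the helper names and the example.com URLs are all illustrative.

```python
def serp_overlap(urls_a, urls_b):
    """Jaccard overlap (0-1) between two lists of ranking URLs."""
    set_a, set_b = set(urls_a), set(urls_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def same_intent(urls_a, urls_b, min_overlap=0.4):
    """Treat two keywords as sharing intent when their SERPs overlap enough."""
    return serp_overlap(urls_a, urls_b) >= min_overlap


# Hypothetical top results for the target keyword and two page focus keywords
target_serp = [f"https://example.com/{p}" for p in ("a", "b", "c", "d", "e")]
page_one_serp = [f"https://example.com/{p}" for p in ("a", "b", "c", "f", "g")]
page_two_serp = [f"https://example.com/{p}" for p in ("h", "i", "j", "a", "k")]

print(same_intent(target_serp, page_one_serp))  # 3 shared of 7 distinct URLs
print(same_intent(target_serp, page_two_serp))  # 1 shared of 9 distinct URLs
```

The resulting boolean can be stored as the extra column alongside each content page's title similarity.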

How do you clean URLs that you wouldn't want to link to?

“That’s a slightly separate issue, but let's get into it. One of the things that I do is model the PageRank, or link equity, of a website using crawl data and external backlink data, so that I get both the internal and the external PageRank. Then I combine those two data sources to get what I would call the ‘effective PageRank’, which reflects both the internal and the external.

Using that, you can transform or pivot your existing site structure away from the typical catalogue/product group structure (which might make sense from a librarian’s perspective) and move it more towards the type of content structure that the internet is more interested in.”
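The interview does not say how the internal and external scores are combined, so the following is only one possible sketch of the "effective" score described above. It assumes a plain power-iteration PageRank over the internal link graph, uses each page's share of external backlinks as the external score, and blends the two with a hypothetical `external_weight`; all names and figures are illustrative.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over an internal link graph {page: [targets]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages with no outlinks spread their rank evenly over all pages
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


def effective_pagerank(links, external_backlinks, external_weight=0.5):
    """Blend internal PageRank with each page's share of external backlinks."""
    internal = internal_pagerank(links)
    total_external = sum(external_backlinks.get(p, 0) for p in internal)
    blended = {}
    for page, score in internal.items():
        share = external_backlinks.get(page, 0) / total_external if total_external else 0.0
        blended[page] = (1 - external_weight) * score + external_weight * share
    return blended


# Hypothetical crawl (internal links) and backlink export (external link counts)
links = {
    "/": ["/category", "/blog"],
    "/category": ["/product"],
    "/blog": ["/product", "/"],
    "/product": [],
}
external_backlinks = {"/blog": 30, "/": 10}

effective = effective_pagerank(links, external_backlinks)
for page, score in sorted(effective.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}\t{score:.3f}")
```

Both components sum to 1 across the site, so the blended score can be read as each page's share of total link equity under these assumptions.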

Should all SEOs be doing this or is it primarily for technical SEOs?

“To me, any SEO should have a holistic view, and all SEOs should understand it. If you call yourself an SEO generalist or an SEO consultant, then you should have a level of competency, if not experience or understanding, in the holistic elements of SEO.

You should be competent in your technical, your content, and your backlinks/off-page SEO. Technical SEOs should know how to do this themselves, but SEO content strategists might not need to.”

How can you use statistical distributions to model relevance and highlight under-served target content?

“If you look at the median number of internal links to a product category on an e-commerce site, for example, it will be very different from the median number of internal links to a product item. I don’t want to create a hard-and-fast rule. I don’t want to say that any pages that have fewer than 10 internal links need more links, or that you should add a certain number of links to those pages. If you use statistical distributions, you're taking a smarter, more tailored approach. You're taking a segmented approach, and you're accounting for the fact that not all content is equal.

You would expect your product categories to have more internal links, so the threshold will be high. Your product items may have fewer internal links, or it might be the other way around. The point is to take a segmented approach. By using distributions, you're moving away from hard-and-fast rules.”
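A segmented check of this kind could look like the sketch below. It assumes each crawled page has already been labelled with a segment and an internal link count, and it uses the segment median as the cut-off purely for illustration; the data and the `underlinked_pages` helper are hypothetical.

```python
import statistics
from collections import defaultdict


def underlinked_pages(pages):
    """Flag pages with fewer internal links than the median of their own segment.

    pages: list of (url, segment, internal_link_count) tuples.
    Returns (url, segment, count, segment_median) for each flagged page.
    """
    counts_by_segment = defaultdict(list)
    for _, segment, count in pages:
        counts_by_segment[segment].append(count)
    medians = {seg: statistics.median(c) for seg, c in counts_by_segment.items()}
    return [
        (url, segment, count, medians[segment])
        for url, segment, count in pages
        if count < medians[segment]
    ]


# Hypothetical crawl data
pages = [
    ("/c/shoes", "category", 40),
    ("/c/boots", "category", 55),
    ("/c/bags", "category", 12),
    ("/p/shoe-a", "product", 3),
    ("/p/shoe-b", "product", 8),
    ("/p/shoe-c", "product", 9),
    ("/p/shoe-d", "product", 1),
]

for url, segment, count, median in underlinked_pages(pages):
    print(f"{url} ({segment}): {count} links, segment median {median}")
```

A category page with 12 links is flagged here while a product page with 8 is not, which is the kind of distinction a single site-wide rule would miss.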

Is this just for internal links or can this approach be used to determine the optimum landing page for external links as well?

“You can apply it to absolutely everything. That's the whole premise of being data-driven.”

How do you measure the ROI of improved internal linking?

“You would benchmark the ROI beforehand, so it's almost like a split test: you measure what it was before, make the change following the model’s recommendations, and see what the ROI is afterwards. However, if you're going to make this change site-wide, then you would want to do a split A/A test, because you're comparing the result of the internal linking on the same URL against itself, before and after.

If you wanted to make it truly scientific, then you would conduct a split A/B test. In that case, you would only make that change on a collection of unlinked URLs, measure the revenue before and after, then compare it to the control group.”
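The arithmetic behind that A/B comparison can be sketched as follows, assuming revenue totals are available for the changed URLs and a control group over the same two periods. The figures and function names are made up, and the sketch deliberately leaves out significance testing.

```python
def relative_change(before, after):
    """Fractional change from the before period to the after period."""
    return (after - before) / before


def ab_lift(test_before, test_after, control_before, control_after):
    """Change in the test group beyond what the control group saw anyway."""
    return relative_change(test_before, test_after) - relative_change(
        control_before, control_after
    )


# Hypothetical revenue for URLs that received new internal links (test)
# and for comparable untouched URLs (control)
test_change = relative_change(10_000, 12_000)
control_change = relative_change(10_000, 10_500)
lift = ab_lift(10_000, 12_000, 10_000, 10_500)

print(f"test {test_change:+.1%}, control {control_change:+.1%}, lift {lift:+.1%}")
```

Subtracting the control group's change strips out seasonality and other site-wide effects that a simple before-and-after comparison would attribute to the new links.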

Does providing better and more relevant internal links also enhance usability?

“In theory (and, in many cases, in practice), SEO and user experience are aligned. By optimizing your content for the search engines, you should also be optimizing it for the user. If the user knows what they're getting before they click on the link, and the link is more relevant to their needs, then that should improve their experience.”

If an SEO is struggling for time, what should they stop doing right now so they can spend more time doing what you suggest in 2024?

“Stop getting better at Excel and retrain in Python.

Personally, I rarely use Excel. I use Google Sheets but only for putting together nice graphs because the ones produced by Python are a bit too sciencey for a business audience.

A more diplomatic and practical approach would be to say, ‘Limit your use of Excel and retrain in Python’. You’ll start noticing that you can invest ten minutes or one hour working out how to solve a dilemma in Python rather than Excel and, eventually, it will get to the point where you can do so much more in Python that you will drop Excel like a hot potato.

Python is also well future-proofed. That’s not to say there won't be a language in 10, 15, or 20 years that will supersede Python. However, the great thing is that, once you learn a computing language, those skills are transferable to almost any other computing language. I started out using R, which is a statistical computing language. Once I saw that more of the SEO industry was favouring Python, it was really easy for me to switch. A lot of the function names are identical.”

Andreas Voniatis is Founder at Artios, and you can find him over at Artios.io.
