Technical Checks
Technical Checks evaluate the foundational infrastructure of your website that determines whether AI systems can access, crawl, and understand your content. Even the best-optimized content is invisible to AI if technical barriers prevent crawlers from reaching it.
These checks focus on accessibility, security, and AI-specific standards that are becoming increasingly important as AI-powered search grows.
HTTPS
What Is Checked
Whether your website is served over a secure HTTPS connection with a valid SSL/TLS certificate.
Why It Matters for GEO
HTTPS is a baseline trust signal for both traditional search engines and AI systems. AI crawlers may deprioritize or skip websites served over insecure HTTP connections. A valid SSL certificate confirms that the website is legitimate and that data transmitted between the server and client is encrypted. Without HTTPS, AI systems may flag your content as less trustworthy.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 Pass | Site is served over HTTPS with a valid certificate |
| 🔴 Fail | Site uses HTTP or has an invalid/expired certificate |
How to Fix
- Obtain an SSL/TLS certificate (free via Let's Encrypt or through your hosting provider)
- Configure your web server to redirect all HTTP traffic to HTTPS
- Update all internal links and resources to use HTTPS URLs
- Ensure your certificate is set to auto-renew
- Test your SSL configuration using SSL Labs
robots.txt — AI Crawler Access
What Is Checked
Your robots.txt file is analyzed for rules that affect 6 major AI crawlers:
| Crawler | Operator | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training data and browsing for ChatGPT |
| ClaudeBot | Anthropic | Training data for Claude |
| Google-Extended | Google | Training data for Gemini and AI features |
| Amazonbot | Amazon | Training data for Alexa and AI services |
| FacebookBot | Meta | Content understanding for Meta AI |
| Bytespider | ByteDance | Training data for TikTok and AI services |
Why It Matters for GEO
Your robots.txt file is the primary mechanism for controlling which AI systems can access your content. Blocking AI crawlers means your content will not be included in the AI's knowledge base — making it impossible for those AI systems to recommend or cite your business. This is the most critical technical check for GEO: if AI cannot crawl your site, nothing else matters.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 All allowed | All 6 AI crawlers have access |
| 🟡 Partial | Some crawlers are blocked, others are allowed |
| 🔴 Blocked | Most or all AI crawlers are blocked |
How to Fix
Review your robots.txt file (located at `https://yourdomain.com/robots.txt`) and ensure it does not block the AI crawlers you want to grant access. A GEO-friendly robots.txt looks like:
```
User-agent: *
Allow: /

# Explicitly allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: FacebookBot
Allow: /

User-agent: Bytespider
Allow: /
```
Important: Some CMS platforms and security plugins add blanket bot-blocking rules. Check whether your CMS has added restrictive rules without your knowledge.
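To confirm what your rules actually permit, a short sketch with the standard library's `urllib.robotparser` can report access for the six crawlers in the table above. The crawler list and test URL here are illustrative; for a live check, fetch your own robots.txt content first:

```python
from urllib.robotparser import RobotFileParser

# The six AI crawlers evaluated by this check
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended",
               "Amazonbot", "FacebookBot", "Bytespider"]

def crawler_access(robots_txt: str, url: str = "https://www.example.com/") -> dict[str, bool]:
    """Map each AI crawler to whether it may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}
```

Running this against your own robots.txt content quickly surfaces blanket `Disallow: /` rules that a security plugin may have added silently.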
llms.txt
What Is Checked
Whether your website hosts a valid llms.txt file — a new standard for providing AI systems with a structured, human- and machine-readable summary of your website.
Why It Matters for GEO
The llms.txt file is an emerging standard designed specifically for the AI era. It provides language models with a concise overview of your website's content, purpose, and structure — acting as a "cover letter" for AI systems. While still new, early adoption signals AI-awareness and ensures that AI systems that support this standard can quickly understand your site.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 Pass | Valid llms.txt file found at /llms.txt |
| 🟡 Partial | File exists but has formatting or content issues |
| 🔴 Fail | No llms.txt file found |
How to Fix
Create an llms.txt file in your website's root directory. The file should follow the emerging standard format:
```markdown
# Your Company Name

> Brief one-line description of your company or website.

## About

A paragraph describing your organization, what you do, and who you serve.

## Services

- Service 1: Brief description
- Service 2: Brief description
- Service 3: Brief description

## Key Pages

- [About Us](https://www.example.com/about): Learn about our company
- [Services](https://www.example.com/services): Our full service offering
- [Blog](https://www.example.com/blog): Industry insights and guides
- [Contact](https://www.example.com/contact): Get in touch
```
Best practices for llms.txt:
- Keep it concise — this is a summary, not a full sitemap
- Include your most important pages with brief descriptions
- Update it when you add major new sections or pages
- Use clear, descriptive language that AI can parse easily
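Because llms.txt is still an emerging, informally specified format, there is no official validator; a minimal sketch like the one below can at least check for the commonly described shape (an H1 title, a blockquote summary, and H2 sections). The specific checks are assumptions based on the example format above:

```python
def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems with an llms.txt document; empty means it looks OK."""
    issues = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        issues.append("first line should be an H1 with the site or company name")
    if not any(line.startswith("> ") for line in lines[:3]):
        issues.append("missing '> ' blockquote summary near the top")
    if not any(line.startswith("## ") for line in lines):
        issues.append("no '## ' sections found")
    return issues
```

Running this in a deploy pipeline helps catch a truncated or mis-edited llms.txt before it ships.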
sitemap.xml
What Is Checked
Whether your website has a valid sitemap.xml file and how many URLs it contains.
Why It Matters for GEO
A sitemap tells AI crawlers about all the pages on your website and when they were last updated. Without a sitemap, AI crawlers must discover pages by following links — potentially missing important content that is poorly linked. The sitemap also communicates update frequency, helping AI systems prioritize crawling recently changed content.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 Pass | Valid sitemap found with URLs listed |
| 🟡 Partial | Sitemap exists but has issues (empty, malformed, or very few URLs) |
| 🔴 Fail | No sitemap found |
How to Fix
- Generate a sitemap using your CMS (most CMS platforms have built-in sitemap generation)
- Ensure the sitemap includes all important pages (not just blog posts)
- Add `<lastmod>` dates to help crawlers identify recently updated content
- Reference the sitemap in your robots.txt: `Sitemap: https://www.example.com/sitemap.xml`
- Keep the sitemap under 50,000 URLs (use sitemap index files for larger sites)
- Exclude pages you do not want indexed (e.g., admin pages, duplicate content)
- Validate your sitemap using an XML validator
IndexNow
What Is Checked
Whether your website supports the IndexNow protocol for instant URL submission to search engines.
Why It Matters for GEO
IndexNow allows you to proactively notify search engines (including Bing, Yandex, and participating AI systems) when content is created or updated. Instead of waiting for crawlers to discover changes, IndexNow pushes updates instantly. This is particularly valuable for GEO because it ensures AI systems have access to your latest content as quickly as possible.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 Pass | IndexNow support detected (API key or integration found) |
| 🔴 Fail | No IndexNow support detected |
How to Fix
- Register for an IndexNow API key at indexnow.org
- Place the API key file in your website's root directory
- Integrate IndexNow into your CMS or publishing workflow to automatically submit URLs when content changes
- Many CMS platforms offer IndexNow plugins (e.g., for WordPress, use the IndexNow plugin)
- For custom implementations, submit a POST request to the IndexNow API whenever you publish or update a page:
```shell
curl -X POST "https://api.indexnow.org/indexnow" \
  -H "Content-Type: application/json" \
  -d '{
        "host": "www.example.com",
        "key": "your-api-key",
        "urlList": [
          "https://www.example.com/updated-page"
        ]
      }'
```
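The same submission can be scripted from a publishing workflow. This sketch builds (but does not send) the request corresponding to the curl example; host, key, and URL values are placeholders, and actually submitting via `urllib.request.urlopen(request)` requires network access:

```python
import json
import urllib.request

def build_indexnow_request(host: str, key: str, urls: list[str],
                           endpoint: str = "https://api.indexnow.org/indexnow") -> urllib.request.Request:
    """Build the IndexNow POST request for a batch of changed URLs."""
    payload = json.dumps({"host": host, "key": key, "urlList": urls}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To submit: urllib.request.urlopen(build_indexnow_request(...))  -- network required
```

Hooking this into your CMS's post-publish event means AI-facing search indexes learn about new content within seconds rather than waiting for the next crawl.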
AI Crawler Access — Overall Assessment
What Is Checked
An overall assessment of how accessible your website is to AI crawlers, combining results from the robots.txt check, HTTPS status, sitemap availability, and other technical factors.
Why It Matters for GEO
This is the summary check that tells you whether AI systems can effectively discover and process your content. Even if your content is perfectly optimized for GEO, technical barriers at the crawler level nullify all that effort. This assessment provides a holistic view of your technical GEO readiness.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 Fully accessible | AI crawlers can discover and access all content |
| 🟡 Partially accessible | Some barriers exist that may limit AI crawler access |
| 🔴 Largely inaccessible | Significant technical barriers preventing AI access |
How to Fix
Address the individual technical checks above in this priority order:
1. HTTPS — Fix certificate issues first (foundational trust requirement)
2. robots.txt — Unblock AI crawlers (most common barrier)
3. sitemap.xml — Create or fix your sitemap (ensures content discovery)
4. llms.txt — Add an llms.txt file (emerging best practice)
5. IndexNow — Implement IndexNow (proactive indexing)
General Technical Best Practices for GEO
Page Speed
While not a scored check, page speed affects crawler behavior. Slow-loading pages may be abandoned by crawlers with time limits. Aim for a page load time under 3 seconds.
Canonical Tags
Use canonical tags to indicate the preferred version of duplicate or similar pages. This prevents AI from being confused by multiple versions of the same content.
Structured Data
Ensure your JSON-LD Schema.org markup is valid and free of errors. Malformed structured data is worse than no structured data — it can mislead AI about your content.
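A basic sanity check for JSON-LD can run in CI: extract each `application/ld+json` script block and confirm it parses and declares an `@context`. This is a minimal sketch using a regex — a production validator should use a real HTML parser and a full Schema.org checker:

```python
import json
import re

# Matches <script type="application/ld+json"> ... </script> blocks
LDJSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)

def jsonld_errors(html: str) -> list[str]:
    """Report JSON-LD blocks that fail to parse or lack an @context."""
    errors = []
    for i, match in enumerate(LDJSON_RE.finditer(html)):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError as exc:
            errors.append(f"block {i}: invalid JSON ({exc.msg})")
            continue
        if isinstance(data, dict) and "@context" not in data:
            errors.append(f"block {i}: missing @context")
    return errors
```

Catching a stray trailing comma or unescaped quote this way is exactly the kind of malformed markup that would otherwise mislead AI systems about your content.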
Crawl Budget
If your site has thousands of pages, be strategic about which pages are accessible. Use robots.txt and noindex tags to direct crawlers toward your most important content.
Server Reliability
Ensure your server responds consistently with proper HTTP status codes. Frequent 500 errors or timeouts will cause AI crawlers to reduce their crawl frequency for your site.