A Step-by-Step Guide to Build an Automated Knowledge Graph Pipeline Using LangGraph and NetworkX
In this tutorial, we demonstrate how to construct an automated Knowledge Graph pipeline using LangGraph and NetworkX. The pipeline simulates a sequence of intelligent agents that collaboratively perform tasks such as data gathering, entity extraction, relation identification, entity resolution, and graph validation. Starting from a user-provided topic, such as “Artificial Intelligence,” the system methodically extracts relevant entities and relationships, resolves duplicates, and integrates the information into a cohesive graphical structure. By visualizing the final knowledge graph, developers and data scientists gain clear insights into complex interrelations among concepts, making this approach highly beneficial for applications in semantic analysis, natural language processing, and knowledge management.
!pip install langgraph langchain_core
We install two essential Python libraries: LangGraph, which is used for creating and orchestrating agent-based computational workflows, and LangChain Core, which provides foundational classes and utilities for building language model-powered applications. These libraries enable seamless integration of agents into intelligent data pipelines.
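If you need reproducible runs, pinning versions is worth considering; the version floors below are illustrative assumptions, not requirements stated in this tutorial.

!pip install "langgraph>=0.2" "langchain_core>=0.3"  # illustrative pins; adjust to your environment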
import re
import networkx as nx
import matplotlib.pyplot as plt
from typing import TypedDict, List, Tuple, Dict, Any
from langchain_core.messages import HumanMessage, AIMessage
from langgraph.graph import StateGraph, END
We import the essential libraries for building an automated knowledge graph pipeline: re for regular-expression-based text processing, NetworkX and matplotlib for creating and visualizing graphs, TypedDict and typing annotations for structured data handling, and LangGraph along with langchain_core for orchestrating the interaction between AI agents within the workflow.
class KGState(TypedDict):
    topic: str
    raw_text: str
    entities: List[str]
    relations: List[Tuple[str, str, str]]
    resolved_relations: List[Tuple[str, str, str]]
    graph: Any
    validation: Dict[str, Any]
    messages: List[Any]
    current_agent: str
We define a structured data type, KGState, using Python’s TypedDict. It outlines the schema for managing state across different steps of the knowledge graph pipeline. It includes details like the chosen topic, gathered text, identified entities and relationships, resolved duplicates, the constructed graph object, validation results, interaction messages, and tracking the currently active agent.
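One point worth noting: a TypedDict adds no runtime enforcement; it is an ordinary dict whose annotations exist for static type checkers. A minimal sketch illustrating this:

# KGState is a plain dict at runtime; annotations only guide tools like mypy.
example_state: KGState = {
    "topic": "Artificial Intelligence",
    "raw_text": "",
    "entities": [],
    "relations": [],
    "resolved_relations": [],
    "graph": None,
    "validation": {},
    "messages": [],
    "current_agent": "data_gatherer",
}
print(type(example_state))  # <class 'dict'> -- no validation happens here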
def data_gatherer(state: KGState) -> KGState:
    topic = state["topic"]
    print(f"Data Gatherer: gathering information about '{topic}'")
    collected_text = f"{topic} is an important concept. It relates to various entities like EntityA, EntityB, and EntityC. EntityA influences EntityB. EntityC is a type of EntityB."
    state["messages"].append(AIMessage(content=f"Collected raw text about {topic}"))
    state["raw_text"] = collected_text
    state["current_agent"] = "entity_extractor"
    return state
This function, data_gatherer, acts as the first step in the pipeline. It simulates gathering raw text data about the provided topic, stores this simulated data in state["raw_text"], adds a message indicating that data collection is complete, and updates the pipeline’s state by setting the next agent (entity_extractor) as active.
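In a real deployment you would swap the simulated text for an actual data source. The sketch below is one hypothetical replacement that pulls a topic summary from Wikipedia's public REST endpoint; the URL pattern and the "extract" response field are assumptions about that external API, so verify them before relying on it.

import requests  # hypothetical dependency for a real gatherer

def fetch_topic_summary(topic: str) -> str:
    # Assumed endpoint: Wikipedia's REST page-summary API.
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{topic.replace(' ', '_')}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json().get("extract", "")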
def entity_extractor(state: KGState) -> KGState:
    print("Entity Extractor: identifying entities in the text")
    text = state["raw_text"]
    entities = re.findall(r"Entity[A-Z]", text)
    entities = [state["topic"]] + entities
    state["entities"] = list(set(entities))
    state["messages"].append(AIMessage(content=f"Extracted entities: {state['entities']}"))
    print(f"Found entities: {state['entities']}")
    state["current_agent"] = "relation_extractor"
    return state
The entity_extractor function identifies entities from the collected raw text using a simple regular expression pattern that matches terms like “EntityA”, “EntityB”, etc. It also includes the main topic as an entity and ensures uniqueness by converting the list to a set. The extracted entities are stored in the state, an AI message logs the result, and the pipeline advances to the relation_extractor agent.
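To sanity-check the pattern in isolation, you can run it against a sample sentence (the sentence below is made up for illustration):

sample = "EntityA influences EntityB. EntityC is a type of EntityB."
print(re.findall(r"Entity[A-Z]", sample))
# ['EntityA', 'EntityB', 'EntityC', 'EntityB'] -- duplicates removed later via set()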
def relation_extractor(state: KGState) -> KGState:
    print("Relation Extractor: identifying relationships between entities")
    text = state["raw_text"]
    entities = state["entities"]
    relations = []
    relation_patterns = [
        (r"{}\s+influences\s+{}", "influences"),
        (r"{}\s+is a type of\s+{}", "is_type_of"),
        (r"{}\s+relates to\s+{}", "relates_to"),
    ]
    for e1 in entities:
        for e2 in entities:
            if e1 != e2:
                for pattern, rel_type in relation_patterns:
                    if re.search(pattern.format(re.escape(e1), re.escape(e2)), text, re.IGNORECASE):
                        relations.append((e1, rel_type, e2))
    state["relations"] = relations
    state["messages"].append(AIMessage(content=f"Extracted relations: {relations}"))
    print(f"Found relations: {relations}")
    state["current_agent"] = "entity_resolver"
    return state
The relation_extractor function detects semantic relationships between entities within the raw text. It uses predefined regex patterns to identify phrases like “influences” or “is a type of” between entity pairs. When a match is found, it adds the corresponding relation as a triple (subject, relation type, object) to the relations list. These extracted relations are stored in the state, a message is logged for agent communication, and control moves to the next agent: entity_resolver.
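A quick standalone check of the matching logic, using one assumed pattern from the list above:

text = "EntityA influences EntityB."
e1, e2 = "EntityA", "EntityB"
pattern = r"{}\s+influences\s+{}".format(re.escape(e1), re.escape(e2))
if re.search(pattern, text, re.IGNORECASE):
    print((e1, "influences", e2))  # the triple appended to relations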
def entity_resolver(state: KGState) -> KGState:
    print("Entity Resolver: resolving duplicate entities")
    entity_map = {}
    for entity in state["entities"]:
        canonical_name = entity.lower().replace(" ", "_")
        entity_map[entity] = canonical_name
    resolved_relations = []
    for s, p, o in state["relations"]:
        s_resolved = entity_map.get(s, s)
        o_resolved = entity_map.get(o, o)
        resolved_relations.append((s_resolved, p, o_resolved))
    state["resolved_relations"] = resolved_relations
    state["messages"].append(AIMessage(content="Resolved duplicate entities"))
    state["current_agent"] = "graph_integrator"
    return state
The entity_resolver function standardizes entity names to avoid duplication and inconsistencies. It creates a mapping (entity_map) by converting each entity to lowercase and replacing spaces with underscores. Then, this mapping is applied to all subjects and objects in the extracted relations to produce resolved relations. These normalized triples are added to the state, a confirmation message is logged, and control is passed to the graph_integrator agent.
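The canonicalization rule is easy to verify on its own:

# Lowercase plus underscore-for-space, exactly as in entity_resolver.
for name in ["Artificial Intelligence", "EntityA"]:
    print(name, "->", name.lower().replace(" ", "_"))
# Artificial Intelligence -> artificial_intelligence
# EntityA -> entitya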
def graph_integrator(state: KGState) -> KGState:
    print("Graph Integrator: building the knowledge graph")
    G = nx.DiGraph()
    for s, p, o in state["resolved_relations"]:
        if not G.has_node(s):
            G.add_node(s)
        if not G.has_node(o):
            G.add_node(o)
        G.add_edge(s, o, relation=p)
    state["graph"] = G
    state["messages"].append(AIMessage(content=f"Built graph with {len(G.nodes)} nodes and {len(G.edges)} edges"))
    state["current_agent"] = "graph_validator"
    return state
The graph_integrator function constructs the actual knowledge graph using networkx.DiGraph, which supports directed relationships. It iterates over the resolved triples, ensures both nodes exist, and then adds a directed edge with the relation as metadata. The resulting graph is saved in the state, a summary message is appended, and the pipeline transitions to the graph_validator agent for final validation.
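Because the relation is stored as edge metadata, it can be read back later; a small hand-built example of the same pattern:

G_demo = nx.DiGraph()
G_demo.add_edge("entitya", "entityb", relation="influences")
for s, o, attrs in G_demo.edges(data=True):
    print(f"{s} -[{attrs['relation']}]-> {o}")  # entitya -[influences]-> entityb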
def graph_validator(state: KGState) -> KGState:
    print("Graph Validator: validating the knowledge graph")
    G = state["graph"]
    validation_report = {
        "num_nodes": len(G.nodes),
        "num_edges": len(G.edges),
        "is_connected": nx.is_weakly_connected(G) if G.nodes else False,
        "has_cycles": not nx.is_directed_acyclic_graph(G) if G.nodes else False
    }
    state["validation"] = validation_report
    state["messages"].append(AIMessage(content=f"Validation report: {validation_report}"))
    print(f"Validation report: {validation_report}")
    state["current_agent"] = END
    return state
The graph_validator function performs a basic health check on the constructed knowledge graph. It compiles a validation report containing the number of nodes and edges, whether the graph is weakly connected, and whether the graph contains cycles. This report is added to the state and logged as an AI message. Once validation is complete, the pipeline is marked as finished by setting the current_agent to END.
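The two NetworkX checks behave as you would expect on a deliberately cyclic toy graph:

# Two nodes pointing at each other: connected, and cyclic.
H = nx.DiGraph([("a", "b"), ("b", "a")])
print(nx.is_weakly_connected(H))            # True
print(not nx.is_directed_acyclic_graph(H))  # True -> the graph has a cycle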
def router(state: KGState) -> str:
    return state["current_agent"]

def visualize_graph(graph):
    plt.figure(figsize=(10, 6))
    pos = nx.spring_layout(graph)
    nx.draw(graph, pos, with_labels=True, node_color="skyblue", node_size=1500, font_size=10)
    edge_labels = nx.get_edge_attributes(graph, "relation")
    nx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels)
    plt.title("Knowledge Graph")
    plt.tight_layout()
    plt.show()

The router function directs the pipeline to the next agent based on the current_agent field in the state. Meanwhile, the visualize_graph function uses matplotlib and NetworkX to display the final knowledge graph, showing nodes, edges, and labeled relationships for intuitive visual understanding.
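On headless machines where plt.show() has no display, persisting the figure is a common alternative; this variant is a sketch, with an arbitrary filename:

def save_graph_png(graph, path="knowledge_graph.png"):
    # Same drawing logic as visualize_graph, written to disk instead of shown.
    plt.figure(figsize=(10, 6))
    pos = nx.spring_layout(graph)
    nx.draw(graph, pos, with_labels=True, node_color="skyblue", node_size=1500)
    nx.draw_networkx_edge_labels(graph, pos, edge_labels=nx.get_edge_attributes(graph, "relation"))
    plt.savefig(path, bbox_inches="tight")
    plt.close()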
def build_kg_graph():
    workflow = StateGraph(KGState)
    workflow.add_node("data_gatherer", data_gatherer)
    workflow.add_node("entity_extractor", entity_extractor)
    workflow.add_node("relation_extractor", relation_extractor)
    workflow.add_node("entity_resolver", entity_resolver)
    workflow.add_node("graph_integrator", graph_integrator)
    workflow.add_node("graph_validator", graph_validator)
    workflow.add_conditional_edges("data_gatherer", router, {"entity_extractor": "entity_extractor"})
    workflow.add_conditional_edges("entity_extractor", router, {"relation_extractor": "relation_extractor"})
    workflow.add_conditional_edges("relation_extractor", router, {"entity_resolver": "entity_resolver"})
    workflow.add_conditional_edges("entity_resolver", router, {"graph_integrator": "graph_integrator"})
    workflow.add_conditional_edges("graph_integrator", router, {"graph_validator": "graph_validator"})
    workflow.add_conditional_edges("graph_validator", router, {END: END})
    workflow.set_entry_point("data_gatherer")
    return workflow.compile()

The build_kg_graph function defines the complete knowledge graph workflow using LangGraph. It sequentially adds each agent as a node, from data collection to graph validation, and connects them through conditional transitions based on the current agent. The entry point is set to data_gatherer, and the graph is compiled into an executable workflow that guides the automated pipeline from start to finish.
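Since this particular pipeline is strictly linear, the conditional routing is not strictly necessary; LangGraph's plain add_edge can express the same flow. The alternative below is a sketch, assuming each agent function still sets current_agent as shown (the fixed edges simply ignore it):

def build_kg_graph_linear():
    # Equivalent linear wiring without the router indirection.
    workflow = StateGraph(KGState)
    steps = ["data_gatherer", "entity_extractor", "relation_extractor",
             "entity_resolver", "graph_integrator", "graph_validator"]
    funcs = [data_gatherer, entity_extractor, relation_extractor,
             entity_resolver, graph_integrator, graph_validator]
    for name, fn in zip(steps, funcs):
        workflow.add_node(name, fn)
    workflow.set_entry_point(steps[0])
    for a, b in zip(steps, steps[1:]):
        workflow.add_edge(a, b)
    workflow.add_edge(steps[-1], END)
    return workflow.compile()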
def run_knowledge_graph_pipeline(topic):
    print(f"Starting knowledge graph pipeline for: {topic}")
    initial_state = {
        "topic": topic,
        "raw_text": "",
        "entities": [],
        "relations": [],
        "resolved_relations": [],
        "graph": None,
        "validation": {},
        "messages": [],
        "current_agent": "data_gatherer"
    }
    kg_app = build_kg_graph()
    final_state = kg_app.invoke(initial_state)
    print(f"Knowledge graph construction complete for: {topic}")
    return final_state
The run_knowledge_graph_pipeline function initializes the pipeline by setting up an empty state dictionary with the provided topic. It builds the workflow using build_kg_graph, then runs it by invoking the compiled graph with the initial state. As each agent processes the data, the state evolves, and the final result contains the complete knowledge graph, validated and ready for use.
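Beyond the graph itself, the returned state carries the validation report and the agents' message log, which are handy for debugging; one way to inspect them:

final = run_knowledge_graph_pipeline("Artificial Intelligence")
print(final["validation"])     # node/edge counts, connectivity, cycle check
for msg in final["messages"]:  # AIMessage log, one entry per agent step
    print(msg.content)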
if __name__ == "__main__":
topic = "Artificial Intelligence"
result = run_knowledge_graph_pipelinevisualize_graphFinally, this block serves as the script’s entry point. When executed directly, it triggers the knowledge graph pipeline for the topic “Artificial Intelligence,” runs through all agent stages, and finally visualizes the resulting graph using the visualize_graphfunction. It provides an end-to-end demonstration of automated knowledge graph generation.
Output Generated from Knowledge Graph Execution
In conclusion, we have learned how to seamlessly integrate multiple specialized agents into a cohesive knowledge graph pipeline through this structured approach, leveraging LangGraph and NetworkX. This workflow automates entity and relation extraction processes and visualizes intricate relationships, offering a clear and actionable representation of gathered information. By adjusting and enhancing individual agents, such as employing more sophisticated entity recognition methods or integrating real-time data sources, this foundational framework can be scaled and customized for advanced knowledge graph construction tasks across various domains.
Check out the Colab Notebook for the full runnable code.