| A community of more than 1,600,000 database professionals and growing |
| | Unstructured Data Is unstructured data a bad term? I saw some data professionals complaining about this, saying all data is structured. That's usually true. A CSV, even a ragged one has structure. XML and JSON have structure, even if it might vary node to node. Certainly our relational tables are structured and some formats can be rigidly mandated between organizations (like EDI). Even data in PDF, Word, MP3, MP4 or other audio/video mediums is structured in that we know the format. Given that, is it a misnomer to use the term, unstructured, when describing flexible formats, such as XML? Is it OK for a PDF? I have had a presentation called Unstructured Data in SQL Server. This is primarily about FileStream, FileTable, and searching those objects. In the talk, I classify data in known formats as structured. These would be SQL Server tables and similar objects. At any point in time, we know what all data in the table looks like, even though we can have NULLs or missing data in rows. I call XML and JSON semi-structured formats. We can certainly determine the format for any node or section, but we wouldn't know without querying or examining the data. It's semi-structured in that there is a hierarchy, but the structured from section to section (essentially row to row) can vary. There can even be depths to hierarchies that vary. In many ways, that makes these great formats for flexibility in data exchange. I tend to view data in Word, PDF, MP4s, as unstructured. We don't necessarily know where the data is, or how to separate it. We can get pages in Word or PDF, but those can vary and don't necessarily help us extract information. They are XML, but the XML tags don't relate to the content, unlike many other XML documents. Scenes or tracks in audio/video files might be separators, but those aren't necessarily helpful in gathering information. Instead, we need other tools that can help deal with that data, finding words, concepts, or more inside of the binary stream. I like the term unstructured data because it helps me understand where the information is. While the tables in a database might be full of nonsensical information in some rows, or be poorly designed with data combined into text fields, at least I know where the fields are. Actually, in that case, I'd argue the data in varchar(max) text fields is really unstructured. You might disagree, but give me a better term to describe there the information is stored in a data format. Steve Jones from SQLServerCentral.comJoin the debate, and respond to today's editorial on the forums |
| The Voice of the DBA Podcast Listen to the MP3 Audio ( 3.7MB) podcast or subscribe to the feed at iTunes and Libsyn. The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music. | |
|
|
| ADVERTISEMENT | | SQL Clone: Now supporting databases up to 64TB Create copies of production databases and SQL backups in seconds and save up to 99% of disk space using SQL Clone. Redgate’s new tool removes much of the time and resource needed to create and manage database copies, allowing teams to work on local environments to develop, test and diagnose issues faster. Try it free |
| | The industry standard for comparing and deploying SQL Server database schemas Trusted by 71% of Fortune 100 companies, SQL Compare is the fastest way to compare changes, and create and deploy error-free scripts in minutes. Plus you can easily find and fix errors caused by database differences. Download your free trial |
|
|
|
| | | Frank A. Banin from SQLServerCentral.com A look at SQL-On-Hadoop systems like PolyBase, Hive, Spark SQL in the context Distributed Computing Principles and new Big Data system design approach like the Lambda Architecture. More » |
| This quick tip will review how SQL Monitor 7 has incorporated SQL Code Guard’s built-in set of Performance Rules for static code analysis. These rules are designed to highlight SQL syntax that could potential cause performance problems, and so indicate ways to improve the overall quality and performance of the workload, over time. More » |
| Kenneth Fisher from SQLServerCentral Blogs Temporal tables are one of those new (2016+) cool features that recently came across my desk. Basically, a temporal table... More » |
| Andy Galbraith from SQLServerCentral Blogs https://memegenerator.net/img/instances/400x/59507872/you-get-a-deadlock-everybody-gets-a-deadlock.jpgWe all get them at one point or another - deadlocks. I have Object 1 and need Object 2, you have Object... More » |
|
|
| | Today's Question (by Steve Jones): In CosmosDB (as of Oct 2017), I have the chocie of a few different types of data APIs. Which of these is not available in CosmosDB? |
Think you know the answer? Click here, and find out if you are right. We keep track of your score to give you bragging rights against your peers. This question is worth 1 point in this category: CosmosDB. We'd love to give you credit for your own question and answer. To submit a QOTD, simply log in to the Contribution Center. |
|
|
| |
ADVERTISEMENT | Design and configure SQL Server instances and databases in support of high-throughput applications that are mission-critical and provide consistent response times in the face of variations in user numbers and query volumes. Learn to configure SQL Server and design your databases to support a given instance and workload. Pick up your copy of this great book today at Amazon today. |
|
|
|
|
|
| Yesterday's Question of the Day |
| Yesterday's Question (by Steve Jones): I have a file containing some data for the top NFL Quarterbacks from 2016. The data is stored in a tab separated format in the file 2016QB.tsv. How can I read this in? Answer: read.table("2016QB.tsv", sep="\t") Explanation: The read.table function will read data from a file. The SEP column is used to list the type of separator. The \t is a tab separator. Ref: Reading a tab separated file - click here » Discuss this question and answer on the forums |
|
|
| Database Pros Who Need Your Help |
| Here's a few of the new posts today on the forums. To see more, visit the forums. script to verify index fragmentation - Dear all, Can someone help me with t-sql code on how can I see the fragmentation of my indexes (relevant ones),... Create maintenance Jobs (Data Warehouse) - Dear all, I was requested to create a maintance plan keep our data warehouses working properly. This means that I will create... Nested insert statements problem - Hello all. fairly new to SQL. but not that new. I have 3 stored procedures. The first one has a statement like... Help with date ranges - Hi all, I'd like to know if anyone can help me with a good suggestion on how to achieve this. I have... How to only select 11pm to 7am in DateTime (many months of data) DDM&DDL included :) - Looking to run a report for a few years but only between 11pm and 7am. CREATE TABLE mytable( i_ticket_id INTEGER NOT... Set up job to convert word document to pdf - I'm working on a SQL job to transfer some data along with some word documents. After copying the word documents... Trying to speed up this 'GetWorkingDays' function. - I'm trying to redesign one of our working days functions. The idea is to calculate weekends (easy enough) but holidays... Out of memory exception - If I run DBCC MEMORYSTATUS, it fails to finish and throws this error. An error occurred while executing batch. Error message... User Not able to access Schema owned tables - Hi Experts, Need a Small information about user access. one of our users is in a group and the group is... SSIS : "The value could not be converted because of a potential loss of data.". - Dear all, I have a data conversion task and I am getting the following error message: while converting column "VALUE_IN_FUND_CURR" (488) to... Compare 2 rows using tsql code - Hi All, A questions which is there with me a long long time. Comparing 2 rows in a table. How... Select vaue column for one date and another for another date - Hi Masters, how can I select a value from a column for a date and another value from another column... Parameter in sp_delete_jobsteplog don't work - Hi, in our organisation we normally set up maintenance plans for backups and so on. As we cannot deploy packages... How to continue our SQL agent scheduled daily backup in Azure? - Hi, we just migrated our SQL database to Azure. However, I have no idea how to continue our SQL agent scheduled... Getting started with Powershell - Hi all I'm just getting started with PowerShell and following the series by Ben Miller here. I'm trying to do things... Execute Process Task: How SSIS loads executable files at runtime - Please help me understand how SSIS loads executable files at runtime. Does it copy the file to RAM at runtime... Design considerations for a tabular model - I have a star schema already in existence (SQL Server fact table and numerous dimension tables). For a given tabular... Table Index - I have a potential table CREATE TABLE dbo.Preferences ( EmailAddress VARCHAR(100) NOT NULL, ChannelID INTEGER NOT NULL, ContactTypeID INTEGER NOT NULL, AllowContact BIT NOT NULL, GDPR... Query taking long time - Hi, The below query is taking around 1000 Seconds. SELECT ar.MOVE_QTY,ar.DM12,ar.AL_OR,ar.OR_CODE,ar.CAI_CODE,ar.CAI_DESCRIPTION,ar.CST_CODE,ar.AL_NO AL_NO,ar.OPEN_QTY,ar.WH_CODE,ar.ETA_DATE, ar.CATEGORY,ar.AL_DATE,ar.OD_CODE, ar.SHIPTO_CODE,ar.AL_LINE AL_LINE,ar.ORIG_QTY,ar.CNCL_QTY,b.SHIP_QTY SHIP_QTY,ar.LOCAL_CODE,0 AL_TXT_LINE,b.AVIEXP_NO, b.UC_NUMBER, b.AVIEXP_DAT Project Plan - I'm looking for a decent Project Plan (in any format) for moving from SQL Server 2000 to a SQL Server... |
|
| This email has been sent to [email protected]. To be removed from this list, please click here. If you have any problems leaving the list, please contact the [email protected]. | This newsletter was sent to you because you signed up at SQLServerCentral.com. Feel free to forward this to any colleagues that you think might be interested. If you have received this email from a colleague, you can register to receive it here. | This transmission is ©2017 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved. Contact: [email protected] |
|
|