|

Using AI to generate data…but double check!

I think ChatGPT was announced at the end of November, so it has been a year since its announcement. The biggest changes I’m seeing is Microsoft’s aggressive movement to include AI in a lot of its product. Some of its new products are Bing AI, Image Creator, Designer, CoPilot. They may have also revamped the Data Analysis tool in Excel, but I’m not sure.

And Microsoft wasn’t the only one: LinkedIn is starting to embed some AI features which I have yet to try out, although I do want to. Canva has embedded some AI features such as Magic Edit (I think that’s what it is called) and its own version of Image Creator (I don’t remember what it is officially called).

As far as work or jobs though, I don’t think we have seen the changes yet, largely because companies have been cautious for security and privacy reasons. You don’t want to put sensitive data through ChatGPT because the LLM will then retain it as part of its training materials. I believe we will start the changes once companies have successfully created their own LLMs designed specifically around their internal data.

Right now I really can’t use ChatGPT for really interesting things, but I have played around with it. One thing I’ve done is asked it to create a list of 316 fake names.

I’m working on a project and I do not want to include real names as part of its data, but the idea of creating 316 unique names just was daunting. So, I asked Bing AI to create a list and it did it pretty quickly – much more quickly than I would have created a list.

By the way, it provided me more than 316 fake names; it gave me 352. Okay, I’ll take it. The more the better for me.

I deployed that fake list of names in the project UNIQUE, RANDARRAY, and XLOOKUP.

A little detour on the mechanics of doing this is found in the section below.

So, maybe I should first describe how I arrived a randomly assigning fake names from a list to real names. I’m sure there are other ways of generating a method of replacing real names with fake names or whatever you are trying to do.

Building an array of random numbers

First there is RANDARRAY where you tell the formula how many rows and how many columns there will be for randomly created numbers. In my case, I wanted 352 rows and only 1 column. In the formula, I will also denote the minimum and the maximum, which in my case would be 1 and 352. Finally, I used TRUE for the fifth argument for integer rather than decimal number.

= RANDARRAY(352, 1, 1, 352, TRUE)

The first argument is the rows, the second is for columns, the third is the minimum, the fourth is the maximum and the last one is for integer.

I end up with a column of numbers ranging from 1 to 352 in random order. The problem is the result may have some numbers repeated again, rather than each number once.

But at least I’m getting a random list of numbers.

Making that array contain unique numbers

To force the RANDARRAY to create a list of unique numbers, I apply UNIQUE.

= UNIQUE(RANDARRAY(352, 1, 1, 352, TRUE))

Except, there is a little issue. I find that this combination of functions is rather finicky so rather than stating the number of rows to be 352, I make it huge such as 3352, so that way I can at least get a “spill” that covers all of the real names. At least, that is what seems to work.

Number the list of fake names

The list of fake names was numbered sequentially.

List of real names has the UNIQUE-RANDARRAY formula

The list of real names will have the unique and randarray functions applied to it and Excel will “spill” the list of random numbers next to the list of real names. You just enter the formula once and the RANDARRAY portion will generate an array. Just make sure you situate the formula properly so the random numbers align with the real names.

Use XLOOKUP to look up numbers against fake names list

XLOOKUP allows you to look up the number next to the real name and find that number applied to the fake name list.

To make that last sentence a little more understandable, here’s an image of an Excel file with real names listed on the left-hand side and fake names on the right. The image is blown up to make it a bit more readable but it is still blurry. Not sure what’s going on here.

The real names are under column D and the fake names are under P.

The random sequencing of numbers against the real names are under column E. Note that instead of listing 4 rows, I put in 1000. That’s the little fluke that made this work.

Then under column J you will see the XLOOKUP formula that finds corresponding random generated numbers found in E under column O and then pull the fake name with that associated number. If you are familiar with VLOOKUP, you know what I’m talking about.

All of this is aside of what I wanted to talk about.

These formulas were kind of new and creating some problems that I started to do some checking that the formulas were really creating unique numbers so that I can end up with a one-to-one relationship between the real name and the fake name.

But something was not working. It turns out the AI gave me duplicate names!

So, remember, never take what the AI gives you at face value. Always double check it answers. While it’s amazing and can produce results quickly, you do have to scrutinize it and adjust it.

Similar Posts