Sort a list of strings numerically and filter out duplicates?

2020-02-15 python list sorting duplicates

Given a list of strings in the following format:

[
    "464782,-100,4,3,1,100,0,0",
    "465042,-166.666666666667,4,3,1,100,0,0",
    "465825,-250.000000000001,4,3,1,100,0,0",
    "466868,-166.666666666667,4,3,1,100,0,0",
    "467390,-200.000000000001,4,3,1,100,0,0",
    "469999,-100,4,3,1,100,0,0",
    "470260,-166.666666666667,4,3,1,100,0,0",
    "474173,-100,4,3,1,100,0,0",
    "474434,-166.666666666667,4,3,1,100,0,0",
    "481477,-100,4,3,1,100,0,1",
    "531564,259.011439671919,4,3,1,60,1,0",
    "24369,-333.333333333335,4,3,1,100,0,0",
    "21082,410.958904109589,4,3,1,60,1,0",
    "21082,-250,4,3,1,100,0,0",
    "22725,-142.857142857143,4,3,1,100,0,0",
    "23547,-166.666666666667,4,3,1,100,0,0",
    "24369,-333.333333333335,4,3,1,100,0,0",
    "27657,-200.000000000001,4,3,1,100,0,0",
    "29301,-142.857142857143,4,3,1,100,0,0",
    "30123,-166.666666666667,4,3,1,100,0,0",
    "30945,-250,4,3,1,100,0,0",
    "32588,-166.666666666667,4,3,1,100,0,0",
    "34232,-250,4,3,1,100,0,0",
    "35876,-142.857142857143,4,3,1,100,0,0",
    "36698,-166.666666666667,4,3,1,100,0,0",
    "37520,-250,4,3,1,100,0,0",
    "42451,-142.857142857143,4,3,1,100,0,0",
    "43273,-166.666666666667,4,3,1,100,0,0",
]

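How can I use Python to sort the list by the first number in each line? And then, once it is sorted, remove all duplicates (if any)?

The sort criterion is the number before the first comma in each line, which is always an integer.

I tried list.sort(), but that sorts the items in lexical order rather than numerical order.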
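To illustrate the difference, here is a minimal sketch using three made-up strings in the same "number,rest" shape (the name `rows` and its values are for illustration only, not taken from the question). Plain string comparison goes character by character, so a line starting with 100 sorts before one starting with 9; converting the leading field to int in a key function gives numeric order.

```python
# Illustrative data only: same "number,rest" shape as the question's list.
rows = ["100,a", "9,b", "25,c"]

# The default sort compares strings character by character (lexical order).
lexical = sorted(rows)

# Sorting on the integer before the first comma gives numeric order.
numeric = sorted(rows, key=lambda s: int(s.split(",", 1)[0]))

print(lexical)  # ['100,a', '25,c', '9,b']
print(numeric)  # ['9,b', '25,c', '100,a']
```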

Answers

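I would try one of the following two approaches:

def sort_list(lis):
    nums = [int(num) if num.isdigit() else float(num) for num in lis]

    nums = list(set(nums))
    nums.sort()

    return [str(i) for i in nums]  # I assumed you wanted them to be strings.

The first raises a ValueError if any item in lis is not the string representation of an int or a float. The second is a bit odd; note that it also passes every non-digit string to float(), so it fails on the same input.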

def sort_list(lis):
    ints = [int(num) for num in lis if num.isdigit()]
    floats = [float(num) for num in lis if not num.isdigit()]

    nums = ints.copy()
    nums.extend(floats)
    nums = list(set(nums))
    nums.sort()

    return [str(i) for i in nums]  # I assumed you wanted them to be strings.

Hope this helps.
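As a hedged alternative to checking isdigit(), a small helper can try int() first and fall back to float(). The names `parse_number` and `sort_unique_numbers` are hypothetical. Like the functions above, this sketch assumes each item is a single number written as a string, not a whole comma-separated line as in the question.

```python
def parse_number(text):
    """Return text as an int if possible, otherwise as a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)  # still raises ValueError for non-numeric text


def sort_unique_numbers(items):
    """Parse, de-duplicate, and numerically sort a list of number strings."""
    return [str(n) for n in sorted({parse_number(s) for s in items})]


print(sort_unique_numbers(["10", "-2.5", "3", "10"]))  # ['-2.5', '3', '10']
```

One difference from the isdigit() check: "-3".isdigit() is False, so the original functions would turn it into a float, whereas int("-3") succeeds here and the value stays an integer.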

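You can use a dictionary for this. The key is the number before the first comma and the value is the whole string. Duplicates are eliminated, but for any given number only the last string that starts with it is kept.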

l = ['464782,-100,4,3,1,100,0,0',
'465042,-166.666666666667,4,3,1,100,0,0',
'465825,-250.000000000001,4,3,1,100,0,0',
'466868,-166.666666666667,4,3,1,100,0,0',
'467390,-200.000000000001,4,3,1,100,0,0',
...]

d = {int(s.split(',')[0]) : s for s in l}
result = [d[key] for key in sorted(d.keys())]
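The following sketch runs the same dictionary idea on a small subset of the question's rows (the names `rows` and `by_key` are mine). It shows the caveat in practice: the two rows starting with 21082 differ, yet only the later one survives, because the key is the leading number alone.

```python
rows = [
    "21082,410.958904109589,4,3,1,60,1,0",
    "24369,-333.333333333335,4,3,1,100,0,0",
    "21082,-250,4,3,1,100,0,0",
    "24369,-333.333333333335,4,3,1,100,0,0",
    "464782,-100,4,3,1,100,0,0",
]

# Later rows overwrite earlier ones that share the same leading number.
by_key = {int(s.split(",", 1)[0]): s for s in rows}

# Integer keys sort numerically, so 21082 comes before 464782.
result = [by_key[k] for k in sorted(by_key)]

for row in result:
    print(row)
```

If rows that share a leading number but differ elsewhere must all be kept, de-duplicating on the whole string is the safer choice.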

You can try this.

First, we remove the duplicates from the list using set()

removed_duplicates_list = list(set(listr))

Then we convert the list of strings to a list of tuples

list_of_tuples = [tuple(i.split(",")) for i in removed_duplicates_list]

Then we sort it using sort()

list_of_tuples.sort()

Here is the complete code example:

listr = [
    "464782,-100,4,3,1,100,0,0",
    "465042,-166.666666666667,4,3,1,100,0,0",
    "465825,-250.000000000001,4,3,1,100,0,0",
    "466868,-166.666666666667,4,3,1,100,0,0",
    "467390,-200.000000000001,4,3,1,100,0,0",
    "469999,-100,4,3,1,100,0,0",
    "470260,-166.666666666667,4,3,1,100,0,0",
    "474173,-100,4,3,1,100,0,0",
    "474434,-166.666666666667,4,3,1,100,0,0",
    "481477,-100,4,3,1,100,0,1",
    "531564,259.011439671919,4,3,1,60,1,0",
    "24369,-333.333333333335,4,3,1,100,0,0",
    "21082,410.958904109589,4,3,1,60,1,0",
    "21082,-250,4,3,1,100,0,0",
    "22725,-142.857142857143,4,3,1,100,0,0",
    "23547,-166.666666666667,4,3,1,100,0,0",
    "24369,-333.333333333335,4,3,1,100,0,0",
    "27657,-200.000000000001,4,3,1,100,0,0",
    "29301,-142.857142857143,4,3,1,100,0,0",
    "30123,-166.666666666667,4,3,1,100,0,0",
    "30945,-250,4,3,1,100,0,0",
    "32588,-166.666666666667,4,3,1,100,0,0",
    "34232,-250,4,3,1,100,0,0",
    "35876,-142.857142857143,4,3,1,100,0,0",
    "36698,-166.666666666667,4,3,1,100,0,0",
    "37520,-250,4,3,1,100,0,0",
    "42451,-142.857142857143,4,3,1,100,0,0",
    "43273,-166.666666666667,4,3,1,100,0,0",
]

removed_duplicates_list = list(set(listr))
list_of_tuples = [tuple(i.split(",")) for i in removed_duplicates_list]
list_of_tuples.sort()
print(list_of_tuples) # the output is a list of tuples

Output:

    [('21082', '-250', '4', '3', '1', '100', '0', '0'),
    ('21082', '410.958904109589', '4', '3', '1', '60', '1', '0'),
    ('22725', '-142.857142857143', '4', '3', '1', '100', '0', '0'),
    ('23547', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
    ('24369', '-333.333333333335', '4', '3', '1', '100', '0', '0'),
    ('27657', '-200.000000000001', '4', '3', '1', '100', '0', '0'),
    ('29301', '-142.857142857143', '4', '3', '1', '100', '0', '0'),
    ('30123', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
    ('30945', '-250', '4', '3', '1', '100', '0', '0'),
    ('32588', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
    ('34232', '-250', '4', '3', '1', '100', '0', '0'),
    ('35876', '-142.857142857143', '4', '3', '1', '100', '0', '0'),
    ('36698', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
    ('37520', '-250', '4', '3', '1', '100', '0', '0'),
    ('42451', '-142.857142857143', '4', '3', '1', '100', '0', '0'),
    ('43273', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
    ('464782', '-100', '4', '3', '1', '100', '0', '0'),
    ('465042', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
    ('465825', '-250.000000000001', '4', '3', '1', '100', '0', '0'),
    ('466868', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
    ('467390', '-200.000000000001', '4', '3', '1', '100', '0', '0'),
    ('469999', '-100', '4', '3', '1', '100', '0', '0'),
    ('470260', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
    ('474173', '-100', '4', '3', '1', '100', '0', '0'),
    ('474434', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
    ('481477', '-100', '4', '3', '1', '100', '0', '1'),
    ('531564', '259.011439671919', '4', '3', '1', '60', '1', '0')]
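Note that the tuples above hold strings, so sort() still compares them lexically; the result matches numeric order only because of the particular values in this list. A minimal sketch that keeps the set() step but sorts on numeric values, and returns the rows as strings rather than tuples, might look like this. The `sort_key` helper is hypothetical, the 9000 rows are made up to show a case where lexical order would differ, and using the second field as a tie-breaker is my assumption.

```python
def sort_key(row):
    """Numeric key: the leading integer, then the second field as a float."""
    first, second = row.split(",")[:2]
    return int(first), float(second)


rows = [
    "464782,-100,4,3,1,100,0,0",
    "21082,410.958904109589,4,3,1,60,1,0",
    "9000,-250,4,3,1,100,0,0",  # made-up row: sorts last lexically
    "21082,-250,4,3,1,100,0,0",
    "9000,-250,4,3,1,100,0,0",  # exact duplicate, removed by set()
]

unique_sorted = sorted(set(rows), key=sort_key)
for row in unique_sorted:
    print(row)
```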

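I hope this helps. I put all the list elements in a separate file named lista.txt, and in this example I read the list from that file. It could be organized more neatly, for instance with the list in a separate Python file, but the idea is to take the elements one at a time (with a while or for loop), check whether each new item is already present, and append it to a temporary list only if it is not. You can then sort that list with .sort(), which does the trick for this data.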

# Global variables
file = "lista.txt"
tempList = []

# Read the items from the file
def GetListFromFile(fileName):
    # Local variables
    showDoneMsg = True

    # Try to run this code
    try:
        # Open the file and try to read it
        with open(fileName, mode="r") as f:
            # Read the first line
            line = f.readline()
            # For every line in the file
            while line:
                # Strip trailing whitespace (\n, \r)
                item = line.rstrip()

                # Check whether this item is already in the list
                if item not in tempList:
                    # Append the item to the temporary list
                    tempList.append(item)
                # Report items that already exist
                else:
                    print("Dublicate >>", item)

                # Go to the next line
                line = f.readline()
        # Optional: the with block already closes the file,
        # but I like to be sure
        f.close()

    # Exceptions
    except FileNotFoundError:
        print("ERROR >> File does not exist!")
        showDoneMsg = False

    # Sort the list
    tempList.sort()
    # Report completion if the file exists
    if showDoneMsg == True:
        print("\n>>> DONE <<<\n")

# Show the list items
def ShowListItems(thisList):
    if len(thisList) == 0:
        print("Temporary list is empty...")
    else:
        print("This is new items list:")
        for i in thisList:
            print(i)

# Execute function
GetListFromFile(file)
# Check that the items were sorted
ShowListItems(tempList)

Output:

========================= RESTART: D:\Python\StackOverflow\help.py =========================
Dublicate >> 43273,-166.666666666667,4,3,1,100,0,0

>>> DONE <<<

21082,-250,4,3,1,100,0,0
21082,410.958904109589,4,3,1,60,1,0
22725,-142.857142857143,4,3,1,100,0,0
...
474434,-166.666666666667,4,3,1,100,0,0
481477,-100,4,3,1,100,0,1
531564,259.011439671919,4,3,1,60,1,0
>>> 
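A more compact sketch of the same idea, assuming one record per line. The function name `unique_sorted_lines` is hypothetical; it accepts any iterable of lines, such as an open file, and io.StringIO stands in for lista.txt here with made-up contents. Unlike calling .sort() on the strings, it sorts on the integer before the first comma.

```python
import io


def unique_sorted_lines(lines):
    """Strip each line, drop blanks and exact repeats, sort by leading integer."""
    seen = set()
    unique = []
    for line in lines:
        item = line.strip()
        if item and item not in seen:  # set lookup avoids rescanning the list
            seen.add(item)
            unique.append(item)
    return sorted(unique, key=lambda s: int(s.split(",", 1)[0]))


# io.StringIO stands in for an open file such as lista.txt (contents made up).
sample = io.StringIO("43273,-166.6,4\n9000,-250,4\n43273,-166.6,4\n")
print(unique_sorted_lines(sample))  # ['9000,-250,4', '43273,-166.6,4']
```

With a real file, the call would be wrapped in `with open("lista.txt") as f:` and passed `f` directly.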
